There is an immense amount to be learned simply by tinkering with things. - Henry Ford
Hey Pythonistas, in this challenge you will learn how to work with PDF documents. Enjoy!
For the NLTK challenge (PCC58/59) we stumbled upon a hurdle: episodes 1-150 of Tim Ferriss' transcripts are PDF files. And we're not alone; in the comments somebody stated:
These are much appreciated. I do wonder, however, why they are all not a downloadable PDF and only the first 150. Perhaps just a marketing thing, but it would be nice to be able to grab them all to have an easily searchable database. Ah well, you have to work for what you want!
Challenge accepted! You can try this too or use another data set; it's up to you!
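If you want a starting point, here is a minimal sketch of the "easily searchable" idea: pull the text out of a PDF, then look for the lines that mention a term. It assumes the third-party pypdf package (the maintained successor of PyPDF2); the function names `extract_pdf_text` and `find_matches` are made up for illustration, and how clean the extracted text is depends heavily on how the PDF was produced.

```python
def extract_pdf_text(pdf_path):
    """Return the text of every page in a PDF, joined by newlines."""
    # Assumption: pypdf is installed (pip install pypdf). It is imported
    # lazily so the search helper below works without it.
    from pypdf import PdfReader

    reader = PdfReader(str(pdf_path))
    # Image-only (scanned) pages may yield no text, hence the `or ""`
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def find_matches(text, term):
    """Return the lines of text that contain term, ignoring case."""
    term = term.lower()
    return [line.strip() for line in text.splitlines() if term in line.lower()]


# Demo on an in-memory string; for a real transcript you would pass
# the result of extract_pdf_text("some_transcript.pdf") instead.
sample = "Tim: What does your morning look like?\nGuest: I meditate every morning.\nTim: Great."
print(find_matches(sample, "MORNING"))
```

From here you could loop over a folder of downloaded transcripts and store the matches per episode, but that part is up to you.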
Googling for this challenge, we stumbled upon a PyCon proposal, Liberating tabular data from the clutches of PDFs:
Budget Documents are moral documents that represent the priorities and values of the states and its governing bodies. Unfortunately these documents are published in unstructured PDF formats which makes it difficult for researchers, economists and general public to analyse and use this crucial data. In this session will delve into how we can create a data pipeline and leverage computer vision techniques to parse these documents into clean machine-readable formats by leveraging libraries like OpenCV, numpy, pandas, PyPDF2, tabula and poppler-pdf-to-text
Which goes to show that:
If you can't find a use case for data extraction, feel free to do the inverse: generate a nice-looking PDF file from a bunch of data sources.
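To get a feel for the inverse direction, here is a rough, standard-library-only sketch that writes a one-page PDF by hand. In practice you would probably reach for a library such as ReportLab; the point here is just to tinker with what a PDF looks like underneath. The function name `build_pdf` is hypothetical, and the sketch assumes plain ASCII text, US Letter page size and the built-in Helvetica font, with no line wrapping or multiple pages.

```python
def build_pdf(lines):
    """Return the bytes of a one-page PDF showing the given text lines."""

    def escape(text):
        # Backslashes and parentheses are special inside PDF string literals
        return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

    # Content stream: 12pt font, 14pt line spacing, start near the top left
    ops = ["BT", "/F1 12 Tf", "14 TL", "72 720 Td"]
    for line in lines:
        ops.append(f"({escape(line)}) Tj")  # show the text
        ops.append("T*")                    # move to the next line
    ops.append("ET")
    # Non-ASCII characters are replaced by "?" in this simplified sketch
    stream = "\n".join(ops).encode("ascii", "replace")

    # The five objects a minimal document needs, numbered 1 to 5 in order
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        b"<< /Length %d >>\nstream\n" % len(stream) + stream + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of this object, for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    # Cross-reference table: one 20-byte entry per object, plus the free entry 0
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)


pdf_bytes = build_pdf(["PyBites Code Challenge", "Hello (PDF) world"])
print(len(pdf_bytes), "bytes generated")
```

Writing `pdf_bytes` to a file opened in binary mode (for example `open("hello.pdf", "wb")`) should give you something a PDF viewer can open. For anything nice looking (tables, images, fonts), a dedicated library will save you a lot of work.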
Have fun and use Python!
Last but not least: there is no best solution, only learning more and better Python. Good luck!
At PyBites you get to master Python through Code Challenges:
Subscribe to our blog (sidebar) to periodically get new PyBites Code Challenges (PCCs) in your inbox.
Apart from this blog code challenge we have a growing collection which you can check out on our platform.
Prefer coding bite-sized Python exercises, using effective Test-Driven Learning, and in the comfort of your browser? Try our growing collection of Bites of Py on our platform.
Want to do the #100DaysOfCode but not sure what to work on? Take our course and/or start logging your 100 Days progress using our Progress Grid Feature on our platform (you can also use the Grid to do 100 Bite exercises in 100 days, earning a die-hard PyBites Ninja Certificate!).
>>> from pybites import Bob, Julian
Keep Calm and Code in Python!