There is an immense amount to be learned simply by tinkering with things. - Henry Ford
It's time for another code challenge! In this two-part challenge we're going to do some natural language processing on podcast transcript data. Prepare to have fun expanding your data science skills!
Here are the steps you would follow:
- Pick your favorite podcast. Make sure it has plenty of episodes and that transcripts are available.
- Make a script to get all the transcripts. As this could involve retrieving data from GitHub (Talk Python), feed parsing / web crawling, or even extracting data from PDFs (Tim Ferriss Show - episodes 1-150), we decided to split this into two challenges.
- Store the results somewhere, for example in a (SQLite) database.
- This week you can PR this prep work via this link
- Make a virtual environment and pip install NLTK / Natural Language Toolkit.
- Read up on how to use the library.
- From here on we leave you totally free to find the patterns in the data that you are interested in: sentiments, book recommendations, you name it.
- Show the results in a notebook or in any way you like.
- PR link to be released next Monday ...
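As a starting point, the storage and first-analysis steps above could be sketched as follows. This is only a minimal illustration under stated assumptions: the table name `transcripts`, its columns, the helper names, and the sample data are all hypothetical, and the word count uses the standard library's `re` and `collections.Counter` as a stand-in so the sketch runs without extra installs. In the actual challenge you would swap in NLTK's tokenizers and tools for the analysis part.

```python
import re
import sqlite3
from collections import Counter

# Hypothetical sample rows standing in for the transcripts your script retrieves:
# (episode number, title, transcript text)
SAMPLE_TRANSCRIPTS = [
    (1, "Intro to Python", "Python is fun. We talk about Python and data."),
    (2, "Data Science", "Data science needs data, and more data."),
]


def store_transcripts(conn, transcripts):
    """Create the (hypothetical) transcripts table if needed and insert the rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transcripts ("
        "episode INTEGER PRIMARY KEY, title TEXT, body TEXT)"
    )
    # INSERT OR REPLACE lets you re-run the script without duplicate-key errors.
    conn.executemany(
        "INSERT OR REPLACE INTO transcripts VALUES (?, ?, ?)", transcripts
    )
    conn.commit()


def word_counts(conn, min_length=4):
    """Count lowercased words of at least min_length characters over all transcripts."""
    counts = Counter()
    for (body,) in conn.execute("SELECT body FROM transcripts"):
        # Naive regex tokenizer; an NLTK tokenizer could replace this line.
        words = re.findall(r"[a-z']+", body.lower())
        counts.update(word for word in words if len(word) >= min_length)
    return counts


# ":memory:" keeps the demo self-contained; pass a file path to persist the data.
conn = sqlite3.connect(":memory:")
store_transcripts(conn, SAMPLE_TRANSCRIPTS)
counts = word_counts(conn)
print(counts.most_common(2))
```

Keeping retrieval, storage, and analysis in separate functions means the scraping part of the challenge only has to produce rows in the same shape as `SAMPLE_TRANSCRIPTS`; the analysis can then evolve independently.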
Good luck and have fun!
Ideas and feedback
Last but not least: there is no best solution, only learning more and better Python. Good luck!
Become a Python Ninja
At PyBites you get to master Python through Code Challenges:
Subscribe to our blog (sidebar) to get a new PyBites Code Challenge (PCC) in your inbox every week.
Apart from this blog code challenge, we have a growing collection of 50+ challenges. Check them out on our platform.
Prefer coding bite-sized Python exercises in the comfort of your browser? Try our growing collection of Bites of Py.
>>> from pybites import Bob, Julian

Keep Calm and Code in Python!