There is an immense amount to be learned simply by tinkering with things. - Henry Ford
It's time for another code challenge! In this two-part challenge we're going to do some natural language processing on podcast transcript data. Prepare to have fun expanding your data science skills!
Here are the steps to follow:
- Pick your favorite podcast; make sure it has a good amount of data: plenty of episodes with transcripts.
- Make a script to get all the transcripts. As this could involve retrieving data from GitHub (Talk Python), feed parsing / web crawling, or even extracting data from PDFs (Tim Ferriss Show - episodes 1-150), we decided to split this into two challenges.
- Store the results somewhere, for example in a (SQLite) database.
- This week you can PR this prep work via this link
- Make a virtual environment and pip install NLTK (the Natural Language Toolkit).
- Read up on how to use the library.
- From here on we leave you totally free to find the patterns in the data that you are interested in: sentiments, book recommendations, you name it.
- Show the results in a notebook or in any way you like.
- PR link to be released next Monday ...
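The storage step above can be sketched with the standard library's sqlite3 module. This is a minimal sketch, not the challenge's required schema: the table layout and the sample transcript rows are made up for illustration, and a real run would feed in whatever your scraping script collected.

```python
import sqlite3

# Hypothetical sample rows; in the real challenge these would come
# from your feed-parsing / scraping script.
transcripts = [
    (1, "Episode 1", "Welcome to the show, today we talk about Python..."),
    (2, "Episode 2", "In this episode we interview a data scientist..."),
]

# Use a file path instead of :memory: to persist between runs.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE IF NOT EXISTS transcripts (
           episode INTEGER PRIMARY KEY,
           title TEXT,
           transcript TEXT
       )"""
)
conn.executemany(
    "INSERT INTO transcripts (episode, title, transcript) VALUES (?, ?, ?)",
    transcripts,
)
conn.commit()

# Quick sanity check: pull back what we stored.
rows = conn.execute(
    "SELECT episode, title FROM transcripts ORDER BY episode"
).fetchall()
print(rows)  # [(1, 'Episode 1'), (2, 'Episode 2')]
```

Keeping everything in one table keeps the prep work simple; part two of the challenge only needs a way to iterate over episode texts.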
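As for the analysis step, the core pattern-finding idea NLTK supports (tokenize, drop stopwords, count) can be sketched in plain Python with collections.Counter. The transcript snippet and the tiny stopword set below are made up for illustration; NLTK ships real tokenizers and a much more complete stopwords corpus.

```python
import re
from collections import Counter

# Made-up transcript snippet for illustration.
transcript = (
    "Welcome to the show. Today we talk about Python and data science. "
    "Python makes data analysis fun, and data science is a great skill."
)

# A tiny hand-rolled stopword set; NLTK's stopwords corpus is far larger.
STOPWORDS = {"to", "the", "we", "about", "and", "a", "is", "today", "makes"}

# Lowercase, split into word tokens, drop stopwords, and count.
words = re.findall(r"[a-z']+", transcript.lower())
counts = Counter(w for w in words if w not in STOPWORDS)

print(counts.most_common(3))  # [('data', 3), ('python', 2), ('science', 2)]
```

With NLTK installed, nltk.word_tokenize and nltk.corpus.stopwords.words('english') would replace the regex and the hand-rolled set, and from there you can branch out into sentiment, collocations, or whatever patterns interest you.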
Good luck and have fun!
Ideas and feedback
If you have ideas for a future challenge or find any issues, open a GH Issue or reach out via Twitter, Slack or Email.
Last but not least: there is no best solution, only learning more and better Python. Good luck!
Become a Python Ninja
At PyBites you get to master Python through Code Challenges:
Subscribe to our blog (sidebar) to get new PyBites Code Challenges (PCCs) in your inbox.
Apart from this blog code challenge we have a growing collection of 50+ challenges; check them out on our platform.
Prefer coding bite-sized Python exercises in the comfort of your browser? Try our growing collection of Bites of Py.
Want to do the #100DaysOfCode but not sure what to work on? Take our course and/or start logging your 100 Days progress using our Progress Grid Feature on our platform.
>>> from pybites import Bob, Julian
Keep Calm and Code in Python!