Code Challenge 03 - PyBites Blog Tag Analysis

Posted by PyBites on Mon 23 January 2017 in Challenges • 2 min read

This week, each one of you has a homework assignment ... - Tyler Durden (Fight Club)

Given our RSS feed, which tags does PyBites use most, and which tags should be merged (based on similarity)?

Example output:

$ python

* Top 10 tags:
python               10
learning             7
tips                 6
tricks               5
github               5
cleancode            5
best practices       5
pythonic             4
collections          4
beginners            4

* Similar tags:
game                 games
challenge            challenges
generator            generators
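
As a rough illustration of the "Top 10 tags" half of this output, the tags can be tallied with collections.Counter. This is a minimal sketch, not the template solution. It assumes each tag sits in a <category> element of the feed, and it parses a tiny inline stand-in rather than the real rss.xml. SAMPLE_FEED, get_tags and get_top_tags are hypothetical names made up for this example; TOP_NUMBER mirrors the constant the templates provide.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TOP_NUMBER = 10

# Hypothetical miniature feed standing in for rss.xml (assumption: tags
# are stored as <category> elements inside each <item>).
SAMPLE_FEED = """<rss version="2.0"><channel>
  <item><title>a</title><category>python</category><category>tips</category></item>
  <item><title>b</title><category>Python</category><category>learning</category></item>
  <item><title>c</title><category>python</category><category>tips</category></item>
</channel></rss>"""


def get_tags(xml_text):
    """Return every tag found in the feed text, stripped and lowercased."""
    root = ET.fromstring(xml_text)
    return [cat.text.strip().lower() for cat in root.iter("category") if cat.text]


def get_top_tags(tags, n=TOP_NUMBER):
    """Return the n most common tags as (tag, count) tuples."""
    return Counter(tags).most_common(n)


if __name__ == "__main__":
    print("* Top {} tags:".format(TOP_NUMBER))
    for tag, count in get_top_tags(get_tags(SAMPLE_FEED)):
        # Left-align the tag in a 20-character column, like the sample output.
        print("{:<20} {}".format(tag, count))
```

To run it against the provided feed instead, read the file's text and pass it to get_tags in place of SAMPLE_FEED.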

Get ready

Start coding by forking our challenges repo:

$ git clone

If you already forked it, sync it:

# assuming using ssh key
$ git remote add upstream git@github.com:pybites/challenges.git
$ git fetch upstream
# if not on master: 
$ git checkout master 
$ git merge upstream/master

Use one of the templates:

$ cd 03
$ cp
# or:
$ cp
# code

# run the unittests (optional)
$ python
Ran 3 tests in 0.155s


Requirements / steps

  • Since we update our blog regularly, we have provided a recent copy of our feed in the 03 directory: rss.xml. We have also provided a copy of tags.html for verification (used by the unit tests).

  • Both templates provide 3 constants you should use:

    TOP_NUMBER = 10
    RSS_FEED = 'rss.xml'
    SIMILAR = 0.87
  • The rest is documented in the methods' docstrings. Again, use the more guided template if you need more guidance; the other one is for the more experienced and/or if you want more freedom. The same goes for the tests: use them if you need them.

  • Speaking of freedom: feel free to use our live feed, but then the tests will probably break.

  • Hint: for word similarity, feel free to use NLTK or your favorite language processing tool. However, the stdlib does provide a nice way to do this. Using this method, we arrived at 0.87 as a threshold so that, for example, 'python' and 'pythonic' are not marked as similar.
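
The hint above does not name the stdlib method, so the following is only one possible reading: a minimal sketch that assumes difflib.SequenceMatcher is the similarity measure. The helper names similarity and get_similarities are hypothetical; SIMILAR mirrors the constant from the templates.

```python
from difflib import SequenceMatcher
from itertools import combinations

SIMILAR = 0.87


def similarity(a, b):
    """Return a ratio between 0 and 1 describing how alike two strings are."""
    return SequenceMatcher(None, a, b).ratio()


def get_similarities(tags, threshold=SIMILAR):
    """Yield each pair of distinct tags whose ratio exceeds the threshold."""
    # Sort the unique tags so the pairs come out in a predictable order.
    for a, b in combinations(sorted(set(tags)), 2):
        if similarity(a, b) > threshold:
            yield a, b


if __name__ == "__main__":
    tags = ["python", "pythonic", "game", "games", "challenge", "challenges"]
    print("* Similar tags:")
    for a, b in get_similarities(tags):
        print("{:<20} {}".format(a, b))
```

Under this assumption, 'python' and 'pythonic' score about 0.857, just below the 0.87 threshold, while 'game' and 'games' score about 0.889 and are reported as similar.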

Good luck!

Remember: there is no best solution, only learning more and better Python.

Enjoy! We look forward to reviewing, on Friday, all the cool / creative / Pythonic stuff you come up with.

Have fun!

Again, to start coding, fork our challenges repo or sync it.

About PyBites Code Challenges

More background in our first challenge article.
