Code Challenge 04 - Twitter data analysis Part 1: get the data - Review

Posted by PyBites on Fri 03 February 2017 in Challenges

It's Friday again, so we review this week's code challenge. It's never too late to join: just fork our challenges repo and start coding.

A possible solution

See here, with the details below:

  • stdlib imports, plus tweepy (pip install tweepy):

    from collections import namedtuple
    import csv
    import os
    import tweepy
  • we generated our keys through the Twitter API and put them in

    from config import CONSUMER_KEY, CONSUMER_SECRET
    from config import ACCESS_TOKEN, ACCESS_SECRET
  • we define some constants:

    DEST_DIR = 'data'
    EXT = 'csv'
    NUM_TWEETS = 100
  • we build a tweepy API object. At first we had this in the constructor (__init__), but on second thought we set it up as a constant:

    auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
    auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)
    API = tweepy.API(auth)
  • we use a namedtuple to define Tweet:

    Tweet = namedtuple('Tweet', 'id_str created_at text')

    Namedtuples are awesome for simple classes to store data without behaviour!

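  • we define the class. In Python 3 every class inherits from object implicitly, so spelling it out is optional; doing so keeps the class new-style if the code is also run under Python 2: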

    class UserTweets(object):
  • the constructor takes the handle and an optional max_id; the latter is to get a fixed set of tweets, which we used in

        def __init__(self, handle, max_id=None):
            self.handle = handle
            self.max_id = max_id
            self.output_file = '{}.{}'.format(os.path.join(DEST_DIR, self.handle), EXT)
            self._tweets = list(self._get_tweets())
            self._save_tweets()  # write the CSV as soon as the tweets are fetched
  • we get the tweets with the _get_tweets() helper. It returns a generator of Tweet namedtuples containing only the id_str, created_at and text attributes (the Twitter API returns a lot more!):

        def _get_tweets(self):
            tweets = API.user_timeline(self.handle, count=NUM_TWEETS, max_id=self.max_id)
            return (Tweet(s.id_str, s.created_at, s.text.replace('\n', '')) for s in tweets)
  • the helper _save_tweets saves the tweets to a CSV file. We chose to call it in the constructor, but you can of course take the underscore (_) out and call it explicitly: obj.save_tweets():

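        def _save_tweets(self):
            with open(self.output_file, 'w') as f:
                writer = csv.writer(f)
                # assumed completion (the excerpt stops above): header, then one row per tweet
                writer.writerow(Tweet._fields)
                writer.writerows(self._tweets)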
  • implementing __len__ and __getitem__ makes the object behave like a sequence, so you can call len() on it, index and slice it, and iterate over the tweets (see our data model post), as done in the for loop under __main__:

        def __len__(self):
            return len(self._tweets)
        def __getitem__(self, pos):
            return self._tweets[pos]
    if __name__ == "__main__":
        for handle in ('pybites', 'techmoneykids', 'bbelderbos'):
            print('--- {} ---'.format(handle))
            user = UserTweets(handle)
            for tw in user[:5]:
                print(tw)  # assumed loop body; the excerpt is cut off here
  • running the tests:

    $ python
    Ran 3 tests in 0.001s

    TODO: Twitter data changes and you don't want to call the API in tests (it slows them down, and unit tests should be fast), so we need to look at mock ...
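As a first stab at that TODO, here is a minimal sketch of how the API call could be faked with unittest.mock from the standard library. It is not part of our solution: get_tweets and make_status are hypothetical helpers, and get_tweets takes the api object as an argument only to keep the sketch self-contained (in the real module you would more likely patch the module-level API constant).

```python
from collections import namedtuple
from datetime import datetime
from unittest import mock

Tweet = namedtuple('Tweet', 'id_str created_at text')


def get_tweets(api, handle, count=100, max_id=None):
    """Hypothetical stand-in for UserTweets._get_tweets (api passed in explicitly)."""
    statuses = api.user_timeline(handle, count=count, max_id=max_id)
    return [Tweet(s.id_str, s.created_at, s.text.replace('\n', ''))
            for s in statuses]


def make_status(id_str, text):
    """Fake status object exposing only the three attributes we read."""
    return mock.Mock(id_str=id_str, created_at=datetime(2017, 2, 3), text=text)


# The fake API returns canned statuses, so no network call is made.
fake_api = mock.Mock()
fake_api.user_timeline.return_value = [
    make_status('1', 'hello\nworld'),
    make_status('2', 'second tweet'),
]

tweets = get_tweets(fake_api, 'pybites')

# The mock records how it was called, so we can check the arguments too.
fake_api.user_timeline.assert_called_once_with('pybites', count=100, max_id=None)
print(tweets[0].text)  # helloworld (the newline is stripped)
```

Because the statuses are canned, a test built this way stays fast and does not break when the real timeline changes.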

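To play with the namedtuple, CSV and data model ideas from the walkthrough without Twitter keys, here is a small API-free sketch. TweetList and its to_csv method are hypothetical names made up for this illustration, and the tweets are dummy data; it writes to an in-memory buffer instead of a file.

```python
import csv
import io
from collections import namedtuple

Tweet = namedtuple('Tweet', 'id_str created_at text')


class TweetList(object):
    """Hypothetical, API-free stand-in for UserTweets."""

    def __init__(self, tweets):
        self._tweets = list(tweets)

    def __len__(self):
        return len(self._tweets)

    def __getitem__(self, pos):
        # Delegating to the list gives us indexing and slicing for free.
        return self._tweets[pos]

    def to_csv(self, f):
        writer = csv.writer(f)
        writer.writerow(Tweet._fields)  # header row from the namedtuple fields
        writer.writerows(self._tweets)  # each namedtuple becomes one row


# Dummy tweets, generated locally.
tweets = TweetList(
    Tweet(str(i), '2017-02-0{}'.format(i), 'tweet number {}'.format(i))
    for i in range(1, 4)
)

print(len(tweets))                        # 3
print(tweets[0].text)                     # tweet number 1
print([tw.id_str for tw in tweets[:2]])   # ['1', '2']
print([tw.id_str for tw in tweets])       # iteration falls back to __getitem__

buf = io.StringIO()
tweets.to_csv(buf)
print(buf.getvalue().splitlines()[0])     # id_str,created_at,text
```

Note that there is no __iter__ here: when it is missing, Python iterates by calling __getitem__ with 0, 1, 2, ... until an IndexError is raised.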
Any issues or feedback?

What did you learn from this challenge? Feel free to share your code in the comments below.

How are you experiencing these challenges? Do you like the format? What can we do differently and/or better?


Next week we use this pre-work to load in tweets of various Twitter users and determine who are most similar using NLP techniques. See you on Monday ...

Again, to start coding, fork our challenges repo, or sync it if you already forked it.

>>> next(PyBites)
