How to Parse Common Data Formats in Python

Posted by PyBites on Tue 16 May 2017 in Learning • 3 min read

One of the biggest jumps you make in your Python learning is when you start dealing with external data.

With this post we wanted to demonstrate a few ways you can work with the more common data formats. Why? Because it’s a big deal when you’re starting out! Furthermore, unless you do it often, it’s easy to forget how, so bookmark this baby and refer back to it!

The links below are to articles and scripts we’ve actually written as well as to external resources we’ve found helpful.


1. CSV

If you’re going to play with CSV files, DictReader is your friend. It converts each row into an OrderedDict (Hallelujah!), or into a plain dict from Python 3.8 onwards.

Reading the contents of a CSV file:

Code Link

for entry in csv.DictReader(f, fieldnames=FIELDS):
    yield entry
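To see what DictReader hands back, here is a small self-contained sketch. The sample data, the FIELDS values and the read_entries name are made up for illustration, and io.StringIO stands in for an open file.

```python
import csv
import io

# Hypothetical column names and sample data, for illustration only.
FIELDS = ['name', 'language']
SAMPLE = "Julian,Python\nBob,Python\n"


def read_entries(f, fieldnames=FIELDS):
    """Yield one dict-like row per CSV line, keyed by fieldnames."""
    for entry in csv.DictReader(f, fieldnames=fieldnames):
        yield entry


# io.StringIO stands in for a file opened with open(path, newline='').
rows = list(read_entries(io.StringIO(SAMPLE)))
first_name = rows[0]['name']
```

Passing fieldnames tells DictReader that the file has no header row. Leave it out and the first row is used as the keys instead.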


Opening and reading the CSV using a with statement:

Code Link

import csv

def read_csv(cf=CSV_FILE):
    # newline='' lets the csv module handle line endings itself
    with open(cf, newline='') as csvfile:
        return list(csv.DictReader(csvfile))


2. JSON

JSON is a must these days, especially if you want to work with APIs.

Simple read of JSON data pulled down by requests:

Code Link

data = json.loads(r.text)


One of our first articles used a with statement to load in JSON data:

Article/Code Link

def load_json(json_file):
    with open(json_file) as f:
        return json.load(f)
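If the difference between json.load and json.loads keeps tripping you up, the sketch below may help: load takes an open file object, loads takes a string. The sample data and the temporary file are assumptions for the demo, not part of the original article.

```python
import json
import os
import tempfile

# Hypothetical sample data written to a temporary file for the demo.
sample = {"site": "PyBites", "tags": ["python", "json"]}

with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as tmp:
    json.dump(sample, tmp)
    json_file = tmp.name

# json.load parses straight from the open file object...
with open(json_file) as f:
    from_file = json.load(f)

# ...while json.loads parses a string you have already read.
with open(json_file) as f:
    from_string = json.loads(f.read())

os.remove(json_file)  # clean up the temporary file
```

Both calls produce the same Python objects, so pick whichever matches what you already have in hand: a file or a string.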


Our Challenge 07 review used yield to return the JSON data:

Article/Code Link

def get_tweets(input_file):
    with open(input_file) as f:
        for line in f:
            yield json.loads(line)
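The generator approach works on any input that holds one JSON document per line. Here is a minimal sketch, assuming made-up sample data and a hypothetical iter_json_lines helper, with io.StringIO standing in for the file:

```python
import io
import json

# Hypothetical input: one JSON document per line.
SAMPLE = '{"id": 1, "text": "hello"}\n{"id": 2, "text": "world"}\n'


def iter_json_lines(f):
    """Yield one parsed object per non-empty line of f."""
    for line in f:
        line = line.strip()
        if line:  # skip blank lines instead of failing on them
            yield json.loads(line)


tweets = list(iter_json_lines(io.StringIO(SAMPLE)))
texts = [t['text'] for t in tweets]
```

Because it yields one object at a time, the caller can loop over a large file without holding every parsed record in memory at once.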


Note the .json() method on the response returned by requests.get:

Code Link

data = requests.get(API_URL.format(city, API_KEY)).json()




3. SQLite

We’ve learned to love SQLite recently and have found ourselves using it all the time. It’s worth picking up as it’s such an easy and great way of getting a persistent DB!

A recent script of ours converts a CSV of movies to an SQLite DB:

Code Link.
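As a rough idea of what such a conversion can look like, here is a minimal sketch using only the standard library. It is not the linked script: the table layout, the sample rows and the csv_to_sqlite name are all assumptions for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical movie data; a real script would open a CSV file instead.
SAMPLE = "title,year\nAlien,1979\nHeat,1995\n"


def csv_to_sqlite(f, conn):
    """Load rows from a CSV file object into a 'movies' table."""
    conn.execute("CREATE TABLE IF NOT EXISTS movies (title TEXT, year INTEGER)")
    rows = ((row['title'], int(row['year'])) for row in csv.DictReader(f))
    # Placeholders (?) let sqlite3 handle quoting of the values.
    conn.executemany("INSERT INTO movies VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(':memory:')  # use a filename for a persistent DB
csv_to_sqlite(io.StringIO(SAMPLE), conn)
result = conn.execute("SELECT title FROM movies WHERE year < 1990").fetchall()
count = conn.execute("SELECT COUNT(*) FROM movies").fetchone()[0]
```

Swapping ':memory:' for a path such as 'movies.db' is all it takes to keep the data around between runs.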




4. XML

XML! The data format of choice for RSS feeds. Can be a bit troublesome at times but always worth the effort.

Example of using xml.etree.ElementTree to parse the Safari RSS feed:

Code Link - Worth checking out the full code but the gist of it is…

for item in doc.iterfind('channel/item'):
    ...
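To make the iterfind pattern concrete, here is a small sketch against a made-up RSS snippet. The sample feed is an assumption for illustration; a real script would parse the downloaded feed instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, minimal RSS document for illustration.
SAMPLE = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title>Second post</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""

doc = ET.fromstring(SAMPLE)  # the root element, here <rss>

items = []
for item in doc.iterfind('channel/item'):
    # findtext returns the text of the first matching child element
    items.append((item.findtext('title'), item.findtext('link')))
```

The path 'channel/item' is relative to the root element, which is why it does not start with 'rss'.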


Using feedparser to pull specific XML tags and add to a list:

Code Link

feed = feedparser.parse(FEED_FILE)
for entry in feed['entries']:
    game = (entry['title'], entry['link'])
    games_list.append(game)


Challenge Solutions

We’ve had numerous challenges over the past few months where the solutions involved these data formats. Here are a few of the noteworthy ones:

Code Challenge 04


Code Challenge 07

Code Challenge 17 Review

This was definitely a great challenge. Check out the multiple community contributions for some examples of using sqlite and XML in functional scripts written by your fellow Pythonistas.


Learn By Doing

Now that you have the info, as we said in our Learn By Doing article, open up a vim session and get coding!

One awesome, shameless plug of a way to do this would be to come up with a solution for Code Challenge 19. Playing with an API means you’ll more than likely need to use quite a few of these formats.

We’d love to hear if you have any Pythonic tips on using these formats too, so leave a comment!

And as always, Keep Calm and Code in Python!

-- Julian and Bob
