One of the biggest jumps you make in your Python learning is when you start dealing with external data.
With this post we want to demonstrate a few ways you can work with the more common data formats. Why? Because it’s a big deal when you’re starting out! And unless you do it often, it’s easy to forget how, so bookmark this baby and refer back to it!
The links below are to articles and scripts we’ve actually written as well as to external resources we’ve found helpful.
1. CSV
If you’re going to play with CSV files, `csv.DictReader` is your friend. It converts each row into a dictionary keyed by the column names: an `OrderedDict` before Python 3.8, a plain `dict` since (Hallelujah!).
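Reading the contents of a CSV file with a generator. Passing `fieldnames` makes `DictReader` use those names as keys instead of reading them from a header row, so it's meant for files without one:

```python
import csv

def read_entries(f, fields):
    for entry in csv.DictReader(f, fieldnames=fields):
        yield entry
```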
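Opening and reading the CSV using a `with` statement (the `csv` docs recommend opening the file with `newline=''`):

```python
import csv

CSV_FILE = 'data.csv'  # placeholder: path to your CSV file

def read_csv(cf=CSV_FILE):
    with open(cf, newline='') as csvfile:
        return list(csv.DictReader(csvfile))
```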
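A quick way to see what `DictReader` hands back, and how a header row differs from explicit `fieldnames`, is to feed it an in-memory file via `io.StringIO`. This is a small sketch with made-up movie data; it also shows `csv.DictWriter` going the other way.

```python
import csv
import io

# Made-up sample data with a header row, wrapped in a file-like object
SAMPLE = "title,year\nAlien,1979\nHeat,1995\n"

# Header present: the keys come from the first line; values are always strings
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(rows[0])  # {'title': 'Alien', 'year': '1979'} on Python 3.8+

# Explicit fieldnames: every line, header included, is treated as data
raw = list(csv.DictReader(io.StringIO(SAMPLE), fieldnames=["title", "year"]))
print(raw[0])  # {'title': 'title', 'year': 'year'} on Python 3.8+

# DictWriter goes the other way: dicts in, CSV text out
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["title", "year"])
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

Note that `year` comes back as the string `'1979'`; convert types yourself if you need numbers.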
2. JSON
JSON is a must these days, especially if you want to work with APIs.
A simple read of JSON data pulled down by `requests`:

```python
import json

data = json.loads(r.text)  # r: a Response returned by requests.get()
```
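One of our first articles used a `with` statement to load in JSON data. `json.load` reads straight from the file object, which saves the `f.read()` step:

```python
import json

def load_json(json_file):
    with open(json_file) as f:
        return json.load(f)
```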
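Our Challenge 07 review used `yield` to return the JSON data one line at a time, since each line of the file holds one JSON object:

```python
import json

def get_tweets(input_file):
    with open(input_file) as f:
        # iterating the file object reads lazily, unlike f.readlines()
        for line in f:
            yield json.loads(line)
```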
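The `get_tweets` pattern assumes one JSON document per line (often called JSON Lines). Here is a self-contained sketch of the same idea reading from an in-memory stream with made-up records; `iter_json_lines` is a hypothetical name, and unlike the original it skips blank lines, which would otherwise make `json.loads` raise.

```python
import io
import json

def iter_json_lines(stream):
    """Yield one parsed object per non-blank line of a JSON Lines stream."""
    for line in stream:
        if line.strip():  # json.loads('') would raise a JSONDecodeError
            yield json.loads(line)

# Made-up sample: two records, one per line, plus a trailing blank line
stream = io.StringIO('{"id": 1, "text": "hello"}\n{"id": 2, "text": "world"}\n\n')
records = list(iter_json_lines(stream))
print([r["id"] for r in records])  # [1, 2]
```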
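Note the `.json()` method on the `Response` object that `requests.get` returns; it decodes the body for you (`API_URL`, `city` and `API_KEY` are defined elsewhere in the script):

```python
import requests

data = requests.get(API_URL.format(city, API_KEY)).json()
```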
Resources
- You can use `json.dump` to write to a file, as per this Stack Overflow question.
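`json.dump` writes straight to a file object and `json.load` reads it back. A minimal round trip, sketched with made-up data and a temporary directory so nothing is left behind:

```python
import json
import os
import tempfile

data = {"city": "Sydney", "temps": [21.5, 23.0]}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.json")
    # indent=2 pretty-prints the output; omit it for compact files
    with open(path, "w") as f:
        json.dump(data, f, indent=2)
    with open(path) as f:
        loaded = json.load(f)

print(loaded == data)  # True
```

One thing to watch: JSON has no tuple type, so tuples you dump come back as lists.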
3. SQLite
We’ve learned to love SQLite recently and have found ourselves using it all the time. It’s well worth picking up: the `sqlite3` module ships with Python’s standard library, so it’s an easy way to get a persistent, file-based database with no server to set up!
We recently used it to convert a CSV of movies to an SQLite DB:
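A minimal sketch of that kind of conversion (not our original script), assuming a CSV with hypothetical `title` and `year` columns. It uses `io.StringIO` and an in-memory database so it runs as-is; `movies.csv` and `movies.db` in the comments are placeholder filenames for the on-disk version.

```python
import csv
import io
import sqlite3

# Made-up stand-in for a movies CSV; use open('movies.csv', newline='') for a real file
movies_csv = io.StringIO("title,year\nAlien,1979\nHeat,1995\n")

# ':memory:' keeps the DB in RAM; pass a filename such as 'movies.db' to persist it
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, year INTEGER)")

# '?' placeholders let sqlite3 handle quoting; CSV values arrive as strings
rows = ((row["title"], int(row["year"])) for row in csv.DictReader(movies_csv))
with conn:  # commits the inserts if the block succeeds, rolls back otherwise
    conn.executemany("INSERT INTO movies VALUES (?, ?)", rows)

results = conn.execute("SELECT title, year FROM movies ORDER BY year").fetchall()
print(results)  # [('Alien', 1979), ('Heat', 1995)]
conn.close()
```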
Resources
- This Python Cookbook chapter details working with Relational Databases (Amazon Link).
- We enjoyed this thorough `sqlite` Python tutorial by Sebastian Raschka too.
4. XML
XML! The data format of choice for RSS feeds. Can be a bit troublesome at times but always worth the effort.
Example of using `xml.etree.ElementTree` to parse the Safari RSS feed. The full code is worth checking out, but the gist of it (with `doc` being the parsed feed) is:

```python
for item in doc.iterfind('channel/item'):
    ...
```
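Here is a self-contained version of that loop, sketched against a tiny made-up RSS document parsed from a string with `ET.fromstring` rather than from the real feed. `fromstring` returns the root `<rss>` element, so the `channel/item` path is relative to it, just as it is relative to the root when you call `iterfind` on a tree returned by `ET.parse`.

```python
import xml.etree.ElementTree as ET

# Made-up, minimal RSS 2.0 document
RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First post</title><link>https://example.com/1</link></item>
    <item><title>Second post</title><link>https://example.com/2</link></item>
  </channel>
</rss>"""

root = ET.fromstring(RSS)  # the <rss> element
items = []
for item in root.iterfind("channel/item"):
    # findtext returns the first matching child's text (None if there's no such child)
    items.append((item.findtext("title"), item.findtext("link")))

print(items)  # [('First post', 'https://example.com/1'), ('Second post', 'https://example.com/2')]
```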
Using `feedparser` to pull specific XML tags and add them to a list:

```python
import feedparser

feed = feedparser.parse(FEED_FILE)  # FEED_FILE: path or URL of the feed
games_list = []
for entry in feed['entries']:
    game = (entry['title'], entry['link'])
    games_list.append(game)
```
Challenge Solutions
We’ve had numerous challenges over the past few months where the solutions involved these data formats. Here are a few of the noteworthy ones:
This was definitely a great challenge. Check out the multiple community contributions for some examples of using SQLite and XML in functional scripts written by your fellow Pythonistas.
Learn By Doing
Now that you have the info, as we said in our Learn By Doing article, open up a vim session and get coding!
One awesome way to do this (shameless plug!) is to come up with a solution for Code Challenge 19. Playing with an API means you’ll more than likely need quite a few of these formats.
We’d love to hear any Pythonic tips you have for these formats too, so leave a comment!
And as always, Keep Calm and Code in Python!
— Julian and Bob