PyBites Module of the Week – Requests-cache for Repeated API Calls

By on 14 March 2017

Today a quick article on a nice caching module when working with APIs: Requests-cache.

I stumbled upon this excellent article by RealPython when looking for a solution to limit API requests. I needed this when I was playing with the Github API to check changes to forks of our Challenges repo (you can also see this in the repo, under Graphs > Network, but I was just playing around).

This is not a script that would typically need caching, because I probably would run it once a week and then it would make just a couple of requests (at this time: ~100 forks / 30 results per call). However when I was coding this up, I did not want to call the API over and over again:

For unauthenticated requests, the rate limit allows you to make up to 60 requests per hour.
Github API documentation

It was also a good exercise to test this module out for a future use case where this does matter.

Using requests_cache

First I thought: lets write the output to a file. However that adds more code. Maybe use a decorator to sleep between requests? However that slows down my coding/testing. As usual somebody already invented the wheel.

Enter Requests-cache. It has an easy / friendly interface:

import requests_cache

requests_cache.install_cache('cache_filename', backend='backend', expire_after=expiration_in_seconds)

Verify with curl

  • Start API rate limit (already did some calls):

    $ curl -i https://api.github.com/users/whatever 2>/dev/null |grep 'X-RateLimit-Remaining:'
    X-RateLimit-Remaining: 42
    
  • First time around: cache result. DB got created. Cost = 6 calls (1x curl, 5x by script)

    $ python commits.py 2>&1 > /dev/null
    $ lt cache.sqlite
    -rw-r--r--  1 bbelderb  staff   516K Mar 14 08:03 cache.sqlite
    $ curl -i https://api.github.com/users/whatever 2>/dev/null |grep 'X-RateLimit-Remaining:'
    X-RateLimit-Remaining: 36
    
  • Second call = cached, cost down to 1 (= curl)

    $ python commits.py 2>&1 > /dev/null
    $ curl -i https://api.github.com/users/whatever 2>/dev/null |grep 'X-RateLimit-Remaining:'
    X-RateLimit-Remaining: 35
    

Keep in mind

Two noteworthy things that were commented on mentioned article:

  • Check the documentation of the API you are working with. Maybe they already provide a way to use caching. In case of the GH API this would be Conditional requests:

    Making a conditional request and receiving a 304 response does not count against your Rate Limit, so we encourage you to use it whenever possible.

    Something to try on the next iteration …

  • You might want to define an output directory for the cache file instead of the default current directory to not end up with multiple files if working from a different folder.

More info

See the module’s documentation for more info.

Have you used this module? And/or what do you use for caching API requests?


Keep Calm and Code in Python!

— Bob

Want a career as a Python Developer but not sure where to start?