Today a quick article on a nice caching module when working with APIs: Requests-cache.
I stumbled upon this excellent article by RealPython when looking for a solution to limit API requests. I needed this when I was playing with the Github API to check changes to forks of our Challenges repo (you can also see this in the repo, under Graphs > Network, but I was just playing around).
This is not a script that would typically need caching, because I probably would run it once a week and then it would make just a couple of requests (at this time: ~100 forks / 30 results per call). However when I was coding this up, I did not want to call the API over and over again:
For unauthenticated requests, the rate limit allows you to make up to 60 requests per hour. Github API documentation
It was also a good exercise to test this module out for a future use case where this does matter.
First I thought: lets write the output to a file. However that adds more code. Maybe use a decorator to sleep between requests? However that slows down my coding/testing. As usual somebody already invented the wheel.
Enter Requests-cache. It has an easy / friendly interface:
import requests_cache requests_cache.install_cache('cache_filename', backend='backend', expire_after=expiration_in_seconds)
- where backend has these options.
Verify with curl
Start API rate limit (already did some calls):
$ curl -i https://api.github.com/users/whatever 2>/dev/null |grep 'X-RateLimit-Remaining:' X-RateLimit-Remaining: 42
First time around: cache result. DB got created. Cost = 6 calls (1x curl, 5x by script)
$ python commits.py 2>&1 > /dev/null $ lt cache.sqlite -rw-r--r-- 1 bbelderb staff 516K Mar 14 08:03 cache.sqlite $ curl -i https://api.github.com/users/whatever 2>/dev/null |grep 'X-RateLimit-Remaining:' X-RateLimit-Remaining: 36
Second call = cached, cost down to 1 (= curl)
$ python commits.py 2>&1 > /dev/null $ curl -i https://api.github.com/users/whatever 2>/dev/null |grep 'X-RateLimit-Remaining:' X-RateLimit-Remaining: 35
Keep in mind
Two noteworthy things that were commented on mentioned article:
Check the documentation of the API you are working with. Maybe they already provide a way to use caching. In case of the GH API this would be Conditional requests:
Making a conditional request and receiving a 304 response does not count against your Rate Limit, so we encourage you to use it whenever possible.
Something to try on the next iteration ...
You might want to define an output directory for the cache file instead of the default current directory to not end up with multiple files if working from a different folder.
See the module's documentation for more info.
Have you used this module? And/or what do you use for caching API requests?
Keep Calm and Code in Python!
See an error in this post? Please submit a pull request on Github.