Posted by Bob on Tue 27 December 2016 in Tools • 2 min read

I was looking for an effective way to organize my Kindle highlights. I started by parsing the Kindle's My Clippings.txt file. However, I did not have much luck with the existing PyPI modules, and it is a bit cumbersome to always have to copy the file manually via USB cable.

Starting point: Cloud + Bookcision

Then I found a much better starting point: the cloud. OK, this only works for books purchased for Kindle, but Amazon's Whispersync makes it really convenient. Also, the Kindle site lets you filter / adjust your highlights and notes before exporting.

For export I use the nice Bookcision JS bookmarklet which, when used in Chrome, lets you download the highlights in JSON format.


I wrote a script to convert the Bookcision JSON download into a static HTML page (for blog use, inspired by Sivers).

Code is here.

Some things to note:

  • Use json.load(f) to convert the JSON file into a dict (json.loads() expects a string, not a file handle):

    import json

    def load_json(json_file):
        with open(json_file) as f:
            return json.load(f)
  • Template strings (string.Template): PAGE defines the whole page; I use embedded CSS to make this a standalone solution. QUOTE defines a list item (a highlight). Variables are marked with $, so: $title, $author, etc. In the main script I substitute these placeholders with values from a dict:

    def get_highlights(highlights):
        for hl in highlights:
            yield QUOTE.safe_substitute({
                'text' : hl['text'],
                'note' : ' / note: ' + hl['note'] if hl['note'] else '',
                'url' : hl['location']['url'],
                'location': hl['location']['value'],
            })

Note that the 'yield' makes get_highlights() a generator. If this is new to you, check out this Stack Overflow thread about Iterables -> Generators -> Yield [1].
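
To see the template pieces and the generator working together, here is a minimal sketch. The PAGE and QUOTE strings below are stripped-down stand-ins I made up for illustration (the real ones carry the embedded CSS), and the sample data only mimics the keys used in the snippet above; it is not a real Bookcision export.

```python
from string import Template

# Hypothetical, minimal templates; the real PAGE also embeds CSS.
PAGE = Template('<h1>$title by $author</h1>\n<ul>\n$content\n</ul>')
QUOTE = Template('<li>$text$note (<a href="$url">$location</a>)</li>')


def get_highlights(highlights):
    """Yield one rendered list item per highlight."""
    for hl in highlights:
        yield QUOTE.safe_substitute({
            'text': hl['text'],
            'note': ' / note: ' + hl['note'] if hl['note'] else '',
            'url': hl['location']['url'],
            'location': hl['location']['value'],
        })


# Made-up sample data with the same keys the snippet above relies on.
content = {
    'title': 'Some Book',
    'authors': 'Some Author',
    'highlights': [
        {'text': 'First quote', 'note': None,
         'location': {'url': 'https://example.com/1', 'value': 10}},
        {'text': 'Second quote', 'note': 'remember this',
         'location': {'url': 'https://example.com/2', 'value': 42}},
    ],
}

# Nothing is rendered until join() pulls the values out of the generator.
html = PAGE.safe_substitute({
    'title': content['title'],
    'author': content['authors'],
    'content': '\n'.join(get_highlights(content['highlights'])),
})
print(html)
```

safe_substitute() leaves unknown placeholders in the output instead of raising KeyError, which is forgiving when a key is missing from the dict.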

  • Use list() to consume all of the generator's values in one go (str.join() would also accept the generator directly):

    highlights = get_highlights(content['highlights'])
        'content': '\n'.join(list(highlights)),
  • You can give the script one or more JSON files simply by using a slice on sys.argv:

    for json_file in sys.argv[1:]:
  • So you can batch process JSON downloads:

    $ ls *json
    anything-you-want.json  arnold.json     choose-yourself.json    the-circle.json
    $ python *json
    anything-you-want.html created
    arnold.html created
    choose-yourself.html created
    the-circle.html created
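
The loading and batch-processing steps can be sketched roughly like this. It is not the actual script: convert() and main() are hypothetical names, the HTML body is a placeholder rather than the real template output, and deriving the output name by swapping .json for .html is my assumption based on the output shown above.

```python
import json
import sys
from pathlib import Path


def load_json(json_file):
    """Parse a JSON file into a dict; json.load() reads from a file object."""
    with open(json_file) as f:
        return json.load(f)


def convert(json_file):
    """Write an HTML file next to json_file and return its path (hypothetical helper)."""
    content = load_json(json_file)
    html_file = Path(json_file).with_suffix('.html')
    # Placeholder body: the real script would render the templates here.
    html_file.write_text('<p>{} highlights</p>'.format(len(content['highlights'])))
    return html_file


def main(args):
    """Convert every JSON file passed on the command line."""
    for json_file in args:
        html_file = convert(json_file)
        print('{} created'.format(html_file.name))


if __name__ == '__main__':
    # sys.argv[0] is the script itself, so the slice keeps only the file arguments.
    main(sys.argv[1:])
```

Passing the argument list into main() rather than reading sys.argv inside it keeps the loop easy to call from a test or another script.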


Here is what an output looks like:

resulting html page

As the HTML contains everything, you can just copy it to your blog (example).

Keep Calm and Code in Python!

-- Bob

[1] Generators save memory by not materializing all the values of an iterable at once, which can improve performance. Here we don't really need that, yet I still find the yield syntax more elegant (it's shorter) than building and returning a local collection (list).
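
For comparison, here is a small sketch of the two styles side by side. The squares functions are made-up examples, not part of the script; they only illustrate the "build a list" versus "yield" trade-off.

```python
def squares_list(n):
    """Build and return a local list (the longer version)."""
    result = []
    for i in range(n):
        result.append(i * i)
    return result


def squares_gen(n):
    """Yield values one at a time; nothing is stored up front."""
    for i in range(n):
        yield i * i


gen = squares_gen(4)
first = next(gen)      # values are computed on demand
rest = list(gen)       # list() consumes whatever is left
print(first, rest)
print(squares_list(4) == list(squares_gen(4)))
```

Both produce the same values; the generator version just skips the bookkeeping of the intermediate list.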

