Python’s data model by example

Posted on 25 January 2017

After our last post on OOP, a logical follow-up is Python’s data model. We use the great Fluent Python book to code up an example of our own, showing how much you get for free once your objects hook into this data model. You can download the notebook here.

One of the best qualities of Python is its consistency. After working with Python for a while, you are able to start making informed, correct guesses about features that are new to you.

However, if you learned another object-oriented language before Python, you may have found it strange to use len(collection) instead of collection.len(). This apparent oddity is the tip of an iceberg that, when properly understood, is the key to everything we call Pythonic. The iceberg is called the Python data model, and it describes the API that you can use to make your own objects play well with the most idiomatic language features. – Fluent Python
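
To make the quote concrete, here is a minimal sketch of what len(collection) does under the hood: it calls the object's __len__ method. The Playlist class and its contents are made up for this illustration and are not part of the notebook.

```python
class Playlist:
    """Hypothetical container, used only to illustrate the len() protocol."""

    def __init__(self, songs):
        self._songs = list(songs)

    def __len__(self):
        # len(playlist) ends up calling this method
        return len(self._songs)


playlist = Playlist(['intro', 'verse', 'outro'])
print(len(playlist))       # 3
print(playlist.__len__())  # 3, same result, but len() is the idiomatic spelling
```

The same pattern applies to the other dunder methods used in the rest of this post: you implement the special method, and Python's syntax and builtins call it for you.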

In [1]:

# for simplicity mock up some tweets
from collections import namedtuple
import random

Tweet = namedtuple('Tweet', 'time text likes')

tweets = (
    Tweet('2017-01-25 08:45:00', 'Teaching Python today, feels great', 3),
    Tweet('2017-01-25 09:45:00', 'Writing a post on the Python data model', 2),
    Tweet('2017-01-25 07:45:00', 'from __future__ import braces ... not a chance', 10),
    Tweet('2017-01-25 10:45:00', 'Doing code challenge 03, learning a lot', 5),
    Tweet('2017-01-25 12:45:00', 'Done with code challenge 03', 1),
)
print(tweets)
(Tweet(time='2017-01-25 08:45:00', text='Teaching Python today, feels great', likes=3), Tweet(time='2017-01-25 09:45:00', text='Writing a post on the Python data model', likes=2), Tweet(time='2017-01-25 07:45:00', text='from __future__ import braces ... not a chance', likes=10), Tweet(time='2017-01-25 10:45:00', text='Doing code challenge 03, learning a lot', likes=5), Tweet(time='2017-01-25 12:45:00', text='Done with code challenge 03', likes=1))
In [2]:

# let's make the Twitter handle a bit more sophisticated: we are going to work with a parent-child relation in this example

class Handle(object):

    def __init__(self, handle, shared_handle=None):
        self.handle = handle
        self.shared_handle = shared_handle

    def __str__(self):
        # only mention the shared handle when there is one
        if self.shared_handle is None:
            shared = ''
        else:
            shared = ' (shared handle: {})'.format(self.shared_handle)
        return '{}{}'.format(self.handle, shared)
In [3]:

# a TwitterUser has a handle name and a bunch of tweets, note the two dunder methods after the constructor ...

class TwitterUser(object):

    def __init__(self, handle, tweets):
        self.handle = handle
        self._tweets = tweets

    def __len__(self):
        return len(self._tweets)

    def __getitem__(self, position):
        return self._tweets[position]
In [4]:

bob = TwitterUser(
    Handle('bbelderbos', shared_handle='pybites'), tweets
)
In [5]:

# having implemented __len__, we can call len() on the object like this:

len(bob)
Out[5]:

5
In [6]:

# having implemented __getitem__, we can get tweets by index

bob[0]
Out[6]:

Tweet(time='2017-01-25 08:45:00', text='Teaching Python today, feels great', likes=3)
In [7]:

# or with a slice

bob[-2:]
Out[7]:

(Tweet(time='2017-01-25 10:45:00', text='Doing code challenge 03, learning a lot', likes=5),
 Tweet(time='2017-01-25 12:45:00', text='Done with code challenge 03', likes=1))
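
Why does slicing work without any extra code? A small sketch, using a made-up Spy class that simply returns whatever __getitem__ receives: for bob[-2:] Python passes a slice object, and because TwitterUser hands it straight to the underlying tuple, the tuple does the slicing.

```python
class Spy:
    """Hypothetical class that just reports what __getitem__ receives."""

    def __getitem__(self, position):
        return position


spy = Spy()
print(spy[0])       # 0
print(spy[-2:])     # slice(-2, None, None)
print(spy[1:10:2])  # slice(1, 10, 2)
```

A side effect of this delegation is visible in Out[7]: the slice comes back as a plain tuple of Tweet objects, not as a new TwitterUser.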
In [8]:

# wow, by implementing __getitem__, bob becomes iterable!

for tw in bob:
    print(tw)
Tweet(time='2017-01-25 08:45:00', text='Teaching Python today, feels great', likes=3)
Tweet(time='2017-01-25 09:45:00', text='Writing a post on the Python data model', likes=2)
Tweet(time='2017-01-25 07:45:00', text='from __future__ import braces ... not a chance', likes=10)
Tweet(time='2017-01-25 10:45:00', text='Doing code challenge 03, learning a lot', likes=5)
Tweet(time='2017-01-25 12:45:00', text='Done with code challenge 03', likes=1)
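
The loop above works even though TwitterUser has no __iter__ method. As a rough sketch of the fallback: when __iter__ is missing, Python calls __getitem__ with 0, 1, 2, ... until an IndexError is raised. The Countdown class below is hypothetical and only there to show the mechanism.

```python
class Countdown:
    """Hypothetical sequence-like class with __getitem__ but no __iter__."""

    def __getitem__(self, index):
        if index >= 3:
            raise IndexError(index)  # signals the end of the iteration
        return 3 - index


print(list(Countdown()))               # [3, 2, 1]
print(hasattr(Countdown, '__iter__'))  # False
```

In TwitterUser the IndexError comes from the underlying tuple once the index runs past the last tweet.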
In [9]:

# and it can be passed as a sequence to other functions, for example random.choice

random.choice(bob)
Out[9]:

Tweet(time='2017-01-25 07:45:00', text='from __future__ import braces ... not a chance', likes=10)
In [10]:

# or give it to sorted so we can use its key arg to sort by most likes
# easter eggs are well received :)

for tw in sorted(bob, key=lambda x: x.likes, reverse=True):
    print(tw)
Tweet(time='2017-01-25 07:45:00', text='from __future__ import braces ... not a chance', likes=10)
Tweet(time='2017-01-25 10:45:00', text='Doing code challenge 03, learning a lot', likes=5)
Tweet(time='2017-01-25 08:45:00', text='Teaching Python today, feels great', likes=3)
Tweet(time='2017-01-25 09:45:00', text='Writing a post on the Python data model', likes=2)
Tweet(time='2017-01-25 12:45:00', text='Done with code challenge 03', likes=1)
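
A few more things that should work with the same two methods, sketched on a hypothetical Timeline class that mirrors TwitterUser (the class name and the shortened tweets are assumptions for this example): max() with a key, the in operator, and reversed().

```python
from collections import namedtuple

Tweet = namedtuple('Tweet', 'time text likes')


class Timeline:
    """Hypothetical stand-in for TwitterUser: only __len__ and __getitem__."""

    def __init__(self, tweets):
        self._tweets = tweets

    def __len__(self):
        return len(self._tweets)

    def __getitem__(self, position):
        return self._tweets[position]


first = Tweet('2017-01-25 08:45:00', 'Teaching Python today', 3)
second = Tweet('2017-01-25 09:45:00', 'Writing a post', 2)
timeline = Timeline((first, second))

print(max(timeline, key=lambda tw: tw.likes).text)  # Teaching Python today
print(second in timeline)                           # True (falls back to iteration)
print([tw.likes for tw in reversed(timeline)])      # [2, 3] (uses __len__ and __getitem__)
```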

Conclusion

Reading Fluent Python was a real eye-opener: by implementing only __len__() and __getitem__() we got nice features like slicing and iteration out of the box!
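
If you want to make the "sequence" contract explicit, one option (not used in the notebook, and assuming Python 3) is to inherit from collections.abc.Sequence. It requires the same two methods and adds mixin methods such as count() and index(). The Archive class below is a hypothetical example.

```python
from collections.abc import Sequence


class Archive(Sequence):
    """Hypothetical container; Sequence supplies the mixin methods."""

    def __init__(self, items):
        self._items = tuple(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, position):
        return self._items[position]


archive = Archive(['a', 'b', 'a'])
print(archive.count('a'))  # 2
print(archive.index('b'))  # 1
print('b' in archive)      # True
```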

But there is more …

In [11]:

# making another user
tweets2 = (
    Tweet('2017-01-25 10:46:00', 'Writing a blog post on a cool new module I discovered', 5),
    Tweet('2017-01-25 12:46:00', 'Learning some Python today, feeling great', 15),
)
julian = TwitterUser(
    Handle('techmoneykids', shared_handle='pybites'), tweets2
)
In [12]:

# I want to be able to merge tweets, just as we can do with lists: [1] + [2, 3] == [1, 2, 3]
# however this does not work out of the box

bob + julian

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
 in ()
      2 # however this does not work out of the box
      3
----> 4 bob + julian

TypeError: unsupported operand type(s) for +: 'TwitterUser' and 'TwitterUser'
In [13]:

# we can make this work by implementing __add__()

class IncompatibleHandle(Exception):
    pass

class TwitterUser(object):

    def __init__(self, handle, tweets):
        self.handle = handle
        self.tweets = tweets  # making the attribute public as we need it in __add__

    def __len__(self):
        return len(self.tweets)

    def __getitem__(self, position):
        return self.tweets[position]

    def __add__(self, other):
        if self.handle.shared_handle != other.handle.shared_handle:
            raise IncompatibleHandle('Not the same shared handle, cannot merge tweets')
        all_tweets = self.tweets + other.tweets
        return TwitterUser(self.handle.shared_handle, all_tweets)

    # adding object string representation methods
    def __repr__(self):
        return 'TwitterUser(%r, %r)' % (self.handle, self.tweets)

    # difference between the repr and str:
    # http://stackoverflow.com/questions/1436703/difference-between-str-and-repr-in-python
    def __str__(self):
        return '{} => likes: {} for {} tweets = {:.1f} avg'.format(
            self.handle, self.total_likes(),
            len(self), self.avg_likes()
        )

    # adding some public methods to show later on that the merged object behaves just as its parts
    def total_likes(self):
        return sum(tw.likes for tw in self.tweets)

    def avg_likes(self):
        return self.total_likes() / len(self)
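
One refinement you could consider, shown here as a sketch on a hypothetical Bag class rather than on TwitterUser: when the other operand is of a type you cannot handle, return NotImplemented from __add__. Python then tries the other operand's __radd__ and, if that fails too, raises the usual TypeError for you.

```python
class Bag:
    """Hypothetical container, not the TwitterUser class from this post."""

    def __init__(self, items):
        self.items = tuple(items)

    def __add__(self, other):
        if not isinstance(other, Bag):
            # let Python try other.__radd__ and raise TypeError if needed
            return NotImplemented
        return Bag(self.items + other.items)


merged = Bag([1]) + Bag([2, 3])
print(merged.items)  # (1, 2, 3)

try:
    Bag([1]) + 42
except TypeError as exc:
    error = str(exc)
    print(error)  # unsupported operand type(s) for +: 'Bag' and 'int'
```

Also worth noticing in the cell above: the merged user is created with self.handle.shared_handle, which is a plain string rather than a Handle. It prints fine, but adding the merged object to yet another user would fail on the .shared_handle lookup.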
In [14]:

# need to create bob, julian again using the rewritten class

bob = TwitterUser(
    Handle('bbelderbos', shared_handle='pybites'), tweets
)
julian = TwitterUser(
    Handle('techmoneykids', shared_handle='pybites'), tweets2
)
# let's also add an incompatible handle
stranger = TwitterUser(
    Handle('someblogger', shared_handle='stranger'), tweets2
)
In [15]:

# now it works, thanks to __add__

pybites = bob + julian
In [16]:

# our tweets are merged, glad to have Julian's most-liked tweet now ;)

for tw in sorted(pybites, key=lambda x: x.likes, reverse=True):
    print(tw)
Tweet(time='2017-01-25 12:46:00', text='Learning some Python today, feeling great', likes=15)
Tweet(time='2017-01-25 07:45:00', text='from __future__ import braces ... not a chance', likes=10)
Tweet(time='2017-01-25 10:45:00', text='Doing code challenge 03, learning a lot', likes=5)
Tweet(time='2017-01-25 10:46:00', text='Writing a blog post on a cool new module I discovered', likes=5)
Tweet(time='2017-01-25 08:45:00', text='Teaching Python today, feels great', likes=3)
Tweet(time='2017-01-25 09:45:00', text='Writing a post on the Python data model', likes=2)
Tweet(time='2017-01-25 12:45:00', text='Done with code challenge 03', likes=1)
In [17]:

# but stranger is not part of pybites, so custom exception is raised (as implemented in __add__)

bob + stranger

---------------------------------------------------------------------------
IncompatibleHandle                        Traceback (most recent call last)
 in ()
      1 # but stranger is not part of pybites, so custom exception is raised (as implemented in __add__)
      2
----> 3 bob + stranger

 in __add__(self, other)
     18     def __add__(self, other):
     19         if self.handle.shared_handle != other.handle.shared_handle:
---> 20             raise IncompatibleHandle('Not the same shared handle, cannot merge tweets')
     21         all_tweets = self.tweets + other.tweets
     22         return TwitterUser(self.handle.shared_handle, all_tweets)

IncompatibleHandle: Not the same shared handle, cannot merge tweets
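
The benefit of a dedicated exception class is that calling code can catch exactly this failure and nothing else. A minimal sketch, where merge_handles is a hypothetical helper that only mirrors the check done in __add__:

```python
class IncompatibleHandle(Exception):
    pass


def merge_handles(first, second):
    """Hypothetical helper mirroring the shared-handle check in __add__."""
    if first != second:
        raise IncompatibleHandle('Not the same shared handle, cannot merge tweets')
    return first


try:
    merge_handles('pybites', 'stranger')
except IncompatibleHandle as exc:
    message = str(exc)
    print('Skipping merge:', message)
```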
In [18]:

# calling print on the object uses the underlying __str__, which we used to show some stats
# also note that it embeds the __str__ of the Handle object

print(bob)
bbelderbos (shared handle: pybites) => likes: 21 for 5 tweets = 4.2 avg
In [19]:

# Julian is definitely more influential than me :)

print(julian)
techmoneykids (shared handle: pybites) => likes: 20 for 2 tweets = 10.0 avg
In [20]:

# ... improving the average

print(pybites)
pybites => likes: 41 for 7 tweets = 5.9 avg
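
On __repr__ versus __str__: print() and str() use __str__, while the interactive prompt and containers use __repr__. Handle in this post only defines __str__, so the %r in TwitterUser.__repr__ would fall back to the default object representation. A possible addition, sketched here as an assumption and not taken from the notebook, is to give Handle its own __repr__:

```python
class Handle:
    """Variant of the post's Handle with an added, hypothetical __repr__."""

    def __init__(self, handle, shared_handle=None):
        self.handle = handle
        self.shared_handle = shared_handle

    def __repr__(self):
        # unambiguous, ideally looks like the call that creates the object
        return 'Handle({!r}, shared_handle={!r})'.format(
            self.handle, self.shared_handle)

    def __str__(self):
        # readable, meant for end users
        if self.shared_handle is None:
            return self.handle
        return '{} (shared handle: {})'.format(self.handle, self.shared_handle)


h = Handle('bbelderbos', shared_handle='pybites')
print(str(h))   # bbelderbos (shared handle: pybites)
print(repr(h))  # Handle('bbelderbos', shared_handle='pybites')
print([h])      # [Handle('bbelderbos', shared_handle='pybites')]
```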
