Play these (Python) strings until my fingers are raw

What Do You Want from Me

Sep 10, 2014

This blog post is about the tricky task of subclassing immutable types in Python. Once you get it right, you end up with superpowered objects.

In one project I had to subclass the Python string type (str) to get some additional features.

Why did I decide to do that?

Because I needed something:

  • supporting almost all the methods of standard strings
  • with some custom attributes and additional methods
  • that could be compared and mixed with strings.

I had almost no choice. However, subclassing str has to be handled with special care, because str is a so-called immutable type: once a string object has been created, its value can never change.
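A quick illustration of what immutability means in practice for str (Python 3): string methods never modify the object they are called on, they return a new one, and item assignment is rejected outright.

```python
s = 'Alice'
lowered = s.lower()  # returns a new str object; s itself is untouched
print(s, lowered)    # Alice alice

try:
    s[0] = 'a'       # str objects do not support item assignment
except TypeError as exc:
    print('TypeError:', exc)
```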

I will show how to achieve this with a couple of examples.

Example 1: a lowercase string

Let's consider a simple use case that is very helpful in many circumstances: a "lowercase string" type.

To create such an object, a developer could write something like this:

class BrokenLowerCaseString(str):
    ''' This is going to fail!
    '''
    def __init__(self, value):
        ''' Return a string instance
        '''
        value = str(value).lower()
        str.__init__(self, value)

In Python 2 this code fails silently:

>>> BrokenLowerCaseString('Alice')
'Alice'

The code runs without errors, but the string is not lowercased at all.

In Python 3 it will not even run:

>>> BrokenLowerCaseString('Alice')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "test.py", line 27, in __init__
    str.__init__(self, value)
TypeError: object.__init__() takes no parameters
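Silencing the Python 3 error does not help either. In this sketch (StillBrokenLowerCaseString is a hypothetical name), __init__ calls the inherited object.__init__ without extra arguments, so no exception is raised, but the value is still not lowercased:

```python
class StillBrokenLowerCaseString(str):
    ''' Hypothetical variant: no TypeError, but still no lowercasing
    '''
    def __init__(self, value):
        # str does not define its own __init__, so this reaches
        # object.__init__, which accepts no extra arguments
        super().__init__()
        # This only rebinds the local name "value"; the text held by
        # self was fixed when str.__new__ created the instance
        value = str(value).lower()


result = StillBrokenLowerCaseString('Alice')
print(result)  # Alice
```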

The right way to implement this type is to override the __new__ method instead of __init__.

This is generally true for all immutable types [1].

class LowerCaseString(str):
    ''' Provides an object that is like a string
    but that will always be converted to lowercase
    '''
    def __new__(cls, value):
        ''' Return a string instance
        '''
        value = str(value).lower()
        return str.__new__(cls, value)

This time we get the expected result:

>>> LowerCaseString('Alice')
'alice'

The latter class works because __new__ is the method that actually creates the instance, and it builds it from a string that has already been lowercased. The __init__ method, instead, runs only after the immutable instance has been created, so it can no longer change its value.
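To see the order in which the two methods run, here is a small tracing sketch (TracedString and its calls log are hypothetical, for illustration only). __new__ receives the raw argument, while __init__ receives an instance whose value is already fixed:

```python
class TracedString(str):
    ''' Hypothetical class that records when __new__ and __init__ run
    '''
    calls = []  # shared log, for demonstration only

    def __new__(cls, value):
        cls.calls.append(('__new__', value))
        return str.__new__(cls, value.lower())

    def __init__(self, value):
        # At this point self is already a finished, immutable string
        type(self).calls.append(('__init__', str(self)))


s = TracedString('Alice')
print(TracedString.calls)  # [('__new__', 'Alice'), ('__init__', 'alice')]
```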

Once this concept is clear, we can give more "superpowers" to our subclassed types.

Example 2: an email string

The next example shows a simple Email type, a string with:

  • a constraint
  • new attributes
  • a property.

class Email(str):
    ''' Provides an object that is like a string
    but with additional attributes
    '''
    @staticmethod
    def _is_valid(value):
        ''' Very simple validation
        '''
        return '@' in str(value)

    def __new__(cls, value, firstname='', lastname=''):
        ''' Return a string instance
        '''
        if not cls._is_valid(value):
            raise ValueError(value)
        return str.__new__(cls, value)

    def __init__(self, value, firstname='', lastname=''):
        ''' Add some attributes to the instance
        '''
        self.firstname = str(firstname)
        self.lastname = str(lastname)

    @property
    def fullname(self):
        ''' Return the owner's full name (first and last name)
        '''
        return " ".join((self.firstname, self.lastname))

The static method _is_valid converts its argument with str() and only checks for an '@' in the result, so objects of any type are accepted as long as their string representation contains one:

>>> Email(['@'])
"['@']"

Of course the validator could be improved, but for this post it is enough that it raises an error on invalid strings:

>>> Email('Sample string')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "test.py", line 114, in __new__
    raise ValueError(value)
ValueError: Sample string
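If a stricter check is needed, one option is a regular expression that also rejects non-string arguments such as ['@']. A minimal sketch: StrictEmail and the pattern are hypothetical, and the pattern is deliberately loose (something@something.tld without whitespace), nowhere near full RFC 5322 validation:

```python
import re

# Hypothetical, deliberately loose pattern: local@domain.tld, no spaces
_EMAIL_RE = re.compile(r'[^@\s]+@[^@\s]+\.[^@\s]+')


class StrictEmail(str):
    ''' Variant of Email with a stricter (but still simplistic) validator
    '''
    @staticmethod
    def _is_valid(value):
        # Accept only real strings whose whole text matches the pattern
        return isinstance(value, str) and _EMAIL_RE.fullmatch(value) is not None

    def __new__(cls, value):
        if not cls._is_valid(value):
            raise ValueError(value)
        return str.__new__(cls, value)
```

With this version, StrictEmail('alice.burton@example.com') succeeds, while both StrictEmail(['@']) and StrictEmail('alice@') raise ValueError.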

I will now construct an Email instance:

>>> alice_email = Email('alice.burton@example.com', 'Alice', 'Burton')

This instance is also an instance of str:

>>> isinstance(alice_email, str)
True
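Because an Email is a real str, it can be passed to code written for ordinary strings. A minimal sketch, using a trimmed copy of the class above (fullname omitted for brevity):

```python
class Email(str):
    ''' Trimmed copy of the Email class above (fullname omitted)
    '''
    def __new__(cls, value, firstname='', lastname=''):
        if '@' not in str(value):
            raise ValueError(value)
        return str.__new__(cls, value)

    def __init__(self, value, firstname='', lastname=''):
        self.firstname = str(firstname)
        self.lastname = str(lastname)


alice_email = Email('alice.burton@example.com', 'Alice', 'Burton')

# Plain-string code accepts an Email unchanged
domain = alice_email.split('@')[1]
recipients = ', '.join([alice_email, 'bob@example.com'])
print(domain)      # example.com
print(recipients)  # alice.burton@example.com, bob@example.com
```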

Email instances have a property called fullname:

>>> alice_email.fullname
'Alice Burton'

And the additional attributes can be modified:

>>> alice_email.lastname = 'Cooper'
>>> alice_email.fullname
'Alice Cooper'
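The extra attributes live only on the Email object itself: inherited str methods and operators build plain str results, so the attributes are lost unless the result is wrapped again. A sketch with a trimmed copy of the class (validation and fullname omitted); re-wrapping at the end is just one possible approach:

```python
class Email(str):
    ''' Trimmed copy of the Email class above (validation and fullname omitted)
    '''
    def __new__(cls, value, firstname='', lastname=''):
        return str.__new__(cls, value)

    def __init__(self, value, firstname='', lastname=''):
        self.firstname = str(firstname)
        self.lastname = str(lastname)


alice_email = Email('Alice.Burton@example.com', 'Alice', 'Burton')

lowered = alice_email.lower()  # inherited str method
backup = alice_email + '.bak'  # concatenation
print(type(lowered).__name__, type(backup).__name__)  # str str
print(hasattr(lowered, 'firstname'))                  # False

# Wrapping the result again keeps the extra data
normalized = Email(lowered, alice_email.firstname, alice_email.lastname)
print(normalized, normalized.firstname)  # alice.burton@example.com Alice
```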

There are some things to bear in mind with this approach. The subclassed object compares equal to a plain string:

>>> alice_email == 'alice.burton@example.com'
True

In fact, the firstname and lastname attributes are not taken into account during comparison:

>>> alice_email_noname = Email(u'alice.burton@example.com')
>>> alice_email_noname.fullname == alice_email.fullname
False
>>> alice_email_noname == alice_email
True

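If you also want the custom attributes to be compared, you have to override the comparison methods: __cmp__ in Python 2 [2], or the rich comparison methods such as __eq__ and __ne__, which also work in Python 2 and are the only option in Python 3. This is generally true when subclassing.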
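In Python 3, where __cmp__ no longer exists, attribute-aware equality can be sketched with __eq__ and __ne__. This is one possible design among several: an Email still equals a plain string with the same text, while two Email objects must also agree on firstname and lastname. __ne__ is overridden too, because otherwise the inherited str.__ne__ would keep comparing the text only. Defining __eq__ in a class implicitly sets its __hash__ to None, so the sketch restores str.__hash__; that stays consistent, since objects that compare equal always share the same text.

```python
class Email(str):
    ''' Trimmed copy of the Email class above, with attribute-aware equality
    '''
    def __new__(cls, value, firstname='', lastname=''):
        if '@' not in str(value):
            raise ValueError(value)
        return str.__new__(cls, value)

    def __init__(self, value, firstname='', lastname=''):
        self.firstname = str(firstname)
        self.lastname = str(lastname)

    def __eq__(self, other):
        same_text = str.__eq__(self, other)
        if same_text is NotImplemented or not isinstance(other, Email):
            # Plain strings (or non-strings): use str's own comparison
            return same_text
        # Two Email objects: the names must match as well
        return same_text and (self.firstname, self.lastname) == (
            other.firstname, other.lastname)

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    # Defining __eq__ sets __hash__ to None; reuse str's hash
    __hash__ = str.__hash__


alice = Email('alice.burton@example.com', 'Alice', 'Burton')
anonymous = Email('alice.burton@example.com')
print(alice == 'alice.burton@example.com')     # True
print(alice == anonymous, alice != anonymous)  # False True
print(len({alice, anonymous}))                 # 2
```

Note the trade-off of this design: equality is no longer transitive, since both objects equal the same plain string without being equal to each other.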

Conclusions and prospects

Subclassing strings (and other immutable types) has to be done in a peculiar way, but when you need it, it gives you a lot of power and functionality with very little code. In the next post I will show production code released on GitHub and PyPI, showing that the same technique, applied to the int type, leads to a very elegant and simple solution to a complex problem.
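The same __new__ pattern carries over to int directly. A minimal sketch with a hypothetical Percentage type (not the production code mentioned above) that enforces a range when the object is created and adds a property:

```python
class Percentage(int):
    ''' Hypothetical int subclass: an integer constrained to 0..100
    '''
    def __new__(cls, value):
        value = int(value)  # accepts ints and numeric strings such as '42'
        if not 0 <= value <= 100:
            raise ValueError(value)
        return int.__new__(cls, value)

    @property
    def fraction(self):
        ''' The same value as a float between 0 and 1
        '''
        return self / 100


p = Percentage('42')
print(p, p.fraction)  # 42 0.42
```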

Footnotes

[1] See this document for further details: http://python-history.blogspot.it/2010/06/inside-story-on-new-style-classes.html
[2] See this in the Python data model documentation: https://docs.python.org/2/reference/datamodel.html#object.__cmp__

Credits

The picture of David Gilmour playing strings is taken from Wikimedia.

The post title is inspired by the song "What Do You Want from Me" from Pink Floyd's album The Division Bell.
