Beauty of Python’s collections module#

Python is famed for embracing duck typing. It is nice idea in a number of ways, allowing functions to work equally well with lists, sets, dictionary keys, or whatever user-defined object if it provides __iter__ method. However sometimes we want to distinguish between types of arguments. Good example is recursive operations on nested structures. Consider function, that recursively deletes all dictionary items with None values.

Here is how it could be written for reduced non-recursive case:

def without_empty(data):
    return {key: value
            for key, value in data.items()
            if value is not None}

Now if we want to remove None values in nested dictionaries, we will add recursion:

def without_empty(data):
    return {key: without_empty(value)
            for key, value in data.items()
            if value is not None}

Obviosly this new function will not work, because if value is not a dict, AttributeError will be raised in call to items method.

So function need some way to distinguish between dictionaries and other values. Naive duck-typing solution could look like this:

def without_empty(data):
    if hasattr(data, 'items'):
        return {key: without_empty(value)
                for key, value in data.items()
                if value is not None}
    else:
        return data

So far so good, checking with nested dictionary gives correct result:

>>> without_empty({1: {2: None, 4: 5}, 3: None})
{1: {4: 5}}

However, what if function has to support lists of dictionaries too? Current version will just ignore None values in lists:

>>> without_empty([{1: None}])
[{1: None}]

Let’s add check for lists:

def without_empty(data):
    if hasattr(data, 'items'):
        return {key: without_empty(value)
                for key, value in data.items()
                if value is not None}
    elif hasattr(data, '__iter__'):
        return [without_empty(item)
                for item in data]
    else:
        return data

Function works fine, giving expected result for complex case:

>>> without_empty({1: [6, {2: None, 7: 8}], 3: None})
{1: [6, {7: 8}]}

But code now is hard to follow and tricky to update. Because second condition is depending on the first one - both dict and list provide __iter__ method. Also first check uses public method items, but second - internal helper __iter__, which looks like hacking.

And here comes collections.abc (Just collections in Python 2) providing number of abstract base classes that can be used to test whether a class provides a particular interface; for example, whether it is hashable or whether it is a sequence.

Here is version of our function, that utilizes abstract classes:

from collections import Mapping, Set, Sequence

def without_empty(data):
    if isinstance(data, Mapping):
        return {key: without_empty(value)
                for key, value in data.items()
                if value is not None}
    elif isinstance(data, (Sequence, Set)):
        return [without_empty(item)
                for item in data]
    else:
        return data

At this point you may have question, how is it possible that built-in dict class is an instance of some deep burried collections.abc.Mapping? The answer is easy: it is not. And that’s why I decided to write about this module.

In Python it is type’s responsiblity to decide if some object instantiates it. The mechanics are descrived in detail in documentation to abc module.

In short abstract base class defines must define method __subclasshook__, which may look something like this for Mapping class:

@classmethod
def __subclasshook__(cls, C):
    if cls is Mapping:
        if any('items' in B.__dict__
               for B in C.__mro__):
            return True
    return NotImplemented

isinstance calls this method and magic happens - object becomes an instance of a class that it never heard of.