Updated Fri, 22 Jul 2022 19:19:25 GMT

Python multiple inheritance or decorators for composable behaviours


I recently discovered (or rather, realised how to use) Python's multiple inheritance, and I'm afraid I'm now using it in cases where it's not a good fit. I want to have some starting data source (NewsCacheDB, TwitterStream) that gets transformed in various ways (Vectorize, SelectKBest, SelectPercentile).

I found myself writing the following sort of code (Example 1); the actual code is a bit more complex, but the idea is the same. The point is that for ExperimentA and ExperimentB I can define exactly what self.data is just by relying on class inheritance. Is this really a useful way of achieving the desired behaviour?

I could also use decorators (Example 2). Using the decorators would be less code.

Which approach is preferable? I'm not looking for arguments of the "I like writing decorators better" kind, but rather arguments about

  1. readability
  2. maintainability
  3. testability
  4. pythonicity (yes it's a word).

EXAMPLE 1

class NewsCacheDB(object):
    """Play back cached news articles from a database"""
    def __init__(self):
        super(NewsCacheDB, self).__init__()

    @property
    def data(self):
        # setup access to data base
        while db.isalive():
            yield db.next() # slight simplification here

class TwitterCacheDB(object):
    """Play back cached tweets from a database"""
    def __init__(self):
        super(TwitterCacheDB, self).__init__()

    @property
    def data(self):
        # setup access to data base
        while db.isalive():
            yield db.next() # slight simplification here

class TwitterStream(object):
    def __init__(self):
        super(TwitterStream, self).__init__()

    @property
    def data(self):
        # setup access to live twitter stream
        while stream.isalive():
            yield stream.next()

# The transformation classes below are mixins: super(...).data only resolves
# when a data source class follows them in the method resolution order.
class Vectorize(object):
    """Turn raw data into numpy vectors"""
    def __init__(self):
        super(Vectorize, self).__init__()

    @property
    def data(self):
        for item in super(Vectorize, self).data:
            transformed = vectorize(item) # slight simplification here
            yield transformed

class SelectKBest(object):
    """Select K best features based on some metric"""
    def __init__(self):
        super(SelectKBest, self).__init__()

    @property
    def data(self):
        for item in super(SelectKBest, self).data:
            transformed = select_kbest(item)  # slight simplification here
            yield transformed

class SelectPercentile(object):
    """Select the top X percentile features based on some metric"""
    def __init__(self):
        super(SelectPercentile, self).__init__()

    @property
    def data(self):
        for item in super(SelectPercentile, self).data:
            transformed = select_percentile(item)  # slight simplification here
            yield transformed

class ExperimentA(SelectKBest, Vectorize, TwitterCacheDB):
    # lots of control code goes here
    pass

class ExperimentB(SelectKBest, Vectorize, NewsCacheDB):
    # lots of control code goes here
    pass

class ExperimentC(SelectPercentile, Vectorize, NewsCacheDB):
    # lots of control code goes here
    pass

EXAMPLE 2

def multiply(fn):
    # not used by the experiments below
    def wrapped(self):
        return fn(self) * 2
    return wrapped

def twitter_cacheDB(fn):
    def wrapped(self):
        user, password = fn(self)
        # setup access to data base
        while db.isalive():
            yield db.next() # slight simplification here
    return wrapped

def twitter_live(fn):
    def wrapped(self):
        user, password = fn(self)
        # setup access to live twitter stream
        while stream.isalive():
            yield stream.next() # slight simplification here
    return wrapped

def news_cacheDB(fn):
    def wrapped(self):
        user, password = fn(self)
        # setup access to data base
        while db.isalive():
            yield db.next() # slight simplification here
    return wrapped

def vectorize(fn):
    def wrapped(self):
        for item in fn(self):
            transformed = do_vectorize(item)  # slight simplification here
            yield transformed
    return wrapped

def select_kbest(fn):
    def wrapped(self):
        for item in fn(self):
            transformed = do_selection(item)  # slight simplification here
            yield transformed
    return wrapped

class ExperimentA(object):
    @property
    @select_kbest
    @vectorize
    @twitter_cacheDB
    def a(self):
        return 'me', '123' # return user and password to connect to DB

class ExperimentB(object):
    @property
    @select_kbest
    @vectorize
    @news_cacheDB
    def a(self):
        return 'me', '123' # return user and password to connect to DB



Solution

Less code, as long as it's readable, is better than more code

From a code-size point of view, I always go with the solution that requires the least code while remaining readable and maintainable. Less code means fewer places for defects to hide and less code to maintain.

Multiple Inheritance is not a good choice for Composition

From a design standpoint, I would not use multiple inheritance the way you describe, for the following reasons:

  • attribute/method overriding

You are changing how data behaves in the different classes. While the initial implementation doesn't directly violate the Open/Closed Principle of OO, any future change has a good chance of modifying behavior in one or more other locations. You are also relying on behavior pulled in through super, which only works correctly if the base classes are ordered correctly in the class definition.

  • fragile tight (vertical) coupling

Relying on the class definition to specify the correct ordering of classes creates a fragile system. It's fragile because you can't choose classes just by the interfaces they define; you have to know the implemented logic so that the super calls execute in the correct order. The result is also extremely tight coupling. Because it uses class inheritance, we also get vertical coupling, which means there are implicit dependencies not just within individual methods but potentially between the different layers (classes).

  • multiple inheritance pitfalls

Multiple inheritance has pitfalls in any language that supports it. Python does some work to fix some of these issues by computing a single, consistent method resolution order (MRO), but there are still numerous ways to end up with an MRO you did not intend. These pitfalls never go away entirely, and they are a prime reason to avoid multiple inheritance.
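
The ordering dependence described in the points above can be seen in a minimal sketch. The classes Source, Double, and AddOne are hypothetical stand-ins for the data source and transformation mixins in the question; they are not taken from the original code.

```python
class Source(object):
    """Hypothetical data source: yields raw items and never calls super()."""
    @property
    def data(self):
        for item in [1, 2, 3]:
            yield item


class Double(object):
    """Hypothetical mixin: doubles whatever the next class in the MRO yields."""
    @property
    def data(self):
        for item in super(Double, self).data:
            yield item * 2


class AddOne(object):
    """Hypothetical mixin: adds one to whatever the next class in the MRO yields."""
    @property
    def data(self):
        for item in super(AddOne, self).data:
            yield item + 1


class DoubleThenAddOne(AddOne, Double, Source):
    pass


class AddOneThenDouble(Double, AddOne, Source):
    pass


class Broken(Source, Double):
    # Source.data does not delegate to super(), so Double is silently skipped.
    pass


first = list(DoubleThenAddOne().data)    # [3, 5, 7]
second = list(AddOneThenDouble().data)   # [4, 6, 8]
skipped = list(Broken().data)            # [1, 2, 3]
mro_names = [cls.__name__ for cls in DoubleThenAddOne.__mro__]
```

Swapping two base classes changes the result without any error, and putting the source first drops the transformation entirely. Inspecting `__mro__` is the only way to see the effective order, which is the implicit coupling the points above warn about.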

Alternatives

Alternatively, I would leave the data-source-specific logic in the classes (i.e. *_CacheDB), then use either decorators or functional composition to apply the generalized transformations automatically.
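
One way this could look with functional composition is sketched below. It is only an illustration under stated assumptions: the NewsCacheDB constructor taking an in-memory list, and the lowercase, truncate, and pipeline helpers, are all hypothetical and stand in for the real database access and feature transformations.

```python
from functools import reduce


class NewsCacheDB(object):
    """Stand-in data source; a real one would read from a database."""
    def __init__(self, articles):
        # Assumption for the sketch: articles are passed in as a list.
        self.articles = articles

    @property
    def data(self):
        for article in self.articles:
            yield article


def lowercase():
    """Hypothetical transformation factory without parameters."""
    def transform(items):
        for item in items:
            yield item.lower()
    return transform


def truncate(k):
    """Hypothetical parametric transformation: keep the first k characters."""
    def transform(items):
        for item in items:
            yield item[:k]
    return transform


def pipeline(source, *transforms):
    """Apply the transforms to source.data in the order they are listed."""
    return reduce(lambda items, transform: transform(items),
                  transforms, source.data)


source = NewsCacheDB(["Breaking NEWS", "Weather REPORT"])
result = list(pipeline(source, lowercase(), truncate(5)))
```

Here the order of transformations is stated explicitly at the call site rather than encoded in a base-class list, each transformation takes its own parameters, and each one can be tested on a plain list without constructing a data source.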





Comments (1)

  • +0 – Your suggested solution of having the different data sources as MixIn classes and have the data transformations be decorators is exactly what I ended up implementing. The main reason being that it makes it a bit more explicit what happens to the data, and makes it easier to initialise the different data transformers (parametric decorators instead of a huge number of constructor parameters). — Apr 09, 2013 at 15:40