How to tame your Python codebase

You start out really small: perhaps a Proof of Concept, a small app or a data engineering pipeline. Or perhaps you went full Domain Driven Design, with all the bells and whistles. Sooner or later you reach the point where you realise: I’ve created a mess, or at least contributed to one. You change one thing here, and ten other things start failing over there. Welcome to the Big Ball of Mud.

And time for one of our core principles:

Global organisation, local chaos.

Your project has gone through multiple epochs of evolutionary growth, driven by you and the developers before you. You added features. Many, many features, but nobody looked at the whole thing. At this stage, two things are evident:

You need to “grow wings”: switch to a bird’s-eye view and draw boundaries, i.e. define separate modules.
Now that you know your boundaries, you have to make sure the boundaries are respected.

Point 1 is most likely not something you can learn from a blog post, but point 2 is definitely something we can tackle here.

Chaotic codebases are difficult to tame and, once tamed, difficult to keep in that state. As an example, we’ll take one guideline of Domain Driven Design (DDD):

the domain model has no dependencies (except the most basic and essential ones)

You can see the domain model as the module containing all business logic. Spreading your business logic across the modules or layers of an application is, in 99.9% of cases, a recipe for disaster. Conceptually, a rule to protect our domain module could be written like this:

for each file in the domain model module
forbid imports from other modules
but allow imports from the domain module itself
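To make the three lines above concrete, here is a minimal sketch of the rule's logic in plain Python, assuming the `*` patterns behave like shell-style wildcards (`fnmatch`). The import map and the `violations` helper are hypothetical and only illustrate the idea; this is not how pytest-archon is implemented.

```python
from fnmatch import fnmatchcase

# Hypothetical map: module name -> names it imports directly.
IMPORTS = {
    "app.domain_model": {"app.domain_model.entities", "dataclasses"},
    "app.domain_model.entities": {"datetime"},
    "app.domain_model.pricing": {"app.db.session"},  # breaks the rule
    "app.db.session": {"app.domain_model.entities"},  # outside the rule's scope
}


def violations(imports, match, forbid, allow):
    """Return (module, imported) pairs that break the rule.

    (1) only modules matching `match` are checked,
    (2) an import matching `forbid` is a violation,
    (3) unless it also matches `allow`.
    """
    found = []
    for module, imported in sorted(imports.items()):
        if not fnmatchcase(module, match):
            continue
        for name in sorted(imported):
            if fnmatchcase(name, forbid) and not fnmatchcase(name, allow):
                found.append((module, name))
    return found


print(violations(IMPORTS, "app.domain_model*", "app*", "app.domain_model*"))
# [('app.domain_model.pricing', 'app.db.session')]
```

Only the pricing module is reported: by importing `app.db.session` it drags the database layer into the domain model. The database module importing the domain is not flagged, because it is not matched by the rule.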

The test to validate something like this could be written in this way:

from pytest_archon import archrule

def test_domain_model():
    (
        archrule("protect the domain model")
        .match("app.domain_model*")        # (1)
        .should_not_import("app*")         # (2)
        .may_import("app.domain_model*")   # (3)
        .check("app")
    )

The test above is already totally valid pytest-archon code. So what is pytest-archon? It tries to help you with the question:

How can I codify the boundaries by which I develop and extend my application?

In short, it is a pytest plugin that helps you define (architectural) rules for your application (archon means ruler, but it also sounds a bit like the arch in architecture). We created it during one of our innovation days at Xebia. Architecture rules are defined as plain pytest test cases, so they can run as part of a CI/CD pipeline. It scratches our own itch: as consultants, we know right from the start of an assignment that our time is limited. How can we still ensure that our initiatives and efforts towards code quality last, even after we leave?

Guard your architecture

Traditionally, Python codebases are not much concerned with architectural questions. Most applications either use an already opinionated framework, such as Django or FastAPI, or are not big enough to reap the benefits of a clear architecture. Hence, minimal effort goes into architecture. But as your application grows, you reach a tipping point at which paying attention to architectural concerns pays off greatly.

Figure 1: (A) A simple architecture for a Python web or CLI app. (B) For larger apps, it makes sense to create a service layer as an abstraction between the web or CLI framework and the domain model and database. (C) This pattern makes it possible to grow the app even further into multi-module architectures.

A common approach is the Model-View-Controller design pattern (figure 1A). As a logical next step, you might add a service layer that serves as an abstraction layer for your domain model and database parts (figure 1B). Consider the service-layer-based setup as a building block of a scaffold for your app (figure 1C). Which architecture you choose, and what fits your application, is highly context dependent. pytest-archon does not care which one you choose, only what you want to guard: the dependencies (red arrows) and the absence of dependencies (invisible arrows 😉). How does that look in pytest-archon? To make it concrete, imagine you are building an app to book flight tickets, with order, price_calculation and reservation modules.

src
└── flight_ticket
    ├── common
    ├── order
    │   ├── data
    │   └── domain
    ├── price_calculation
    │   ├── data
    │   └── domain
    └── reservation
        ├── data
        └── domain
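The rule patterns are written as dotted module names, not folder paths. A small, hypothetical helper shows how files in the tree above map to those names, and which of them a pattern like `flight_ticket.order.domain*` catches (again assuming shell-style wildcard matching). The `model.py` file is invented, since the tree only lists directories.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def module_name(path: str, src_root: str = "src") -> str:
    """Turn a file path below `src_root` into a dotted module name."""
    parts = PurePosixPath(path).relative_to(src_root).with_suffix("").parts
    if parts[-1] == "__init__":  # a package is named after its folder
        parts = parts[:-1]
    return ".".join(parts)


files = [
    "src/flight_ticket/order/__init__.py",
    "src/flight_ticket/order/domain/__init__.py",
    "src/flight_ticket/order/domain/model.py",  # hypothetical file
    "src/flight_ticket/reservation/domain/__init__.py",
]
for f in files:
    name = module_name(f)
    print(name, fnmatchcase(name, "flight_ticket.order.domain*"))
# flight_ticket.order False
# flight_ticket.order.domain True
# flight_ticket.order.domain.model True
# flight_ticket.reservation.domain False
```

Under this matching, a trailing `*` also covers the package itself (`flight_ticket.order.domain`), while a pattern ending in `.*` covers only its sub-modules.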

For architecture A (figure 1A), you only need to make sure that

the domain model does not depend on other modules of the app.

def test_fig1a():
    (
        archrule("fig 1a: domain model has no dependencies")
        .match("flight_ticket.order.domain*")
        .should_not_import("flight_ticket*")
        .may_import("flight_ticket.order.domain*")
        .check("flight_ticket")
    )

For architecture B (figure 1B), you need to make sure that

the controller, CLI or other modules only interact with the service layer
the domain model does not depend on other modules of the app


def test_fig1b1():
    (
        archrule("fig 1b (1): other modules only use the service layer")
        .match("flight_ticket*")
        .exclude("flight_ticket.order.*")
        .may_import("flight_ticket.order")
        .should_not_import("flight_ticket.order.*")
        .check("flight_ticket")
    )

This rule deserves some explanation. The target is the order module. We exclude all of its sub-modules, flight_ticket.order.*, because they need to import each other. Everybody else may import the main module flight_ticket.order (which contains the API/service layer), but none of its sub-modules flight_ticket.order.*.

The second requirement, that the domain model does not depend on other modules of the app, is already covered by the rule for architecture A.

For architecture C (figure 1C), you need to make sure that

for every module, (a) the controller (or CLI) only interacts with the service layer and (b) the domain model does not depend on other modules of the app
modules do not depend on each other

Here we only sketch the solution; the full implementation is left as an exercise for the reader. The idea is simple: iterate through every module and apply the same architectural rules.

import pytest
from pytest_archon import archrule


@pytest.mark.parametrize("module", ["order", "price_calculation", "reservation"])
def test_fig1c(module):
    # Same rule as for figure 1A, applied to each module's domain model.
    (
        archrule(f"fig 1c: {module} domain model has no dependencies")
        .match(f"flight_ticket.{module}.domain*")
        .should_not_import("flight_ticket*")
        .may_import(f"flight_ticket.{module}.domain*")
        .check("flight_ticket")
    )

Depending on your app structure, you could tackle the second requirement (modules do not depend on each other) either by

making sure that only the controller or app imports a module, or
selecting a module A and checking whether the other modules B and C import module A, then taking the next module B and checking whether A or C import B, and so on.
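The second approach can be sketched in plain Python. `IMPORTS` is a hypothetical map of direct imports, and `belongs_to` and `cross_module_imports` are illustrative helpers, not part of pytest-archon. `common` is deliberately left out of `MODULES`, on the assumption that every module may use it.

```python
from itertools import permutations

# Top-level modules that must stay independent; `common` is deliberately absent.
MODULES = ["order", "price_calculation", "reservation"]

# Hypothetical map: importing module -> modules it imports directly.
IMPORTS = {
    "flight_ticket.order.domain": {"flight_ticket.common", "flight_ticket.order.data"},
    "flight_ticket.price_calculation.domain": {"flight_ticket.reservation.domain"},
    "flight_ticket.reservation.data": {"flight_ticket.common"},
}


def belongs_to(name: str, module: str, package: str) -> bool:
    """True if `name` is the module itself or one of its sub-modules."""
    prefix = f"{package}.{module}"
    return name == prefix or name.startswith(prefix + ".")


def cross_module_imports(imports, modules, package="flight_ticket"):
    """Return (importer, imported) pairs where some module B imports module A."""
    found = []
    # Pick module A, check whether any other module B imports it; repeat for every A.
    for a, b in permutations(modules, 2):
        for importer, imported in sorted(imports.items()):
            if not belongs_to(importer, b, package):
                continue
            found.extend(
                (importer, name) for name in sorted(imported) if belongs_to(name, a, package)
            )
    return found


print(cross_module_imports(IMPORTS, MODULES))
# [('flight_ticket.price_calculation.domain', 'flight_ticket.reservation.domain')]
```

`belongs_to` compares whole name segments, so a module `order` is not confused with a hypothetical sibling called `orders`. In a real test suite, the same loop could become one parametrized pytest-archon test per module A.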

Side note: in case you are asking yourself how the hell you should make sure that the database only uses the domain model, but not vice versa, you can get inspiration from the Cosmic Python book (the ORM depends on the model, and the repository pattern). Depending on your preference, you could also split the repository definition into an interface, which goes into the domain module, and an implementation of that interface, which resides in the data or db module. The domain model then exclusively uses the interface. The app instantiates an implementation and supplies it to the domain model as an argument, effectively decoupling the domain and data layers.
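A minimal sketch of that interface/implementation split, using `typing.Protocol` for the interface. All names (`Reservation`, `ReservationRepository`, `reserve_seat`, `InMemoryReservationRepository`) are hypothetical. In a real project the three sections would live in separate modules; they share one block here only so it runs on its own.

```python
from dataclasses import dataclass
from typing import Protocol


# --- would live in flight_ticket/reservation/domain (hypothetical) ---
@dataclass
class Reservation:
    flight: str
    seat: str


class ReservationRepository(Protocol):
    """Interface owned by the domain; it knows nothing about databases."""

    def add(self, reservation: Reservation) -> None: ...

    def is_seat_taken(self, flight: str, seat: str) -> bool: ...


def reserve_seat(repo: ReservationRepository, flight: str, seat: str) -> Reservation:
    """Business rule: a seat can be reserved only once per flight."""
    if repo.is_seat_taken(flight, seat):
        raise ValueError(f"seat {seat} on {flight} is already taken")
    reservation = Reservation(flight, seat)
    repo.add(reservation)
    return reservation


# --- would live in flight_ticket/reservation/data (hypothetical) ---
# The data layer may import the domain (Reservation), never the other way round.
class InMemoryReservationRepository:
    """Stand-in implementation; a real one would query the database here."""

    def __init__(self) -> None:
        self._rows: list[Reservation] = []

    def add(self, reservation: Reservation) -> None:
        self._rows.append(reservation)

    def is_seat_taken(self, flight: str, seat: str) -> bool:
        return any(r.flight == flight and r.seat == seat for r in self._rows)


# --- app wiring: the app picks an implementation and passes it in ---
repo = InMemoryReservationRepository()
reserve_seat(repo, "KL1001", "12A")
```

Because `reserve_seat` only knows the protocol, a rule in the style of the figure 1A test can forbid the domain module from importing the data module without breaking anything.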

Architecture has two natural enemies: laziness and architecture astronauts

The rules above are deliberately global. We only want to define a few rules, the boundaries of our application: just enough to keep its architecture clear and avoid surprises.

These rules can also help when you want to use your app as a library. If you think a command-line interface (CLI) would be nice to have, but you don’t want to start all the web-server machinery just to run a simple command, pytest-archon can help you: you write rules that prevent your CLI from importing web-server-related code.
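As a complementary, framework-independent check (our own suggestion, not a pytest-archon feature), you can import the CLI module in a fresh interpreter and inspect `sys.modules`, which also reveals heavy dependencies pulled in indirectly. A minimal sketch; `modules_loaded_by` is a hypothetical helper, demonstrated here on the standard library.

```python
import subprocess
import sys


def modules_loaded_by(module: str) -> set[str]:
    """Import `module` in a fresh interpreter and return the names in sys.modules.

    A separate process keeps the test runner's own imports out of the result.
    """
    code = f"import sys, {module}; print('\\n'.join(sys.modules))"
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, check=True
    )
    return set(result.stdout.split())


# Demonstration: importing json should not load the http.server module.
loaded = modules_loaded_by("json")
print("json" in loaded, "http.server" in loaded)
```

In your own suite, a test could assert that, say, `"fastapi"` is not in `modules_loaded_by("flight_ticket.cli")`; both names are placeholders for your CLI module and your web framework.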

To return to the title of this section:

Please don’t overdo it. pytest-archon is a crash barrier!

Think of pytest-archon as guard rails for (or rather against?) driving off the cliff. Don’t fall into the trap of trying to nail down every aspect of your code and module structure.

The opposite of too much restriction is laziness: you find a bug in the domain module. Easy to solve, you think: simply reuse a function from the database module. However, the database module contains specific code for database communication and is not supposed to be imported everywhere. The quick hack of importing the database module in the domain module would make the logic dependent on the database. You’ll only need a few of those “quick fixes” for your code base to become a mess.

Conclusion

pytest-archon is a convenient way to write architectural boundaries, simply in Python. No need to learn a special syntax. No YAML files that live out of reach of formatting and linting. You can guide new developers in the right direction and keep laziness at bay.

pytest-archon can be found on the Python Package Index (PyPI). The sources are on GitHub. If you find an issue, please tell us. If you like it, tell others 😊.

(You can find this post also on the personal blog of Joachim Bargsten)

