Hexagonal Architecture in Python using Flask and SqlAlchemy
The source code for the project described in this post is available on Github. It's a basic JSON API that could power a blog, storing posts and associated metadata in a Postgres database.
Hey, I'm Alex. I've been a software engineer for about 5 years now at a few tech companies of various sizes. I'm a full-stack engineer, but my primary experience is backend development with Python and Django. I'm passionate about writing high-quality, maintainable code, and I enjoy learning about software architecture and comparing and contrasting different ways to write programs. Today, I'd like to challenge some common practices in the Python community and propose an alternative.
The problem with typical Python architectures
Python is a language that's known for several good backend web frameworks - Django and Flask being the most popular among them. Django is complex and full-featured, with a builtin admin interface, request routing, an ORM, a template engine, and much more. Flask is significantly more lightweight but is often paired with other libraries (notably the SqlAlchemy ORM) to provide a similar experience.
After 3 years working on a large Django app with a team of engineers that grew from 10 to 200+, I've seen firsthand just how crazy things can get. 1000-line methods, slow API endpoints that are nearly impossible to profile and improve, and a test suite that takes over a minute just to start up and run a single test. These issues aren't necessarily caused by the framework itself, but your choice of technology certainly guides the outcome of your codebase and I've found that many of these web frameworks often point you in the wrong direction right out of the gate.
In particular, some things that I think they get wrong:
- ORMs don't adequately separate data and behavior. They make it too easy to access the database from anywhere in your codebase - with simple "dot" access, you can issue a complex query that slows down your whole application and you might not even realize it! All of a sudden, your view code, your model code, and even helper functions are making database calls that eventually cause a bottleneck.
- Most frameworks don't give you good guidance as to where the bulk of your code should actually go. Once you have a non-trivial app, your views start to get really big. There are different schools of thought on how to grow your application (see: Fat Models and Service Objects) but the Python community doesn't seem to discuss them often, and the frameworks themselves don't mention much about architecture at all in their documentation.
- Some generally bad conventions. One that comes to mind is overriding ORM methods and hooks in Django. When calling
save()on an object can potentially mutate it or have side effects on other models, and you can also have a post_save hook with even more behavior that no one could figure out where else to put, it's easy to write confusing, messy code. Obviously, no one sets out to write something poorly, but when you're working with a large team it's easy to add a couple lines of code in a place that seems right and realize later that it's really hard to understand.
Enter Hexagonal Architecture
Hexagonal Architecture, also known by the name of “ports and adapters”, is a method of application design that ensures separation of concerns between different parts of your software. It works best for web applications but I’m sure you could adapt the same principles for any type of program.
At the core, the idea is that there are “ports”, or interfaces into your application, and “adapters” which interface between your code and any downstream dependencies. Some common ports would be things like a JSON API or web interface, or even the test runner! Adapters would be things like your database or cache, or any third-party APIs you are depending on. The actual bulk of your application is called the domain layer, and it’s where your business logic lives.
What does this look like in practice? To ensure the clean division of the different modules, the domain layer doesn’t import from any of the other parts of the application. Logically, this makes sense - your business shouldn’t depend on whether it’s being served as a web or mobile application, or whether you decide to use Postgres or MySQL. The clean separation helped redefine (for me) what “business logic” actually is: it’s the part of the application that dictates what your program should actually do. In the past, I’ve heard that term applied to pretty much any kind of backend code, but now when I look at the domain layer it reads more like something that I could show to a non-programmer and they would understand what’s going on.
Pros and Cons of Hexagonal Architecture
There are many benefits to this approach, but one of the main ones is testability. Now, when you’re testing your API code (i.e. parsing requests and serializing responses), you won’t be hitting the database. This both ensures you’re testing what you actually mean to as well as speeds up your tests dramatically. When running a setup like this on a microservice in production, our entire test suite (a few hundred tests - it was a small service) ran in around 10-20ms. As the application grows, this benefit becomes huge! It’s now realistic to run your test suite every time you save, versus if your tests take a minute or more to run it just becomes another onerous step to complete before you open a pull request.
The main downside to hexagonal architecture is the amount of boilerplate code you end up writing, especially for simple programs. A basic CRUD endpoint goes from a single line of code to dozens of lines across 3 files and 4 separate tests (API layer, domain layer, database, and an integration test) that aren’t very meaningful as they’re effectively testing a single line of code that calls one of the other modules. If you’re building an MVP and time to market is the most important factor in the success of your business, then maybe this approach isn’t for you. However, I think there are some good takeaways regardless.
For this example, I chose to use Flask and SqlAlchemy Core (just the DB API, not the ORM). I used Alembic for migrations and
inject (a lesser-known but great library!) for dependency injection. Because of the way the application is laid out, I didn't end up using any of the connector libraries (like
flask-sqlalchemy) that help combine these libraries into a full-featured framework. But that's kinda the point of the whole exercise - maybe the part of your application that handles web requests shouldn't know the intimate details of where your data is being stored!
I also used the
typing library available in Python 3.5+ (and earlier versions through type comments) as well as
mypy for static type analysis. I leveraged dataclasses to model the data stored in the database. They replaced ORM objects in my application with the added benefit of removing easy access to the database.
Challenges I faced
The main challenge I faced getting the project finished was working with the
typing library. Python's type hinting system is still very new, and mypy is very slow and support for third-party libraries isn't really there yet.
In addition, it can be hard to find good guidance and documentation online for things like SqlAlchemy Core. Most people are using the ORM, so the vast majority of information that's available caters to questions about that. In general, taking an approach that isn't commonly adopted in the community can make it difficult to get help if you get stuck.
One criticism of this approach is that it might be seen as not Pythonic, but rather better fit for a language like Java or Go. That's probably true overall, but I think the issues given in the introduction deserve to be addressed regardless of programming language. The popularity of dynamically-typed languages like Python for writing backend code makes good conventions even more important, as the language itself provides less safety.
Final thoughts and next steps
Even if you don't adopt this exact architecture for your next project, think hard about the separation of all the different functions of your application. It takes a little bit more time and effort up front, but it more than pays off in the long run. It's a great feeling working on a codebase that's written in a way that's easy to test. After experiencing real confidence that your changes are working as expected and not breaking any existing functionality, you'll never want to go back!
If you're interested in discussing Hexagonal Architecture, testing, or even Python in general, please leave a comment or shoot me an email! I'm also available for consulting, code/architecture review or hire :)