May 11, 2020

Avoid Direct Access to Models in Django Apps

A computer program in its simplest form is a set of data and a collection of transformations on that data. A program might be as simple as getting two integers and adding them together, or as complex as a search engine for the Web.

When we write programs, we think in more abstract terms. We might have a person object with name and age, not just a string and an integer. We like to write our programs in a way that our source code shows what to accomplish instead of how to accomplish it. This allows us to mostly forget about where our data lives and how to keep it sound.

Suppose we want to write a web API using Django. We will start by creating a Django app, defining some models and writing a few views to handle incoming requests. This works fairly well for the most part, until we start adding more apps that use each other's models, and all of a sudden we realize that our software spends more time talking about how to do things instead of what things to do.

We might be looking into caching some parts of our data for faster access, which means that we no longer should be using our Django models directly. Now we will have to refactor our code to avoid using those models directly.

I think the root cause of these issues stems from the fact that the default way in which we develop Django applications sends us down a path where our database is front and center in our codebase instead of being just another implementation detail, and we tend to write our logic in view classes, as the number of views increases the usage of our models gets spread out in our code.

I'd like to propose a different approach to the development of Django apps. Instead of relying on the models themselves to create, update, and fetch data, we might benefit from writing controller modules that implement higher level actions on the data. This means that we need to plan more carefully what our Django apps do and make sure that our data is being accessed only through our controller modules.

Our view classes should be responsible for the HTTP side of the communication. A request comes in to perform an action on a set of resources, the view function then communicates with the controller module to perform the action, and then it prepares the results received from the controller to fit the format the client expects.

For example, let’s create a new Django project and a new Django app called myapp. Let’s create a new model called Resource with a single boolean field named homepage.

from django.db import models

class Resource(models.Model):
    homepage = models.BooleanField()

We are now going to create a new view function that displays the ids of all the Resources with homepage=True.

from django.http import HttpResponse
from .models import Resource

def homepage(request):
    resources = Resource.objects.filter(homepage=True)
    return HttpResponse(
        '<br>'.join([f'Resource: {}' for x in resources])

The view function above is extremely basic, but we still can see that it’s performing two separate actions; it first fetches Resource objects from the database, and then applies some formatting rules to the Resources before it gives them back to the client.

Let’s move the logic part (i.e: fetching the Resource objects) into another class.

We create a file inside our myapp directory like so:

from .models import Resource

class MyAppController:
    def get_homepage_resources(self):
        return Resource.objects.filter(homepage=True).values()

Here we use .values() to return a list of dictionaries instead of a QuerySet so as not to leak the model instances to the calling code.

Then we modify the file to use the controller instead.

from django.http import HttpResponse
from .controller import MyAppController

ctrl = MyAppController()

def homepage(request):
    resources = ctrl.get_homepage_resources()
    return HttpResponse(
        '<br>'.join([f'Resource: {x["id"]}' for x in resources])

Now that the homepage view does not need to know where the Resource objects come from, the controller class could even be extended to support multiple databases, redis caching, etc, without having to change the view function.

I feel this will greatly improve our data integrity, help us better understand and maintain our invariants, and provide an easier path to maintaining and improving our data backends (e.g: it should be easier to add a caching system when the controller module is the only thing accessing that data).

"Avoid Direct Access to Models in Django Apps" by Juan Cabrera is licensed under CC BY SA. Source code examples are licensed under MIT. Categorized under research & learning.

Have questions?
Check out our FAQ to learn more about how we work and what we can do for you.
Read our FAQ