Complex Outer Joins with Django’s FilteredRelations

Javier Ayres
March 30, 2020

Django's ORM is great, and it keeps getting better. Today we will look at a problem that is trivial to solve with plain SQL but, until recently, couldn't be solved with the ORM without resorting to more complex (and often less performant) approaches. ​

Setting Up a Test Environment

​ To demonstrate this problem I'm going to create a new Django project with a polls app. This app will have just 2 models as follows: ​

from django.db import models


class Voter(models.Model):
    name = models.CharField(max_length=100)


class Vote(models.Model):
    value = models.CharField(max_length=100)
    voter = models.ForeignKey(Voter, on_delete=models.CASCADE)
    year = models.PositiveIntegerField()

​ So we have a Voter model and the actual votes in Vote. Let's populate it with some data: ​

juan = Voter.objects.create(name='Juan')
gustavo = Voter.objects.create(name='Gustavo')
kelian = Voter.objects.create(name='Kelian')
micaela = Voter.objects.create(name='Micaela')

Vote.objects.create(voter=Voter.objects.get(name='Juan'), value='C++', year=2019)
Vote.objects.create(voter=Voter.objects.get(name='Juan'), value='Javascript', year=2020)
Vote.objects.create(voter=Voter.objects.get(name='Gustavo'), value='Javascript', year=2020)
Vote.objects.create(voter=Voter.objects.get(name='Kelian'), value='C#', year=2019)

​ Okay, we see that Juan voted for C++ in 2019 and then changed his mind to Javascript in 2020, nothing wrong with that! Gustavo voted for Javascript in 2020, Kelian voted for C# in 2019, and Micaela hasn't voted yet. ​

The Requirement

​ Management is not happy. They want to see the results for 2020, but they want to include registered voters that didn't vote too. They say they can't do that with the current reports. So we set up to write a query for this, using our beloved ORM of course. ​ The first requirement is that we need to show all voters, despite having voted or not. This is a hint that we can't simply inner join the two tables, we need an outer join to keep the records in Voter that don't have a match in Vote. We can make Django do a left join by starting the queryset from the reverse side of the relation: Voter. ​

In [33]: from polls.models import *
In [34]: from django.db.models import F
In [35]: for v in Voter.objects.annotate(value=F('vote__value'), year=F('vote__year')).values('name', 'value', 'year'):
    ...:     print(v)
    ...:
{'name': 'Juan', 'vote__value': 'C++', 'vote__year': 2019}
{'name': 'Juan', 'vote__value': 'Javascript', 'vote__year': 2020}
{'name': 'Gustavo', 'vote__value': 'Javascript', 'vote__year': 2020}
{'name': 'Kelian', 'vote__value': 'C#', 'vote__year': 2019}
{'name': 'Micaela', 'vote__value': None, 'vote__year': None}

​ Alright! Doesn't look so bad, except we have votes from every year. No biggie, we just forgot to restrict this to 2020 votes: ​

In [36]: for v in Voter.objects.annotate(value=F('vote__value'), year=F('vote__year')).values('name', 'value', 'year').filter(year=2020):
    ...:     print(v)
    ...:
{'name': 'Juan', 'vote__value': 'Javascript', 'vote__year': 2020}
{'name': 'Gustavo', 'vote__value': 'Javascript', 'vote__year': 2020}

​ Hmm, wait, now we lost Micaela and Kelian because they have no votes in 2020. We know they should come up with None in their Vote columns, so maybe we need to filter by 2020 OR null. ​

In [37]: from django.db.models import Q
In [38]: for v in Voter.objects.annotate(value=F('vote__value'), year=F('vote__year')).values('name', 'value', 'year').filter(Q(year=2020) | Q(year__isnull=True)):
    ...:     print(v)
    ...:
{'name': 'Juan', 'value': 'Javascript', 'year': 2020}
{'name': 'Gustavo', 'value': 'Javascript', 'year': 2020}
{'name': 'Micaela', 'value': None, 'year': None}

​ Nope! We got Micaela back, but Kelian is still missing because he does have a vote, but it's from 2019. ​

Our problem here is that the order in which the filters are being applied doesn't work for our requirement. Let's take a look a the query: ​

In [39]: print(Voter.objects.annotate(value=F('vote__value'), year=F('vote__year')).values('name', 'value', 'year').filter(Q(year=2020) | Q(year__isnull=True)).query)
SELECT "polls_voter"."name", "polls_vote"."value" AS "value", "polls_vote"."year" AS "year" FROM "polls_voter" LEFT OUTER JOIN "polls_vote" ON ("polls_voter"."id" = "polls_vote"."voter_id") WHERE ("polls_vote"."year" = 2020 OR "polls_vote"."year" IS NULL)

​ Not a complex query, just a left join with a WHERE clause. Let's think about the order of the filters:

1) We have the voters and we join them with the votes when the foreign key on Vote matches. This leaves Micaela's row out, and Juan and Kelian have a match using a 2019 vote.

2) Our join clause is actually a left join, so all the voters that didn't have a match are brought back in, using NULL for the vote columns. Micaela's row is back in.

3) We filter by year = 2020 OR year IS NULL. Kelian's vote year is neither 2020 nor NULL, so his row if filtered out.

What we need is a little bit different. In step 1, we need to match the rows by the foreign key AND the year being 2020. This would leave Micaela and Kelian's rows out in the first step, and they would be included back in on step 2 with NULL in their vote columns. Then our step 3 would work perfectly! ​

This can be done by applying our filters in the ON clause. In SQL you can simply extend the ON clause of the left join with as many conditions as you need. In Django, sadly, all .filter operations are applied on a WHERE clause, so there is no way to achieve this... until Django 2.0! ​

Django 2.0 introduced FilteredRelation() objects. With these, you can add extra conditions in the join's ON clause and they will be ANDed together with the foreign key condition, which is included by default. Let's try it out: ​

In [41]: from django.db.models import FilteredRelation
In [41]: for v in Voter.objects.annotate(votes2020=FilteredRelation('vote', condition=Q(vote__year=2020))).values('name', 'votes2020__value', 'votes2020__year'):
    ...:     print(v)
    ...:
{'name': 'Juan', 'votes2020__value': 'Javascript', 'votes2020__year': 2020}
{'name': 'Gustavo', 'votes2020__value': 'Javascript', 'votes2020__year': 2020}
{'name': 'Micaela', 'votes2020__value': None, 'votes2020__year': None}
{'name': 'Kelian', 'votes2020__value': None, 'votes2020__year': None}

​ Yes! We have our 4 voters and only those who voted in 2020 have their results. Let's double check the query: ​

In [42]: print(Voter.objects.annotate(votes2020=FilteredRelation('vote', condition=Q(vote__year=2020))).values('name', 'votes2020__value', 'votes2020__year').query)
SELECT "polls_voter"."name", votes2020."value", votes2020."year" FROM "polls_voter" LEFT OUTER JOIN "polls_vote" votes2020 ON ("polls_voter"."id" = votes2020."voter_id" AND (votes2020."year" = 2020))

​ In effect, our new filter is being applied in the ON clause and there's no WHERE in sight. Management will be very pleased, all thanks to Django 2.0!

"Complex Outer Joins with Django’s FilteredRelations" by Javier Ayres is licensed under CC BY SA. Source code examples are licensed under MIT.

Photo by Dhru J.

Categorized under research & learning.

We are Sophilabs

A software design and development agency that helps companies build and grow products by delivering high-quality software through agile practices and perfectionist teams.