November 8, 2016

SQL Window Functions in Django

What are Window Functions?

Window functions can be seen as close relatives to our well-known aggregate functions. They perform a calculation over a set of rows and return a single value. Unlike aggregate functions, they do not group the input rows into a single output row, thus keeping the original information intact.

Window functions are part of the standard since version SQL:2003.

Setting up a test environment

Start your favorite editor with SQL support (make sure your RDBMS supports window functions) and create a table with some random data so we can go through all the subjects with working examples. We'll be using a table that represents connections from users to our system. Create a new database with a table as follows:

1
2
3
4
5
6
7
CREATE TABLE connections (
  id INT,
  user_id INT,
  source_ip VARCHAR(15), 
  connection_timestamp TIMESTAMP,
  bytes_transferred INT
)

Now populate it with:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
INSERT INTO connections VALUES
(1, 1, '192.168.1.100', '2016-10-01T13:30:00', 15720000),
(2, 1, '192.168.1.100', '2016-10-01T00:40:16', 19500000),
(3, 2, '192.168.1.100', '2016-10-01T19:09:48', 9840000),
(4, 3, '192.168.1.200', '2016-10-01T22:40:58', 10000),
(5, 3, '192.168.1.200', '2016-10-01T13:15:03', 20310000),
(6, 3, '192.168.1.200', '2016-10-01T10:10:48', 2010000),
(7, 4, '192.168.1.200', '2016-10-01T23:25:21', 20310000),
(8, 4, '192.168.1.200', '2016-10-01T04:06:49', 810000),
(9, 1, '192.168.1.200', '2016-10-01T16:07:10', 91280000)

How to use them

Window functions are declared just as an aggregate function followed by an OVER clause, which indicates exactly how rows are being grouped, plus a few other details that we'll dive into later.

We could query the average of bytes transferred per source IP across the connections from our table like this:

1
SELECT *, AVG(bytes_transferred) OVER (PARTITION BY source_ip) FROM connections

...which results in

id user_id source_ip connection_timestamp bytes_transferred avg
1 1 192.168.1.100 2016-10-01 13:30:00.000000 15720000 15020000
2 1 192.168.1.100 2016-10-01 00:40:16.000000 19500000 15020000
3 2 192.168.1.100 2016-10-01 19:09:48.000000 9840000 15020000
4 3 192.168.1.200 2016-10-01 22:40:58.000000 10000 22455000
5 3 192.168.1.200 2016-10-01 13:15:03.000000 20310000 22455000
6 3 192.168.1.200 2016-10-01 10:10:48.000000 2010000 22455000
7 4 192.168.1.200 2016-10-01 23:25:21.000000 20310000 22455000
8 4 192.168.1.200 2016-10-01 04:06:49.000000 810000 22455000
9 1 192.168.1.200 2016-10-01 16:07:10.000000 91280000 22455000

Notice that we still have the original 9 rows with the result added as a new column. Our window function didn't alter the original input at all.

Also notice that we are using the good old AVG function here. This is because all existing aggregate functions can be used as window functions. There are, though, a few new functions that can only be used in the latter form.

PARTITIONing vs GROUPing

The group of rows onto which the window function is applied is called "partition". In its basic form, a partition is no different than a normal aggregate function group: simply a set of rows considered "equal" by some criteria, and the function will be applied over all these rows to return a single result. However, a window function is not executed once for each partition, but rather for each row, and the result may vary from row to row...

Enter the "window frame"

If my row belongs to a partition that doesn't change and the window function executes across all the rows within the partition, how can the result vary?

It can vary because the first part isn't totally true; the window function actually executes over a subset of the partition called "window frame". In our previous example the window frame was equal to the partition so this behavior was not exhibited, but we can alter the window frame like this:

1
SELECT *, AVG(bytes_transferred) OVER (PARTITION BY source_ip ORDER BY bytes_transferred) FROM connections
id user_id source_ip connection_timestamp bytes_transferred avg
3 2 192.168.1.100 2016-10-01 19:09:48.000000 9840000 9840000
1 1 192.168.1.100 2016-10-01 13:30:00.000000 15720000 12780000
2 1 192.168.1.100 2016-10-01 00:40:16.000000 19500000 15020000
4 3 192.168.1.200 2016-10-01 22:40:58.000000 10000 10000
8 4 192.168.1.200 2016-10-01 04:06:49.000000 810000 410000
6 3 192.168.1.200 2016-10-01 10:10:48.000000 2010000 943333.333333333333
5 3 192.168.1.200 2016-10-01 13:15:03.000000 20310000 8690000
7 4 192.168.1.200 2016-10-01 23:25:21.000000 20310000 8690000
9 1 192.168.1.200 2016-10-01 16:07:10.000000 91280000 22455000

Now the avg column shows different results for rows of the same partition because the window frame is different for each row. When using ORDER BY, the window frame contains all the rows from the start of the partition to our current row, including all subsequent rows that are equal to the current one according to the ORDER BY clause. Remember this behavior, else you will probably end up with results that don't seem to make sense. Of course, once you grasp the concept of the window frame you'll realize it's not a "strange behavior" but a powerful feature that will allow you to precisely define what rows a window function can see.

For example, let's look at the first 2 rows: we know the partition is defined by the source_ip field, so the first 3 rows belong to it. In the first row, the window frame consists of only that row, so the average is equal to that row's bytes_transferred. In the second row, the window frame goes from the first row to the current row, so the average is calculated with both their values.

Now take a look at row #7: the window frame here goes from row #4 (the first of this partition) to row #8, which is past our current row, because that row's bytes_transferred is equal to the current row's according to the ORDER BY criteria. This results in rows #7 and #8 having the same window frame, and thus the same avg.

The window frame can be defined with a "frame clause" as well, but we won't dive into its details here. Postgresql documentation has a great explanation of how it works. If neither this nor the ORDER BY clause is used, the window frame is the whole partition.

Window functions execution order

Window functions execute after WHERE, HAVING and GROUP BY clauses, meaning that we can't use the results of window functions in any of these (without resorting to sub-queries) and also that these constructs will affect the input rows that end up forming our partitions.

Aggregate functions are also executed before window functions, which means that we can't use the results of a window function inside an aggregate function but we can do it the other way around.

Window functions vs typical SQL queries

Say we want to fetch the last connection from each user including all the columns of the table, we have a few alternatives before resorting to window functions. Let's review them:

Alternative 1: sub-query with aggregate function

Here we use a sub-query which returns the maximum connection_timestamp from the current user and compares it against the current row's connection_timestamp to check if they're equal. The drawback is that the sub-query is executed once per row, causing considerable performance issues when dealing with larger numbers of rows (~20s in my PostgreSQL setup with 13k rows vs ~35ms using the solution at the end with window functions).

1
2
3
4
SELECT * FROM connections WHERE connection_timestamp = (
  SELECT max(connection_timestamp) FROM connections AS sub
  WHERE sub.user_id = connections.user_id
)

Alternative 2: joining the table with itself

Now we join the table against itself on the condition that the user_id is the same on both rows and that the row "on the left" has a smaller connection_timestamp than the one "on the right". This query returns all connections for which another more recent connection by the same user exists. Finally, we select all the connections that don't exist in that set, and we have what we want.

The downside with this approach is readability (you really have to look into it to figure what it does) and the inability to fetch the "N" latest connections dynamically, as it would require to join the table against itself as many times as "latest connections" we want.

1
2
3
4
5
6
SELECT * FROM connections WHERE id NOT IN (
  SELECT c1.id
  FROM connections AS c1
  JOIN connections AS c2 USING (user_id)
  WHERE c1.connection_timestamp < c2.connection_timestamp
)

With window functions

Finally, with the help of the row_number function and a sub-query, we can achieve our goal like this:

1
2
3
4
SELECT * FROM (
  SELECT *, row_number() OVER (PARTITION BY user_id ORDER BY connection_timestamp DESC)
  FROM connections
) AS sub WHERE row_number = 1

The row_number window function simply returns the index of the current row within its partition starting from 1, so we are asking for all the connections that are first when ordered by the larger connection_timestamp, partitioning by user_id. Since we can't use this row_number inside the WHERE clause, we wrap the whole thing in a sub-query and do the filtering in the outer-query.

Django is in the title but you don't talk about it

Right! Well, window functions have the advantage that their syntax is contained in a single statement that goes in the SELECT list where any other column would go (or in the ORDER BY clause, where they are also allowed), so we don't need any special support to use them with Django's ORM.

A simple annotate call with a RawSQL expression is enough to make use of window functions.

1
2
3
4
5
6
from django.db.models.expressions import RawSQL
from connections.models import Connection
Connection.objects.annotate(row_number=RawSQL(
    'row_number() OVER (PARTITION BY user_id ORDER BY connection_timestamp DESC)',
    []
))

Further reading