Since PostgreSQL 9.3 we have a new LATERAL option for FROM-clause subqueries and function calls.
What is it?
According to the documentation what it does is:
The LATERAL keyword can precede a sub-SELECT FROM item. This allows the sub-SELECT to refer to columns of FROM items that appear before it in the FROM list. (Without LATERAL, each sub-SELECT is evaluated independently and so cannot cross-reference any other FROM item.)
Basically, what it does is that for each row in the main select, it evaluates the sub-select using the main select row as a parameter. Quite similar to a for loop that iterates through the rows returned by a SQL query.
TOP N results
Let’s think about an example. We need to fetch the most recent items of each tag in a list.
Imagine that we have a table tags
and a table movies
, with a one-to-many relationship. To make things easier, a movie can only have one tag. I want to fetch the most recent videos of each tag. How would you do it?
Let’s create our tables and populate them with some data, to help us visualize.
CREATE TABLE tags (
id serial PRIMARY KEY,
name VARCHAR(255)
);
CREATE TABLE movies (
id serial PRIMARY KEY,
name VARCHAR(255),
tag_id int NOT NULL,
created_at timestamp NOT NULL DEFAULT NOW(),
FOREIGN KEY (tag_id) REFERENCES tags(id) ON UPDATE CASCADE
);
CREATE INDEX movies_tag_id_index ON movies (tag_id);
-- Genres
INSERT INTO "tags"("name") VALUES('Action');
INSERT INTO "tags"("name") VALUES('Animation');
INSERT INTO "tags"("name") VALUES('Sci-Fi');
-- Movies
INSERT INTO "movies"("name", "tag_id", "created_at") VALUES('The Matrix', (SELECT id FROM "tags" where "name" = 'Action'), '1999-05-21');
INSERT INTO "movies"("name", "tag_id", "created_at") VALUES('Tenet', (SELECT id FROM "tags" where "name" = 'Action'), '2020-10-29');
INSERT INTO "movies"("name", "tag_id", "created_at") VALUES('Wonder Woman 1984', (SELECT id FROM "tags" where "name" = 'Action'), '2020-12-25');
INSERT INTO "movies"("name", "tag_id", "created_at") VALUES('Toy Story', (SELECT id FROM "tags" where "name" = 'Animation'), '1995-12-22');
INSERT INTO "movies"("name", "tag_id", "created_at") VALUES('Monsters Inc.', (SELECT id FROM "tags" where "name" = 'Animation'), '2001-11-14');
INSERT INTO "movies"("name", "tag_id", "created_at") VALUES('Finding Nemo', (SELECT id FROM "tags" where "name" = 'Animation'), '2003-07-4');
INSERT INTO "movies"("name", "tag_id", "created_at") VALUES('Arrival', (SELECT id FROM "tags" where "name" = 'Sci-Fi'), '2016-10-24');
INSERT INTO "movies"("name", "tag_id", "created_at") VALUES('Minority Report', (SELECT id FROM "tags" where "name" = 'Sci-Fi'), '2002-08-02');
INSERT INTO "movies"("name", "tag_id", "created_at") VALUES('The Midnight Sky', (SELECT id FROM "tags" where "name" = 'Sci-Fi'), '2020-12-23');
Using row_number to solve it
One way that we can achieve the desired result is to use the row_number()
window function combined with an WITH query (common table expressions).
It will number the rows according to the partition config we specify in the over clause. That way, when we fetch the entries, we can limit the number of movies from each tag.
Let’s see how it works.
SELECT
tag_id,
name,
created_at,
ROW_NUMBER() OVER(PARTITION BY tag_id ORDER BY tag_id, created_at DESC)
FROM movies;
tag_id | name | created_at | row_number |
---|---|---|---|
1 | Wonder Woman 1984 | 2020-12-25 00:00:00 | 1 |
1 | Tenet | 2020-10-29 00:00:00 | 2 |
1 | The Matrix | 1999-05-21 00:00:00 | 3 |
2 | Finding Nemo | 2003-07-04 00:00:00 | 1 |
2 | Monsters Inc. | 2001-11-14 00:00:00 | 2 |
2 | Toy Story | 1995-12-22 00:00:00 | 3 |
3 | The Midnight Sky | 2020-12-23 00:00:00 | 1 |
3 | Arrival | 2016-10-24 00:00:00 | 2 |
3 | Minority Report | 2002-08-02 00:00:00 | 3 |
Because we specified the OVER(PARTITION BY tag_id ORDER BY tag_id, created_at DESC)
, the results are partitioned by tag, and ordered both by tag and created_at from newer to older.
This is closer to what we want, but we only want two of each tag, so let’s use it with a CTE and include a condition to only fetch the top two of each one.
with movies_by_tags (tag_id, name, created_at, rank) as (
SELECT
tag_id,
name,
created_at,
ROW_NUMBER() OVER(PARTITION BY tag_id ORDER BY tag_id, created_at DESC)
FROM movies
)
select *
from movies_by_tags mbt
where mbt.rank < 3
;
tag_id | name | created_at | rank |
---|---|---|---|
1 | Wonder Woman 1984 | 2020-12-25 00:00:00 | 1 |
1 | Tenet | 2020-10-29 00:00:00 | 2 |
2 | Finding Nemo | 2003-07-04 00:00:00 | 1 |
2 | Monsters Inc. | 2001-11-14 00:00:00 | 2 |
3 | The Midnight Sky | 2020-12-23 00:00:00 | 1 |
3 | Arrival | 2016-10-24 00:00:00 | 2 |
Using the lateral keyword to solve it
Now that we understand what we need to do better. A much more simple way to solve this problem would be using the lateral
keyword.
SELECT *
FROM tags t
JOIN LATERAL (
SELECT m.*
FROM movies m
WHERE m.tag_id = t.id
ORDER BY m.created_at DESC
FETCH FIRST 2 ROWS ONLY
) e1 ON true
;
What’s happening here, is that because of the lateral keyword, we can reference the tags table in the sub-select, and fetch only the movies from that tag, already limiting the results on that step.
If we try to do it without it, we wouldn’t be able to run that query, and it would raise an error.
# SELECT *
# FROM tags t
# JOIN (
# SELECT m.*
# FROM movies m
# WHERE m.tag_id = t.id
# ORDER BY m.created_at DESC
# FETCH FIRST 2 ROWS ONLY
# ) e1 ON true
# ;
ERROR: invalid reference to FROM-clause entry for table "t"
LINE 6: WHERE m.tag_id = t.id
^
HINT: There is an entry for table "t", but it cannot be referenced from this part of the query.
Comparing the performance of both solutions
Let’s run an explain analyze
and talk about both solutions.
Using the row_number
explain analyze
with movies_by_tags (tag_id, name, created_at, rank) as (
SELECT
tag_id,
name,
created_at,
ROW_NUMBER() OVER(PARTITION BY tag_id ORDER BY tag_id, created_at DESC)
FROM movies
)
select *
from movies_by_tags mbt
where mbt.rank < 3
;
Subquery Scan on mbt (cost=16.39..21.29 rows=47 width=536) (actual time=0.066..0.080 rows=6 loops=1)
Filter: (mbt.rank < 3)
Rows Removed by Filter: 3
-> WindowAgg (cost=16.39..19.54 rows=140 width=536) (actual time=0.064..0.076 rows=9 loops=1)
-> Sort (cost=16.39..16.74 rows=140 width=528) (actual time=0.048..0.049 rows=9 loops=1)
Sort Key: movies.tag_id, movies.created_at DESC
Sort Method: quicksort Memory: 25kB
-> Seq Scan on movies (cost=0.00..11.40 rows=140 width=528) (actual time=0.012..0.015 rows=9 loops=1)
Planning Time: 0.203 ms
Execution Time: 0.165 ms
Using the lateral join
explain analyze
SELECT *
FROM tags t
JOIN LATERAL (
SELECT m.*
FROM movies m
WHERE m.tag_id = t.id
ORDER BY m.created_at DESC
FETCH FIRST 2 ROWS ONLY
) e1 ON true
;
Nested Loop (cost=8.17..1159.05 rows=140 width=1052) (actual time=0.026..0.040 rows=6 loops=1)
-> Seq Scan on tags t (cost=0.00..11.40 rows=140 width=520) (actual time=0.006..0.007 rows=3 loops=1)
-> Limit (cost=8.17..8.18 rows=1 width=532) (actual time=0.008..0.009 rows=2 loops=3)
-> Sort (cost=8.17..8.18 rows=1 width=532) (actual time=0.007..0.008 rows=2 loops=3)
Sort Key: m.created_at DESC
Sort Method: quicksort Memory: 25kB
-> Index Scan using movies_tag_id_index on movies m (cost=0.14..8.16 rows=1 width=532) (actual time=0.003..0.004 rows=3 loops=3)
Index Cond: (tag_id = t.id)
Planning Time: 0.161 ms
Execution Time: 0.068 ms
We can see that both solutions have similar results, with a few milliseconds of difference between them, and we can also see that the LATERAL
join performed better.
What happens when we populate the database with more movies? Let’s try.
-- Generates 3_000_000 movies
INSERT INTO "movies"("name", "tag_id")
SELECT
generate_series(1,1000000) as "name",
(SELECT id FROM "tags" where "name" = 'Action')
;
INSERT INTO "movies"("name", "tag_id")
SELECT
generate_series(1,1000000) as "name",
(SELECT id FROM "tags" where "name" = 'Animation')
;
INSERT INTO "movies"("name", "tag_id")
SELECT
generate_series(1,1000000) as "name",
(SELECT id FROM "tags" where "name" = 'Sci-Fi')
;
# select count(*) from movies;
-[ RECORD 1 ]--
count | 3000009
Now that we have a lot of data, let’s run the explain analyze
again and see how it goes.
Using the row_number
Subquery Scan on mbt (cost=172089.82..181453.23 rows=89175 width=536) (actual time=2207.043..8589.191 rows=6 loops=1)
Filter: (mbt.rank < 3)
Rows Removed by Filter: 3000003
-> WindowAgg (cost=172089.82..178109.15 rows=267526 width=536) (actual time=2207.041..8318.158 rows=3000009 loops=1)
-> Sort (cost=172089.82..172758.63 rows=267526 width=528) (actual time=1619.227..2092.200 rows=3000009 loops=1)
Sort Key: movies.tag_id, movies.created_at DESC
Sort Method: external merge Disk: 96624kB
-> Seq Scan on movies (cost=0.00..21784.26 rows=267526 width=528) (actual time=0.038..530.664 rows=3000009 loops=1)
Planning Time: 0.127 ms
Execution Time: 8595.917 ms
Let’s break it down.
Taking a closer look into the output, we can see that the WindowAgg
and the Subquery Scan
on the CTE are the two most expensive actions in the query.
Starting from the bottom, first we do a sequential scan
on the movies table. This means that we scan through every page of data sequentially, read the data, apply the filter (if exists) and then discard the rows that don’t fit. This kind of scan always reads everything in the table. Unless we need a large portion of the table, this is generally inefficient.
Then we apply the sort
, and we see that we are using the external merge
instead of the quicksort
from the first explain analyze
. This happens when the data does not fit in memory, and we use the disk to process it. This is probably the slowest sort method since there is a lot of back and forth to the disk.
After all that, we read from the CTE and apply our filter mbt.rank < 3
that removes 3_000_003 rows.
Using the lateral join
Nested Loop (cost=44189.51..6186549.45 rows=280 width=542) (actual time=315.465..928.772 rows=6 loops=1)
-> Seq Scan on tags t (cost=0.00..11.40 rows=140 width=520) (actual time=0.011..0.014 rows=3 loops=1)
-> Limit (cost=44189.51..44189.52 rows=2 width=22) (actual time=309.581..309.581 rows=2 loops=3)
-> Sort (cost=44189.51..46689.52 rows=1000003 width=22) (actual time=309.578..309.578 rows=2 loops=3)
Sort Key: m.created_at DESC
Sort Method: top-N heapsort Memory: 25kB
-> Index Scan using movies_tag_id_index on movies m (cost=0.43..34189.48 rows=1000003 width=22) (actual time=0.033..193.517 rows=1000003 loops=3)
Index Cond: (tag_id = t.id)
Planning Time: 0.165 ms
Execution Time: 928.809 ms
Let’s break it down.
Starting from the bottom, we can see that we are doing an index scan
using the index that we created when we created the tables. This means that we can read only some of the rows in the table, and since we want a small set of rows, Postgres can directly ask the index. Since we don’t need to read the whole table, and we only want a few rows from it, this is faster than the seq scan
.
Then we apply the sort
, and we see that we are using the top-N heapsort
in this case. This sort is used when you only want a couple of sorted rows. Which fits perfectly for our case of fetching the 2 most recent movies.
After that, we limit and perform the join using a nested loop (where the right relation is scanned once for every row found in the left relation), and then return.
Conclusion
We can see from the output that the execution time using the row_number
was around 8.5s, and the lateral
was around 0.9s.
The row_number
solution has to do a lot more work, reading through all the table rows and removing a lot of rows, so it can return the desired output. This is visible by its execution time. While the lateral
join solution is able to make use of indexes and a better sort method.
There is probably a few things we could do to optimize both the row_number
solution and the lateral
one. But, in general, for this TOP-N case, the lateral join
is a better fit.
I hope that helps!