For each loops with LATERAL Joins

It is easy to join data with SQL. However, sometimes dependent joins returning just a tiny subset of the joined table's data is badly needed. With LATERAL Joins, the familiar for each loop can be replicated in SQL.

When joining multiple tables, the rows of both tables are linked together based on some conditions. However, when the result should e.g. be limited to the last three bought products for every customer, the standard join clause will not work: The joined bought products table can not be limited to only include three rows for every customer. That is not how joins do work, all rows matching the join criteria are included. But with lateral joins, a subquery will be joined instead of a table which is executed once for every customer resulting in a for-each loop equivalent in SQL.

Usage

MySQL / PostgreSQL

SELECT customers.*, recent_sales.*
FROM customers
LEFT JOIN LATERAL (
    SELECT *
    FROM sales
    WHERE sales.customer_id = customers.customer_id
    ORDER BY created_at DESC
    LIMIT 3
) AS recent_sales ON true;

Detailed Explanation

A join is usually executed in a constant way: Metaphorically speaking, every row of one table is checked against every row of another table. Only when the join condition is valid for the combination of both rows are they included in the result set. This algorithm is called a nested-loop join and is the most straightforward join algorithm to understand. However, in some situations, a non-constant join is required to LIMIT the rows of the joined table for every row of the source table instead of limiting the complete result. But this requirement is impossible with a constant join as it violates the relational algebra of just joining two sets of data with some operators.

Joining a table to a subquery will not help solve the requirement either. Any joined subquery is executed as an independent subquery; it is executed once for the complete join operation. The database is just transforming the subquery to a table-like structure:

  • The subquery is executed first.
  • The results are saved temporarily in memory or on disk (a “virtual table” is created).
  • The join algorithm is executed with the source table and the virtual target table.

The big difference and great feature of LATERAL joins is the change to the execution model: Instead of executing the subqueries once for all rows, it is now executed once for every row the subquery will be joined to. The procedure mimics a for-each loop in SQL, which iterates over the source table and executes the lateral join for every row with the source table's row as input. Querying the last three bought products for every customer had been impossible before, but it is easy with a for-each loop-like lateral join.

With lateral joins, the execution of a subquery in a join changes from an independent subquery to a dependent/derived subquery. The execution once for every row can have some performance implications. However, these performance implications are no reason never to use them. Without lateral joins the application is implementing the for-each loop in programming code: The subquery is sent to the database multiple times and needs to be parsed, planned, and executed repeatedly. By utilizing a lateral join the database can choose a more optimal execution plan and do the parsing and planning step only once. So, it is not like the lateral join is a slow database feature that should be avoided. On the contrary, it is a powerful feature that moves the application's for-each loop to the database, which can execute it much more efficiently.

Additional Resources

Author Tobias Petry

Tobias Petry

SQL for Devs Founder


Are you interested to learn more?

Be notified on future content. Never spam.