First Row of Many Similar Ones

SQL is a straightforward and expressive language, but it sometimes lacks constructs for writing queries in a simple way. It's more complicated than it should be to write a query to get, e.g., only the most expensive order for every customer of the current fiscal year. You can either use PostgreSQL's vendor-specific DISTINCT ON feature or window functions for every other database.

Usage

MySQL

SELECT *
FROM (
  SELECT *, RANK() OVER(
    PARTITION BY customer_id
    ORDER BY price DESC
  ) AS _rank
  FROM orders
  WHERE year = 2024
) as _tmp
WHERE _rank = 1

PostgreSQL

SELECT DISTINCT ON (customer_id) *
FROM orders
WHERE year = 2024
ORDER BY customer_id ASC, price DESC

Detailed Explanation

The need to get just one row of many similar ones exists very often: E.g. you want to get the most expensive order for every customer of the current fiscal year. However, you can't just use a LIMIT 1 clause for this query, as you want to apply the limit specifically for every customer and not receive just one row for the entire query. You could use for-each loops with lateral joins, but then a subquery has to be executed for every existing customer - including the ones that never bought a product in the specified year.

DISTINCT ON

The SQL standard has still not added any simple way to execute a query for this requirement. That's why PostgreSQL added a vendor-specific feature for precisely this purpose. In addition to the standard DISTINCT clause, you can fine-tune the uniqueness constraint: With an expression like DISTINCT ON (customer_id) , only the customer_id is used to remove duplicates instead of all the columns like normal. The idea is to sort by the uniqueness columns to build consecutive groups of similar rows that should be deduplicated. Only the first row is kept for each of these groups. So more sorting conditions must be added to move the row to keep to the first position within every group. Therefore, for the example, all rows are sorted by customer_id ASC to build the consecutive groups and then by price DESC to keep the row with the highest purchase price for every customer.

Workaround

Most databases don't have a vendor-specific SQL extension, so you must use a more verbose workaround. The idea is to assign incrementing numbers to the rows for every customer and only keep the first one. This approach is easy with window functions: The rank function will assign each row an incrementing number starting at one. Within that window function, the PARTITION customer_id clause specifies that we want a different ranking sequence for each customer and the numbers should be assigned by the price DESC ordering. Those ranked rows generated by the subquery are then filtered to only keep the ones with a ranking equal to one to keep only the highest price order for every customer.

Additional Resources

MySQL documentation: Window Functions
PostgreSQL documentation: DISTINCT ON

Tobias Petry

SQL for Devs Founder

Are you interested to learn more?

Be notified on future content. Never spam.