Sometimes you have e.g. numerous orders of a customer but you only need the one with the highest price. This requirement is stated very often but not easy to do in SQL as the LIMIT clause does not work.
SQL is a straightforward and expressive language, but it sometimes lacks constructs for writing queries in a simple way. It's more complicated than it should be to write a query to get, e.g., only the most expensive order for every customer of the current fiscal year. You can either use PostgreSQL's vendor-specific DISTINCT ON feature or window functions for every other database.
SELECT *
FROM (
SELECT *, RANK() OVER(
PARTITION BY customer_id
ORDER BY price DESC
) AS _rank
FROM orders
WHERE year = 2024
) as _tmp
WHERE _rank = 1
SELECT DISTINCT ON (customer_id) *
FROM orders
WHERE year = 2024
ORDER BY customer_id ASC, price DESC
The need to get just one row of many similar ones exists very often:
E.g. you want to get the most expensive order for every customer of the current fiscal year.
However, you can't just use a
LIMIT 1
clause for this query, as you want to apply the limit specifically for every customer and not receive just one row for the entire query.
You could use for-each loops with lateral joins, but then a subquery has to be executed for every existing customer - including the ones that never bought a product in the specified year.
The SQL standard has still not added any simple way to execute a query for this requirement.
That's why PostgreSQL added a vendor-specific feature for precisely this purpose.
In addition to the standard
DISTINCT
clause, you can fine-tune the uniqueness constraint:
With an expression like
DISTINCT ON (customer_id)
, only the customer_id is used to remove duplicates instead of all the columns like normal.
The idea is to sort by the uniqueness columns to build consecutive groups of similar rows that should be deduplicated.
Only the first row is kept for each of these groups.
So more sorting conditions must be added to move the row to keep to the first position within every group.
Therefore, for the example, all rows are sorted by
customer_id ASC
to build the consecutive groups and then by
price DESC
to keep the row with the highest purchase price for every customer.
Most databases don't have a vendor-specific SQL extension, so you must use a more verbose workaround.
The idea is to assign incrementing numbers to the rows for every customer and only keep the first one.
This approach is easy with window functions:
The rank function will assign each row an incrementing number starting at one.
Within that window function, the
PARTITION customer_id
clause specifies that we want a different ranking sequence for each customer and the numbers should be assigned by the
price DESC
ordering.
Those ranked rows generated by the subquery are then filtered to only keep the ones with a ranking equal to one to keep only the highest price order for every customer.
SQL for Devs Founder
Be notified on future content. Never spam.