UPDATE from a SELECT - Database Tip

Updates to rows are always performed on a single table for a specific number of rows. When e.g. a discount should be applied to all products based on their category an UPDATE query is executed for every category.

SELECT category_id, discount FROM products;
-- | category_id | discount |
-- |-------------|----------|
-- | 1           | 0.2      |
-- | 2           | 0.3      |

UPDATE products
SET price_current = price_original - price_original * 0.2
WHERE category_id = 1;

UPDATE products
SET price_current = price_original - price_original * 0.3
WHERE category_id = 2;

However, when the UPDATE query depends on data of a different table, a more efficient approach is possible: The UPDATE can be joined to the result of a SELECT statement reducing it to a single query.

Usage

MySQL

UPDATE products
JOIN categories USING(category_id)
SET price_current =
  price_original - price_original * categories.discount

PostgreSQL

UPDATE products
SET price_current =
  price_original - price_original * categories.discount
FROM categories
WHERE products.category_id = categories.category_id

Detailed Explanation

Typically an application is doing a slow loop-based process to update rows in a table with the UPDATE statement depending on values of the other table:

const selectQuery = `
    SELECT category_id, discount
    FROM categories
`;
const updateQuery = `
    UPDATE products
    SET price_current = price_original - price_original * :discount
    WHERE category_id = :category_id
`;

for (const category of query(selectQuery)) {
    query(updateQuery, category);
}

For a large number of categories the UPDATE statement needs to be executed numerous times which may result in degraded performance for the application. When each statement is also affecting a sizeable share of the table, an index on category_id may not be used resulting in a full-table scan every time.

But the problem is getting worse when the requirements are more extensive because the UPDATE is depending on more complex relationships or calculations. The application may need to fetch billions of record ids into memory to do more fine-grained updates to the rows. Depending on memory constraints for the application this operation may result in a crash or degraded performance of the server's process. This problem is normally solved by chunking the data to reduce memory consumption, but more queries need to be executed. Which may increase the load on the databases again. Another problem occurring is the selection of many records for the update statements. A large number of records used in a IN(...) statement will not be efficient, the chunk size may need to be decreased further and more queries need to be executed.

Every application is implementing a JOIN of a SELECT with an UPDATE! But as shown in the usage section this feature is already implemented in all databases, you can do joins when updating data. By utilizing the databases feature set fully in this case, the most efficient approach is being used by the database automatically.

Additional Resources

MySQL Documentation: UPDATE statement with the multiple-table syntax
PostgreSQL documentation: UPDATE statement with the FROM clause

Tobias Petry

SQL for Devs Founder

Are you interested to learn more?

Be notified on future content. Never spam.