In most cases an UPDATE statement is executed against a single table with specific values. Sometimes, however, it is more practical to update rows directly based on the values of another table or the result of another query.
An UPDATE always targets a single table for some number of rows. When, for example, a discount should be applied to all products based on their category, a separate UPDATE query would be executed for every category.
SELECT category_id, discount FROM categories;
-- | category_id | discount |
-- |-------------|----------|
-- | 1           | 0.2      |
-- | 2           | 0.3      |
UPDATE products
SET price_current = price_original - price_original * 0.2
WHERE category_id = 1;
UPDATE products
SET price_current = price_original - price_original * 0.3
WHERE category_id = 2;
-- MySQL
UPDATE products
JOIN categories USING(category_id)
SET price_current =
  price_original - price_original * categories.discount;
-- PostgreSQL
UPDATE products
SET price_current =
  price_original - price_original * categories.discount
FROM categories
WHERE products.category_id = categories.category_id;
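The single-statement approach can be tried end to end with Python's built-in sqlite3 module, since SQLite supports the PostgreSQL-style UPDATE ... FROM syntax (version 3.33 and later). This is a minimal sketch; the schema and sample values are assumptions that follow the article's examples:

```python
import sqlite3

# In-memory database with the article's example tables and discounts.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (category_id INTEGER PRIMARY KEY, discount REAL);
    CREATE TABLE products (
        product_id INTEGER PRIMARY KEY,
        category_id INTEGER REFERENCES categories(category_id),
        price_original REAL,
        price_current REAL
    );
    INSERT INTO categories VALUES (1, 0.2), (2, 0.3);
    INSERT INTO products VALUES (1, 1, 100.0, NULL), (2, 2, 50.0, NULL);
""")

# One statement updates every product, regardless of how many
# categories exist (requires SQLite >= 3.33 for UPDATE ... FROM).
conn.execute("""
    UPDATE products
    SET price_current = price_original - price_original * categories.discount
    FROM categories
    WHERE products.category_id = categories.category_id
""")

print(conn.execute(
    "SELECT product_id, price_current FROM products ORDER BY product_id"
).fetchall())
# → [(1, 80.0), (2, 35.0)]
```

The same statement works unchanged on PostgreSQL; on MySQL the JOIN variant shown above applies instead.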
Typically, an application runs a slow loop-based process to update rows in one table based on values from another table:
const selectQuery = `
  SELECT category_id, discount
  FROM categories
`;

const updateQuery = `
  UPDATE products
  SET price_current = price_original - price_original * :discount
  WHERE category_id = :category_id
`;

for (const category of query(selectQuery)) {
  query(updateQuery, category);
}
For a large number of categories the UPDATE statement needs to be executed many times, which may degrade the application's performance. When each statement also affects a sizeable share of the table, an index on category_id may not be used, resulting in a full-table scan every time.
The problem gets worse when the requirements are more extensive, because the UPDATE depends on more complex relationships or calculations. The application may need to fetch billions of record ids into memory to make more fine-grained updates to the rows. Depending on the application's memory constraints, this operation may crash the server process or degrade its performance. The problem is normally solved by chunking the data to reduce memory consumption, but then more queries need to be executed, which may increase the load on the database again.
Another problem is selecting many records for the update statements. A large number of values in an IN (...) clause is not efficient, so the chunk size may need to be decreased further, and even more queries have to be executed.
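To make the cost of this workaround concrete, here is a minimal sketch of the chunked IN (...) approach using Python's built-in sqlite3 module; the schema, chunk size, and discount are illustrative assumptions, not from the article:

```python
import sqlite3

# Illustrative table: 1,000 products with prices derived from their ids.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        product_id INTEGER PRIMARY KEY,
        price_original REAL,
        price_current REAL
    )
""")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, NULL)",
    [(i, i * 10.0) for i in range(1, 1001)],
)

CHUNK_SIZE = 100  # illustrative; tuned in practice against memory and IN (...) limits

# Step 1: fetch all matching ids into application memory.
ids = [row[0] for row in conn.execute(
    "SELECT product_id FROM products WHERE price_original > 500"
)]

# Step 2: one UPDATE per chunk. 950 matching ids at chunk size 100
# means ten round trips instead of a single set-based statement.
for start in range(0, len(ids), CHUNK_SIZE):
    chunk = ids[start:start + CHUNK_SIZE]
    placeholders = ", ".join("?" for _ in chunk)
    conn.execute(
        "UPDATE products SET price_current = price_original * 0.9 "
        f"WHERE product_id IN ({placeholders})",
        chunk,
    )

print(conn.execute(
    "SELECT COUNT(*) FROM products WHERE price_current IS NOT NULL"
).fetchone()[0])
# → 950
```

Every shrink of the chunk size multiplies the number of round trips, which is exactly the overhead a single join-based UPDATE avoids.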
Every such application is effectively implementing a join of a SELECT with an UPDATE! But as shown in the usage section, this feature is already built into all major databases: you can use joins when updating data. By utilizing the database's feature set fully, the most efficient approach is chosen by the database automatically.