Multiple Aggregates in One Query

A dashboard, for example, often has to execute multiple queries on the same data set to generate its information. However, these queries can be combined into a single one that scans the data only once.

In some cases, multiple queries calculate statistics, e.g. for a dashboard. Instead of executing numerous queries, a single query can collect all the information in one pass through the data. Depending on the data and indexes, this can dramatically speed up an application, as every query merged into the combined one is one query less to execute. It is an approach worth experimenting with.

Usage

MySQL

SELECT
  SUM(releasedate = 2001) AS released_2001,
  SUM(releasedate = 2002) AS released_2002,
  SUM(director = 'Steven Spielberg') AS director_stevenspielberg,
  SUM(director = 'James Cameron') AS director_jamescameron
FROM movies
WHERE streamingservice = 'Netflix';

PostgreSQL

SELECT
  COUNT(*) FILTER (WHERE releasedate = 2001) AS released_2001,
  COUNT(*) FILTER (WHERE releasedate = 2002) AS released_2002,
  COUNT(*) FILTER (WHERE director = 'Steven Spielberg') AS director_stevenspielberg,
  COUNT(*) FILTER (WHERE director = 'James Cameron') AS director_jamescameron
FROM movies
WHERE streamingservice = 'Netflix';

Detailed Explanation

It is a common requirement to calculate many different aggregates, e.g. for a dashboard displaying informative data. Countless queries are executed to filter or group the data based on the distinct requirements for putting together this information. However, often the same rows must be scanned repeatedly to calculate different statistics about them. Loading and processing these identical rows again and again for different statistics wastes time. The more statistics the dashboard displays, the longer the page takes to load, as more queries need to be executed.

A perfect index cannot be created for all of those queries, as inserting new rows would slow down dramatically when dozens of indexes need to be updated. So most queries will scan many rows, only partially helped by generic but suboptimal indexes. This can be optimized by executing a single, less specialized query that aggregates the data in multiple ways in one pass over the rows. An index can still be used by filtering the query on the least common denominator of all the aggregations, i.e. the condition shared by every statistic. The resulting query will be slightly slower than each individual one but compensates for it by only having to run once.
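The effect can be sketched with SQLite from Python's standard library, which also coerces boolean expressions to one or zero. The table and values below mirror the article's examples; the four separate dashboard queries and the merged single-pass query return the same counts:

```python
import sqlite3

# In-memory SQLite database with data mirroring the article's examples.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE movies "
    "(title TEXT, releasedate INTEGER, director TEXT, streamingservice TEXT)"
)
con.executemany("INSERT INTO movies VALUES (?, ?, ?, ?)", [
    ("A.I. Artificial Intelligence", 2001, "Steven Spielberg", "Netflix"),
    ("Minority Report", 2002, "Steven Spielberg", "Netflix"),
    ("Ghosts of the Abyss", 2002, "James Cameron", "Netflix"),
    ("Avatar", 2009, "James Cameron", "Disney+"),
])

# Before: one query (and one scan of the rows) per statistic.
separate = [
    con.execute(
        "SELECT COUNT(*) FROM movies "
        "WHERE streamingservice = 'Netflix' AND " + cond
    ).fetchone()[0]
    for cond in (
        "releasedate = 2001",
        "releasedate = 2002",
        "director = 'Steven Spielberg'",
        "director = 'James Cameron'",
    )
]

# After: all four statistics in one pass, filtered by the shared
# (least common denominator) condition on streamingservice.
merged = con.execute("""
    SELECT
      SUM(releasedate = 2001),
      SUM(releasedate = 2002),
      SUM(director = 'Steven Spielberg'),
      SUM(director = 'James Cameron')
    FROM movies
    WHERE streamingservice = 'Netflix'
""").fetchone()

print(separate)                   # [1, 2, 2, 1]
print(tuple(separate) == merged)  # True
```

The shared `streamingservice = 'Netflix'` predicate is the part an index can still serve; the per-statistic conditions are evaluated row by row during the single scan.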

The MySQL approach works because of a quirk in its type system. As MySQL does not have a real boolean type, a condition's true or false result is represented as the integer one or zero, which can be summed over all rows. It is a nice workaround for the FILTER clause, which is available in PostgreSQL and many other databases but missing in MySQL. While the PostgreSQL approach is longer, it is also more expressive and can do things the MySQL workaround is incapable of. Full support for aggregations is retained, which means constructs like JSONB_AGG(DISTINCT genre) FILTER (WHERE director = 'Quentin Tarantino'), which collects all genres Quentin Tarantino directed in as a JSON array, are still possible. The opportunities to replace multiple queries with a single one are endless with the FILTER clause.
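SQLite (3.30+) also implements the standard FILTER clause, so the expressiveness beyond plain counting can be sketched with it; GROUP_CONCAT here stands in for PostgreSQL's JSONB_AGG, and the table and data are made up for illustration:

```python
import sqlite3

# Hypothetical movies table to demonstrate FILTER with a non-COUNT aggregate.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE movies (title TEXT, genre TEXT, director TEXT)")
con.executemany("INSERT INTO movies VALUES (?, ?, ?)", [
    ("Pulp Fiction", "Crime", "Quentin Tarantino"),
    ("Kill Bill", "Action", "Quentin Tarantino"),
    ("Reservoir Dogs", "Crime", "Quentin Tarantino"),
    ("Jaws", "Thriller", "Steven Spielberg"),
])

# A count and a distinct genre list for one director, computed alongside
# each other in a single pass over the table.
row = con.execute("""
    SELECT
      COUNT(*) FILTER (WHERE director = 'Quentin Tarantino')
        AS tarantino_movies,
      GROUP_CONCAT(DISTINCT genre) FILTER (WHERE director = 'Quentin Tarantino')
        AS tarantino_genres
    FROM movies
""").fetchone()

print(row[0])                     # 3
print(sorted(row[1].split(",")))  # ['Action', 'Crime']
```

Any aggregate can carry its own FILTER condition, so each column of a dashboard can slice the same scanned rows differently.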

Author Tobias Petry

SQL for Devs Founder

