Ghost Conditions For Unindexed Columns

Perfect indexes can not be created for every query because of their overhead. Adding ghost conditions to those queries may lead to the database using better indexes.

Creating the correct indexes for e.g. user-defined data filtering or seldom used conditions is one of the most challenging tasks. You can't make an index for every column combination and must guess the most important ones. But you will still miss some resulting in slow queries. An excellent approach is adding ghost conditions to a query that are index-supported and don't change the results. These conditions are only added to help the database find the most efficient way of querying the data.

Usage

MySQL / PostgreSQL

-- Before
SELECT *
FROM parcels
WHERE status = 'open' AND insurance = 1;

-- After
SELECT *
FROM parcels
WHERE status = 'open' AND insurance = 1 AND type IN('AD', 'GV');

Detailed Explanation

Many applications have the problem of missing indexes for some rare queries. While an index will make reads faster, any write operation will be slowed down because the indexes have to be updated. Therefore, adding an index for every possible query is impractical as the overhead of updating all of them when writing data would far outweigh the benefit. Which indexes to create is always a hard choice between having faster select queries with slowing down all writes and a less-efficient read query with no slowdown for writes. Because of this, it is always recommended to consider whether an index is needed carefully.

This dilemma of which index to create is being made harder by faceted search. It gets very complicated when a user can select from many possible filters to control search results. There will not be perfect indexes for any search filter combination a user may construct. The most common ones will be indexed perfectly, but rare ones will use inadequate indexes that still have to scan through many not-matching rows to find the correct ones.

A slow select query can still be optimized when an index should not be added to prevent slowing down any write query. Applying domain knowledge about the data helps in changing the query. Sometimes a ghost condition can be added to the query with the only purpose of improving the execution:

  • The ghost condition has to be index-supported to make the query faster.
  • The ghost condition should keep the results the same: Any row remaining by the condition must also remain with the ghost condition. The ghost condition is often a less specific filter than the original one.

For the example above, the database can currently use the index on status or transportinsurance to find matching rows for the user-selected condition. However, either of the indexes is too broad so many rows will still have to be filtered after applying the index. A multi-column index on both rows should not be added because it is a rare combination of select filters. By applying domain knowledge, the fact that any insured parcel is always of type AD or GV can be inferred. Adding this condition to the query will not change the results because of the connection of both properties. However, the query will now execute much faster because an existing index on status and type is now applicable because of the new condition. Adding ghost conditions can result in the database using more efficient indexes but will not slow down queries. The only exception is index-only queries, which are very rare in reality.

Author Tobias Petry

Tobias Petry

SQL for Devs Founder


Are you interested to learn more?

Be notified on future content. Never spam.