Creating the correct indexes for e.g. user-defined data filtering or seldom used conditions is one of the most challenging tasks. You can't make an index for every column combination and must guess the most important ones. But you will still miss some resulting in slow queries. An excellent approach is adding ghost conditions to a query that are index-supported and don't change the results. These conditions are only added to help the database find the most efficient way of querying the data.
-- Before
SELECT *
FROM parcels
WHERE status = 'open' AND insurance = 1;
-- After
SELECT *
FROM parcels
WHERE status = 'open' AND insurance = 1 AND type IN('AD', 'GV');
Many applications have the problem of missing indexes for some rare queries. While an index will make reads faster, any write operation will be slowed down because the indexes have to be updated. Therefore, adding an index for every possible query is impractical as the overhead of updating all of them when writing data would far outweigh the benefit. Which indexes to create is always a hard choice between having faster select queries with slowing down all writes and a less-efficient read query with no slowdown for writes. Because of this, it is always recommended to consider whether an index is needed carefully.
This dilemma of which index to create is being made harder by faceted search. It gets very complicated when a user can select from many possible filters to control search results. There will not be perfect indexes for any search filter combination a user may construct. The most common ones will be indexed perfectly, but rare ones will use inadequate indexes that still have to scan through many not-matching rows to find the correct ones.
A slow select query can still be optimized when an index should not be added to prevent slowing down any write query. Applying domain knowledge about the data helps in changing the query. Sometimes a ghost condition can be added to the query with the only purpose of improving the execution:
For the example above, the database can currently use the index on
status
or
transportinsurance
to find matching rows for the user-selected condition.
However, either of the indexes is too broad so many rows will still have to be filtered after applying the index.
A multi-column index on both rows should not be added because it is a rare combination of select filters.
By applying domain knowledge, the fact that any insured parcel is always of type
AD
or
GV
can be inferred.
Adding this condition to the query will not change the results because of the connection of both properties.
However, the query will now execute much faster because an existing index on status and type is now applicable because of the new condition.
Adding ghost conditions can result in the database using more efficient indexes but will not slow down queries.
The only exception is index-only queries, which are very rare in reality.
SQL for Devs Founder
Be notified on future content. Never spam.