Principle 3: From Left To Right
An index on a single column is simple and easy to understand but the real problem and performance improvement opportunity is always with multi-column indexes - also called composite indexes. Once you have mastered using them, you can fix all slow queries yourself. Because of this, most of the rest of this book will focus on multi-column indexes.
The rule for multi-column indexes can be summarized as “From Left to Right Without Skipping a Column”. This simple sentence describes precisely how these indexes are used. However, it is hard to fully understand.
The fundamental constraint is that an index on the columns firstname
, lastname
and country
(Fig. A) can only be used to speed up a certain type of query:
SELECT * FROM contacts WHERE firstname = 'James'
SELECT * FROM contacts WHERE firstname = 'James' AND lastname = 'Walker'
SELECT * FROM contacts WHERE firstname = 'James' AND lastname = 'Walker' AND country = 'US'
Indexes Are Used From Left to Right
It is important to remember that index entries are always sorted by the value of the first column, all duplicate values are then by the second one and so on. The index summary (“1.1 A Different View on B+ Trees”) can then be imagined as a funnel to narrow down the offset within the index by each column (Fig. B).
The condition firstname = 'James' AND lastname = 'Walker' AND country = 'US'
is executed like:
- Start at the top of the index by choosing the funnel for
firstname = 'James'
-
For the
lastname = 'Walker'
condition, choose now the middle row in the second column of the funnel. The row above forSmith
and below forYoung
are now ignored as they can't have a result for the WHERE conditions. - Go one step further to the right on the funnel and choose from the options
GB
andUS
the last one.
Finding the correct index entry was easy with the funnel-searching approach. But coming up with a funnel for doing this step so effortlessly sounds like cheating - it can't be that easy. In reality, the database is doing something exactly like that. The funnel imagination is just abstracting some B+ tree behavior from a complicated technical implementation to an easy-to-understand mental image. It is a simple way to decide whether a specific multi-column index would fit a query well.
The Ordering Is Important
You will find the incorrect advice repeated again and again that your multi-column index should start with the most selective (most different values) column first.
The former index has been rebuilt by this rule on the columns lastname
, firstname
, country
(Fig. C).
You can see that the number of funnel steps to get to the index entry is still the same - just ordering them by selectivity doesn't make a difference.
But the ordering is still essential as an index has to fulfill your querying needs. If all your queries use all the index columns, the order won't make a difference (except for the next principle in the following chapter). But usually, most of your queries will use a subset of the indexes columns, while a tiny fraction will use all. In these cases, the ordering is critical!
If you want to find all contacts from the United States (WHERE country = 'US'
), neither the initial index on firstname
, lastname
and country
(Fig. B) nor the one ordered by selectivity on lastname
, firstname
, country
(Fig. C) can be used.
Remember, an index is used from left to right because of the funneling approach.
The country
column would have to be the first one in the index to use the funnel - having them at the end does not help.
The correct approach to ordering index columns is to cover as many distinct queries as possible.
The index of Fig. D on country
, lastname
and firstname
is a perfect match if you execute these statements:
SELECT * FROM contacts WHERE country = 'US'
SELECT * FROM contacts WHERE country = 'US' AND lastname = 'Walker'
SELECT * FROM contacts WHERE country = 'US' AND lastname = 'Walker' AND firstname = 'James'
There is no generic rule to follow for the ordering of index columns. A multi-column index is always built with the left-to-right rule to fit the funnel approach for as many queries as possible - not only for the one you optimize currently.
Skipping a Column
There is another common misunderstanding in addition to the ordering myth explained before.
It is believed that a condition like firstname = 'James' AND country = 'US'
won't use the index (firstname, lastname, country)
.
The database can use the firstname
of the funnel but then can't continue with the next step as a condition on the last name is not used.
With a multi-column index, the database can't skip any funnel steps to jump to the matching index entries (Index Access Principle 1: Fast Lookup)!
However, the index will still be used by the database but is less efficient than a perfect one on (firstname, country, lastname) or (firstname, country)
.
The steps involved are more complex (Fig. E):
- The database will use as many funnel steps as possible (without skipping a column) to narrow the potential index entries that could match the query. So in this example, there will be a fast lookup to the first index entry of James by using only the first column of the index (Index Access Principle 1: Fast Lookup).
- It will then proceed by iterating all index entries for James (Index Access Principle 2: Scan in One Direction).
-
Each index entry will be validated whether they match the
country = 'US'
condition. Matching ones (green) are used, while non-matching ones (red) are discarded.
This procedure has to iterate over five index entries in this example and keep only two. While the perfect index would have been able to do a fast lookup on the first matching index entry and select both by scanning forwards. The difference may not be important for this small example. But your index may have hundreds of thousands of index entries for James in an actual application. Iterating over all of them to keep only the matching ones is much slower than directly finding the correct index entries when having a fitting funnel.
Although this approach is slower than the perfect index, it is still faster than a single-column index on (firstname)
.
Again, a single funnel step and iteration over the five index entries would be done.
However, the iteration on the multi-column index can filter on the country column within the index and only load the two table rows that match the condition.
The single-column index can't do this pre-filtering and has to load all five rows from the table to filter on the table's country
column.
The single column-index has to load more rows from the table which you want to avoid for best performance.
Remember, for real applications, there could be thousands of rows for James
but only a few would match the condition.
Overlapping Indexes
You may have overlapping indexes in your tables like the last index on (country, lastname, firstname)
and e.g. a single-column index on (country)
.
Both indexes can be used to search for all contacts from the United States, while the multi-column index can be used for more queries to further narrow down that selection.
As indexes must be changed on every table modification, you should remove the single-column index in this example.
You don't get any benefits by keeping it - the multi-column index speeds up the same queries.
This advice is also valid when comparing two multi-column indexes when their ordering is the same but one has more columns included than the other, e.g. (country, lastname)
can be removed in favor of (country, lastname, firstname)
.
But based on the left-to-right rule, the indexes on (country, lastname, telephone)
and (country, lastname, email)
are different because one is not a strict superset of the other.