Unique Indexes With Some Rows Excluded

Unique indexes can not be created in some cases because a value is duplicated by e.g. soft-deleted rows. By excluding those rows from an index, the uniqueness constraint can be properly set up.

Unique indexes are used to ensure that some values or combinations of values are unique. However, they only work when the uniqueness constraint should be enforced on all table rows because it can not be limited to ignore e.g. soft-deleted ones. But with partial indexes, supported by PostgreSQL, this is possible because a WHERE condition can control the included rows of the index. With MySQL, the behaviour to exclude soft-deleted rows in the uniqueness constraint needs to be emulated.

Usage

MySQL

CREATE UNIQUE INDEX users_email_uniq ON users (
  email,
  (IF(deleted_at, NULL, 1))
);

PostgreSQL

CREATE UNIQUE INDEX users_email_uniq ON users (
  email
) WHERE deleted_at IS NULL;

Detailed Explanation

Unique indexes are a great mechanism to ensure values or combinations of values exist in a table only once, as business requirements demand. However, in some cases, these requirements appear impossible to enforce. How should e.g. unique user email addresses be assured when e.g. some rows are marked as deleted? An index on email will not work as many rows possibly have that value, but only a few should be considered for the uniqueness condition. The requirement to make some values unique only when other columns have specific values is very common.

A mostly unknown subclass of database indexes is partial indexes. Instead of building an index on all rows, the WHERE deleted_at IS NULL condition can restrict the included ones. That way, only the email addresses of non-deleted users will be added to the unique index, and no duplicate value error will be thrown because of a reused email address from a deleted user. Additionally, as the rows are limited, the index will be smaller because only the selected ones are included instead of all table rows.

Partial indexes are simple and easy to use when PostgreSQL is used, but MySQL does not support them. So for MySQL, that behavior needs to be emulated with its functionality. That first appears to be impossible but becomes practicable when a specific property is taken into account: In databases, null values cannot be compared to anything, which means two null values will always be different. By relying on that approach, rows that would be ignored in a partial index will use a computed null value in the index, while rows that should be ignored use a simple placeholder value. The IF(deleted_at, NULL, 1) expression, while looking cryptic, is achieving that:

  • When deleted_at IS NULL the value 1 is used. Therefore multiple non-deleted rows with the foo@example.com email addresses will both result in the index value (foo@example.com, 1) leading to a duplicate value error.
  • When deleted_at IS NOT NULL a NULL value will be used. The index value of (foo@example.com, NULL) is defined as being different to another (foo@example.com, NULL) index value and also (foo@example.com, 1) . Therefore the row will not conflict with any other row and behaves like being excluded from the uniqueness constraint. However, compared to a partial index, it is still part of the index and takes up space.

Additional Resources

Author Tobias Petry

Tobias Petry

SQL for Devs Founder


Are you interested to learn more?

Be notified on future content. Never spam.