Tables storing historical data like audit logs or saved webhooks will drastically increase in size over time. But deleting old rows will be slow and the disk space will only be reclaimed when rebuilding them. This can take hours with big tables. But the cleanup workflow can be optimized by transparently splitting those tables into many smaller (e.g. monthly) partitions. Dropping an entire partition is very fast because just the linked partition file on disk will be removed, reclaiming the used disk space.
CREATE TABLE audits ( -- more columns... created_at timestamp NOT NULL ) PARTITION BY RANGE (UNIX_TIMESTAMP(created_at)) ( PARTITION audits_2023_01 VALUES LESS THAN (UNIX_TIMESTAMP('2023-02-01 00:00:00')), PARTITION audits_2023_02 VALUES LESS THAN (UNIX_TIMESTAMP('2023-03-01 00:00:00')), PARTITION audits_2023_03 VALUES LESS THAN (UNIX_TIMESTAMP('2023-04-01 00:00:00')), PARTITION audits_2023_04 VALUES LESS THAN (UNIX_TIMESTAMP('2023-05-01 00:00:00')) ); ALTER TABLE audits DROP PARTITION audits_2023_01;
CREATE TABLE audits ( -- more columns... created_at timestamptz NOT NULL ) PARTITION BY RANGE (created_at); CREATE TABLE audits_2023_01 PARTITION OF audits FOR VALUES FROM ('2023-01-01 00:00:00+00') TO ('2023-01-31 23:59:59+00'); CREATE TABLE audits_2023_02 PARTITION OF audits FOR VALUES FROM ('2023-02-01 00:00:00+00') TO ('2023-02-28 23:59:59+00'); CREATE TABLE audits_2023_03 PARTITION OF audits FOR VALUES FROM ('2023-03-01 00:00:00+00') TO ('2023-03-31 23:59:59+00'); CREATE TABLE audits_2023_04 PARTITION OF audits FOR VALUES FROM ('2023-04-01 00:00:00+00') TO ('2023-04-30 23:59:59+00'); ALTER TABLE audits DETACH PARTITION audits_2023_01 CONCURRENTLY; DROP TABLE audits_2023_01;
Any application used for some years will grow linear or exponentially in data size. You get more users, they do more within your application and your data is growing and growing. With the constant size increase, you are experiencing new problems: Your backup process is becoming much slower, changing the table structure or adding indexes takes forever and much more you can experience.
After some time, you decide to delete old historical to free up some space by deleting data that is no longer needed to make the big database more manageable. Good cleanup candidates are audit tables, any logging information or saved webhooks. You won't need them anymore if they were created a few months ago. But just deleting hundreds of thousands of rows has several problems:
VACUUM FULLon PostgreSQL or
OPTIMIZE TABLEon MySQL to completely rebuild the table to shrink it. But the exclusive lock by these operations will block any access with big tables for a very long time.
These drawbacks are unacceptable for most applications. So databases added a solution to many widespread problems including all stated ones. With table partitioning, one big table is split transparently into many smaller partitions. In our example, we've set up monthly partitions for January to April 2023. All queries are still executed on the parent table but the database will re-route them transparently to the affected partitions without any effort. From a SQL perspective, there is no difference between one big table and the partitioned one. You are still executing select or insert queries on the main table.
But with the partitioned approach, we can delete an entire month's data by dropping one table partition instead of deleting specific rows. The database file of that partition will be removed from the filesystem with a single fast operation and won't leave back over any wasted storage space. If you have different retain and cleanup rules you can also create partitions of different time ranges like quarters of weeks
SQL for Devs Founder
Be notified on future content. Never spam.