Storage Management
Retention policies are a good solution to get rid of old detailed data and keep only the aggregated statistics. The problem is, you can't be sure you'll never need the detailed data again. As your product changes, you may want to aggregate data differently. And occasionally, you need to write run-once queries to analyze data in a very specific way.
That's where TimescaleDB's data tiering strategy helps: instead of just dropping chunks, it first moves them to object storage. This removes them from your database, reducing storage size and cost, but they still exist outside it in case you need them again. The key point is that these tiered chunks are still queryable, so it's not just a backup-before-delete solution. With tiered storage, you can optimize your storage by combining high-performance storage (your TimescaleDB database) with low-cost storage (object storage).
Tiered Storage is a feature only available in Tiger Cloud and must be activated in the UI. The feature can be activated in the Explorer tab within your service. After activation, you can create a policy that automatically moves chunks to Tiered Storage after a specified time.
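Once the feature is activated, a tiering policy could look roughly like the following. This is a minimal sketch: the hypertable name `requests` and the three-month threshold are borrowed from the examples later in this section, so adjust both to your own schema and access patterns.

```sql
-- Assumption: 'requests' is your hypertable and tiered storage is already
-- activated for the service in the Tiger Cloud UI.
-- Automatically move chunks to the low-cost tier once they are older than 3 months.
SELECT add_tiering_policy('requests', INTERVAL '3 months');

-- To stop tiering further chunks later, remove the policy again.
-- Chunks that are already tiered stay in object storage.
-- SELECT remove_tiering_policy('requests');
```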
All the chunks and continuous aggregates that you query often and need quick responses from should stay in your TimescaleDB database (the high-performance tier). The historical chunks (of your hypertables or continuous aggregates) that you no longer query can be moved to the low-cost tier. This simple approach can save a lot of money: for example, you can keep continuous aggregate chunks in the high-performance tier for speed and move the large underlying hypertable chunks to the low-cost tier.
By default, TimescaleDB ignores tiered chunks on object storage to keep queries fast. As a result, queries behave the same way as they would if the tiered chunks had been deleted. You can, however, change a setting on your database connection to tell TimescaleDB to include tiered chunks in queries. Those queries will be slower, but the impact is limited because you only tier chunks that you query very infrequently.
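The connection-level switch is the `timescaledb.enable_tiered_reads` setting. The sketch below shows two ways to scope it using standard PostgreSQL `SET` semantics. The `requests` table and its `time` column are assumptions made for illustration only.

```sql
-- Session scope: tiered chunks are included until the setting is reset
-- or the connection closes.
SET timescaledb.enable_tiered_reads = TRUE;

-- Hypothetical run-once analysis that reaches back into tiered data
-- (the column name "time" is an assumption).
SELECT count(*)
FROM requests
WHERE time < NOW() - INTERVAL '1 year';

-- Return to the default behavior (tiered chunks ignored).
RESET timescaledb.enable_tiered_reads;

-- Transaction scope: SET LOCAL reverts automatically at COMMIT or ROLLBACK.
BEGIN;
SET LOCAL timescaledb.enable_tiered_reads = TRUE;
SELECT count(*) FROM requests WHERE time < NOW() - INTERVAL '1 year';
COMMIT;
```

Scoping the setting to a single transaction keeps the slower tiered reads limited to the one query that needs them.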
The exclusion of tiered chunks also applies to continuous aggregates. As with data retention, you need to adjust the continuous aggregate refresh window to ignore tiered data.
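One way to keep the refresh window clear of tiered data is to give the refresh policy a `start_offset` that is shorter than the tiering threshold. The following is a sketch under assumptions: the aggregate is called `requests_cagg`, chunks are tiered after three months, and the offsets and schedule are illustrative values rather than recommendations.

```sql
-- Drop the existing refresh policy, if there is one.
SELECT remove_continuous_aggregate_policy('requests_cagg', if_exists => TRUE);

-- Refresh only the range between 2 months ago and 1 hour ago,
-- which stays well inside the untiered data (tiering assumed at 3 months).
SELECT add_continuous_aggregate_policy(
    'requests_cagg',
    start_offset      => INTERVAL '2 months',
    end_offset        => INTERVAL '1 hour',
    schedule_interval => INTERVAL '1 hour'
);
```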
So how do you create new continuous aggregates that include tiered data? Enable tiered reads for your connection first, and the initial calculation includes all tiered rows. No special steps are needed. If you've already created a continuous aggregate, you can enable tiered reads afterwards and refresh either the tiered data range or the entire aggregate.
SET timescaledb.enable_tiered_reads = TRUE;
-- update only tiered data
CALL refresh_continuous_aggregate('requests_cagg', '-infinity'::timestamptz, NOW() - INTERVAL '3 months', force => TRUE);
-- update all data
CALL refresh_continuous_aggregate('requests_cagg', '-infinity'::timestamptz, 'infinity'::timestamptz, force => TRUE);
The cost savings of tiered storage also increase with the number of servers. With a high-availability setup consisting of two servers (one server + one failover server) and one replica, each hypertable is stored on three servers. That's paying for storage three times. But tiered storage isn't stored in the database, so it doesn't cost more when you have multiple servers. Instead, replicas (and forks) only reference the tiered chunks on object storage with no additional cost per server.
With tiered storage, you can no longer change the data in tiered chunks. This usually isn't a problem because historical data rarely needs to be updated. Occasionally, however, you may need to backfill some old data, for example. To handle these situations, you can untier a chunk, make the necessary changes, and then tier it again.
-- untier chunks
DO $$
DECLARE
    chunk RECORD;
BEGIN
    FOR chunk IN
        SELECT chunk_name FROM timescaledb_osm.tiered_chunks WHERE hypertable_name = 'requests'
    LOOP
        CALL untier_chunk(chunk.chunk_name);
    END LOOP;
END $$;
-- change data
UPDATE requests SET ... WHERE ...;
-- tier them again
SELECT tier_chunk(chunk) FROM show_chunks('requests', older_than => INTERVAL '3 months') AS chunk;
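After re-tiering, it can be useful to check where the chunks ended up. As far as the Tiger Cloud documentation describes it, `tier_chunk` only schedules a chunk and the move happens in the background, so a chunk may sit in the queue for a while. The queries below are a sketch that assumes the `timescaledb_osm` views expose the columns shown; verify them against your service before relying on them.

```sql
-- Chunks that are scheduled for tiering but not yet moved.
SELECT * FROM timescaledb_osm.chunks_queued_for_tiering;

-- Chunks of the 'requests' hypertable that already live in object storage,
-- with the time range each one covers.
SELECT chunk_name, range_start, range_end
FROM timescaledb_osm.tiered_chunks
WHERE hypertable_name = 'requests'
ORDER BY range_start;
```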
To summarize, tiered storage is a great solution for moving old data to cheaper storage while keeping it accessible. For example, Tiger Data tiers over 2 PB of data, a size that far exceeds what even AWS can offer for block storage.