Using TimescaleDB

Hypertables

TimescaleDB is just an extension that turns PostgreSQL into a powerful analytics database: You don't have to learn about various new storage engines with different behaviors to choose from or use a completely new query language. And you're even not limited by the usual restrictions of analytics databases which disallow updating or deleting any rows. You're free to change everything anytime. After all, it's just PostgreSQL!

You only need to run two queries to improve analytics queries with TimescaleDB: First, activate the extension to add new management functions. Next, use the create_hypertable function to upgrade a table to a TimescaleDB hypertable, which is a PostgreSQL table with enhanced analytics capabilities.

CREATE TABLE requests (
  id bigint NOT NULL,
  server_id int NOT NULL,
  request_id uuid NOT NULL,
  url_domain text NOT NULL,
  url_path text NOT NULL,
  geo_country char(2) NOT NULL,
  geo_region text NOT NULL,
  browser_name text NOT NULL,
  browser_version int NOT NULL,
  duration_backend float NOT NULL,
  duration_frontend float NOT NULL,
  duration_network float NOT NULL,
  duration_total float NOT NULL,
  time timestamptz NOT NULL
 );

↑ Standard Table Definition

↓ New Features

CREATE EXTENSION IF NOT EXISTS timescaledb;
SELECT create_hypertable('requests', by_range('time', INTERVAL '7 days'));

So, what is a hypertable and why do you need it? Standard tables keep growing with each new row you insert. Over time, they become a burden because of their large size when they reach many millions of rows:

SELECTs can become really slow if you didn't create specific indexes for all your queries.
INSERT queries might be significantly slower when an index reaches it threshold and must be reorganized for optimal performance.
DELETE-ing old rows is very slow and increases cpu and disk usage on the server dramatically - often affecting the performance of all other queries.

TimescaleDB resolves these issues by automatically splitting hypertables into smaller tables (called chunks) based on time: In the example code, TimescaleDB was configured to create a new chunk for every seven days of data. That's all you have to do: You still use the requests table for your queries and TimescaleDB internally rewrites the query to use those chunks - whether it's just one or multiple.

Here's an example that shows how TimescaleDB chooses the correct chunk to insert a new row into a hypertable with monthly chunks:

Chunk selection also applies to SELECT queries. The time range affects how many chunks are used - from one to all of them:

Isn't this just partitioning? Not exactly. With partitioning, you're responsible for creating partitions for new dates. If you don't add them on time, every INSERT will fail. On the other hand, TimescaleDB automatically adds new chunks when needed, so you have one less thing to worry about.

But the benefits of TimescaleDB hypertables over manual partitioning don't end here. They are the building block for many cool features (explained in the following modules): column storage for faster queries and reduced storage costs, continuous aggregates for automatic data summaries, data retention to efficiently drop old data, etc.

The open question about hypertables is always which chunking interval should be used. In general, it's best to stick with the default seven-day interval suggested by TimescaleDB unless you have a strong reason to use a different one. Some features, like data retention and the columnstore can be tuned by changing the chunking interval - more on that later. As always, there's no one-size-fits-all formula to calculate the ideal chunking interval as it depends on the behavior of your application. However, there are a few key guidelines to keep in mind:

You shouldn't exceed 1000 chunks per hypertable for optimal performance. If you have more chunks, TimescaleDB's logic to decide which chunks to use for a query may get slower. So plan ahead on how many chunks you may have in e.g. 5 years.
The total size of all indexes for the hypertables you're actively inserting data into should be under 25% of your available memory. Note that this rule only applies to chunks that haven't been converted to the columnstore format (more on that later).