Appendix

Lies, Damned Lies, and Benchmarks

Every benchmark is flawed. It's impossible to create one that applies to everyone, because databases can be used in many different ways. As a result, you shouldn't trust any benchmark and always run your own tests with your own data and queries that match your specific situation. The best a generic benchmark can strive for is to be as fair and balanced as possible - which is what I did.

My goal with the course is to show you how easily you can swap out your current struggling database for TimescaleDB without having to learn a whole new way of doing things. That's what I love about TimescaleDB: Just a few queries and you're done. Pure magic. So in this benchmark, I'll compare TimescaleDB to MySQL and PostgreSQL, which are the two most widely used open-source databases. If you're interested in seeing how TimescaleDB stacks up against other analytical databases, you'll need to look elsewhere. There are plenty of ones to explore but they are all not as easy to use.

Why a New Benchmark?

Only two good benchmarks compare TimescaleDB to MySQL/PostgreSQL. And both of them have some problems - as I believe:

ClickBench: This benchmark, created by ClickHouse, scans a large table with almost 100 million rows without any indexes. This isn't a realistic example of a database schema used in a real application. All it's really testing is a database's ability to rapidly scan through a vast number of rows in a brute-force style.
RTABench: This benchmark, created by TigerData (the makers of TimescaleDB), is designed to test real-time analytics and addresses the limitations of ClickBench. One problem I have with it is that almost half the analytical queries involve joins. I've rarely seen joins used in analytical queries in real-world scenarios, although others may have different experiences.

The Setup

The Server & Database Configuration

I've used c6a.4xlarge instances on AWS with a 500GB gp3 EBS volume, which is the standard in the database community for benchmarking. This machine has 16 vCPUs, 32GB of memory and can handle 3,000 IOPS. To ensure a fair comparison, I made some common-sense configuration tweaks to all databases (avoiding any little-known tricks):

MySQL 8.4 (LTS): I've turned on the innodb-dedicated-server setting so MySQL automatically configures itself to use all resources of the server. I've also increased join_buffer_size, max_heap_table_size, sort_buffer_size and tmp_table_size to 64MB each. These values are probably too high for a production MySQL server as you risk crashing it with out-of-memory errors but nobody should be able to say that MySQL was not tuned enough. This configuration mimics a typical, slightly-optimized MySQL server rather than one that's been highly optimized.
PostgreSQL 17.6: I've filled in the hardware specs into PGTune and copied their generated configuration. Again, this is a basic tuned configuration to use all server resources without any deeper config tuning knowledge.
TimescaleDB 2.22.0 (PostgreSQL 17.6): I've used the timescaledb-tune command which is part of TimescaleDB to auto-tune the configuration. The generated values are almost identical to the PGTune configuration with some additional TimescaleDB settings.

The Dataset

I was unable to find a dataset that perfectly fit my benchmark needs. Building one from scratch wasn't an option, as I would have produced unrealistic data distributions and therefore an unrealistic benchmark. After considering my options, I've chosen to use the ClickBench dataset, which provides information about website visits: This dataset is a single table covering all possible information about website visitors with nearly 100 million rows (99,997,497 to be precise) in the period from 2013-07-01 20:00:00 to 2013-07-31 19:59:59. The CSV file is quite large with 80.95 GB and 105 columns.

However, I had to make some changes to the data to make it more realistic:

Adding Partitioning: You usually query a specific time range rather than all rows in a time-series application. To optimize for this, I've partitioned the table by day (into 31 parts). Now the databases can just ignore partitions with irrelevant dates.
Adding Multi-Tenancy: The original dataset is intended to be entirely scanned to produce results. However, most real-world applications have some concept of multi-tenancy and you typically only need data for one tenant. To mimic this, the data is divided into ten tenants with only one used by the benchmark. Though this reduced the queried data size by 90% to around 10 million rows (9,996,017 exactly).
Sorting by `EventTime`: An application would immediately save website visits to the database as they occur. To simulate this expected behaviour, data was sorted by the visit's timestamp before filling the table.
Adding Indexes: The original ClickBench benchmark didn't use any indexes which is unrealistic. So I added indexes that perfectly match the filters for all queries in the benchmark. Keep in mind that having such perfect indexes is not realistic for real-world applications, as it would require creating a large number of indexes to cover all possible filter combinations. However, I wanted to maximize the speed of the relational databases. This approach benefits MySQL and PostgreSQL.

The TimescaleDB Hypertables

A TimescaleDB hypertable can be configured in many ways, and I've tried to predict the most common approach used in real-world applications. However, this means the benchmark isn't optimized for maximum performance. Instead, it's intended to reflect real-world usage.

I've activated the TimescaleDB Columnstore (Hypercore) as you would typically do: I set the segmentation to the `Tenant` column and the sorting to `EventTime` - no specific tuning applied. I also limited the columnstore to exclude the newest day (`2013-07-31`), since it would still be being inserted into a real application and most hypertables are likely configured this way. As a result, queries that include the last day won't benefit from the columnstore's improved performance. This approach benefits MySQL and PostgreSQL.

For all queries, a TimescaleDB Continuous Aggregate (incremental materialized view) has been created to optimize performance. To make this more realistic, the continuous aggregates have been defined with a one-hour time range, even though the queries group data by days. This setup would allow reuse of the continuous aggregates for drilling down into a specific day, although it's not used in the benchmark. The trade-off is a slight loss of performance, but it's a more realistic use case. This approach benefits MySQL and PostgreSQL.

I've also added TimescaleDB Realtime Continuous Aggregates that don't pre-summarize the last hour of data. Some applications can show slightly delayed data and some must show live data. It's impossible to predict what you need. So these realtime continuous aggregates always calculate live data from the hypertable between '2013-07-31 19:00:00' and '2013-07-31 19:59:59', and then merge it with the pre-summarized data from previous hours. Also, keep in mind that the last day isn't stored in a TimescaleDB columnstore, so all rows must be fetched completely unoptimized like PostgreSQL does. This approach benefits MySQL and PostgreSQL.

The Test Procedure

I've decided to run five different queries with increasing complexity. I've spent a week and a half on this benchmark, and that's enough for now.

I focused the benchmark on hot run performance where all data is already in memory. This skips the slower initial query run, which would be very hard to benchmark accurately. This approach favors MySQL and PostgreSQL because their on-disk size is much larger than TimescaleDB's. Additionally, they cannot selectively read only the needed columns from disk, unlike TimescaleDB's columnstore.

To warm up the cache, all queries are executed five times. Then, each query was executed 11 times and the median runtime used as the result. I reran tests to ensure reliable results when at least one of the benchmark runs differed significantly from the median performance.

The Results

The following results compare PostgreSQL and MySQL to different configurations of TimescaleDB (as explained before):

TimescaleDB Columnstore: The data has been stored into a compressed columnstore.
TimescaleDB Continuous Aggregate: The data has been stored into a compressed columnstore and a automatic precomputation for the queries in the benchmark has been activated..
TimescaleDB Realtime Continuous Aggregate: The same as above, but the last hour of data isn't precomputed so the query results are always up-to-date and can't return stale data after new inserts.

Data Size

As expected, MySQL and PostgreSQL have similar table sizes after importing the dataset. Interestingly, these sizes are very close to the 80.95 GB CSV file. TimescaleDB achieves a 60% reduction in size, which is a decent compression ratio, although I've seen much better results in other projects. The anonymization process in the ClickBench dataset made compression more challenging by transforming long strings such as URLs, page titles and referers to mostly random values.

1st Query

SELECT COUNT(*) FROM hits WHERE Tenant = 1;

-- Continuous Aggregate Definition:
CREATE MATERIALIZED VIEW hits_hourly WITH (timescaledb.continuous) AS
SELECT TIME_BUCKET('1 hour', EventTime) AS hour, Tenant, COUNT(*) AS cnt
FROM hits
GROUP BY hour, Tenant;

-- Continuous Aggregate Query:
SELECT SUM(cnt) from hits_hourly WHERE Tenant = 1

SELECT COUNT(*) FROM hits WHERE Tenant = 1;

This query is straightforward because the database only needs to scan the index for the tenant and count the entries. It provides a baseline to estimate the performance cost of handling large operations on 10 million rows.

The results confirm that MySQL is not suitable for statistical queries (even simple ones) because a single thread must execute the whole query and examine all the data. On the other hand, PostgreSQL's performance is acceptable given it has to count ten million rows by splitting the task among six workers operating in parallel.

I didn't expect TimescaleDB to outperform the others yet at this point, given the query's simplicity. However, CPU vectorization instructions operating on many values at once rather than processing them individually (like MySQL and PostgreSQL) resulted in a big speed boost. The runtime is nearly instant, thanks to pre-calculating results either completely (Continuous Aggregate) or by processing only the freshest data and merging it with pre-calculated results for older data (Realtime Continuous Aggregate).

2nd Query

SELECT TIME_BUCKET('1 hour', EventTime) AS hour, COUNT(*) FROM hits WHERE Tenant = 1 GROUP BY hour ORDER BY hour ASC;

-- Continuous Aggregate Definition:
-- (reuse of hits_hourly from before!)

-- Continuous Aggregate Query:
SELECT hour, cnt FROM hits_hourly WHERE Tenant = 1 ORDER BY hour ASC

SELECT DATE_TRUNC('hour', EventTime) AS hour, COUNT(*) FROM hits WHERE Tenant = 1 GROUP BY hour ORDER BY hour ASC

SELECT DATE_FORMAT(EventTime, '%Y-%m-%d %H:00:00') AS hour, COUNT(*) FROM hits WHERE Tenant = 1 GROUP BY hour ORDER BY hour ASC

The previous query has been refined to calculate hourly counts, rather than the entire range. This should now test calculation overhead more accurately, going beyond just walking the index but its still an index-only query.

As expected, runtime increases as more work must be done - except for the Continuous Aggregates which pre-calculated everything. I don't know why the TimescaleDB columnstore query is slower than PostgreSQL. But hey, at least it's an honest benchmark showing that another database could be faster.

3rd Query

SELECT AVG(windowclientwidth) FROM hits WHERE Tenant = 1 and IsMobile = 0 and EventTime >= '2013-07-24 00:00:00';

-- Continuous Aggregate Definition:
CREATE MATERIALIZED VIEW hits_windowclientwidth WITH (timescaledb.continuous) AS
SELECT TIME_BUCKET('1 hour', EventTime) AS hour, Tenant, IsMobile, STATS_AGG(windowclientwidth) AS windowclientwidth
FROM hits
GROUP BY hour, Tenant, IsMobile;

-- Continuous Aggregate Query:
SELECT AVERAGE(ROLLUP(windowclientwidth)) FROM hits_windowclientwidth WHERE Tenant = 1 AND IsMobile = 0 AND hour >= '2013-07-24 00:00:00'

SELECT AVG(windowclientwidth) FROM hits WHERE Tenant = 1 and IsMobile = 0 and EventTime >= '2013-07-24 00:00:00'

This query is the first one needing an extra column from the table (windowclientwidth), while the columns for filtering (Tenant, IsMobile and EventTime) are covered by the index. It's also worth noting that the query is now only handling 1,797,511 rows because of the additional filters.

The runtime aligns with expectations based on the performance differences observed with the previous queries. However, I noticed that MySQL used an unfitting (and slower) index instead of the specialized one I created for this query. So I've forced it to use the correct one for these benchmark results which resulted in two seconds less runtime. This approach favors MySQL.

4th Query

SELECT RefererCategoryID, Income, COUNT(*) FROM hits WHERE Tenant = 1 AND EventTime BETWEEN '2013-07-21 00:00:00' AND '2013-07-21 23:59:59' AND RefererCategoryID > 0 GROUP BY RefererCategoryID, Income ORDER BY COUNT(*) DESC LIMIT 50;

-- Continuous Aggregate Definition:
CREATE MATERIALIZED VIEW hits_referers WITH (timescaledb.continuous) AS
SELECT TIME_BUCKET('1 hour', EventTime) AS hour, Tenant, RefererCategoryID, Income, COUNT(*) AS cnt
FROM hits
GROUP BY hour, Tenant, RefererCategoryID, Income;

-- Continuous Aggregate Query:
SELECT RefererCategoryID, Income, SUM(cnt) FROM hits_referers WHERE Tenant = 1 AND hour BETWEEN '2013-07-21 00:00:00' AND '2013-07-21 23:59:59' AND RefererCategoryID > 0 GROUP BY RefererCategoryID, Income ORDER BY SUM(cnt) DESC LIMIT 50

SELECT RefererCategoryID, Income, COUNT(*) FROM hits WHERE Tenant = 1 AND EventTime BETWEEN '2013-07-21 00:00:00' AND '2013-07-21 23:59:59' AND RefererCategoryID > 0 GROUP BY RefererCategoryID, Income ORDER BY COUNT(*) DESC LIMIT 50

This query makes the database's job even harder, as it now has to group the rows, sort them, and pick the best ones. The index still handles the filtering, producing 1,122,373 rows, but the database must load the columns needed for grouping from the table.

Again, the performance is as expected by the previous queries. And again MySQL had to be forced to use the correct index. What are you doing MySQL? You had one job and you did it wrong... This approach favors MySQL.

5th Query

SELECT RegionId, COUNT(DISTINCT DATE_TRUNC('day', EventTime)::text || UserID) AS users FROM hits WHERE Tenant = 1 AND EventTime BETWEEN '2013-07-01 00:00:00' AND '2013-07-05 23:59:59' GROUP BY RegionId ORDER BY users DESC LIMIT 10;

-- Continuous Aggregate Definition:
CREATE MATERIALIZED VIEW hits_regions WITH (timescaledb.continuous) AS
SELECT TIME_BUCKET('1 day', EventTime) AS day, Tenant, RegionID, COUNT(DISTINCT UserId) AS users
FROM hits
GROUP BY day, Tenant, RegionID;

-- Continuous Aggregate Query:
SELECT RegionID, SUM(users) AS users FROM hits_regions WHERE Tenant = 1 AND day BETWEEN '2013-07-01' AND '2013-07-05' GROUP BY RegionID ORDER BY users DESC LIMIT 10

SELECT RegionId, COUNT(DISTINCT DATE_TRUNC('day', EventTime)::text || UserID) AS users FROM hits WHERE Tenant = 1 AND EventTime BETWEEN '2013-07-01 00:00:00' AND '2013-07-05 23:59:59' GROUP BY RegionId ORDER BY users DESC LIMIT 10

SELECT RegionId, COUNT(DISTINCT CONCAT(DATE_FORMAT(EventTime, '%Y-%m-%d'), UserID)) AS users FROM hits WHERE Tenant = 1 AND EventTime BETWEEN '2013-07-01 00:00:00' AND '2013-07-05 23:59:59' GROUP BY RegionId ORDER BY users DESC LIMIT 10

This query is the most demanding one in the benchmark. The only way for a database to execute a COUNT DISTINCT query is to keep track of every value it finds to ignore duplicates. This type of query is very resource-intensive!

MySQL and PostgreSQL now perform similarly because the COUNT DISTINCT calculation is so resource-intensive that MySQL's overhead no longer makes a difference. And the TimescaleDB Columnstore is only 30% faster than both of them because the heavy COUNT DISTINCT computation reduced the columnstore performance gain significantly. Once again, pre-calculating results with Continuous Aggregates proves to be a good solution, as it shifts the slow query execution time to a one-time operation when the rows are pre-summarized.

The Conclusion

When you combine the previous runs into one number, you'll see these impressive performance numbers. Keep in mind that the benchmark setup was intentionally configured to favor PostgreSQL and MySQL to be as fair as possible and all data was cached in memory! With disk access, the numbers for MySQL and PostgreSQL are much much higher and TimescaleDB would shine much more as it would only load the needed columns.

TimescaleDB Continuous Aggregates:
- 1,818x faster than PostgreSQL
- 3,769x faster than MySQL
TimescaleDB Realtime Continuous Aggregates:
- 466x faster than PostgreSQL
- 966x faster than MySQL
TimescaleDB Columnstore:
- 6x faster than PostgreSQL
- 13x faster than MySQL

As always, please run your own benchmarks! But it's hard to beat TimescaleDB's (Realtime) Continuous Aggregates, since the heavy computations are already done and you get the results for now free. You can also try building this yourself with MySQL and PostgreSQL, but it's a lot of work that's better spent developing your application and hard to make performant if you can't hook deep into the database.