Performance optimization: production-tested strategies
This guide is part of a collection of findings gained from community meetups. For more real-world solutions and insights, you can browse by specific problem. Having trouble with Materialized Views? Check out the Materialized Views community insights guide. If you're experiencing slow queries and want more examples, we also have a Query Optimization guide.
Order by cardinality (lowest to highest)
ClickHouse's primary index works best when low-cardinality columns come first, allowing it to skip large chunks of data efficiently. High-cardinality columns later in the key provide fine-grained sorting within those chunks. Start with columns that have few unique values (like status, category, country) and end with columns that have many unique values (like user_id, timestamp, session_id).
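A minimal sketch of what that ordering looks like in a table definition (the table, columns, and types are hypothetical, not from the source):

```sql
-- Hypothetical events table: low-cardinality columns first in the
-- sorting key, high-cardinality columns last.
CREATE TABLE events
(
    status    LowCardinality(String),  -- a handful of values
    country   LowCardinality(String),  -- ~200 values
    user_id   UInt64,                  -- millions of values
    timestamp DateTime64(6),
    message   String
)
ENGINE = MergeTree
ORDER BY (status, country, user_id, timestamp);
```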
Time granularity matters
When using timestamps in your ORDER BY clause, consider the cardinality vs precision trade-off. Microsecond-precision timestamps create very high cardinality (nearly one unique value per row), which reduces the effectiveness of ClickHouse's sparse primary index. Rounded timestamps create lower cardinality that enables better index skipping, but you lose precision for time-based queries.
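One common compromise, sketched below with hypothetical names, is to keep the precise timestamp in a column but sort on a rounded expression:

```sql
-- Keep microsecond precision in the column, but sort on the hour:
-- far fewer unique key values, so the sparse primary index can skip
-- whole granules while time-range queries still work.
CREATE TABLE requests
(
    event_time DateTime64(6),  -- full precision preserved for queries
    user_id    UInt64,
    path       String
)
ENGINE = MergeTree
ORDER BY (toStartOfHour(event_time), user_id);
```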
Focus on individual queries, not averages
When debugging ClickHouse performance, don't rely on average query times or overall system metrics. Instead, identify why specific queries are slow. A system can have good average performance while individual queries suffer from memory exhaustion, poor filtering, or high cardinality operations.
"The right way is to ask yourself why this particular query was processed in five seconds... I don't care if median and other queries process quickly. I only care about my query"
The key is understanding whether your slow query is hitting memory limits (too many unique groups) or scanning too much data (poor filtering). Here's how to diagnose each:
Identify memory-bound queries:
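A sketch of one way to check this against system.query_log (the ILIKE filter is a placeholder for your own query text): if peak memory is large relative to rows read, or the query fails with a memory-limit error, it is memory-bound.

```sql
-- Look at peak memory for recent runs of a specific query pattern.
SELECT
    event_time,
    formatReadableSize(memory_usage) AS peak_memory,
    read_rows,
    query_duration_ms
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query ILIKE '%GROUP BY user_id%'   -- placeholder filter
ORDER BY memory_usage DESC
LIMIT 10;
```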
Identify scanning-bound queries:
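A similar sketch for scanning (again, the filter is a placeholder): queries that read far more rows than they return are scanning-bound, meaning the WHERE clause or primary key is not pruning enough data.

```sql
-- Compare rows read against rows returned for recent queries.
SELECT
    event_time,
    read_rows,
    formatReadableSize(read_bytes) AS data_read,
    result_rows,
    query_duration_ms
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query ILIKE '%your_table%'         -- placeholder filter
ORDER BY read_rows DESC
LIMIT 10;
```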
Use this approach to focus your optimization efforts: if it's memory-bound, reduce cardinality or use sampling. If it's scanning-bound, improve your WHERE conditions or primary key design.
The key lesson from production teams: When a query is slow, don't just look at averages. Ask "Why was THIS specific query slow?" and examine the actual resource usage patterns.
Memory vs row scanning trade-offs
Sentry is a developer-first error tracking platform processing billions of events daily from 4+ million developers. Their key insight: it's "the cardinality of the grouping key that's going to drive memory in this particular situation". High-cardinality aggregations kill performance through memory exhaustion, not row scanning.
When queries fail, determine if it's a memory problem (too many groups) or scanning problem (too many rows).
ClickHouse must maintain a separate aggregation state in memory for each unique combination of GROUP BY values. Memory usage therefore grows with the number of unique combinations, not with the number of rows processed, and that number of combinations can explode combinatorially as you add grouping dimensions.
The critical insight: cardinality multiplies across dimensions. A query like `GROUP BY user_id, error_message, url_path` doesn't just track three columns - it creates a separate memory state for every unique combination of all three values together. With thousands of users, hundreds of error types, and thousands of URL paths, you could easily generate millions of aggregation states that must be held in memory simultaneously.
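A quick way to gauge that multiplication before running the heavy aggregation (a sketch, assuming a hypothetical events table with these columns):

```sql
-- Compare per-column cardinality with the cardinality of the combination,
-- which is the number of aggregation states the GROUP BY must hold.
SELECT
    uniq(user_id)                               AS users,
    uniq(error_message)                         AS errors,
    uniq(url_path)                              AS paths,
    uniq(user_id, error_message, url_path)      AS combinations
FROM events;
```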
This is exactly where Sentry's platform hit the wall. They had billions of events with highly diverse error attributes, and traditional GROUP BY operations would exhaust available memory trying to track every unique combination before they could complete the aggregation and return results.
For extreme cases, Sentry uses deterministic sampling. A 10% sample reduces memory usage by 90% while keeping most aggregations within roughly 5% of their exact values:
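A minimal sketch of the idea, with hypothetical table and column names (the source does not show Sentry's exact query):

```sql
-- Deterministic 10% sample: hash the user id and keep one bucket,
-- then scale the counts back up.
SELECT
    error_message,
    count() * 10 AS estimated_events
FROM events
WHERE cityHash64(user_id) % 10 = 0
GROUP BY error_message
ORDER BY estimated_events DESC
LIMIT 20;
```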
This ensures the same users appear in every query, providing consistent results across time periods. The key insight: `cityHash64()` produces consistent hash values for the same input, so `user_id = 12345` will always hash to the same value, ensuring that user either always appears in your 10% sample or never does - no flickering between queries.
Sentry's bit mask optimization: from memory explosion to bounded counters
When aggregating by high-cardinality columns (like URLs), each unique value creates a separate aggregation state in memory, leading to memory exhaustion. Sentry's solution: instead of grouping by the actual URL strings, group by boolean expressions that collapse into bit masks.
The difference is dramatic. Before: each user stores arrays of ALL unique repo names (potentially MBs per user). After: each user stores exactly 6 integers (48 bytes), regardless of how many repositories they interact with. Sentry achieved 100x memory reduction for certain query patterns with this approach.
Instead of storing every unique string in memory, you're storing the answer to questions about those strings as integers. The aggregation state becomes bounded and tiny, regardless of data diversity.
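A hedged sketch of the pattern (the predicates and names are illustrative, not Sentry's actual query): each question you would have asked of the URL strings becomes one bit, and `groupBitOr` folds them into a single bounded integer per user.

```sql
-- Instead of collecting every unique URL per user (unbounded memory),
-- answer a fixed set of questions about the URLs and OR them into one integer.
SELECT
    user_id,
    groupBitOr(
          toUInt64(url LIKE '%/checkout%') * 1
        + toUInt64(url LIKE '%/login%')    * 2
        + toUInt64(url LIKE '%/api/%')     * 4
    ) AS url_flags                         -- fixed 8 bytes per user
FROM events
GROUP BY user_id
HAVING bitAnd(url_flags, 1) = 1;           -- users who ever hit a checkout page
```

The aggregation state per user is now a single integer, no matter how many distinct URLs that user touched.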
From Sentry's engineering team: "These heavy queries are more than 10x faster and our memory usage is 100x lower (and, more importantly, bounded). Our largest customers no longer see errors when searching for replays and we can now support customers of arbitrary size without running out of memory."
Video sources
- Lost in the Haystack - Optimizing High Cardinality Aggregations - Sentry's production lessons on memory optimization
- ClickHouse Performance Analysis - Alexey Milovidov on debugging methodology
- ClickHouse Meetup: Query Optimization Techniques - Community optimization strategies