ClickStack and Prometheus solve observability differently. Prometheus is a purpose-built metrics engine with a pull model, a custom time-series database, and PromQL. ClickStack is a unified observability platform that stores logs, traces, metrics, and session replays in ClickHouse and queries everything with SQL.
We ran both in production Kubernetes environments — Prometheus with the Grafana stack (Loki, Tempo, Mimir) for full observability coverage, and ClickStack as a single platform handling all signals. This post covers what we found across architecture, performance, cost, operations, and the scenarios where each one wins.
If you are evaluating observability platforms for a new deployment, this comparison gives you the practical details you need to make the right call.
Architecture: Fundamentally Different Approaches
The core difference is not feature lists — it is how each system thinks about data.
Prometheus Architecture
Prometheus uses an HTTP pull model to scrape metrics from targets at fixed intervals. It stores data in its own purpose-built time-series database (TSDB), where each unique label combination creates a distinct “series” that lives as a separate object in memory and on disk.
```
Targets (exporters) <--scrape-- Prometheus --> TSDB --> PromQL --> Grafana
                                                                \-> Alertmanager
```
For full observability, Prometheus is just one piece. You also need:
- Loki for logs (separate storage, LogQL query language)
- Tempo for traces (separate storage, TraceQL query language)
- Mimir for long-term metrics storage and multi-tenancy
- Grafana for visualisation and dashboards
- Alertmanager for alert routing
That is five or six stateful systems, three query languages, and independent scaling requirements for each.
ClickStack Architecture
ClickStack takes a unified backend approach. All four observability signals — logs, traces, metrics, and session replays — are stored in a single ClickHouse instance with optimised schemas per signal type.
```
Apps (OTel SDK) --push--> OTel Collector --> ClickHouse --> HyperDX UI
```
The stack has three core components:
- ClickHouse — columnar OLAP database for all telemetry storage
- OpenTelemetry Collector — ingests data via OTLP (gRPC and HTTP)
- HyperDX — UI for search, dashboards, alerts, and trace exploration
MongoDB stores application state (dashboards, users, config), but it handles no telemetry data.
What This Means in Practice
| Aspect | Prometheus + Grafana Stack | ClickStack |
|---|---|---|
| Systems to operate | 5–6 stateful services | 2–3 (ClickHouse, HyperDX, MongoDB) |
| Query languages | PromQL, LogQL, TraceQL | SQL + Lucene |
| Data collection | Pull (scrape) | Push (OTLP) |
| Signal correlation | UI-level linking | SQL joins across signals |
| Data model | Series identity at write time | Rows and columns, identity at query time |
The Prometheus approach gives you battle-tested, purpose-built tools for each signal type. The ClickStack approach gives you one database to manage and one query language to learn. Both have trade-offs, and the right choice depends on your team and scale.
Metrics: Pull vs Push, PromQL vs SQL
This is where the comparison gets most interesting, because metrics are Prometheus’s core strength.
How Prometheus Handles Metrics
Prometheus scrapes targets every 15–30 seconds, collecting counter, gauge, histogram, and summary metric types. Each unique combination of metric name and labels creates a time series. The TSDB stores these as compressed chunks on disk.
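A minimal scrape configuration illustrates the pull model (the job name and target below are placeholders):

```yaml
# prometheus.yml -- minimal scrape configuration sketch
scrape_configs:
  - job_name: 'payment-api'            # placeholder job name
    scrape_interval: 15s               # how often Prometheus pulls /metrics
    static_configs:
      - targets: ['payment-api:9090']  # host:port exposing /metrics
```

In Kubernetes you would normally let the Prometheus Operator generate this from ServiceMonitor CRDs rather than write it by hand.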
PromQL is designed specifically for time-series analysis:
```promql
# Average request latency over 5 minutes, by service
sum by (service) (rate(http_request_duration_seconds_sum[5m]))
  / sum by (service) (rate(http_request_duration_seconds_count[5m]))

# 99th percentile latency
histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
```
PromQL is concise, expressive, and optimised for the kinds of queries you run on metrics. If your team already knows PromQL, it is hard to beat for metrics-specific workflows.
How ClickStack Handles Metrics
ClickStack receives metrics through the OpenTelemetry Collector, which can scrape Prometheus endpoints using the prometheus receiver or accept metrics pushed via OTLP. Metrics are stored in ClickHouse tables with compression codecs and optimised ordering.
The same queries in SQL:
```sql
-- Average request latency in 5-minute buckets, by service
SELECT
    ServiceName,
    toStartOfFiveMinutes(Timestamp) AS interval,
    avg(Value) AS avg_latency
FROM otel_metrics
WHERE MetricName = 'http_request_duration_seconds'
  AND Timestamp > now() - INTERVAL 1 HOUR
GROUP BY ServiceName, interval
ORDER BY interval;
```
```sql
-- Join metrics with traces to find slow endpoints
SELECT
    m.ServiceName,
    t.SpanName,
    avg(m.Value) AS avg_latency,
    count(t.TraceId) AS trace_count
FROM otel_metrics m
LEFT JOIN otel_traces t
    ON m.ServiceName = t.ServiceName
   AND toStartOfMinute(m.Timestamp) = toStartOfMinute(t.Timestamp)
WHERE m.MetricName = 'http_request_duration_seconds'
  AND m.Timestamp > now() - INTERVAL 1 HOUR  -- bound the scan
GROUP BY m.ServiceName, t.SpanName
ORDER BY avg_latency DESC;
```
SQL is more verbose for basic metrics queries, but it unlocks something Prometheus fundamentally cannot do: joining metrics with traces and logs in a single query. That second query above — correlating slow metrics with their corresponding traces — requires manual UI navigation in the Prometheus/Grafana stack.
Verdict on Metrics
If your team lives in PromQL and your primary use case is metrics alerting and dashboards, Prometheus is more efficient for that specific workflow. If you need cross-signal analysis or your team already knows SQL, ClickStack removes the barrier between metrics and everything else.
High Cardinality: Where the Architecture Diverges
High cardinality — metrics with millions of unique label combinations — is the scenario that exposes the deepest architectural difference between these two systems.
Prometheus and Cardinality
Prometheus treats each unique label combination as a distinct series, created at write time. Every series costs approximately 3–4 KB of memory for metadata, symbol table entries, and posting list entries in the inverted index.
The maths is harsh:
| Active Series | Memory Overhead (metadata only) |
|---|---|
| 100,000 | ~300 MB |
| 1,000,000 | ~3–4 GB |
| 10,000,000 | ~30–40 GB |
This is before storing actual sample values. At 10 million series, Prometheus needs 30–40 GB of RAM just for metadata. Add samples, query buffers, and WAL, and you are looking at significantly more.
When cardinality spikes — a deployment creates thousands of new pods, a label includes user IDs, or a service emits ephemeral request-scoped metrics — Prometheus can OOM and crash. This takes down your monitoring during the exact moment you need it most.
Sharding across multiple Prometheus instances does not eliminate the problem. Distributing 10 million series across 10 shards leaves 1 million series per shard with identical per-series overhead.
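Before reaching for sharding, it is worth measuring where the series are coming from. A common PromQL inspection pattern (note: the regex matcher touches every series, so run it sparingly on large servers):

```promql
# Top 10 metric names by active series count
topk(10, count by (__name__)({__name__=~".+"}))

# Total active series in the TSDB head block
prometheus_tsdb_head_series
```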
ClickStack and Cardinality
ClickHouse stores data as columns without per-series identity. Series emerge only at query time through GROUP BY operations. Ingestion has no per-identity overhead — a row is a row regardless of how many unique label combinations exist.
| Active Series Equivalent | ClickHouse Ingest Memory |
|---|---|
| 100,000 | ~10 MB |
| 1,000,000 | ~100 MB |
| 10,000,000 | ~1 GB |
The trade-off: ClickHouse defers the cardinality cost to query time. A GROUP BY user_id across millions of distinct values requires proportional aggregation memory. The difference is that a bad query kills one query, not your entire monitoring system.
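For example, a high-cardinality aggregation can be fenced with per-query limits so a runaway query fails alone. A sketch — the table and the `LogAttributes` column follow common OTel-on-ClickHouse conventions, and the memory limit is illustrative:

```sql
-- Count requests per user over the last hour. If this exceeds the
-- memory limit, only this query fails; ingestion and other queries
-- are unaffected.
SELECT
    LogAttributes['user_id'] AS user_id,
    count() AS requests
FROM otel_logs
WHERE Timestamp > now() - INTERVAL 1 HOUR
GROUP BY user_id
ORDER BY requests DESC
LIMIT 100
SETTINGS max_memory_usage = 10000000000;  -- ~10 GB cap for this query
```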
ClickHouse also brings architectural advantages for high-cardinality data:
- Columnar storage — Low-cardinality columns compress 10–100x; high-cardinality columns do not bloat other columns
- Vectorized execution — Processes thousands of values per CPU cycle
- LowCardinality type — Dictionary-encodes columns with fewer than ~10K distinct values
- Sparse index — Min/max indexes per 8,192-row granule enable efficient granule skipping
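To make these features concrete, here is an illustrative table definition — not ClickStack's exact schema, just a sketch of the pattern:

```sql
-- Illustrative metrics table. LowCardinality dictionary-encodes the
-- bounded columns, codecs compress timestamps and values, and the
-- ORDER BY key drives the sparse primary index.
CREATE TABLE metrics_example
(
    Timestamp   DateTime64(9) CODEC(Delta, ZSTD),
    ServiceName LowCardinality(String),
    MetricName  LowCardinality(String),
    Attributes  Map(String, String),  -- high-cardinality labels live here
    Value       Float64 CODEC(ZSTD)
)
ENGINE = MergeTree
ORDER BY (ServiceName, MetricName, Timestamp);
```

High-cardinality labels sit in the `Attributes` map, where they compress within their own column without inflating the index or the other columns.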
Verdict on Cardinality
If your metrics have bounded, predictable cardinality (under 1 million series), Prometheus handles this well. If you are dealing with high-cardinality data — user IDs, request IDs, container-level metrics in large Kubernetes clusters — ClickStack handles the ingestion gracefully where Prometheus struggles.
For a deeper look at how this affects Kubernetes monitoring specifically, see our guide to Prometheus monitoring on Kubernetes.
Storage, Retention, and Cost
Long-term storage is one of Prometheus’s well-known limitations and one of ClickStack’s strongest advantages.
Prometheus Storage
Prometheus defaults to 15 days of retention. Its TSDB uses delta-of-delta encoding for timestamps and Gorilla-style XOR compression for values, which works well for long-lived series but poorly for ephemeral ones (short-lived pods and containers).
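Retention is controlled by server flags rather than a config file setting:

```shell
# Time-based and size-based retention; whichever limit hits first wins.
prometheus \
  --storage.tsdb.path=/prometheus \
  --storage.tsdb.retention.time=15d \
  --storage.tsdb.retention.size=200GB
```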
For longer retention, you need an additional system:
- Thanos — Adds S3/GCS object storage, compaction, and global query federation
- Cortex/Mimir — Multi-tenant long-term storage with deduplication
- VictoriaMetrics — Drop-in long-term storage alternative
Each of these adds operational complexity, another system to maintain, and its own failure modes.
ClickStack Storage
ClickHouse stores all data natively with configurable TTL per signal type. The columnar format with Delta and ZSTD compression achieves strong compression ratios:
| Signal | Compression Ratio | Recommended TTL |
|---|---|---|
| Metrics | 10–20x | 90 days |
| Logs | 8–15x (varies by content) | 14–30 days |
| Traces | 10–15x | 30 days |
| Session replays | 5–10x | 7 days |
For cold storage, ClickHouse supports tiered storage policies — keep recent data on SSD, move older data to S3 automatically. No separate system required.
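A sketch of what that looks like in practice — this assumes a storage policy with an `s3` volume has already been defined in the server configuration, and the intervals are illustrative:

```sql
-- Move rows to the S3 volume after 7 days, delete after 90.
-- Assumes the table was created with a tiered storage_policy.
ALTER TABLE otel_metrics
    MODIFY TTL Timestamp + INTERVAL 7 DAY TO VOLUME 's3',
               Timestamp + INTERVAL 90 DAY DELETE;
```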
Cost Comparison
For a mid-scale production environment (~200 microservices, ~50 GB/day total telemetry):
| Component | Prometheus + Grafana Stack | ClickStack |
|---|---|---|
| Metrics storage | Prometheus + Thanos/Mimir | ClickHouse (included) |
| Log storage | Loki | ClickHouse (included) |
| Trace storage | Tempo | ClickHouse (included) |
| Visualisation | Grafana | HyperDX (included) |
| Systems to operate | 5–6 | 2–3 |
| Estimated infra cost | $1,200–2,000/month | $500–800/month |
| Engineering overhead | 3 query languages, 5+ config formats | SQL + YAML |
The cost difference is driven primarily by operational consolidation. Running fewer stateful systems means fewer nodes, less memory, and less engineering time spent on maintenance. Teams running the Prometheus/Grafana stack at mid-scale companies report total costs of approximately $450K annually once engineering overhead is included.
Kubernetes Deployment
Both systems are Kubernetes-native, but the deployment footprint is different.
Prometheus on Kubernetes
The standard approach uses the kube-prometheus-stack Helm chart, which deploys:
- Prometheus server (StatefulSet)
- Alertmanager (StatefulSet)
- Grafana (Deployment)
- Node exporter (DaemonSet)
- kube-state-metrics (Deployment)
- Prometheus Operator (Deployment)
Add Loki, Tempo, and Mimir for full observability and you are managing 8+ StatefulSets/Deployments with independent PVCs, scaling, and configuration.
```shell
helm install kube-prometheus prometheus-community/kube-prometheus-stack
helm install loki grafana/loki-stack
helm install tempo grafana/tempo
```
This is battle-tested. The Prometheus Operator with ServiceMonitor CRDs makes target discovery clean. The ecosystem of exporters covers virtually every technology. But operational burden is real — we have seen teams spend 20–30% of their platform engineering time maintaining the monitoring stack itself.
ClickStack on Kubernetes
ClickStack deploys via a single Helm chart:
```shell
helm repo add clickstack https://clickhouse.github.io/ClickStack-helm-charts
helm install my-clickstack clickstack/clickstack
```
This provisions ClickHouse, HyperDX, the OTel Collector, and MongoDB. For production, you can run ClickHouse externally using the Altinity operator with sharding and replication.
The OTel Collector runs as a DaemonSet to collect node-level logs and metrics. Application instrumentation uses OpenTelemetry SDKs pointing to the collector’s OTLP endpoints.
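A trimmed sketch of what a DaemonSet collector pipeline for logs might look like — the export endpoint and service name are placeholders for your deployment:

```yaml
# DaemonSet collector: tail container logs on the node and forward
# everything over OTLP to the central collector.
receivers:
  filelog:
    include: [/var/log/pods/*/*/*.log]
  otlp:
    protocols:
      grpc:
      http:

exporters:
  otlphttp:
    endpoint: http://clickstack-otel-collector:4318  # placeholder

service:
  pipelines:
    logs:
      receivers: [filelog, otlp]
      exporters: [otlphttp]
```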
For a step-by-step Kubernetes deployment guide, see our ClickStack setup guide.
Deployment Comparison
| Aspect | Prometheus + Grafana Stack | ClickStack |
|---|---|---|
| Helm charts | 3–4 separate charts | 1 chart |
| StatefulSets | 3–5 (Prometheus, Alertmanager, Loki, Tempo, Mimir) | 1–2 (ClickHouse, MongoDB) |
| DaemonSets | 1 (node-exporter) | 1 (OTel Collector) |
| PVCs to manage | 5–8+ | 2–3 |
| CRDs | ServiceMonitor, PodMonitor, PrometheusRule | None |
| Service discovery | Prometheus Operator CRDs | OTel Collector config |
Alerting and Dashboards
Prometheus Alerting
Prometheus uses recording rules and alerting rules defined in YAML, processed by Alertmanager for deduplication, grouping, and routing:
```yaml
groups:
  - name: slo-alerts
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m])) > 0.01
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Error rate exceeds 1% SLO"
```
This is mature, well-understood, and integrates with PagerDuty, Slack, OpsGenie, and dozens of other notification channels. Grafana adds dashboarding with hundreds of community-maintained dashboard templates.
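For reference, a minimal Alertmanager routing tree looks like this — receiver names and integration details are placeholders:

```yaml
# alertmanager.yml sketch: route critical alerts to PagerDuty,
# everything else to Slack.
route:
  receiver: default-slack
  group_by: [alertname, service]
  routes:
    - matchers: ['severity = critical']
      receiver: pagerduty-oncall

receivers:
  - name: default-slack
    slack_configs:
      - channel: '#alerts'
  - name: pagerduty-oncall
    pagerduty_configs:
      - routing_key: <your-routing-key>
```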
ClickStack Alerting
ClickStack (HyperDX) supports alert definitions through the UI or API, with SQL-based alert conditions:
```sql
SELECT count(*)
FROM otel_logs
WHERE SeverityText = 'ERROR'
  AND Timestamp > now() - INTERVAL 5 MINUTE
  AND ServiceName = 'payment-api'
```
The alerting system is functional but younger. It supports webhook notifications and basic routing. For teams that need Alertmanager’s advanced grouping, inhibition, and silencing features, ClickStack’s alerting is not yet at parity.
However, HyperDX’s strength is in investigation, not just alerting. When an alert fires, you can immediately drill down from a metric anomaly to the specific traces and logs that caused it — all in the same interface with the same query language.
Verdict on Alerting
Prometheus + Alertmanager + Grafana is the more mature alerting and dashboarding ecosystem. If your operational workflows depend heavily on Alertmanager routing trees, Grafana dashboard templates, and recording rules, switching has a cost. ClickStack’s advantage emerges during incident investigation, where cross-signal correlation speeds up root cause analysis.
Migration Path: Running Both
You do not have to choose one and rip out the other. A practical migration path runs both systems in parallel:
Phase 1: Add ClickStack Alongside Prometheus
Deploy ClickStack and configure the OTel Collector to scrape existing Prometheus endpoints:
```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'existing-services'
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
              action: keep
              regex: true

processors:
  batch: {}

exporters:
  clickhouse:
    endpoint: http://clickhouse:8123
    database: default

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [clickhouse]
```
This gives you metrics in both Prometheus (for existing dashboards and alerts) and ClickStack (for cross-signal investigation).
Phase 2: Add Logs and Traces to ClickStack
Instrument applications with OpenTelemetry SDKs to send traces and logs to ClickStack. Keep Prometheus running for metrics alerting.
Phase 3: Evaluate and Decide
After running both for 4–8 weeks, compare:
- Alert quality and response times
- Time to root cause during incidents
- Operational burden of maintaining both systems
- Team preference for PromQL vs SQL
Some organisations keep both permanently — Prometheus for bounded, pre-aggregated alerting metrics and ClickStack for high-cardinality investigation data. This hybrid approach is explicitly recommended by teams managing high-cardinality workloads.
When to Choose Prometheus
Prometheus is the better choice when:
- Your team knows PromQL and has invested in dashboards and alerting rules
- Metrics are your primary signal and you do not need unified logs/traces/metrics
- Cardinality is bounded — under 1 million active series
- You need the exporter ecosystem — Prometheus has exporters for virtually every technology
- Alertmanager’s routing is critical to your incident response workflow
- You are already running the Grafana stack and it is working well
For teams in this position, our Prometheus consulting services can help optimise scaling, retention, and high-availability configurations.
When to Choose ClickStack
ClickStack is the better choice when:
- You need unified observability — logs, traces, metrics, and session replays in one system
- High cardinality is a problem — user IDs, request IDs, container metrics at scale
- Your team knows SQL and prefers it over learning PromQL, LogQL, and TraceQL
- Operational simplicity matters — fewer systems to maintain means less toil
- Cost efficiency is a priority — one database backend is cheaper than five
- Cross-signal correlation is important for incident investigation
- You are starting fresh and do not have an existing Prometheus investment
If you are evaluating ClickStack for a new deployment, our ClickStack setup guide covers Docker, Helm, and production Kubernetes installation.
Feature Comparison Table
| Feature | ClickStack | Prometheus + Grafana Stack |
|---|---|---|
| Logs | Built-in (ClickHouse) | Loki (separate system) |
| Metrics | Built-in (ClickHouse) | Prometheus TSDB |
| Traces | Built-in (ClickHouse) | Tempo (separate system) |
| Session replays | Built-in | Not available |
| Query language | SQL + Lucene | PromQL, LogQL, TraceQL |
| Data collection | Push (OTLP) | Pull (scrape) + push for logs/traces |
| High cardinality | Strong (columnar, deferred cost) | Weak (per-series memory overhead) |
| Long-term storage | Native with tiered S3 | Requires Thanos/Mimir/Cortex |
| Compression | 10–20x (Delta + ZSTD) | Gorilla + XOR (series-dependent) |
| Alerting maturity | Basic (growing) | Mature (Alertmanager) |
| Dashboard ecosystem | HyperDX (smaller community) | Grafana (massive community) |
| Exporter ecosystem | OTel SDK (growing) | Thousands of exporters |
| Cross-signal joins | SQL joins | Manual UI navigation |
| Kubernetes operator | None (Helm chart) | Prometheus Operator (CRDs) |
| License | MIT | Apache 2.0 |
| Maturity | New (2025 launch) | Established (2012, CNCF graduated) |
Frequently Asked Questions
Can ClickStack scrape Prometheus endpoints?
Yes. The OTel Collector includes a prometheus receiver that scrapes standard Prometheus /metrics endpoints. You can migrate existing scrape configurations directly.
Does ClickStack support PromQL?
Not natively. ClickStack uses SQL and Lucene-style queries. If your workflows depend heavily on PromQL, this is a significant learning curve. However, many PromQL patterns map directly to SQL equivalents.
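As a rough illustration, the ubiquitous `rate()` pattern maps to windowed aggregation in SQL. This sketch ignores counter resets and assumes the table layout used earlier in this post:

```sql
-- PromQL: sum by (service) (rate(http_requests_total[5m]))
-- Rough SQL equivalent for a cumulative counter stored as rows.
SELECT
    ServiceName,
    toStartOfFiveMinutes(Timestamp) AS interval,
    (max(Value) - min(Value)) / 300 AS req_per_sec  -- counter delta / window
FROM otel_metrics
WHERE MetricName = 'http_requests_total'
  AND Timestamp > now() - INTERVAL 1 HOUR
GROUP BY ServiceName, interval
ORDER BY interval;
```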
Can I use Grafana with ClickStack?
Yes. ClickHouse has an official Grafana plugin that lets you build Grafana dashboards on top of ClickHouse data. You can use Grafana as the visualisation layer while ClickStack handles storage and ingestion.
Is ClickStack production-ready?
ClickStack is new (launched in 2025 after ClickHouse acquired HyperDX), but it is built on two mature foundations — ClickHouse (processing billions of events per second at Tesla and OpenAI) and OpenTelemetry (CNCF incubating project). The UI and alerting features are less mature than Grafana, but improving rapidly.
What about VictoriaMetrics or Thanos as alternatives?
Both are excellent Prometheus-compatible solutions for long-term storage and scaling. If you want to keep PromQL and Prometheus’s data model but need better retention and federation, VictoriaMetrics or Thanos are a better fit than ClickStack. ClickStack is the right choice when you want to move beyond the three-pillar model entirely.
How does this compare to SigNoz?
SigNoz is architecturally similar to ClickStack — unified observability on ClickHouse with OpenTelemetry. SigNoz has been available since 2021 and has a more mature UI and alerting system. ClickStack has the backing of the ClickHouse team directly, which may mean tighter database integration over time. Both are strong options in the unified observability platform space.
Need Help Choosing Your Observability Stack?
The choice between ClickStack and Prometheus depends on your team’s skills, scale, and operational priorities. We have deployed and maintained both in production Kubernetes environments and can help you evaluate the right fit.
Our Prometheus and observability consulting services help you:
- Evaluate ClickStack, Prometheus, and hybrid architectures based on your specific workload and cardinality profile
- Migrate from expensive SaaS tools like Datadog to self-hosted observability without losing visibility
- Optimise Prometheus for high-scale environments with federation, sharding, and long-term storage
Whether you are scaling an existing Prometheus deployment or evaluating ClickStack for a new platform, we can help you avoid the pitfalls and get to production faster.