[ENGINEERING]

$ ClickStack vs Prometheus: We Ran Both — Here's the Verdict

author="Engineering Team" date="2026-02-12"

ClickStack and Prometheus solve observability differently. Prometheus is a purpose-built metrics engine with a pull model, a custom time-series database, and PromQL. ClickStack is a unified observability platform that stores logs, traces, metrics, and session replays in ClickHouse and queries everything with SQL.

We ran both in production Kubernetes environments — Prometheus with the Grafana stack (Loki, Tempo, Mimir) for full observability coverage, and ClickStack as a single platform handling all signals. This post covers what we found across architecture, performance, cost, operations, and the scenarios where each one wins.

If you are evaluating observability platforms for a new deployment, this comparison gives you the practical details you need to make the right call.

Architecture: Fundamentally Different Approaches

The core difference is not feature lists — it is how each system thinks about data.

Prometheus Architecture

Prometheus uses an HTTP pull model to scrape metrics from targets at fixed intervals. It stores data in its own purpose-built time-series database (TSDB), where each unique label combination creates a distinct “series” that lives as a separate object in memory and on disk.

Targets (exporters) ← Prometheus (scrape) → TSDB → PromQL → Grafana
                                                 → Alertmanager

For full observability, Prometheus is just one piece. You also need:

  • Loki for logs (separate storage, LogQL query language)
  • Tempo for traces (separate storage, TraceQL query language)
  • Mimir for long-term metrics storage and multi-tenancy
  • Grafana for visualisation and dashboards
  • Alertmanager for alert routing

That is five or six stateful systems, three query languages, and independent scaling requirements for each.

ClickStack Architecture

ClickStack takes a unified backend approach. All four observability signals — logs, traces, metrics, and session replays — are stored in a single ClickHouse instance with optimised schemas per signal type.

Apps (OTel SDK) → OTel Collector (push) → ClickHouse → HyperDX UI

The stack has three core components:

  • ClickHouse — columnar OLAP database for all telemetry storage
  • OpenTelemetry Collector — ingests data via OTLP (gRPC and HTTP)
  • HyperDX — UI for search, dashboards, alerts, and trace exploration

MongoDB stores application state (dashboards, users, config), but it handles no telemetry data.

What This Means in Practice

| Aspect | Prometheus + Grafana Stack | ClickStack |
|---|---|---|
| Systems to operate | 5–6 stateful services | 2–3 (ClickHouse, HyperDX, MongoDB) |
| Query languages | PromQL, LogQL, TraceQL | SQL + Lucene |
| Data collection | Pull (scrape) | Push (OTLP) |
| Signal correlation | UI-level linking | SQL joins across signals |
| Data model | Series identity at write time | Rows and columns, identity at query time |

The Prometheus approach gives you battle-tested, purpose-built tools for each signal type. The ClickStack approach gives you one database to manage and one query language to learn. Both have trade-offs, and the right choice depends on your team and scale.

Metrics: Pull vs Push, PromQL vs SQL

This is where the comparison gets most interesting, because metrics are Prometheus’s core strength.

How Prometheus Handles Metrics

Prometheus scrapes targets every 15–30 seconds, collecting counter, gauge, histogram, and summary metric types. Each unique combination of metric name and labels creates a time series. The TSDB stores these as compressed chunks on disk.

PromQL is designed specifically for time-series analysis:

# Average request latency over 5 minutes by service
rate(http_request_duration_seconds_sum[5m])
  / rate(http_request_duration_seconds_count[5m])

# 99th percentile latency
histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))

PromQL is concise, expressive, and optimised for the kinds of queries you run on metrics. If your team already knows PromQL, it is hard to beat for metrics-specific workflows.

How ClickStack Handles Metrics

ClickStack receives metrics through the OpenTelemetry Collector, which can scrape Prometheus endpoints using the prometheus receiver or accept metrics pushed via OTLP. Metrics are stored in ClickHouse tables with compression codecs and optimised ordering.

The same queries in SQL:

-- Average request latency over 5 minutes by service
SELECT
  ServiceName,
  toStartOfFiveMinutes(Timestamp) AS interval,
  sum(Value) / count() AS avg_latency
FROM otel_metrics
WHERE MetricName = 'http_request_duration_seconds'
  AND Timestamp > now() - INTERVAL 1 HOUR
GROUP BY ServiceName, interval
ORDER BY interval;

-- Join metrics with traces to find slow endpoints
SELECT
  m.ServiceName,
  t.SpanName,
  avg(m.Value) AS avg_latency,
  count(t.TraceId) AS trace_count
FROM otel_metrics m
LEFT JOIN otel_traces t ON m.ServiceName = t.ServiceName
  AND toStartOfMinute(m.Timestamp) = toStartOfMinute(t.Timestamp)
WHERE m.MetricName = 'http_request_duration_seconds'
GROUP BY m.ServiceName, t.SpanName
ORDER BY avg_latency DESC;

SQL is more verbose for basic metrics queries, but it unlocks something Prometheus fundamentally cannot do: joining metrics with traces and logs in a single query. That second query above — correlating slow metrics with their corresponding traces — requires manual UI navigation in the Prometheus/Grafana stack.

Verdict on Metrics

If your team lives in PromQL and your primary use case is metrics alerting and dashboards, Prometheus is more efficient for that specific workflow. If you need cross-signal analysis or your team already knows SQL, ClickStack removes the barrier between metrics and everything else.

High Cardinality: Where the Architecture Diverges

High cardinality — metrics with millions of unique label combinations — is the scenario that exposes the deepest architectural difference between these two systems.

Prometheus and Cardinality

Prometheus treats each unique label combination as a distinct series, created at write time. Every series costs approximately 3–4 KB of memory for metadata, symbol table entries, and posting list entries in the inverted index.

The maths is harsh:

| Active Series | Memory Overhead (metadata only) |
|---|---|
| 100,000 | ~300 MB |
| 1,000,000 | ~3–4 GB |
| 10,000,000 | ~30–40 GB |

This is before storing actual sample values. At 10 million series, Prometheus needs 30–40 GB of RAM just for metadata. Add samples, query buffers, and WAL, and you are looking at significantly more.

When cardinality spikes — a deployment creates thousands of new pods, a label includes user IDs, or a service emits ephemeral request-scoped metrics — Prometheus can OOM and crash. This takes down your monitoring during the exact moment you need it most.

Sharding across multiple Prometheus instances does not eliminate the problem. Distributing 10 million series across 10 shards leaves 1 million series per shard with identical per-series overhead.
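The per-series arithmetic above can be sketched in a few lines. The ~3.5 KB/series figure is an assumed midpoint of the 3–4 KB range quoted earlier, not an exact Prometheus internal constant:

```python
# Back-of-envelope estimate of Prometheus per-series metadata overhead.
# 3,500 bytes/series is an assumption (midpoint of the ~3-4 KB range),
# covering label metadata, symbol table, and inverted-index postings.

def series_metadata_gb(active_series: int, bytes_per_series: int = 3_500) -> float:
    """Estimated TSDB metadata memory (GB) for a given active series count."""
    return active_series * bytes_per_series / 1e9

for n in (100_000, 1_000_000, 10_000_000):
    print(f"{n:>12,} series -> ~{series_metadata_gb(n):.2f} GB metadata")
```

This is metadata only; sample chunks, query buffers, and the WAL come on top, which is why the real-world numbers are noticeably higher.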

ClickStack and Cardinality

ClickHouse stores data as columns without per-series identity. Series emerge only at query time through GROUP BY operations. Ingestion has no per-identity overhead — a row is a row regardless of how many unique label combinations exist.

| Active Series Equivalent | ClickHouse Ingest Memory |
|---|---|
| 100,000 | ~10 MB |
| 1,000,000 | ~100 MB |
| 10,000,000 | ~1 GB |

The trade-off: ClickHouse defers the cardinality cost to query time. A GROUP BY user_id across millions of distinct values requires proportional aggregation memory. The difference is that a bad query kills one query, not your entire monitoring system.

ClickHouse also brings architectural advantages for high-cardinality data:

  • Columnar storage — Low-cardinality columns compress 10–100x; high-cardinality columns do not bloat other columns
  • Vectorized execution — Processes thousands of values per CPU cycle
  • LowCardinality type — Dictionary-encodes columns with fewer than ~10K distinct values
  • Sparse index — Min/max indexes per 8,192-row granule enable efficient granule skipping
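To make the LowCardinality and codec points concrete, here is an illustrative DDL sketch — not the exact ClickStack schema, but the same ideas: dictionary-encoded dimension columns, per-column compression codecs, and a sort key that clusters each series' rows together for granule skipping:

```sql
-- Illustrative schema sketch (table and column names follow the query
-- examples in this post; this is not the exact ClickStack DDL).
CREATE TABLE otel_metrics_sketch
(
    Timestamp   DateTime64(9) CODEC(Delta, ZSTD),  -- delta + ZSTD compresses timestamps well
    ServiceName LowCardinality(String),            -- dictionary-encoded dimension
    MetricName  LowCardinality(String),
    Value       Float64 CODEC(ZSTD),
    Attributes  Map(LowCardinality(String), String)
)
ENGINE = MergeTree
ORDER BY (ServiceName, MetricName, Timestamp);     -- sort key clusters each series
```

Because the sort key leads with the low-cardinality dimensions, queries filtered by service and metric skip most granules via the sparse min/max index.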

Verdict on Cardinality

If your metrics have bounded, predictable cardinality (under 1 million series), Prometheus handles this well. If you are dealing with high-cardinality data — user IDs, request IDs, container-level metrics in large Kubernetes clusters — ClickStack handles the ingestion gracefully where Prometheus struggles.

For a deeper look at how this affects Kubernetes monitoring specifically, see our guide to Prometheus monitoring on Kubernetes.

Storage, Retention, and Cost

Long-term storage is one of Prometheus’s well-known limitations and one of ClickStack’s strongest advantages.

Prometheus Storage

Prometheus defaults to 15 days of retention. Its TSDB uses delta-of-delta encoding for timestamps and Gorilla-style XOR compression for values, which works well for long-lived series but poorly for ephemeral ones (short-lived pods, containers).

For longer retention, you need an additional system:

  • Thanos — Adds S3/GCS object storage, compaction, and global query federation
  • Cortex/Mimir — Multi-tenant long-term storage with deduplication
  • VictoriaMetrics — Drop-in long-term storage alternative

Each of these adds operational complexity, another system to maintain, and its own failure modes.

ClickStack Storage

ClickHouse stores all data natively with configurable TTL per signal type. The columnar format with Delta and ZSTD compression achieves strong compression ratios:

| Signal | Compression Ratio | Recommended TTL |
|---|---|---|
| Metrics | 10–20x | 90 days |
| Logs | 8–15x (varies by content) | 14–30 days |
| Traces | 10–15x | 30 days |
| Session replays | 5–10x | 7 days |

For cold storage, ClickHouse supports tiered storage policies — keep recent data on SSD, move older data to S3 automatically. No separate system required.
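The tiering described above is expressed as a table TTL. A sketch, assuming a storage policy with an S3-backed volume named 'cold' has already been configured in the server settings, and that Timestamp is the table's DateTime-typed time column:

```sql
-- Sketch: move parts older than 7 days to the 'cold' (S3) volume,
-- drop them entirely after 30 days. Assumes the table was created with
-- SETTINGS storage_policy = 'tiered' referencing the 'cold' volume.
ALTER TABLE otel_logs
    MODIFY TTL Timestamp + INTERVAL 7 DAY TO VOLUME 'cold',
               Timestamp + INTERVAL 30 DAY DELETE;
```

TTL moves happen as part of background merges, so no external compactor or shipper process is involved.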

Cost Comparison

For a mid-scale production environment (~200 microservices, ~50 GB/day total telemetry):

| Component | Prometheus + Grafana Stack | ClickStack |
|---|---|---|
| Metrics storage | Prometheus + Thanos/Mimir | ClickHouse (included) |
| Log storage | Loki | ClickHouse (included) |
| Trace storage | Tempo | ClickHouse (included) |
| Visualisation | Grafana | HyperDX (included) |
| Systems to operate | 5–6 | 2–3 |
| Estimated infra cost | $1,200–2,000/month | $500–800/month |
| Engineering overhead | 3 query languages, 5+ config formats | SQL + YAML |

The cost difference is driven primarily by operational consolidation. Running fewer stateful systems means fewer nodes, less memory, and less engineering time spent on maintenance. At mid-scale companies, teams running the full Prometheus/Grafana stack have reported total costs approaching $450K annually once engineering overhead is included.
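A rough disk-footprint calculation for the ~50 GB/day scenario above. The per-signal split of the daily volume and the compression/retention values are assumed midpoints of the ranges quoted earlier, not measurements:

```python
# Steady-state compressed footprint: daily volume accumulates for the
# retention window, divided by the compression ratio.

def disk_gb(daily_gb: float, compression: float, retention_days: int) -> float:
    """Compressed on-disk size (GB) for one signal at steady state."""
    return daily_gb * retention_days / compression

signals = {
    # signal: (GB/day, compression ratio, retention days) -- assumed split
    "metrics": (10, 15, 90),
    "logs":    (30, 10, 30),
    "traces":  (10, 12, 30),
}

total = sum(disk_gb(*v) for v in signals.values())
for name, v in signals.items():
    print(f"{name:>8}: ~{disk_gb(*v):.0f} GB on disk")
print(f"   total: ~{total:.0f} GB")
```

Under these assumptions, 50 GB/day of raw telemetry settles at under 200 GB of compressed storage, which is why a single modest ClickHouse node can cover this scale.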

Kubernetes Deployment

Both systems are Kubernetes-native, but the deployment footprint is different.

Prometheus on Kubernetes

The standard approach uses the kube-prometheus-stack Helm chart, which deploys:

  • Prometheus server (StatefulSet)
  • Alertmanager (StatefulSet)
  • Grafana (Deployment)
  • Node exporter (DaemonSet)
  • kube-state-metrics (Deployment)
  • Prometheus Operator (Deployment)

Add Loki, Tempo, and Mimir for full observability and you are managing 8+ StatefulSets/Deployments with independent PVCs, scaling, and configuration.

helm install kube-prometheus prometheus-community/kube-prometheus-stack
helm install loki grafana/loki-stack
helm install tempo grafana/tempo

This is battle-tested. The Prometheus Operator with ServiceMonitor CRDs makes target discovery clean. The ecosystem of exporters covers virtually every technology. But operational burden is real — we have seen teams spend 20–30% of their platform engineering time maintaining the monitoring stack itself.

ClickStack on Kubernetes

ClickStack deploys via a single Helm chart:

helm repo add clickstack https://clickhouse.github.io/ClickStack-helm-charts
helm install my-clickstack clickstack/clickstack

This provisions ClickHouse, HyperDX, the OTel Collector, and MongoDB. For production, you can run ClickHouse externally using the Altinity operator with sharding and replication.

The OTel Collector runs as a DaemonSet to collect node-level logs and metrics. Application instrumentation uses OpenTelemetry SDKs pointing to the collector’s OTLP endpoints.
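Wiring an application pod to the node-local collector typically uses the standard OpenTelemetry environment variables. A minimal sketch (the service name is illustrative; 4317 is the default OTLP gRPC port):

```yaml
# Container env for an instrumented app pointing at the DaemonSet
# collector on its own node via the downward API.
env:
  - name: OTEL_SERVICE_NAME
    value: payment-api          # illustrative service name
  - name: NODE_IP
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://$(NODE_IP):4317"
```

Routing through the node-local collector rather than a central endpoint keeps export hops short and lets the DaemonSet batch and enrich telemetry before it reaches ClickHouse.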

For a step-by-step Kubernetes deployment guide, see our ClickStack setup guide.

Deployment Comparison

| Aspect | Prometheus + Grafana Stack | ClickStack |
|---|---|---|
| Helm charts | 3–4 separate charts | 1 chart |
| StatefulSets | 3–5 (Prometheus, Alertmanager, Loki, Tempo, Mimir) | 1–2 (ClickHouse, MongoDB) |
| DaemonSets | 1 (node-exporter) | 1 (OTel Collector) |
| PVCs to manage | 5–8+ | 2–3 |
| CRDs | ServiceMonitor, PodMonitor, PrometheusRule | None |
| Service discovery | Prometheus Operator CRDs | OTel Collector config |

Alerting and Dashboards

Prometheus Alerting

Prometheus uses recording rules and alerting rules defined in YAML, processed by Alertmanager for deduplication, grouping, and routing:

groups:
- name: slo-alerts
  rules:
  - alert: HighErrorRate
    expr: |
      sum(rate(http_requests_total{status=~"5.."}[5m]))
      / sum(rate(http_requests_total[5m])) > 0.01
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Error rate exceeds 1% SLO"

This is mature, well-understood, and integrates with PagerDuty, Slack, OpsGenie, and dozens of other notification channels. Grafana adds dashboarding with hundreds of community-maintained dashboard templates.

ClickStack Alerting

ClickStack (HyperDX) supports alert definitions through the UI or API, with SQL-based alert conditions:

SELECT count(*)
FROM otel_logs
WHERE SeverityText = 'ERROR'
  AND Timestamp > now() - INTERVAL 5 MINUTE
  AND ServiceName = 'payment-api'

The alerting system is functional but younger. It supports webhook notifications and basic routing. For teams that need Alertmanager’s advanced grouping, inhibition, and silencing features, ClickStack’s alerting is not yet at parity.

However, HyperDX’s strength is in investigation, not just alerting. When an alert fires, you can immediately drill down from a metric anomaly to the specific traces and logs that caused it — all in the same interface with the same query language.

Verdict on Alerting

Prometheus + Alertmanager + Grafana is the more mature alerting and dashboarding ecosystem. If your operational workflows depend heavily on Alertmanager routing trees, Grafana dashboard templates, and recording rules, switching has a cost. ClickStack’s advantage emerges during incident investigation, where cross-signal correlation speeds up root cause analysis.

Migration Path: Running Both

You do not have to choose one and rip out the other. A practical migration path runs both systems in parallel:

Phase 1: Add ClickStack Alongside Prometheus

Deploy ClickStack and configure the OTel Collector to scrape existing Prometheus endpoints:

receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: 'existing-services'
          kubernetes_sd_configs:
            - role: pod
          relabel_configs:
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
              action: keep
              regex: true

exporters:
  clickhouse:
    endpoint: http://clickhouse:8123
    database: default

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [clickhouse]

This gives you metrics in both Prometheus (for existing dashboards and alerts) and ClickStack (for cross-signal investigation).

Phase 2: Add Logs and Traces to ClickStack

Instrument applications with OpenTelemetry SDKs to send traces and logs to ClickStack. Keep Prometheus running for metrics alerting.

Phase 3: Evaluate and Decide

After running both for 4–8 weeks, compare:

  • Alert quality and response times
  • Time to root cause during incidents
  • Operational burden of maintaining both systems
  • Team preference for PromQL vs SQL

Some organisations keep both permanently — Prometheus for bounded, pre-aggregated alerting metrics and ClickStack for high-cardinality investigation data. This hybrid approach is explicitly recommended by teams managing high-cardinality workloads.
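The hybrid setup falls out of the OTel Collector's pipeline model: one scrape, two exporters. A sketch extending the Phase 1 config, assuming a Prometheus-compatible remote-write endpoint (the Mimir URL here is illustrative):

```yaml
# Fan out the same scraped metrics to ClickHouse (investigation) and a
# Prometheus-compatible backend (existing alerting) simultaneously.
exporters:
  clickhouse:
    endpoint: http://clickhouse:8123
    database: default
  prometheusremotewrite:
    endpoint: http://mimir:9009/api/v1/push   # illustrative remote-write URL

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [clickhouse, prometheusremotewrite]
```

Because both destinations receive identical data, you can compare alert behaviour side by side before retiring either pipeline.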

When to Choose Prometheus

Prometheus is the better choice when:

  • Your team knows PromQL and has invested in dashboards and alerting rules
  • Metrics are your primary signal and you do not need unified logs/traces/metrics
  • Cardinality is bounded — under 1 million active series
  • You need the exporter ecosystem — Prometheus has exporters for virtually every technology
  • Alertmanager’s routing is critical to your incident response workflow
  • You are already running the Grafana stack and it is working well

For teams in this position, our Prometheus consulting services can help optimise scaling, retention, and high-availability configurations.

When to Choose ClickStack

ClickStack is the better choice when:

  • You need unified observability — logs, traces, metrics, and session replays in one system
  • High cardinality is a problem — user IDs, request IDs, container metrics at scale
  • Your team knows SQL and prefers it over learning PromQL, LogQL, and TraceQL
  • Operational simplicity matters — fewer systems to maintain means less toil
  • Cost efficiency is a priority — one database backend is cheaper than five
  • Cross-signal correlation is important for incident investigation
  • You are starting fresh and do not have an existing Prometheus investment

If you are evaluating ClickStack for a new deployment, our ClickStack setup guide covers Docker, Helm, and production Kubernetes installation.

Feature Comparison Table

| Feature | ClickStack | Prometheus + Grafana Stack |
|---|---|---|
| Logs | Built-in (ClickHouse) | Loki (separate system) |
| Metrics | Built-in (ClickHouse) | Prometheus TSDB |
| Traces | Built-in (ClickHouse) | Tempo (separate system) |
| Session replays | Built-in | Not available |
| Query language | SQL + Lucene | PromQL, LogQL, TraceQL |
| Data collection | Push (OTLP) | Pull (scrape) + push for logs/traces |
| High cardinality | Strong (columnar, deferred cost) | Weak (per-series memory overhead) |
| Long-term storage | Native with tiered S3 | Requires Thanos/Mimir/Cortex |
| Compression | 10–20x (Delta + ZSTD) | Gorilla + XOR (series-dependent) |
| Alerting maturity | Basic (growing) | Mature (Alertmanager) |
| Dashboard ecosystem | HyperDX (smaller community) | Grafana (massive community) |
| Exporter ecosystem | OTel SDK (growing) | Thousands of exporters |
| Cross-signal joins | SQL joins | Manual UI navigation |
| Kubernetes operator | None (Helm chart) | Prometheus Operator (CRDs) |
| License | MIT | Apache 2.0 |
| Maturity | New (2025 launch) | Established (2012, CNCF graduated) |

Frequently Asked Questions

Can ClickStack scrape Prometheus endpoints?

Yes. The OTel Collector includes a prometheus receiver that scrapes standard Prometheus /metrics endpoints. You can migrate existing scrape configurations directly.

Does ClickStack support PromQL?

Not natively. ClickStack uses SQL and Lucene-style queries. If your workflows depend heavily on PromQL, this is a significant learning curve. However, many PromQL patterns map directly to SQL equivalents.
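As one example of such a mapping, PromQL's rate() over a counter is roughly a per-bucket delta in SQL. A sketch using the table and column names from the examples earlier in this post (unlike rate(), it does not correct for counter resets):

```sql
-- Approximate PromQL rate(http_requests_total[5m]) per service:
-- the counter's increase within each 5-minute bucket, divided by 300s.
SELECT
  ServiceName,
  toStartOfFiveMinutes(Timestamp) AS bucket,
  (max(Value) - min(Value)) / 300 AS approx_rate_per_sec
FROM otel_metrics
WHERE MetricName = 'http_requests_total'
  AND Timestamp > now() - INTERVAL 1 HOUR
GROUP BY ServiceName, bucket
ORDER BY bucket;
```

Window functions can recover reset handling and sub-bucket precision where it matters, at the cost of a longer query.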

Can I use Grafana with ClickStack?

Yes. ClickHouse has an official Grafana plugin that lets you build Grafana dashboards on top of ClickHouse data. You can use Grafana as the visualisation layer while ClickStack handles storage and ingestion.

Is ClickStack production-ready?

ClickStack is new (launched in 2025 after ClickHouse acquired HyperDX), but it is built on two mature foundations — ClickHouse (processing billions of events per second at Tesla and OpenAI) and OpenTelemetry (CNCF incubating project). The UI and alerting features are less mature than Grafana, but improving rapidly.

What about VictoriaMetrics or Thanos as alternatives?

Both are excellent Prometheus-compatible solutions for long-term storage and scaling. If you want to keep PromQL and Prometheus’s data model but need better retention and federation, VictoriaMetrics or Thanos are a better fit than ClickStack. ClickStack is the right choice when you want to move beyond the three-pillar model entirely.

How does this compare to SigNoz?

SigNoz is architecturally similar to ClickStack — unified observability on ClickHouse with OpenTelemetry. SigNoz has been available since 2021 and has a more mature UI and alerting system. ClickStack has the backing of the ClickHouse team directly, which may mean tighter database integration over time. Both are strong options in the unified observability platform space.


Need Help Choosing Your Observability Stack?

The choice between ClickStack and Prometheus depends on your team’s skills, scale, and operational priorities. We have deployed and maintained both in production Kubernetes environments and can help you evaluate the right fit.

Our Prometheus and observability consulting services help you:

  • Evaluate ClickStack, Prometheus, and hybrid architectures based on your specific workload and cardinality profile
  • Migrate from expensive SaaS tools like Datadog to self-hosted observability without losing visibility
  • Optimise Prometheus for high-scale environments with federation, sharding, and long-term storage

Whether you are scaling an existing Prometheus deployment or evaluating ClickStack for a new platform, we can help you avoid the pitfalls and get to production faster.

Talk to our observability engineers →

