Technology

ClickHouse

Summary

What it is

A column-oriented DBMS designed for real-time analytical queries, with native support for reading from and writing to S3.

Where it fits

ClickHouse occupies the performance tier above pure lakehouse queries. It can use S3 as a storage backend (S3-backed MergeTree) while maintaining its own columnar indexes for sub-second query performance — bridging the gap between S3 data lakes and dedicated analytics databases.

Misconceptions / Traps
  • ClickHouse with S3 storage is not the same as querying S3 directly. ClickHouse maintains local indexes and metadata for performance; it uses S3 for durability and lower storage cost.
  • The S3 table function (for ad-hoc S3 reads) and the S3-backed MergeTree engine (for persistent tables) are different features with different performance characteristics.
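
To make the second trap concrete, here is a minimal sketch that builds the two kinds of SQL statement side by side. The bucket URL, table name, column list, and the storage policy name `s3_main` are hypothetical placeholders; the policy is assumed to be defined already in the server's storage configuration.

```python
# Sketch only: the URL, table name, columns, and policy name are
# hypothetical placeholders, not values taken from this entry.


def adhoc_s3_read(url: str, fmt: str = "Parquet") -> str:
    """Build a one-off query that uses the s3 table function.

    Nothing is persisted in ClickHouse; every run reads the objects again.
    """
    return f"SELECT count() FROM s3('{url}', '{fmt}')"


def s3_backed_table(table: str, policy: str) -> str:
    """Build DDL for a persistent MergeTree table placed on an S3 disk.

    The storage policy is assumed to exist in the server configuration.
    """
    return (
        f"CREATE TABLE {table} (ts DateTime, user_id UInt64, event String) "
        f"ENGINE = MergeTree ORDER BY (user_id, ts) "
        f"SETTINGS storage_policy = '{policy}'"
    )


if __name__ == "__main__":
    print(adhoc_s3_read("https://example-bucket.s3.amazonaws.com/events/*.parquet"))
    print(s3_backed_table("events", "s3_main"))
```

The first statement scans the files on every call, while the second creates a table whose parts live in S3 and whose sorting key and indexes are maintained by ClickHouse, which is why their performance profiles differ.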

Key Connections
  • depends_on Apache Parquet — reads/writes Parquet for S3 interop
  • implements Separation of Storage and Compute — S3-backed storage with independent compute
  • scoped_to S3, Lakehouse

Definition

What it is

A column-oriented database management system designed for real-time analytical queries, with native support for reading from and writing to S3. The 25.x series (early 2026) added **bidirectional Iceberg read/write**, **Delta Lake INSERT** support, and native **Apache Paimon** compatibility, converting ClickHouse from a hot-tier accelerator into a first-class lakehouse engine. ClickHouse Inc. acquired **Langfuse** (LLM observability) in March 2026, adding an LLM trace store alongside the analytics engine.

Why it exists

Some analytical workloads require sub-second query performance on recent data, which pure S3-backed query engines cannot consistently deliver. ClickHouse uses S3 as a storage backend while maintaining its own columnar indexes for speed; with Iceberg/Delta/Paimon integration it can also serve as the query layer over an open-format lakehouse without rewriting source-of-truth data.

Primary use cases

Real-time analytics dashboards backed by S3 storage, log analytics with S3 archival, hybrid hot/cold query patterns, LLM observability stores (Langfuse), bidirectional read/write against open table formats.
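
As an illustration of the hybrid hot/cold pattern, the sketch below builds a MergeTree definition whose TTL clause moves older rows to a cold volume. It assumes a storage policy (here called `hot_cold`) with a local volume for recent data and an S3-backed volume named `cold`; those names, the table, and its columns are hypothetical.

```python
# Sketch only: table, columns, volume name, and policy name are
# hypothetical; the policy must be defined in the server configuration.


def tiered_table_ddl(table: str, hot_days: int, cold_volume: str, policy: str) -> str:
    """Build DDL that keeps recent parts hot and moves older ones to S3."""
    if hot_days <= 0:
        raise ValueError("hot_days must be positive")
    return (
        f"CREATE TABLE {table} (ts DateTime, level String, message String) "
        f"ENGINE = MergeTree ORDER BY ts "
        f"TTL ts + INTERVAL {hot_days} DAY TO VOLUME '{cold_volume}' "
        f"SETTINGS storage_policy = '{policy}'"
    )


if __name__ == "__main__":
    print(tiered_table_ddl("logs", 7, "cold", "hot_cold"))
```

Queries against such a table do not change; ClickHouse decides per part whether it is read from the local volume or from S3.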

Recent developments

Latest signals
  • Vector Search GA + 9,000× faster JSON than PostgreSQL JSONB. Per ClickHouse's 2025 roundup, the v25.8 release brought vector search with binary quantization to general availability. ClickHouse thereby joins Snowflake and Databricks, which added vector search to their analytics platforms earlier, but as an open-source engine. The real-time analytics database guide for 2026 reports native JSON throughput 9,000× faster than PostgreSQL JSONB on JSONBench, a benchmark relevant to telemetry, event-store, and LLM-trace ingestion (the Langfuse acquisition builds on this strength).
  • Automatic Global Join Reordering: ~1,450× speedup on TPC-H SF100. Per the same guide, Automatic Global Join Reordering in v25.9 posted a ~1,450× speedup on TPC-H SF100 over the prior planner, and v25.10 added runtime bloom filters for a further 2.1× speedup on selective joins. The cumulative effect: ClickHouse closes the complex-JOIN performance gap that historically pushed teams toward Snowflake or Databricks for multi-fact-table analytics.
  • v26.4 lands JSON skip indexes + NATURAL JOIN + parameterized Web UI queries (April 30, 2026). Per the Changelog 2026, v26.4 ships MergeTree skip index support for JSON columns (the missing piece for using native JSON as a queryable column type at scale), NATURAL JOIN syntax for terser SQL, a commit_order projection index, and parameterized queries in the Web UI. chDB, the embedded ClickHouse runtime, picked up the v25.8 kernel for a reported 61× performance improvement, keeping the in-process analytical-database story competitive with DuckDB.
