Architecture

LSM-tree on S3



Summary

What it is

An architectural pattern adapting Log-Structured Merge-tree storage to object storage, where writes are batched into sorted append-only runs and periodically compacted into larger files.

Where it fits

LSM-trees are the foundational architecture for streaming-first table formats like Apache Paimon. S3 objects are written whole and are immutable once written, which matches the LSM write pattern: batch writes into sorted runs, flush each run to S3 as an immutable file, and merge runs asynchronously. This enables high-throughput CDC ingestion with predictable read performance, as long as compaction keeps pace with ingestion.
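The write pattern described above can be sketched roughly as follows. This is a minimal illustration, not Paimon's implementation: `InMemoryObjectStore`, `LsmWriter`, the `runs/run-NNNNNN.json` key layout, and the JSON encoding are all hypothetical names and choices made for the example, and a dict stands in for S3.

```python
import json


class InMemoryObjectStore:
    """Hypothetical stand-in for S3: objects are written whole and never modified."""

    def __init__(self):
        self.objects = {}

    def put(self, key, body):
        # Refuse overwrites so that every flushed run stays immutable.
        if key in self.objects:
            raise ValueError(f"object {key!r} already exists; runs are immutable")
        self.objects[key] = body

    def get(self, key):
        return self.objects[key]


class LsmWriter:
    """Buffers upserts in memory and flushes them as sorted, immutable runs."""

    def __init__(self, store, flush_threshold=3):
        self.store = store
        self.flush_threshold = flush_threshold  # assumed knob: distinct keys per run
        self.buffer = {}    # latest value per key since the last flush
        self.run_keys = []  # object keys of flushed runs, oldest first

    def upsert(self, key, value):
        self.buffer[key] = value
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if not self.buffer:
            return None
        rows = sorted(self.buffer.items())  # a run is sorted by key
        object_key = f"runs/run-{len(self.run_keys):06d}.json"
        self.store.put(object_key, json.dumps(rows).encode("utf-8"))
        self.run_keys.append(object_key)
        self.buffer = {}
        return object_key


store = InMemoryObjectStore()
writer = LsmWriter(store, flush_threshold=3)
for key, value in [("k3", "c"), ("k1", "a"), ("k2", "b"), ("k1", "a2"), ("k4", "d")]:
    writer.upsert(key, value)
writer.flush()  # flush the remaining buffered rows
first_run = json.loads(store.get(writer.run_keys[0]))
```

Note that the update to `k1` lands in a second run rather than modifying the first one; reconciling the two versions is left to reads and to compaction.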

Misconceptions / Traps

LSM compaction on S3 reads, merges, and rewrites entire objects over the network, because S3 objects cannot be modified in place. Compaction costs (request charges, I/O, and compute) must be budgeted.
Read amplification increases with the number of uncompacted sorted runs a query has to merge. Compaction scheduling is critical for maintaining query performance.
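Both points can be illustrated with a small sketch, assuming runs are in-memory lists of `(key, value)` pairs sorted by key and ordered oldest first. The helper names `compact` and `point_lookup` are hypothetical; a real engine would stream objects from S3 rather than hold them in memory.

```python
import heapq
from bisect import bisect_left


def compact(runs):
    """Merge sorted runs (oldest first) into one sorted run, keeping the newest value per key."""
    # Tag rows with the negated run age so that, for equal keys, newer runs sort first.
    tagged = [
        [(key, -age, value) for key, value in run]
        for age, run in enumerate(runs)
    ]
    merged = []
    for key, _neg_age, value in heapq.merge(*tagged):
        # The first row seen for a key comes from the newest run; skip older versions.
        if not merged or merged[-1][0] != key:
            merged.append((key, value))
    return merged


def point_lookup(runs, key):
    """Return (value, runs_probed), searching the newest run first."""
    probed = 0
    for run in reversed(runs):
        probed += 1
        keys = [k for k, _ in run]
        i = bisect_left(keys, key)
        if i < len(keys) and keys[i] == key:
            return run[i][1], probed
    return None, probed


runs = [
    [("a", 1), ("c", 3)],    # oldest run
    [("b", 20), ("c", 30)],  # newer run overrides "c"
    [("d", 400)],            # newest run
]
value_before, probes_before = point_lookup(runs, "a")
compacted = compact(runs)
value_after, probes_after = point_lookup([compacted], "a")
```

Before compaction, looking up `"a"` probes all three runs; afterwards it probes one. The price is that `compact` reads and rewrites every row, which on S3 means fetching and re-uploading whole objects.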

Key Connections

enables Apache Paimon — Paimon's core storage architecture
solves Small Files Problem — compaction merges small files into larger ones
scoped_to Table Formats, S3

Definition

What it is

An architectural pattern that adapts Log-Structured Merge-tree storage to object storage, organizing writes as append-only sorted runs that are periodically compacted into larger files. This is the foundational architecture for streaming-first table formats like Apache Paimon.

Why it exists

Traditional B-tree and copy-on-write approaches assume cheap, low-latency random writes, which S3 does not provide. S3 objects are immutable and written whole, so LSM-trees fit naturally: they batch writes into sorted runs and merge them asynchronously, enabling high-throughput ingestion with predictable read performance.

Primary use cases

Streaming lakehouse ingestion on S3, high-frequency write workloads (CDC, IoT) on object storage, real-time analytics with minute-level data visibility.

Recent developments

Latest signals

Apache Paimon is the reference LSM-on-S3 implementation. Paimon combines a lake format with an LSM-tree structure to bring real-time streaming updates into the lake architecture. It uses the same family of data structures as RocksDB, LevelDB, and Cassandra, adapted for object storage. Per Paimon (Apache Paimon project) and Ververica Blog ("Apache Paimon: the Streaming Lakehouse").
2026 framing: Paimon beats Iceberg V3 for high-frequency mutable streams. In DataLakehouseHub's May 2026 head-to-head, Paimon's LSM-tree design produces a cleaner operational profile than Iceberg V3 for continuous-upsert workloads, Flink-native execution, and real-time table freshness. Iceberg V3 still wins on the broader engine ecosystem. Per DataLakehouseHub, "When Paimon Beats Iceberg for Mutable Streams" (May 2026).
Primary-key tables double as a changelog source for downstream Flink. Paimon primary-key tables serve both as batch-readable tables and as changelog sources (+I/-U/+U/-D records), letting downstream Flink jobs chain mutations from upstream Paimon tables. This is the "table is the queue is the table" closure that Iceberg and Delta don't natively offer. Per Ververica, "Streaming Lakehouse".
Paimon 1.3 release adds Apache Fluss + StarRocks integration. The real-time lakehouse stack of Apache Fluss (streaming storage) → Paimon (LSM-tree lake format) → StarRocks (query engine) is presented as the 2026 reference architecture for sub-second-latency lakehouse queries. Per BladePipe, "Real-Time Lakehouse with Paimon and StarRocks", and the Apache Fluss Paimon integration docs.
AWS-native deployment supported: S3 storage + EMR/Kinesis/Lambda integration. Paimon ships a first-class AWS S3 storage backend, so the streaming-lakehouse pattern works end-to-end on AWS without bespoke infrastructure. Per VeloDB Glossary, "Apache Paimon".
LSM compaction is the load-bearing operational primitive. LSM-on-S3 deployments live or die by the compaction strategy: compact too aggressively and you waste S3 PUT budget and rewrite bandwidth; compact too lazily and read amplification grows unbounded. Production Paimon deployments treat compaction tuning as a first-class operational discipline.
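The compaction trade-off in the last point can be made concrete with a toy model. The sketch below assumes a deliberately simple policy that is not Paimon's actual strategy: every flush writes one unit-sized run, and once the run count reaches a hypothetical `trigger`, all runs are merged into one. It reports the total units rewritten (a proxy for PUT and bandwidth cost) and the peak run count (a proxy for read amplification).

```python
def simulate(num_flushes, trigger):
    """Return (units_rewritten, max_runs) for a merge-everything-at-trigger policy."""
    run_sizes = []       # sizes of the live runs, in units of one flush
    units_rewritten = 0  # total data rewritten by compaction
    max_runs = 0         # worst-case number of runs a read would have to merge
    for _ in range(num_flushes):
        run_sizes.append(1)  # each flush adds one unit-sized run
        max_runs = max(max_runs, len(run_sizes))
        if len(run_sizes) >= trigger:
            total = sum(run_sizes)
            units_rewritten += total  # compaction reads and rewrites every unit
            run_sizes = [total]
    return units_rewritten, max_runs


aggressive = simulate(num_flushes=12, trigger=2)
lazy = simulate(num_flushes=12, trigger=6)
```

In this model, twelve flushes with `trigger=2` give `(77, 2)`, while `trigger=6` gives `(17, 6)`: the aggressive setting rewrites several times more data to keep reads at two runs, and the lazy setting saves writes but lets reads merge up to six. Real policies (size tiers, levels, partial compaction) soften this curve, but the tension is the same.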

Connections (4)

Outbound (4)

scoped_to (2): S3, Table Formats
enables (1): Apache Paimon
solves (1): Small Files Problem

Resources (2)

Docs (High)

paimon.apache.org/

Apache Paimon's documentation on how LSM-tree architecture is adapted for object storage, the reference implementation of this pattern.

Blog (Medium)

hudi.apache.org/blog/2025/12/10/apache-hudi-11-deep-dive-opt...

Deep dive into Hudi 1.1's streaming ingestion optimizations covering LSM-tree-inspired write patterns on S3.