LSM-tree on S3
An architectural pattern adapting Log-Structured Merge-tree storage to object storage, where writes are batched into sorted append-only runs and periodically compacted into larger files.
Summary
LSM-trees are the foundational architecture for streaming-first table formats such as Apache Paimon. S3's immutable-object model, where data is added by writing new objects rather than modifying existing ones, aligns naturally with the LSM write pattern: buffer writes in memory, flush them to S3 as immutable sorted runs, and merge those runs asynchronously. This enables high-throughput CDC ingestion with predictable read performance.
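A minimal sketch of that write path follows. It assumes an in-memory dictionary as a stand-in for S3; the `ImmutableObjectStore` and `TinyLSM` classes and the `memtable_limit` parameter are hypothetical and are not Paimon APIs. Writes are buffered in a memtable and flushed once as a sorted, immutable run. Reads check the memtable first and then the runs from newest to oldest.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple

Run = Tuple[Tuple[str, Optional[str]], ...]  # sorted (key, value) pairs; None = tombstone


class ImmutableObjectStore:
    """Hypothetical stand-in for S3: an object is written once and never modified."""

    def __init__(self) -> None:
        self.objects: Dict[str, Run] = {}
        self.put_count = 0

    def put(self, name: str, data: Run) -> None:
        if name in self.objects:
            raise ValueError(f"{name} already exists; objects are immutable")
        self.objects[name] = data
        self.put_count += 1

    def get(self, name: str) -> Run:
        return self.objects[name]


class TinyLSM:
    """Buffers writes in a memtable and flushes them to the store as sorted runs."""

    def __init__(self, store: ImmutableObjectStore, memtable_limit: int = 3) -> None:
        self.store = store
        self.memtable_limit = memtable_limit
        self.memtable: Dict[str, Optional[str]] = {}
        self.runs: List[str] = []  # object names, oldest first
        self._next_id = 0

    def put(self, key: str, value: Optional[str]) -> None:
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key: str) -> None:
        self.put(key, None)  # a tombstone shadows older versions until compaction

    def flush(self) -> None:
        if not self.memtable:
            return
        run: Run = tuple(sorted(self.memtable.items()))  # sort once, write once
        name = f"run-{self._next_id:06d}"
        self._next_id += 1
        self.store.put(name, run)  # one PUT per flushed batch, not per record
        self.runs.append(name)
        self.memtable = {}

    def get(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        for name in reversed(self.runs):  # the newest run wins
            run = self.store.get(name)
            i = bisect_left(run, (key,))  # binary search inside the sorted run
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None
```

Each additional entry in `runs` is one more object that a lookup for a missing or old key must probe. This is the read amplification that compaction exists to bound.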
- Compaction on S3 reads, merges, and rewrites whole objects over the network. Every input run is fetched with GET requests and every output file is written with a PUT, because S3 objects cannot be modified in place. Compaction I/O, request, and compute costs must be budgeted.
- Read amplification grows with the number of uncompacted sorted runs a lookup must consult, so compaction scheduling is critical for maintaining query performance.
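The compaction described in these caveats can be sketched as a k-way merge of sorted runs. This is a minimal sketch under several assumptions: runs are in-memory lists of (key, value) pairs, keys are unique within a run, and `None` marks a delete. The `compact` and `runs_probed` helpers are hypothetical and are not Paimon APIs. The sketch shows both caveats at once. The merge rewrites every input entry, and a lookup that had to probe three runs beforehand probes only one afterward.

```python
import heapq
from typing import List, Optional, Tuple

Run = List[Tuple[str, Optional[str]]]  # sorted by key, unique keys; None is a tombstone


def compact(runs: List[Run], drop_tombstones: bool = True) -> Run:
    """Merge sorted runs (given oldest first) into one run; the newest version of a key wins.

    On S3, each input run would be fetched with GET requests and the output written
    with a PUT: the data involved is rewritten in full, never patched in place.
    """
    # Tag entries with -age so that, for equal keys, the newest run sorts first.
    tagged = [[(key, -age, value) for key, value in run] for age, run in enumerate(runs)]
    merged: Run = []
    last_key: Optional[str] = None
    for key, _neg_age, value in heapq.merge(*tagged):
        if key == last_key:
            continue  # an older version of a key that is already resolved
        last_key = key
        if value is None and drop_tombstones:
            continue  # only safe when no older runs remain outside this merge
        merged.append((key, value))
    return merged


def runs_probed(runs: List[Run], key: str) -> int:
    """Runs a point lookup opens, newest first, before it finds `key` (or gives up)."""
    probed = 0
    for run in reversed(runs):
        probed += 1
        if any(k == key for k, _ in run):
            break
    return probed


runs = [
    [("a", "1"), ("b", "2"), ("c", "3")],  # oldest
    [("a", "10"), ("d", "4")],
    [("b", None), ("e", "5")],  # newest; deletes "b"
]
merged = compact(runs)
```

Dropping tombstones is only correct when the merge includes the oldest data for those keys, as in a full compaction. A partial merge must keep them (`drop_tombstones=False`) so they continue to shadow older runs. Real engines typically avoid the linear scan in `runs_probed` by using per-file key ranges or Bloom filters.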
Relationships:
- Enables: Apache Paimon (Paimon's core storage architecture)
- Solves: Small Files Problem (compaction merges small files into larger ones)
- Scoped to: Table Formats, S3
Definition
An architectural pattern that adapts Log-Structured Merge-tree storage to object storage, organizing writes as append-only sorted runs that are periodically compacted into larger files. This is the foundational architecture for streaming-first table formats like Apache Paimon.
Traditional B-trees assume low-latency, in-place random writes, which S3 does not provide: an object can only be replaced as a whole. Copy-on-write table layouts avoid in-place writes, but they rewrite entire data files on each update, which becomes expensive at high update rates. LSM-trees fit S3's immutable-object model by batching writes into sorted runs and merging them asynchronously, which enables high-throughput ingestion with predictable read performance.
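To make the batching argument concrete, here is some back-of-the-envelope arithmetic under assumed numbers: 10,000 records/s of 200 bytes each, and a hypothetical writer that flushes when its buffer reaches 128 MiB or every 60 seconds, whichever comes first. The functions are illustrative only. They count new objects (PUT requests) and ignore pricing, multipart uploads, and compaction traffic.

```python
import math

SECONDS_PER_DAY = 86_400
MiB = 1024 * 1024


def flush_period_s(records_per_s: float, record_bytes: int,
                   flush_bytes: int, flush_interval_s: float) -> float:
    """Seconds between flushes when a writer flushes on size OR time, whichever comes first."""
    bytes_per_s = records_per_s * record_bytes
    time_to_fill = flush_bytes / bytes_per_s if bytes_per_s > 0 else math.inf
    return min(time_to_fill, flush_interval_s)


def puts_per_day(records_per_s: float, record_bytes: int,
                 flush_bytes: int, flush_interval_s: float) -> float:
    """New objects (PUT requests) one writer creates per day under this flush policy."""
    period = flush_period_s(records_per_s, record_bytes, flush_bytes, flush_interval_s)
    return SECONDS_PER_DAY / period


def bytes_per_object(records_per_s: float, record_bytes: int,
                     flush_bytes: int, flush_interval_s: float) -> float:
    """Average size of each flushed object."""
    period = flush_period_s(records_per_s, record_bytes, flush_bytes, flush_interval_s)
    return records_per_s * record_bytes * period


batched = puts_per_day(10_000, 200, 128 * MiB, 60)   # timer fires before the buffer fills
per_record = puts_per_day(10_000, 200, 200, 60)      # one record per object
```

At 10,000 records/s the 60-second timer fires first. That yields 1,440 objects of about 120 MB per writer per day, versus roughly 864 million single-record PUTs. At 10 records/s the same policy still produces 1,440 objects per day, but each is only about 120 KB. The time-based flush preserves minute-level freshness, and compaction then has to fix the resulting small files.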
Typical use cases: streaming lakehouse ingestion on S3, high-frequency write workloads (CDC, IoT) on object storage, and real-time analytics with minute-level data visibility.
Recent developments
- Apache Paimon is the reference LSM-on-S3 implementation. Paimon combines a lake format with an LSM-tree structure to bring real-time streaming updates into the lakehouse. It uses the same family of data structures as RocksDB, LevelDB, and Cassandra, adapted to object storage's immutable-object model. Per Paimon, "Apache Paimon project", and Ververica Blog, "Apache Paimon: the Streaming Lakehouse".
- 2026 framing: Paimon beats Iceberg V3 for high-frequency mutable streams. DataLakehouseHub's May 2026 head-to-head comparison finds that Paimon's LSM-tree design yields a cleaner operational profile than Iceberg V3 for continuous-upsert workloads. It also highlights Paimon's Flink-native execution and real-time table freshness. Iceberg V3 still wins on breadth of engine ecosystem. Per DataLakehouseHub, "When Paimon Beats Iceberg for Mutable Streams" (May 2026).
- Primary-key tables double as a changelog source for downstream Flink. Paimon primary-key tables serve both as batch-readable tables and as changelog sources (+I/-U/+U/-D records). This lets downstream Flink jobs consume the mutations of upstream Paimon tables and chain further processing on them. The resulting "the table is the queue is the table" closure is something Iceberg and Delta do not natively offer. Per Ververica, "Streaming Lakehouse".
- The Paimon 1.3 release adds Apache Fluss and StarRocks integration. The real-time lakehouse stack of Apache Fluss (streaming storage) → Paimon (LSM-tree lake format) → StarRocks (query engine) is presented as the 2026 reference architecture for sub-second-latency lakehouse queries. Per BladePipe, "Real-Time Lakehouse with Paimon and StarRocks", and Apache Fluss, Paimon integration docs.
- AWS-native deployment is supported, with S3 storage plus EMR, Kinesis, and Lambda integration. Paimon ships a first-class S3 storage backend, so the streaming-lakehouse pattern works end to end on AWS without bespoke infrastructure. Per VeloDB Glossary, "Apache Paimon".
- LSM compaction is the load-bearing operational primitive, and LSM-on-S3 deployments live or die by their compaction strategy. If compaction is too aggressive, it burns S3 request (GET/PUT) and compute budget rewriting the same data repeatedly. If it is too lazy, sorted runs accumulate and read amplification keeps growing. Production Paimon deployments treat compaction tuning as a first-class operational discipline.
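The primary-key changelog bullet in the list above (+I/-U/+U/-D records) can be illustrated with a small sketch of those row-kind semantics. The `upsert` and `apply_to_total` helpers are hypothetical and are not Paimon's implementation, which generates changelogs in its own way. The sketch is about the consumer side. Because an update is emitted as a retraction (-U) followed by the new value (+U), a downstream job can keep an aggregate correct without re-reading the table.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, int]
Change = Tuple[str, Row]  # (row kind, row); row kind is one of +I, -U, +U, -D


def upsert(table: Dict[str, Row], key: str, new_row: Optional[Row]) -> List[Change]:
    """Apply an upsert (or a delete when new_row is None) and return the changelog it implies."""
    old_row = table.get(key)
    if new_row is None:
        if old_row is None:
            return []  # deleting a missing key changes nothing
        del table[key]
        return [("-D", old_row)]
    table[key] = new_row
    if old_row is None:
        return [("+I", new_row)]
    if old_row == new_row:
        return []  # no-op update: nothing to emit downstream
    return [("-U", old_row), ("+U", new_row)]


def apply_to_total(total: int, changes: List[Change], field: str) -> int:
    """Downstream consumer: keep a running SUM(field) correct under updates and deletes."""
    for kind, row in changes:
        total += row[field] if kind in ("+I", "+U") else -row[field]
    return total


table: Dict[str, Row] = {}
c1 = upsert(table, "order-1", {"amount": 10})  # new key -> +I
c2 = upsert(table, "order-1", {"amount": 15})  # changed value -> -U, +U
c3 = upsert(table, "order-1", None)            # delete -> -D
```

Emitting nothing for an identical update is a design choice in this sketch. It keeps downstream jobs from reprocessing rows whose values did not change.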
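The compaction trade-off in the last bullet shows up clearly in a toy simulation. This is a sketch under stated assumptions: the `simulate` function, the unit-sized flushes, and the merge-everything policy are all hypothetical, and Paimon and other LSM engines use more refined tiered or leveled strategies. A low trigger keeps reads cheap but rewrites the same data again and again. A high trigger rewrites far less but lets more sorted runs pile up between compactions.

```python
from typing import List, Tuple


def simulate(flushes: int, flush_size: int, trigger: int) -> Tuple[int, int, int]:
    """Naive policy: merge ALL sorted runs into one as soon as `trigger` runs exist.

    Returns (units_rewritten, compactions, peak_sorted_runs). On S3, every unit
    rewritten is GET and PUT traffic; every sorted run is one more object a read
    may have to consult.
    """
    runs: List[int] = []  # sizes of the current sorted runs
    rewritten = compactions = peak_runs = 0
    for _ in range(flushes):
        runs.append(flush_size)  # each flush lands as a new sorted run
        peak_runs = max(peak_runs, len(runs))
        if len(runs) >= trigger:
            merged = sum(runs)  # read every run ...
            rewritten += merged  # ... and write the merged result back
            compactions += 1
            runs = [merged]
    return rewritten, compactions, peak_runs


aggressive = simulate(100, 1, trigger=2)
lazy = simulate(100, 1, trigger=10)
```

With 100 unit-sized flushes, a trigger of 2 rewrites 5,049 units in 99 compactions while capping reads at 2 runs. A trigger of 10 rewrites 605 units in 11 compactions but lets reads face up to 10 runs.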
Resources
Apache Paimon's documentation on how LSM-tree architecture is adapted for object storage, the reference implementation of this pattern.
Deep dive into Hudi 1.1's streaming ingestion optimizations covering LSM-tree-inspired write patterns on S3.