Architecture

LSM-tree on S3

An architectural pattern adapting Log-Structured Merge-tree storage to object storage, where writes are batched into sorted append-only runs and periodically compacted into larger files.


Summary

Where it fits

LSM-trees are the foundational architecture for streaming-first table formats like Apache Paimon. S3's append-friendly, immutable-object model aligns naturally with LSM's write pattern: batch writes into sorted runs, flush to S3 as immutable files, and merge asynchronously. This enables high-throughput CDC ingestion with predictable read performance.

Misconceptions / Traps
  • LSM compaction on S3 involves reading, merging, and rewriting entire files — not the in-place operations possible on local disk. Compaction costs (I/O and compute) must be budgeted.
  • Read amplification increases with the number of uncompacted levels. Compaction scheduling is critical for maintaining query performance.
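The trade-off behind both traps can be made concrete. Compaction on S3 is a read-merge-rewrite cycle over whole objects: every input run is fetched, merged with newest-wins semantics, and written back as one new immutable object. A minimal sketch, assuming each run is a sorted list of `(key, value)` pairs and runs are ordered oldest to newest (the function name and data layout are illustrative, not from any specific library):

```python
def compact(runs):
    """Merge sorted runs into one larger run (newest run wins on duplicate keys).

    `runs` is ordered oldest-to-newest; each run is a sorted list of
    (key, value) pairs. On S3 this means reading every input object in
    full, merging, and writing a brand-new object -- there is no
    in-place rewrite as there would be on local disk.
    """
    merged = {}
    for run in runs:              # later (newer) runs overwrite earlier ones
        for key, value in run:
            merged[key] = value
    return sorted(merged.items())  # one larger, still-sorted output run
```

Before compaction, a point lookup may have to consult every uncompacted run (read amplification grows with run count); after compaction, one merged run answers the same lookup, which is why scheduling this rewrite is the central cost/performance knob.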
Key Connections
  • enables Apache Paimon — Paimon's core storage architecture
  • solves Small Files Problem — compaction merges small files into larger ones
  • scoped_to Table Formats, S3

Definition

What it is

An architectural pattern that adapts Log-Structured Merge-tree storage to object storage, organizing writes as append-only sorted runs that are periodically compacted into larger files. This is the foundational architecture for streaming-first table formats like Apache Paimon.

Why it exists

Traditional B-tree and copy-on-write approaches assume low-latency random writes — something S3 does not provide. LSM-trees align naturally with S3's append-friendly, immutable-object model by batching writes into sorted runs and merging them asynchronously, enabling high-throughput ingestion with predictable read performance.
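The write path described above can be sketched in a few lines: writes accumulate in an in-memory buffer, and once the buffer fills, it is sorted and emitted as one immutable, append-only run. Here a plain dict stands in for an S3 bucket, and the class and object-key names are hypothetical:

```python
class LsmWriter:
    """Minimal sketch: buffer writes in memory, flush them as sorted immutable runs."""

    def __init__(self, store, flush_threshold=4):
        self.store = store                    # dict standing in for an S3 bucket
        self.memtable = {}                    # in-memory buffer of key -> value
        self.flush_threshold = flush_threshold
        self.run_id = 0

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Sort the buffered writes and emit them as one append-only run.
        run = sorted(self.memtable.items())
        self.store[f"run-{self.run_id:05d}"] = run  # immutable: written once, never updated
        self.run_id += 1
        self.memtable = {}
```

No random writes ever reach the object store: each flush is a single sequential PUT, which is exactly the access pattern S3 handles well, and merging the accumulated runs is deferred to asynchronous compaction.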

Primary use cases

Streaming lakehouse ingestion on S3, high-frequency write workloads (CDC, IoT) on object storage, real-time analytics with minute-level data visibility.
