Apache Paimon
An Apache top-level streaming lakehouse table format built on LSM-tree architecture, designed for high-frequency real-time writes and minute-level data visibility on object storage.
Summary
Iceberg and Delta are batch-first formats with streaming support added later, whereas Paimon is streaming-first. Its LSM-tree design on S3 gives minute-level data visibility for CDC workloads, which makes it a natural fit for Flink-based real-time pipelines that write to object storage.
- Paimon's strength is Flink integration. Spark support is improving but lags significantly behind Flink in maturity and performance.
- Metadata is more complex than Iceberg's, and the LSM-tree compaction process adds operational overhead that batch-oriented formats do not have.
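To make the compaction overhead concrete, here is a minimal toy sketch of a full compaction over in-memory sorted runs. It is not Paimon's implementation: the `Record` layout, the `compact` function, and the sample data are all hypothetical, chosen only to show that compaction has to re-read every record in the runs it merges.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical record layout: (key, sequence number, value); value None marks a delete.
Record = Tuple[str, int, Optional[str]]


def compact(runs: List[List[Record]]) -> Tuple[List[Record], int]:
    """Merge all runs into one sorted run and report how many records were read."""
    newest: Dict[str, Record] = {}
    records_read = 0
    for run in runs:
        for record in run:
            records_read += 1
            key, seq, _ = record
            # Keep only the version with the highest sequence number per key.
            if key not in newest or seq > newest[key][1]:
                newest[key] = record
    # In a full compaction no older run remains, so delete markers can be dropped.
    merged = [newest[k] for k in sorted(newest) if newest[k][2] is not None]
    return merged, records_read


# Three small runs, as if written by three successive commits (illustrative data).
runs = [
    [("a", 1, "v1"), ("b", 2, "v1")],
    [("a", 3, "v2")],
    [("b", 4, None), ("c", 5, "v1")],
]
merged, records_read = compact(runs)
print(merged)        # surviving records after compaction
print(records_read)  # work done by the compaction itself
```

In this toy model, five records are read to produce two surviving ones. That background work is the kind of cost an operator has to schedule and monitor, and a format without an LSM-tree does not incur it in the same form.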
depends_on: S3 API (stores data as objects on S3)
depends_on: Apache Parquet (data file format)
enables: Lakehouse Architecture (streaming-first lakehouse design)
competes_with: Apache Hudi (both target real-time ingestion workloads)
Definition
Apache Paimon is an Apache top-level project: a table format built on LSM-tree (Log-Structured Merge-tree) architecture, designed for high-frequency streaming writes and real-time analytics on object storage. It was originally developed as Flink Table Store.
Traditional lakehouse table formats (Iceberg, Delta, Hudi) were designed primarily for batch workloads, with streaming support added later. Paimon is built streaming-first: it uses LSM-trees on S3 to provide minute-level data visibility for CDC and real-time analytics while avoiding the write amplification penalty of copy-on-write.
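The difference from copy-on-write can be sketched with a toy model. In the sketch below, each batch of changes is appended as a new sorted run and existing runs are never rewritten; readers merge the runs and keep the newest version of each key. This only illustrates the general LSM-tree idea and is not Paimon's code or API: `write_run`, `merge_read`, and the record layout are hypothetical names.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# Hypothetical record layout: (key, sequence number, value); value None marks a delete.
Record = Tuple[str, int, Optional[str]]


def write_run(changes: Dict[str, Optional[str]], start_seq: int) -> List[Record]:
    """Flush one batch of changes as a new sorted run; no existing run is touched."""
    return [(key, start_seq + i, changes[key]) for i, key in enumerate(sorted(changes))]


def merge_read(runs: List[List[Record]]) -> Dict[str, str]:
    """Merge sorted runs at read time, keeping the newest version of each key."""
    latest: Dict[str, Record] = {}
    # Merge by (key, sequence number) so values are never compared with each other.
    for record in heapq.merge(*runs, key=lambda r: (r[0], r[1])):
        key, seq, _ = record
        if key not in latest or seq > latest[key][1]:
            latest[key] = record
    # Drop keys whose newest version is a delete marker.
    return {key: rec[2] for key, rec in latest.items() if rec[2] is not None}


# Three commits from a CDC stream: an insert batch, an update, and a delete.
runs = [
    write_run({"user:1": "alice", "user:2": "bob"}, start_seq=0),
    write_run({"user:2": "bobby"}, start_seq=2),
    write_run({"user:1": None}, start_seq=3),
]
table = merge_read(runs)
print(table)
```

Here the update and the delete each cost one small new run, whereas a copy-on-write table would rewrite the whole data file containing the affected row. The price is paid at read time, where runs must be merged, and later during compaction.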
Typical use cases: real-time CDC ingestion into the lakehouse, streaming analytics with minute-level visibility, and high-frequency update workloads on S3.
Resources
Official Apache Paimon documentation covering LSM-tree architecture, streaming ingestion, and Flink integration.
Source repository with architecture docs, performance benchmarks, and connector guides.
Technical overview of Paimon's streaming lakehouse design and how it compares to Hudi for CDC workloads.