Apache Hudi Spec
Summary
What it is
The specification for managing incremental data processing on object storage — record-level upserts, deletes, change logs, and timeline-based metadata.
Where it fits
The Hudi spec defines how to efficiently mutate individual records in S3-stored datasets. It underlies Hudi's Copy-on-Write (CoW) and Merge-on-Read (MoR) table types, and its timeline abstraction records every action performed on a table, such as commits, compactions, and rollbacks.
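The difference between the two table types can be sketched with a toy in-memory model. This is an illustration only, not Hudi's implementation or API: the class names, the dict-based "base file", and the list-based "log" are all assumptions made for the example, and deletes are left out for brevity.

```python
# Toy model only: real Hudi keeps base files and log files on object storage.
# Here a "base file" is a dict and a "log" is a list of dicts (hypothetical).

class CopyOnWriteTable:
    """Each upsert rewrites the whole base file; reads need no merging."""

    def __init__(self):
        self.base = {}       # record key -> record
        self.rewrites = 0    # number of times the base file was rewritten

    def upsert(self, records):
        new_base = dict(self.base)   # copy the existing file
        new_base.update(records)     # apply inserts and updates by key
        self.base = new_base
        self.rewrites += 1

    def read(self):
        return dict(self.base)


class MergeOnReadTable:
    """Upserts append to a log; reads merge the log over the base file."""

    def __init__(self):
        self.base = {}
        self.log = []        # appended batches, oldest first

    def upsert(self, records):
        self.log.append(dict(records))   # cheap write, no rewrite

    def read(self):
        merged = dict(self.base)
        for batch in self.log:           # later batches win
            merged.update(batch)
        return merged

    def compact(self):
        # Fold the log into a new base file so later reads are cheap again.
        self.base = self.read()
        self.log = []


cow, mor = CopyOnWriteTable(), MergeOnReadTable()
for table in (cow, mor):
    table.upsert({"u1": {"name": "Ada"}})
    table.upsert({"u1": {"name": "Ada L."}, "u2": {"name": "Bob"}})
```

Both toy tables return the same records on read; they differ in where the merge cost is paid (at write time for CoW, at read or compaction time for MoR).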
Misconceptions / Traps
- The Hudi spec's timeline model is conceptually different from Iceberg's snapshot model and Delta's transaction log. Understanding the timeline abstraction is a prerequisite to operating Hudi tables.
- The RFC-based evolution model means the spec is a living document. Breaking changes can be introduced via RFCs.
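As a rough mental model of the timeline mentioned above, the sketch below treats it as an ordered list of instants, each with a time, an action, and a state. The `Instant` class, the `completed_after` helper, and the sample timestamps are hypothetical names invented for this example; the actual on-storage layout is defined by the spec and is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instant:
    timestamp: str   # sortable instant time, e.g. "20240101120000" (made up)
    action: str      # e.g. "commit", "deltacommit", "compaction", "clean"
    state: str       # "REQUESTED", "INFLIGHT", or "COMPLETED"


def completed_after(timeline, begin_time):
    """Return completed instants strictly after begin_time, oldest first.

    This mirrors the idea behind an incremental read: a consumer remembers
    the last instant it processed and asks only for what finished later.
    """
    return sorted(
        (i for i in timeline if i.state == "COMPLETED" and i.timestamp > begin_time),
        key=lambda i: i.timestamp,
    )


timeline = [
    Instant("20240101120000", "commit", "COMPLETED"),
    Instant("20240101130000", "deltacommit", "COMPLETED"),
    Instant("20240101140000", "compaction", "INFLIGHT"),
    Instant("20240101150000", "deltacommit", "COMPLETED"),
]
new_instants = completed_after(timeline, "20240101120000")
```

In this toy model the in-flight compaction is skipped, since only completed actions are visible to readers.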
Key Connections
- enables Lakehouse Architecture: makes incremental processing possible on data lakes
- Apache Hudi depends_on Apache Hudi Spec
- scoped_to Table Formats, Lakehouse
Definition
What it is
A specification for managing incremental data processing on object storage — defining record-level upserts, deletes, change logs, and timeline-based metadata.
Why it exists
Traditional data lake patterns only supported append operations. The Hudi spec defines how to efficiently update and delete individual records in S3-stored datasets, which is essential for CDC, compliance, and data correction workflows.
Primary use cases
Change data capture into S3, record-level updates without full partition rewrites, incremental query support.
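The upsert and incremental-query use cases are typically driven through Hudi's Spark datasource options. The sketch below only builds the option dictionaries, so it runs without Spark. The option keys are the ones documented for Hudi 0.x releases and may differ in other versions, so check them against the version in use. The helper names, table name, field names, and instant time are hypothetical.

```python
def upsert_options(table_name, record_key, precombine_field,
                   table_type="MERGE_ON_READ"):
    """Build write options for a record-level upsert (keys per Hudi 0.x docs)."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.table.type": table_type,  # or "COPY_ON_WRITE"
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.recordkey.field": record_key,
        # Field used to pick the latest version when keys collide in a batch.
        "hoodie.datasource.write.precombine.field": precombine_field,
    }


def incremental_read_options(begin_instant):
    """Build read options that return only changes after begin_instant."""
    return {
        "hoodie.datasource.query.type": "incremental",
        "hoodie.datasource.read.begin.instanttime": begin_instant,
    }


write_opts = upsert_options("orders", "order_id", "updated_at")
read_opts = incremental_read_options("20240101120000")

# With a Spark session and the Hudi bundle on the classpath (not run here,
# path is a placeholder):
#   df.write.format("hudi").options(**write_opts).mode("append").save(path)
#   spark.read.format("hudi").options(**read_opts).load(path)
```

Keeping the options in plain dictionaries makes it easy to switch between table types without touching the rest of the job.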
Relationships
Inbound Relationships
depends_on
Resources
Technical specification pages documenting Hudi table format internals including the timeline, file layout, indexing mechanisms, and compaction strategies.
Official documentation covering table types (CoW and MoR), write operations, querying, and configuration.
Canonical repository containing the reference Java implementation, RFC documents, and the source of truth for the Hudi table format.
The Hudi RFC directory contains formal design documents for major features and format changes, serving as living specification amendments.