Apache Hudi Spec
The specification for managing incremental data processing on object storage — record-level upserts, deletes, change logs, and timeline-based metadata.
Summary
The Hudi spec defines how to efficiently mutate individual records in datasets stored on object storage such as S3. It underlies Hudi's Copy-on-Write and Merge-on-Read table types, and its timeline abstraction records every action performed on a table.
- The Hudi spec's timeline model is conceptually different from Iceberg's snapshot model and Delta's transaction log. Understanding the timeline abstraction is a prerequisite to operating Hudi tables.
- The RFC-based evolution model means the spec is a living document. Breaking changes can be introduced via RFCs.
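To make the timeline idea concrete, here is a minimal in-memory sketch in Python. It assumes the commonly documented shape of a Hudi instant (an instant time, an action type, and a state that moves from requested to inflight to completed). The `Timeline` and `Instant` classes are hypothetical illustrations, not Hudi's on-disk layout or API.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    REQUESTED = "requested"
    INFLIGHT = "inflight"
    COMPLETED = "completed"


@dataclass(frozen=True)
class Instant:
    timestamp: str  # sortable instant time, e.g. "20240101120000000"
    action: str     # e.g. "commit", "deltacommit", "compaction", "clean"
    state: State


class Timeline:
    """Toy timeline: keeps the latest known state of each action."""

    def __init__(self):
        self._instants = {}  # (timestamp, action) -> Instant

    def transition(self, timestamp, action, state):
        # Record the new state, replacing any earlier state of the same action.
        self._instants[(timestamp, action)] = Instant(timestamp, action, state)

    def completed(self):
        # Readers only consider completed instants, ordered by instant time.
        done = [i for i in self._instants.values() if i.state is State.COMPLETED]
        return sorted(done, key=lambda i: i.timestamp)

    def latest_completed(self):
        done = self.completed()
        return done[-1] if done else None


timeline = Timeline()
# A write that ran to completion.
for state in State:
    timeline.transition("20240101120000000", "commit", state)
# A later write that is still in progress and therefore invisible to readers.
timeline.transition("20240101130000000", "deltacommit", State.REQUESTED)
timeline.transition("20240101130000000", "deltacommit", State.INFLIGHT)
```

In this sketch `timeline.latest_completed()` returns the first commit, because the in-progress delta commit has not reached the completed state. That ordering of completed actions is what distinguishes the model from a list of snapshots.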
enables: Lakehouse Architecture (makes incremental processing possible on data lakes)
depends_on (inbound): Apache Hudi depends_on Apache Hudi Spec
scoped_to: Table Formats, Lakehouse
Definition
A specification for managing incremental data processing on object storage — defining record-level upserts, deletes, change logs, and timeline-based metadata.
Traditional data lake patterns supported only append operations. The Hudi spec defines how to efficiently update and delete individual records in S3-stored datasets, which is essential for change data capture (CDC), compliance, and data correction workflows.
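As a hedged illustration of how record-level upserts and deletes are typically requested through the reference Spark integration, the sketch below builds a dictionary of write options. The option keys are Hudi Spark datasource settings as commonly documented and should be checked against the Hudi version in use. The helper `hudi_write_options`, the table name, and the column names are hypothetical, and the helper deliberately accepts only three of the available write operations.

```python
def hudi_write_options(table_name, record_key, ordering_field, partition_field,
                       operation="upsert", table_type="COPY_ON_WRITE"):
    """Build a dict of Hudi Spark datasource write options (illustrative helper)."""
    if operation not in {"insert", "upsert", "delete"}:
        raise ValueError(f"unsupported operation: {operation}")
    if table_type not in {"COPY_ON_WRITE", "MERGE_ON_READ"}:
        raise ValueError(f"unknown table type: {table_type}")
    return {
        "hoodie.table.name": table_name,
        # Field that uniquely identifies a record.
        "hoodie.datasource.write.recordkey.field": record_key,
        # Field used to pick the winner when several rows share a key.
        "hoodie.datasource.write.precombine.field": ordering_field,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        "hoodie.datasource.write.operation": operation,
        "hoodie.datasource.write.table.type": table_type,
    }


# Hypothetical table and column names, for illustration only.
upsert_opts = hudi_write_options("orders", "order_id", "updated_at", "order_date")
delete_opts = hudi_write_options("orders", "order_id", "updated_at", "order_date",
                                 operation="delete")

# With a Spark session that has the Hudi bundle on its classpath, a DataFrame
# `df` of changed rows would then be written roughly like this:
#   df.write.format("hudi").options(**upsert_opts).mode("append").save(base_path)
```

The same write path serves both cases: an upsert carries the new row values, while a delete only needs the keys of the records to remove.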
Typical use cases: change data capture into S3, record-level updates without full partition rewrites, and incremental query support.
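The difference between a snapshot read and an incremental read can be sketched with a toy keyed table. This is a conceptual model only: the `ToyTable` class and its methods are hypothetical, it stores everything in memory, and it does not reflect Hudi's file layout or how Hudi surfaces deletes to incremental readers.

```python
class ToyTable:
    """Keyed records tagged with the instant time that last wrote them."""

    def __init__(self):
        self.records = {}  # record key -> (commit_time, payload)

    def upsert(self, commit_time, rows):
        # Insert new keys and overwrite existing ones, record by record.
        for key, payload in rows.items():
            self.records[key] = (commit_time, payload)

    def delete(self, keys):
        # Remove individual records by key; unknown keys are ignored.
        for key in keys:
            self.records.pop(key, None)

    def snapshot(self):
        # Latest value of every live record.
        return {k: p for k, (_, p) in self.records.items()}

    def incremental(self, begin_time):
        # Only records written after begin_time (exclusive).
        return {k: p for k, (t, p) in self.records.items() if t > begin_time}


table = ToyTable()
table.upsert("001", {"u1": {"city": "Oslo"}, "u2": {"city": "Lima"}})
table.upsert("002", {"u2": {"city": "Quito"}, "u3": {"city": "Pune"}})
table.delete(["u1"])

changed = table.incremental("001")  # records touched after instant "001"
current = table.snapshot()          # full current view of the table
```

A downstream job that remembers the last instant it processed can ask only for `changed` rather than rescanning the full table, which is the point of incremental query support.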
Connections (4)
Outbound (3): scoped_to 2, enables 1
Inbound (1): depends_on 1
Resources (4)
Technical specification pages documenting Hudi table format internals including the timeline, file layout, indexing mechanisms, and compaction strategies.
Official documentation covering table types (CoW and MoR), write operations, querying, and configuration.
Canonical repository containing the reference Java implementation, RFC documents, and the source of truth for the Hudi table format.
The Hudi RFC directory contains formal design documents for major features and format changes, serving as living specification amendments.