Iceberg Table Spec
Summary
What it is
The specification defining how a logical table is represented as metadata files, manifest lists, manifests, and data files on object storage. It provides ACID transactions, schema evolution, hidden partitioning, and time travel.
Where it fits
The Iceberg spec is the blueprint that Apache Iceberg implements. It defines the metadata tree structure that turns a collection of Parquet files on S3 into a reliable, evolvable table — and enables any engine to read the same table consistently.
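That metadata tree can be pictured with a minimal, hypothetical Python model (field names are simplified; the real spec serializes table metadata as JSON and manifests/manifest lists as Avro): a root metadata file tracks a log of immutable snapshots, each snapshot points at a manifest list, and manifests enumerate data files. Reading the table, or time-traveling to a past version, is a walk of this tree.

```python
from dataclasses import dataclass

# Illustrative model only; not the spec's actual file layouts.

@dataclass
class DataFile:
    path: str            # e.g. "s3://bucket/warehouse/tbl/data/...parquet"
    record_count: int

@dataclass
class Manifest:          # a manifest tracks a batch of data files plus stats
    data_files: list[DataFile]

@dataclass
class Snapshot:          # an immutable view of the table at one commit
    snapshot_id: int
    timestamp_ms: int
    manifests: list[Manifest]   # reached via the snapshot's manifest list

@dataclass
class TableMetadata:     # the root metadata file
    current_snapshot_id: int
    snapshots: list[Snapshot]

    def snapshot_as_of(self, ts_ms: int) -> Snapshot:
        """Time travel: the latest snapshot committed at or before ts_ms."""
        eligible = [s for s in self.snapshots if s.timestamp_ms <= ts_ms]
        return max(eligible, key=lambda s: s.timestamp_ms)

def live_data_files(meta: TableMetadata) -> list[str]:
    """A read walks current snapshot -> manifest list -> manifests -> files."""
    current = next(s for s in meta.snapshots
                   if s.snapshot_id == meta.current_snapshot_id)
    return [f.path for m in current.manifests for f in m.data_files]
```

Because every commit produces a new snapshot pointing at new manifest lists, readers never see a half-written table: they either resolve the old root or the new one.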
Misconceptions / Traps
- The spec defines behavior, not implementation. Different engines (Spark, Flink, Trino) may implement the spec at different levels of completeness.
- Manifest files accumulate with every write. Without regular metadata maintenance (expiring snapshots, removing orphan files), metadata overhead grows without bound and query planning slows (see the sketch below).
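To make the second trap concrete, here is a hedged sketch of what snapshot expiry computes. Real deployments use engine-provided maintenance actions rather than hand-rolled code, and the Snapshot model below is a standalone, illustrative simplification:

```python
from dataclasses import dataclass

@dataclass
class Snapshot:
    snapshot_id: int
    timestamp_ms: int
    file_paths: set[str]   # data files reachable via this snapshot's manifests

def expire_snapshots(snapshots: list[Snapshot], current_id: int,
                     older_than_ms: int) -> tuple[list[Snapshot], set[str]]:
    """Keep the current snapshot plus anything newer than the cutoff; report
    files that no retained snapshot still references (deletion candidates)."""
    retained = [s for s in snapshots
                if s.snapshot_id == current_id or s.timestamp_ms > older_than_ms]
    expired = [s for s in snapshots if s not in retained]
    still_live = set().union(*(s.file_paths for s in retained))
    deletable = set().union(*(s.file_paths for s in expired)) - still_live
    return retained, deletable
```

Real maintenance actions typically pair the age cutoff with a retain-last-N floor, so readers of recent snapshots are not broken mid-query; orphan-file removal is a separate pass that targets files no metadata references at all (for example, leftovers from failed writes).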
Key Connections
- enables: Lakehouse Architecture (the specification that makes Iceberg-based lakehouses possible)
- solves: Schema Evolution (column-ID-based evolution), Partition Pruning Complexity (partition specs in metadata)
- scoped_to: Table Formats, Lakehouse
Definition
What it is
A specification defining how a logical table is represented as a set of data files, metadata files, manifest lists, and snapshots on object storage. It provides ACID semantics, schema evolution, hidden partitioning, and time travel.
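Hidden partitioning rests on the spec's partition transforms (identity, bucket[N], truncate[W], year/month/day/hour). The sketch below uses simplified stand-ins for two transforms; in particular, the real bucket transform uses a 32-bit Murmur3 hash, not Python's hash(), and the file list is hypothetical:

```python
from datetime import datetime, timezone

# Simplified stand-ins for two of the spec's partition transforms.

def day_transform(ts: datetime) -> int:
    """The spec represents day() as days since 1970-01-01."""
    return (ts - datetime(1970, 1, 1, tzinfo=timezone.utc)).days

def bucket_transform(n: int, value: int) -> int:
    """The real transform hashes with 32-bit Murmur3; hash() stands in."""
    return hash(value) % n

# Each data file's partition tuple is recorded in manifest metadata, so for
#   WHERE event_ts >= '2024-06-01'
# an engine drops files whose day(event_ts) value is below the cutoff,
# even though the user never referenced a partition column.
files = [
    {"path": "f1.parquet", "day": 19874},  # hypothetical value: 2024-05-31
    {"path": "f2.parquet", "day": 19910},  # hypothetical value: 2024-07-06
]
cutoff = day_transform(datetime(2024, 6, 1, tzinfo=timezone.utc))  # 19875
kept = [f["path"] for f in files if f["day"] >= cutoff]            # ["f2.parquet"]
```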
Why it exists
Files on S3 have no inherent table structure. The Iceberg spec adds a metadata layer that turns a collection of Parquet files into a reliable, evolvable table — without requiring a database server.
Primary use cases
Defining lakehouse tables on S3, multi-engine table access (Spark, Trino, Flink can all read the same Iceberg table), schema evolution without rewriting data.
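The schema-evolution guarantee comes from permanent field IDs: readers resolve columns by ID, never by name, so a rename is a metadata-only change and old data files stay valid. A small illustrative sketch (column names and values are hypothetical):

```python
# Current table schema after evolution: field 2 was renamed, field 3 added.
current_schema = {1: "id", 2: "customer_email", 3: "tier"}  # field_id -> name

# An old data file written before those changes stores values per field ID.
old_file_row = {1: 101, 2: "a@example.com"}                 # field_id -> value

def project(row_by_id: dict[int, object],
            schema: dict[int, str]) -> dict[str, object]:
    """Resolve by field ID: renamed columns still match, and IDs missing
    from the file (columns added later) read as NULL."""
    return {name: row_by_id.get(fid) for fid, name in schema.items()}

assert project(old_file_row, current_schema) == {
    "id": 101,
    "customer_email": "a@example.com",  # rename applied, file not rewritten
    "tier": None,                       # added column reads as NULL here
}
```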
Relationships
Outbound Relationships
- scoped_to: Table Formats, Lakehouse
- enables: Lakehouse Architecture
Resources
- Iceberg Table Spec (https://iceberg.apache.org/spec/): the authoritative specification defining the metadata tree, manifest files, snapshot structure, schema evolution rules, and partitioning model.
- apache/iceberg (https://github.com/apache/iceberg): canonical repository containing the reference Java implementation and the specification source documents.
- Iceberg Documentation (https://iceberg.apache.org/docs/latest/): official documentation covering API usage, configuration, integrations with Spark/Flink/Trino, and migration guides.