Iceberg V3 Spec
The 2025 evolution of the Apache Iceberg table specification, introducing Row Lineage for row-level provenance tracking, native CDC detection, enhanced deletion handling, and metadata designed to make the lakehouse "agent-ready" for AI systems.
Summary
As Iceberg becomes the dominant lakehouse format, V3 addresses the gaps that emerged at scale: Row Lineage exposes where each row originated and when it was last modified, native CDC detection reduces the need for external change tracking, and deletion vectors support frequent streaming updates. V3 is the spec version that makes Iceberg both batch- and streaming-capable and readable by AI agents.
- Engine support for V3 features is not immediate. Query engines need time to implement Row Lineage and native CDC; check engine compatibility before depending on V3-specific capabilities.
- V3 is backwards-compatible with V2 data. Upgrading the spec version does not require rewriting existing tables.
- "Agent-ready" refers to metadata granularity, not an AI integration layer. V3 exposes provenance metadata that AI systems can consume, but does not include built-in agent APIs.
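The compatibility check suggested in the caveats above can be sketched as follows. This is a minimal illustration, assuming the table metadata is available as a JSON document: `format-version` is the top-level field Iceberg table metadata uses for the spec version, while `ENGINE_MAX_FORMAT_VERSION`, `can_use_v3_features`, and the trimmed sample documents are hypothetical. The version an engine actually supports should come from that engine's documentation.

```python
import json

# Hypothetical: highest Iceberg format version the query engine in use can read.
ENGINE_MAX_FORMAT_VERSION = 2


def can_use_v3_features(metadata_json: str, engine_max_version: int) -> bool:
    """Return True only if the table is V3 or later and the engine can read it."""
    metadata = json.loads(metadata_json)
    table_version = metadata["format-version"]  # top-level field in table metadata
    return table_version >= 3 and engine_max_version >= table_version


# Illustrative, heavily trimmed metadata documents (real files carry many more fields).
v2_table = json.dumps({"format-version": 2})
v3_table = json.dumps({"format-version": 3})

print(can_use_v3_features(v3_table, ENGINE_MAX_FORMAT_VERSION))  # engine is too old
print(can_use_v3_features(v3_table, 3))                          # both sides are V3
```

Gating on both the table version and the engine version keeps a pipeline from assuming Row Lineage or deletion vectors on a table that was upgraded before every reader was.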
extends: Iceberg Table Spec (evolutionary improvement to the existing standard)
enables: Apache Iceberg (new capabilities for Iceberg implementations)
scoped_to: Table Formats, S3
Definition
The 2025–2026 evolution of the Apache Iceberg table specification. V3 introduces four substantive changes:
- **Row Lineage**: every row carries a unique row ID and a last-updated sequence number that records the commit in which the row was last modified (a commit ordinal, not a wall-clock timestamp), so incremental readers can pick out changed rows without comparing full table snapshots.
- **Deletion Vectors**: Puffin-encoded Roaring bitmaps that mark logically deleted row positions instead of rewriting whole Parquet files, reported to make MERGE/UPDATE up to 10× faster.
- **Native CDC detection**: changes can be identified from the table itself rather than from an external change-tracking system.
- **VARIANT data type**: shredded semi-structured payloads (nested JSON, IoT telemetry, application logs) stored alongside strict relational columns, with scan performance intended to match ordinary columnar data.

V3 reached **Public Preview in Snowflake (March 2026)** and entered **bidirectional interop with Databricks Unity Catalog** the same quarter; AWS announced support for V3 deletion vectors and row lineage in November 2025.
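To make the Row Lineage idea concrete, here is a small in-memory sketch of how a row ID plus a last-updated sequence number is enough to label changes between two table states. It is an illustration under stated assumptions, not an engine implementation: the `Row` class and `classify_changes` function are hypothetical, and the two fields stand in for the `_row_id` and `_last_updated_sequence_number` lineage columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    row_id: int            # stands in for the _row_id lineage field
    last_updated_seq: int  # stands in for _last_updated_sequence_number
    payload: str


def classify_changes(before, after, since_seq):
    """Label changes between two table states using only the lineage fields."""
    before_ids = {r.row_id for r in before}
    after_ids = {r.row_id for r in after}
    changes = {}
    for r in after:
        if r.row_id not in before_ids:
            changes[r.row_id] = "insert"   # row ID never seen before
        elif r.last_updated_seq > since_seq:
            changes[r.row_id] = "update"   # same row, modified by a later commit
    for row_id in before_ids - after_ids:
        changes[row_id] = "delete"         # row ID no longer present
    return changes


# Table state as of sequence number 5, then as of sequence number 7.
state_at_5 = [Row(0, 4, "a"), Row(1, 5, "b"), Row(2, 5, "c")]
state_at_7 = [Row(0, 4, "a"), Row(1, 6, "b2"), Row(3, 7, "d")]

changes = classify_changes(state_at_5, state_at_7, since_seq=5)
print(changes)
```

Payloads are never compared: row 0 is skipped because its sequence number is not newer than the last one the consumer processed. This toy version still walks both lists; a real engine would be expected to use the same fields to narrow what it reads.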
V2 revealed three structural limits at scale: copy-on-write for any update made CDC pipelines economically punishing, lack of row provenance forced full-table scans for incremental processing, and the strict relational schema required separate normalization ETL for any semi-structured ingest. V3 addresses all three at the spec layer so engines (Spark, Trino, Flink, Athena, Snowflake, Databricks) inherit the gains without bespoke patches.
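The cost difference between copy-on-write and deletion vectors can be sketched with a toy model. Everything below is a hypothetical illustration: a Python list plays the role of a data file, a `frozenset` of positions stands in for the Roaring bitmap, and "cost" is simply the number of items written.

```python
def copy_on_write_delete(data_file, positions):
    """Copy-on-write: produce a new file containing every surviving row."""
    doomed = set(positions)
    new_file = [row for pos, row in enumerate(data_file) if pos not in doomed]
    return new_file, len(new_file)  # cost: every surviving row is rewritten


def deletion_vector_delete(deleted, positions):
    """Leave the data file untouched and record the deleted positions instead."""
    updated = deleted | set(positions)
    return updated, len(updated)    # cost: one entry per deleted position


def read_with_deletion_vector(data_file, deleted):
    """Readers skip any position marked in the deletion vector."""
    return [row for pos, row in enumerate(data_file) if pos not in deleted]


data_file = [f"row-{i}" for i in range(1000)]

cow_file, cow_cost = copy_on_write_delete(data_file, [10, 500])
dv, dv_cost = deletion_vector_delete(frozenset(), [10, 500])

print(cow_cost, dv_cost)
print(read_with_deletion_vector(data_file, dv) == cow_file)
```

Both paths yield the same visible rows. The difference is that the write cost of the deletion vector scales with the number of deleted rows rather than with the size of the file, which is why frequent small updates benefit most.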
Typical use cases: row-level data lineage for compliance and AI provenance, native CDC detection in Iceberg tables, high-frequency MERGE/UPDATE workloads via deletion vectors, querying semi-structured payloads (JSON, telemetry) without normalization ETL, and agent-ready metadata exposure.
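For the semi-structured case, the idea of shredding can be illustrated conceptually: fields that appear often and have a stable type are pulled into their own typed columns, and everything else stays in a residual value. The sketch below is only a conceptual model, not the actual VARIANT binary encoding or Parquet layout; the `shred` function, the field names, and the sample records are all hypothetical.

```python
import json


def shred(records, shredded_fields):
    """Split JSON payloads into typed columns plus a residual for everything else."""
    columns = {name: [] for name in shredded_fields}
    residual = []
    for raw in records:
        leftover = dict(json.loads(raw))
        for name, expected_type in shredded_fields.items():
            value = leftover.get(name)
            if isinstance(value, expected_type):
                columns[name].append(value)  # well-typed value goes to its column
                del leftover[name]
            else:
                columns[name].append(None)   # missing or mismatched: column gets null
        residual.append(leftover)            # whatever was not shredded stays here
    return columns, residual


records = [
    '{"device": "t-1", "temp": 21.5, "tags": ["lab"]}',
    '{"device": "t-2", "temp": "n/a"}',
]

columns, residual = shred(records, {"device": str, "temp": float})
print(columns)
print(residual)
```

A query that filters on `device` or `temp` can then read only those columns, while irregular values such as the string `"n/a"` remain queryable from the residual without a separate normalization job.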
Resources
Official Iceberg specification including V3 changes for row-level lineage, enhanced deletion tracking, and CDC support.
2025 year-in-review covering Iceberg V3 spec evolution, Polaris adoption, and ecosystem maturity.