Standard

Iceberg V3 Spec

The 2025 revision (format version 3) of the Apache Iceberg table specification. It introduces Row Lineage for row-level provenance tracking, native change data capture (CDC) detection, deletion vectors for cheaper row-level deletes, and metadata intended to make the lakehouse "agent-ready" for AI systems.


Summary

Where it fits

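As Iceberg adoption has grown, V3 addresses gaps that emerged at scale. Row Lineage gives each row a stable ID and records the sequence number of the commit that last modified it, so a row can be followed across snapshots. Native CDC detection builds on those fields, so inserted, updated, and deleted rows can be identified without a separate change-tracking system. Deletion vectors replace V2 position delete files with one compact bitmap per data file, lowering the cost of the frequent row-level updates typical of streaming. Together, these changes aim to make Iceberg better suited to high-frequency streaming updates and its metadata easier for AI agents to read.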
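
Change detection can be illustrated with a toy model of the two row-lineage fields V3 defines, `_row_id` and `_last_updated_sequence_number`. This is a minimal sketch under stated assumptions, not how any engine implements CDC. `Row` and `detect_changes` are hypothetical names, and the sketch compares two fully materialized snapshots in memory, whereas a real engine would read only the files that changed. The classification works as follows:

- A row whose ID is new was inserted.
- A row whose ID disappeared was deleted.
- A row whose ID survives with a higher last-updated sequence number was modified.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Row:
    row_id: int            # models the _row_id metadata column
    last_updated_seq: int  # models _last_updated_sequence_number
    payload: str           # stand-in for the row's user columns


def detect_changes(before: List[Row], after: List[Row]) -> Dict[str, List[int]]:
    """Classify row IDs as inserted, updated, or deleted between two snapshots."""
    old = {r.row_id: r for r in before}
    new = {r.row_id: r for r in after}
    inserted = sorted(rid for rid in new if rid not in old)
    deleted = sorted(rid for rid in old if rid not in new)
    # Same ID with a newer last-updated sequence number means the row was modified;
    # a row only rewritten by compaction keeps its sequence number and is skipped.
    updated = sorted(
        rid for rid in new
        if rid in old and new[rid].last_updated_seq > old[rid].last_updated_seq
    )
    return {"inserted": inserted, "updated": updated, "deleted": deleted}


before = [Row(0, 1, "a"), Row(1, 1, "b"), Row(2, 1, "c")]
after = [Row(0, 1, "a"), Row(1, 2, "b2"), Row(3, 2, "d")]
print(detect_changes(before, after))
# {'inserted': [3], 'updated': [1], 'deleted': [2]}
```

A row that is only rewritten, for example by compaction, keeps its `_row_id` and sequence number. As a result, this model does not report it as a change.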

Misconceptions / Traps
  • Engine support for V3 features is not immediate. Query engines need time to implement Row Lineage and native CDC; check engine compatibility before depending on V3-specific capabilities.
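  • V3 is backwards-compatible with V2 data. Upgrading a table to format version 3 is a metadata change and does not rewrite existing data files. The upgrade is one-way, however, and engines that support only V2 cannot read the table afterward.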
  • "Agent-ready" refers to metadata granularity, not an AI integration layer. V3 exposes provenance metadata that AI systems can consume, but does not include built-in agent APIs.

Key Connections
  • extends Iceberg Table Spec — evolutionary improvement to the existing standard
  • enables Apache Iceberg — new capabilities for Iceberg implementations
  • scoped_to Table Formats, S3

Definition

What it is

The 2025 revision (format version 3) of the Apache Iceberg table specification. It introduces Row Lineage for tracking individual row provenance, native CDC detection, deletion vectors for more efficient row-level deletes, and metadata improvements for large-scale lakehouse deployments. V3 is intended to make the lakehouse "agent-ready" by exposing granular metadata that AI systems can use to understand data provenance at the row level.
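
In V3, row IDs are largely assigned by inheritance rather than written into every file. The following simplified sketch illustrates the idea:

- The table keeps a `next-row-id` counter.
- Each newly committed data file receives a contiguous block of IDs starting at its `first_row_id`.
- A row without an explicit `_row_id` takes `first_row_id` plus its position in the file.

`DataFile`, `Table`, and `row_id` are hypothetical names. The model skips the manifest-level assignment step and the case of rows that carry explicit IDs, so treat it as an illustration rather than the spec's exact algorithm.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataFile:
    path: str
    record_count: int
    # Hypothetical stand-in for the data file's first_row_id; None until committed.
    first_row_id: Optional[int] = None


@dataclass
class Table:
    # Models the table metadata's next-row-id counter.
    next_row_id: int = 0

    def commit(self, new_files: List[DataFile]) -> int:
        """Give each new file a contiguous block of row IDs; return the snapshot's first row ID."""
        snapshot_first_row_id = self.next_row_id
        for f in new_files:
            f.first_row_id = self.next_row_id
            self.next_row_id += f.record_count
        return snapshot_first_row_id


def row_id(f: DataFile, pos: int) -> int:
    """A row with no explicit _row_id inherits its file's first_row_id plus its position."""
    if f.first_row_id is None:
        raise ValueError(f"{f.path} has not been committed")
    return f.first_row_id + pos


table = Table()
f1, f2 = DataFile("a.parquet", 3), DataFile("b.parquet", 2)
table.commit([f1, f2])
print([row_id(f1, p) for p in range(3)], [row_id(f2, p) for p in range(2)])  # [0, 1, 2] [3, 4]
f3 = DataFile("c.parquet", 4)
print(table.commit([f3]), row_id(f3, 0), table.next_row_id)  # 5 5 9
```

Because IDs are derived from a per-file offset, a writer does not need to store an ID column for freshly added rows.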

Why it exists

As Iceberg adoption accelerated, the V2 spec showed limitations in tracking row-level changes and in supporting high-frequency update patterns. V3 adds native Row Lineage (a stable ID for each row plus the sequence number of the commit that last modified it), native CDC detection built on those fields, deletion vectors that replace V2 position delete files, and better support for streaming ingestion patterns.
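
Deletion vectors can be modeled as one set of deleted row positions per data file. In V3, the vector is a roaring bitmap stored in a Puffin file. When a writer deletes more rows from the same data file, it produces a new vector that also contains the earlier deletes. A reader therefore consults at most one vector per data file instead of merging many position delete files. The sketch below is a toy under stated assumptions: `DeletionVectors` is a hypothetical class, and a Python `frozenset` stands in for the bitmap.

```python
from typing import Dict, FrozenSet, List, Set


class DeletionVectors:
    """Toy model: at most one set of deleted positions per data file."""

    def __init__(self) -> None:
        # frozenset stands in for the roaring bitmap a real V3 writer stores in Puffin.
        self._dvs: Dict[str, FrozenSet[int]] = {}

    def delete(self, data_file: str, positions: Set[int]) -> None:
        # The new vector replaces the old one and contains the union of both,
        # so a reader never has to merge several delete files for one data file.
        self._dvs[data_file] = self._dvs.get(data_file, frozenset()) | frozenset(positions)

    def live_rows(self, data_file: str, rows: List[str]) -> List[str]:
        """Return the rows whose positions are not marked as deleted."""
        deleted = self._dvs.get(data_file, frozenset())
        return [row for pos, row in enumerate(rows) if pos not in deleted]


dvs = DeletionVectors()
rows = ["r0", "r1", "r2", "r3"]
dvs.delete("data-001.parquet", {1})
dvs.delete("data-001.parquet", {3})  # a later commit, e.g. from a streaming job
print(dvs.live_rows("data-001.parquet", rows))  # ['r0', 'r2']
```

Keeping a single merged vector per file is what makes frequent small deletes cheap to read. The cost of reconciling deletes moves to write time, where it happens once per commit.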

Primary use cases

  • Row-level data lineage for compliance and AI provenance
  • Native CDC detection in Iceberg tables
  • Improved streaming ingestion into Iceberg on S3
  • Agent-ready metadata exposure
