Guide 3
Why Iceberg Exists (and What It Replaces)
Problem Framing
Before Iceberg, querying data on S3 meant pointing a Hive Metastore at a directory of Parquet files and hoping for the best. There were no transactions, schema changes required rewriting data, partition layouts were user-visible and fragile, and concurrent reads/writes produced unpredictable results. Iceberg replaces this entire stack of workarounds with a formal table specification.
Relevant Nodes
- Topics: Table Formats, Lakehouse, S3
- Technologies: Apache Iceberg, Delta Lake, Apache Hudi, Apache Spark, Trino, DuckDB, Apache Flink
- Standards: Iceberg Table Spec, Apache Parquet, S3 API
- Architectures: Lakehouse Architecture
- Pain Points: Schema Evolution, Small Files Problem, Partition Pruning Complexity, Metadata Overhead at Scale, Lack of Atomic Rename
Decision Path
Understand what Iceberg replaces:
- Hive-style partitioning → Iceberg's hidden partitioning. Users no longer need to reference partition columns in queries; the table format handles pruning transparently (see the first sketch after this list).
- Schema rigidity → Iceberg's column-ID-based schema evolution. Add, drop, rename, and reorder columns as metadata-only operations; no data rewrite is required.
- No transactions → Iceberg's snapshot isolation. Writers produce new snapshots, readers see consistent table state, and concurrent access is safe (see the second sketch after this list).
- Directory listing for file discovery → Iceberg's manifest files. Query planners read manifests instead of listing S3 prefixes, eliminating the object-listing bottleneck.
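A minimal sketch of the first two replacements using Iceberg's Spark SQL support. It assumes a SparkSession already wired to an Iceberg catalog named demo; the catalog, schema, and table names are illustrative, not part of any standard.

```python
# Minimal sketch: hidden partitioning and metadata-only schema evolution.
# Assumes a SparkSession configured with an Iceberg catalog named "demo";
# catalog, schema, and table names are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-sketch").getOrCreate()

# Hidden partitioning: the partition transform lives in table metadata,
# not in a user-visible column that queries must filter on.
spark.sql("""
    CREATE TABLE IF NOT EXISTS demo.db.events (
        id BIGINT,
        ts TIMESTAMP,
        payload STRING)
    USING iceberg
    PARTITIONED BY (days(ts))
""")

# Queries filter on the source column; Iceberg prunes day partitions itself.
spark.sql(
    "SELECT count(*) FROM demo.db.events "
    "WHERE ts >= TIMESTAMP '2024-01-01 00:00:00'"
).show()

# Schema evolution is a metadata-only commit; no data files are rewritten.
spark.sql("ALTER TABLE demo.db.events ADD COLUMN source STRING")
spark.sql("ALTER TABLE demo.db.events RENAME COLUMN payload TO body")
```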
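A companion sketch for the other two replacements: snapshot isolation, inspected through the snapshots metadata table and time travel, and manifest-based planning, inspected through the manifests metadata table. It assumes the same demo.db.events table as above and Spark 3.3+ time-travel syntax; the snapshot ID is a placeholder.

```python
# Companion sketch: inspecting snapshots and manifests on the same table.
# Assumes the "demo.db.events" table from the previous sketch and Spark 3.3+.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Every committed write creates a new snapshot; readers always see one
# consistent snapshot, never a half-written table.
spark.sql(
    "SELECT snapshot_id, committed_at, operation "
    "FROM demo.db.events.snapshots"
).show()

# Time travel to an earlier snapshot (the ID below is a placeholder).
spark.sql("SELECT * FROM demo.db.events VERSION AS OF 1234567890123456789")

# Planning reads manifest files instead of listing S3 prefixes; the same
# metadata is exposed as a queryable table.
spark.sql(
    "SELECT path, added_data_files_count FROM demo.db.events.manifests"
).show()
```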
Decide if Iceberg is right for your workload:
- Yes if you need multi-engine access (Spark, Trino, Flink, DuckDB all reading the same tables); a DuckDB read of the same table is sketched after this list.
- Yes if schema evolution is frequent and you cannot afford data rewrites.
- Yes if you want a vendor-neutral table format with the broadest ecosystem support.
- Consider alternatives if you are deeply invested in Databricks (Delta Lake has tighter integration) or need CDC-first ingestion patterns (Hudi specializes here).
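If multi-engine access is the deciding factor, note that a second engine can read the same files without Spark. A rough sketch with DuckDB's iceberg extension follows; the S3 location is illustrative, credentials are assumed to be already configured for DuckDB, and depending on the DuckDB version you may need to point iceberg_scan at a specific metadata.json rather than the table root.

```python
# Rough sketch: a second engine (DuckDB) scanning the same Iceberg table.
# The S3 path is illustrative, and AWS credentials are assumed to be
# discoverable by DuckDB (for example via environment variables).
import duckdb

con = duckdb.connect()
con.execute("INSTALL iceberg")
con.execute("LOAD iceberg")
con.execute("INSTALL httpfs")
con.execute("LOAD httpfs")

# iceberg_scan reads the table's Iceberg metadata directly; no Hive-style
# directory listing and no Spark involved.
count = con.execute(
    "SELECT count(*) FROM iceberg_scan('s3://my-bucket/warehouse/db/events')"
).fetchone()[0]
print(count)
```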
Understand Iceberg's S3 constraints:
- Iceberg metadata is stored as files on S3. Metadata operations (commit, planning) are subject to S3 latency.
- Atomic commits on S3 require a catalog (Hive Metastore, Nessie, AWS Glue) to coordinate metadata pointer updates; a configuration sketch follows this list.
- Metadata grows with every commit. Snapshot expiration and orphan file cleanup are operational necessities.
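A minimal configuration sketch for the catalog requirement, wiring Spark to the AWS Glue catalog with S3FileIO. The catalog name, bucket, and artifact versions are assumptions to adapt to your Spark and Iceberg versions.

```python
# Minimal sketch: Spark + Iceberg with a Glue catalog coordinating commits
# and S3FileIO handling object storage. Versions, catalog name, and bucket
# are illustrative assumptions.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("iceberg-glue")
    # Pick runtime and AWS bundle versions that match your Spark version.
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0,"
            "org.apache.iceberg:iceberg-aws-bundle:1.5.0")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.glue", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue.catalog-impl",
            "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue.warehouse", "s3://my-bucket/warehouse")
    .config("spark.sql.catalog.glue.io-impl",
            "org.apache.iceberg.aws.s3.S3FileIO")
    .getOrCreate()
)
```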
Plan for metadata maintenance from day one (a maintenance sketch follows this list):
- Expire old snapshots regularly (expireSnapshots)
- Remove orphan files that are no longer referenced
- Compact manifests when manifest lists grow large
- Monitor metadata file counts and planning times
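A minimal maintenance sketch using Iceberg's built-in Spark procedures; the catalog name, table name, and retention values are illustrative and should follow your own retention policy.

```python
# Minimal sketch: routine Iceberg metadata maintenance via Spark procedures.
# Assumes a catalog named "glue" and a table "db.events"; values are examples.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Drop snapshots older than the retention window so metadata stops growing.
spark.sql("""
    CALL glue.system.expire_snapshots(
        table => 'db.events',
        older_than => TIMESTAMP '2024-01-01 00:00:00',
        retain_last => 10
    )
""")

# Delete data files that are no longer referenced by any snapshot.
spark.sql("CALL glue.system.remove_orphan_files(table => 'db.events')")

# Rewrite small manifests into fewer, larger ones to keep planning fast.
spark.sql("CALL glue.system.rewrite_manifests('db.events')")
```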
What Changed Over Time
- Iceberg started at Netflix (2018) to solve table management problems at Netflix's scale on S3.
- Graduated to Apache Top-Level Project (2020), signaling broad industry adoption.
- Multi-engine support expanded — from Spark-only to Spark, Trino, Flink, DuckDB, ClickHouse, StarRocks.
- Iceberg REST catalog emerged as a standard catalog interface, reducing lock-in to specific metadata stores.
- Databricks began supporting Iceberg alongside Delta, effectively acknowledging Iceberg's momentum as the cross-engine standard.
Sources
- iceberg.apache.org/spec/
- iceberg.apache.org/docs/latest/
- iceberg.apache.org/docs/latest/aws/
- github.com/apache/iceberg
- iceberg.apache.org/docs/latest/evolution/
- www.dremio.com/blog/comparison-of-data-lake-table-formats-apache-icebe...
- www.dremio.com/blog/table-format-partitioning-comparison-apache-iceber...