Technology

Project Nessie

An open-source transactional catalog for data lakes that provides Git-like branching, tagging, and commit semantics for Iceberg table metadata, enabling isolated experimentation and atomic multi-table operations.


Summary

Where it fits

Nessie sits in the catalog layer between query engines and S3-stored Iceberg tables. Unlike Hive Metastore or Glue Catalog, Nessie tracks table state as a history of commits, enabling branch-based workflows (test a schema change on a branch, merge when validated) without duplicating data on S3.
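
As a rough illustration of where the catalog plugs in, the sketch below assembles Spark properties for an Iceberg catalog backed by Nessie, plus the SQL statements for the branch-then-merge workflow described above. The catalog name `nessie`, the server URI, the warehouse path, the branch name, and the table and column names are placeholder assumptions. The property keys and statement syntax follow the Nessie Spark integration as commonly documented and should be checked against the versions actually in use.

```python
def nessie_spark_conf(uri, warehouse, ref="main", catalog="nessie"):
    """Build Spark properties for an Iceberg catalog backed by Nessie.

    `uri`, `warehouse`, `ref` and `catalog` are caller-supplied placeholders.
    """
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Iceberg and Nessie SQL extensions (the latter adds the branch statements).
        "spark.sql.extensions": ",".join([
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
            "org.projectnessie.spark.extensions.NessieSparkSessionExtensions",
        ]),
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "org.apache.iceberg.nessie.NessieCatalog",
        f"{prefix}.uri": uri,              # Nessie server endpoint
        f"{prefix}.ref": ref,              # branch or tag the session starts on
        f"{prefix}.warehouse": warehouse,  # where table data and metadata files live
    }


def branch_workflow_sql(branch, table, column_ddl, base="main", catalog="nessie"):
    """Return the statements for: branch, change schema on the branch, merge back."""
    return [
        f"CREATE BRANCH IF NOT EXISTS {branch} IN {catalog} FROM {base}",
        f"USE REFERENCE {branch} IN {catalog}",
        f"ALTER TABLE {catalog}.{table} ADD COLUMNS ({column_ddl})",
        f"MERGE BRANCH {branch} INTO {base} IN {catalog}",
    ]


# Hypothetical values, for illustration only.
conf = nessie_spark_conf("http://localhost:19120/api/v2", "s3://example-bucket/warehouse")
statements = branch_workflow_sql("schema_test", "db.orders", "discount_pct double")
```

In a real session the properties would be passed to the Spark session builder and each statement run with `spark.sql(...)`, with validation queries executed between the `ALTER TABLE` and the `MERGE BRANCH`.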

Misconceptions / Traps

Nessie branches do not copy data files on S3. A branch is a lightweight, named pointer to a commit, and only table metadata (snapshots, schema) is versioned. Data files are immutable and shared across branches; a write on one branch adds new files rather than altering files that another branch still references.
Nessie is a catalog, not a query engine. It must be integrated with Spark, Flink, Trino, or Dremio to execute queries.
Merge conflicts in Nessie are detected at table granularity. When the same table has been modified on both branches since they diverged, the merge does not proceed until the conflict is resolved explicitly.
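
The first and third points can be made concrete with a toy in-memory model. This is a minimal sketch of the idea, not Nessie's implementation: `ToyCatalog`, `MergeConflict`, and the metadata file names are all hypothetical. Each commit maps table names to a metadata-file pointer, a branch is just a commit id, and a merge is rejected when both sides changed the same table since their common ancestor.

```python
class MergeConflict(Exception):
    """Raised when both branches changed the same table since they diverged."""


class ToyCatalog:
    """Toy model: commits hold {table: metadata pointer}; branches point at commits."""

    def __init__(self):
        self.commits = [({}, None)]   # list of (state, parent commit id); id 0 is the root
        self.branches = {"main": 0}   # branch name -> commit id

    def commit(self, branch, changes):
        parent = self.branches[branch]
        state = {**self.commits[parent][0], **changes}
        self.commits.append((state, parent))
        self.branches[branch] = len(self.commits) - 1
        return self.branches[branch]

    def create_branch(self, name, source="main"):
        # Only a pointer is copied: no table state or data file is duplicated.
        self.branches[name] = self.branches[source]

    def _history(self, commit_id):
        while commit_id is not None:
            yield commit_id
            commit_id = self.commits[commit_id][1]

    def _changed(self, base, head):
        base_state, head_state = self.commits[base][0], self.commits[head][0]
        return {t: p for t, p in head_state.items() if base_state.get(t) != p}

    def merge(self, source, target):
        target_history = set(self._history(self.branches[target]))
        # First commit in the source history that the target also contains.
        base = next(c for c in self._history(self.branches[source]) if c in target_history)
        ours = self._changed(base, self.branches[target])
        theirs = self._changed(base, self.branches[source])
        conflicts = sorted(set(ours) & set(theirs))
        if conflicts:
            raise MergeConflict(f"tables changed on both branches: {conflicts}")
        return self.commit(target, theirs)


cat = ToyCatalog()
cat.commit("main", {"orders": "orders/v1.json", "customers": "customers/v1.json"})

cat.create_branch("dev")
cat.commit("dev", {"orders": "orders/v2.json"})          # change on the branch
cat.commit("main", {"customers": "customers/v2.json"})   # unrelated change on main
cat.merge("dev", "main")                                  # different tables: merge succeeds

cat.create_branch("hotfix")
cat.commit("hotfix", {"orders": "orders/v3a.json"})
cat.commit("main", {"orders": "orders/v3b.json"})         # same table changed on both sides
try:
    cat.merge("hotfix", "main")
    conflict = None
except MergeConflict as exc:
    conflict = str(exc)
```

The model ignores table deletions and does not record merged branches as parents, so it is only suitable for illustrating pointer-style branching and table-level conflict detection.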

Key Connections

scoped_to Metadata Management, Data Versioning — Git-like catalog for table metadata
enables Apache Iceberg — serves as an Iceberg catalog with branching
enables Branching / Tagging — the architectural pattern Nessie implements
alternative_to AWS Glue Catalog, Hive Metastore — catalog with version control semantics

Definition

What it is

An open-source transactional catalog for data lakes that provides Git-like branching and tagging semantics for Iceberg tables stored on S3. It enables isolated experimentation on production datasets without copying data.

Why it exists

Traditional catalogs (Hive Metastore, Glue) offer no branching or isolation — every change is immediately visible to all consumers. Nessie adds Git-like version control to table metadata, enabling safe experimentation, rollback, and multi-table atomic commits.
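
To show what an atomic multi-table commit and a rollback buy, here is a small sketch under stated assumptions. `Branch` and `Commit` are hypothetical names, the optimistic head check stands in for whatever concurrency control the real catalog uses, and rollback is modelled by appending a commit that restores an earlier state, which is a simplification chosen for this example rather than a description of Nessie's mechanism.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Commit:
    tables: dict   # table name -> metadata pointer, for ALL tables at this point
    message: str


@dataclass
class Branch:
    history: list = field(default_factory=lambda: [Commit({}, "root")])

    @property
    def head(self):
        return len(self.history) - 1

    def read(self, at=None):
        """Return the table state at `at`, or at the head when `at` is None."""
        index = self.head if at is None else at
        return dict(self.history[index].tables)

    def commit(self, expected_head, changes, message):
        """Apply every change in one step, or reject the whole commit."""
        if expected_head != self.head:
            raise RuntimeError("branch moved; retry against the new head")
        self.history.append(Commit({**self.read(), **changes}, message))
        return self.head

    def rollback(self, to):
        """Restore an earlier state by appending a commit that reproduces it."""
        self.history.append(Commit(self.read(at=to), f"rollback to {to}"))
        return self.head


main = Branch()
h1 = main.commit(main.head,
                 {"orders": "orders/v1.json", "order_items": "order_items/v1.json"},
                 "initial load")
# Both tables move together: readers see v1/v1 or v2/v2, never a mix.
h2 = main.commit(h1,
                 {"orders": "orders/v2.json", "order_items": "order_items/v2.json"},
                 "backfill both tables")

try:
    main.commit(h1, {"orders": "orders/v2b.json"}, "stale writer")
    stale_rejected = False
except RuntimeError:
    stale_rejected = True   # the writer based its change on an outdated head

main.rollback(h1)           # catalog-level rollback restores both tables at once
```

Because each commit captures the state of every table, rolling back is a single catalog operation rather than a per-table restore.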

Primary use cases

Branched experimentation on Iceberg tables, multi-table atomic commits, catalog-level versioning and rollback.

Recent developments

Latest signals

Latest release: v0.108.0 (current as of June 2026). The release bumps Iceberg to 1.11 and adopts JLine v4, tracking the upstream stable release line. Per projectnessie/nessie releases.
Nessie 0.107.5: active release line on the projectnessie/nessie repo. Per the Project Nessie releases page, 0.107.5 ships the nessie-cli-0.107.5.jar CLI attachment along with changelog entries covering upgrade notes for Google Cloud Secret Manager. The 0.10x cadence positions Nessie as a continuously maintained reference implementation for catalog versioning on Iceberg, ahead of the Iceberg REST Catalog spec absorbing those semantics into a vendor-neutral standard.
Editorial framing: "Git for data catalogs." Per Dremio's "What Is Nessie?" guide, Nessie's enduring contribution is treating data-catalog changes as Git-style transactions: branches, tags, atomic multi-table commits, and time travel by reference. For organizations experimenting on production Iceberg tables (schema changes, dataset reshapes, ML feature backfills), Nessie's branching model remains the canonical answer until the Iceberg REST Catalog spec absorbs equivalent semantics.
Note on disambiguation: there is a separately named "NESSiE" LLM safety benchmark (Bertram et al., arXiv:2602.16756) that is unrelated to Project Nessie. Search engines conflate the two; this index tracks only the data-catalog Project Nessie.