Project Nessie
An open-source transactional catalog for data lakes that provides Git-like branching, tagging, and commit semantics for Iceberg table metadata, enabling isolated experimentation and atomic multi-table operations.
Summary
Nessie sits in the catalog layer between query engines and S3-stored Iceberg tables. Unlike Hive Metastore or Glue Catalog, Nessie tracks table state as a history of commits, enabling branch-based workflows (test a schema change on a branch, merge when validated) without duplicating data on S3.
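The branch-based workflow described above (test a schema change on a branch, merge when validated) can be sketched from the Spark side. This is a minimal illustration, not a reference setup: it assumes the catalog properties and SQL extension statements documented for the Nessie Spark integration, and the helper names `nessie_spark_conf` and `schema_change_workflow`, the catalog name `nessie`, the branch name `etl_test`, the endpoint URI, and the bucket path are all hypothetical placeholders. The helpers only build settings and statements, so they run without a Spark installation.

```python
def nessie_spark_conf(uri, warehouse, ref="main", catalog="nessie"):
    """Build Spark settings for an Iceberg catalog backed by Nessie.

    Property names are assumed from the Nessie/Iceberg Spark integration;
    check them against the versions you actually deploy.
    """
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        "spark.sql.extensions": ",".join([
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
            "org.projectnessie.spark.extensions.NessieSparkSessionExtensions",
        ]),
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "org.apache.iceberg.nessie.NessieCatalog",
        f"{prefix}.uri": uri,              # Nessie server endpoint (placeholder)
        f"{prefix}.ref": ref,              # branch or tag the session starts on
        f"{prefix}.warehouse": warehouse,  # S3 location for table data (placeholder)
    }


def schema_change_workflow(table, column, col_type,
                           branch="etl_test", catalog="nessie"):
    """Return SQL statements for: create branch, alter table, merge back."""
    return [
        f"CREATE BRANCH IF NOT EXISTS {branch} IN {catalog} FROM main",
        f"USE REFERENCE {branch} IN {catalog}",
        f"ALTER TABLE {catalog}.{table} ADD COLUMN {column} {col_type}",
        # Validation queries against the branch would run here, before merging.
        f"MERGE BRANCH {branch} INTO main IN {catalog}",
    ]


conf = nessie_spark_conf("http://localhost:19120/api/v2",
                         "s3://example-bucket/warehouse")
statements = schema_change_workflow("db.events", "region", "string")
```

With PySpark available, each entry of `conf` would typically be passed to `SparkSession.builder.config(key, value)` and each statement to `spark.sql(...)`. Until the final merge, readers on `main` keep seeing the table without the new column.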
- Nessie branches do not copy data files on S3. Branches are lightweight metadata pointers. Only the metadata (table snapshots, schema) is versioned; data files are shared across branches via copy-on-write semantics.
- Nessie is a catalog, not a query engine. It must be integrated with Spark, Flink, Trino, or Dremio to execute queries.
- Merge conflicts in Nessie follow table-level semantics. Concurrent modifications to the same table on different branches require explicit conflict resolution.
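Two of the points above, that branches are lightweight metadata pointers and that merge conflicts are detected per table, can be illustrated with a toy in-memory model. This is a conceptual sketch only and not Nessie's implementation: `Commit`, `ToyCatalog`, and the metadata paths are hypothetical, table deletion is not modeled, and the merge does not record the source branch as a second parent.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True, eq=False)
class Commit:
    """Immutable catalog state: table name -> metadata file location."""
    tables: Dict[str, str]
    parent: Optional["Commit"] = None


class ToyCatalog:
    def __init__(self):
        self.branches = {"main": Commit(tables={})}

    def create_branch(self, name, source="main"):
        # A branch is just another name for an existing commit; nothing is copied.
        self.branches[name] = self.branches[source]

    def commit(self, branch, changes):
        head = self.branches[branch]
        self.branches[branch] = Commit({**head.tables, **changes}, parent=head)

    def _ancestors(self, commit):
        while commit is not None:
            yield commit
            commit = commit.parent

    def _common_ancestor(self, a, b):
        seen = {id(c) for c in self._ancestors(self.branches[a])}
        for c in self._ancestors(self.branches[b]):
            if id(c) in seen:
                return c
        raise ValueError("no common ancestor")

    @staticmethod
    def _changed(base, head):
        # Tables whose metadata pointer differs from the common ancestor.
        return {t for t, loc in head.tables.items() if base.tables.get(t) != loc}

    def merge(self, source, target):
        base = self._common_ancestor(source, target)
        src, dst = self.branches[source], self.branches[target]
        changed_src = self._changed(base, src)
        conflicts = changed_src & self._changed(base, dst)
        if conflicts:
            # Same table modified on both sides: needs explicit resolution.
            raise ValueError(f"conflicting tables: {sorted(conflicts)}")
        merged = {**dst.tables, **{t: src.tables[t] for t in changed_src}}
        self.branches[target] = Commit(merged, parent=dst)


cat = ToyCatalog()
cat.commit("main", {"orders": "orders/v1.metadata.json",
                    "users": "users/v1.metadata.json"})
cat.create_branch("dev")
cat.commit("dev", {"orders": "orders/v2.metadata.json"})   # only dev sees v2
cat.commit("main", {"users": "users/v2.metadata.json"})    # different table
cat.merge("dev", "main")                                    # no conflict
```

In this model, creating a branch costs one dictionary entry regardless of table size, and a merge fails only when both sides changed the same table since they diverged.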
scoped_to: Metadata Management, Data Versioning (Git-like catalog for table metadata)
enables: Apache Iceberg (serves as an Iceberg catalog with branching)
enables: Branching / Tagging (the architectural pattern Nessie implements)
alternative_to: AWS Glue Catalog, Hive Metastore (catalog with version control semantics)
Definition
An open-source transactional catalog for data lakes that provides Git-like branching and tagging semantics for Iceberg tables stored on S3. Enables isolated experimentation on production datasets without copying data.
Traditional catalogs (Hive Metastore, Glue) offer no branching or isolation — every change is immediately visible to all consumers. Nessie adds Git-like version control to table metadata, enabling safe experimentation, rollback, and multi-table atomic commits.
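To make the contrast concrete, the sketch below models multi-table atomic commits and rollback on a single branch. It is a toy under stated assumptions, not Nessie's storage model: `ToyBranch` and the metadata paths are hypothetical, the commit uses a simple expected-head check (optimistic concurrency), and rollback is modeled as a new commit that restores an earlier state.

```python
import threading
from typing import Dict, List, Optional


class ToyBranch:
    """One branch: an append-only log of catalog states."""

    def __init__(self):
        self._lock = threading.Lock()
        self._log: List[Dict[str, str]] = [{}]  # commit 0 is the empty catalog

    @property
    def head(self) -> int:
        return len(self._log) - 1

    def read(self, at: Optional[int] = None) -> Dict[str, str]:
        # Readers always see one complete commit, never a partial update.
        return dict(self._log[self.head if at is None else at])

    def commit(self, expected_head: int, changes: Dict[str, str]) -> int:
        with self._lock:
            if expected_head != self.head:
                raise RuntimeError("branch moved; retry against the new head")
            # All table pointers in `changes` become visible in a single step.
            self._log.append({**self._log[-1], **changes})
            return self.head

    def rollback(self, to: int) -> int:
        with self._lock:
            # History is kept; the old state is re-published as a new commit.
            self._log.append(dict(self._log[to]))
            return self.head


branch = ToyBranch()
h1 = branch.commit(0, {"orders": "orders/v1", "customers": "customers/v1"})
h2 = branch.commit(h1, {"orders": "orders/v2", "customers": "customers/v2"})
h3 = branch.rollback(h1)  # both tables return to v1 together
```

Because both tables move in one commit, a consumer can never observe `orders` at v2 alongside `customers` at v1, which is the property a catalog without commit semantics cannot offer.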
Branched experimentation on Iceberg tables, multi-table atomic commits, catalog-level versioning and rollback.
Recent developments
- Latest release: v0.108.0 (current as of June 2026). It bumps Iceberg to 1.11 and adopts JLine v4, tracking the upstream stable release line. Per the projectnessie/nessie releases page.
- Nessie 0.107.5: active release line on the projectnessie/nessie repo. Per the Project Nessie releases page, 0.107.5 ships the nessie-cli-0.107.5.jar CLI attachment plus changelog entries covering Google Cloud Secret Manager upgrade notes. The 0.10x cadence positions Nessie as a continuously maintained reference implementation for catalog versioning on Iceberg, ahead of the Iceberg REST Catalog spec absorbing those semantics into a vendor-neutral standard.
- Editorial framing: "Git for data catalogs." Per Dremio's "What Is Nessie?" guide, Nessie's enduring contribution is treating data-catalog changes as Git-style transactions: branches, tags, atomic multi-table commits, and time-travel by reference. For organizations doing experimentation on production Iceberg tables (schema changes, dataset reshapes, ML feature backfills), Nessie's branching model is still the canonical answer until the Iceberg REST Catalog spec absorbs equivalent semantics.
- Note on disambiguation: there is a separately named "NESSiE" LLM safety benchmark (Bertram et al., arXiv:2602.16756) that is completely unrelated to Project Nessie. Search engines conflate the two; this index tracks only the data-catalog Project Nessie.
Connections 9
Outbound 7: scoped_to 3, implements 1, enables 1, solves 1, used_by 1
Inbound 2: enables 1, depends_on 1
Resources 3
Official Project Nessie site for the Git-like transactional catalog providing branching, tagging, and commit history for Iceberg tables on S3.
Nessie source repository with the catalog server, CLI tools, and integrations for Spark, Flink, and Dremio.
Nessie API and configuration reference covering branch management, merge operations, and multi-table transactions.