Technology

Project Nessie

An open-source transactional catalog for data lakes that provides Git-like branching, tagging, and commit semantics for Iceberg table metadata, enabling isolated experimentation and atomic multi-table operations.

Summary

Where it fits

Nessie sits in the catalog layer between query engines and S3-stored Iceberg tables. Unlike Hive Metastore or Glue Catalog, Nessie tracks table state as a history of commits, enabling branch-based workflows (test a schema change on a branch, merge when validated) without duplicating data on S3.
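The branch-based workflow described above can be sketched with a toy in-memory model. This is a hypothetical illustration, not the Nessie API: `ToyCatalog`, its methods, and the `s3://bucket/...` metadata paths are invented for the example. Each commit stores only a mapping from table name to the location of that table's current Iceberg metadata file, and a branch is just a name pointing at a commit, so creating a branch copies nothing.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Commit:
    id: int
    parent: Optional[int]
    tables: Dict[str, str]  # table name -> current metadata file location


class ToyCatalog:
    """Hypothetical, simplified model of a Git-like catalog (not the Nessie API)."""

    def __init__(self) -> None:
        self._ids = itertools.count()
        root = Commit(next(self._ids), None, {})
        self.commits: Dict[int, Commit] = {root.id: root}
        self.branches: Dict[str, int] = {"main": root.id}  # branch name -> commit id

    def head(self, branch: str) -> Commit:
        return self.commits[self.branches[branch]]

    def create_branch(self, name: str, from_branch: str = "main") -> None:
        # A branch is only a named pointer to an existing commit; nothing is copied.
        self.branches[name] = self.branches[from_branch]

    def commit(self, branch: str, table: str, metadata_location: str) -> Commit:
        parent = self.head(branch)
        tables = dict(parent.tables)  # copies pointers only, never data files
        tables[table] = metadata_location
        new = Commit(next(self._ids), parent.id, tables)
        self.commits[new.id] = new
        self.branches[branch] = new.id
        return new

    def _is_ancestor(self, ancestor: int, commit_id: Optional[int]) -> bool:
        # Walk parent links from commit_id back toward the root commit.
        while commit_id is not None:
            if commit_id == ancestor:
                return True
            commit_id = self.commits[commit_id].parent
        return False

    def merge_fast_forward(self, source: str, target: str) -> None:
        # Simplest merge: allowed only if target has not moved since source branched.
        if not self._is_ancestor(self.branches[target], self.branches[source]):
            raise ValueError(f"{target} has diverged; fast-forward not possible")
        self.branches[target] = self.branches[source]


cat = ToyCatalog()
cat.commit("main", "db.orders", "s3://bucket/orders/metadata/v1.metadata.json")
cat.create_branch("schema_test")
cat.commit("schema_test", "db.orders", "s3://bucket/orders/metadata/v2.metadata.json")
print(cat.head("main").tables["db.orders"])  # still v1: main is isolated from the branch
cat.merge_fast_forward("schema_test", "main")
print(cat.head("main").tables["db.orders"])  # now v2
```

The sketch only covers the fast-forward case. When both branches have new commits since they diverged, a real merge has to reconcile them, which Nessie does per table.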

Misconceptions / Traps
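  • Nessie branches do not copy data files on S3. A branch is a lightweight pointer to a commit, and only catalog metadata is versioned: for each table, a pointer to its current Iceberg metadata, which in turn references the table's snapshots and schema. Data files are shared across branches. Writes on one branch add new data and metadata files instead of modifying existing ones, so other branches that still reference the old files are unaffected.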
  • Nessie is a catalog, not a query engine. It must be integrated with Spark, Flink, Trino, or Dremio to execute queries.
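  • Nessie detects merge conflicts at the table level, not at the row or file level. If the same table was modified on both branches since their common ancestor, the merge is rejected rather than combined automatically, and the conflict must be resolved explicitly.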
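
The table-level conflict rule can be illustrated with a toy three-way merge over catalog states. This is a hypothetical simplification, not Nessie's implementation: each state is a plain dict from table name to a metadata pointer, `base` stands for the common ancestor of the two branches, and `merge` and `changed_tables` are invented names. A table counts as conflicting whenever it changed on both sides, regardless of which rows or files were touched.

```python
from typing import Dict, Set

State = Dict[str, str]  # table name -> metadata pointer (hypothetical values)


def changed_tables(base: State, head: State) -> Set[str]:
    """Tables added, removed, or repointed since the common ancestor."""
    keys = set(base) | set(head)
    return {k for k in keys if base.get(k) != head.get(k)}


def merge(base: State, source: State, target: State) -> State:
    """Toy table-level three-way merge (a simplification, not Nessie's code)."""
    src_changes = changed_tables(base, source)
    tgt_changes = changed_tables(base, target)
    conflicts = src_changes & tgt_changes
    if conflicts:
        # Same table touched on both branches: no row- or file-level reconciliation.
        raise ValueError(f"merge conflict on tables: {sorted(conflicts)}")
    merged = dict(target)
    for table in src_changes:
        if table in source:
            merged[table] = source[table]
        else:
            merged.pop(table, None)  # table was dropped on the source branch
    return merged


base = {"db.orders": "orders-v1", "db.customers": "customers-v1"}
feature = {"db.orders": "orders-v2", "db.customers": "customers-v1"}
main = {"db.orders": "orders-v1", "db.customers": "customers-v2"}

# Different tables changed on each side: the merge succeeds.
print(merge(base, feature, main))
# {'db.orders': 'orders-v2', 'db.customers': 'customers-v2'}

# The same table changed on both sides: the merge is rejected.
main_conflicting = {"db.orders": "orders-v3", "db.customers": "customers-v1"}
try:
    merge(base, feature, main_conflicting)
except ValueError as err:
    print(err)  # merge conflict on tables: ['db.orders']
```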

Key Connections
  • scoped_to Metadata Management, Data Versioning — Git-like catalog for table metadata
  • enables Apache Iceberg — serves as an Iceberg catalog with branching
  • enables Branching / Tagging — the architectural pattern Nessie implements
  • alternative_to AWS Glue Catalog, Hive Metastore — catalog with version control semantics

Definition

What it is

An open-source transactional catalog for data lakes that provides Git-like branching and tagging semantics for Iceberg tables stored on S3. It enables isolated experimentation on production datasets without copying data.

Why it exists

Traditional catalogs (Hive Metastore, Glue) offer no branching or isolation — every change is immediately visible to all consumers. Nessie adds Git-like version control to table metadata, enabling safe experimentation, rollback, and multi-table atomic commits.
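A multi-table atomic commit can be sketched as a single compare-and-swap on the branch head that replaces several table pointers at once. The `ToyVersionedCatalog` class below is hypothetical and deliberately strict: it rejects any commit made against a stale head, which is a simplification of how a real catalog handles concurrent writers. The point is only that readers see either all of the updated pointers or none of them.

```python
import threading
from typing import Dict, List, Tuple


class ToyVersionedCatalog:
    """Hypothetical model: one commit updates several table pointers at once."""

    def __init__(self, tables: Dict[str, str]) -> None:
        self._lock = threading.Lock()
        self._history: List[Dict[str, str]] = [dict(tables)]  # index = commit number

    def read(self) -> Tuple[int, Dict[str, str]]:
        with self._lock:
            head = len(self._history) - 1
            return head, dict(self._history[head])

    def commit(self, expected_head: int, updates: Dict[str, str]) -> int:
        """Apply all updates as one new commit, or none of them."""
        with self._lock:
            head = len(self._history) - 1
            if head != expected_head:
                raise RuntimeError("branch moved since read; retry on the new head")
            new_state = dict(self._history[head])
            new_state.update(updates)
            self._history.append(new_state)  # single step: all tables switch together
            return head + 1


cat = ToyVersionedCatalog({"db.orders": "orders-v1", "db.order_totals": "totals-v1"})
head, _ = cat.read()
cat.commit(head, {"db.orders": "orders-v2", "db.order_totals": "totals-v2"})
print(cat.read())
# (1, {'db.orders': 'orders-v2', 'db.order_totals': 'totals-v2'})

try:
    cat.commit(head, {"db.orders": "orders-v3"})  # stale head: rejected as a whole
except RuntimeError as err:
    print(err)
```

Because both pointers change in one appended commit, a reader can never observe `orders-v2` paired with `totals-v1`, which is the inconsistency that separate per-table commits would allow.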

Primary use cases

Branched experimentation on Iceberg tables, multi-table atomic commits, catalog-level versioning and rollback.
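
Catalog-level rollback follows from the same pointer model: a tag names a commit, and rolling back moves a branch pointer to an earlier commit without touching data files. `ToyRefs`, its methods, and the tag name `before-backfill` are hypothetical; this is a sketch of the idea, not Nessie's API.

```python
from typing import Dict, List


class ToyRefs:
    """Hypothetical model of tags and branch rollback (not the Nessie API)."""

    def __init__(self) -> None:
        self.commits: List[Dict[str, str]] = [{}]  # commit id = list index; 0 is empty
        self.branches: Dict[str, int] = {"main": 0}
        self.tags: Dict[str, int] = {}

    def commit(self, branch: str, updates: Dict[str, str]) -> int:
        state = dict(self.commits[self.branches[branch]])
        state.update(updates)
        self.commits.append(state)
        self.branches[branch] = len(self.commits) - 1
        return self.branches[branch]

    def tag(self, name: str, branch: str) -> None:
        # A tag is a fixed name for the commit the branch points to right now.
        self.tags[name] = self.branches[branch]

    def state(self, ref: str) -> Dict[str, str]:
        commit_id = self.tags[ref] if ref in self.tags else self.branches[ref]
        return dict(self.commits[commit_id])

    def reset(self, branch: str, to_tag: str) -> None:
        # Rollback only moves the branch pointer; newer commits stay in history.
        self.branches[branch] = self.tags[to_tag]


refs = ToyRefs()
refs.commit("main", {"db.orders": "orders-v1"})
refs.tag("before-backfill", "main")
refs.commit("main", {"db.orders": "orders-v2-bad"})
refs.reset("main", "before-backfill")
print(refs.state("main"))  # {'db.orders': 'orders-v1'}
```

The discarded commit remains in `commits`, so it can still be inspected after the rollback.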
