Technology

Apache Gravitino

A unified metadata lake, or "catalog of catalogs", that federates Iceberg, Hive, Kafka, and file-based data sources into a single governance layer. A top-level Apache Software Foundation project.


Summary

What it is

A unified metadata lake, or "catalog of catalogs", that federates Iceberg, Hive, Kafka, and file-based data sources into a single governance layer. A top-level Apache Software Foundation project.

Where it fits

In environments with multiple catalogs (Glue, Hive Metastore, Polaris, Unity), Gravitino sits above them all, providing a unified metadata view. Engineers discover and govern data from a single pane regardless of which catalog or storage layer holds it.

Misconceptions / Traps

Gravitino does not replace individual catalogs — it federates them. You still need Polaris, Glue, or Unity underneath.
Lineage features are still maturing. Production lineage workflows may need supplementation with OpenLineage/Marquez.

Key Connections

implements Iceberg REST Catalog Spec — exposes federated metadata via the standard REST interface
enables Apache Polaris — can federate Polaris alongside other catalogs
solves Vendor Lock-In — unified view across multi-vendor catalog environments
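
Because Gravitino exposes federated metadata through the Iceberg REST Catalog interface, a client can address it with the spec's standard routes. The sketch below only builds those URLs and makes no network calls. The base address http://localhost:9001/iceberg is an assumption about a local deployment, and the helper names are hypothetical; check host, port, path, and any catalog prefix against your own configuration.

```python
from urllib.parse import quote

# Assumption: address of a locally running Iceberg REST endpoint.
BASE_URL = "http://localhost:9001/iceberg"

# The Iceberg REST spec joins multi-level namespaces with the
# unit separator character (0x1F), percent-encoded as %1F.
NAMESPACE_SEPARATOR = "\x1f"


def config_url(base: str) -> str:
    """URL of the spec's configuration route (GET /v1/config)."""
    return f"{base.rstrip('/')}/v1/config"


def namespaces_url(base: str, prefix: str = "") -> str:
    """URL for listing namespaces (GET /v1/{prefix}/namespaces)."""
    parts = [base.rstrip("/"), "v1"]
    if prefix:
        parts.append(quote(prefix, safe=""))
    parts.append("namespaces")
    return "/".join(parts)


def tables_url(base: str, namespace: list[str], prefix: str = "") -> str:
    """URL for listing tables in a namespace
    (GET /v1/{prefix}/namespaces/{namespace}/tables)."""
    encoded = quote(NAMESPACE_SEPARATOR.join(namespace), safe="")
    return f"{namespaces_url(base, prefix)}/{encoded}/tables"


if __name__ == "__main__":
    print(config_url(BASE_URL))
    print(tables_url(BASE_URL, ["sales", "eu"]))
```

Any engine or tool that already speaks the Iceberg REST protocol would issue requests against routes of this shape, which is why a standard interface matters more here than a Gravitino-specific client.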

Definition

What it is

A unified metadata lake, or "catalog of catalogs", that provides a single governance layer across Iceberg, Hive, Kafka, and file-based data sources. A top-level Apache Software Foundation project originally developed by Datastrato.

Why it exists

Enterprises run multiple data catalogs (Glue, Hive Metastore, Unity Catalog) across different environments. Gravitino federates these into a unified metadata view so engineers can discover and govern data from a single pane regardless of which catalog or storage layer holds it.
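
The federation idea can be illustrated with a toy model. This is a minimal sketch, not Gravitino's API or data model: the Catalog and Federation classes, the catalog names, and the catalog.schema.table naming are all hypothetical. It shows a registry that points at several underlying catalogs and answers discovery questions across all of them without copying or replacing any of them.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Stand-in for one underlying catalog (e.g. Hive Metastore or Glue)."""
    name: str
    provider: str
    # Maps schema name -> table names held by this catalog.
    tables: dict[str, list[str]] = field(default_factory=dict)


@dataclass
class Federation:
    """Toy 'catalog of catalogs': keeps references to catalogs, not copies."""
    catalogs: dict[str, Catalog] = field(default_factory=dict)

    def register(self, catalog: Catalog) -> None:
        self.catalogs[catalog.name] = catalog

    def list_tables(self) -> list[str]:
        """Return every table as catalog.schema.table, across all catalogs."""
        names: list[str] = []
        for cat in self.catalogs.values():
            for schema, tables in cat.tables.items():
                names.extend(f"{cat.name}.{schema}.{t}" for t in tables)
        return sorted(names)

    def resolve(self, qualified_name: str) -> Catalog:
        """Find the underlying catalog that owns a catalog.schema.table name."""
        catalog_name, schema, table = qualified_name.split(".")
        cat = self.catalogs[catalog_name]
        if table not in cat.tables.get(schema, []):
            raise KeyError(qualified_name)
        return cat


if __name__ == "__main__":
    fed = Federation()
    fed.register(Catalog("hive_prod", "hive", {"sales": ["orders"]}))
    fed.register(Catalog("lake", "iceberg", {"sales": ["orders", "returns"]}))
    print(fed.list_tables())
    print(fed.resolve("lake.sales.returns").provider)
```

The underlying catalogs stay the source of truth in this model; the federation layer only routes lookups to them, which mirrors the point that Gravitino federates catalogs rather than replacing them.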

Primary use cases

Federated metadata management across hybrid and multi-cloud environments, unified data discovery, cross-catalog governance.

Recent developments

Latest signals

Latest release: v1.2.1 (current as of June 2026). Tracking the upstream stable release line. Per apache/gravitino releases.
ASF Board Meeting Minutes. Traffic on dev@gravitino.apache.org rose 5% over the past quarter (139 vs. 132 emails). Two committer candidate nominations are underway. Talks reported at QCon Shanghai, Data for AI meetups (Bay Area, Shanghai), and COSCon Beijing. Per whimsy.apache.org (2026-04-15).
ASF Releases, March 2026. Apache release roundup lists gravitino-1.2.0 (2026-03-12) plus multiple Trino connector releases (435-478) for Gravitino. Per community.apache.org (2026-04-01).
Apache Gravitino 1.2.0. Released with Table Maintenance Service (TMS), a ClickHouse catalog, end-to-end UDF management, authorization for Iceberg view operations, a redesigned Web UI, and broad connector improvements. Per gravitino.apache.org (2026-03-13).
Gravitino is now a two-sided control plane: it offloads query planning and feeds AI agents. Two threads converge on the same "catalog as control plane" idea. On the engine side, 1.2.0 (March 13, 2026) lets query engines such as DuckDB and Spark offload Iceberg scan planning to Gravitino's Iceberg REST Catalog (IRC) server, which includes a scan-planning cache. Planning moves out of the client and into the catalog, cutting metadata I/O and client-side complexity. On the agent side, Gravitino's AI-native metadata layer combines a Model Catalog with a built-in MCP server that exposes governed metadata to AI tools. Both were introduced in the 1.1.0 "AI-native metadata management platform" release (December 16, 2025), and they let agents discover and reason over data context through the same governance plane. Per Gravitino 1.2.0 release notes and Gravitino 1.1.0 — An AI-native metadata management platform.
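
Why a scan-planning cache on the catalog side pays off can be shown with a small model. This is a sketch under stated assumptions, not Gravitino's implementation: ScanPlanCache, fake_planner, and the file paths are hypothetical. It relies only on the fact that an Iceberg snapshot is immutable, so a plan computed for a given table, snapshot, and filter can be reused by every client that asks the same question.

```python
from typing import Callable

# Cache key: (table name, snapshot id, filter expression).
PlanKey = tuple[str, int, str]
Planner = Callable[[str, int, str], list[str]]


class ScanPlanCache:
    """Server-side cache of scan plans, keyed by table, snapshot, and filter."""

    def __init__(self, planner: Planner) -> None:
        self._planner = planner
        self._plans: dict[PlanKey, list[str]] = {}
        self.planner_calls = 0  # counts how often real planning work ran

    def plan(self, table: str, snapshot_id: int, filter_expr: str) -> list[str]:
        key = (table, snapshot_id, filter_expr)
        if key not in self._plans:
            # Cache miss: do the expensive metadata walk exactly once.
            self.planner_calls += 1
            self._plans[key] = self._planner(table, snapshot_id, filter_expr)
        return self._plans[key]


def fake_planner(table: str, snapshot_id: int, filter_expr: str) -> list[str]:
    """Hypothetical planner: stands in for reading manifests and pruning files."""
    return [f"{table}/snap-{snapshot_id}/part-{i}.parquet" for i in range(2)]


if __name__ == "__main__":
    cache = ScanPlanCache(fake_planner)
    # Two engines asking for the same scan share one planning pass.
    cache.plan("sales.orders", 42, "region = 'EU'")
    cache.plan("sales.orders", 42, "region = 'EU'")
    print(cache.planner_calls)
```

A new snapshot id produces a new key, so a write to the table naturally bypasses stale plans without explicit invalidation in this simplified model.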