Choosing a Lakehouse Catalog — Polaris vs. Unity Catalog vs. Gravitino vs. Cloud-Native
Problem Framing
In 2024 the catalog was an afterthought — somewhere to remember where the tables lived. By mid-2026 it is the control plane of the lakehouse: the layer that mints credentials, enforces policy, plans scans, federates across clouds, and decides whether an AI agent gets to read a table. Picking the catalog is now a more consequential decision than picking the query engine, because every engine and every agent has to pass through it. This guide is the decision path for that choice.
Relevant Nodes
- Topics: Lakehouse, Table Formats
- Technologies: Apache Polaris, Unity Catalog, Apache Gravitino, Microsoft OneLake, Hive Metastore, Apache Iceberg, ClickHouse, Apache Ranger, DuckDB, Trino, Apache Spark
- Standards: Iceberg REST Catalog Spec, Model Context Protocol (MCP)
- Architectures: Catalog-Centric Control Plane, Lakehouse Architecture
- Pain Points: Vendor Lock-In, Metadata Overhead at Scale, Tool Discovery Governance Gap
Decision Path
First decide: open implementation or managed service? This is the fork everything else hangs on.
- Apache Polaris: the open Iceberg REST reference catalog (donated by Snowflake, now at the ASF). Pick it when vendor neutrality is the priority and you can run the infrastructure yourself. As of 1.5.0 (May 2026) most of its governance gap has closed: the release adds a pluggable Authorizer SPI and Apache Ranger integration (Beta), so the historical "Polaris has no enterprise policy" objection no longer holds.
- Unity Catalog: open-sourced by Databricks, but the managed Databricks version is where the feature frontier lives. Pick it when you're already in the Databricks ecosystem or want multi-format (Delta + Iceberg) governance with the deepest GA feature set. Managed and foreign Iceberg tables and Iceberg v3 are GA over the Iceberg REST API.
- Apache Gravitino: the "catalog of catalogs." Pick it when your problem is federation (unifying Glue, Hive Metastore, Unity Catalog, and Iceberg catalogs under one governance plane across hybrid or multi-cloud) rather than being the single catalog of record.
- Cloud-native (Amazon S3 Tables, Microsoft OneLake): pick when you're committed to one cloud and want the catalog as managed infrastructure. Both now speak the Iceberg REST Catalog API, so "cloud-native" no longer means "closed."
Weigh your federation need. If you have one catalog of record, Polaris or Unity Catalog is enough. If you have many existing catalogs you can't consolidate, Gravitino's federation is the differentiator: it sits above the others rather than replacing them.
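The fork and the federation check can be sketched as a small function. This is only an illustrative encoding of the guidance so far, not a scoring model: the `Requirements` fields, the `shortlist` name, and the order in which the checks run are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    # Every field is a hypothetical input invented for this sketch.
    must_federate_existing_catalogs: bool  # many catalogs you cannot consolidate
    in_databricks_ecosystem: bool          # or you need Delta + Iceberg governance
    committed_to_single_cloud: bool        # catalog wanted as managed infrastructure
    can_run_infrastructure: bool           # willing to operate the catalog yourself


def shortlist(req: Requirements) -> str:
    """Return the first option from the decision path that fits.

    The check order is an assumption: federation first, because it
    changes the role of the catalog rather than just its vendor.
    """
    if req.must_federate_existing_catalogs:
        return "Apache Gravitino"
    if req.in_databricks_ecosystem:
        return "Unity Catalog"
    if req.committed_to_single_cloud:
        return "Cloud-native (Amazon S3 Tables, Microsoft OneLake)"
    if req.can_run_infrastructure:
        return "Apache Polaris"
    return "No clear fit; revisit requirements"
```

A team with several unconsolidated catalogs lands on Gravitino regardless of the other answers, which mirrors the point that it sits above the other catalogs instead of competing with them.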
Check the governance surface you actually need. Maturity varies across catalogs for row/column masking, attribute-based access control, audit, and lineage. Unity Catalog leads on breadth of GA governance features; Polaris 1.5 + Ranger closes most of the gap for open deployments; Gravitino governs at the federation layer.
Insist on credential vending. Any 2026 catalog worth choosing mints short-lived, prefix-scoped storage credentials at table-load time — Polaris, Unity Catalog, and the Iceberg REST spec itself all support it. If a candidate still expects you to hand long-lived S3 keys to engines, that's the legacy pattern; rule it out.
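A quick way to test a candidate is to ask for vended credentials on a table load and inspect what comes back. The sketch below builds such a request and checks the response shape. It assumes the Iceberg REST spec's `X-Iceberg-Access-Delegation` header, the `config` and `storage-credentials` fields of the load-table response, and S3-style property names such as `s3.session-token`; the catalog URL, prefix, and function names are hypothetical, and a single-level namespace is assumed.

```python
from urllib.parse import quote


def load_table_request(base_uri: str, prefix: str, namespace: str,
                       table: str, token: str) -> tuple[str, dict]:
    """Build the URL and headers for a load-table call that asks for
    vended credentials. Nothing is sent; pass the result to any HTTP client."""
    path = "/v1/{}/namespaces/{}/tables/{}".format(
        quote(prefix, safe=""), quote(namespace, safe=""), quote(table, safe="")
    )
    headers = {
        "Authorization": f"Bearer {token}",
        # Ask the catalog to mint storage credentials for this table.
        "X-Iceberg-Access-Delegation": "vended-credentials",
    }
    return base_uri.rstrip("/") + path, headers


def looks_short_lived(load_table_result: dict) -> bool:
    """Heuristic for S3: temporary credentials carry a session token,
    long-lived access keys do not."""
    configs = [load_table_result.get("config", {})]
    for cred in load_table_result.get("storage-credentials", []):
        configs.append(cred.get("config", {}))
    return any("s3.session-token" in cfg for cfg in configs)
```

If `looks_short_lived` is false for a real response, either the catalog did not vend credentials or it returned static keys; both warrant a closer look before shortlisting it.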
Score the agent story explicitly. This is the newest axis and the one most teams under-weight. The catalog is becoming how AI agents reach data: Gravitino ships an MCP server + Model Catalog; Databricks' managed MCP servers expose Unity Catalog tables/functions/Vector Search to agents natively. The question to ask a vendor: when an agent reads my data, does it inherit the same authorization boundary as a human principal, or does it get a broad side-door token? The former closes the Tool Discovery Governance Gap; the latter reopens it.
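The difference between the two answers is easy to show in a toy model. The sketch below is not any catalog's API: the grants table, the `AgentCall` type, and both `authorize_*` functions are hypothetical, and real catalogs evaluate far richer policies. It only contrasts checking the delegating human's grants with checking a broad service token.

```python
from dataclasses import dataclass

# Hypothetical grants: principal -> set of (table, privilege) pairs.
GRANTS = {
    "alice": {("sales.orders", "SELECT")},
    "svc-agent-broad": {("sales.orders", "SELECT"), ("hr.salaries", "SELECT")},
}


@dataclass(frozen=True)
class AgentCall:
    agent_id: str
    on_behalf_of: str  # the human principal the agent is acting for
    table: str
    privilege: str = "SELECT"


def authorize_inherited(call: AgentCall, grants=GRANTS) -> bool:
    """Agent gets exactly what the delegating principal has."""
    return (call.table, call.privilege) in grants.get(call.on_behalf_of, set())


def authorize_side_door(call: AgentCall, grants=GRANTS,
                        service_principal: str = "svc-agent-broad") -> bool:
    """Agent uses a shared service token; the human's grants are ignored."""
    return (call.table, call.privilege) in grants.get(service_principal, set())
```

For a call on behalf of `alice` against `hr.salaries`, the inherited check denies and the side-door check allows, which is the governance gap in miniature.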
Confirm engine + planning interop. Verify your engines (Spark, Trino, DuckDB, ClickHouse, Snowflake) speak the catalog's REST API. Bonus: Gravitino 1.2 lets DuckDB/Spark offload scan planning to its Iceberg REST Catalog (IRC) server, which is useful if metadata planning, not data I/O, is your bottleneck (the Metadata Overhead at Scale pain point).
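One low-cost interop check is to read the catalog's `GET /v1/config` response and see which endpoints it advertises. The sketch below assumes the response may carry an optional `endpoints` list of "VERB path" strings, and that server-side scan planning is advertised with the string shown in `PLAN_TABLE_SCAN`; both are taken from the Iceberg REST spec as understood here and should be verified against the spec version your catalog implements. The function name is hypothetical.

```python
from typing import Optional

# Assumed endpoint string for server-side scan planning.
PLAN_TABLE_SCAN = "POST /v1/{prefix}/namespaces/{namespace}/tables/{table}/plan"


def advertises_endpoint(config_response: dict,
                        endpoint: str = PLAN_TABLE_SCAN) -> Optional[bool]:
    """Return True/False if the catalog lists its endpoints,
    or None if the list is absent and support cannot be inferred."""
    endpoints = config_response.get("endpoints")
    if endpoints is None:
        return None
    return endpoint in endpoints
```

A `None` result means the server did not publish an endpoint list, so planning support has to be confirmed from documentation or by trying the call from the engine itself.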
What Changed Over Time
- 2024: Catalog = metadata lookup. Hive Metastore or table-format-embedded state. Governance lived in the engine. In mid-2024, Databricks open-sourced Unity Catalog and Snowflake donated Polaris to the ASF.
- 2025: Iceberg REST Catalog spec matured; credential vending normalized. Gravitino reframed as an "AI-native metadata platform" (Model Catalog + MCP server, 1.1.0, Dec 2025).
- 2026 (Q2): Convergence — Polaris 1.5 (pluggable authz + Ranger + BigQuery federation), Unity Catalog full Iceberg GA + managed MCP servers, Gravitino 1.2 scan-planning offload. The catalog became the Catalog-Centric Control Plane, serving humans and agents through the same boundary.
Sources
- www.snowflake.com/en/blog/engineering/apache-polaris-1-5-release/
- www.databricks.com/blog/unity-catalog-and-next-era-apache-icebergtm
- gravitino.apache.org/blog/gravitino-1-2-0-release-notes/
- datalakehousehub.com/blog/2026-05-choosing-iceberg-control-plane/
- blog.fabric.microsoft.com/en-US/blog/how-to-access-your-microsoft-fabr...
- estuary.dev/blog/iceberg-catalog-apache-polaris-vs-unity-catalog/