Hive Metastore

The original metadata catalog service from the Apache Hive project that stores table schemas, partition mappings, and storage locations for data on S3 and HDFS. Commonly abbreviated as HMS.

Summary

Where it fits

Hive Metastore is the legacy but still widely deployed catalog underpinning Spark, Trino, Presto, and Flink workloads against S3 data. It predates dedicated Iceberg catalogs and remains the default metastore for many on-premise and hybrid lakehouse deployments.
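Query engines typically reach HMS over its Thrift endpoint (port 9083 by default). As a sketch of how an engine attaches to it, a minimal Trino Hive connector catalog file might look like the following, with the hostname being a placeholder:

```properties
# etc/catalog/hive.properties — hypothetical catalog file
connector.name=hive
hive.metastore.uri=thrift://metastore-host:9083
```

With this in place, tables registered in HMS appear to Trino under the `hive` catalog without any per-table configuration.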

Misconceptions / Traps
  • HMS was designed for Hive partition-based tables. Its data model is a poor fit for Iceberg's snapshot-based metadata, which is why dedicated Iceberg catalogs (REST, Nessie, Glue) are preferred for new deployments.
  • Running HMS requires a backing relational database (typically MySQL or PostgreSQL). That database becomes a single point of failure and a scaling bottleneck for metadata operations.
  • HMS is not a governance tool. It stores structural metadata but has no built-in access control, lineage tracking, or data quality features.
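The backing-database dependency shows up directly in HMS configuration. A hive-site.xml fragment pointing the metastore at MySQL might look like this (host, database name, and driver are illustrative placeholders):

```xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://db-host:3306/metastore</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.cj.jdbc.Driver</value>
</property>
```

Every schema lookup and partition listing ultimately resolves to queries against this database, which is why it is the component to watch for availability and scale.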
Key Connections
  • scoped_to Metadata Management — the original Hadoop-era catalog
  • enables Apache Spark, Trino, Apache Flink — query engines that read HMS metadata
  • alternative_to AWS Glue Catalog, Apache Polaris — older alternative to managed catalogs
  • constrained_by Metadata Overhead at Scale — HMS database becomes a bottleneck at large scale

Definition

What it is

An open-source metadata service originally built for Apache Hive that stores table schemas, partition locations, and statistics for data stored on HDFS or S3. The longest-standing catalog in the Hadoop ecosystem.
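To make the data model concrete, here is a minimal in-memory sketch of the kind of metadata HMS tracks: table schemas, partition mappings, and storage locations. The class and field names are illustrative, not the actual HMS Thrift schema.

```python
from dataclasses import dataclass, field

@dataclass
class Table:
    # Structural metadata of the kind HMS records for one table.
    name: str
    columns: dict[str, str]   # column name -> type
    location: str             # base path on S3 or HDFS
    partitions: dict[str, str] = field(default_factory=dict)  # partition spec -> path

class Metastore:
    """Toy stand-in for the HMS catalog."""
    def __init__(self):
        self.tables = {}

    def create_table(self, table):
        self.tables[table.name] = table

    def add_partition(self, table_name, spec, path):
        self.tables[table_name].partitions[spec] = path

    def partition_location(self, table_name, spec):
        # A query engine asks the catalog: where do this partition's files live?
        return self.tables[table_name].partitions[spec]

hms = Metastore()
hms.create_table(Table(
    name="events",
    columns={"user_id": "bigint", "ts": "timestamp"},
    location="s3://bucket/warehouse/events",
))
hms.add_partition("events", "dt=2024-01-01", "s3://bucket/warehouse/events/dt=2024-01-01")
print(hms.partition_location("events", "dt=2024-01-01"))
```

Note that the catalog maps each partition value to a physical path; this directory-per-partition model is exactly what fits Hive-style tables well and Iceberg's snapshot-based metadata poorly.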

Why it exists

Before open table formats such as Iceberg, Delta Lake, and Hudi, Hive Metastore was the primary way to impose table structure on files stored in distributed storage. It remains widely deployed as the default catalog for Spark, Trino, and Flink workloads reading from S3.

Primary use cases

Legacy catalog for Spark and Trino workloads; Iceberg catalog backend (HiveCatalog); schema registry for Hive-style partitioned tables on S3.
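The Iceberg catalog backend role means Iceberg's HiveCatalog stores a pointer to each table's current metadata file as an HMS table property. A spark-defaults.conf sketch wiring Spark's Iceberg support to an HMS endpoint might look like this (the catalog name `hms_cat` and hostname are placeholders):

```properties
spark.sql.catalog.hms_cat=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.hms_cat.type=hive
spark.sql.catalog.hms_cat.uri=thrift://metastore-host:9083
```

This lets an existing HMS deployment serve Iceberg tables without introducing a new catalog service, at the cost of the scaling limits noted above.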
