Hive Metastore
The original metadata catalog service from the Apache Hive project that stores table schemas, partition mappings, and storage locations for data on S3 and HDFS. Commonly abbreviated as HMS.
Summary
Hive Metastore is the legacy but still widely deployed catalog underpinning Spark, Trino, Presto, and Flink workloads against data on S3. It predates dedicated Iceberg catalogs and remains the default metastore for many on-premises and hybrid lakehouse deployments.
- HMS was designed for Hive's partition-based tables, where each partition maps to a directory. That data model is a poor fit for Iceberg's snapshot-based metadata, which is why dedicated Iceberg catalogs (REST, Nessie, Glue) are generally preferred for new deployments.
- Running HMS requires a backing relational database such as MySQL or PostgreSQL. That database can become a single point of failure and a scaling bottleneck for metadata operations.
- HMS is not a governance tool. It stores structural metadata but offers no fine-grained access control, lineage tracking, or data quality features out of the box.
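The first point above is easier to see with the Hive layout written out. The following is a minimal sketch, not HMS code: it assumes the conventional Hive directory scheme in which each partition lives under a chain of `key=value` directories, and it uses a plain dictionary as a toy stand-in for the partition-to-location mapping that HMS keeps. The bucket name, table path, and the helper `partition_location` are hypothetical, and the sketch ignores the escaping Hive applies to special characters in partition values.

```python
from typing import Dict


def partition_location(table_location: str, partition_values: Dict[str, str]) -> str:
    """Build the conventional Hive-style directory for one partition.

    Partition columns are rendered in the order given, one key=value
    directory per column, under the table's root location.
    """
    segments = [f"{key}={value}" for key, value in partition_values.items()]
    return "/".join([table_location.rstrip("/"), *segments])


# Hypothetical table root; in a real deployment this comes from the catalog.
table_location = "s3://example-bucket/warehouse/events"

# Toy stand-in for the mapping HMS stores: partition values -> storage location.
partitions = {}
for dt in ("2024-01-01", "2024-01-02"):
    spec = {"dt": dt, "region": "us"}
    partitions[tuple(spec.values())] = partition_location(table_location, spec)

for values, location in partitions.items():
    print(values, "->", location)
```

The catalog here knows directories, not individual files, so an engine still has to list each directory to find the data. Iceberg instead tracks data files per snapshot in its own metadata, which is the mismatch the bullet describes.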
- scoped_to: Metadata Management (the original Hadoop-era catalog)
- enables: Apache Spark, Trino, Apache Flink (query engines that read HMS metadata)
- alternative_to: AWS Glue Catalog, Apache Polaris (older alternative to managed catalogs)
- constrained_by: Metadata Overhead at Scale (HMS database becomes a bottleneck at large scale)
Definition
An open-source metadata service originally built for Apache Hive that stores table schemas, partition locations, and statistics for data stored on HDFS or S3. The longest-standing catalog in the Hadoop ecosystem.
Before table formats, Hive Metastore was the primary way to impose table structure on files stored in distributed storage. It remains widely deployed as the default catalog for Spark, Trino, and Flink workloads reading from S3.
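As an illustration of how an engine is pointed at HMS, here is a small sketch that assembles the Spark properties commonly used for this. It assumes a metastore reachable over Thrift on port 9083 (the usual default); the host name and the helper `hms_spark_conf` are hypothetical, and the exact property set can vary by Spark distribution.

```python
from typing import Dict


def hms_spark_conf(metastore_host: str, port: int = 9083) -> Dict[str, str]:
    """Return Spark properties that select the Hive catalog and locate HMS."""
    return {
        # Use the Hive-backed catalog instead of Spark's in-memory one.
        "spark.sql.catalogImplementation": "hive",
        # Thrift endpoint of the metastore service (host is an assumption).
        "spark.hadoop.hive.metastore.uris": f"thrift://{metastore_host}:{port}",
    }


conf = hms_spark_conf("metastore.example.internal")
for key, value in conf.items():
    print(f"{key}={value}")
```

Each pair could then be passed to `SparkSession.builder.config(key, value)` before the session is created, or supplied as `--conf` arguments to `spark-submit`.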
Common uses: a legacy catalog for Spark and Trino workloads, an Iceberg catalog backend (HiveCatalog), and a schema registry for Hive-style partitioned tables on S3.
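For the Iceberg backend case, a hedged sketch of the Spark properties that register an Iceberg catalog backed by HMS is shown below. The property names follow Iceberg's Spark catalog configuration scheme; the catalog name `lake`, the metastore URI, and the helper `iceberg_hive_catalog_conf` are assumptions made for illustration.

```python
from typing import Dict


def iceberg_hive_catalog_conf(catalog_name: str, metastore_uri: str) -> Dict[str, str]:
    """Return Spark properties for an Iceberg catalog that uses HMS."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Register the named catalog as an Iceberg Spark catalog.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Select the Hive Metastore-backed implementation (HiveCatalog).
        f"{prefix}.type": "hive",
        # Thrift endpoint of the metastore (hypothetical host below).
        f"{prefix}.uri": metastore_uri,
    }


conf = iceberg_hive_catalog_conf("lake", "thrift://metastore.example.internal:9083")
for key, value in conf.items():
    print(f"{key}={value}")
```

With these set, tables would be addressed as `lake.<database>.<table>` in Spark SQL. HMS then holds only a pointer to each table's current Iceberg metadata file, while snapshots and file lists live in Iceberg's own metadata.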
Resources
The Apache Hive design documentation covering the Metastore architecture that became the foundational schema registry for data lakes.
Source code for the standalone Hive Metastore, which can run independently of the Hive query engine as a metadata service.
Iceberg's Hive Metastore integration guide covering catalog configuration and migration from Hive tables to Iceberg.