AWS Glue Catalog
AWS's fully managed metadata catalog service that stores table definitions, partition information, and schema metadata for data stored in S3, serving as the default metastore for AWS analytics services.
Summary
Glue Catalog is the AWS-native metadata layer that connects S3-stored data to query engines such as Athena, Redshift Spectrum, and EMR Spark. In AWS-centric lakehouse deployments, it removes the need to run a self-managed Hive Metastore.
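To make the S3-to-engine connection concrete, here is a minimal sketch of registering existing Parquet files under an S3 prefix as a Glue table so that Athena or Spark can query them. It assumes a boto3 Glue client (`boto3.client("glue")`), whose `create_table` call takes `DatabaseName` and `TableInput`; the helper names, database, bucket, and column names are hypothetical, and the Hive Parquet input/output/SerDe classes are the ones commonly used for Parquet tables.

```python
from typing import Any, Dict, List, Tuple

# Hive Parquet classes commonly used for Parquet tables in Glue, Athena, and Spark.
PARQUET_INPUT_FORMAT = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"
PARQUET_OUTPUT_FORMAT = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat"
PARQUET_SERDE = "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"


def build_parquet_table_input(
    name: str,
    location: str,
    columns: List[Tuple[str, str]],
    partition_keys: List[Tuple[str, str]],
) -> Dict[str, Any]:
    """Build a TableInput dict describing an external Parquet table on S3 (hypothetical helper)."""
    return {
        "Name": name,
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {"classification": "parquet"},
        "StorageDescriptor": {
            # Data columns as (name, Hive type) pairs, e.g. ("event_id", "string").
            "Columns": [{"Name": n, "Type": t} for n, t in columns],
            "Location": location,  # S3 prefix that holds the data files
            "InputFormat": PARQUET_INPUT_FORMAT,
            "OutputFormat": PARQUET_OUTPUT_FORMAT,
            "SerdeInfo": {"SerializationLibrary": PARQUET_SERDE},
        },
        # Partition columns are declared separately from the data columns.
        "PartitionKeys": [{"Name": n, "Type": t} for n, t in partition_keys],
    }


def register_table(glue: Any, database: str, table_input: Dict[str, Any]) -> None:
    """Register the table in Glue; `glue` is assumed to be a boto3 Glue client."""
    glue.create_table(DatabaseName=database, TableInput=table_input)
```

A Glue crawler can infer a similar definition automatically; writing `TableInput` by hand trades that convenience for explicit control over column types and partition keys.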
- Glue Catalog is not a query engine. It stores metadata only; actual query execution is handled by Athena, Spark, Trino, or other engines.
- Glue Catalog's Iceberg support requires the Glue-specific catalog implementation. Not all Iceberg features (e.g., branching, tagging) are available through Glue's catalog API.
- API request pricing can be surprisingly high at scale. Each GetTable, GetPartitions, and UpdateTable call is billed, so high-frequency metadata access patterns amplify cost.
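Because each metadata request is billed, one common mitigation is client-side caching. The sketch below memoizes `get_table` responses for a fixed TTL and counts how many requests actually reach Glue; `CachedTableLookup` and its parameters are hypothetical, and `glue` is assumed to be a boto3 Glue client. The trade-off is staleness: a cached entry does not reflect another writer's UpdateTable until it expires, which matters for Iceberg tables, where each commit updates the table's metadata pointer in the catalog.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CachedTableLookup:
    """Memoize GetTable responses for `ttl_seconds` to cut repeated billed calls.

    `glue` is assumed to be a boto3 Glue client (anything exposing get_table()).
    """

    def __init__(self, glue: Any, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._glue = glue
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested deterministically
        self._cache: Dict[Tuple[str, str], Tuple[float, Dict[str, Any]]] = {}
        self.api_calls = 0  # number of GetTable requests actually sent to Glue

    def get_table(self, database: str, name: str) -> Dict[str, Any]:
        key = (database, name)
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh cache entry: no API call, no charge
        response = self._glue.get_table(DatabaseName=database, Name=name)
        self.api_calls += 1
        table = response["Table"]
        self._cache[key] = (now, table)
        return table

    def invalidate(self, database: str, name: str) -> None:
        """Drop a cached entry, e.g. right after this process calls UpdateTable."""
        self._cache.pop((database, name), None)
```

Injecting the clock keeps the cache testable; in a long-running service, a short TTL plus explicit `invalidate` after local writes is a reasonable starting point.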
- scoped_to: Metadata Management (a managed metadata catalog)
- enables: Athena, Apache Spark (provides table metadata for query execution)
- alternative_to: Hive Metastore (AWS-managed alternative to a self-hosted HMS)
- implements: Iceberg REST Catalog Spec (supports Iceberg table registration)
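For Apache Spark on AWS, Iceberg tables are commonly wired to Glue through Iceberg's `GlueCatalog` implementation (the Glue-specific catalog, not a REST catalog client). A minimal sketch of the Spark properties involved, assuming the Iceberg Spark runtime and Iceberg AWS bundle JARs are on the classpath; the helper name, the catalog name `glue`, and the warehouse path are placeholders, and property names should be checked against the Iceberg version in use.

```python
from typing import Dict


def iceberg_glue_spark_conf(catalog: str, warehouse: str) -> Dict[str, str]:
    """Spark properties registering an Iceberg catalog backed by AWS Glue (hypothetical helper).

    `catalog` is the name used in SQL (catalog.database.table);
    `warehouse` is the S3 prefix where new table data and metadata are written.
    """
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Enables Iceberg-specific SQL (e.g. CALL procedures, ALTER TABLE extensions).
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Store table pointers in Glue instead of a Hive Metastore.
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        # Read and write data/metadata files through Iceberg's native S3 client.
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{prefix}.warehouse": warehouse,
    }
```

Pass each key/value pair to `SparkSession.builder.config(key, value)` before `getOrCreate()`; Glue databases and tables are then addressed as `glue.<database>.<table>` in Spark SQL.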
Definition
A fully managed metadata catalog service from AWS that stores table definitions, partition information, and schema metadata for data stored in S3. It serves as the default metastore for AWS analytics services.
S3 has no built-in concept of tables, schemas, or partitions. AWS Glue Catalog provides a centralized metadata registry so that Athena, Redshift Spectrum, EMR, and other engines can discover and query S3 data as structured tables without each engine maintaining its own metadata.
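Roughly, an engine's first step when querying a Glue table is a GetTable call followed by reading the table's storage descriptor. The sketch below extracts the fields an engine needs (S3 location, column types, partition keys) from a boto3 `get_table` response; `TableView` and `describe_table` are hypothetical names. It assumes Iceberg tables are marked with a `table_type` parameter of `ICEBERG` and a `metadata_location` parameter, as Iceberg's GlueCatalog writes them.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class TableView:
    """The subset of Glue table metadata a query engine needs to plan a scan."""
    location: str                     # S3 prefix holding the data files
    columns: List[str]                # data columns as "name:type"
    partition_keys: List[str]         # partition columns as "name:type"
    is_iceberg: bool                  # True if registered by an Iceberg catalog
    metadata_location: Optional[str]  # current Iceberg metadata file, if any


def describe_table(glue: Any, database: str, name: str) -> TableView:
    """Fetch one table via GetTable and flatten its storage descriptor."""
    table = glue.get_table(DatabaseName=database, Name=name)["Table"]
    sd = table.get("StorageDescriptor", {})
    params = table.get("Parameters", {})
    return TableView(
        location=sd.get("Location", ""),
        columns=[f'{c["Name"]}:{c.get("Type", "")}' for c in sd.get("Columns", [])],
        partition_keys=[f'{p["Name"]}:{p.get("Type", "")}'
                        for p in table.get("PartitionKeys", [])],
        is_iceberg=params.get("table_type", "").upper() == "ICEBERG",
        metadata_location=params.get("metadata_location"),
    )
```

For Hive-style tables, the engine would then list partitions and read files under `location`; for Iceberg tables, it follows `metadata_location` to Iceberg's own metadata instead.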
Typical uses: centralized table metadata for S3-based data lakes, an Iceberg catalog backend on AWS, and a schema registry for ETL pipelines.
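Partition metadata for Hive-style tables is read through GetPartitions, which is paginated, so every page is a separate billed request. A minimal sketch of iterating partitions with an optional server-side filter expression, assuming a boto3 Glue client; `iter_partitions` is a hypothetical helper, and boto3's built-in `get_paginator("get_partitions")` is an alternative to the manual `NextToken` loop shown here.

```python
from typing import Any, Dict, Iterator, List, Optional, Tuple


def iter_partitions(
    glue: Any, database: str, table: str, expression: Optional[str] = None
) -> Iterator[Tuple[List[str], str]]:
    """Yield (partition values, S3 location) for every matching partition."""
    kwargs: Dict[str, Any] = {"DatabaseName": database, "TableName": table}
    if expression:
        # Server-side filter on partition keys, e.g. "dt >= '2024-01-01'".
        kwargs["Expression"] = expression
    while True:
        page = glue.get_partitions(**kwargs)  # one billed request per page
        for part in page.get("Partitions", []):
            location = part.get("StorageDescriptor", {}).get("Location", "")
            yield part["Values"], location
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token  # continue where the previous page ended
```

Filtering with `Expression` reduces the number of partitions returned, and usually the number of pages fetched, compared with listing everything and filtering on the client.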
Resources
Official AWS documentation on the Glue Data Catalog, the de facto metadata store for S3-based data lakes on AWS.
Glue Catalog API reference defining programmatic operations for creating, updating, and querying table metadata.
Iceberg's AWS integration module including the Glue Catalog implementation used in production lakehouse deployments.