Technology

AWS Glue Catalog

AWS's fully managed metadata catalog service that stores table definitions, partition information, and schema metadata for data stored in S3, serving as the default metastore for AWS analytics services.


Summary

What it is

AWS's fully managed metadata catalog service that stores table definitions, partition information, and schema metadata for data stored in S3, serving as the default metastore for AWS analytics services.

Where it fits

Glue Catalog is the AWS-native metadata layer that connects data stored in S3 to query engines such as Athena, Redshift Spectrum, and Spark on EMR. It removes the need to run a self-managed Hive Metastore in AWS-centric lakehouse deployments.

Misconceptions / Traps
  • Glue Catalog is not a query engine. It stores metadata only; actual query execution is handled by Athena, Spark, Trino, or other engines.
  • Glue Catalog's Iceberg support requires the Glue-specific catalog implementation. Not all Iceberg features (e.g., branching, tagging) are available through Glue's catalog API.
  • API request pricing can surprise at scale. Catalog requests such as GetTable, GetPartitions, and UpdateTable are metered and billed once usage exceeds the monthly free tier, so high-frequency metadata access patterns amplify cost.
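
One common way to soften the request-cost trap is to cache table metadata in the calling process. The sketch below is a minimal illustration, not a recommended production design: `make_cached_get_table`, `ttl_seconds`, and `clock` are hypothetical names, and `glue_client` is assumed to behave like `boto3.client("glue")`, whose `get_table(DatabaseName=..., Name=...)` call returns a dict with a `"Table"` key.

```python
import time
from typing import Any, Callable, Dict, Tuple


def make_cached_get_table(
    glue_client: Any,
    ttl_seconds: float = 300.0,
    clock: Callable[[], float] = time.monotonic,
) -> Callable[[str, str], Dict[str, Any]]:
    """Wrap glue_client.get_table with a small in-process TTL cache.

    Assumption: glue_client exposes get_table(DatabaseName=..., Name=...)
    and returns {"Table": {...}}, as the boto3 Glue client does.
    """
    # (database, table) -> (time the entry was fetched, table metadata)
    cache: Dict[Tuple[str, str], Tuple[float, Dict[str, Any]]] = {}

    def get_table(database: str, table: str) -> Dict[str, Any]:
        key = (database, table)
        now = clock()
        hit = cache.get(key)
        if hit is not None and now - hit[0] < ttl_seconds:
            return hit[1]  # served locally, no catalog request is made
        response = glue_client.get_table(DatabaseName=database, Name=table)
        cache[key] = (now, response["Table"])
        return response["Table"]

    return get_table
```

The trade-off is staleness: a schema or location change made by another writer may not be visible until the entry expires, so the TTL should be chosen against how often table definitions actually change.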

Key Connections
  • scoped_to Metadata Management — a managed metadata catalog
  • enables Athena, Apache Spark — provides table metadata for query execution
  • alternative_to Hive Metastore — AWS-managed alternative to self-hosted HMS
  • implements Iceberg REST Catalog Spec — supports Iceberg table registration

Definition

What it is

A fully managed metadata catalog service from AWS that stores table definitions, partition information, and schema metadata for data stored in S3. It serves as the default metastore for AWS analytics services.

Why it exists

S3 has no built-in concept of tables, schemas, or partitions. AWS Glue Catalog provides a centralized metadata registry so that Athena, Redshift Spectrum, EMR, and other engines can discover and query S3 data as structured tables without each engine maintaining its own metadata.
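
To make the "metadata registry" idea concrete, the sketch below reads the fields an engine would need to find and interpret a table's files: the S3 location, the column list, and the partition keys. It assumes `glue_client` behaves like `boto3.client("glue")` and that the response follows the Glue `GetTable` shape (`Table.StorageDescriptor.Location`, `Table.StorageDescriptor.Columns`, `Table.PartitionKeys`). The function name `summarize_table` is hypothetical.

```python
from typing import Any, Dict


def summarize_table(glue_client: Any, database: str, table: str) -> Dict[str, Any]:
    """Return the S3 location and column layout recorded for a table.

    Only metadata is read here; no data files in S3 are touched.
    """
    meta = glue_client.get_table(DatabaseName=database, Name=table)["Table"]
    storage = meta.get("StorageDescriptor", {})
    return {
        "location": storage.get("Location"),
        "columns": [(c["Name"], c.get("Type")) for c in storage.get("Columns", [])],
        "partition_keys": [k["Name"] for k in meta.get("PartitionKeys", [])],
    }
```

In real use the client would come from `boto3.client("glue")` with suitable credentials and region. The returned location is just a string such as an `s3://` prefix, and it is the query engine, not the catalog, that then reads the objects under it.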

Primary use cases

Centralized table metadata for S3-based data lakes, Iceberg catalog backend on AWS, schema registry for ETL pipelines.
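
For the Iceberg use case, a Spark session is usually pointed at Glue through catalog properties. The sketch below builds those properties as a plain dict. The property keys and class names follow the Apache Iceberg AWS integration as commonly documented, but the catalog name `glue_catalog`, the warehouse path, and the helper `glue_iceberg_spark_conf` are illustrative assumptions, and the required Iceberg and AWS runtime jars are not shown.

```python
from typing import Dict


def glue_iceberg_spark_conf(catalog_name: str, warehouse: str) -> Dict[str, str]:
    """Build Spark properties that register Glue as an Iceberg catalog.

    catalog_name and warehouse are caller-chosen; nothing here contacts AWS.
    """
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Enables Iceberg SQL extensions in Spark
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        # Declares a Spark catalog backed by Iceberg
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Uses Iceberg's Glue-specific catalog implementation
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        # Default S3 location for new tables
        f"{prefix}.warehouse": warehouse,
        # Reads and writes data files through Iceberg's S3 FileIO
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
    }


conf = glue_iceberg_spark_conf("glue_catalog", "s3://example-bucket/warehouse/")
```

Each key and value could then be passed to `SparkSession.builder.config(key, value)`, after which tables would be addressed as `glue_catalog.<database>.<table>` in Spark SQL.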
