Medallion Architecture

Summary

What it is

A layered data quality pattern — Bronze (raw), Silver (cleansed), Gold (business-ready) — with each layer stored on object storage.

Where it fits

Medallion is one of the most widely adopted data quality patterns within lakehouses. It organizes data on S3 into progressive quality tiers and gives each tier a clear contract, so different consumers can safely read at the quality level they need.
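As a rough illustration of "a clear contract per tier", the sketch below maps each tier to an S3 prefix and a set of required fields. The bucket name, prefixes, field names, and the helpers tier_uri and satisfies are all hypothetical; real layouts and contracts vary by organization and are usually enforced by the table format or a schema registry rather than by hand.

```python
from dataclasses import dataclass

# Hypothetical bucket name, used only for illustration.
BUCKET = "example-lake"


@dataclass(frozen=True)
class TierContract:
    """What a consumer may rely on when reading from a tier."""
    prefix: str
    required_fields: frozenset


# Assumed contracts; the field names are invented for this example.
TIERS = {
    "bronze": TierContract("bronze/", frozenset({"raw_payload", "ingested_at"})),
    "silver": TierContract("silver/", frozenset({"order_id", "amount", "country"})),
    "gold": TierContract("gold/", frozenset({"country", "total_amount"})),
}


def tier_uri(tier: str, dataset: str) -> str:
    """Build the object-storage location of a dataset within a tier."""
    return f"s3://{BUCKET}/{TIERS[tier].prefix}{dataset}/"


def satisfies(tier: str, record: dict) -> bool:
    """Check that a record carries every field the tier's contract requires."""
    return TIERS[tier].required_fields.issubset(record.keys())
```

A reader that only trusts conformed data would resolve tier_uri("silver", "orders") and could reject any record for which satisfies("silver", record) is false.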

Misconceptions / Traps

  • Three layers is a convention, not a rule. Some organizations use two layers; others add more. The pattern is about progressive refinement, not a fixed number of tiers.
  • Medallion does not solve the small files problem and can worsen it. Each tier-to-tier transformation may write many small output files, especially in streaming Silver→Gold pipelines.
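To make the small files trap concrete, the sketch below estimates how many files a streaming job could write in the worst case and groups them into a compaction plan. Compaction is a commonly used mitigation, not something the pattern itself provides; the function names, the one-file-per-partition-per-batch assumption, and the 128 MB target are illustrative assumptions.

```python
def files_written(micro_batches: int, partitions: int) -> int:
    """Worst case: every micro-batch writes one file into every partition."""
    return micro_batches * partitions


def plan_compaction(file_sizes_mb, target_mb=128):
    """Greedily group consecutive small files into bins of at most target_mb.

    A single file larger than target_mb ends up in a bin of its own.
    """
    groups, current, current_size = [], [], 0
    for size in file_sizes_mb:
        # Close the current bin if adding this file would exceed the target.
        if current and current_size + size > target_mb:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups


# One-minute micro-batches for a day, writing into 10 partitions.
daily_files = files_written(micro_batches=24 * 60, partitions=10)

# Assume each file is about 2 MB and plan rewrites toward 128 MB outputs.
plan = plan_compaction([2] * daily_files, target_mb=128)
```

Under these assumptions a single day produces 14,400 files of about 2 MB each, which the plan would rewrite into 225 larger files.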

Key Connections

  • is_a Lakehouse Architecture — a specialization of the lakehouse pattern
  • constrained_by Legacy Ingestion Bottlenecks, Small Files Problem
  • AWS S3 used_by Medallion Architecture — each layer resides on S3
  • Apache Spark, Apache Flink used_by Medallion Architecture — compute engines for tier transformations
  • scoped_to Lakehouse, Data Lake

Definition

What it is

A layered data quality pattern that organizes data into tiers, conventionally three: Bronze (raw), Silver (cleansed/conformed), and Gold (aggregated/business-ready). Each layer is stored on object storage.

Why it exists

Raw data arriving in S3 is messy, inconsistent, and not query-ready. The Medallion pattern provides a structured progression from raw ingestion to business-quality data, with clear contracts at each tier.
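The progression from raw to business-quality data can be sketched in memory with plain Python. This is a minimal illustration, not how a production pipeline would be written (that would typically run on an engine such as Apache Spark or Apache Flink); the order records, field names, and the functions to_silver and to_gold are invented for the example.

```python
from collections import defaultdict

# Bronze: records kept exactly as they arrived, including bad ones.
bronze = [
    {"order_id": "1", "amount": "19.99", "country": "de"},
    {"order_id": "2", "amount": "n/a", "country": "US"},    # malformed amount
    {"order_id": "1", "amount": "19.99", "country": "de"},  # duplicate
    {"order_id": "3", "amount": "5.01", "country": "us "},  # untrimmed country
]


def to_silver(rows):
    """Cleanse and conform: parse types, normalize values, drop bad or duplicate rows."""
    seen, out = set(), []
    for row in rows:
        try:
            order_id = int(row["order_id"])
            amount = float(row["amount"])
        except (KeyError, ValueError):
            continue  # row cannot meet the Silver contract; it stays in Bronze only
        if order_id in seen:
            continue  # keep the first occurrence of each order
        seen.add(order_id)
        country = row.get("country", "").strip().upper()
        out.append({"order_id": order_id, "amount": amount, "country": country})
    return out


def to_gold(rows):
    """Aggregate conformed rows into a business-ready total per country."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["country"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
```

Bronze still holds all four records, so the malformed and duplicate rows remain available for audit or reprocessing, while Silver and Gold consumers only see data that meets their tier's contract.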

Primary use cases

Data lake quality management, incremental data refinement, separation of raw ingestion from analytics-ready data.

Relationships

Inbound Relationships

Resources