Technology

Marquez

The reference implementation for OpenLineage — an open-source metadata and lineage service with a web UI for visualizing data flows across S3-based pipelines.

Summary

Where it fits

Marquez is the backend that makes OpenLineage actionable. It collects lineage events from Spark, Airflow, dbt, and other tools, stores them in a searchable database, and provides a UI for engineers to trace data provenance and debug pipeline failures.

Misconceptions / Traps
  • Marquez requires instrumentation. Pipelines must emit OpenLineage events via integrations or SDKs — lineage does not appear automatically.
  • Metadata storage can become a bottleneck at very high event volumes. Marquez keeps its metadata in a PostgreSQL database, so production deployments need careful indexing and retention policies.
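
The instrumentation requirement above can be made concrete with a minimal sketch of a pipeline step that builds OpenLineage run events and posts them to Marquez. It assumes a local Marquez instance exposing its lineage endpoint at http://localhost:5000/api/v1/lineage. The job name, namespaces, bucket, producer URL, and the pinned schemaURL version are illustrative assumptions, not values from this entry. In practice the Spark, Airflow, or dbt integrations emit these events for you.

```python
import json
import urllib.request
import uuid
from datetime import datetime, timezone

# Assumption: Marquez running locally on its default API port.
MARQUEZ_LINEAGE_URL = "http://localhost:5000/api/v1/lineage"


def build_run_event(event_type, run_id, job_name, inputs, outputs,
                    job_namespace="example-pipelines",
                    dataset_namespace="s3://example-bucket"):
    """Build a minimal OpenLineage run event as a plain dict.

    All namespaces and names here are hypothetical placeholders.
    """
    return {
        "eventType": event_type,  # e.g. "START", "COMPLETE", "FAIL"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [{"namespace": dataset_namespace, "name": n} for n in inputs],
        "outputs": [{"namespace": dataset_namespace, "name": n} for n in outputs],
        # Hypothetical producer identifier for this sketch.
        "producer": "https://example.com/hypothetical-producer",
        # Assumed spec version; pin whichever version your deployment uses.
        "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent",
    }


def send_event(event, url=MARQUEZ_LINEAGE_URL):
    """POST one event to Marquez and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# A START and a COMPLETE event share one runId so they describe the same run.
run_id = str(uuid.uuid4())
start_event = build_run_event("START", run_id, "daily_orders", ["raw/orders"], [])
complete_event = build_run_event(
    "COMPLETE", run_id, "daily_orders", ["raw/orders"], ["curated/orders"]
)
```

Calling send_event(start_event) before the job runs and send_event(complete_event) afterwards would record the run, assuming the endpoint is reachable. Until events like these arrive, the lineage graph stays empty.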

Key Connections
  • implements OpenLineage — reference implementation of the lineage standard
  • enables Lakehouse Architecture — governance and observability layer
  • scoped_to S3, Lakehouse

Definition

What it is

Marquez is an open-source metadata and lineage service that serves as the reference implementation for the OpenLineage standard. It provides a web UI and REST API for collecting, storing, and visualizing data lineage across S3-based data pipelines.
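
As a rough illustration of the REST API side, the sketch below builds the URL for a lineage query. It assumes Marquez serves lineage at GET /api/v1/lineage with a nodeId of the form type:namespace:name and an optional depth parameter; check the API reference for your version before relying on it. The base URL, bucket, and dataset name are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def lineage_url(node_type, namespace, name, depth=5,
                base_url="http://localhost:5000"):
    """Build a lineage query URL for a dataset or job node.

    node_type is "dataset" or "job"; the nodeId layout is an assumption
    about the Marquez API, not something stated in this entry.
    """
    node_id = f"{node_type}:{namespace}:{name}"
    query = urlencode({"nodeId": node_id, "depth": depth})
    return f"{base_url}/api/v1/lineage?{query}"


# Hypothetical dataset living in an S3 bucket.
url = lineage_url("dataset", "s3://example-bucket", "curated/orders", depth=3)

# Decoding the query string recovers the original node identifier.
params = parse_qs(urlsplit(url).query)
```

Fetching this URL with any HTTP client should return the lineage graph around the node as JSON, which is the same data the web UI renders.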

Why it exists

As data pipelines on S3 grow in complexity, engineers need visibility into where data comes from, how it transforms, and where it flows. Marquez collects OpenLineage events from Spark, Airflow, and other tools and provides a searchable, visual lineage graph.

Primary use cases

  • Data lineage visualization for S3 lakehouse pipelines
  • Pipeline debugging and impact analysis
  • Regulatory compliance and data auditing
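
Impact analysis amounts to walking the lineage graph downstream from a changed dataset. The sketch below does this with a breadth-first search over a hand-written graph. The node shape (id, type, outEdges with origin and destination) is an assumption modeled on what a lineage response might look like, and the sample nodes are invented for illustration.

```python
from collections import deque


def downstream_datasets(graph, start_id):
    """Return dataset node ids reachable from start_id, in BFS order."""
    nodes = {node["id"]: node for node in graph}
    seen = {start_id}
    queue = deque([start_id])
    affected = []
    while queue:
        current = queue.popleft()
        for edge in nodes.get(current, {}).get("outEdges", []):
            target = edge["destination"]
            if target in seen:
                continue  # already visited; also guards against cycles
            seen.add(target)
            queue.append(target)
            if nodes.get(target, {}).get("type") == "DATASET":
                affected.append(target)
    return affected


def _node(node_id, node_type, destinations):
    """Helper for building the hypothetical sample graph."""
    return {
        "id": node_id,
        "type": node_type,
        "outEdges": [{"origin": node_id, "destination": d} for d in destinations],
    }


RAW = "dataset:s3://example-bucket:raw/orders"
CLEAN_JOB = "job:example-pipelines:clean_orders"
CURATED = "dataset:s3://example-bucket:curated/orders"
REPORT_JOB = "job:example-pipelines:daily_report"
REPORT = "dataset:s3://example-bucket:reports/daily"

sample_graph = [
    _node(RAW, "DATASET", [CLEAN_JOB]),
    _node(CLEAN_JOB, "JOB", [CURATED]),
    _node(CURATED, "DATASET", [REPORT_JOB]),
    _node(REPORT_JOB, "JOB", [REPORT]),
    _node(REPORT, "DATASET", []),
]

impacted = downstream_datasets(sample_graph, RAW)
```

Here a change to the raw orders dataset flags both the curated table and the daily report as potentially affected. Running the same traversal over incoming edges would give the upstream view used for debugging and audits.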
