Technology

Marquez

The reference implementation for OpenLineage — an open-source metadata and lineage service with a web UI for visualizing data flows across S3-based pipelines.

Summary

What it is

The reference implementation for OpenLineage — an open-source metadata and lineage service with a web UI for visualizing data flows across S3-based pipelines.

Where it fits

Marquez is the backend that makes OpenLineage events usable. It collects lineage events from Spark, Airflow, dbt, and other tools, stores them in a searchable PostgreSQL database, and provides a UI where engineers can trace data provenance and debug pipeline failures.

Misconceptions / Traps

Marquez requires instrumentation. Pipelines must emit OpenLineage events via integrations or SDKs — lineage does not appear automatically.
Metadata storage can become a bottleneck at high event volumes. Every run of every instrumented job writes lineage records, so production deployments need careful indexing and a retention policy for old run data.
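
As a rough illustration of what instrumentation involves, the sketch below builds a minimal OpenLineage run event and posts it to Marquez's lineage endpoint. It assumes a local Marquez instance whose API listens at http://localhost:5000. The namespace, job and dataset names, producer URI, and the spec version inside schemaURL are placeholders chosen for the example, not values from this page. In practice the Spark, Airflow, or dbt integrations usually emit these events for you.

```python
import json
import uuid
from datetime import datetime, timezone
from urllib import request

# Assumed address of a local Marquez API; adjust for your deployment.
MARQUEZ_LINEAGE_URL = "http://localhost:5000/api/v1/lineage"


def build_run_event(event_type, run_id, job_name, inputs, outputs,
                    namespace="example-namespace"):
    """Build a minimal OpenLineage run event as a plain dict."""
    return {
        "eventType": event_type,  # e.g. "START" or "COMPLETE"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [{"namespace": namespace, "name": n} for n in inputs],
        "outputs": [{"namespace": namespace, "name": n} for n in outputs],
        # Hypothetical producer URI identifying the emitting code.
        "producer": "https://example.com/hypothetical-producer",
        # Assumed spec version; use the one your tooling targets.
        "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
    }


def post_event(event, url=MARQUEZ_LINEAGE_URL):
    """POST one event to the lineage endpoint and return the HTTP status."""
    req = request.Request(
        url,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return resp.status


if __name__ == "__main__":
    run_id = str(uuid.uuid4())
    start = build_run_event("START", run_id, "orders_daily", ["raw.orders"], [])
    done = build_run_event("COMPLETE", run_id, "orders_daily",
                           ["raw.orders"], ["curated.orders"])
    print(json.dumps(done, indent=2))
    # Uncomment once a Marquez instance is running:
    # post_event(start)
    # post_event(done)
```

The START and COMPLETE events share one runId, which is how the two are tied to the same run of the job.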

Key Connections

implements OpenLineage — reference implementation of the lineage standard
enables Lakehouse Architecture — governance and observability layer
scoped_to S3, Lakehouse

Definition

What it is

An open-source metadata and lineage service that serves as the reference implementation for the OpenLineage standard. It provides a web UI and a REST API for collecting, storing, and visualizing data lineage across S3-based data pipelines.
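
To give a feel for the REST API, here is a small sketch that builds a lineage query for one dataset or job. It assumes the API is reachable at http://localhost:5000 and that nodes are addressed as type:namespace:name through the nodeId parameter of the lineage endpoint. The helper names, namespace, dataset name, and depth value are illustrative.

```python
import json
from urllib import request
from urllib.parse import urlencode


def lineage_query_url(base_url, node_type, namespace, name, depth=3):
    """Build a lineage-graph query URL for one node ("dataset" or "job")."""
    node_id = f"{node_type}:{namespace}:{name}"
    query = urlencode({"nodeId": node_id, "depth": depth})
    return f"{base_url.rstrip('/')}/api/v1/lineage?{query}"


def fetch_lineage(url):
    """GET the lineage graph and decode the JSON body (needs a live server)."""
    with request.urlopen(url) as resp:
        return json.load(resp)


if __name__ == "__main__":
    url = lineage_query_url("http://localhost:5000", "dataset",
                            "example-namespace", "curated.orders")
    print(url)
    # graph = fetch_lineage(url)  # requires a running Marquez instance
```

The same graph is what the web UI renders, so a query like this is mainly useful for scripting checks or exporting lineage to other tools.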

Why it exists

As data pipelines on S3 grow in complexity, engineers need visibility into where data comes from, how it transforms, and where it flows. Marquez collects OpenLineage events from Spark, Airflow, and other tools and provides a searchable, visual lineage graph.

Primary use cases

Data lineage visualization for S3 lakehouse pipelines; pipeline debugging and impact analysis; regulatory compliance and data auditing.
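
Impact analysis amounts to a graph traversal: starting from a changed dataset, follow the lineage edges to everything downstream. The sketch below shows the idea on a small in-memory graph. The node names and the edge map are made up for illustration, and this is a generic breadth-first search, not Marquez's own implementation.

```python
from collections import deque


def downstream(edges, start):
    """Return every node reachable from start, in breadth-first order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


# Hypothetical lineage: datasets feed jobs, jobs write datasets.
EDGES = {
    "dataset:raw.orders": ["job:clean_orders"],
    "job:clean_orders": ["dataset:curated.orders"],
    "dataset:curated.orders": ["job:daily_report", "job:ml_features"],
    "job:daily_report": ["dataset:reports.revenue"],
}

if __name__ == "__main__":
    for node in downstream(EDGES, "dataset:raw.orders"):
        print(node)
```

Reversing the edge map and running the same function answers the provenance question instead: which upstream jobs and datasets a given table depends on.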

Recent developments

Latest signals

Source mix note: recent coverage of Marquez comes mostly from lineage-tool roundup articles rather than from the project's own posts.

Reference implementation status for OpenLineage holds. Per the MarquezProject/marquez repository, Marquez continues to collect, aggregate, and visualize OpenLineage events from Spark, Airflow, and other pipeline tools. Per BaseDash's "Best data lineage tools 2026" survey, Marquez remains the recommended option for organizations adopting OpenLineage end-to-end without committing to a managed catalog like DataHub or OpenMetadata.