OpenLineage
An open standard that defines a common JSON schema for capturing data lineage events — what datasets were consumed, what was produced, and how transformations connected them.
Summary
OpenLineage is the missing observability layer for S3 lakehouses. As pipelines span Spark, Airflow, Flink, and dbt across multiple S3-backed tables, OpenLineage provides the standard format for stitching lineage together into a complete graph, regardless of which orchestrator runs the job.
- OpenLineage is a standard, not a product. It has no UI — you need a backend like Marquez or Datakin to store and visualize the lineage events.
- Integration quality varies by tool. Some integrations (Spark) are mature; others (Flink) are still developing.
enables: Marquez, the reference implementation that stores and visualizes OpenLineage events
scoped_to: Lakehouse, S3 (lineage tracking for S3 lakehouse pipelines)
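Because OpenLineage only defines the event format, events have to be shipped to a backend before they can be stored or visualized. The following is a minimal sketch of preparing an HTTP request for a Marquez instance. It assumes Marquez is running locally on its default API port (5000) and accepting events at `/api/v1/lineage`; the job name, namespace, and producer URI are hypothetical. The request is built but not sent, so the snippet runs without a backend.

```python
import json
import uuid
import urllib.request

# Assumption: a local Marquez instance on its default API port.
MARQUEZ_URL = "http://localhost:5000/api/v1/lineage"


def build_lineage_request(event, url=MARQUEZ_URL):
    """Wrap an OpenLineage event (a dict) in an HTTP POST request."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical minimal START event; all names are illustrative.
event = {
    "eventType": "START",
    "eventTime": "2025-01-01T00:00:00Z",
    "run": {"runId": str(uuid.uuid4())},
    "job": {"namespace": "example-lakehouse", "name": "daily_orders_rollup"},
    "producer": "https://example.com/pipelines/orders-etl",
}

request = build_lineage_request(event)
print(request.get_method(), request.full_url)

# To actually send the event (requires a reachable backend):
# urllib.request.urlopen(request)
```

In practice the official client libraries handle transport, so hand-built requests like this are mainly useful for understanding what goes over the wire.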
Definition
An open standard for data lineage collection that defines a common JSON schema for capturing metadata about data pipeline runs — what datasets were consumed, what was produced, and what transformations occurred.
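To make the schema concrete, here is a minimal sketch of a run event built as a plain Python dictionary and serialized to JSON. The top-level fields (`eventType`, `eventTime`, `run`, `job`, `inputs`, `outputs`, `producer`, `schemaURL`) follow the OpenLineage run event structure; the bucket names, job name, producer URI, and the pinned spec version in `SCHEMA_URL` are assumptions for illustration and should be checked against the current specification.

```python
import json
import uuid
from datetime import datetime, timezone

# Assumption: pin whichever spec version your backend expects.
SCHEMA_URL = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent"
# Hypothetical URI identifying the code that emitted the event.
PRODUCER = "https://example.com/pipelines/orders-etl"


def s3_dataset(bucket, key):
    """Describe an S3 dataset: bucket as namespace, object key as name."""
    return {"namespace": f"s3://{bucket}", "name": key}


def run_event(event_type, run_id, job_name, inputs, outputs):
    """Build one lineage event for a single run of a job."""
    return {
        "eventType": event_type,  # e.g. START, COMPLETE, FAIL
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id},
        "job": {"namespace": "example-lakehouse", "name": job_name},
        "inputs": inputs,    # datasets consumed
        "outputs": outputs,  # datasets produced
        "producer": PRODUCER,
        "schemaURL": SCHEMA_URL,
    }


event = run_event(
    "COMPLETE",
    str(uuid.uuid4()),
    "daily_orders_rollup",
    inputs=[s3_dataset("example-raw", "orders/2025")],
    outputs=[s3_dataset("example-curated", "orders_daily")],
)
payload = json.dumps(event, indent=2)
print(payload)
```

The same `runId` is reused across the START and COMPLETE events of one run, which is how a backend ties them together.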
Data lineage information was historically locked inside individual pipeline tools (Airflow, Spark, dbt), each with its own format. OpenLineage provides a vendor-neutral, open standard so that lineage events from any tool can be collected, correlated, and queried in a consistent format.
Typical uses: cross-tool data lineage tracking for S3 lakehouse pipelines, regulatory compliance auditing, and pipeline impact analysis and debugging.
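As an illustration of cross-tool tracking and impact analysis, the sketch below stitches events from three different tools into one dataset graph and then walks it to find everything downstream of a given dataset. The events are hypothetical and trimmed to the fields the sketch needs; `build_graph` and `impacted` are illustrative helpers, not part of any OpenLineage library.

```python
from collections import defaultdict, deque

# Hypothetical, trimmed events as three different tools might emit them.
events = [
    {"producer": "https://example.com/spark",
     "job": {"namespace": "etl", "name": "ingest_orders"},
     "inputs": [{"namespace": "s3://raw", "name": "orders"}],
     "outputs": [{"namespace": "s3://lake", "name": "orders_clean"}]},
    {"producer": "https://example.com/dbt",
     "job": {"namespace": "analytics", "name": "orders_daily_model"},
     "inputs": [{"namespace": "s3://lake", "name": "orders_clean"}],
     "outputs": [{"namespace": "s3://lake", "name": "orders_daily"}]},
    {"producer": "https://example.com/airflow",
     "job": {"namespace": "reporting", "name": "export_report"},
     "inputs": [{"namespace": "s3://lake", "name": "orders_daily"}],
     "outputs": [{"namespace": "s3://reports", "name": "finance_summary"}]},
]


def dataset_id(dataset):
    """Datasets are identified by namespace plus name."""
    return f"{dataset['namespace']}/{dataset['name']}"


def build_graph(events):
    """Map each input dataset to the datasets derived from it."""
    downstream = defaultdict(set)
    for event in events:
        for src in event.get("inputs", []):
            for dst in event.get("outputs", []):
                downstream[dataset_id(src)].add(dataset_id(dst))
    return downstream


def impacted(graph, start):
    """Breadth-first search for every dataset downstream of `start`."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


graph = build_graph(events)
for name in sorted(impacted(graph, "s3://raw/orders")):
    print(name)
```

Because every tool identifies datasets the same way, the three jobs join into one chain even though no single tool knows about the others.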
Resources
Official OpenLineage specification site with the JSON schema, integration guides, and ecosystem documentation.
Source repository for the OpenLineage spec with integration libraries for Python, Java, and popular orchestrators.
Overview of the data lineage ecosystem in 2025 covering OpenLineage's role as the emerging standard.