Data Contracts

A formal agreement between data producers and data consumers that specifies the schema, semantics, SLAs, and quality expectations for a dataset, typically enforced as a machine-readable specification applied at ingestion boundaries.

Summary

Where it fits

Data contracts operate at the interface between data producers writing to S3 and consumers querying that data. In lakehouse architectures, they prevent schema drift, enforce data quality at write time, and make the Write-Audit-Publish pattern enforceable rather than advisory.
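As a sketch of that enforcement, the audit step of Write-Audit-Publish can be backed by a contract check: a staged batch is published only when it satisfies the contract. All names below (the contract shape, `audit`, the staged rows) are hypothetical, not a specific framework's API.

```python
# Sketch: a contract-backed audit step in a Write-Audit-Publish flow.
# The contract shape and staged rows are illustrative only.

def audit(rows, contract):
    """Return the rows that violate the contract's required fields."""
    required = [f for f, spec in contract["schema"].items() if spec["required"]]
    return [r for r in rows
            if any(f not in r or r[f] is None for f in required)]

contract = {"schema": {"order_id": {"required": True},
                       "amount":   {"required": True}}}

staged = [{"order_id": "a1", "amount": 10.0},
          {"order_id": "a2", "amount": None}]   # violates the contract

violations = audit(staged, contract)
# Publish only if the audit passes; otherwise the batch stays staged.
publish = not violations
```

In practice the staged batch would live in a staging branch or table on S3, and `publish` would correspond to an atomic swap into the consumer-facing table rather than a boolean.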

Misconceptions / Traps
  • Data contracts are not just JSON Schema files. A useful contract includes ownership, SLAs, quality rules, and semantic definitions — not just structural schema.
  • Enforcing contracts at write time adds latency to ingestion pipelines. The tradeoff between strict enforcement and ingestion throughput must be explicitly managed.
  • Contracts require organizational buy-in from producers. A contract imposed unilaterally by consumers without producer commitment is effectively a validation check, not a contract.

Key Connections
  • scoped_to Table Formats, Lakehouse — enforced at the table boundary on S3
  • enables Write-Audit-Publish — contracts define what "valid" means in the audit step
  • enables Schema Evolution — contracts govern how schemas are allowed to change
  • relates_to Compliance-Aware Architectures — contracts formalize data governance requirements

Definition

What it is

A specification that defines the schema, semantics, quality guarantees, and SLAs of a dataset as a formal agreement between data producers and data consumers. Typically expressed as a YAML or JSON document versioned alongside the data pipeline.
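For illustration, a minimal contract document might look like the following. The shape is hypothetical (shown here as JSON loaded in Python); no particular contract standard is implied.

```python
import json

# A minimal, hypothetical data contract: structural schema plus
# ownership, SLA, and quality rules. Field names are illustrative.
contract = json.loads("""
{
  "dataset": "orders",
  "version": "1.2.0",
  "owner": "payments-team@example.com",
  "schema": {
    "order_id": {"type": "string", "required": true},
    "amount":   {"type": "number", "required": true},
    "currency": {"type": "string", "required": true}
  },
  "sla":     {"freshness_minutes": 60},
  "quality": [{"rule": "amount >= 0"}]
}
""")

print(contract["dataset"], contract["version"])
```

Note that the document carries ownership, an SLA, and quality rules alongside the structural schema, which is what distinguishes a contract from a bare schema file.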

Why it exists

In S3-based data lakes, producers write files without coordination with consumers, leading to schema drift, quality degradation, and broken downstream pipelines. Data contracts establish explicit, enforceable agreements that prevent uncoordinated changes from propagating through the lake.

Primary use cases

Schema enforcement at ingestion boundaries, producer-consumer SLA definition, automated data quality validation in lakehouse pipelines.
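Because contracts also govern how schemas may change, one common enforcement is a compatibility check between contract versions. A minimal sketch, assuming a simple rule that a new version may add optional fields but may not drop fields or make a previously optional field required:

```python
# Sketch: backward-compatibility check between two contract schema
# versions. The rule encoded here is an assumption for illustration,
# not a standard compatibility policy.

def is_backward_compatible(old_schema, new_schema):
    for field, spec in old_schema.items():
        if field not in new_schema:
            return False  # removing a field breaks existing readers
        if new_schema[field]["required"] and not spec["required"]:
            return False  # tightening optionality breaks existing data
    return True

v1 = {"order_id": {"required": True},  "note": {"required": False}}
v2 = {"order_id": {"required": True},  "note": {"required": False},
      "tax":      {"required": False}}            # added optional field
v3 = {"order_id": {"required": True}}             # dropped "note"

ok_change  = is_backward_compatible(v1, v2)
bad_change = is_backward_compatible(v1, v3)
```

A pipeline would run this check in CI when a producer proposes a new contract version, rejecting incompatible changes before any data is written.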
