Architecture

Write-Audit-Publish

Summary

What it is

A data quality pattern where data lands in a raw S3 zone, undergoes validation, and is promoted to a curated zone only after passing audits.

Where it fits

WAP acts as the quality gate for S3 data lakes. Writes are isolated in a staging area, validation checks run against the staged data, and only data that passes is published, so bad data never reaches production consumers.
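The three phases can be sketched in a few functions. This is a minimal illustration, not a production implementation: a Python dict stands in for an S3 bucket, keys under "staging/" play the raw zone and keys under "curated/" the published zone, and the names `write`, `audit`, `publish`, `run_wap` and `has_ids` are all hypothetical.

```python
from typing import Callable

# Hypothetical stand-in for an S3 bucket: object key -> list of records.
Bucket = dict[str, list[dict]]
Check = Callable[[list[dict]], bool]


def write(bucket: Bucket, batch_id: str, records: list[dict]) -> str:
    """WRITE: land the batch in the staging zone, which consumers never read."""
    key = f"staging/{batch_id}"
    bucket[key] = list(records)
    return key


def audit(records: list[dict], checks: list[Check]) -> bool:
    """AUDIT: the batch is publishable only if every check passes."""
    return all(check(records) for check in checks)


def publish(bucket: Bucket, staging_key: str) -> str:
    """PUBLISH: move the staged batch into the curated zone.

    On real S3 this is a copy followed by a delete per object, which is not
    atomic across objects; branch-capable table formats avoid that gap.
    """
    curated_key = "curated/" + staging_key.removeprefix("staging/")
    bucket[curated_key] = bucket.pop(staging_key)
    return curated_key


def run_wap(bucket: Bucket, batch_id: str, records: list[dict], checks: list[Check]) -> bool:
    key = write(bucket, batch_id, records)
    if not audit(bucket[key], checks):
        return False  # failed batches stay in staging for inspection
    publish(bucket, key)
    return True


def has_ids(rows: list[dict]) -> bool:
    """Example check: every record has a non-null "id"."""
    return all(row.get("id") is not None for row in rows)


bucket: Bucket = {}
ok = run_wap(bucket, "2024-06-01", [{"id": 1}, {"id": 2}], [has_ids])
bad = run_wap(bucket, "2024-06-02", [{"id": None}], [has_ids])
print(ok, bad, sorted(bucket))  # True False ['curated/2024-06-01', 'staging/2024-06-02']
```

Keeping failed batches in staging, rather than deleting them, leaves them available for debugging without ever exposing them to consumers.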

Misconceptions / Traps

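  • WAP requires write isolation, such as branches or snapshot isolation. Table formats with branch support (Apache Iceberg) or versioning layers such as lakeFS, which adds Git-like branches on top of object storage, provide this natively. On raw S3 prefixes, isolation and promotion must be built by hand, which is error-prone: S3 has no atomic multi-object rename, so an interrupted promotion can leave consumers reading a mix of old and new files.
  • Audits and the publish step must be idempotent. If a batch fails its audit and is corrected and re-submitted, re-running the pipeline must not publish duplicate records.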
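One way to make re-submission safe is an idempotent publish step. The sketch below assumes, hypothetically, that every batch has a stable batch ID and that records carry a primary key "id". Publishing overwrites the batch's slot instead of appending to it, and a content fingerprint turns an exact replay into a no-op. `CuratedZone` and `fingerprint` are illustrative names, not a library API.

```python
import hashlib
import json


def fingerprint(records: list[dict]) -> str:
    """Content hash of a batch that does not depend on record order."""
    canonical = sorted(json.dumps(r, sort_keys=True) for r in records)
    return hashlib.sha256("\n".join(canonical).encode()).hexdigest()


class CuratedZone:
    """Hypothetical published zone with overwrite-per-batch semantics."""

    def __init__(self) -> None:
        self.partitions: dict[str, list[dict]] = {}  # batch_id -> published rows
        self.ledger: dict[str, str] = {}  # batch_id -> fingerprint of live content

    def publish(self, batch_id: str, records: list[dict]) -> bool:
        """Publish a batch; return False if identical content is already live."""
        fp = fingerprint(records)
        if self.ledger.get(batch_id) == fp:
            return False  # exact replay of a published batch: nothing to do
        # Deduplicate on the primary key inside the batch (last write wins).
        deduped = {r["id"]: r for r in records}
        # Replace the whole slot instead of appending, so retries never duplicate rows.
        self.partitions[batch_id] = list(deduped.values())
        self.ledger[batch_id] = fp
        return True

    def row_count(self) -> int:
        return sum(len(rows) for rows in self.partitions.values())


zone = CuratedZone()
print(zone.publish("b1", [{"id": 1, "v": "x"}, {"id": 2, "v": "y"}]))   # True
print(zone.publish("b1", [{"id": 2, "v": "y"}, {"id": 1, "v": "x"}]))   # False: same content
print(zone.publish("b1", [{"id": 1, "v": "x2"}, {"id": 2, "v": "y"}]))  # True: corrected batch replaces
print(zone.row_count())  # 2
```

Audits themselves stay idempotent as long as they are pure functions of the staged data, with no side effects such as incrementing counters or appending to logs that feed later decisions.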

Key Connections

  • depends_on S3 API — data lands in S3 for staging
  • solves Schema Evolution — catches incompatible changes before they affect consumers
  • scoped_to Data Lake, S3
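Catching incompatible schema changes can be one of the audits. The sketch below compares the currently published schema with the schema of a staged batch. Schemas are plain dicts of column name to type name, and the rules are a simplified assumption loosely modelled on common table-format promotion rules: adding a column is allowed, dropping one is not, and a type change passes only if it is a listed widening. `SAFE_WIDENINGS` and `incompatible_changes` are hypothetical names.

```python
# Assumed set of safe type promotions (old type, new type).
SAFE_WIDENINGS = {("int", "long"), ("float", "double")}


def incompatible_changes(published: dict[str, str], incoming: dict[str, str]) -> list[str]:
    """Return a description of every change that would break existing readers."""
    problems = []
    for col, old_type in published.items():
        if col not in incoming:
            problems.append(f"dropped column: {col}")
        elif incoming[col] != old_type and (old_type, incoming[col]) not in SAFE_WIDENINGS:
            problems.append(f"type change on {col}: {old_type} -> {incoming[col]}")
    # Columns present only in `incoming` are additions and are allowed.
    return problems


published = {"id": "long", "amount": "float", "country": "string"}
incoming = {"id": "long", "amount": "double", "region": "string"}
print(incompatible_changes(published, incoming))  # ['dropped column: country']
```

An empty list means the staged batch can be published without breaking readers of the existing schema.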

Definition

What it is

A data quality pattern where incoming data lands in a raw S3 zone, undergoes automated validation (schema checks, data quality rules), and is promoted to a curated zone only after passing audits.
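The audit step itself is usually a function from a staged batch to a list of failures, where an empty list means the batch may be promoted. The sketch below combines a schema check with a few example data quality rules; the expected schema, the allowed currencies, and the name `audit_batch` are hypothetical choices for illustration, not a standard rule set.

```python
# Hypothetical expectations for an "orders" batch.
EXPECTED_SCHEMA = {"order_id": int, "amount": float, "currency": str}
ALLOWED_CURRENCIES = {"USD", "EUR", "GBP"}


def audit_batch(records: list[dict]) -> list[str]:
    """Return human-readable failures; an empty list means the batch passes."""
    failures = []
    seen_ids = set()
    for i, row in enumerate(records):
        # Schema check: every expected column is present with the expected type.
        for col, typ in EXPECTED_SCHEMA.items():
            if col not in row or not isinstance(row[col], typ):
                failures.append(f"row {i}: {col} missing or not {typ.__name__}")
        # Quality rules, applied only to values that passed the type check.
        amount = row.get("amount")
        if isinstance(amount, float) and amount < 0:
            failures.append(f"row {i}: negative amount {amount}")
        currency = row.get("currency")
        if isinstance(currency, str) and currency not in ALLOWED_CURRENCIES:
            failures.append(f"row {i}: unknown currency {currency}")
        oid = row.get("order_id")
        if isinstance(oid, int):
            if oid in seen_ids:
                failures.append(f"row {i}: duplicate order_id {oid}")
            seen_ids.add(oid)
    return failures


batch = [
    {"order_id": 1, "amount": 19.99, "currency": "USD"},
    {"order_id": 1, "amount": -5.0, "currency": "JPY"},
    {"order_id": 2, "amount": "12", "currency": "EUR"},
]
for failure in audit_batch(batch):
    print(failure)
# row 1: negative amount -5.0
# row 1: unknown currency JPY
# row 1: duplicate order_id 1
# row 2: amount missing or not float
```

Collecting every failure, rather than stopping at the first, gives the data producer a complete report to fix before re-submitting.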

Why it exists

Publishing unchecked data directly into production zones can break downstream consumers, for example by failing jobs or producing incorrect reports. This pattern gates promotion behind validation, so only data that has passed its audits reaches consumers.

Primary use cases

Data lake quality gates, regulated data pipelines, self-service data onboarding with automated validation.

Relationships

Outbound Relationships

scoped_to
depends_on

Resources