Write-Audit-Publish
Summary
What it is
A data quality pattern where data lands in a raw S3 zone, undergoes validation, and is promoted to a curated zone only after passing audits.
Where it fits
WAP is the quality gate for S3 data lakes. It prevents bad data from reaching production consumers by isolating writes in a staging area, running validation checks, and only publishing data that passes.
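The three stages described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the `Lake` class uses in-memory dictionaries as stand-ins for a staging prefix and a curated prefix in S3, and every name in it (`Lake`, `write`, `audit`, `publish`, `run_wap`) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Rows = List[dict]
Check = Callable[[Rows], bool]


@dataclass
class Lake:
    """In-memory stand-in for two S3 prefixes (assumed layout, not a real API)."""
    staging: Dict[str, Rows] = field(default_factory=dict)  # e.g. a raw/ prefix
    curated: Dict[str, Rows] = field(default_factory=dict)  # e.g. a curated/ prefix


def write(lake: Lake, batch_id: str, rows: Rows) -> None:
    # Write: land the batch in staging only; consumers never read from here.
    lake.staging[batch_id] = list(rows)


def audit(lake: Lake, batch_id: str, checks: List[Check]) -> bool:
    # Audit: every check must pass against the staged copy.
    return all(check(lake.staging[batch_id]) for check in checks)


def publish(lake: Lake, batch_id: str) -> None:
    # Publish: move the staged batch into the curated zone.
    lake.curated[batch_id] = lake.staging.pop(batch_id)


def run_wap(lake: Lake, batch_id: str, rows: Rows, checks: List[Check]) -> bool:
    write(lake, batch_id, rows)
    if not audit(lake, batch_id, checks):
        return False  # the failed batch stays in staging for inspection
    publish(lake, batch_id)
    return True
```

A batch that fails any check never appears in `lake.curated`, which is the property the pattern is meant to guarantee. On real S3 the publish step would be a copy or a metadata commit rather than a dictionary move.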
Misconceptions / Traps
- WAP depends on branching or snapshot isolation to be practical. Without a table format that supports branches (Iceberg) or a versioning layer with staging areas (lakeFS), implementing WAP on raw S3 is manual and error-prone.
- Audit logic must be idempotent. If audits fail and data is re-submitted, the system must handle duplicates gracefully.
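One way to make re-submission safe is to key each batch by a hash of its content, so the same payload always maps to the same staging and curated entry. The sketch below assumes that approach; `batch_key` and `submit` are hypothetical names, and plain dictionaries stand in for the staging and curated zones.

```python
import hashlib
import json


def batch_key(records: list) -> str:
    """Deterministic key: the same payload always maps to the same key."""
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def submit(staging: dict, curated: dict, records: list, is_valid) -> str:
    """Stage, audit, and publish a batch; safe to call repeatedly."""
    key = batch_key(records)
    if key in curated:
        return "already_published"  # re-submitting a published batch is a no-op
    staging[key] = records          # overwrites any earlier staged copy
    if not is_valid(records):
        return "rejected"           # stays staged once, however often it is retried
    curated[key] = staging.pop(key)
    return "published"
```

Note that `sort_keys=True` normalizes field order within each record but not the order of records in the list, so a reordered batch would hash differently. Whether that counts as a duplicate is a design decision this sketch leaves open.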
Key Connections
- depends_on S3 API: data lands in S3 for staging
- solves Schema Evolution: catches incompatible changes before they affect consumers
- scoped_to Data Lake, S3
Definition
What it is
A data quality pattern where incoming data lands in a raw S3 zone, undergoes automated validation (schema checks, data quality rules), and is promoted to a curated zone only after passing audits.
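As an illustration of the two kinds of automated validation mentioned above, the following sketch runs a schema check and a pair of data quality rules over a staged batch. The schema, the field names, and the rules themselves (no negative amounts, no duplicate `order_id`) are assumptions chosen for the example, not part of the pattern.

```python
# Hypothetical expected schema for an orders feed.
EXPECTED_SCHEMA = {"order_id": int, "amount": float, "currency": str}


def schema_errors(rows, schema=EXPECTED_SCHEMA):
    """Schema check: required fields are present and have the expected type."""
    errors = []
    for i, row in enumerate(rows):
        missing = schema.keys() - row.keys()
        if missing:
            errors.append(f"row {i}: missing {sorted(missing)}")
        for name, expected in schema.items():
            if name in row and not isinstance(row[name], expected):
                errors.append(f"row {i}: {name} is not {expected.__name__}")
    return errors


def quality_errors(rows):
    """Data quality rules: no negative amounts, no duplicate order ids."""
    errors = []
    seen = set()
    for i, row in enumerate(rows):
        amount = row.get("amount")
        if isinstance(amount, (int, float)) and amount < 0:
            errors.append(f"row {i}: negative amount")
        order_id = row.get("order_id")
        if order_id is not None:
            if order_id in seen:
                errors.append(f"row {i}: duplicate order_id {order_id}")
            seen.add(order_id)
    return errors


def audit_batch(rows):
    """A batch is eligible for promotion only if this returns an empty list."""
    return schema_errors(rows) + quality_errors(rows)
```

Returning a list of error messages rather than a bare boolean makes a rejected batch easier to diagnose while it sits in the raw zone.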
Why it exists
Publishing unchecked data directly into production zones causes downstream breakage. This pattern gates promotion behind validation, ensuring only quality data reaches consumers.
Primary use cases
Data lake quality gates, regulated data pipelines, self-service data onboarding with automated validation.
Resources
- lakeFS's engineering blog post on the WAP pattern, explaining the write-audit-publish workflow with practical implementation guidance for data lakes.
- Apache Iceberg documentation on branch-based WAP workflows, in which staged snapshots stay hidden from readers so the audit step can run before publishing to production.
- Analysis of how Netflix implements WAP using Iceberg's hidden snapshots and internal auditor tools to guarantee data quality.