Architecture

Write-Audit-Publish

Summary

What it is

A data quality pattern where data lands in a raw S3 zone, undergoes validation, and is promoted to a curated zone only after passing audits.

Where it fits

Write-Audit-Publish (WAP) acts as a quality gate for S3 data lakes. It keeps bad data from reaching production consumers by isolating writes in a staging area, running validation checks against the staged data, and publishing only the data that passes.
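The three steps can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: a plain dictionary stands in for the bucket, and the `staging/` and `curated/` prefixes and all function names are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory stand-in for a bucket: object key -> list of records.
Store = Dict[str, List[dict]]
Audit = Callable[[List[dict]], bool]


def write(store: Store, batch_id: str, records: List[dict]) -> str:
    """Write step: land the batch under a staging prefix consumers do not read."""
    key = f"staging/{batch_id}"
    store[key] = list(records)
    return key


def audit(store: Store, staging_key: str, audits: List[Audit]) -> bool:
    """Audit step: the batch is publishable only if every check passes."""
    return all(check(store[staging_key]) for check in audits)


def publish(store: Store, staging_key: str) -> str:
    """Publish step: promote the audited batch to the curated prefix."""
    curated_key = staging_key.replace("staging/", "curated/", 1)
    store[curated_key] = store.pop(staging_key)
    return curated_key


def run_wap(store: Store, batch_id: str, records: List[dict],
            audits: List[Audit]) -> bool:
    """Run write, audit, publish; return True only if the batch was published."""
    key = write(store, batch_id, records)
    if not audit(store, key, audits):
        return False  # failed batch stays in staging for inspection
    publish(store, key)
    return True
```

A failed batch is deliberately left under the staging prefix so it can be inspected, while consumers that read only the curated prefix never see it.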

Misconceptions / Traps

WAP requires branching or snapshot isolation. Without a table format that supports branches (Iceberg) or a data versioning layer that provides isolated staging areas (lakeFS), implementing WAP on raw S3 is manual and error-prone.
Audit logic must be idempotent. If an audit fails and the data is resubmitted, rerunning the pipeline must not publish duplicate records.
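One way to make resubmission safe is to derive the batch key from the batch content, so the same data always maps to the same key. The sketch below assumes JSON-serializable records and uses a dictionary as a stand-in for the curated zone; `batch_key` and `submit` are hypothetical names, and content hashing is only one of several possible deduplication strategies.

```python
import hashlib
import json
from typing import Dict, List


def batch_key(records: List[dict]) -> str:
    """Deterministic key: identical content always maps to the same key."""
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]


def submit(curated: Dict[str, List[dict]], records: List[dict]) -> bool:
    """Publish a batch once; return False if it was already published."""
    key = batch_key(records)
    if key in curated:
        return False  # resubmission of the same content is a no-op
    curated[key] = records
    return True
```

Because the key depends only on content, submitting the same batch twice leaves the curated zone unchanged.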

Key Connections

depends_on S3 API — data lands in S3 for staging
solves Schema Evolution — catches incompatible changes before they affect consumers
scoped_to Data Lake, S3
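As an illustration of the Schema Evolution connection, an audit can compare the incoming schema with the one consumers currently rely on. This sketch assumes schemas are simple column-to-type-name mappings and treats added columns as compatible; both are assumptions made for the example, not rules from any specific table format.

```python
from typing import Dict, List


def incompatible_changes(current: Dict[str, str],
                         incoming: Dict[str, str]) -> List[str]:
    """List changes that would break consumers of the current schema."""
    problems = []
    for col, typ in current.items():
        if col not in incoming:
            problems.append(f"dropped column: {col}")
        elif incoming[col] != typ:
            problems.append(f"type change: {col} {typ} -> {incoming[col]}")
    # Columns present only in `incoming` are treated as compatible additions.
    return problems
```

An empty result means the batch can proceed to the remaining audits; any entry blocks promotion.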

Definition

What it is

A data quality pattern where incoming data lands in a raw S3 zone, undergoes automated validation (schema checks, data quality rules), and is promoted to a curated zone only after passing audits.
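The two kinds of validation named here, schema checks and data quality rules, might look like the following sketch. The `order_id` and `amount` columns, the expected schema, and the specific rules are hypothetical examples.

```python
from typing import Dict, List

# Hypothetical expected schema: column name -> Python type.
EXPECTED_SCHEMA: Dict[str, type] = {"order_id": int, "amount": float}


def schema_errors(rows: List[dict], schema: Dict[str, type]) -> List[str]:
    """Schema check: every row has the expected columns with the expected types."""
    errors = []
    for i, row in enumerate(rows):
        missing = schema.keys() - row.keys()
        if missing:
            errors.append(f"row {i}: missing columns {sorted(missing)}")
        for col, typ in schema.items():
            if col in row and not isinstance(row[col], typ):
                errors.append(f"row {i}: {col} is not {typ.__name__}")
    return errors


def quality_errors(rows: List[dict]) -> List[str]:
    """Data quality rules: unique order ids and no negative amounts."""
    errors = []
    ids = [r.get("order_id") for r in rows]
    if len(ids) != len(set(ids)):
        errors.append("duplicate order_id values")
    if any(isinstance(r.get("amount"), float) and r["amount"] < 0 for r in rows):
        errors.append("negative amount")
    return errors
```

A batch would be promoted to the curated zone only when both functions return an empty list.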

Why it exists

Publishing unchecked data directly into production zones can break downstream consumers. This pattern gates promotion behind validation, so that only data that has passed its audits reaches them.

Primary use cases

Data lake quality gates, regulated data pipelines, self-service data onboarding with automated validation.

Relationships

Outbound Relationships

scoped_to

Data Lake, S3

depends_on

S3 API

solves

Schema Evolution

Resources

Blog (High)

lakefs.io/blog/data-engineering-patterns-write-audit-publish...

lakeFS engineering blog post on the WAP pattern, explaining the write-audit-publish workflow with practical implementation guidance for data lakes.

Docs (High)

iceberg.apache.org/docs/latest/

Apache Iceberg documentation covering native branch-based WAP workflows, in which staged snapshots stay hidden from readers so the audit step can run before the data is published to production.

Blog (Medium)

vutr.substack.com/p/how-does-netflix-ensure-the-data

Analysis of how Netflix implements WAP using Iceberg's hidden snapshots and internal auditor tools to guarantee data quality.