Compliance-Aware Architectures

Summary

What it is

Lakehouse design patterns that embed regulatory requirements (GDPR, CCPA, HIPAA, SOX) directly into the data architecture rather than bolting compliance on as an afterthought, covering data retention, access control, audit trails, and deletion rights.

Where it fits

Compliance-aware architectures are the governance wrapper around S3-based lakehouses. They combine encryption, row/column security, PII tokenization, Object Lock, and audit logging into a cohesive design that satisfies regulatory requirements while maintaining analytics utility.
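
As a rough illustration of how two of these pieces can work together, the sketch below combines keyed PII tokenization with a column-level policy, so that a restricted column stays joinable for analytics without exposing the raw value. It is a minimal sketch under stated assumptions: `TOKEN_KEY`, `COLUMN_POLICY`, the role names, and the `tokenize` and `project_row` helpers are all hypothetical, and a real deployment would fetch the key from a key management service rather than hard-coding it.

```python
import hashlib
import hmac

# Hypothetical secret for illustration only; a real system would load it from a KMS.
TOKEN_KEY = b"example-key-not-for-production"

# Hypothetical column policy: roles allowed to read each sensitive column in the clear.
COLUMN_POLICY = {
    "email": {"privacy_officer"},
    "diagnosis": {"clinician", "privacy_officer"},
}


def tokenize(value: str, key: bytes = TOKEN_KEY) -> str:
    """Return a deterministic keyed token (HMAC-SHA256 hex digest) for a PII value."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def project_row(row: dict, role: str) -> dict:
    """Return a copy of the row with restricted columns tokenized for this role."""
    out = {}
    for column, value in row.items():
        allowed_roles = COLUMN_POLICY.get(column)
        if allowed_roles is None or role in allowed_roles:
            out[column] = value  # unrestricted column, or the role is cleared
        else:
            out[column] = tokenize(str(value))  # pseudonymized, still joinable
    return out


row = {"user_id": 7, "email": "ana@example.com", "diagnosis": "J45"}
analyst_view = project_row(row, "analyst")      # email and diagnosis tokenized
clinician_view = project_row(row, "clinician")  # only email tokenized
```

Because the token is deterministic for a given key, the same email maps to the same token in every view, which preserves joins and counts. The trade-off is that anyone holding the key can recompute tokens for guessed values, so the key needs the same protection as the data.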

Misconceptions / Traps
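  • Compliance is not just access control. GDPR's right to erasure requires deleting specific records, but Parquet files are immutable: each affected file must be rewritten, or the delete must be recorded and applied at read time until compaction. Older table snapshots that still reference the data must also be expired before it is physically gone, which makes erasure expensive in table formats.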
  • Retention policies on S3 (lifecycle rules, Object Lock) operate at the object level, not the record level. Deleting a single row from a Parquet file requires rewriting the entire file.
  • Compliance requirements differ by regulation. A HIPAA-compliant lakehouse may not satisfy GDPR, and vice versa. Architectures must be designed for the union of applicable regulations.
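
The gap between object-level storage and record-level erasure can be sketched with a toy model. Here each "object" is simply a list of rows held in memory, standing in for a Parquet file. The `erase_subject` function, the `subject_id` field, and the object keys are hypothetical names chosen for illustration, and the sketch ignores snapshots, versioning, and Object Lock retention, all of which would add further steps in practice.

```python
def erase_subject(objects: dict, subject_id: str):
    """Remove all rows for subject_id, rewriting only the objects that contain them.

    Returns (result, stats). `result` maps object keys to their remaining rows;
    `stats` counts how much rewriting the erasure required.
    """
    result = {}
    stats = {"objects_rewritten": 0, "rows_removed": 0, "rows_copied": 0}
    for key, rows in objects.items():
        kept = [r for r in rows if r["subject_id"] != subject_id]
        removed = len(rows) - len(kept)
        if removed == 0:
            result[key] = rows  # object does not contain the subject, left as-is
            continue
        stats["objects_rewritten"] += 1
        stats["rows_removed"] += removed
        stats["rows_copied"] += len(kept)  # unrelated rows copied into the new object
        if kept:
            result[key] = kept  # an object left empty is dropped entirely
    return result, stats


objects = {
    "part-0.parquet": [{"subject_id": "a"}, {"subject_id": "b"}, {"subject_id": "c"}],
    "part-1.parquet": [{"subject_id": "b"}],
    "part-2.parquet": [{"subject_id": "c"}, {"subject_id": "d"}],
}
remaining, stats = erase_subject(objects, "b")
```

In this example two rows are removed, but two objects have to be rewritten and two unrelated rows copied along the way. On real tables the ratio of copied to removed rows is typically far higher, which is one reason to partition or cluster by data subject where erasure requests are frequent.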

Key Connections
  • depends_on Encryption / KMS — encryption at rest is a baseline requirement
  • depends_on Row / Column Security — fine-grained access control for regulated data
  • depends_on Audit Trails — tamper-evident logging for compliance evidence
  • depends_on PII Tokenization — data minimization and pseudonymization
  • scoped_to Lakehouse, S3 — compliance within S3-based data architectures
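
One common way to make an audit trail tamper-evident is to chain entries by hash, so that altering any earlier entry invalidates every later one. The sketch below shows the idea with the Python standard library only. The `append_event` and `verify` helpers, the `GENESIS` value, and the event fields are hypothetical. The source does not prescribe this technique, and a production design would also need to anchor the latest hash somewhere the log writer cannot modify.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting value for the first entry's prev_hash


def _digest(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with a canonical form of the event."""
    payload = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(log: list, event: dict) -> dict:
    """Append an event to the log, linking it to the previous entry by hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"event": event, "prev_hash": prev_hash, "hash": _digest(prev_hash, event)}
    log.append(entry)
    return entry


def verify(log: list) -> bool:
    """Return True if every entry still matches its hash and links to its predecessor."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


audit_log = []
append_event(audit_log, {"actor": "etl_job", "action": "read", "table": "patients"})
append_event(audit_log, {"actor": "analyst_1", "action": "query", "table": "patients"})
chain_ok = verify(audit_log)
```

Editing the `actor` field of the first entry after the fact would cause `verify` to return `False`, because the stored hash no longer matches the recomputed one.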

Definition

What it is

System designs for S3-based data lakes that embed regulatory compliance requirements (GDPR, HIPAA, SOX, data residency laws) directly into the architecture rather than bolting them on after the fact.

Why it exists

Retrofitting compliance onto an existing data lake is expensive and error-prone. Compliance-aware architectures build in encryption, access control, audit logging, retention management, and data residency constraints from the initial design, reducing regulatory risk.
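
Building these controls in from the initial design can be approximated by a check that runs before a dataset is ever created. The following is a minimal sketch, assuming a hypothetical `DatasetSpec` record and a hypothetical `ALLOWED_REGIONS` residency constraint. The field names and the specific rules are illustrative and are not taken from any regulation or product.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical residency constraint: regions where regulated data may be stored.
ALLOWED_REGIONS = frozenset({"eu-central-1", "eu-west-1"})


@dataclass(frozen=True)
class DatasetSpec:
    """Hypothetical description of a dataset, declared before it is provisioned."""
    name: str
    region: str
    encrypted_at_rest: bool = False
    audit_logging: bool = False
    access_policy: Optional[str] = None
    retention_days: Optional[int] = None


def compliance_gaps(spec: DatasetSpec) -> list:
    """Return a list of human-readable gaps; an empty list means the spec passes."""
    gaps = []
    if not spec.encrypted_at_rest:
        gaps.append("encryption at rest is not enabled")
    if spec.access_policy is None:
        gaps.append("no access policy is attached")
    if not spec.audit_logging:
        gaps.append("audit logging is not enabled")
    if spec.retention_days is None:
        gaps.append("no retention period is defined")
    if spec.region not in ALLOWED_REGIONS:
        gaps.append(f"region {spec.region} violates the residency constraint")
    return gaps


compliant = DatasetSpec("claims", "eu-central-1", True, True, "claims_readers", 2555)
legacy = DatasetSpec("clickstream", "us-east-1", encrypted_at_rest=True)

compliant_gaps = compliance_gaps(compliant)  # no gaps for this spec
legacy_gaps = compliance_gaps(legacy)        # four gaps: policy, audit, retention, region
```

Running a check like this in a provisioning pipeline means a non-compliant dataset is rejected before any data lands in it, which is the cheaper point at which to catch the problem.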

Primary use cases

Regulated industry data lakes (healthcare, finance), GDPR-compliant lakehouse design, sovereign cloud data architectures.
