Architecture

Audit Trails

Summary

What it is

The practice of recording a tamper-evident history of all data access, modification, and governance events within an S3-based lakehouse, enabling regulatory compliance and forensic investigation.

Where it fits

Audit trails span the full stack from S3 access logs (who read/wrote which objects) through table format commit history (which snapshots were created when) to catalog-level governance events (who changed table permissions). They are a mandatory component of compliance-aware architectures.
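
The layering described above can be sketched as a single event shape shared by all three layers, merged into one timeline. This is a minimal illustration, not a prescribed schema: the `AuditEvent` fields, the layer labels, and the sample principals and paths are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List


@dataclass(frozen=True)
class AuditEvent:
    """One normalized audit record (hypothetical schema)."""
    timestamp: datetime
    layer: str       # "storage", "table", or "catalog" (illustrative labels)
    principal: str   # who performed the action
    action: str      # what was done
    resource: str    # object key, table name, or catalog entity


def build_timeline(*sources: Iterable[AuditEvent]) -> List[AuditEvent]:
    """Merge events from every layer into one list ordered by timestamp."""
    merged = [event for source in sources for event in source]
    return sorted(merged, key=lambda e: e.timestamp)


def ts(hour: int, minute: int) -> datetime:
    """Helper for building sample timestamps on an arbitrary example day."""
    return datetime(2024, 1, 15, hour, minute, tzinfo=timezone.utc)


# Sample events, one per layer (all values are made up for illustration).
storage_events = [
    AuditEvent(ts(9, 2), "storage", "etl-role", "PutObject",
               "s3://lake/orders/data/part-0001.parquet"),
]
table_events = [
    AuditEvent(ts(9, 3), "table", "etl-role", "commit-snapshot", "orders"),
]
catalog_events = [
    AuditEvent(ts(8, 55), "catalog", "admin-alice", "grant-select", "orders"),
]

timeline = build_timeline(storage_events, table_events, catalog_events)
for e in timeline:
    print(e.timestamp.isoformat(), e.layer, e.principal, e.action, e.resource)
```

Reading the merged timeline top to bottom shows the permission grant, then the object write, then the snapshot commit, which is the cross-layer view that no single log source provides on its own.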

Misconceptions / Traps
  • S3 server access logs and CloudTrail events are necessary but not sufficient. They record request-level operations (for example, an HTTP GET or PUT on an object key), not semantic operations (e.g., "user X queried customer PII in table Y").
  • Table format commit metadata provides a logical audit trail (who committed which changes) but does not capture read access. A full audit trail requires both write-side and read-side logging.
  • Audit logs stored on S3 must themselves be protected with Object Lock (WORM) to prevent tampering. An audit trail is only trustworthy if its logs are immutable.
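
As a rough sketch of the last point, an audit log object can be written with a retention period so it cannot be overwritten or deleted until that period ends. The helper names, bucket, and key below are hypothetical. `ObjectLockMode` and `ObjectLockRetainUntilDate` are the parameter names that boto3's `put_object` accepts, and the example assumes the target bucket was created with Object Lock enabled.

```python
from datetime import datetime, timedelta, timezone


def object_lock_put_args(bucket, key, body, retain_days, now=None):
    """Build put_object keyword arguments for a WORM-protected audit log object."""
    now = now or datetime.now(timezone.utc)
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        # COMPLIANCE mode: the retention cannot be shortened or removed
        # by any user until the retain-until date has passed.
        "ObjectLockMode": "COMPLIANCE",
        "ObjectLockRetainUntilDate": now + timedelta(days=retain_days),
    }


def write_audit_object(s3_client, bucket, key, body, retain_days, now=None):
    """Write one audit log object through an S3 client passed in by the caller."""
    args = object_lock_put_args(bucket, key, body, retain_days, now=now)
    return s3_client.put_object(**args)
```

In practice `s3_client` would be something like `boto3.client("s3")`. The retention length is a policy decision that depends on the applicable regulation, so it is left as a parameter here.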

Key Connections
  • scoped_to Lakehouse, S3 — audit logging across the S3 data stack
  • depends_on Object Lock / WORM Semantics — immutable audit log storage
  • enables Compliance-Aware Architectures — audit trails are a regulatory requirement
  • depends_on OpenMetadata, DataHub — metadata platforms that track governance events

Definition

What it is

The pattern of recording all data access, modification, and administrative operations on S3-stored datasets into immutable, append-only logs for compliance, forensics, and governance purposes.
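
One way to make an append-only log tamper-evident at the application level is hash chaining, where each entry stores a hash of the previous one. This technique is not prescribed by the text above and complements storage-level immutability rather than replacing it. The sketch below uses hypothetical function names and sample events.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash, event):
    """Hash the previous entry's hash together with a canonical JSON encoding of the event."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_event(log, event):
    """Append an event, linking it to the hash of the entry before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"event": event, "prev_hash": prev_hash, "hash": _digest(prev_hash, event)}
    log.append(entry)
    return entry


def verify_chain(log):
    """Return True only if every entry still matches its recorded hash and link."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append_event(log, {"principal": "analyst-bob", "action": "read", "resource": "customers"})
append_event(log, {"principal": "etl-role", "action": "write", "resource": "orders"})
print(verify_chain(log))  # True

# Editing an earlier entry after the fact breaks verification.
log[0]["event"]["principal"] = "someone-else"
print(verify_chain(log))  # False
```

A chain like this only detects modification. It does not prevent someone from deleting or replacing the whole log, which is why immutable storage is still needed underneath.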

Why it exists

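Regulations such as SOX, HIPAA, and GDPR require organizations to maintain records of who accessed which data and when. S3 server access logs and CloudTrail provide the raw events, but on their own they do not tie a request to a table, a snapshot, or a permission change. A structured audit trail architecture adds that context so each event can be attributed to a specific principal and dataset.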

Primary use cases

Compliance auditing for S3 data lakes, forensic investigation of data breaches, access pattern analysis for governance.
