Audit Trails
The practice of recording a tamper-evident history of all data access, modification, and governance events within an S3-based lakehouse, enabling regulatory compliance and forensic investigation.
Summary
Audit trails span the full stack from S3 access logs (who read/wrote which objects) through table format commit history (which snapshots were created when) to catalog-level governance events (who changed table permissions). They are a mandatory component of compliance-aware architectures.
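As a sketch of the table-format layer, the snippet below reads the `snapshots` array from an Apache Iceberg table metadata JSON file and turns each snapshot into a write-side audit record (snapshot ID, parent, commit time, operation). It assumes the field names of the Iceberg table spec (`snapshot-id`, `parent-snapshot-id`, `timestamp-ms`, `summary.operation`). The sample metadata is invented. The spec does not require a committing user, so any "who" must come from engine-specific summary properties, if present, or from another layer.

```python
import json
from datetime import datetime, timezone

def commit_history(metadata_json: str) -> list:
    """Turn the snapshots in an Iceberg table metadata file into write-side audit records."""
    metadata = json.loads(metadata_json)
    # Oldest first, so the records read as a timeline of commits.
    snapshots = sorted(metadata.get("snapshots", []), key=lambda s: s["timestamp-ms"])
    records = []
    for snap in snapshots:
        committed_at = datetime.fromtimestamp(snap["timestamp-ms"] / 1000, tz=timezone.utc)
        records.append({
            "snapshot_id": snap["snapshot-id"],
            "parent_id": snap.get("parent-snapshot-id"),  # absent for the first snapshot
            "committed_at": committed_at.isoformat(),
            "operation": snap.get("summary", {}).get("operation"),  # e.g. append, overwrite
        })
    return records

# Invented sample metadata: an append followed by an overwrite.
SAMPLE = json.dumps({
    "format-version": 2,
    "snapshots": [
        {"snapshot-id": 2, "parent-snapshot-id": 1, "timestamp-ms": 1700000600000,
         "summary": {"operation": "overwrite"}, "manifest-list": "s3://bucket/m2.avro"},
        {"snapshot-id": 1, "timestamp-ms": 1700000000000,
         "summary": {"operation": "append"}, "manifest-list": "s3://bucket/m1.avro"},
    ],
})

for rec in commit_history(SAMPLE):
    print(rec["committed_at"], rec["operation"], rec["snapshot_id"])
```

Snapshot expiration removes old snapshots from table metadata, so this history only reaches back as far as the retained snapshots; longer retention needs the records copied into a separate audit store.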
- S3 server access logs and CloudTrail events are necessary but not sufficient. They record request-level operations on buckets and objects (for example, a GET on one Parquet file), not semantic operations such as "user X queried customer PII in table Y".
- Table format commit metadata (for example, Iceberg snapshot history) provides a logical, write-side audit trail: what changed and when, plus who committed it where the engine records that. It does not capture read access, so a complete audit trail requires both write-side and read-side logging.
- Audit log storage on S3 itself must be protected with Object Lock/WORM to prevent tampering. Auditing is only useful if the audit logs are immutable.
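For the last point, a minimal sketch of the request parameters for S3's `PutObjectLockConfiguration` API, in the shape boto3's `put_object_lock_configuration` accepts, giving every new object version in an audit-log bucket a default retention period. The bucket name and the 365-day period are placeholders, not recommendations, and the bucket must already have Object Lock enabled. The function only builds the dictionary, so it runs without AWS credentials.

```python
def object_lock_params(bucket: str, days: int, mode: str = "COMPLIANCE") -> dict:
    """Build kwargs for boto3's s3.put_object_lock_configuration(**params)."""
    if mode not in ("GOVERNANCE", "COMPLIANCE"):
        raise ValueError("mode must be GOVERNANCE or COMPLIANCE")
    if days <= 0:
        raise ValueError("retention must be a positive number of days")
    return {
        "Bucket": bucket,
        "ObjectLockConfiguration": {
            "ObjectLockEnabled": "Enabled",
            # Default retention applied to every new object version written to the bucket.
            "Rule": {"DefaultRetention": {"Mode": mode, "Days": days}},
        },
    }

params = object_lock_params("example-audit-logs", days=365)

# With AWS credentials configured (not executed here):
#   import boto3
#   boto3.client("s3").put_object_lock_configuration(**params)
```

COMPLIANCE mode is the stricter choice for audit logs: no user, including the account root user, can overwrite or delete a locked object version before its retention date. GOVERNANCE mode lets principals with the `s3:BypassGovernanceRetention` permission override the lock.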
- scoped_to: Lakehouse, S3 (audit logging across the S3 data stack)
- depends_on: Object Lock / WORM Semantics (immutable audit log storage)
- enables: Compliance-Aware Architectures (audit trails are a regulatory requirement)
- depends_on: OpenMetadata, DataHub (metadata platforms that track governance events)
Definition
The pattern of recording all data access, modification, and administrative operations on S3-stored datasets into immutable, append-only logs for compliance, forensics, and governance purposes.
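Immutable storage keeps log objects from being overwritten. A complementary, widely used technique for making an append-only log *tamper-evident* is hash chaining: each entry stores a hash over its own content and the previous entry's hash, so editing or deleting any earlier entry breaks verification from that point on. The sketch below is a generic plain-Python illustration of that technique, not an S3 feature (CloudTrail has its own log file integrity validation based on signed digest files). The actors and table name are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON (sorted keys, no extra whitespace) so the hash is reproducible.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

def append(log: list, event: dict) -> None:
    """Append an event, chaining it to the hash of the current last entry."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev, "hash": _digest(prev, event)})

def verify(log: list) -> bool:
    """Recompute the chain from the start; any edited, removed, or reordered entry fails."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True

audit_log: list = []
append(audit_log, {"actor": "alice", "action": "GRANT SELECT", "table": "sales.orders"})
append(audit_log, {"actor": "bob", "action": "SELECT", "table": "sales.orders"})
print(verify(audit_log))                     # True
audit_log[0]["event"]["actor"] = "mallory"   # tamper with history
print(verify(audit_log))                     # False
```

The chain only proves integrity relative to a trusted copy of the latest hash: an attacker who can rewrite the whole log can recompute every hash. Writing the head hash, or periodic checkpoints of it, to WORM storage closes that gap.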
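Regulatory frameworks such as SOX, HIPAA, and GDPR require organizations to be able to demonstrate who accessed or changed regulated data and when (HIPAA, for example, explicitly requires audit controls). S3 server access logs and CloudTrail provide the raw events; a structured audit trail architecture adds accountability on top by tying those events to identities, tables, and governance actions.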
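To make the gap between raw events and a structured trail concrete, the sketch below tokenizes one S3 server access log record (space-separated fields, with the timestamp in brackets and some fields in double quotes), keeps the leading fields in their documented order, and maps the object key to a table through `TABLE_LOCATIONS`, a hypothetical registry of table root prefixes. In practice that mapping would come from the catalog. The sample log line is invented and truncated after the fields the parser uses.

```python
import re
from typing import Optional
from urllib.parse import unquote

# Leading fields of an S3 server access log record, in documented order.
FIELDS = ["bucket_owner", "bucket", "time", "remote_ip", "requester",
          "request_id", "operation", "key", "request_uri", "http_status"]

# A token is a [bracketed] timestamp, a "quoted" string, or a run of non-spaces.
TOKEN = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')

# Hypothetical registry: table name -> S3 key prefix of the table's files.
TABLE_LOCATIONS = {
    "sales.customers": "warehouse/sales/customers/",
    "sales.orders": "warehouse/sales/orders/",
}

VERB_TO_ACTION = {"GET": "read", "HEAD": "read", "PUT": "write",
                  "POST": "write", "COPY": "write", "DELETE": "delete"}

def parse_access_log_line(line: str) -> dict:
    tokens = [t.strip('[]"') for t in TOKEN.findall(line)]
    return dict(zip(FIELDS, tokens))

def to_table_event(record: dict) -> Optional[dict]:
    """Lift an object-level record to a table-level event, if the key is under a known table."""
    key = unquote(record.get("key", "-"))  # the key field is URL-encoded in access logs
    for table, prefix in TABLE_LOCATIONS.items():
        if key.startswith(prefix):
            parts = record["operation"].split(".")  # e.g. REST.GET.OBJECT
            verb = parts[1] if len(parts) > 1 else record["operation"]
            return {"who": record["requester"], "table": table,
                    "action": VERB_TO_ACTION.get(verb, verb.lower()),
                    "when": record["time"], "object": key}
    return None  # bucket-level request, non-table object, or unregistered table

SAMPLE = ('79a5EXAMPLEOWNER amzn-s3-demo-bucket [06/Feb/2024:00:00:38 +0000] 192.0.2.3 '
          'arn:aws:iam::123456789012:user/analyst 3E57427F3EXAMPLE REST.GET.OBJECT '
          'warehouse/sales/customers/data/part-00000.parquet '
          '"GET /warehouse/sales/customers/data/part-00000.parquet HTTP/1.1" 200 - 4096 4096 12 11')

print(to_table_event(parse_access_log_line(SAMPLE)))
```

Even after this lifting step, the event only says that a data file was fetched, not which columns or rows a query used; that level of detail has to come from query-engine or catalog logs.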
Typical uses: compliance auditing for S3 data lakes, forensic investigation of data breaches, and access-pattern analysis for governance.
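For forensic and access-pattern questions, a sketch over CloudTrail S3 data events: it reads a CloudTrail log file body (a JSON object with a `Records` array), keeps `GetObject` calls from `s3.amazonaws.com` on a given bucket and key prefix within a time window, and counts them per principal ARN. The record fields used (`eventSource`, `eventName`, `eventTime`, `userIdentity.arn`, `requestParameters.bucketName` and `key`) follow CloudTrail's record format. The sample records, bucket, and prefix are invented, and S3 data events must be enabled on the trail for object-level reads to be recorded at all.

```python
import json
from collections import Counter
from datetime import datetime

def _ts(s: str) -> datetime:
    # CloudTrail eventTime looks like "2024-02-06T00:00:38Z".
    return datetime.fromisoformat(s.replace("Z", "+00:00"))

def reads_by_principal(log_body: str, bucket: str, prefix: str,
                       start: str, end: str) -> Counter:
    """Count GetObject calls per principal on bucket/prefix* within [start, end)."""
    lo, hi = _ts(start), _ts(end)
    counts: Counter = Counter()
    for rec in json.loads(log_body).get("Records", []):
        if rec.get("eventSource") != "s3.amazonaws.com" or rec.get("eventName") != "GetObject":
            continue
        params = rec.get("requestParameters") or {}
        if params.get("bucketName") != bucket or not params.get("key", "").startswith(prefix):
            continue
        if lo <= _ts(rec["eventTime"]) < hi:
            counts[rec.get("userIdentity", {}).get("arn", "unknown")] += 1
    return counts

def _event(name: str, when: str, arn: str, key: str) -> dict:
    # Builds an invented, minimal CloudTrail-style S3 data event for the sample.
    return {"eventSource": "s3.amazonaws.com", "eventName": name, "eventTime": when,
            "userIdentity": {"arn": arn},
            "requestParameters": {"bucketName": "amzn-s3-demo-bucket", "key": key}}

SAMPLE = json.dumps({"Records": [
    _event("GetObject", "2024-02-06T00:00:38Z", "arn:aws:iam::123456789012:user/analyst",
           "warehouse/sales/customers/a.parquet"),
    _event("GetObject", "2024-02-06T01:15:00Z", "arn:aws:iam::123456789012:role/etl",
           "warehouse/sales/customers/b.parquet"),
    _event("PutObject", "2024-02-06T01:20:00Z", "arn:aws:iam::123456789012:role/etl",
           "warehouse/sales/customers/c.parquet"),
]})

print(reads_by_principal(SAMPLE, "amzn-s3-demo-bucket", "warehouse/sales/customers/",
                         "2024-02-06T00:00:00Z", "2024-02-07T00:00:00Z"))
```

Narrowing the window to the period of a suspected incident turns the same function into a first-pass forensic query; widening it and grouping by prefix gives a coarse access-pattern report.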
Resources
S3 server access logging documentation for recording all requests made to a bucket, forming the foundation of S3 audit trails.
AWS CloudTrail documentation for capturing API-level audit logs of all S3 management and data events.
Iceberg table inspection queries providing snapshot history, manifest lists, and change tracking for table-level audit trails.