Audit Trails

Summary

What it is

The practice of recording a tamper-evident history of all data access, modification, and governance events within an S3-based lakehouse, enabling regulatory compliance and forensic investigation.

Where it fits

Audit trails span the full stack from S3 access logs (who read/wrote which objects) through table format commit history (which snapshots were created when) to catalog-level governance events (who changed table permissions). They are a mandatory component of compliance-aware architectures.
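
The three layers described above can be pictured as separate event streams merged into one timeline. The following is a minimal sketch under stated assumptions: the `AuditEvent` schema, the layer names, and the sample events are all hypothetical and are not taken from any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List


@dataclass(frozen=True)
class AuditEvent:
    """One normalized audit record (hypothetical schema)."""
    timestamp: datetime
    layer: str   # "storage", "table_format", or "catalog"
    actor: str
    action: str
    target: str


def merge_timeline(*sources: Iterable[AuditEvent]) -> List[AuditEvent]:
    """Combine per-layer event streams into one chronological timeline."""
    merged = [event for source in sources for event in source]
    return sorted(merged, key=lambda e: e.timestamp)


def utc(hour: int, minute: int) -> datetime:
    """Helper for building sample timestamps on an arbitrary example day."""
    return datetime(2026, 1, 15, hour, minute, tzinfo=timezone.utc)


# Illustrative events, one per layer.
storage = [AuditEvent(utc(9, 2), "storage", "role/etl", "PutObject",
                      "s3://example-lake/orders/data/f1.parquet")]
table_format = [AuditEvent(utc(9, 3), "table_format", "role/etl",
                           "commit_snapshot", "orders")]
catalog = [AuditEvent(utc(8, 55), "catalog", "alice", "grant_select", "orders")]

timeline = merge_timeline(storage, table_format, catalog)
for e in timeline:
    print(e.timestamp.isoformat(), e.layer, e.actor, e.action, e.target)
```

Sorting by timestamp alone is the simplest possible correlation; a real pipeline would likely also join on request or snapshot identifiers, since clocks across layers are not guaranteed to agree.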

Misconceptions / Traps
  • S3 server access logs and CloudTrail events are necessary but not sufficient. They record HTTP-level operations, not semantic operations (e.g., "user X queried customer PII in table Y").
  • Table format commit metadata provides a logical audit trail (who committed what changes) but does not capture read access. Full audit requires both write-side and read-side logging.
  • Audit log storage on S3 itself must be protected with Object Lock/WORM to prevent tampering. Auditing is only useful if the audit logs are immutable.
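
To illustrate the gap between HTTP-level and semantic events, the sketch below maps a CloudTrail S3 data event onto a table-level record. The prefix-to-table mapping, the PII tag set, and the sample record are hypothetical; the record is abbreviated to the fields used here (`eventTime`, `eventSource`, `eventName`, `userIdentity.arn`, `requestParameters.key`).

```python
from typing import Dict, Optional

# Hypothetical mapping from S3 key prefixes to logical table names.
TABLE_PREFIXES: Dict[str, str] = {
    "warehouse/crm/customers/": "crm.customers",
    "warehouse/sales/orders/": "sales.orders",
}
# Hypothetical set of tables tagged as containing PII.
PII_TABLES = {"crm.customers"}
# Coarse classification of the S3 API calls named in this article.
OPERATIONS = {"GetObject": "read", "PutObject": "write", "DeleteObject": "delete"}


def table_for_key(key: str) -> Optional[str]:
    """Return the logical table whose prefix contains this object key."""
    for prefix, table in TABLE_PREFIXES.items():
        if key.startswith(prefix):
            return table
    return None


def to_semantic_event(record: dict) -> Optional[dict]:
    """Translate one CloudTrail S3 data event into a table-level audit record."""
    if record.get("eventSource") != "s3.amazonaws.com":
        return None
    params = record.get("requestParameters") or {}
    table = table_for_key(params.get("key", ""))
    if table is None:
        return None
    return {
        "time": record.get("eventTime"),
        "principal": (record.get("userIdentity") or {}).get("arn", "unknown"),
        "operation": OPERATIONS.get(record.get("eventName"), "other"),
        "table": table,
        "pii": table in PII_TABLES,
    }


# Abbreviated, made-up CloudTrail record for illustration only.
sample = {
    "eventTime": "2026-01-15T09:02:11Z",
    "eventSource": "s3.amazonaws.com",
    "eventName": "GetObject",
    "userIdentity": {"arn": "arn:aws:iam::123456789012:user/alice"},
    "requestParameters": {
        "bucketName": "example-lake",
        "key": "warehouse/crm/customers/data/part-0.parquet",
    },
}
event = to_semantic_event(sample)
print(event)
```

An object read only approximates a table read. Attributing access to a specific query would still need query-engine or catalog logs, which is the read-side logging the second bullet calls for.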

Key Connections
  • scoped_to Lakehouse, S3 — audit logging across the S3 data stack
  • depends_on Object Lock / WORM Semantics — immutable audit log storage
  • enables Compliance-Aware Architectures — audit trails are a regulatory requirement
  • depends_on OpenMetadata, DataHub — metadata platforms that track governance events
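
As a rough illustration of the Object Lock dependency, the sketch below builds the request parameters for a default retention rule on an audit-log bucket. The bucket name and retention period are assumptions chosen for the example. The parameter shape follows boto3's `put_object_lock_configuration`, shown only as a commented-out call so the block runs without AWS access. Object Lock also requires versioning on the bucket.

```python
from typing import Any, Dict


def object_lock_request(bucket: str, retention_days: int,
                        mode: str = "COMPLIANCE") -> Dict[str, Any]:
    """Build request parameters for a default Object Lock retention rule."""
    if mode not in ("COMPLIANCE", "GOVERNANCE"):
        raise ValueError(f"unsupported retention mode: {mode}")
    if retention_days < 1:
        raise ValueError("retention_days must be at least 1")
    return {
        "Bucket": bucket,
        "ObjectLockConfiguration": {
            "ObjectLockEnabled": "Enabled",
            "Rule": {
                "DefaultRetention": {"Mode": mode, "Days": retention_days},
            },
        },
    }


# Hypothetical bucket name and a roughly seven-year retention period.
request = object_lock_request("example-audit-logs", retention_days=2555)
print(request)

# With boto3 installed and credentials configured, the request could be
# applied like this (not executed here):
#   import boto3
#   boto3.client("s3").put_object_lock_configuration(**request)
```

COMPLIANCE mode is used as the default here because it cannot be shortened or removed by any user during the retention period. GOVERNANCE mode allows specially permissioned users to override it, which may or may not be acceptable for audit data.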

Definition

What it is

The pattern of recording all data access, modification, and administrative operations on S3-stored datasets into immutable, append-only logs for compliance, forensics, and governance purposes.
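
One common way to make an append-only log tamper-evident is to chain entries by hash, so that altering any earlier entry invalidates everything after it. The sketch below shows that general technique. It is an illustration only, not a method this page prescribes, and the entry layout and event fields are hypothetical.

```python
import hashlib
import json
from typing import List

GENESIS = "0" * 64  # Placeholder "previous hash" for the first entry.


def _digest(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with a canonical event encoding."""
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: List[dict], event: dict) -> None:
    """Append an event, linking it to the hash of the entry before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev_hash": prev_hash,
                "hash": _digest(prev_hash, event)})


def verify_chain(log: List[dict]) -> bool:
    """Recompute every hash and link; any edited entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


log: List[dict] = []
append_entry(log, {"actor": "alice", "action": "GetObject", "target": "orders/f1"})
append_entry(log, {"actor": "bob", "action": "PutObject", "target": "orders/f2"})

intact = verify_chain(log)           # Chain verifies before any modification.
log[0]["event"]["actor"] = "mallory"  # Simulate tampering with an old entry.
tamper_detected = not verify_chain(log)
print(intact, tamper_detected)
```

Hash chaining detects modification but does not prevent deletion of the whole log, which is why it complements write-once storage rather than replacing it.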

Why it exists

Regulatory requirements (SOX, HIPAA, GDPR) mandate that organizations maintain records of who accessed what data and when. S3 server access logs and CloudTrail provide raw events, but a structured audit trail architecture layers accountability on top.
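
To show what "raw events" means in practice, the sketch below parses the leading fields of an S3 server access log line into a structured record. The sample line is made up for illustration and modeled on the documented space-delimited layout. Only the first ten fields are extracted and the trailing ones are ignored, so treat the pattern as an assumption to check against the current format reference.

```python
import re
from datetime import datetime
from typing import Optional

# Leading fields of an S3 server access log line, in order.
LOG_PATTERN = re.compile(
    r'(?P<owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] (?P<remote_ip>\S+) '
    r'(?P<requester>\S+) (?P<request_id>\S+) (?P<operation>\S+) (?P<key>\S+) '
    r'(?:"(?P<request_uri>[^"]*)"|-) (?P<status>\S+)'
)


def parse_access_log_line(line: str) -> Optional[dict]:
    """Return the leading fields as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Timestamps look like 06/Feb/2019:00:00:38 +0000.
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"]) if fields["status"].isdigit() else None
    return fields


# Fabricated sample line; identifiers and names are placeholders.
SAMPLE = (
    '79a59df900b949e5 example-lake [06/Feb/2019:00:00:38 +0000] 192.0.2.3 '
    'arn:aws:iam::123456789012:user/alice 3E57427F3EXAMPLE REST.GET.OBJECT '
    'warehouse/crm/customers/part-0.parquet '
    '"GET /warehouse/crm/customers/part-0.parquet HTTP/1.1" 200 - 2048 2048 12 10 '
    '"-" "aws-sdk" -'
)

record = parse_access_log_line(SAMPLE)
print(record["time"].isoformat(), record["requester"],
      record["operation"], record["key"], record["status"])
```

A structured audit trail would start from records like this and then attach identity, dataset, and retention context, which the raw line does not carry on its own.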

Primary use cases

Compliance auditing for S3 data lakes, forensic investigation of data breaches, and access pattern analysis for governance.

Recent developments

Latest signals
  • CloudTrail Lake closes to new customers May 31, 2026 — CloudWatch is the replacement path. AWS is closing CloudTrail Lake to new sign-ups; existing customers can continue but new deployments should plan on CloudWatch as the unified logs + analytics surface. The architectural shift consolidates security + ops + compliance into one. Per AWS Docs — CloudTrail Lake Availability Change.
  • CloudWatch now exposes native Apache Iceberg API access for CloudTrail logs. The replacement architecture: CloudWatch ingests CloudTrail data + exposes it via Apache Iceberg APIs, with OCSF + OpenTelemetry as native formats. Audit logs become a lakehouse-queryable Iceberg table — no separate analytics warehouse required. Per AWS Docs — CloudTrail Lake Availability Change.
  • S3 data events (GetObject/PutObject/DeleteObject) are now compliance-mandated for FedRAMP + SOC 2. Multiple compliance frameworks (FedRAMP Moderate, SOC 2 Communication & Information) explicitly require CloudTrail S3 data-event logging — moving from "nice-to-have" to "you fail the audit without it." Per AWS Docs — Enabling CloudTrail Logging for S3 Buckets.
  • S3 access logging + CloudTrail are now run side-by-side, not either/or. 2026 best-practice consensus: S3 server access logs give the per-request HTTP-layer view, while CloudTrail gives the identity-aware AWS API view (management events, plus object-level data events where enabled). Mature audit-trail architectures run both, because the two log streams cover different threat models. Per OneUptime: How to Set Up S3 Access Logging for Audit Trails.
  • OCSF (Open Cybersecurity Schema Framework) becoming the standard log shape. AWS, Microsoft, Google + the major SIEM vendors converged on OCSF as the common audit-log schema — log producers + consumers no longer need bespoke parsers per source. CloudWatch's native OCSF support codifies this. Per AWS Docs — CloudTrail Lake Availability Change.
  • TocConsulting CloudTrail Best Practices 2026: multi-region trail + S3 data events + Object Lock destination. The 2026 hardened CloudTrail config: enable in every region; opt in to S3 data events for all critical buckets; ship to a destination bucket protected by Object Lock (ransomware can't delete the audit trail it generated). Per TocConsulting: AWS CloudTrail Security Best Practices 2026.
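
The S3 data-event opt-in mentioned in the signals above can be expressed as CloudTrail advanced event selectors. The sketch below only builds the selector structure. The bucket and trail names are hypothetical, and the boto3 call is left as a comment so the block runs without AWS access.

```python
from typing import Any, Dict, List


def s3_data_event_selectors(bucket_names: List[str]) -> List[Dict[str, Any]]:
    """Build advanced event selectors that log object-level events for the given buckets."""
    # A trailing slash after the bucket name matches every object in that bucket.
    arns = [f"arn:aws:s3:::{name}/" for name in bucket_names]
    return [{
        "Name": "S3 object-level events for critical buckets",
        "FieldSelectors": [
            {"Field": "eventCategory", "Equals": ["Data"]},
            {"Field": "resources.type", "Equals": ["AWS::S3::Object"]},
            {"Field": "resources.ARN", "StartsWith": arns},
        ],
    }]


# Hypothetical bucket names.
selectors = s3_data_event_selectors(["lake-raw", "lake-curated"])
print(selectors)

# With boto3 installed and credentials configured, the selectors could be
# attached to an existing trail like this (not executed here):
#   import boto3
#   boto3.client("cloudtrail").put_event_selectors(
#       TrailName="example-trail", AdvancedEventSelectors=selectors)
```

Scoping the selectors to named buckets rather than all S3 objects is a cost and noise trade-off, since data events are high volume. Which buckets count as critical is a policy decision outside this sketch.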
