Time Travel

The ability to query a dataset as it existed at a previous point in time by leveraging immutable snapshots and metadata history maintained by table formats on object storage.

Summary

Where it fits

Time travel is a core capability enabled by table formats (Iceberg, Delta, Hudi) on S3. Each write operation produces a new snapshot rather than mutating files in place, and the snapshot history allows any prior version of a table to be read without restoring from backup.
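
The snapshot mechanism can be illustrated with a toy model. This is a minimal sketch, not how any of the three formats is implemented: the Snapshot and Table classes, the integer timestamps, and the file names are all hypothetical, and commits are assumed to arrive in time order.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    committed_at: int      # commit time, e.g. epoch milliseconds
    data_files: frozenset  # immutable data files visible in this version


@dataclass
class Table:
    snapshots: list = field(default_factory=list)

    def commit(self, committed_at, added=(), removed=()):
        """Record a new snapshot; existing files are never modified."""
        current = self.snapshots[-1].data_files if self.snapshots else frozenset()
        files = (current - frozenset(removed)) | frozenset(added)
        snap = Snapshot(len(self.snapshots) + 1, committed_at, files)
        self.snapshots.append(snap)
        return snap

    def as_of(self, timestamp):
        """Return the latest snapshot committed at or before `timestamp`."""
        candidates = [s for s in self.snapshots if s.committed_at <= timestamp]
        if not candidates:
            raise LookupError("no snapshot exists at or before that time")
        return candidates[-1]


table = Table()
table.commit(committed_at=100, added={"a.parquet"})
table.commit(committed_at=200, added={"b.parquet"})
# An update rewrites a.parquet into a new file instead of editing it in place.
table.commit(committed_at=300, added={"a2.parquet"}, removed={"a.parquet"})

print(sorted(table.as_of(250).data_files))  # ['a.parquet', 'b.parquet']
print(sorted(table.as_of(300).data_files))  # ['a2.parquet', 'b.parquet']
```

Reading as of time 250 resolves to the second snapshot, which still lists a.parquet, so the old version is readable without any restore step.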

Misconceptions / Traps
  • Time travel is not free. Every retained snapshot keeps its data files referenced, so superseded files cannot be deleted; without periodic snapshot expiration and orphan file cleanup, storage costs keep growing with every write that adds or rewrites data files.
  • Time travel depth is bounded by retention policy, not by the format itself. Once snapshots are expired and their data files garbage-collected, those points in time are gone permanently.
  • Time travel queries on S3 incur the same GET request costs as current queries. Reading historical data does not bypass S3 pricing.
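
The first two traps can be sketched together: retained snapshots pin old files, and expiring them frees storage at the cost of losing those points in time. This is a simplified illustration under stated assumptions. The expire_snapshots function, the tuple layout, and the file names are hypothetical, and the sketch folds snapshot expiration and file deletion into one step, whereas real formats run them as separate maintenance operations.

```python
def expire_snapshots(snapshots, stored_files, keep_after):
    """Drop snapshots committed before `keep_after`, then delete unreferenced files.

    snapshots: list of (committed_at, set_of_data_files), oldest first.
    stored_files: set of every data file currently held in the bucket.
    The newest snapshot is always kept so the table stays readable.
    """
    retained = [s for s in snapshots[:-1] if s[0] >= keep_after] + snapshots[-1:]
    referenced = set().union(*(files for _, files in retained))
    unreferenced = stored_files - referenced
    return retained, stored_files - unreferenced, unreferenced


history = [
    (100, {"a.parquet"}),
    (200, {"a.parquet", "b.parquet"}),
    (300, {"a2.parquet", "b.parquet"}),  # a.parquet was rewritten as a2.parquet
]
bucket = {"a.parquet", "a2.parquet", "b.parquet"}

# Before expiration, the superseded file is still stored (and billed).
print(len(bucket))  # 3

retained, bucket, deleted = expire_snapshots(history, bucket, keep_after=250)
print([ts for ts, _ in retained])  # [300]
print(sorted(deleted))             # ['a.parquet']
```

After the call, only the snapshot at time 300 remains, so a read as of time 200 can no longer be served: the file it depended on has been deleted.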

Key Connections
  • scoped_to Table Formats — time travel is a table format capability
  • enabled_by Apache Iceberg, Delta Lake, Apache Hudi — all three formats support snapshot-based time travel
  • scoped_to Data Versioning — time travel is a form of data versioning at the table level
  • constrained_by Metadata Overhead at Scale — deep snapshot history increases metadata volume
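
Each of the three formats exposes time travel through its own Spark read option. The helper below is a hedged sketch that maps a target time to those options; time_travel_options is a hypothetical function, and it assumes the Spark session time zone (for Delta) and the Hudi timeline time zone are both UTC. Option names and accepted value formats should be checked against the documentation for the format version in use.

```python
from datetime import datetime, timezone


def time_travel_options(table_format, as_of):
    """Return Spark DataFrameReader options for reading a table as of `as_of`.

    `as_of` must be a timezone-aware datetime; it is converted to UTC first.
    """
    utc = as_of.astimezone(timezone.utc)
    if table_format == "delta":
        # Delta Lake: timestamp string (a version number goes in "versionAsOf").
        return {"timestampAsOf": utc.strftime("%Y-%m-%d %H:%M:%S")}
    if table_format == "iceberg":
        # Iceberg: epoch milliseconds (a snapshot ID goes in "snapshot-id").
        return {"as-of-timestamp": str(int(utc.timestamp() * 1000))}
    if table_format == "hudi":
        # Hudi: commit instant on the timeline, written as yyyyMMddHHmmss.
        return {"as.of.instant": utc.strftime("%Y%m%d%H%M%S")}
    raise ValueError(f"unsupported table format: {table_format}")


when = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
for fmt in ("delta", "iceberg", "hudi"):
    print(fmt, time_travel_options(fmt, when))

# Assumed usage with an existing SparkSession named `spark` and a table location:
#   spark.read.format(fmt).options(**time_travel_options(fmt, when)).load(location)
```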

Definition

What it is

The ability to query a dataset as it existed at a previous point in time by referencing historical snapshots maintained by a table format on object storage.

Why it exists

S3 objects cannot be modified in place, only written or replaced whole, yet logical datasets change constantly through inserts, updates, and deletes. Table formats (Iceberg, Delta, Hudi) bridge this gap by recording each change as a new snapshot, and time travel uses that snapshot history to provide deterministic, reproducible reads of past states without maintaining separate backup copies.
