Tiered Storage
Summary
What it is
Moving data between hot, warm, and cold storage tiers based on access frequency. S3 itself offers tiering (Standard, Infrequent Access, Glacier).
Where it fits
Tiered storage is the cost optimization layer for S3 data. It keeps frequently accessed data on fast, more expensive storage and moves archival data to slower, cheaper storage. This is a critical pattern for large data lakes, where 80% or more of the data is rarely accessed.
Misconceptions / Traps
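- Retrieval from archive tiers (Glacier Flexible Retrieval, Glacier Deep Archive) has latency measured in minutes to hours and requires an explicit restore request before the object can be read. Do not move data into these tiers if it might be needed for interactive queries. Glacier Instant Retrieval is the exception: it keeps millisecond access but charges more per retrieval.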
- S3 Intelligent-Tiering automates tier transitions but has per-object monitoring charges. For predictable access patterns, explicit lifecycle rules are cheaper.
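Both traps can be made concrete with a minimal Python sketch. It builds the payloads that boto3's `put_bucket_lifecycle_configuration` and `restore_object` calls accept. The bucket name, key, prefix, rule ID, and day thresholds are hypothetical and chosen only for illustration; the builders are kept separate from the AWS calls so they can be inspected without credentials.

```python
"""Sketch: explicit lifecycle rules and an archive restore request for S3.

Bucket, key, prefix, rule ID, and thresholds are illustrative assumptions.
"""


def build_lifecycle_configuration(prefix, rule_id="tier-by-age",
                                  warm_after_days=30, cold_after_days=90):
    """Return a LifecycleConfiguration dict with a warm and a cold transition."""
    if warm_after_days < 30:
        # S3 does not transition objects to STANDARD_IA before they are 30 days old.
        raise ValueError("warm_after_days must be at least 30")
    if cold_after_days < warm_after_days + 30:
        # STANDARD_IA has a 30-day minimum storage duration, so this sketch
        # keeps at least 30 days between the two transitions.
        raise ValueError("cold_after_days must be at least warm_after_days + 30")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": warm_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": cold_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


def build_restore_request(days_available=7, retrieval_tier="Standard"):
    """Return a RestoreRequest dict for an object in an archive tier."""
    if retrieval_tier not in {"Expedited", "Standard", "Bulk"}:
        raise ValueError("retrieval_tier must be Expedited, Standard, or Bulk")
    return {
        "Days": days_available,  # how long the restored copy stays readable
        "GlacierJobParameters": {"Tier": retrieval_tier},
    }


def apply_to_bucket(bucket, prefix, archived_key):
    """Send both requests to S3. Needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builders work without AWS access

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=build_lifecycle_configuration(prefix),
    )
    # The restore is asynchronous: the object becomes readable only after
    # the retrieval job finishes, which is where the minutes-to-hours go.
    s3.restore_object(
        Bucket=bucket,
        Key=archived_key,
        RestoreRequest=build_restore_request(),
    )


if __name__ == "__main__":
    import json

    print(json.dumps(build_lifecycle_configuration("logs/"), indent=2))
```

Calling `apply_to_bucket("example-data-lake", "logs/", "logs/2020/01/part-0000.parquet")` would install the rule and request one restore; both argument values are placeholders. An explicit rule like this carries no per-object monitoring charge, which is why it tends to beat Intelligent-Tiering when access patterns are predictable.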
Key Connections
- solves Egress Cost: keeps hot data close to compute, cold data in cheap tiers
- constrained_by Vendor Lock-In: tiering policies and pricing are provider-specific
- scoped_to S3, Object Storage
Definition
What it is
The pattern of moving data between hot, warm, and cold storage tiers based on access frequency, with S3 or S3-compatible stores serving as one or more tiers.
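As a rough illustration of the pattern, the sketch below assigns a tier from how long ago an object was last read. It uses recency as a stand-in for access frequency, which is a simplifying assumption; the 30-day and 90-day windows and the tier names are hypothetical, not values prescribed by any provider.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical thresholds; real policies are tuned to the workload.
HOT_WINDOW = timedelta(days=30)
WARM_WINDOW = timedelta(days=90)


def choose_tier(last_accessed, now=None):
    """Pick a tier from the time elapsed since the object was last read."""
    if now is None:
        now = datetime.now(timezone.utc)
    age = now - last_accessed
    if age <= HOT_WINDOW:
        return "hot"
    if age <= WARM_WINDOW:
        return "warm"
    return "cold"


if __name__ == "__main__":
    reference = datetime(2024, 6, 1, tzinfo=timezone.utc)
    for days_ago in (3, 45, 400):
        accessed = reference - timedelta(days=days_ago)
        print(days_ago, choose_tier(accessed, now=reference))
```

In an S3-based setup the "hot" tier might map to Standard, "warm" to Infrequent Access, and "cold" to Glacier, but that mapping is a design decision rather than part of the pattern.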
Why it exists
Not all data is accessed equally. Frequently queried data benefits from fast (expensive) storage; archival data can reside in cheap, slow tiers. S3 itself offers tiering (Standard, Infrequent Access, Glacier), and cross-provider tiering adds further cost optimization.
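The cost argument can be checked with simple arithmetic. The sketch below compares keeping everything in the hot tier against a tiered split. The per-GB prices and the 20/30/50 split are placeholder assumptions, not quoted rates, and the model ignores request, retrieval, and transition fees.

```python
# Placeholder prices in USD per GB-month; substitute the provider's current rates.
PRICE_PER_GB_MONTH = {"hot": 0.023, "warm": 0.0125, "cold": 0.004}


def split_by_share(total_gb, shares):
    """Divide a dataset across tiers according to fractional shares."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return {tier: total_gb * share for tier, share in shares.items()}


def monthly_storage_cost(gb_by_tier, prices=PRICE_PER_GB_MONTH):
    """Sum storage cost only; retrieval and request fees are not modelled."""
    return sum(gb * prices[tier] for tier, gb in gb_by_tier.items())


if __name__ == "__main__":
    total_gb = 100_000  # 100 TB, an arbitrary example size
    all_hot = monthly_storage_cost({"hot": total_gb})
    tiered = monthly_storage_cost(
        split_by_share(total_gb, {"hot": 0.2, "warm": 0.3, "cold": 0.5})
    )
    print(f"all hot: {all_hot:.2f}  tiered: {tiered:.2f}")
```

Under these assumed prices the tiered layout costs about 1,035 per month against about 2,300 for an all-hot layout. The gap narrows if cold data is retrieved often, because retrieval fees are excluded here.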
Primary use cases
Cost optimization for large data lakes, lifecycle management of S3-stored datasets, compliance archival.
Relationships
Resources
Official AWS documentation of all S3 storage classes (Standard, Intelligent-Tiering, Glacier, Deep Archive), the canonical reference for S3-based tiered storage.
Official Apache Kafka documentation for tiered storage (KIP-405), which offloads older log segments to remote storage such as S3 or HDFS while keeping recent data on local brokers.
Confluent's production-grade tiered storage documentation showing how Kafka integrates with S3 for virtually unlimited, low-cost retention.