Architecture

Capacity Planning

The practice of forecasting and provisioning storage, compute, and network resources for S3-based data systems based on projected data volumes, query patterns, ingestion rates, and growth trajectories.


Summary

What it is

Where it fits

Capacity planning is the operational discipline that prevents S3-based lakehouses from either over-provisioning (wasting money) or under-provisioning (hitting throttling limits, running out of catalog capacity, or degrading query performance under load).

Misconceptions / Traps

S3 storage is effectively unlimited, but S3 request rates are not. Capacity planning must account for per-prefix request limits (at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per partitioned prefix), not just storage volume.
Catalog capacity is often the binding constraint. Hive Metastore databases, Glue API rate limits, and Nessie commit throughput all have finite capacity that must be planned for.
Data growth rate is not the same as metadata growth rate. A single streaming ingestion job can produce millions of small files (and millions of metadata entries) per day even if total data volume is modest.
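The first and third traps above can be turned into quick arithmetic. The sketch below is illustrative only: it uses the per-prefix figures quoted above, while the function names and the `headroom` safety factor are hypothetical and not part of any AWS API.

```python
import math

# Per-prefix request-rate limits quoted in the text above (requests per second).
PUT_LIMIT_PER_PREFIX = 3_500
GET_LIMIT_PER_PREFIX = 5_500


def prefixes_needed(peak_put_rps: float, peak_get_rps: float, headroom: float = 0.7) -> int:
    """Minimum number of prefix partitions so that neither limit is exceeded.

    headroom is a hypothetical safety factor: plan to use only this
    fraction of each per-prefix limit at peak load.
    """
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    by_put = math.ceil(peak_put_rps / (PUT_LIMIT_PER_PREFIX * headroom))
    by_get = math.ceil(peak_get_rps / (GET_LIMIT_PER_PREFIX * headroom))
    return max(1, by_put, by_get)


def files_per_day(daily_bytes: float, avg_file_bytes: float) -> int:
    """Objects written per day, a rough proxy for new metadata entries."""
    return math.ceil(daily_bytes / avg_file_bytes)
```

For example, a streaming job that lands 500 GB per day in 256 KB files writes `files_per_day(500 * 10**9, 256 * 10**3)` = 1,953,125 objects per day, which shows how a modest data volume can still drive metadata growth into the millions.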

Key Connections

scoped_to S3, Lakehouse — resource planning for S3-based data systems
constrains Request Amplification — capacity limits determine acceptable request patterns
constrains Metadata Overhead at Scale — catalog sizing must be planned
relates_to Benchmarking Methodology — benchmarks provide the data for capacity models

Definition

What it is

The discipline of forecasting storage, throughput, and API call requirements for S3-based data systems based on growth trends, ingestion rates, query patterns, and retention policies.

Why it exists

S3 scales elastically, but costs scale linearly with usage. Without capacity planning, organizations face surprise bills from unchecked data growth, unplanned API costs from small files, and throttling when traffic exceeds per-prefix request rate limits.
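A minimal sketch of how small files inflate the bill, assuming a simple linear model of storage plus request charges. The `Prices` fields and every value passed to them are placeholders to be replaced with current rates for the relevant region and storage class. The helper names are hypothetical, and decimal units (1 GB = 1,000 MB) are assumed.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Prices:
    """Unit prices. Values supplied by the caller are placeholders, not quoted rates."""
    per_gb_month: float
    per_1k_put: float
    per_1k_get: float


def puts_for_volume(ingest_gb: float, avg_file_mb: float) -> int:
    """PUT requests needed to land ingest_gb of data at a given average file size."""
    return math.ceil(ingest_gb * 1000 / avg_file_mb)


def monthly_cost(storage_gb: float, put_requests: int, get_requests: int, prices: Prices) -> dict:
    """Linear cost model: storage charge plus per-request charges."""
    storage = storage_gb * prices.per_gb_month
    requests = (put_requests / 1000) * prices.per_1k_put \
        + (get_requests / 1000) * prices.per_1k_get
    return {"storage": storage, "requests": requests, "total": storage + requests}
```

Under these assumptions, landing 1,000 GB as 128 MB files takes 7,813 PUT requests, while landing the same volume as 0.25 MB files takes 4,000,000. The storage charge is identical in both cases and only the request line changes.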

Primary use cases

S3 cost forecasting, request rate planning for high-throughput ingestion, storage growth modeling for data lakes.
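Storage growth modeling can start from something as small as the sketch below. It assumes that ingestion grows at a constant monthly rate, that data older than the retention window is deleted, and that pre-existing data never expires. The function and parameter names are hypothetical.

```python
from typing import List, Optional


def project_storage(
    start_tb: float,
    first_month_ingest_tb: float,
    ingest_growth: float,
    months: int,
    retention_months: Optional[int] = None,
) -> List[float]:
    """Projected stored volume (TB) at the end of each month.

    ingest_growth is the month-over-month growth of ingestion (0.05 = 5%).
    retention_months=None means nothing is ever deleted.
    """
    if retention_months is not None and retention_months < 1:
        raise ValueError("retention_months must be >= 1 or None")
    ingests: List[float] = []
    totals: List[float] = []
    for m in range(months):
        ingests.append(first_month_ingest_tb * (1 + ingest_growth) ** m)
        # Only ingest inside the retention window is still stored.
        window = ingests if retention_months is None else ingests[-retention_months:]
        # Assumption: start_tb is legacy data that is never expired.
        totals.append(start_tb + sum(window))
    return totals
```

Feeding the projected volumes into a cost model, and the projected ingest into a request-rate estimate, ties the three use cases above to one set of growth assumptions.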

Recent developments

Latest signals

2026 reality: storage architecture matters as much as GPU count. Capacity planning shifted in 2026 from GPU-centric to storage-architecture-centric: retrieval efficiency, latency consistency, caching strategy, and data movement optimization determine whether expensive accelerator capacity runs at 90%+ utilization or starves. Per Prolime Host, "Why AI Storage Architecture Is Becoming More Important Than GPU Count in 2026."
Hyperscaler 2026 capex hits $600B; ~75% goes to AI data centers and accelerators. Combined capex for AWS, Microsoft, Google, Meta, and Oracle reaches $600B in 2026, with ~75% tied to AI infrastructure. Capacity planning at the platform level is now a boardroom conversation, not just an SRE concern. Per BloombergNEF, "AI Data Center Build Advances at Full Speed: Five Things to Know."
McKinsey 2030 forecast: 156 GW data-center power, $5.2T CapEx. The longer-horizon capacity-planning anchor is 156 GW of data-center power demand by 2030 and $5.2T in cumulative CapEx to get there. Power is the binding constraint, not silicon. Per IoT Analytics, "Data Center Infrastructure Market: AI-Driven CapEx."
Google Managed Lustre: 10 TB/s bandwidth (10× year-over-year), 80 PB capacity. These are hyperscaler reference numbers for what AI capacity planning targets: Google Cloud's Managed Lustre service offers 10× last year's bandwidth at 80 PB capacity. Rapid Buckets delivers sub-millisecond latency and 20M ops/sec to keep accelerator utilization at 95%+. Per Google Cloud Blog, "AI Infrastructure at Next '26."
GPU roadmap context: GB300 +50% over Blackwell; Vera Rubin 8 exaflops/rack in 2026. Capacity planning has to absorb the GPU performance-doubling cycle: GB300 Blackwell Ultra at +50% performance, Vera Rubin at 8 exaflops per rack arriving in 2026, and Virgo Network at up to 80,000 GPUs per data center. Per IESVE, "From Data Centres to AI Factories: NVIDIA GTC 2026."
Liquid cooling is now mandatory at high density. Vera Rubin and Kyber rack architectures require predominantly liquid cooling, because air cannot move enough heat at the new density. Capacity planning that doesn't model liquid-cooling power and plumbing is incomplete. Per Data Center Knowledge, "2026 Predictions: AI Sparks Data Center Power Revolution."