Pain Point

Small Files Amplification

The compounding negative effect of large numbers of small files on object storage operations — not just query performance (the Small Files Problem), but also metadata operations, compaction jobs, object listing, and garbage collection.


Summary


Where it fits

Small files amplification extends the Small Files Problem beyond query performance into operational burden. Each small file incurs metadata overhead, lifecycle evaluation cost, listing time, and compaction work. At billions of small files, these operational costs dominate storage management.
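The listing cost alone can be sketched with back-of-envelope arithmetic. S3's ListObjectsV2 returns at most 1,000 keys per request, so enumerating a prefix takes ceil(N / 1000) sequential calls; a minimal sketch, assuming the same 1 TB of data written as 1 MB files versus 512 MB files:

```python
import math

def list_calls(num_objects: int, page_size: int = 1000) -> int:
    """S3 ListObjectsV2 returns at most 1,000 keys per request,
    so enumerating a prefix costs ceil(N / page_size) calls."""
    return math.ceil(num_objects / page_size)

# 1 TB as ~1,000,000 x 1 MB objects vs. ~2,000 x 512 MB objects
print(list_calls(1_000_000))  # 1000 LIST requests
print(list_calls(2_000))      # 2 LIST requests
```

The same 500x gap applies to every per-object operation (lifecycle evaluation, GC scans, manifest entries), which is where the amplification comes from.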

Misconceptions / Traps
  • Compaction reduces the number of data files but generates new metadata (manifest files, commit logs). In extreme cases, compaction of billions of small files can itself become a bottleneck.
  • Small files often originate from streaming ingestion (Flink, Kafka Connect) where each micro-batch produces a separate file. Fixing the source is more effective than compacting after the fact.
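Fixing the source usually means buffering micro-batch output until a target object size is reached instead of flushing one file per batch. A minimal sketch (hypothetical `BufferingWriter`; real connectors expose this as rollover or target-file-size settings):

```python
class BufferingWriter:
    """Accumulate records in memory and write one object only once
    a target size is reached, instead of one object per micro-batch.
    Hypothetical sketch, not a real connector API."""

    def __init__(self, target_bytes: int, flush):
        self.target_bytes = target_bytes
        self.flush = flush            # callable that writes one object
        self.buffer = bytearray()
        self.files_written = 0

    def write(self, record: bytes) -> None:
        self.buffer.extend(record)
        if len(self.buffer) >= self.target_bytes:
            self._roll()

    def close(self) -> None:
        if self.buffer:               # flush the final partial object
            self._roll()

    def _roll(self) -> None:
        self.flush(bytes(self.buffer))
        self.buffer.clear()
        self.files_written += 1

# Ten 3-byte micro-batches with a 10-byte target produce 3 objects, not 10.
sizes = []
w = BufferingWriter(10, lambda obj: sizes.append(len(obj)))
for _ in range(10):
    w.write(b"abc")
w.close()
print(w.files_written, sizes)  # 3 [12, 12, 6]
```

The trade-off is latency: records sit in the buffer until the size threshold is hit, so production writers also roll on a time limit.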
Key Connections
  • amplifies Small Files Problem — operational impact beyond query performance
  • constrains Metadata Overhead at Scale — each small file adds metadata entries
  • SeaweedFS solves Small Files Amplification — O(1) lookup architecture
  • scoped_to S3, Object Storage, Table Formats

Definition

What it is

The compounding effect where small files degrade not just query performance but also metadata operations, compaction efficiency, listing throughput, and garbage collection — each degraded operation amplifying the original problem.

