Rebuild Window Risk
The vulnerability period after a disk or node failure in an object storage cluster, during which the system operates with reduced redundancy until the failed component's data is reconstructed on healthy nodes.
Summary
Rebuild window risk is a central durability concern for self-managed object storage (MinIO, Ceph, SeaweedFS). The system stays operational during a rebuild. However, any further failure inside the rebuild window that exceeds the remaining redundancy causes permanent data loss, and larger disks mean longer rebuild times.
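- Larger drives lengthen the rebuild window roughly in proportion to their capacity. At the same effective rebuild rate, a 20 TB HDD takes about five times as long to reconstruct as a 4 TB drive, which extends the vulnerability period accordingly. This is a key argument for SSDs in durability-critical deployments, because their higher throughput shortens rebuilds.
- Erasure coding reduces rebuild window risk but does not eliminate it. The remaining risk depends on how many simultaneous failures the code can tolerate. A k+m scheme survives up to m lost fragments per stripe, so once one drive has failed, only m - 1 further failures can be absorbed before reconstruction completes.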
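The two points above can be combined into a back-of-the-envelope model, sketched below under explicit assumptions. The model assumes a constant effective rebuild rate; the 100 MB/s figure is illustrative, not measured. It also assumes independent, exponentially distributed drive failures derived from an assumed annualized failure rate (AFR), and a single k+m stripe group with one drive already down. The function names are hypothetical and do not come from any storage system.

```python
import math


def rebuild_hours(capacity_tb: float, rebuild_mb_per_s: float) -> float:
    """Hours to reconstruct one failed drive at an assumed constant rebuild rate."""
    return capacity_tb * 1e12 / (rebuild_mb_per_s * 1e6) / 3600


def p_loss_during_rebuild(k: int, m: int, afr: float, window_hours: float) -> float:
    """Probability that a k+m stripe group with one drive already down loses data
    before the rebuild finishes, i.e. at least m of the k+m-1 surviving drives fail
    inside the window. Assumes independent, exponentially distributed failures."""
    # Convert the (assumed) annualized failure rate into a per-drive probability
    # of failing somewhere inside the rebuild window.
    hourly_rate = -math.log(1.0 - afr) / 8760.0
    p = 1.0 - math.exp(-hourly_rate * window_hours)
    survivors = k + m - 1
    # Sum the binomial tail P(X >= m) directly to avoid cancellation for tiny p.
    return sum(
        math.comb(survivors, i) * p**i * (1.0 - p) ** (survivors - i)
        for i in range(m, survivors + 1)
    )


if __name__ == "__main__":
    AFR = 0.02    # assumed 2% annualized failure rate (illustrative)
    RATE = 100.0  # assumed effective rebuild rate in MB/s (illustrative)
    for tb in (4, 20):
        hours = rebuild_hours(tb, RATE)
        print(
            f"{tb} TB: window {hours:.1f} h, "
            f"P(loss) 8+4 = {p_loss_during_rebuild(8, 4, AFR, hours):.2e}, "
            f"P(loss) 2-way mirror = {p_loss_during_rebuild(1, 1, AFR, hours):.2e}"
        )
```

In this model, the 1+1 case stands in for two-way replication: data is lost if the single surviving copy fails during the window. Both a longer window and a smaller m raise the loss probability, which matches the two bullets. Real clusters violate the independence assumption (for example, through correlated failures within a batch of drives), so treat the numbers as relative comparisons rather than predictions.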
- constrained_by Repair Bandwidth Saturation: rebuild speed is limited by available bandwidth.
- Geo-Dispersed Erasure Coding solves Rebuild Window Risk: geographic redundancy reduces single-site vulnerability.
- Zoned Namespace (ZNS) SSD solves Rebuild Window Risk: faster reconstruction.
- scoped_to Object Storage
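A rough estimate illustrates the constrained_by link to Repair Bandwidth Saturation. With a classic Reed-Solomon-style k+m code, rebuilding one lost fragment requires reading k surviving fragments. Reconstructing a failed drive therefore moves roughly k times its used capacity across the network, and replication is the k = 1 case. The sketch below makes two simplifying assumptions. First, it treats repair bandwidth as a single aggregate figure. Second, it uses a throttle fraction to represent how much of that bandwidth the system lets rebuild traffic consume while protecting client I/O. Both parameters and the function names are hypothetical, and real systems spread repair traffic over many nodes.

```python
def repair_read_tb(lost_tb: float, k: int) -> float:
    """Data read to rebuild `lost_tb` of fragments with a k+m Reed-Solomon-style code."""
    return lost_tb * k


def bandwidth_limited_rebuild_hours(
    lost_tb: float,
    k: int,
    repair_gbit_s: float,
    target_write_mb_s: float,
    throttle: float = 1.0,
) -> float:
    """Rebuild time bounded by whichever is slower: pulling k fragments per rebuilt
    fragment over the (throttled) aggregate repair bandwidth, or writing the
    reconstructed data to its new location."""
    if not 0.0 < throttle <= 1.0:
        raise ValueError("throttle must be in (0, 1]")
    # Bandwidth actually granted to repair traffic, in bytes per second.
    net_bytes_per_s = repair_gbit_s * 1e9 / 8 * throttle
    network_s = repair_read_tb(lost_tb, k) * 1e12 / net_bytes_per_s
    write_s = lost_tb * 1e12 / (target_write_mb_s * 1e6)
    return max(network_s, write_s) / 3600


if __name__ == "__main__":
    # Illustrative inputs: 4 TB lost, 8+m code, 10 Gbit/s aggregate, 200 MB/s writes.
    for t in (0.5, 1.0):
        print(t, round(bandwidth_limited_rebuild_hours(4, 8, 10, 200, t), 2))
```

With these illustrative inputs, the rebuild is network-bound. It takes about 14.2 hours at a 0.5 throttle and about 7.1 hours unthrottled, both above the roughly 5.6 hours needed just to write 4 TB at 200 MB/s. This is why throttling repair to protect client I/O directly widens the rebuild window.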
Definition
The vulnerability period after a disk, node, or site failure in an erasure-coded or replicated object store during which data has reduced redundancy, and a second failure could cause permanent data loss before reconstruction completes.
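The definition distinguishes data that has merely lost some redundancy from data that is one failure away from permanent loss. A small Monte Carlo sketch makes the distinction concrete. It places each stripe's k+m fragments on distinct, randomly chosen drives, then classifies every stripe after a given set of drive failures. Uniform random placement is an assumption standing in for real placement logic such as Ceph's CRUSH or MinIO's erasure sets. The function name and the category labels are hypothetical.

```python
import random
from collections import Counter
from typing import Iterable


def classify_stripes(
    num_drives: int,
    num_stripes: int,
    k: int,
    m: int,
    failed: Iterable[int],
    seed: int = 0,
) -> Counter:
    """Classify stripes after drive failures, assuming uniform random placement.

    healthy:  no fragments lost
    degraded: some redundancy lost, still tolerates at least one more failure
    critical: exactly m fragments lost, so one more failure means data loss
    lost:     more than m fragments lost, stripe is unrecoverable
    """
    rng = random.Random(seed)
    failed_set = set(failed)
    counts: Counter = Counter()
    for _ in range(num_stripes):
        # Each stripe's k+m fragments land on k+m distinct drives.
        drives = rng.sample(range(num_drives), k + m)
        missing = sum(1 for d in drives if d in failed_set)
        if missing == 0:
            counts["healthy"] += 1
        elif missing < m:
            counts["degraded"] += 1
        elif missing == m:
            counts["critical"] += 1
        else:
            counts["lost"] += 1
    return counts


if __name__ == "__main__":
    # 24 drives, 4+2 layout, two drives down while a rebuild is in progress.
    print(classify_stripes(24, 10_000, 4, 2, failed=[0, 1]))
```

For example, with 24 drives, a 4+2 layout, and two failed drives, any stripe that placed fragments on both of them is reported as critical. It is still readable, but one more failure before reconstruction completes would make it unrecoverable.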
Resources
Ceph recovery operations documentation covering OSD rebuild procedures and the risk window during data reconstruction.
MinIO erasure coding documentation covering healing processes and the trade-offs between rebuild speed and I/O impact.