Architecture

Geo-Dispersed Erasure Coding

An erasure coding scheme that distributes data fragments and parity blocks across geographically separated sites, providing durability and data locality at lower storage overhead than full replication.

Summary

Where it fits

Geo-dispersed erasure coding extends the durability model of object storage beyond a single data center. Instead of replicating a full copy to each of three sites (3x overhead), data is erasure-coded across sites (typically 1.2-1.5x overhead) while retaining the ability to reconstruct from any subset of sites that still holds enough fragments.
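The overhead figures follow from the ratio of stored fragments to data fragments. A minimal sketch, assuming a k+m code in which every fragment is the same size; the function name and the example profiles are illustrative and not taken from any specific product:

```python
def storage_overhead(k: int, m: int) -> float:
    """Raw bytes stored per byte of user data for a k+m erasure code."""
    if k <= 0 or m < 0:
        raise ValueError("k must be positive and m non-negative")
    return (k + m) / k


# Hypothetical profiles, chosen only to span the 1.2-1.5x range quoted above.
for k, m in [(10, 2), (8, 3), (4, 2)]:
    print(f"{k}+{m}: {storage_overhead(k, m):.3f}x")

# Triple replication can be viewed as the degenerate case k=1, m=2.
print(f"3 copies: {storage_overhead(1, 2):.3f}x")
```

Lower overhead comes from a larger k relative to m, which also means more fragments must be reachable for a read.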

Misconceptions / Traps

Geo-dispersed erasure coding increases read latency. Reconstruction requires fetching fragments from multiple geographic sites, adding network round-trip time to every read.
Failure domain is now geographic. If too many sites are unreachable simultaneously (beyond the erasure code's tolerance), data becomes temporarily unavailable — unlike multi-copy replication where any single copy suffices.
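The availability trap can be checked with simple counting. A sketch under the assumption that an object is readable whenever at least k of its fragments sit on reachable sites; the site names and the 6+3 layout are hypothetical:

```python
def readable(k: int, fragments_per_site: dict, down: set) -> bool:
    """True when at least k fragments remain on reachable sites."""
    surviving = sum(n for site, n in fragments_per_site.items() if site not in down)
    return surviving >= k


def max_site_failures(k: int, fragments_per_site: dict) -> int:
    """Largest f such that ANY f sites can fail and the object stays readable.

    The worst case is losing the sites that hold the most fragments first.
    """
    remaining = sum(fragments_per_site.values())
    tolerated = 0
    for count in sorted(fragments_per_site.values(), reverse=True):
        remaining -= count
        if remaining < k:
            break
        tolerated += 1
    return tolerated


# Hypothetical 6+3 code spread evenly over three sites (3 fragments each).
geo_ec = {"site-a": 3, "site-b": 3, "site-c": 3}
# Triple replication modelled as k=1 with one full copy per site.
replicas = {"site-a": 1, "site-b": 1, "site-c": 1}

print(max_site_failures(6, geo_ec))    # sites the 6+3 layout can lose
print(max_site_failures(1, replicas))  # sites triple replication can lose
```

Under these assumptions the erasure-coded layout survives one site outage while triple replication survives two, which is the trade made for the lower storage overhead.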

Key Connections

solves Rebuild Window Risk — erasure coding across sites reduces single-site vulnerability
constrained_by Repair Bandwidth Saturation — cross-site repair consumes WAN bandwidth
scoped_to Object Storage, Geo / Edge Object Storage

Definition

What it is

An erasure coding scheme that distributes data fragments across multiple geographic sites, so that any configurable subset of sites can reconstruct the full object. Provides both durability and data locality across regions.
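To make the reconstruction property concrete, the sketch below uses a single XOR parity fragment (a k+1 code) rather than Reed-Solomon, because it fits in a few lines. It tolerates the loss of exactly one fragment, so with one fragment per site it survives one site outage. Site names and helper names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal fragments (zero-padded) plus one XOR parity."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    return fragments + [reduce(xor_bytes, fragments)]


def reconstruct(fragments: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the object when at most one fragment (one site) is missing."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity tolerates only one lost fragment")
    fragments = list(fragments)
    if missing:
        present = [f for f in fragments if f is not None]
        # XOR of all surviving fragments equals the missing one.
        fragments[missing[0]] = reduce(xor_bytes, present)
    return b"".join(fragments[:-1])[:length]


data = b"geo-dispersed object"
sites = ["us-east", "eu-west", "ap-south", "sa-east"]  # one fragment per site
placement = dict(zip(sites, encode(data, k=3)))

# Simulate an outage of one site and rebuild from the other three.
surviving = [None if s == "eu-west" else placement[s] for s in sites]
assert reconstruct(surviving, len(data)) == data
```

Production systems typically use Reed-Solomon style k+m codes so that more than one fragment, and therefore more than one site, can be lost.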

Why it exists

Traditional replication (3 copies across 3 AZs) is expensive. Geo-dispersed erasure coding achieves equivalent or better durability at lower storage overhead (typically 1.2-1.5x vs. 3x for replication) while keeping data fragments close to multiple compute locations.

Primary use cases

Multi-region durable storage with low overhead, cross-site data availability, disaster-resilient object storage.

Recent developments

Latest signals

Hierarchical multi-region EC: two-tier encode (cross-region + within-region). Production architectures encode data once for cross-region distribution, then re-encode shards within each region, combining geo-durability with site-local reconstruction speed. Per USPTO 11356120, Hierarchical Erasure Coding for Multi-Region Storage.
Storage overhead: 1.2-1.5× vs 3× for triple replication. Reed-Solomon (k+m schemes) achieves equivalent durability at roughly 40-50% of the storage cost of triple replication (1.2/3 to 1.5/3), which is the structural economic argument for geo-EC in petabyte-scale deployments. Per USPTO 11356120, Hierarchical Erasure Coding for Multi-Region Storage.
Repair-efficient placement remains the active research frontier. ScienceDirect's "Optimal placement for repair-efficient erasure codes in geo-diverse storage centres" frames the placement-vs-repair-bandwidth tradeoff academically — choosing where to put each shard determines how much cross-region bandwidth a recovery costs. Per ScienceDirect — Optimal Placement for Repair-Efficient Erasure Codes in Geo-Diverse Storage Centres.
Hybrid geo-dispersed: local shards + predictive-network EC slice placement. Production-grade approach: store some EC slices locally; use network-condition predictions to decide where to disperse the rest. Reduces tail-latency vs naive geo-dispersion. Per USPTO 11662938 — Object Storage and Access Management Systems and Methods.
Group-by-geography chunk-service architecture is the canonical implementation. Production systems group chunk services by geographic location + create separate EC chunk-service groups per region with their own EC schemes — region-specific durability + cross-region failover. Per USPTO 9665428 — Distributing EC Fragments in a Geo-Distributed Storage System.
Multi-site object storage as the 2026 high-availability default for petabyte-scale. USPTO 10887416 codifies efficient HA + storage-efficiency for multi-site object storage — petabyte+ deployments now reach for geo-EC by default rather than active-active replication, which scales worse on storage cost. Per USPTO 10887416 — Efficient HA + Storage Efficiency in Multi-Site Object Storage.
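The two-tier and repair-bandwidth points above can be put into rough numbers. This is a back-of-the-envelope sketch, not the construction from any of the cited patents. It assumes that each cross-region shard is fully re-encoded by an inner k+m code, and that a conventional Reed-Solomon repair reads k fragments to rebuild one. All parameters are hypothetical.

```python
def overhead(k: int, m: int) -> float:
    return (k + m) / k


def two_tier_overhead(outer: tuple, inner: tuple) -> float:
    """Total overhead when every outer shard is re-encoded by the inner code."""
    return overhead(*outer) * overhead(*inner)


def wan_repair_bytes_flat(object_bytes: float, k: int, local_fragments: int) -> float:
    """WAN bytes needed to rebuild one lost fragment of a flat k+m geo code.

    `local_fragments` is how many surviving fragments of the object already
    sit at the repairing site; the rest of the k must come from remote sites.
    """
    fragment = object_bytes / k
    remote_needed = max(k - local_fragments, 0)
    return remote_needed * fragment


# Hypothetical outer 4+2 across regions, inner 8+2 within each region.
print(two_tier_overhead((4, 2), (8, 2)))  # 1.5 * 1.25

# Flat 6+3 code, 600 MB object, 2 surviving fragments at the repairing site.
print(wan_repair_bytes_flat(600, 6, 2))   # MB pulled across the WAN
```

Under these assumed parameters the two-tier layout costs more raw capacity than a flat code, but a single-device loss can be repaired from the inner code without crossing the WAN. That is the placement-versus-repair-bandwidth tradeoff in miniature.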

Connections (5)

Outbound (5)

scoped_to (3)

Object Storage, S3, Geo / Edge Object Storage

constrained_by (2)

Rebuild Window Risk, Repair Bandwidth Saturation

Resources (3)

Docs (High)

min.io/docs/minio/linux/operations/concepts/erasure-coding.h...

MinIO erasure coding documentation covering data protection, healing, and the trade-offs between parity and storage efficiency.

Blog (High)

ceph.io/

Ceph community analysis of erasure coding overhead comparing storage efficiency and rebuild costs across coding profiles.

Docs (High)

docs.aws.amazon.com/AmazonS3/latest/userguide/DataDurability...

AWS S3 durability model documentation explaining how S3 achieves 11 nines of durability through cross-AZ erasure coding.