Technology

AWS S3

Amazon's fully managed object storage service — the origin and reference implementation of the S3 API. As of December 2025, the maximum object size is 50 TB (up from 5 TB).


Summary

Where it fits

AWS S3 is the gravitational center of the ecosystem. It defined the API that became the de facto standard, and most tools in this index were built to work with AWS S3 first and other providers second. The move from a 5 TB to a 50 TB object limit means massive AI training datasets and 8K video can now be stored as single atomic objects rather than as multi-part sequences stitched together by tooling.

Misconceptions / Traps
  • AWS S3 has offered strong read-after-write consistency since December 2020, but code written against the older eventual-consistency model may still carry unnecessary workarounds (verification reads, retry loops, marker files).
  • S3 storage is cheap; S3 API calls and egress are not. Cost optimization requires understanding request pricing and transfer charges, not just storage GB.
  • The 50 TB limit applies to individual objects; existing tooling that splits datasets at the old 5 TB boundary may need updating.
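The last trap has a concrete arithmetic consequence. S3's multipart upload limits (at most 10,000 parts, each between 5 MiB and 5 GiB) determine the smallest legal part size for a given object, so tooling tuned for 5 TB objects will pick part sizes far too small for 50 TB ones. A minimal sketch (the multipart limits are S3's documented constants; the helper name is ours):

```python
import math

# Documented S3 multipart limits; the 50 TB object ceiling is the
# figure cited in this card.
MAX_PARTS = 10_000
MIN_PART_SIZE = 5 * 1024**2   # 5 MiB minimum (all parts but the last)
MAX_PART_SIZE = 5 * 1024**3   # 5 GiB maximum per part

def min_part_size(object_size: int) -> int:
    """Smallest legal part size that fits object_size into MAX_PARTS parts."""
    size = max(MIN_PART_SIZE, math.ceil(object_size / MAX_PARTS))
    if size > MAX_PART_SIZE:
        raise ValueError("object exceeds multipart limits")
    return size

# A 5 TB object fits comfortably with 500 MB parts...
print(min_part_size(5 * 10**12))    # 500_000_000 bytes
# ...but a 50 TB object forces 5 GB parts, right at the per-part ceiling.
print(min_part_size(50 * 10**12))   # 5_000_000_000 bytes
```

Any uploader hard-coded to, say, 100 MB parts will hit the 10,000-part wall at roughly 1 TB, long before either object-size limit matters.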
Key Connections
  • implements S3 API — the reference implementation of the standard
  • enables Lakehouse Architecture — provides the storage layer for lakehouses
  • enables Separation of Storage and Compute — foundational to the pattern
  • used_by Medallion Architecture — each layer stores data on S3
  • constrained_by Object Listing Performance, Lack of Atomic Rename, Egress Cost — key operational limitations
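The "Lack of Atomic Rename" constraint follows from the API itself: S3 has no rename operation, so tools emulate it as copy-then-delete. A sketch over an in-memory stand-in for a bucket (the `FakeBucket` class is illustrative, not an AWS API) shows why the emulation is not atomic:

```python
class FakeBucket:
    """In-memory stand-in for an S3 bucket (illustration only)."""
    def __init__(self):
        self.objects: dict[str, bytes] = {}

    def copy(self, src: str, dst: str) -> None:
        self.objects[dst] = self.objects[src]

    def delete(self, key: str) -> None:
        del self.objects[key]

def rename(bucket: FakeBucket, src: str, dst: str) -> None:
    # Two separate API calls: a failure between them leaves BOTH keys
    # present. There is no single atomic rename in the S3 API, which is
    # why table formats track files in a transaction log instead.
    bucket.copy(src, dst)
    bucket.delete(src)

b = FakeBucket()
b.objects["raw/part-0000.parquet"] = b"data"
rename(b, "raw/part-0000.parquet", "bronze/part-0000.parquet")
print(sorted(b.objects))  # ['bronze/part-0000.parquet']
```

For large objects the copy step also takes time proportional to object size, so "rename" on S3 is neither atomic nor cheap.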

Definition

What it is

Amazon's fully managed object storage service: the original implementation that defined the S3 API and established object storage as a category. As of December 2025, it supports individual objects up to 50 TB (increased from 5 TB), enabling massive AI training datasets and high-resolution media to be stored as single atomic objects.

Why it exists

To provide scalable, durable, low-cost storage accessible over HTTP, decoupled from any specific compute or filesystem.

Primary use cases

Data lake storage, static asset hosting, backup and archival, analytics data staging, ML training data storage, large-object AI dataset ingest (50 TB single-object).
