CoreWeave AI Object Storage
Definition
Fully managed, S3-compatible object storage from CoreWeave, purpose-built for AI workloads (training datasets, model weights, checkpoints, embedding stores). The architecture separates compute from storage and keeps a cache close to the GPUs through the **Local Object Transport Accelerator (LOTA)**, a proxy service that runs on the GPU nodes, presents an S3 endpoint locally, and uses the node's disks as a cache tier. CoreWeave cites up to **7 GB/s per GPU** of sustained read throughput for AI training pipelines.
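The "local proxy with a disk cache in front of remote storage" idea can be illustrated with a generic read-through cache. This is a minimal sketch and not LOTA's implementation: the class name `ReadThroughCache`, the LRU eviction policy, the byte-based capacity, and the in-memory dictionary standing in for local disk are all assumptions made for illustration.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Toy read-through cache with LRU eviction, bounded by total bytes.

    Hypothetical model of "local cache in front of a remote object store".
    It does not describe how LOTA is actually built.
    """

    def __init__(self, fetch_remote: Callable[[str], bytes], capacity_bytes: int):
        self._fetch_remote = fetch_remote      # called only on a cache miss
        self._capacity = capacity_bytes
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self._used = 0
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self._entries:
            self._entries.move_to_end(key)     # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        data = self._fetch_remote(key)         # slow path: go to remote storage
        self._store(key, data)
        return data

    def _store(self, key: str, data: bytes) -> None:
        if len(data) > self._capacity:
            return                             # too large to cache; serve uncached
        while self._used + len(data) > self._capacity:
            _, evicted = self._entries.popitem(last=False)  # drop least recent
            self._used -= len(evicted)
        self._entries[key] = data
        self._used += len(data)


if __name__ == "__main__":
    remote = {f"shard-{i}": bytes(4) for i in range(3)}  # stand-in object store
    cache = ReadThroughCache(remote.__getitem__, capacity_bytes=8)
    for key in ["shard-0", "shard-1", "shard-0", "shard-2", "shard-1"]:
        cache.get(key)
    print(cache.hits, cache.misses)
```

Repeated reads of the same shard (for example across epochs) are served locally, while a shard that was evicted has to be fetched from the remote store again.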
Generic S3-compatible object storage was built for general-purpose workloads: it offers high throughput per request but is not tuned for sustained per-GPU reads. AI training can spend most of its wall-clock time in dataloader code waiting for objects to arrive, and when object-storage egress is the bottleneck, GPUs sit idle (GPU starvation) and effective utilization drops. CoreWeave's bet is to co-locate the storage access layer on the GPU node itself via LOTA, exposing the S3 API while transparently caching to local NVMe. The 2026 expansion plan extends LOTA acceleration beyond CoreWeave's own cloud into other public clouds and on-prem environments.
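The starvation argument can be made concrete with a little arithmetic. The sketch below is a simplified model, not a benchmark: the 2 GB of input per step, the 0.5 s compute time, and the 1 GB/s baseline are hypothetical numbers chosen for illustration, and only the 7 GB/s figure comes from the text above. It also assumes decimal gigabytes and perfect overlap between prefetching and compute.

```python
GB = 10**9  # decimal gigabyte (an assumption; the source does not specify)


def read_seconds(num_bytes: float, throughput_bytes_per_s: float) -> float:
    """Time to read num_bytes at a sustained throughput."""
    return num_bytes / throughput_bytes_per_s


def gpu_utilization(compute_s: float, load_s: float, overlapped: bool = True) -> float:
    """Fraction of one training step's wall time that the GPU spends computing.

    With overlapped prefetching, a step takes as long as the slower of
    compute and data loading. Without it, the two run back to back.
    """
    step_s = max(compute_s, load_s) if overlapped else compute_s + load_s
    return compute_s / step_s


if __name__ == "__main__":
    bytes_per_step = 2 * GB   # hypothetical input volume per GPU per step
    compute_s = 0.5           # hypothetical compute time per step

    for label, throughput in [("1 GB/s", 1 * GB), ("7 GB/s", 7 * GB)]:
        load_s = read_seconds(bytes_per_step, throughput)
        util = gpu_utilization(compute_s, load_s)
        print(f"{label}: load {load_s:.2f} s, utilization {util:.0%}")
```

Under these assumed numbers the GPU is idle for three quarters of each step at 1 GB/s, while at 7 GB/s the read finishes within the compute window and storage stops being the limiting factor.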
Typical use cases: large LLM training pipelines where data movement is the gating factor, checkpoint shuffle stores during long training runs, model-weight distribution across multi-GPU inference fleets, embedding stores for retrieval-augmented training workflows, and any AI workload where per-GPU sustained throughput matters more than raw aggregate capacity.
Recent developments
- LOTA (Local Object Transport Accelerator) delivers up to 7 GB/s per GPU. It is a proxy service that runs on the GPU nodes themselves and acts as an S3 endpoint with local-disk caching. CoreWeave positions this as industry-leading per-GPU throughput. Per CoreWeave AI Object Storage docs.
- LOTA is planned to expand to other clouds and on-prem in early 2026. The acceleration layer was originally available only in CoreWeave's cloud; the 2026 roadmap extends it to other public clouds and customer-owned on-prem GPU clusters. Per CoreWeave AI storage blog.
- S3-compatible API: drop-in for existing AI tooling. LOTA sits behind a standard S3 surface, so S3-compatible tools (boto3, rclone, vLLM, etc.) work without modification. Per CoreWeave S3 compatibility reference.
- Distributed architecture: compute/storage separation with GPU-node-local caching. The service is designed to serve large training datasets by spreading data across the GPU nodes themselves, which enables highly parallel reads and writes. Per CoreWeave product launch announcement.
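Because the external surface is S3, pointing an existing S3 client at the service should mostly be a matter of changing the endpoint. The sketch below shows this with boto3. The helper names `s3_client_kwargs` and `download_object` are hypothetical, and the endpoint URL, bucket, and key in the usage example are placeholders rather than real CoreWeave values; the actual endpoint, region, and credential setup should be taken from CoreWeave's documentation.

```python
from typing import Optional


def s3_client_kwargs(endpoint_url: str, region_name: Optional[str] = None) -> dict:
    """Build keyword arguments for boto3.client("s3", ...).

    Credentials are left to boto3's normal lookup chain
    (environment variables, shared config files, and so on).
    """
    kwargs = {"endpoint_url": endpoint_url}
    if region_name is not None:
        kwargs["region_name"] = region_name
    return kwargs


def download_object(endpoint_url: str, bucket: str, key: str, dest_path: str) -> None:
    """Download one object from an S3-compatible endpoint to a local file."""
    import boto3  # third-party dependency, imported lazily

    client = boto3.client("s3", **s3_client_kwargs(endpoint_url))
    client.download_file(bucket, key, dest_path)


if __name__ == "__main__":
    # Placeholder values only; replace with the endpoint and bucket you were given.
    download_object(
        endpoint_url="https://object-storage.example.invalid",
        bucket="training-data",
        key="shards/shard-00000.tar",
        dest_path="/tmp/shard-00000.tar",
    )
```

The same approach applies to other S3 tools: most accept a custom endpoint setting, so application code that reads and writes objects does not need to change.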