Offline Embedding Pipeline
Summary
What it is
A batch pattern where embeddings are generated from S3-stored data on a schedule, with resulting vectors written back to object storage or a vector index.
Where it fits
This pattern is a cost-effective way to add semantic search to S3 data. Instead of generating embeddings on demand for every request, data is vectorized in batch, keeping inference costs predictable and avoiding always-on GPU infrastructure.
Misconceptions / Traps
- "Offline" means batch, not "never updated." A daily or weekly refresh is typical. Freshness requirements determine the schedule.
- Embedding pipeline failures can leave the vector index out of sync with S3 data. Idempotent, resumable pipelines are essential.
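The resumability trap above comes down to checkpointing: record which object keys have already been embedded, so a rerun after a crash skips completed work instead of re-embedding (or worse, half-syncing) the corpus. A minimal sketch, using in-memory dicts and sets to stand in for the S3 source bucket, the vector sink, and a checkpoint manifest; `embed` is a hypothetical placeholder, not a real model call:

```python
import hashlib

def embed(text: str) -> list[float]:
    # Placeholder: a real pipeline would call an embedding model here.
    # A hash-derived vector keeps the sketch self-contained.
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in digest[:4]]

def run_batch(source: dict[str, str],
              vectors: dict[str, list[float]],
              manifest: set[str]) -> list[str]:
    """Embed only objects not yet recorded in the manifest.

    `source` stands in for the S3 source bucket, `vectors` for the
    vector index, and `manifest` for a checkpoint object persisted
    alongside the vectors. Returns the keys processed this run.
    """
    processed = []
    for key in sorted(source):
        if key in manifest:
            continue  # already embedded in a previous (partial) run
        vectors[key] = embed(source[key])
        manifest.add(key)  # checkpoint after each object
        processed.append(key)
    return processed
```

Because each object is checkpointed as it completes, restarting a failed run is safe and idempotent: a second invocation with the same manifest processes nothing and leaves the index unchanged.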
Key Connections
- depends_on: S3 API — reads source data from and writes embeddings to S3
- constrained_by: High Cloud Inference Cost — the motivating economic constraint
- scoped_to: LLM-Assisted Data Systems, S3
Definition
What it is
A batch pattern where embeddings are generated from S3-stored data on a schedule, and the resulting vectors are written back to object storage or a vector index.
Why it exists
Real-time embedding generation is expensive and unnecessary for many use cases. Processing S3 data in batch keeps inference costs predictable and avoids the need for always-on GPU infrastructure.
Primary use cases
Periodic embedding refresh for document corpora on S3, bulk vectorization of historical data, populating vector indexes for RAG systems.
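A scheduled refresh run reduces to three stages: list the source objects, embed them in fixed-size batches to amortize per-request inference overhead, and write the vectors to the sink. A sketch of that control flow, with all I/O injected as callables so it stays self-contained; the function name and `batch_size` parameter are illustrative, and a real deployment would back the callables with the S3 API and a model endpoint:

```python
from typing import Callable, Iterable

def refresh_index(list_keys: Callable[[], Iterable[str]],
                  read_doc: Callable[[str], str],
                  embed_batch: Callable[[list[str]], list[list[float]]],
                  write_vectors: Callable[[dict[str, list[float]]], None],
                  batch_size: int = 32) -> None:
    """One scheduled run: list objects, embed in batches, write vectors.

    Grouping `batch_size` documents per model invocation is where
    batch processing gets its cost advantage over per-query embedding.
    """
    keys = sorted(list_keys())
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        texts = [read_doc(k) for k in batch]
        vectors = embed_batch(texts)
        write_vectors(dict(zip(batch, vectors)))
```

The same skeleton covers all three use cases: point `list_keys` at a prefix of new documents for a periodic refresh, or at the full corpus for bulk vectorization and initial RAG index population.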
Relationships
Outbound Relationships
- scoped_to
- depends_on
- constrained_by
Resources
AWS Big Data Blog showing how to build a batch embedding pipeline that reads from S3, generates vectors via Lambda, and ingests into OpenSearch.
Official AWS sample repository providing a complete pipeline to convert documents stored in S3 into text embeddings for RAG applications.
SkyPilot engineering blog demonstrating 9x faster embedding generation at scale across cloud GPUs for 30M+ records, with S3 as the source/sink.