Architecture

Offline Embedding Pipeline

Summary

What it is

A batch pattern where embeddings are generated from S3-stored data on a schedule, with the resulting vectors written back to object storage or a vector index.

Where it fits

This pattern is a cost-effective way to add semantic search to data stored in S3. Instead of embedding content on demand at query time, the corpus is vectorized in batch, keeping inference costs predictable and avoiding always-on GPU infrastructure.
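
As a rough sketch of the pattern, the job below lists objects under an S3 prefix, embeds each document, and writes the vector back to object storage next to the source key. The bucket names, the corpus/ prefix, and the embed_texts() helper are illustrative assumptions; the actual embedding model and storage layout are left to the implementation.

  import json

  import boto3

  s3 = boto3.client("s3")

  SOURCE_BUCKET = "example-docs"        # hypothetical source bucket
  VECTOR_BUCKET = "example-embeddings"  # hypothetical destination bucket

  def embed_texts(texts):
      """Placeholder: plug in whatever embedding model the pipeline uses.

      Should return one vector (list of floats) per input text.
      """
      raise NotImplementedError("plug in the embedding model here")

  def run_batch(prefix: str) -> None:
      # Paginate so corpora larger than one listing page are fully covered.
      paginator = s3.get_paginator("list_objects_v2")
      for page in paginator.paginate(Bucket=SOURCE_BUCKET, Prefix=prefix):
          for obj in page.get("Contents", []):
              key = obj["Key"]
              text = s3.get_object(Bucket=SOURCE_BUCKET, Key=key)["Body"].read().decode("utf-8")

              vector = embed_texts([text])[0]

              # Write the vector next to the source key so a later job can
              # load it into a vector index.
              s3.put_object(
                  Bucket=VECTOR_BUCKET,
                  Key=f"{key}.embedding.json",
                  Body=json.dumps({"source_key": key, "vector": vector}).encode("utf-8"),
              )

  if __name__ == "__main__":
      run_batch("corpus/")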

Misconceptions / Traps

  • "Offline" means batch, not "never updated." A daily or weekly refresh is typical. Freshness requirements determine the schedule.
  • Embedding pipeline failures can leave the vector index out of sync with the S3 data. Idempotent, resumable pipelines are essential; one approach is sketched after this list.
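
One way to get the idempotence and resumability called for above, sketched under assumed names: keep a manifest of already-embedded objects keyed by S3 ETag, skip anything unchanged on re-run, and update the manifest as each object completes so a failed run can pick up where it left off. The manifest location and the embed_and_store() helper are hypothetical.

  import json

  import boto3
  from botocore.exceptions import ClientError

  s3 = boto3.client("s3")

  SOURCE_BUCKET = "example-docs"        # hypothetical
  VECTOR_BUCKET = "example-embeddings"  # hypothetical
  MANIFEST_KEY = "manifests/embedded.json"

  def load_manifest() -> dict:
      """Map of source key -> ETag that has already been embedded."""
      try:
          body = s3.get_object(Bucket=VECTOR_BUCKET, Key=MANIFEST_KEY)["Body"].read()
          return json.loads(body)
      except ClientError:
          return {}  # first run, or manifest not yet written

  def save_manifest(manifest: dict) -> None:
      s3.put_object(Bucket=VECTOR_BUCKET, Key=MANIFEST_KEY,
                    Body=json.dumps(manifest).encode("utf-8"))

  def embed_and_store(key: str) -> None:
      """Placeholder for the embed-and-write step (see the earlier sketch)."""
      raise NotImplementedError("embed the object and write its vector")

  def resumable_run(prefix: str) -> None:
      manifest = load_manifest()
      paginator = s3.get_paginator("list_objects_v2")
      for page in paginator.paginate(Bucket=SOURCE_BUCKET, Prefix=prefix):
          for obj in page.get("Contents", []):
              key, etag = obj["Key"], obj["ETag"]
              # Idempotence: skip objects whose content is unchanged since
              # the last successful run.
              if manifest.get(key) == etag:
                  continue
              embed_and_store(key)
              # Record progress after each object so a crashed run can
              # resume without re-embedding finished work.
              manifest[key] = etag
              save_manifest(manifest)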

Key Connections

  • depends_on S3 API — reads source data from and writes embeddings to S3
  • constrained_by High Cloud Inference Cost — the motivating economic constraint
  • scoped_to LLM-Assisted Data Systems, S3

Definition

What it is

A batch pattern where embeddings are generated from S3-stored data on a schedule, and the resulting vectors are written back to object storage or a vector index.

Why it exists

Real-time embedding generation is expensive and unnecessary for many use cases. Processing S3 data in batch keeps inference costs predictable and avoids the need for always-on GPU infrastructure.

Primary use cases

Periodic embedding refresh for document corpora on S3, bulk vectorization of historical data, and population of vector indexes for RAG systems.
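
For the RAG use case, a follow-on job can load the stored vectors into a vector index. The sketch below assumes the vectors were written as JSON records (as in the earlier sketch) and uses a FAISS flat index purely as one example of an in-memory index; the bucket and prefix names are hypothetical.

  import json

  import boto3
  import faiss
  import numpy as np

  s3 = boto3.client("s3")

  VECTOR_BUCKET = "example-embeddings"  # hypothetical

  def build_index(prefix: str):
      """Read embedding records from S3 and load them into a FAISS index."""
      vectors, keys = [], []
      paginator = s3.get_paginator("list_objects_v2")
      for page in paginator.paginate(Bucket=VECTOR_BUCKET, Prefix=prefix):
          for obj in page.get("Contents", []):
              record = json.loads(
                  s3.get_object(Bucket=VECTOR_BUCKET, Key=obj["Key"])["Body"].read()
              )
              vectors.append(record["vector"])
              keys.append(record["source_key"])

      matrix = np.asarray(vectors, dtype="float32")
      index = faiss.IndexFlatIP(matrix.shape[1])  # inner-product similarity
      index.add(matrix)
      # keys maps index positions back to the source S3 objects.
      return index, keys

A flat index is only the simplest option; larger corpora would typically use an approximate index or a managed vector database, but the loading step looks much the same.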

Relationships

Outbound Relationships

depends_on

  • S3 API — reads source data from and writes embeddings to S3

Resources