Architecture

Online Embedding Refresh Pipeline

Summary

What it is

A continuous pipeline that regenerates vector embeddings as source data in S3 changes, keeping vector indexes in sync with the latest content without full re-embedding.

Where it fits

This pattern solves the stale-embedding problem in RAG and semantic search systems. When S3 objects are created, updated, or deleted, the pipeline detects changes (via S3 events), re-embeds affected content, and updates the vector index — maintaining search accuracy.
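As a rough illustration of the detect, re-embed, update loop described above, the sketch below dispatches on the `eventName` of each record in an S3 event notification. The `Records`, `eventName`, `s3.bucket.name`, and `s3.object.key` fields follow the S3 notification message format. Everything else is an assumption made for illustration: `handle_s3_event`, the `read_object` and `embed` callables, and the plain dict standing in for a vector index are all hypothetical.

```python
from typing import Callable, Dict, List
from urllib.parse import unquote_plus

Vector = List[float]


def handle_s3_event(
    event: dict,
    index: Dict[str, Vector],
    read_object: Callable[[str, str], str],
    embed: Callable[[str], Vector],
) -> None:
    """Apply one S3 event notification to an in-memory stand-in for a vector index.

    `read_object(bucket, key)` and `embed(text)` are hypothetical callables
    supplied by the caller; a real pipeline would call S3 and an embedding model.
    """
    for record in event.get("Records", []):
        event_name = record["eventName"]
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in the notification payload.
        key = unquote_plus(record["s3"]["object"]["key"])
        doc_id = f"s3://{bucket}/{key}"

        if event_name.startswith("ObjectCreated:"):
            # Create or overwrite: re-embed the object and upsert its vector.
            index[doc_id] = embed(read_object(bucket, key))
        elif event_name.startswith("ObjectRemoved:"):
            # Delete: drop the vector so search no longer returns the object.
            index.pop(doc_id, None)
```

Because the upsert and delete are keyed by object, replaying the same event leaves the index unchanged, which matters when notifications are delivered more than once.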

Misconceptions / Traps

"Online" does not mean real-time. End-to-end latency is the sum of event processing, embedding model inference, and index update propagation. Minutes-level latency is typical.
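Change detection at scale is not trivial. S3 event notifications are designed for at-least-once delivery, so events can arrive late, duplicated, or out of order, and failures in downstream consumers can still cause changes to be missed. Consider combining event-driven processing with periodic full-scan reconciliation.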
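The periodic reconciliation mentioned above can be sketched as a diff between a full bucket listing and a record of what the index last embedded. This is a minimal sketch under stated assumptions: `plan_reconciliation` and `ReconcilePlan` are hypothetical names, `listing` is assumed to come from a full scan of the bucket (key to ETag), and `indexed` is assumed to be a manifest the pipeline keeps of the ETag it last embedded for each key.

```python
from typing import Dict, List, NamedTuple


class ReconcilePlan(NamedTuple):
    to_embed: List[str]   # keys that are new or whose content changed
    to_delete: List[str]  # keys still in the index but gone from the bucket


def plan_reconciliation(listing: Dict[str, str], indexed: Dict[str, str]) -> ReconcilePlan:
    """Compare a bucket listing with the index manifest.

    Both arguments map object key -> ETag. A key needs re-embedding when it is
    missing from the manifest or its ETag differs from the one last embedded.
    """
    to_embed = sorted(key for key, etag in listing.items() if indexed.get(key) != etag)
    to_delete = sorted(key for key in indexed if key not in listing)
    return ReconcilePlan(to_embed, to_delete)
```

The plan only catches what the event path missed, so in the steady state both lists should be empty or very short. A persistently long list is a signal that event handling is dropping work.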

Key Connections

depends_on Embedding Model — requires an embedding model for re-vectorization
solves Hybrid S3 + Vector Index drift — keeps embeddings in sync with source data
constrained_by High Cloud Inference Cost — continuous embedding has ongoing cost
scoped_to Vector Indexing on Object Storage, LLM-Assisted Data Systems

Definition

What it is

A continuous or near-real-time pipeline that detects changes in S3-stored source data, regenerates affected embeddings, and updates vector indexes — keeping semantic search results fresh without full re-indexing.
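One way to regenerate only the affected embeddings is to hash each chunk of a changed document and reuse vectors whose content hash is already known. The sketch below assumes a chunk-level design that the source does not prescribe: `refresh_document`, the `store` layout (document id to a list of `(sha256, vector)` pairs), and the `embed` callable are hypothetical.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Vector = List[float]
ChunkEntry = Tuple[str, Vector]  # (sha256 of chunk text, embedding)


def refresh_document(
    doc_id: str,
    chunks: List[str],
    store: Dict[str, List[ChunkEntry]],
    embed: Callable[[str], Vector],
) -> int:
    """Re-embed only chunks whose text was not seen in the previous version.

    Returns the number of calls made to `embed`, a hypothetical stand-in for
    an embedding model.
    """
    # Vectors from the previous version of this document, keyed by content hash.
    known: Dict[str, Vector] = {digest: vec for digest, vec in store.get(doc_id, [])}
    refreshed: List[ChunkEntry] = []
    embed_calls = 0

    for text in chunks:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest not in known:
            known[digest] = embed(text)
            embed_calls += 1
        refreshed.append((digest, known[digest]))

    # Replacing the whole entry also drops vectors for chunks that were removed.
    store[doc_id] = refreshed
    return embed_calls
```

Matching by hash rather than by position means that inserting a paragraph at the top of a document costs one embedding call, not one per chunk.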

Why it exists

Offline batch embedding pipelines create stale vector indexes. For applications where data changes frequently (knowledge bases, product catalogs), continuous embedding refresh ensures search results reflect the latest content.

Primary use cases

Near-real-time RAG index updates, continuous product catalog embedding, fresh knowledge base vectorization.

Recent developments

Latest signals

The shadow index pattern is the production default for zero-downtime updates (2026). Instead of overwriting the live index, the pipeline writes to a "shadow index" while users continue querying the existing one, and cuts over only when the shadow is fully ready. Per Medium, "Vector Database Reindexing Pipeline" (March 2026).
In-flight embedding generation collapses the data-sync and index-update steps. In this 2026 architecture, embeddings are generated while data is being synced, rather than syncing data first and updating embeddings in a separate batch step. This removes a full pipeline phase and cuts end-to-end staleness. Per Striim, "Real-Time RAG: Streaming Vector Embeddings + Low-Latency AI Search".
CDC platforms now position themselves as embedding-pipeline backbones. Domo's "11 Best CDC Platforms of 2026" survey explicitly names vector databases as a destination: CDC events (insert/update/delete) drive embedding refresh in the same pipeline that drives lakehouse refresh. Per Domo, "11 Best Change Data Capture Platforms 2026".
Drift-Adapter (arXiv 2509.23471): near-zero-downtime embedding-model upgrades. The paper formalizes the "model upgrade without full re-encode" problem: when switching from text-embedding-3-small to a newer model, users cannot be made to wait for a full corpus re-embed. Drift-Adapter proposes a learnable mapping between old and new model embeddings that bridges the cutover. Per arXiv 2509.23471, "Drift-Adapter: Zero-Downtime Embedding Model Upgrades".
Microsoft Fabric Eventhouse ships vector similarity search natively. Eventhouse (Fabric's real-time engine) now supports vector similarity search inline with streaming data, so the "stream + vector search" pattern is no longer a custom build. Per Microsoft Fabric Blog, "Empowering Real-Time Searches: Vector Similarity Search with Eventhouse".
2026 framing: monitor drift, measure retrieval quality, refresh when models or data change. The Encord "Complete Guide to Embeddings in 2026" formalizes three operational disciplines: drift detection (per-query embedding-distance shifts), retrieval-quality measurement (recall@k, MRR), and refresh triggering (corpus changes and model upgrades). Per Encord, "Complete Guide to Embeddings 2026".
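The shadow-index cutover from the first signal in this list can be sketched in a few lines. This is an illustrative in-memory model, not any vector database's API: `ShadowedIndex` and its methods are hypothetical, and "fully ready" is assumed here to mean that the shadow holds exactly the expected set of document ids.

```python
from typing import Dict, Iterable, List, Optional

Vector = List[float]


class ShadowedIndex:
    """Serve reads from a live index while a replacement is built beside it."""

    def __init__(self, live: Optional[Dict[str, Vector]] = None) -> None:
        self._live: Dict[str, Vector] = dict(live or {})
        self._shadow: Optional[Dict[str, Vector]] = None

    def get(self, doc_id: str) -> Optional[Vector]:
        # Readers only ever see the live index, never a half-built shadow.
        return self._live.get(doc_id)

    def begin_rebuild(self) -> None:
        self._shadow = {}

    def write_shadow(self, doc_id: str, vector: Vector) -> None:
        if self._shadow is None:
            raise RuntimeError("call begin_rebuild() before writing to the shadow")
        self._shadow[doc_id] = vector

    def cut_over(self, expected_ids: Iterable[str]) -> bool:
        """Promote the shadow only if it covers exactly the expected ids."""
        if self._shadow is None or set(self._shadow) != set(expected_ids):
            return False  # not ready: keep serving the old index
        self._live, self._shadow = self._shadow, None
        return True
```

In a real deployment the swap would more likely be an alias or pointer update on the index service, and the readiness check could also include a retrieval-quality measurement on the shadow before promotion.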

Connections 5

Outbound 5

scoped_to 3

S3, LLM-Assisted Data Systems, Vector Indexing on Object Storage

depends_on 1

Embedding Model

constrained_by 1

High Cloud Inference Cost

Resources 2

Blog (High)

aws.amazon.com/blogs/big-data/generate-vector-embeddings-for...

AWS Big Data Blog showing a Lambda-based embedding refresh pipeline that processes S3 events to keep vector indexes current.

GitHub (High)

github.com/aws-samples/text-embeddings-pipeline-for-rag

AWS sample repository providing a complete pipeline for continuous embedding generation from S3-stored documents.