Vector Indexing on Object Storage
Summary
What it is
The practice of building and querying vector indexes over embeddings derived from data stored in S3.
Where it fits
This topic connects the LLM side of the index to the storage side. Embeddings are generated from S3-stored content, indexed for similarity search, and the results point back to S3 objects.
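This flow can be sketched with a toy in-memory example. The `embed` function, the bucket name, and the object keys are all hypothetical stand-ins; a real pipeline would call an actual embedding model on S3 object content. The point is the shape of the data: each index entry carries both a vector and the S3 key it was derived from, so a similarity hit resolves back to the source object.

```python
import math

# Toy stand-in for a real embedding model (hypothetical; a production
# pipeline would run a neural encoder over the object's text content).
def embed(text: str, dim: int = 8) -> list[float]:
    vec = [0.0] * dim
    for i, byte in enumerate(text.encode()):
        vec[i % dim] += byte
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]  # unit-normalized

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are unit-normalized, so the dot product is cosine similarity.
    return sum(x * y for x, y in zip(a, b))

# Each index entry pairs a vector with the S3 key it came from
# (bucket and keys are illustrative, not real).
corpus = {
    "s3://my-bucket/docs/refunds.txt": "how to request a refund",
    "s3://my-bucket/docs/shipping.txt": "shipping times and carriers",
}
index = [(key, embed(text)) for key, text in corpus.items()]

def search(query: str, k: int = 1) -> list[str]:
    qv = embed(query)
    ranked = sorted(index, key=lambda e: cosine(qv, e[1]), reverse=True)
    return [key for key, _ in ranked[:k]]  # results point back to S3 keys
```

A production system would replace the brute-force scan with an approximate-nearest-neighbor index, but the contract is the same: query in, ranked S3 keys out.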
Misconceptions / Traps
- Vector indexes are not a replacement for structured queries. They answer "what's semantically similar?" not "what matches this predicate?"
- Storing vector indexes on S3 (e.g., LanceDB) is viable but query latency is higher than dedicated vector databases with in-memory indexes.
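The first trap above can be made concrete with a minimal sketch (record keys and metadata are invented for illustration): a predicate query returns an exact, unordered set, while a vector query assigns every record a score and returns a ranking.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Toy records: a vector plus structured metadata (all values hypothetical).
records = [
    {"key": "s3://bkt/a", "lang": "en", "vec": [1.0, 0.0]},
    {"key": "s3://bkt/b", "lang": "de", "vec": [0.9, 0.1]},
    {"key": "s3://bkt/c", "lang": "en", "vec": [0.0, 1.0]},
]

# Predicate query: exact, set-valued answer with no notion of "closeness".
english = [r["key"] for r in records if r["lang"] == "en"]

# Similarity query: every record gets a score; the answer is a ranking.
query = [1.0, 0.0]
ranked = sorted(records, key=lambda r: cosine(query, r["vec"]), reverse=True)
```

Real systems often combine the two, filtering on a predicate first and then ranking the survivors by similarity, which is why vector indexes complement rather than replace structured queries.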
Key Connections
- Object Storage, S3 — vectors are derived from and point to S3 data
- LanceDB — S3-native vector database
- Embedding Model — produces the vectors
- Hybrid S3 + Vector Index — the architectural pattern
- Embedding Generation — the capability that feeds vectors
Definition
What it is
The practice of building and querying vector indexes over embeddings that are derived from data stored in S3.
Why it exists
Semantic retrieval (finding content by meaning) requires vector representations. When the source data lives in S3, the vector index must bridge the gap between unstructured storage and structured similarity search.
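One way to see the bridge is that the index itself can live as objects in the bucket, the way S3-native formats such as Lance persist index data as immutable files. A minimal sketch, using an in-memory dict as a stand-in for S3 (a real deployment would use `boto3` `put_object`/`get_object`; everything else here, including keys and vectors, is invented for illustration):

```python
import json
import math

# In-memory stand-in for an S3 bucket: key -> bytes.
bucket: dict[str, bytes] = {}

def put_object(key: str, body: bytes) -> None:
    bucket[key] = body

def get_object(key: str) -> bytes:
    return bucket[key]

# Persist the index as one serialized object in the bucket. Each entry
# records both the vector and the source object it was derived from.
index = [
    {"source": "s3://my-bucket/raw/a.txt", "vec": [0.1, 0.9]},
    {"source": "s3://my-bucket/raw/b.txt", "vec": [0.8, 0.2]},
]
put_object("indexes/docs.json", json.dumps(index).encode())

# Later, a separate querying process loads the index back and searches it.
loaded = json.loads(get_object("indexes/docs.json"))

def nearest(q: list[float]) -> str:
    def cos(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.hypot(*a) * math.hypot(*b))
    return max(loaded, key=lambda e: cos(q, e["vec"]))["source"]
```

Storing the index alongside the source data keeps the whole system serverless and cheap, at the cost of the read latency noted under Misconceptions / Traps.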
Relationships
Outbound Relationships
Inbound Relationships
Resources
AWS Architecture Blog describes a production-grade 1B+ vector search solution built on LanceDB with S3 as the storage layer.
LanceDB documentation covers this serverless vector database built on the Lance columnar format and designed for S3-native storage.
Milvus is a leading open-source vector database with S3-backed storage support; its architecture docs explain how vector indexes are persisted to object storage.