Guide 13

AWS S3's New Native Features — Tables, Vectors, Metadata, Express

Problem Framing

AWS is expanding S3 from pure object storage into a data platform. S3 Tables provides managed Apache Iceberg tables with automatic compaction. S3 Vectors adds native embedding storage and similarity search. S3 Metadata makes object metadata SQL-queryable via auto-generated Iceberg tables. S3 Express One Zone delivers single-digit millisecond latency in a single-AZ storage class. Each feature solves a real engineering problem — but each also deepens AWS lock-in. Engineers need to evaluate these features against open-source alternatives on three dimensions: capability, operational overhead saved, and lock-in cost incurred.

Relevant Nodes

  • Topics: Directory Buckets / Hot Object Storage, Metadata-First Object Storage
  • Technologies: S3 Express One Zone, Amazon S3 Tables, Amazon S3 Vectors, Amazon S3 Metadata
  • Standards: Iceberg REST Catalog Spec
  • Pain Points: Vendor Lock-In, Directory Namespace / Listing Bottlenecks

Decision Path

  1. S3 Express One Zone — for latency-sensitive hot data:

    • What it does: Single-digit millisecond first-byte latency (vs. roughly 100–200 ms for S3 Standard). Uses a directory-based namespace (directory buckets) in a single AZ.
    • Use when: ML checkpointing at high frequency, interactive analytics needing fast data access, cache-tier replacement where ElastiCache/DAX would otherwise be used.
    • Skip when: Multi-AZ durability is required (Express is single-AZ), or your workload is throughput-bound rather than latency-bound.
    • Lock-in assessment: Moderate. Directory bucket API has minor differences from general-purpose buckets. Data can be copied out, but the latency benefit is AWS-specific.
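
One of the API differences mentioned above is visible at bucket creation. The following is a minimal sketch, assuming the directory bucket naming pattern (base name, Availability Zone ID, and the `--x-s3` suffix) and the CreateBucket configuration shape as we understand them; the base name `ml-checkpoints` and the zone ID `usw2-az1` are hypothetical placeholders, and the helper names are ours, not part of any SDK.

```python
from typing import Any, Dict


def directory_bucket_name(base_name: str, az_id: str) -> str:
    """Directory bucket names embed the Availability Zone ID and end in --x-s3."""
    return f"{base_name}--{az_id}--x-s3"


def create_directory_bucket_params(base_name: str, az_id: str) -> Dict[str, Any]:
    """Build keyword arguments for a CreateBucket call (assumed request shape)."""
    return {
        "Bucket": directory_bucket_name(base_name, az_id),
        "CreateBucketConfiguration": {
            # The bucket is pinned to one AZ; this is the single-AZ trade-off.
            "Location": {"Type": "AvailabilityZone", "Name": az_id},
            "Bucket": {
                "Type": "Directory",
                "DataRedundancy": "SingleAvailabilityZone",
            },
        },
    }


# Hypothetical base name and AZ ID, for illustration only.
params = create_directory_bucket_params("ml-checkpoints", "usw2-az1")
print(params["Bucket"])  # ml-checkpoints--usw2-az1--x-s3
```

With AWS credentials configured, these parameters would typically be passed as `boto3.client("s3").create_bucket(**params)`. The sketch stops short of the network call so the naming and single-AZ placement can be inspected on their own.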
  2. S3 Tables — managed Iceberg tables:

    • What it does: Creates and manages Apache Iceberg tables as a native S3 feature. Handles compaction, snapshot management, and storage optimization automatically.
    • Use when: You want Iceberg tables without operating a catalog, compaction jobs, or metadata maintenance. Reduces the operational burden described in Guide 3.
    • Skip when: You need multi-cloud table portability, use non-AWS query engines that may not integrate with S3 Tables, or want full control over compaction scheduling and table maintenance.
    • vs. self-managed Iceberg: S3 Tables eliminates operational toil (compaction, snapshot expiry, orphan cleanup) but removes fine-grained control. Self-managed Iceberg with Nessie or another open catalog preserves portability; Glue Catalog keeps maintenance under your control but is itself an AWS service.
    • Lock-in assessment: Moderate-high. Data format is standard Iceberg/Parquet (portable), but the management layer is AWS-proprietary. Migrating away means standing up your own catalog and maintenance jobs (compaction, snapshot expiry, orphan-file cleanup) elsewhere.
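
Engine compatibility is easiest to test through the Iceberg REST Catalog interface. The sketch below builds the catalog properties a REST-capable client such as PyIceberg would need, assuming the S3 Tables Iceberg REST endpoint format and SigV4 signing name as we understand them from AWS documentation. The region, account ID, and table bucket name are hypothetical placeholders, and the property values should be verified against current documentation before use.

```python
from typing import Dict


def s3_tables_catalog_properties(region: str, table_bucket_arn: str) -> Dict[str, str]:
    """Properties for an Iceberg REST catalog client pointed at S3 Tables.

    Assumptions: the endpoint URL pattern and the "s3tables" signing name.
    """
    return {
        "type": "rest",
        "uri": f"https://s3tables.{region}.amazonaws.com/iceberg",
        # The table bucket ARN plays the role of the Iceberg warehouse.
        "warehouse": table_bucket_arn,
        # Requests are signed with AWS SigV4 rather than a bearer token.
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "s3tables",
        "rest.signing-region": region,
    }


# Hypothetical account ID and bucket name, for illustration only.
props = s3_tables_catalog_properties(
    "us-east-1",
    "arn:aws:s3tables:us-east-1:111122223333:bucket/example-table-bucket",
)
print(props["uri"])  # https://s3tables.us-east-1.amazonaws.com/iceberg
```

With PyIceberg installed, the dictionary could be passed as `load_catalog("s3tables", **props)` from `pyiceberg.catalog`. If a query engine cannot be configured with equivalent REST catalog settings, that is an early signal for the "Skip when" case above.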
  3. S3 Vectors — native embedding storage:

    • What it does: Stores vector embeddings natively in S3 and provides similarity search without a separate vector database.
    • Use when: You need simple vector search at moderate scale and want to avoid operating a dedicated vector database (Milvus, Weaviate).
    • Skip when: You need advanced vector search features (hybrid search, custom distance metrics, filtering beyond simple metadata predicates), high-throughput real-time search, or want to avoid AWS lock-in for your embedding infrastructure.
    • vs. LanceDB: LanceDB stores vectors as files on any S3-compatible store (portable, open format). S3 Vectors is AWS-only but zero-infrastructure.
    • Lock-in assessment: High. Vector storage format is AWS-proprietary. Migrating means re-indexing all embeddings on another platform.
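
For evaluation purposes it helps to have a portable baseline to compare results against. The following is a minimal brute-force sketch of cosine-distance top-k search in pure Python. It is not how S3 Vectors or LanceDB work internally (both rely on their own index structures); it only shows what "similarity search" computes, and the vector keys and values are made up for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], vectors: Dict[str, Sequence[float]], k: int
) -> List[Tuple[str, float]]:
    """Exhaustively score every stored vector and return the k nearest."""
    scored = [(key, cosine_distance(query, vec)) for key, vec in vectors.items()]
    scored.sort(key=lambda item: item[1])  # smallest distance first
    return scored[:k]


# Toy 2-dimensional embeddings, for illustration only.
store = {"doc-a": [1.0, 0.0], "doc-b": [0.0, 1.0], "doc-c": [1.0, 1.0]}
nearest = top_k([1.0, 0.0], store, k=2)
print([key for key, _ in nearest])  # ['doc-a', 'doc-c']
```

Running the same queries through a managed service and through an exhaustive baseline like this on a sample of embeddings gives a rough recall check that does not depend on any one vendor.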
  4. S3 Metadata — queryable object metadata:

    • What it does: Automatically generates and maintains an Apache Iceberg table containing metadata for all objects in a bucket. Query with Athena, Spark, or any Iceberg-compatible engine.
    • Use when: You need to query object metadata at scale (find objects by custom metadata, analyze storage patterns, audit object changes) without running S3 Inventory exports or custom crawlers.
    • Skip when: You already have a metadata catalog (Glue Data Catalog, custom solution) that meets your needs, or you need metadata for objects across multiple providers.
    • vs. Glue Data Catalog: S3 Metadata is automatic and object-level. Glue Catalog is table/partition-level and requires explicit registration (manually or via crawlers). They complement rather than replace each other.
    • Lock-in assessment: Low-moderate. The generated metadata table is standard Iceberg format. The auto-generation feature is AWS-specific, but the data it produces is portable.
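
Because the metadata table is an ordinary Iceberg table, the queries are plain SQL. The sketch below builds one such query as a string. The catalog, namespace, and table names are hypothetical placeholders, and the column names (`key`, `size`, `storage_class`, `last_modified_date`) are assumptions that should be checked against the schema of the table generated for your bucket.

```python
def quote_identifier(part: str) -> str:
    """Double-quote one identifier part; reject embedded quotes outright."""
    if '"' in part:
        raise ValueError(f"unsupported identifier: {part!r}")
    return f'"{part}"'


def large_objects_query(
    catalog: str, namespace: str, table: str, min_size_bytes: int, limit: int = 100
) -> str:
    """Build SQL that lists the largest objects recorded in a metadata table.

    Column names are assumptions; verify them against the generated table.
    """
    qualified = ".".join(quote_identifier(p) for p in (catalog, namespace, table))
    return (
        "SELECT key, size, storage_class, last_modified_date\n"
        f"FROM {qualified}\n"
        f"WHERE size >= {int(min_size_bytes)}\n"
        "ORDER BY size DESC\n"
        f"LIMIT {int(limit)}"
    )


# Hypothetical identifiers, for illustration only.
sql = large_objects_query(
    "example_catalog", "example_namespace", "example_metadata_table", 1024**3
)
print(sql)
```

The resulting string could be submitted through Athena, Spark SQL, or another Iceberg-compatible engine. Keeping the query in standard SQL over an Iceberg table is what makes the low-moderate lock-in assessment plausible.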
  5. Directory buckets vs. general-purpose buckets:

    • S3 Express One Zone uses directory buckets, which have a hierarchical namespace (actual directory structure) rather than the flat key-value namespace of general-purpose buckets.
    • This addresses the Directory Namespace / Listing Bottlenecks pain point: a LIST operation on a directory bucket returns the entries of one specific directory instead of filtering keys by prefix across the entire flat namespace.
    • Trade-off: directory buckets are single-AZ only and have a different pricing model (per-request + per-GB, no free tier).
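
The difference between the two namespace models can be illustrated with a toy in-memory model. This is a conceptual sketch only: real general-purpose buckets use an index rather than a literal linear scan, and real directory buckets do not necessarily return entries in sorted order. The keys are made up, and results are sorted here purely to make the two listings comparable.

```python
from typing import Dict, List, Optional

Tree = Dict[str, Optional[dict]]


def list_flat(keys: List[str], prefix: str, delimiter: str = "/") -> List[str]:
    """Emulate a prefix + delimiter listing over a flat keyspace."""
    results = set()
    for key in keys:  # every key in the namespace is a candidate
        if not key.startswith(prefix):
            continue
        head, sep, _ = key[len(prefix):].partition(delimiter)
        results.add(prefix + head + sep)  # common prefix if sep, else object
    return sorted(results)


def build_tree(keys: List[str]) -> Tree:
    """Store the same keys as nested directories instead of flat strings."""
    root: Tree = {}
    for key in keys:
        node = root
        *dirs, leaf = key.split("/")
        for name in dirs:
            node = node.setdefault(name + "/", {})
        node[leaf] = None  # None marks an object rather than a directory
    return root


def list_dir(tree: Tree, directory: str) -> List[str]:
    """Walk straight to one directory and return only its direct entries."""
    node = tree
    for name in filter(None, directory.split("/")):
        node = node[name + "/"]
    return sorted(directory + entry for entry in node)


keys = ["logs/2024/a.txt", "logs/2024/b.txt", "logs/2025/c.txt", "data/x.bin"]
print(list_flat(keys, "logs/"))            # ['logs/2024/', 'logs/2025/']
print(list_dir(build_tree(keys), "logs/"))  # ['logs/2024/', 'logs/2025/']
```

Both functions return the same entries, but `list_flat` has to consider the whole keyspace while `list_dir` touches only the requested directory.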
  6. General adoption framework:

    • Adopt now: S3 Express One Zone for proven latency-sensitive workloads (checkpointing, hot caching). The latency benefit is immediate and the lock-in is manageable.
    • Evaluate carefully: S3 Tables if you are already committed to AWS and Iceberg. The operational savings are real, but test with your specific query engines.
    • Watch and wait: S3 Vectors and S3 Metadata are newer features. Evaluate against open alternatives (LanceDB, custom metadata catalogs) before committing.
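
The three evaluation dimensions (capability, operational overhead saved, lock-in cost) can be kept explicit with a small scoring helper. This is a sketch under stated assumptions: the lock-in levels are the qualitative assessments from this guide mapped onto a hypothetical 1-3 scale, the equal weighting is arbitrary, and the capability and overhead scores in the example are placeholders to be replaced with your own judgement.

```python
from dataclasses import dataclass

# Lock-in assessments from this guide on an assumed scale:
# Low = 1, Moderate = 2, High = 3, with half steps for in-between ratings.
LOCK_IN = {
    "S3 Express One Zone": 2.0,  # Moderate
    "S3 Tables": 2.5,            # Moderate-high
    "S3 Vectors": 3.0,           # High
    "S3 Metadata": 1.5,          # Low-moderate
}


@dataclass(frozen=True)
class Evaluation:
    feature: str
    capability: float      # 0-3, your own assessment for the workload
    overhead_saved: float  # 0-3, operational work you no longer run

    def net_score(self) -> float:
        """Benefit minus lock-in cost, equally weighted (an assumption)."""
        return self.capability + self.overhead_saved - LOCK_IN[self.feature]


# Placeholder inputs, for illustration only.
candidate = Evaluation("S3 Tables", capability=2.0, overhead_saved=3.0)
print(candidate.net_score())  # 2.5
```

The number itself matters less than recording the inputs, so that the same feature can be re-scored when an open-source alternative or the AWS feature set changes.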

What Changed Over Time

  • S3 was originally pure object storage: PUT, GET, DELETE, LIST. It had no query capability, no data awareness, and at launch a single storage class; Glacier-based archival arrived later.
  • Storage classes proliferated (Intelligent-Tiering, Glacier Instant Retrieval, Express One Zone), turning S3 into a tiered storage platform.
  • S3 Tables (2024) marked the first time S3 natively understood table semantics (Iceberg), crossing from "storage" into "data platform" territory.
  • S3 Metadata (2024) made the storage layer self-describing, reducing dependency on external metadata catalogs.
  • S3 Vectors (2025) added native AI/ML capability to the storage layer, competing directly with dedicated vector databases.
  • Each new feature follows the same pattern: solve a real operational pain point, use open formats where possible (Iceberg, Parquet), but make the management layer AWS-proprietary.

Sources