Vortex
A next-generation open-source columnar file format incubating at the Linux Foundation AI & Data Foundation, designed to supersede Apache Parquet for AI and analytics workloads via zero-copy Arrow integration and compute-on-encoded-data kernels (ALP for floats, FSST for strings).
Summary
Vortex occupies the slot Parquet has historically held, as the file format underneath Iceberg/Delta tables and as DuckDB's input layer, but optimizes for AI access patterns Parquet was never designed for. The project's move from its original developer SpiralDB to the Linux Foundation signals a vendor-neutral path, and first-class DuckDB integration shipped in January 2026.
- Vortex is not a database. It is a file format and encoding layer, equivalent in scope to Parquet.
- The "100× faster" headline applies to random access — sequential scans are also 10–20× faster, but the random-access gap is the differentiator.
- Compute-on-encoded-data requires the engine to understand the encoding tree. DuckDB does (via the official extension); arbitrary Parquet readers do not.
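To make the compute-on-encoded-data point concrete, here is a minimal pure-Python sketch of the idea using dictionary encoding (an illustration of the general technique, not Vortex's actual kernels or API): a predicate over an encoded string column is evaluated once per distinct value, then applied to the integer codes, so the column itself is never decoded.

```python
# Illustrative sketch (not Vortex's implementation): evaluating a filter
# directly on dictionary-encoded data instead of decompressing first.

def dict_encode(values):
    """Encode a column as (dictionary, integer codes)."""
    dictionary = sorted(set(values))
    index = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [index[v] for v in values]

def filter_encoded(dictionary, codes, predicate):
    """Run `predicate` over the small dictionary once, then scan the codes."""
    matching = {i for i, v in enumerate(dictionary) if predicate(v)}
    return [row for row, code in enumerate(codes) if code in matching]

column = ["parquet", "vortex", "vortex", "arrow", "vortex"]
dictionary, codes = dict_encode(column)
rows = filter_encoded(dictionary, codes, lambda s: s == "vortex")
print(rows)  # -> [1, 2, 4]
```

An engine that understands the encoding (as DuckDB does via the extension) gets this shortcut; a reader that only speaks the decoded representation must materialize every string first.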
- alternative_to: Apache Parquet — successor format for AI workloads
- used_by: DuckDB — official extension since January 2026
- scoped_to: Table Formats, S3
Definition
An open-source columnar file format incubating at the **Linux Foundation AI & Data Foundation**, designed as a next-generation successor to Apache Parquet for AI and analytics workloads. Operates with **zero-copy Apache Arrow integration** so the on-disk and in-memory representations match exactly, eliminating the deserialization tax. Compute kernels execute **directly on encoded data** via specialized encodings (**ALP** for floating-point tensors, **FSST** for variable-length strings) rather than decompressing first. Originally developed at SpiralDB; gained an official **DuckDB extension** in January 2026.
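The intuition behind ALP-style float encoding can be sketched in a few lines (a simplified illustration of the underlying idea, not the real ALP algorithm or Vortex's implementation): many real-world doubles are short decimals, so they round-trip losslessly as scaled integers, and aggregates like SUM can run on the integers with a single rescale at the end.

```python
# Hedged sketch of an ALP-like scheme: store doubles as integers scaled by
# a decimal exponent, chosen so the round-trip is exact. Aggregation then
# happens on integers, i.e. on encoded data.

def alp_like_encode(values, max_exp=10):
    """Find one decimal exponent that losslessly scales every value to an int."""
    for exp in range(max_exp + 1):
        scale = 10 ** exp
        ints = [round(v * scale) for v in values]
        if all(i / scale == v for i, v in zip(ints, values)):
            return exp, ints
    raise ValueError("no lossless decimal exponent found")

def sum_encoded(exp, ints):
    """Aggregate directly on the encoded integers; rescale once at the end."""
    return sum(ints) / 10 ** exp

prices = [1.25, 3.5, 0.75, 2.0]
exp, ints = alp_like_encode(prices)
print(exp, ints)               # -> 2 [125, 350, 75, 200]
print(sum_encoded(exp, ints))  # -> 7.5
```

The real ALP additionally adapts the exponent per block and handles exceptions that do not round-trip; this sketch only shows why integer-domain compute on encoded floats is possible at all.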
Parquet was architected over a decade ago for batch CPU analytics. Modern AI workloads — wide tables, sparse arrays, high-dimensional vectors, random-access RAG retrieval — exposed its structural limits. Vortex replaces Parquet's Thrift-based metadata, eager decompression, and rigid row-group layout with a **pluggable encoding tree**, lazy evaluation, and Arrow-native memory. Published benchmarks show **100× faster random access** and **10–20× faster sequential scans** vs Parquet, with substantially lower CPU and host memory footprint.
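The string side of the encoding tree can be illustrated with a toy FSST-style symbol table (greatly simplified from the published FSST design, and not Vortex's actual code; the symbol table below is an assumed "pre-trained" example): frequent substrings map to one-byte codes, encoding is greedy longest-match, and decoding is a pure table lookup.

```python
# Toy sketch of FSST-style compression: a static table of frequent
# substrings, each replaced by a single code byte; other bytes are escaped.

SYMBOLS = {"http://": 0, "www.": 1, ".com": 2}  # substring -> one-byte code (assumed table)
ESCAPE = 255                                    # escape byte precedes a literal character
DECODE = {code: sym for sym, code in SYMBOLS.items()}

def fsst_like_encode(text):
    out, i = [], 0
    while i < len(text):
        for sym, code in sorted(SYMBOLS.items(), key=lambda kv: -len(kv[0])):
            if text.startswith(sym, i):          # greedy longest match
                out.append(code)
                i += len(sym)
                break
        else:
            out.extend([ESCAPE, ord(text[i])])   # no symbol matched: escaped literal
            i += 1
    return bytes(out)

def fsst_like_decode(data):
    out, i = [], 0
    while i < len(data):
        if data[i] == ESCAPE:
            out.append(chr(data[i + 1]))
            i += 2
        else:
            out.append(DECODE[data[i]])
            i += 1
    return "".join(out)

url = "http://www.example.com"
packed = fsst_like_encode(url)
assert fsst_like_decode(packed) == url
print(len(url), len(packed))  # -> 22 17
```

Because each code is a fixed single byte, an engine can do things like prefix checks or equality against an encoded constant without decompressing, which is the property the compute-on-encoded-data kernels exploit.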
Use cases
Successor format for AI training and inference data; RAG retrieval requiring fast random reads; lakehouse table formats migrating off Parquet for ML feature stores; DuckDB analytical queries with embedded compute kernels.
Resources
- Source repository for the Vortex columnar format with encoding tree spec, ALP/FSST kernel implementations, and Arrow-native memory layout.
- DuckDB's official announcement of the Vortex extension shipped January 2026 — concrete integration path for query engines.
- LF AI & Data Foundation press release on Vortex's transition from SpiralDB — vendor-neutral governance signal.
- Dremio's comparative analysis of Vortex vs Parquet/Lance/Nimble in the AI workload regime, with benchmarks.