Technology

Velox

Summary

What it is

A C++ vectorized execution engine developed by Meta that provides a unified, high-performance data processing backend usable by multiple front-end query engines including Presto, Spark, and custom data systems.

Where it fits

Velox sits beneath query planners as a shared execution layer. For S3-backed workloads, it accelerates scan, filter, aggregation, and join operations against Parquet files on object storage. It is also the execution engine behind Prestissimo, Presto's native C++ worker.

Misconceptions / Traps
  • Velox is not a standalone query engine. It is an execution library that must be embedded in a host system (Presto, Spark via Gluten, or a custom application).
  • Velox's performance gains come mainly from vectorized execution and adaptive filtering, not from caching. Data that is not already cached locally must still be read from S3.
  • Integration with existing query engines (e.g., Spark via the Gluten project) is still maturing. Not all Spark operations have Velox equivalents, and Gluten falls back to Spark's own JVM execution for operators it cannot offload.
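
The vectorized execution and adaptive filtering mentioned above can be illustrated with a minimal sketch. It assumes a batch is a set of equal-length integer columns, tracks surviving rows in a selection vector, and reorders filters by observed pass rate after each batch. All names (`Column`, `Selection`, `Filter`, `applyFilter`, `filterBatch`) are hypothetical and are not Velox APIs; filter reordering is shown as one plausible form of adaptive filtering, not as a description of Velox's internals.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical illustration types; none of these are Velox classes.
using Column = std::vector<int64_t>;
using Selection = std::vector<std::size_t>;  // indices of rows still alive

struct Filter {
  std::size_t columnIndex;                 // column this filter reads
  std::function<bool(int64_t)> predicate;  // per-value test
  std::size_t rowsIn = 0;                  // rows evaluated so far
  std::size_t rowsOut = 0;                 // rows that passed so far

  Filter(std::size_t c, std::function<bool(int64_t)> p)
      : columnIndex(c), predicate(std::move(p)) {}

  // Fraction of evaluated rows that passed; lower means more selective.
  double passRate() const {
    return rowsIn == 0 ? 1.0 : static_cast<double>(rowsOut) / rowsIn;
  }
};

// Apply one filter to the whole batch: a tight loop over a single column
// that compacts the selection vector in place.
void applyFilter(Filter& f, const std::vector<Column>& batch, Selection& sel) {
  assert(f.columnIndex < batch.size());
  const Column& col = batch[f.columnIndex];
  std::size_t kept = 0;
  for (std::size_t i = 0; i < sel.size(); ++i) {
    const std::size_t row = sel[i];
    if (f.predicate(col[row])) sel[kept++] = row;
  }
  f.rowsIn += sel.size();
  f.rowsOut += kept;
  sel.resize(kept);
}

// Run every filter over one batch (columns assumed to have equal length),
// then reorder so the most selective filter runs first on the next batch.
Selection filterBatch(std::vector<Filter>& filters,
                      const std::vector<Column>& batch) {
  assert(!batch.empty());
  Selection sel(batch[0].size());
  std::iota(sel.begin(), sel.end(), std::size_t{0});
  for (Filter& f : filters) applyFilter(f, batch, sel);
  std::stable_sort(filters.begin(), filters.end(),
                   [](const Filter& a, const Filter& b) {
                     return a.passRate() < b.passRate();
                   });
  return sel;
}
```

Calling `filterBatch` repeatedly with the same `filters` vector lets later batches benefit from the ordering learned on earlier ones: selective filters shrink the selection vector early, so the remaining filters touch fewer rows.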

Key Connections
  • scoped_to S3, Lakehouse: accelerates query execution over S3 data
  • depends_on Apache Arrow: uses Arrow-compatible columnar memory layout
  • enables Presto: Prestissimo uses Velox as its execution engine
  • enables Apache Spark: Gluten project integrates Velox with Spark
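
To make the Arrow-compatible columnar layout more concrete, here is a minimal sketch of a column stored the way the Arrow format describes fixed-width data: a contiguous values buffer plus a validity bitmap with one bit per row, where a set bit means non-null and bits are numbered from the least significant bit. The type `Int64Column` and the function `sumNonNull` are hypothetical illustrations, not Velox or Arrow classes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration type; not a Velox or Arrow class.
struct Int64Column {
  std::vector<int64_t> values;    // one contiguous slot per row
  std::vector<uint8_t> validity;  // one bit per row, set bit = non-null

  void append(int64_t v) {
    pushBit(true);
    values.push_back(v);
  }

  // A null still occupies a value slot so that row i is always values[i].
  void appendNull() {
    pushBit(false);
    values.push_back(0);
  }

  bool isValid(std::size_t row) const {
    assert(row < values.size());
    return (validity[row / 8] >> (row % 8)) & 1u;
  }

 private:
  // Records the validity bit for the row about to be appended.
  void pushBit(bool valid) {
    const std::size_t row = values.size();
    if (row % 8 == 0) validity.push_back(0);  // start a new bitmap byte
    if (valid) validity[row / 8] |= static_cast<uint8_t>(1u << (row % 8));
  }
};

// Column-at-a-time aggregation that skips null rows via the bitmap.
int64_t sumNonNull(const Int64Column& col) {
  int64_t total = 0;
  for (std::size_t row = 0; row < col.values.size(); ++row) {
    if (col.isValid(row)) total += col.values[row];
  }
  return total;
}
```

Keeping values and null flags in separate dense buffers is what allows loops like `sumNonNull` to scan memory sequentially, and it is why buffers in this style can be handed between Arrow-aware systems with little or no conversion.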

Definition

What it is

A C++ vectorized database acceleration library created by Meta, designed to be embedded into query engines as a unified, high-performance execution layer for processing data stored in S3.

Why it exists

Multiple query engines (Spark, Presto, Flink) each implement their own execution runtimes with varying performance characteristics. Velox provides a shared, hardware-optimized execution core that any engine can embed, raising the performance floor for S3-based analytics.
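
The idea of a shared core embedded by several engines can be sketched as follows. Two hypothetical front ends express the same predicate in different ways, each translates it into one engine-neutral plan, and a single execution routine runs that plan. Every name here (`SharedPlan`, `runSharedCore`, `EngineAPredicate`, `EngineBPredicate`, `translate`) is invented for illustration and does not correspond to any Velox, Presto, or Spark API.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// All names are hypothetical; none of them are Velox APIs.

// Engine-neutral plan: keep values >= lowerBoundInclusive, then sum them.
struct SharedPlan {
  int64_t lowerBoundInclusive;
};

// The shared execution core: implemented and optimized once,
// reused by every front end that can produce a SharedPlan.
int64_t runSharedCore(const SharedPlan& plan,
                      const std::vector<int64_t>& column) {
  int64_t total = 0;
  for (int64_t v : column) {
    if (v >= plan.lowerBoundInclusive) total += v;
  }
  return total;
}

// Front end A expresses the predicate as "value > literal".
struct EngineAPredicate {
  int64_t greaterThan;
};

SharedPlan translate(const EngineAPredicate& p) {
  // Guard the +1 below against overflow.
  assert(p.greaterThan < std::numeric_limits<int64_t>::max());
  return SharedPlan{p.greaterThan + 1};
}

// Front end B expresses the predicate as "value >= literal".
struct EngineBPredicate {
  int64_t atLeast;
};

SharedPlan translate(const EngineBPredicate& p) {
  return SharedPlan{p.atLeast};
}
```

Each front end owns only its translation step, so an optimization made inside `runSharedCore` benefits every engine that embeds it. Operations a front end cannot translate would have to stay on that engine's own runtime.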

Primary use cases

Accelerating Spark and Presto queries over S3 data, unified vectorized execution for lakehouse queries, hardware-optimized data processing.
