DataFusion
An extensible query execution framework written in Rust, built on Apache Arrow, that provides a SQL query planner and execution engine for building custom analytics applications over S3-stored data.
Summary
DataFusion is the embedded query engine layer used by projects like Ballista, InfluxDB IOx, and Delta-rs. Rather than being a standalone analytics product, it is the foundation that other S3-native tools build upon for SQL query planning and columnar execution.
- DataFusion is a library, not a database. It provides query planning and execution but requires integration work to become a deployable analytics system.
- DataFusion's Rust implementation offers memory safety and performance, but extensions must be written in Rust or in a language that reaches it through FFI bindings (Python via PyO3, C via extern "C" functions).
- Distributed execution requires Ballista or a custom scheduler. On its own, DataFusion runs on a single node.
scoped_to: S3, Lakehouse (query execution over S3-stored data)
depends_on: Apache Arrow (the Arrow columnar format is the in-memory representation)
depends_on: Apache Parquet (reads Parquet files from S3)
enables: Apache Iceberg (used by the iceberg-rust implementation)
Definition
An extensible, embeddable query engine written in Rust, built on Apache Arrow. Provides SQL and DataFrame APIs for querying data on S3, used as the query core in tools like Ballista, InfluxDB IOx, and Delta-rs.
Many projects need a fast, embeddable SQL engine that can read from S3 without deploying a full distributed query cluster. DataFusion provides a modular, Arrow-native query engine that can be embedded into Rust, Python, or other applications.
Typical uses include embedded SQL analytics over S3, building custom query engines on object storage, and serverless query execution against Parquet/Iceberg data on S3.
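To make "query planning and columnar execution" more concrete, the following is a toy sketch in plain Rust. It is not DataFusion's API and uses no Arrow types; the names Batch, Plan, execute, keep, and total_bytes are hypothetical and exist only for illustration. It shows the general idea under those assumptions: data is held one column at a time, a small plan tree describes the query, and a filter is evaluated as a mask over one column that is then applied to every column.

```rust
// Toy illustration only: NOT DataFusion's API and no Arrow types involved.
// A batch stores each column as its own contiguous vector (columnar layout).
struct Batch {
    region: Vec<&'static str>,
    bytes: Vec<u64>,
}

// A minimal, hypothetical plan tree: scan a batch, optionally filter it.
enum Plan {
    Scan(Batch),
    FilterRegion { input: Box<Plan>, equals: &'static str },
}

// Execute a plan bottom-up and return the resulting batch.
fn execute(plan: Plan) -> Batch {
    match plan {
        Plan::Scan(batch) => batch,
        Plan::FilterRegion { input, equals } => {
            let batch = execute(*input);
            // Build a selection mask from one column, then apply it to all columns.
            let mask: Vec<bool> = batch.region.iter().map(|r| *r == equals).collect();
            Batch {
                region: keep(&batch.region, &mask),
                bytes: keep(&batch.bytes, &mask),
            }
        }
    }
}

// Keep only the values whose mask entry is true.
fn keep<T: Copy>(column: &[T], mask: &[bool]) -> Vec<T> {
    column
        .iter()
        .zip(mask)
        .filter(|(_, m)| **m)
        .map(|(v, _)| *v)
        .collect()
}

// Aggregate a single column without touching the others.
fn total_bytes(batch: &Batch) -> u64 {
    batch.bytes.iter().sum()
}

fn main() {
    let batch = Batch {
        region: vec!["eu", "us", "eu", "ap"],
        bytes: vec![100, 250, 40, 7],
    };
    // Roughly: SELECT SUM(bytes) FROM t WHERE region = 'eu'
    let plan = Plan::FilterRegion {
        input: Box::new(Plan::Scan(batch)),
        equals: "eu",
    };
    let result = execute(plan);
    println!("total = {}", total_bytes(&result)); // prints "total = 140"
}
```

A real engine adds a SQL parser, an optimizer, typed Arrow arrays, and readers for Parquet on object storage; the sketch only mirrors the overall shape of plan-then-execute over columns.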
Resources
Official Apache DataFusion site for the extensible query engine built on Arrow, designed for building custom analytics systems on object storage.
DataFusion source repository with the Rust-based query engine, object store integration, and Parquet/Iceberg readers.
DataFusion SQL reference documenting the query capabilities available for S3-backed analytical workloads.