Standard

Apache Arrow

A cross-language in-memory columnar data format specification with libraries for zero-copy reads, IPC, and efficient analytics.


Summary

Where it fits

Arrow sits between S3 storage (Parquet on disk) and compute (query execution in memory). It defines how columnar data is laid out in memory, so once S3-stored Parquet data has been decoded into Arrow, engines and libraries can pass it to one another without re-serializing it.

Misconceptions / Traps
  • Arrow is primarily an in-memory format, not a storage format. Arrow IPC files exist and can be written to S3, but long-term storage is not their primary use case.
  • Arrow and Parquet are complementary, not competing. Parquet is the on-disk format; Arrow is the in-memory format. Most engines read Parquet into Arrow for processing.
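
The on-disk versus in-memory split can be illustrated with a toy example. The sketch below uses a hypothetical run-length plus dictionary encoding (it is not the real Parquet format, and the names `dictionary`, `runs`, and `decode_indices` are invented for illustration) to show why a compact stored form is usually decoded into a flat, directly indexable array before processing.

```python
from array import array

# Hypothetical toy encoding, NOT the real Parquet format:
# a dictionary of distinct values plus (dictionary index, repeat count) runs.
dictionary = ["us-east-1", "eu-west-1"]
runs = [(0, 3), (1, 2), (0, 1)]

def decode_indices(runs):
    """Expand the runs into one flat int32 index array."""
    out = array("i")
    for index, count in runs:
        out.extend([index] * count)
    return out

indices = decode_indices(runs)
print(indices.tolist())        # -> [0, 0, 0, 1, 1, 0]

# The flat array gives constant-time access to any row; the run-length
# form would require walking the runs to find row 4.
print(dictionary[indices[4]])  # -> eu-west-1
```

The compact form suits storage and transfer, while the decoded form suits computation, which is the sense in which the two formats are complementary.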

Key Connections
  • used_by DuckDB, Apache Spark — in-memory processing format
  • scoped_to S3, Table Formats

Definition

What it is

Apache Arrow is a language-independent specification for how columnar data is laid out in memory, together with libraries in many languages that build zero-copy reads, inter-process communication (IPC), and efficient analytics on that layout.
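
As a rough illustration of what "laid out in memory" means, the sketch below packs a nullable 32-bit integer column into two buffers in the style the Arrow columnar format describes: a validity bitmap (one bit per value, least-significant bit first, 1 meaning "not null") and a contiguous little-endian values buffer. It uses only the Python standard library; the helper names `build_int32_column` and `read_int32` are hypothetical, and the padding and alignment that real Arrow buffers carry are deliberately left out.

```python
import struct

def build_int32_column(values):
    """Pack ints (or None for null) into (validity_bitmap, data_buffer).

    Simplified sketch: real Arrow buffers are also padded and aligned.
    """
    n = len(values)
    validity = bytearray((n + 7) // 8)   # one bit per slot
    data = bytearray(4 * n)              # 4 bytes per int32 slot
    for i, v in enumerate(values):
        if v is not None:
            validity[i // 8] |= 1 << (i % 8)        # LSB-first bit numbering
            struct.pack_into("<i", data, 4 * i, v)  # little-endian int32
    return bytes(validity), bytes(data)

def read_int32(validity, data, i):
    """Return the value in slot i, or None if the slot is null."""
    is_valid = (validity[i // 8] >> (i % 8)) & 1
    if not is_valid:
        return None
    return struct.unpack_from("<i", data, 4 * i)[0]

validity, data = build_int32_column([1, None, 3, 4])
print(validity.hex())   # -> 0d  (bits 0, 2 and 3 set)
print(len(data))        # -> 16
print([read_int32(validity, data, i) for i in range(4)])  # -> [1, None, 3, 4]
```

Because every value sits at a fixed offset in one contiguous buffer, any process that knows the layout can read slot `i` directly, without parsing or per-row objects.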

Why it exists

Historically, each analytics engine had its own in-memory format, so moving data between systems required costly serialization and deserialization. Arrow provides a shared in-memory representation: systems that both use Arrow can exchange data without converting it, which matters especially when processing large volumes of S3-stored Parquet data.

Primary use cases

Zero-copy data sharing between processing engines, efficient Parquet deserialization, in-memory analytics acceleration.
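
The idea behind zero-copy sharing can be sketched with Python's standard library alone. The example below is an analogy, not the Arrow API: it assumes a single contiguous int64 buffer stands in for a column, and shows that a consumer can take a view, and a slice of that view, without any bytes being copied.

```python
from array import array

# One contiguous int64 buffer standing in for a column (illustrative only).
column = array("q", range(1000))

view = memoryview(column)   # a view over the same memory, no copy
window = view[250:260]      # slicing a memoryview does not copy either

print(len(window), window.nbytes)   # -> 10 80

# Writing through the original buffer is visible through the slice,
# which shows that both refer to the same memory.
column[250] = -1
print(window[0])            # -> -1
```

Arrow libraries apply the same principle to their buffers, with the difference that the layout is standardized, so the producer and the consumer can be different libraries or languages.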

Recent developments

Latest signals
  • arrow-rs 58.2.0 (April 28, 2026): Variant and Parquet improvements, plus a security policy. Per the apache/arrow-rs releases page, the latest release of the Rust implementation ships Variant type enhancements (in line with the Variant convergence across Parquet, Iceberg, and Delta), Parquet read/write improvements, and a published security policy, giving the project the formal disclosure process that enterprise consumers expect.
  • ADBC is becoming the new database driver layer, with Snowflake, DuckDB, Microsoft Power BI, and dbt Fusion all adopting it. Per The New Stack's coverage, ADBC (Arrow Database Connectivity) is rapidly replacing ODBC/JDBC as the columnar-first database driver standard, and Snowflake, DuckDB, and Microsoft Power BI have adopted it. Per the dbt Fusion ADBC docs, dbt's new Rust-based Fusion engine uses ADBC as its unified driver layer, and the project has published technical guidance for vendors building ADBC drivers for Fusion. The structural shift: analytical-tool plumbing is moving from row-based driver protocols to columnar-native ones, with Arrow as the common specification.
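
To make the row-based versus columnar distinction concrete, the sketch below uses Python's built-in `sqlite3` module as a stand-in for a row-oriented driver (it is not ADBC, and the `trips` table is a made-up example). The client receives one tuple per row and has to pivot the result into columns itself; a columnar-native driver is designed to hand back columnar data directly, so this step disappears.

```python
import sqlite3
from array import array

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (id INTEGER, distance_km REAL)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [(1, 2.5), (2, 7.0), (3, 1.25)],
)

# Row-based protocol: the result arrives as a list of per-row tuples.
rows = conn.execute(
    "SELECT id, distance_km FROM trips ORDER BY id"
).fetchall()
conn.close()

# Client-side pivot into typed, contiguous columns. This is the extra
# work that a columnar-native driver avoids.
ids, distances = zip(*rows)
columns = {
    "id": array("q", ids),
    "distance_km": array("d", distances),
}

print(rows[0])                          # -> (1, 2.5)
print(columns["id"].tolist())           # -> [1, 2, 3]
print(columns["distance_km"].tolist())  # -> [2.5, 7.0, 1.25]
```

With three rows the pivot is negligible, but its cost grows with the number of cells in the result, which is why it matters for analytical workloads.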
