Apache Arrow
Summary
What it is
A cross-language in-memory columnar data format specification with libraries for zero-copy reads, IPC, and efficient analytics.
Where it fits
Arrow sits between S3 storage (Parquet on disk) and compute (query execution in memory). It defines how columnar data is laid out in memory, so once S3-stored Parquet data has been decoded into Arrow, engines and libraries can pass it to one another without re-serializing it.
Misconceptions / Traps
- Arrow is an in-memory format, not a storage format. You do not "store Arrow files on S3" (though Arrow IPC files exist, they are not the primary use case).
- Arrow and Parquet are complementary, not competing. Parquet is the on-disk format; Arrow is the in-memory format. Most engines read Parquet into Arrow for processing.
Key Connections
- used_by: DuckDB, Apache Spark (in-memory processing format)
- scoped_to: S3, Table Formats
Definition
What it is
A cross-language in-memory columnar data format specification with libraries for zero-copy reads, IPC, and efficient analytics. Defines how columnar data is laid out in memory.
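To make "laid out in memory" concrete, here is a simplified pure-Python sketch of how a nullable 32-bit integer column is represented: a validity bitmap (one bit per slot, least-significant bit first, 1 meaning "not null") plus a contiguous buffer of little-endian values. The helper names `encode_int32_column` and `get` are hypothetical, and the sketch leaves out the buffer padding and alignment that the real format calls for.

```python
import struct


def encode_int32_column(values):
    """Pack ints/None into (validity_bitmap, values_buffer)."""
    bitmap = bytearray((len(values) + 7) // 8)
    data = bytearray()
    for i, v in enumerate(values):
        if v is not None:
            # Slot i lives in byte i // 8, bit i % 8 (LSB numbering).
            bitmap[i // 8] |= 1 << (i % 8)
        # Null slots still occupy 4 bytes; zero is written here only
        # to keep the output deterministic.
        data += struct.pack("<i", 0 if v is None else v)
    return bytes(bitmap), bytes(data)


def get(bitmap, data, i):
    """Read slot i, returning None when its validity bit is unset."""
    if not (bitmap[i // 8] >> (i % 8)) & 1:
        return None
    return struct.unpack_from("<i", data, i * 4)[0]


bitmap, data = encode_int32_column([1, None, 3, 4])
decoded = [get(bitmap, data, i) for i in range(4)]
print(bin(bitmap[0]), len(data), decoded)
```

Because every value sits at a fixed offset (`i * 4`), any reader that knows the layout can access slot `i` directly, without parsing or deserializing the column first.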
Why it exists
Every analytics engine historically had its own in-memory format, requiring costly serialization between systems. Arrow provides a universal in-memory representation that eliminates serialization overhead, which matters especially when processing large volumes of S3-stored Parquet data.
Primary use cases
Zero-copy data sharing between processing engines, efficient Parquet deserialization, in-memory analytics acceleration.
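The idea behind zero-copy sharing can be illustrated with the Python standard library alone. This is an analogy for the principle, not Arrow's API: `array` and `memoryview` stand in for an Arrow buffer and the views that consumers hold on it.

```python
from array import array

# A producer owns one contiguous buffer of fixed-width integers.
values = array("i", [10, 20, 30, 40, 50])

# Two consumers receive views onto that buffer, not copies.
view_a = memoryview(values)
view_b = memoryview(values)[1:4]  # slicing a memoryview does not copy

# A write through the producer is visible through both views,
# which shows that all three names refer to the same memory.
values[2] = 99
shared = (view_a[2], view_b[1])

# Contrast: a conversion to a Python list copies every element,
# so later writes to the buffer are not reflected in it.
copied = values.tolist()
values[2] = 30
print(shared, copied[2], view_a[2])
```

Sharing a view costs the same regardless of buffer size, whereas a copy or a serialize/deserialize round trip scales with the amount of data.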
Relationships
Inbound Relationships
depends_on
Resources
- Formal specification of the Arrow columnar memory layout, defining the in-memory representation for arrays, buffers, null (validity) bitmaps, and nested types.
- Specification of the Arrow Flight RPC protocol for high-performance data transport over gRPC, a key part of the Arrow ecosystem.
- The main Apache Arrow repository, containing the format specification source files and canonical implementations such as C++ and Python. Some implementations, including Rust and Go, are developed in separate Apache repositories.
- Official Apache Arrow project homepage: entry point to all format specs, language-specific docs, and community resources.