Nimble
A columnar file format from Meta, purpose-built for ML feature engineering on wide tables (10K+ columns), using block encoding for bounded memory and Flatbuffers metadata for SIMD/GPU-efficient decode. Open-sourced as `facebookincubator/nimble`.
Summary
Nimble targets a workload Parquet was never designed for — ultra-wide ML feature stores with tens of thousands of columns, where Parquet's metadata overhead and stream encoding become prohibitive. Joins Vortex and Lance as the post-Parquet AI-format trio, each optimizing a different access pattern.
- Nimble is not a general-purpose Parquet replacement. It optimizes specifically for wide-table ML workloads; for narrow analytics tables Parquet remains efficient.
- Ecosystem support is narrower than Parquet — primarily Meta-internal tooling plus the public open-source build.
- Block encoding gives bounded memory but reads more data per block than streaming readers; it is only beneficial when the column count justifies the trade-off.
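The block-vs-stream trade-off above can be illustrated with a minimal sketch. This is not Nimble's actual API; it contrasts a stream-style decoder that materializes a whole column chunk at once with a block-style decoder that yields fixed-size blocks, so peak memory per column is bounded by the (hypothetical) block size rather than the column length.

```python
from itertools import islice

BLOCK_ROWS = 4  # hypothetical block size, in rows


def stream_decode(column_values):
    """Stream-style: materialize the whole column chunk.

    Peak memory scales with the column length.
    """
    return list(column_values)


def block_decode(column_values, block_rows=BLOCK_ROWS):
    """Block-style: yield fixed-size blocks one at a time.

    Peak memory scales with block_rows, regardless of column length.
    """
    it = iter(column_values)
    while block := list(islice(it, block_rows)):
        yield block


col = range(10)
blocks = list(block_decode(col))
assert all(len(b) <= BLOCK_ROWS for b in blocks)           # bounded memory
assert [v for b in blocks for v in b] == stream_decode(col)  # same data
```

The cost, as noted above, is that a block decoder may decode a full block even when only part of it is needed, which streaming readers can sometimes avoid.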
alternative_to: Apache Parquet (wide-table ML workloads) · scoped_to: Table Formats, S3
Definition
A columnar file format developed at **Meta**, purpose-built for **machine-learning feature engineering on wide tables** — datasets with tens of thousands of columns. Replaces Parquet's stream encoding with **block encoding** for predictable, bounded memory usage and substitutes Thrift/Protobuf with **Flatbuffers** for lightweight metadata that decodes efficiently on SIMD CPUs and GPUs. Open-sourced as `facebookincubator/nimble`.
Production ML feature stores at hyperscaler scale routinely exceed 10,000 columns per table — a regime Parquet was never designed for. Parquet's Thrift metadata, eager column-chunk loading, and per-column statistics overhead become prohibitive at that width. Nimble guarantees per-column memory bounds via block encoding and enables fast metadata traversal via Flatbuffers, making wide-table reads viable on accelerator hardware.
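A toy sketch of why metadata handling dominates at this width (hypothetical structures, not Parquet's Thrift footer or Nimble's Flatbuffers layout): an eager reader deserializes every column's metadata up front, so footer work scales with table width, while an offset-table reader in the spirit of Flatbuffers' zero-copy access touches only the columns actually requested.

```python
N_COLS = 10_000  # assumed table width for illustration


def eager_read(footer, wanted):
    """Eager-style: parse metadata for every column before projecting."""
    parsed = {name: meta for name, meta in footer}  # touches all N_COLS entries
    return {name: parsed[name] for name in wanted}, len(footer)


def indexed_read(footer_index, wanted):
    """Offset-table-style: look up only the requested columns."""
    return {name: footer_index[name] for name in wanted}, len(wanted)


footer = [(f"col{i}", {"offset": i * 100}) for i in range(N_COLS)]
footer_index = dict(footer)

_, eager_work = eager_read(footer, ["col42"])
_, indexed_work = indexed_read(footer_index, ["col42"])
assert eager_work == N_COLS  # work scales with table width
assert indexed_work == 1     # work scales with columns read
```

Under these assumptions, reading one column from a 10,000-column table costs 10,000 metadata decodes in the eager model but one lookup in the indexed model, which is the gap the Flatbuffers substitution targets.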
ML feature engineering on wide tables (10K+ columns), recommender system training datasets, large-scale embedding tables, GPU-accelerated columnar reads, Meta-scale feature stores.
Resources (3)
Meta's official Nimble repository with the wide-table block-encoding format spec, Flatbuffers metadata schemas, and SIMD/GPU decode paths.
Concise technical writeup explaining Nimble's block encoding rationale and Flatbuffers vs Thrift trade-offs for ML feature stores.
Comparative file-format analysis covering Lance, Nimble, and Bullion in AI/ML workload context.