Polars
A high-performance DataFrame library written in Rust with Python and Node.js bindings, designed for fast columnar analytics with lazy evaluation and native S3 read support.
Summary
Polars occupies the single-node analytics layer alongside DuckDB, providing an alternative to pandas for data engineering workloads that read from and write to S3. Its lazy execution model and Rust-based engine make it faster than pandas for Parquet/S3 workloads; the practical summaries cited under Recent developments put the gap at 5–30× on real workloads.
- Polars is not a distributed engine. It runs on a single machine and cannot scale across a cluster like Spark. For datasets larger than available RAM, it uses out-of-core streaming but does not distribute work.
- Polars and DuckDB solve similar problems but have different APIs. Polars uses a DataFrame API; DuckDB uses SQL. Choose based on workflow preference, not raw performance alone.
- Lazy evaluation in Polars is not the same as Spark's lazy evaluation. Polars optimizes a single-node query plan; it does not create distributed stages.
- scoped_to S3: reads Parquet and CSV files directly from S3
- depends_on Apache Arrow: uses Arrow as the in-memory columnar format
- depends_on Apache Parquet: primary file format for S3 reads
- alternative_to DuckDB: both serve single-node S3 analytics use cases
Definition
A high-performance DataFrame library written in Rust with Python and Node.js bindings, built on Apache Arrow. Designed as a faster alternative to pandas with native support for lazy evaluation and reading directly from S3.
Pandas is single-threaded and memory-inefficient for large datasets. Polars exploits multi-core parallelism and Arrow's columnar format to process S3-stored Parquet files at speeds that approach or exceed Spark on single-node workloads, without cluster overhead.
Typical use cases: high-performance single-node analytics over S3-stored Parquet, data engineering transformations, and ETL processing of lakehouse data.
Recent developments
- Latest release: Python Polars v1.41.2 (May 29, 2026). The Python and Rust crates version separately (py-* vs rust-* tags); the Python line is on 1.41.x. Per pola-rs/polars releases.
- Python Polars 1.38.x: Iceberg sink (unstable), business-day holidays, and scan_csv missing_columns. Per the pola-rs/polars releases page, Python Polars 1.38.0 and 1.38.1 ship Expr support for holidays in business-day calculations, an unstable sink_iceberg writer for direct Iceberg-table output, a missing_columns parameter on scan_csv, plus broad small-fix work. The unstable Iceberg sink is the load-bearing new capability: Polars can now write directly to Iceberg tables on S3 without round-tripping through Parquet files first.
- Independent research: ~8× less energy than Pandas; 5–30× faster on real workloads. Per endjin's "Under the Hood" article, the EASE 2024 study measured Polars consuming ~8× less energy than Pandas on large-dataframe synthetic workloads and ~40% more efficient on TPC-H benchmarks. Practical performance summaries put Polars 5–30× faster than Pandas on real workloads, with the gap widening as data grows. For S3-backed analytics teams making the Pandas → Polars switch, this is the reference data behind the migration case.
Resources
Official Polars documentation for the high-performance DataFrame library with native S3 and Parquet support via Rust-based execution.
Polars source repository showcasing the columnar query engine that achieves order-of-magnitude speedups over pandas for S3 data workflows.
Polars cloud storage guide covering direct reads from S3, GCS, and Azure Blob Storage with credential configuration.