Performance-per-Dollar
The composite metric that evaluates S3-based data system efficiency by normalizing query throughput, scan latency, or ingestion rate against total cost (storage, requests, compute, egress, and caching), enabling apples-to-apples comparison of architectural choices.
Summary
Performance-per-dollar is the ultimate evaluation criterion for S3-based architecture decisions. Choosing between Parquet and ORC, Iceberg and Delta, Trino and Spark, or AWS S3 and MinIO should be grounded in measured performance-per-dollar, not raw performance alone.
- Raw performance benchmarks (queries per second, scan throughput) are meaningless without cost context. A system that is 2x faster but costs 5x as much delivers only 2/5 = 0.4x the performance per dollar, so it is not the better choice.
- Cost in S3-based systems has many components: storage per GB, request pricing, compute (spot vs on-demand), egress, and metadata API calls. Benchmarks that omit any component are misleading.
- Performance-per-dollar changes with scale. A system that is cost-efficient at 1 TB may be uneconomical at 1 PB due to metadata overhead, request amplification, or catalog limits.
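The arithmetic in the bullets above can be sketched as follows. This assumes performance is measured as queries completed over a billing period and cost is the sum of every component over that same period. The `CostBreakdown` class, its field names, and all dollar figures are hypothetical placeholders, not real prices.

```python
from dataclasses import dataclass, fields


@dataclass
class CostBreakdown:
    """Hypothetical cost components for one billing period, in dollars."""
    storage: float = 0.0    # GB stored x price per GB for the period
    requests: float = 0.0   # GET/PUT/LIST request charges
    compute: float = 0.0    # engine instance-hours (spot or on-demand)
    egress: float = 0.0     # data transferred out of the region or cloud
    metadata: float = 0.0   # catalog / metadata API calls
    caching: float = 0.0    # cache tier, if one is used

    def total(self) -> float:
        # Sum every component; omitting one understates the true cost.
        return sum(getattr(self, f.name) for f in fields(self))


def perf_per_dollar(queries_completed: float, cost: CostBreakdown) -> float:
    """Queries completed per dollar spent over the same period."""
    total = cost.total()
    if total <= 0:
        raise ValueError("total cost must be positive")
    return queries_completed / total


# Baseline vs. a system that finishes 2x the queries at 5x the total cost.
baseline = perf_per_dollar(100.0, CostBreakdown(storage=20, requests=10, compute=70))
faster = perf_per_dollar(200.0, CostBreakdown(storage=20, requests=10, compute=470))
print(round(faster / baseline, 2))  # 0.4
```

Leaving a field such as `requests` or `egress` at zero when it is actually billed inflates the result, which is why benchmarks that omit a cost component are misleading.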
- scoped_to: S3, Lakehouse (cost efficiency across S3-based systems)
- depends_on: Benchmarking Methodology (measured by controlled benchmarks)
- constrains: Request Pricing Models (request costs are a key component)
- constrains: Egress Cost (egress is a significant cost factor in multi-region designs)
Definition
Query throughput, latency, or processing speed normalized to total cost (storage + compute + API calls + egress) for S3-based data systems, used to compare architectures, engines, and storage configurations. For lower-is-better measures such as latency, the equivalent form is cost per query (or per unit of work).
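One way to write the definition as a formula, assuming P is a higher-is-better measure (for example, queries completed) and every cost term covers the same measurement period. The symbol names are illustrative, and the caching term follows the cost list in the summary.

```latex
\begin{aligned}
C_{\mathrm{total}} &= C_{\mathrm{storage}} + C_{\mathrm{compute}} + C_{\mathrm{API}} + C_{\mathrm{egress}} + C_{\mathrm{cache}} \\
\mathrm{PPD} &= \frac{P}{C_{\mathrm{total}}} \\
\text{cost per query} &= \frac{C_{\mathrm{total}}}{N_{\mathrm{queries}}}
\end{aligned}
```

When P is simply the number of queries completed, PPD is the reciprocal of cost per query, so the two forms rank systems identically.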
Recent developments
- TPC-DS 10TB 2025-2026 results: Trino ~17s/query avg vs Spark ~38s/query avg. Recent TPC-DS benchmarks on a 10TB dataset report an average query latency of ~17.46s for Trino versus ~38.24s for Spark, making Trino roughly 2.2× faster. For these analytical workloads, the claim that Trino is faster is now backed by published data. Per Hive on MR3, "TPC-DS Benchmark: Trino 476, Spark 4.0.0, Hive 4 on MR3 2.1".
- Starburst Enterprise (Trino) 2.5×-7.1× faster than EMR alternatives. In a cloud-deployment benchmark, Starburst Enterprise ran 2.5× faster than EMR Presto, 3.9× faster than EMR Spark, and 7.1× faster than EMR Hive. Because the jobs run at similar EC2 hourly rates, slower-completing jobs cost more per query, so the speedup also improves performance-per-dollar. Per Concurrency Labs, "Querying 6.35B Records: TPC-DS Performance + Cost Comparison Starburst Enterprise vs EMR".
- StarRocks publishes TPC-DS benchmarks against other engines. Open-source StarRocks, an analytics-focused MPP engine, now publishes its own TPC-DS results, extending the engine comparison beyond Trino, Spark, and Hive. Per StarRocks Docs, "TPC-DS Benchmarking".
- Databricks provides a TPC-DS sample dataset for evaluation. Databricks SQL users on AWS can run the TPC-DS workload against their own deployments to measure cost-performance in their own environment, narrowing the gap between published benchmark numbers and what customers actually observe. Per Databricks Docs, "Use TPC-DS Sample Dataset to Evaluate System Performance".
- Cost-efficiency calculations must combine benchmark results with current cloud pricing. Raw benchmark numbers (seconds per query) are not performance-per-dollar on their own; they must be combined with hourly EC2/GCE pricing, query concurrency, and idle-cluster cost. Many published benchmarks omit this cost-side math, leaving it to practitioners. Per Hive on MR3, "TPC-DS Benchmark 476 + Spark 4.0.0".
- IBM's open-source spark-tpc-ds-performance-test enables reproducible Spark benchmarking. The IBM/spark-tpc-ds-performance-test repository lets teams run TPC-DS reproducibly against their own Spark deployment configurations, supporting a trust-but-verify approach to vendor-published benchmark numbers. Per GitHub, "IBM/spark-tpc-ds-performance-test".
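A minimal sketch of the cost-side math described in these bullets, combining a published average latency with cluster pricing, concurrency, and idle time. It assumes queries share the cluster evenly without slowing each other down (real engines rarely guarantee this) and that idle time is still billed. The $10/hour cluster price and 50% utilization are hypothetical placeholders; only the 17.46s and 38.24s averages come from the Trino/Spark benchmark listed above.

```python
def cost_per_query(avg_latency_s: float, cluster_usd_per_hour: float,
                   concurrency: int = 1, utilization: float = 1.0) -> float:
    """Estimated dollars per query on a cluster billed by the hour.

    Assumes `concurrency` queries run side by side at the same latency,
    and that only `utilization` (0 < u <= 1) of billed time is spent
    running queries; the remainder is idle but still paid for.
    """
    if concurrency < 1:
        raise ValueError("concurrency must be >= 1")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    # Cluster-hours attributable to one query when slots are shared.
    busy_hours = avg_latency_s / 3600 / concurrency
    # Spread idle (but billed) time across the queries that did run.
    return cluster_usd_per_hour * busy_hours / utilization


# Hypothetical: both engines on the same $10/hour cluster, 50% utilized.
PRICE_PER_HOUR = 10.0
trino = cost_per_query(17.46, PRICE_PER_HOUR, utilization=0.5)
spark = cost_per_query(38.24, PRICE_PER_HOUR, utilization=0.5)
print(f"Trino ${trino:.4f}/query, Spark ${spark:.4f}/query")
print(round(spark / trino, 2))  # 2.19
```

With an identical price for both engines, the cost ratio collapses to the latency ratio. The calculation becomes informative when the engines need different instance types, cluster sizes, or licence fees, or run at different utilization levels.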
Resources
S3 pricing tiers (Standard, IA, Glacier, Express One Zone) are the foundation for calculating storage performance-per-dollar.
S3 storage class documentation explaining the performance-cost spectrum from Express One Zone to Glacier Deep Archive.
S3 Intelligent-Tiering documentation for automatic cost optimization based on access patterns, directly improving performance-per-dollar.