Guide 20

The True Cost of Managed Iceberg — S3 Tables vs. Self-Managed Compaction

Problem Framing

AWS S3 Tables automates Iceberg table maintenance — compaction (binpack, sort, z-order), snapshot expiration, and orphan file cleanup — but this convenience comes with opaque $0.005/GB processing charges that scale dramatically under streaming workloads. Engineers need to model the cost breakpoint where managed compaction justifies its 20–29x premium over self-managed EMR or Glue compaction, and understand the operational trade-offs including compaction delay windows of up to three hours.

Relevant Nodes

  • Topics: S3, Table Formats
  • Technologies: Amazon S3 Tables, Apache Iceberg, Apache Spark
  • Architectures: Compaction
  • Pain Points: Small Files Problem, Request Pricing Models, Performance-per-Dollar

Decision Path

  1. Understand S3 Tables compaction mechanics. S3 Tables runs compaction automatically when a table crosses file-count or file-size thresholds. The compaction types (binpack, sort, and z-order) vary in cost per GB processed. You do not control when compaction runs or which strategy is applied. Monitoring is limited to CloudWatch metrics on compaction lag and file counts.

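  2. Calculate compaction cost per table based on write volume. Multiply the GB that compaction processes each day by the $0.005/GB processing charge. If each written byte is compacted once, a table ingesting 100 GB/day costs ~$0.50/day or ~$15/month (100 × $0.005 × 30), and a streaming workload producing 1 TB/day across multiple tables costs ~$5/day or ~$150/month. Processed volume exceeds written volume whenever files are rewritten more than once, so treat these figures as a floor and scale them by the rewrite factor you observe.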

  3. Compare with EMR or Glue self-managed compaction costs. Self-managed compaction on EMR Serverless or Glue costs approximately $0.00017/GB, roughly 29x cheaper than S3 Tables ($0.005 / $0.00017 ≈ 29). The trade-off is operational overhead: you must schedule compaction jobs, monitor file sizes, configure binpack thresholds, and handle job failures.

    • EMR Serverless eliminates cluster management but still requires job orchestration.
    • Glue ETL provides a managed scheduler but has higher per-DPU cost than EMR.
  4. Evaluate compaction delay impact. S3 Tables may delay compaction by up to three hours after writes. During this window, queries read uncompacted small files, increasing scan latency and S3 GET costs. For interactive analytics or dashboards requiring fresh data, this delay may be unacceptable.

    • Self-managed compaction can run on tighter schedules (every 15–30 minutes) if latency sensitivity requires it.
  5. Apply the decision framework. Use S3 Tables managed compaction for low-write-volume tables (under 10 GB/day) where operational simplicity outweighs cost. Use self-managed compaction for high-velocity streaming tables where the cost premium is material and compaction timing matters.

    • Hybrid approaches work: use S3 Tables for dimension tables and reference data, self-managed for high-volume fact tables.
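
The arithmetic behind steps 2 through 5 can be sketched as a small Python model. This is an illustration under stated assumptions, not a pricing calculator: the two per-GB rates and the 10 GB/day threshold are the figures quoted in this guide, while the names TableProfile, rewrite_factor, commit_interval_s, and files_per_commit are hypothetical parameters introduced here to make the inputs explicit. It ignores any charges other than the per-GB processing rate.

```python
from dataclasses import dataclass

# Rates quoted in this guide (USD per GB processed). Check current pricing before relying on them.
MANAGED_RATE_PER_GB = 0.005         # S3 Tables compaction charge
SELF_MANAGED_RATE_PER_GB = 0.00017  # approximate EMR Serverless / Glue figure


@dataclass
class TableProfile:
    """Hypothetical description of one Iceberg table's write pattern."""
    name: str
    daily_write_gb: float
    rewrite_factor: float = 1.0      # assumption: times each written GB gets compacted
    commit_interval_s: float = 60.0  # assumption: seconds between writer commits
    files_per_commit: int = 1        # assumption: data files added per commit


def monthly_cost(table: TableProfile, rate_per_gb: float, days: int = 30) -> float:
    """Step 2/3: processed GB per day times the per-GB rate, over a month."""
    processed_gb_per_day = table.daily_write_gb * table.rewrite_factor
    return processed_gb_per_day * rate_per_gb * days


def files_pending(table: TableProfile, delay_hours: float = 3.0) -> int:
    """Step 4: small files that can accumulate while compaction is delayed."""
    commits = int(delay_hours * 3600 // table.commit_interval_s)
    return commits * table.files_per_commit


def recommend(table: TableProfile, threshold_gb_per_day: float = 10.0) -> str:
    """Step 5: the guide's volume threshold, applied mechanically."""
    return "managed" if table.daily_write_gb < threshold_gb_per_day else "self-managed"


if __name__ == "__main__":
    tables = [
        TableProfile(name="dim_customers", daily_write_gb=2.0, commit_interval_s=3600.0),
        TableProfile(name="fact_events", daily_write_gb=1000.0, rewrite_factor=2.0),
    ]
    for t in tables:
        managed = monthly_cost(t, MANAGED_RATE_PER_GB)
        self_managed = monthly_cost(t, SELF_MANAGED_RATE_PER_GB)
        print(
            f"{t.name}: managed ${managed:.2f}/mo, self-managed ${self_managed:.2f}/mo, "
            f"up to {files_pending(t)} uncompacted files in a 3 h window, "
            f"suggestion: {recommend(t)}"
        )
```

The self-managed figure covers compute only. To find a realistic breakpoint, add an estimate of the engineering time spent on scheduling, monitoring, and failure handling to the self-managed side before comparing the two monthly totals.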

What Changed Over Time

  • S3 Tables launched in late 2024 as AWS's first managed Iceberg offering, positioning automated maintenance as the primary value proposition.
  • Early adopters discovered that compaction costs were not prominently documented and could exceed expectations under streaming workloads.
  • Independent benchmarks (Onehouse, 2025) quantified the 20–29x cost premium relative to self-managed EMR compaction, shifting the conversation from convenience to unit economics.
  • AWS has since added more granular CloudWatch metrics for compaction monitoring, but pricing has not changed.

Sources