Technology

Athena

AWS's serverless, pay-per-query SQL engine that runs queries directly against data stored in S3 without requiring infrastructure provisioning or cluster management.


Summary

What it is

AWS's serverless, pay-per-query SQL engine that runs queries directly against data stored in S3 without requiring infrastructure provisioning or cluster management.

Where it fits

Athena is the lowest-friction entry point for querying S3 data in the AWS ecosystem. It reads Parquet, ORC, JSON, and CSV files, as well as Iceberg tables, once they are registered in Glue Catalog, making it the default ad-hoc analytics tool for AWS-centric data lakes.

Misconceptions / Traps

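Athena bills by the amount of data each query scans (priced per terabyte), not a flat fee per query. Without columnar formats (Parquet) and partition pruning, every query reads more data than it needs and costs escalate rapidly on large datasets.
Athena engine version 3 (Trino-based) and engine version 2 (Presto-based) differ in SQL compatibility and performance characteristics. The engine version is a workgroup setting, so confirm which version a query will run on before relying on version-specific SQL.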
Athena is not suitable for low-latency, high-concurrency workloads. Each query has cold-start overhead and there are per-account concurrency limits.
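
To make the scan-based billing concrete, here is a rough cost-estimate sketch. The price of 5 USD per TB, the 10 MB per-query minimum, and the use of binary terabytes are assumptions based on commonly published list pricing; check current pricing for your region. The dataset size and the pruning fractions are hypothetical.

```python
# Assumed pricing inputs (not taken from this page; verify for your region).
PRICE_PER_TB_USD = 5.0
BYTES_PER_TB = 1024 ** 4           # assumption: binary terabytes
MIN_BILLED_BYTES = 10 * 1024 ** 2  # assumption: 10 MB minimum per query


def estimate_query_cost(bytes_scanned: int) -> float:
    """Estimate the cost in USD of one query from the bytes it scans."""
    billed = max(bytes_scanned, MIN_BILLED_BYTES)
    return billed / BYTES_PER_TB * PRICE_PER_TB_USD


def pruned_scan_bytes(total_bytes: int, partition_fraction: float,
                      column_fraction: float) -> int:
    """Approximate bytes read when only some partitions and columns are needed.

    Ignores compression and file-format overhead, so treat it as a rough guide.
    """
    return int(total_bytes * partition_fraction * column_fraction)


# Hypothetical 2 TB table: full scan versus one day out of 30 daily
# partitions, reading a quarter of the columns from a columnar format.
full_scan = 2 * BYTES_PER_TB
pruned = pruned_scan_bytes(full_scan, 1 / 30, 0.25)

full_cost = estimate_query_cost(full_scan)
pruned_cost = estimate_query_cost(pruned)
```

Under these assumptions the pruned query is billed for a small fraction of the full scan, which is why partitioning and columnar storage dominate Athena cost tuning.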

Key Connections

scoped_to S3, Lakehouse — serverless SQL over S3
depends_on AWS Glue Catalog — reads table metadata from Glue
depends_on Apache Parquet — optimal performance requires columnar formats
constrained_by Cold Scan Latency — full-table scans on large S3 datasets are slow and expensive

Definition

What it is

AWS's serverless, pay-per-query SQL engine that reads data directly from S3. Supports Iceberg, Delta, and Hudi table formats via integration with AWS Glue Catalog.

Why it exists

Running always-on query clusters (Spark, Trino) for ad-hoc analytics is expensive when usage is sporadic. Athena provides instant SQL access to S3 data with no infrastructure to manage and a pay-per-query pricing model in which each query is billed by the data it scans.

Primary use cases

Ad-hoc SQL queries over S3 data lakes, serverless Iceberg table queries, log analysis, cost-efficient exploratory analytics.
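
As an illustration of the ad-hoc query workflow, the sketch below submits a query and polls until it finishes. It assumes `client` is an Athena client such as the one returned by `boto3.client("athena")`, and uses that client's `start_query_execution` and `get_query_execution` calls. The helper name `run_query`, the database name, and the S3 output location are hypothetical placeholders, and error handling is kept minimal.

```python
import time

TERMINAL_STATES = ("SUCCEEDED", "FAILED", "CANCELLED")


def run_query(client, sql, database, output_location,
              poll_seconds=1.0, max_polls=60):
    """Submit a query and poll until it reaches a terminal state.

    Returns (query_id, state, bytes_scanned). `client` is expected to
    behave like boto3.client("athena"); it is passed in so this sketch
    has no hard dependency on boto3.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    for _ in range(max_polls):
        response = client.get_query_execution(QueryExecutionId=query_id)
        execution = response["QueryExecution"]
        state = execution["Status"]["State"]
        if state in TERMINAL_STATES:
            # DataScannedInBytes is the figure that drives the bill.
            stats = execution.get("Statistics", {})
            return query_id, state, stats.get("DataScannedInBytes", 0)
        time.sleep(poll_seconds)

    raise TimeoutError(f"query {query_id} did not finish after {max_polls} polls")
```

A call might look like `run_query(boto3.client("athena"), "SELECT count(*) FROM logs", "my_database", "s3://my-bucket/athena-results/")`, where the table, database, and bucket names are placeholders. Returning the scanned byte count alongside the state makes it easy to track what each exploratory query costs.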

Recent developments

Latest signals

LocalStack 2026.04.0 ships Athena + S3 Tables Federation via Glue Catalogs — Trino 480 upgrade. Per the LocalStack release announcement, the local-emulator release adds Athena and S3 Tables federation through Glue catalogs, and the Trino engine that powers Athena's bigdata container has been upgraded from Trino 440 to Trino 480 on Java 25. For developer-loop iteration this matters: Athena workloads can now be developed and tested locally against the same Trino version that AWS runs in production, closing a long-standing parity gap.
Federated query SDK keeps evolving for custom data sources. Per the awslabs/aws-athena-query-federation repository, the Athena Query Federation SDK continues to support custom connector development, letting teams integrate Athena with proprietary data sources, build new aggregation operators, or expose existing data systems through the Athena SQL surface. The active SDK is what makes Athena the workhorse for use cases that need SQL across heterogeneous AWS and non-AWS data sources.