Trino

A distributed SQL query engine for federated analytics across heterogeneous data sources, with deep support for S3-backed data lakes and lakehouses.

Summary

Where it fits

Trino is the federated query layer for S3 lakehouses. It queries Iceberg, Delta Lake, Hudi, and raw Parquet data on S3 through connectors, and it can join S3 data with operational databases in a single query.

Misconceptions / Traps
  • Trino is a query engine, not a storage engine. It reads from S3 but does not manage how the data is stored; writes go through the commit protocol of the underlying table format.
  • Trino requires a coordinator and workers, so its operational overhead is higher than DuckDB's. Use DuckDB for single-user exploration and Trino for multi-user production queries.

Key Connections
  • depends_on Apache Parquet — reads Parquet files from S3
  • used_by Lakehouse Architecture — a primary query engine for lakehouses
  • constrained_by Small Files Problem, Object Listing Performance — performance affected by S3 access patterns
  • Natural Language Querying augments Trino — LLMs generate SQL for Trino
  • scoped_to S3, Lakehouse

Definition

What it is

A distributed SQL query engine designed for federated analytics across heterogeneous data sources, with deep support for querying S3-backed data lakes and lakehouses.

Why it exists

Organizations store data across many systems. Trino provides a single SQL interface to query data wherever it lives, without moving it. That includes data held directly on S3, whether as Parquet or ORC files or as Iceberg, Delta Lake, and Hudi tables, each reached through a connector.

Primary use cases

Federated SQL across S3-backed sources, interactive lakehouse queries, cross-source joins between S3 data and operational databases.
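
As a minimal sketch of the cross-source join described above: Trino addresses tables as catalog.schema.table, so one statement can reference an S3-backed catalog and an operational database together. The catalog, schema, table, and column names below (lake, pg, sales.orders, public.customers) are hypothetical, as are the host, port, and user defaults. The run helper assumes the third-party trino Python client is installed and a coordinator is reachable.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a double-quoted, fully qualified name: "catalog"."schema"."table"."""
    return ".".join(f'"{part}"' for part in (catalog, schema, table))


def build_federated_join() -> str:
    """Build one SQL statement that joins a lakehouse table with a database table."""
    # Hypothetical catalogs: "lake" is an S3-backed lakehouse catalog,
    # "pg" is an operational PostgreSQL catalog.
    orders = qualified("lake", "sales", "orders")
    customers = qualified("pg", "public", "customers")
    return (
        "SELECT c.region, sum(o.amount) AS revenue\n"
        f"FROM {orders} AS o\n"
        f"JOIN {customers} AS c ON o.customer_id = c.id\n"
        "GROUP BY c.region"
    )


def run(sql: str, host: str = "localhost", port: int = 8080, user: str = "analyst"):
    """Execute the statement on a Trino coordinator (connection values are assumptions)."""
    # Imported lazily so the SQL builder works without the client installed.
    from trino.dbapi import connect

    conn = connect(host=host, port=port, user=user)
    cur = conn.cursor()
    cur.execute(sql)
    return cur.fetchall()


if __name__ == "__main__":
    print(build_federated_join())
```

Fully qualifying both tables keeps the statement independent of any default catalog or schema set on the session, which is convenient when a query spans sources.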

Recent developments

Latest signals
  • Trino 478 and 479 keep up the release cadence. Per the Trino Community Broadcast episode list, Trino 478 and 479 continue the project's high-cadence release pattern, with broadcast topics covering virtual view hierarchies (with Rob Dickinson), AI agents for query development, Trino Query UI updates, and the progression of Trino Gateway from 16 to 18. The community-broadcast cadence is itself a competitive signal: Trino sustains a pace of public engagement that closed-source data-warehouse vendors struggle to match.
  • Trino Gateway 18: in-memory caching and Java 25. Per the Trino Gateway release notes, Gateway 18 adds in-memory caching of backend metadata, query-history deactivation, and UI timezone selection. Its predecessor, Gateway 17, requires Java 25 and ships on the UBI10 micro base image with JMX metrics enabled. For organizations running Trino as a federated query layer over S3-backed lakehouses, the Gateway tier is now first-class infrastructure with its own release cadence rather than an afterthought.