Technology

Trino

Summary

What it is

A distributed SQL query engine for federated analytics across heterogeneous data sources, with deep support for S3-backed data lakes and lakehouses.

Where it fits

Trino is a federated query layer for S3 lakehouses. It queries Iceberg, Delta, Hudi, and raw Parquet on S3 through connectors, and it can join S3 data with operational databases in a single query.
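As a minimal sketch of such a cross-source join, the Python below builds one SQL statement that references tables in two catalogs by their fully qualified catalog.schema.table names. The catalog names (lake, pg), the schemas, tables, and columns, and the host, port, and user defaults are all hypothetical placeholders. The run_query helper assumes the trino Python client package is installed and a coordinator is reachable.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a double-quoted, fully qualified name: "catalog"."schema"."table"."""
    parts = (catalog, schema, table)
    # Double any embedded quote so the identifier stays valid SQL.
    return ".".join('"' + p.replace('"', '""') + '"' for p in parts)


def federated_join_sql(lake_table: str, db_table: str) -> str:
    """Build one statement joining an S3-backed table with an operational table.

    Column names (customer_id, id, region, total) are illustrative assumptions.
    """
    return (
        "SELECT c.region, sum(o.total) AS revenue\n"
        f"FROM {lake_table} AS o\n"
        f"JOIN {db_table} AS c ON o.customer_id = c.id\n"
        "GROUP BY c.region"
    )


def run_query(sql: str, host: str = "localhost", port: int = 8080,
              user: str = "analyst"):
    """Run the statement via the `trino` client (assumed installed: pip install trino)."""
    # Imported lazily so the SQL builders above work without the client package.
    from trino.dbapi import connect

    conn = connect(host=host, port=port, user=user)
    cur = conn.cursor()
    cur.execute(sql)
    return cur.fetchall()


# Hypothetical catalogs: "lake" (an Iceberg catalog on S3) and "pg" (PostgreSQL).
SQL = federated_join_sql(
    qualified("lake", "sales", "orders"),
    qualified("pg", "public", "customers"),
)
```

Calling run_query(SQL) against a cluster where both catalogs are configured would return the joined rows. The statement itself is ordinary SQL, and only the catalog prefix tells Trino which connector serves each table.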

Misconceptions / Traps

  • Trino is a query engine, not a storage engine. It reads from and writes to S3 but does not manage the stored data itself. Writes are committed through the table format's commit protocol.
  • Trino requires a coordinator and workers, so its operational overhead is higher than that of DuckDB, which runs in-process. Use DuckDB for single-user exploration and Trino for multi-user production queries.

Key Connections

  • depends_on Apache Parquet — reads Parquet files from S3
  • used_by Lakehouse Architecture — a primary query engine for lakehouses
  • constrained_by Small Files Problem, Object Listing Performance — performance affected by S3 access patterns
  • Natural Language Querying augments Trino — LLMs generate SQL for Trino
  • scoped_to S3, Lakehouse

Definition

What it is

A distributed SQL query engine designed for federated analytics across heterogeneous data sources, with deep support for querying S3-backed data lakes and lakehouses.

Why it exists

Organizations store data across many systems. Trino provides a single SQL interface to query data wherever it lives, without moving it. This includes data directly on S3: Parquet and ORC files through the Hive connector, and Iceberg, Delta Lake, and Hudi tables through their dedicated connectors.

Primary use cases

Federated SQL across S3-backed sources, interactive lakehouse queries, cross-source joins between S3 data and operational databases.

Relationships

Outbound Relationships

scoped_to
depends_on

Inbound Relationships

Resources