LLM Capability

Natural Language Querying

Summary

What it is

Using LLMs to translate natural language questions into executable queries (SQL, API calls) over S3-backed datasets.

Where it fits

Natural language querying is the accessibility layer of S3-backed data systems. It lets business users ask questions in plain language and get results from Iceberg, Parquet, or other S3-backed tables — without knowing SQL.

Misconceptions / Traps

Natural language to SQL is not a solved problem. LLMs generate plausible-looking SQL that may run cleanly and still be wrong. Guardrails (schema validation, result sampling, SQL review) are essential.
Query accuracy depends heavily on schema metadata quality. Well-documented columns, table descriptions, and sample values give the model the grounding it needs and substantially improve LLM-generated SQL.
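
A minimal sketch of two of the guardrails named above (schema validation and result sampling), under stated assumptions: SQLite from the Python standard library stands in for Trino or DuckDB, and `validate_generated_sql` is a hypothetical helper name, not part of any library. The read-only check is deliberately conservative and is not a substitute for SQL review.

```python
import sqlite3

def validate_generated_sql(conn, sql, max_rows=5):
    """Return (ok, detail) for a piece of LLM-generated SQL.

    detail is an error message when ok is False, or a small sample of
    result rows when ok is True.
    """
    stripped = sql.strip().rstrip(";").strip()
    # Conservative read-only check: a single statement that starts with SELECT.
    if ";" in stripped:
        return False, "multiple statements are not allowed"
    first_word = stripped.split(None, 1)[0].lower() if stripped else ""
    if first_word != "select":
        return False, "only read-only SELECT queries are allowed"
    # Schema validation: ask the engine to plan the query without running it.
    # Unknown tables or columns surface here as errors.
    try:
        conn.execute("EXPLAIN QUERY PLAN " + stripped)
    except sqlite3.Error as exc:
        return False, f"schema validation failed: {exc}"
    # Result sampling: fetch only a few rows for a human or automated check.
    sample = conn.execute(
        f"SELECT * FROM ({stripped}) LIMIT {int(max_rows)}"
    ).fetchall()
    return True, sample

# Usage with an illustrative in-memory table (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [("eu", 10.0), ("us", 5.0)])
print(validate_generated_sql(conn, "SELECT region, amount FROM orders"))
print(validate_generated_sql(conn, "SELECT revenue FROM orders"))
print(validate_generated_sql(conn, "DROP TABLE orders"))
```

Planning the query before running it catches hallucinated table and column names cheaply. It does not catch queries that are valid but answer the wrong question, which is why result sampling and review remain necessary.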

Key Connections

depends_on General-Purpose LLM — requires language understanding and SQL generation
augments Trino, DuckDB — generates SQL for these engines
scoped_to LLM-Assisted Data Systems, Lakehouse

Definition

What it is

Using LLMs to translate natural language questions into executable queries (SQL, API calls) over S3-backed datasets, making data accessible to non-technical users.
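
The translation step can be sketched as follows, under stated assumptions: SQLite stands in for the query engine, `describe_schema` and `answer_question` are hypothetical helper names, and the `llm` argument is any callable that takes a prompt string and returns SQL (a canned stub here, not a real model API). Validation of the generated SQL is omitted for brevity.

```python
import sqlite3
from typing import Callable

def describe_schema(conn: sqlite3.Connection, column_notes: dict) -> str:
    """Render tables, columns, and optional human-written notes as prompt text."""
    lines = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        lines.append(f"Table {table}:")
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        for _cid, name, col_type, *_rest in conn.execute(f"PRAGMA table_info({table})"):
            note = column_notes.get(f"{table}.{name}", "")
            lines.append(f"  - {name} {col_type}" + (f": {note}" if note else ""))
    return "\n".join(lines)

def answer_question(conn, question: str, llm: Callable[[str], str], column_notes: dict):
    """Build a schema-grounded prompt, ask the model for SQL, and run it."""
    prompt = (
        "Write one read-only SQL query that answers the question.\n"
        f"Schema:\n{describe_schema(conn, column_notes)}\n"
        f"Question: {question}\nSQL:"
    )
    sql = llm(prompt)
    return sql, conn.execute(sql).fetchall()

# Stand-in for a real model call; it always returns the same query.
def fake_llm(prompt: str) -> str:
    return "SELECT region, SUM(amount) AS total FROM orders GROUP BY region"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [("eu", 10.0), ("us", 5.0)])
notes = {"orders.amount": "order value in USD"}
print(answer_question(conn, "What is the total order value per region?", fake_llm, notes))
```

Passing the model in as a plain callable keeps the sketch independent of any particular LLM provider, and the column notes show where richer schema metadata would enter the prompt.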

Why it exists

S3-backed lakehouses contain valuable data accessible only through SQL or programming interfaces. Natural language querying removes this barrier, allowing business users to ask questions in plain language and get results from Iceberg, Parquet, or other S3-backed data.

Primary use cases

Self-service analytics over lakehouse data, natural language to SQL generation for Trino and DuckDB, and conversational interfaces over S3-backed datasets.

Relationships

Outbound Relationships

scoped_to

LLM-Assisted Data Systems, Lakehouse

depends_on

General-Purpose LLM

augments

Trino, DuckDB

Inbound Relationships

enables

General-Purpose LLM, Code-Focused LLM

Resources

GitHub · High

github.com/aws-samples/natural-language-querying-of-data-in-...

Official AWS sample repository for natural language querying of S3 data using Athena and generative AI text-to-SQL, a reference architecture for the pattern.

Blog · High

aws.amazon.com/blogs/machine-learning/build-a-robust-text-to...

AWS ML Blog detailing a production text-to-SQL architecture using Bedrock (Claude), Glue Data Catalog metadata, and Athena for querying S3 data lakes with natural language.

Blog · High

aws.amazon.com/blogs/big-data/enriching-metadata-for-accurat...

AWS Big Data Blog on improving text-to-SQL accuracy by enriching Glue Data Catalog metadata, addressing the schema-to-SQL grounding challenge for S3 data.