LLM Capability

Schema Drift Detection

Summary

What it is

Monitoring S3-stored datasets for unexpected schema changes — new columns, type changes, missing fields, structural shifts — and alerting before downstream consumers break.

Where it fits

Schema drift detection is the proactive complement to schema evolution. Table formats handle planned schema changes; drift detection catches unplanned ones, such as a data producer silently adding a column, changing a type, or dropping a field, before they propagate to dashboards and ML models.
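
The three kinds of unplanned change named above (added column, changed type, dropped field) can be illustrated with a purely structural comparison. The following is a minimal sketch, not a reference implementation: it assumes schemas are already available as plain {column: type name} mappings, and the names SchemaDiff, diff_schemas, and the example columns are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaDiff:
    added: dict = field(default_factory=dict)    # column -> new type
    removed: dict = field(default_factory=dict)  # column -> old type
    retyped: dict = field(default_factory=dict)  # column -> (old type, new type)

    @property
    def has_drift(self) -> bool:
        return bool(self.added or self.removed or self.retyped)


def diff_schemas(expected: dict, observed: dict) -> SchemaDiff:
    """Compare two {column: type_name} mappings and report the differences."""
    diff = SchemaDiff()
    for name, type_name in observed.items():
        if name not in expected:
            diff.added[name] = type_name
        elif expected[name] != type_name:
            diff.retyped[name] = (expected[name], type_name)
    for name, type_name in expected.items():
        if name not in observed:
            diff.removed[name] = type_name
    return diff


# Hypothetical schemas, for illustration only.
expected = {"order_id": "int64", "amount": "double", "currency": "string"}
observed = {"order_id": "string", "amount": "double", "region": "string"}

drift = diff_schemas(expected, observed)
if drift.has_drift:
    print("added:", drift.added)
    print("removed:", drift.removed)
    print("retyped:", drift.retyped)
```

In this sketch the expected schema would come from whatever contract or last known good snapshot the team maintains, and a non-empty diff is what would trigger an alert.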

Misconceptions / Traps
  • Schema drift is different from schema evolution. Evolution is intentional and managed; drift is unintentional and must be detected. Both need handling, but the tools are different.
  • LLM-based drift detection goes beyond structural comparison (which tools like Great Expectations handle). LLMs can detect semantic drift — when a field's meaning changes even if its type does not.
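
Semantic drift, where a field keeps its type but changes meaning, could be checked roughly as follows. This is a sketch under stated assumptions: the prompt wording, the JSON answer format, and the function names are all hypothetical, and ask_llm stands in for whatever model client is in use (any callable that takes a prompt string and returns the model's text).

```python
import json
from typing import Callable


def build_semantic_drift_prompt(field_name: str, type_name: str,
                                baseline_samples: list,
                                current_samples: list) -> str:
    """Assemble a prompt that shows the model before/after sample values."""
    payload = {
        "field": field_name,
        "declared_type": type_name,
        "baseline_samples": baseline_samples,
        "current_samples": current_samples,
    }
    return (
        "You are reviewing a dataset field whose declared type has not changed.\n"
        "Decide whether its meaning appears to have changed between the two "
        "sample sets.\n"
        'Answer with JSON: {"drifted": true or false, "reason": "<short text>"}\n\n'
        + json.dumps(payload, indent=2)
    )


def check_semantic_drift(field_name: str, type_name: str,
                         baseline_samples: list, current_samples: list,
                         ask_llm: Callable[[str], str]) -> dict:
    """Ask the model for a verdict; treat unparseable answers as 'unknown'."""
    prompt = build_semantic_drift_prompt(
        field_name, type_name, baseline_samples, current_samples)
    raw = ask_llm(prompt)
    try:
        verdict = json.loads(raw)
    except json.JSONDecodeError:
        return {"drifted": None, "reason": "unparseable model response"}
    if not isinstance(verdict, dict):
        return {"drifted": None, "reason": "unexpected model response shape"}
    return {"drifted": verdict.get("drifted"),
            "reason": verdict.get("reason", "")}
```

A typical case this is meant to surface: an amount field that used to hold dollars (19.99, 5.0) and now holds cents (1999.0, 500.0). A structural comparison sees the same column with the same type, so only a check that looks at the values and their meaning has a chance of flagging it. Model verdicts are not deterministic, so a result like this is better treated as a signal for review than as a hard gate.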

Key Connections
  • solves Schema Evolution — catches unplanned schema changes
  • augments Write-Audit-Publish — automated drift check in the audit step
  • depends_on General-Purpose LLM — for semantic drift detection
  • scoped_to LLM-Assisted Data Systems, Table Formats

Definition

What it is

Using LLMs to continuously monitor S3-stored datasets for unexpected schema changes — new columns, altered types, renamed fields, missing required fields — and alert before downstream pipelines break.

Why it exists

Schema changes in S3-stored data (new JSON fields, altered CSV headers, Parquet column additions) propagate silently through data lakes because nothing validates an object when it is written. LLM-based detection can reason about what a field means, not only its name and type, which helps catch breaking changes that purely rule-based checks miss.
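
Before any comparison can happen, the observed schema has to be derived from the data itself. As a minimal sketch for the JSON case, the following infers top-level field types from JSON Lines text using only the standard library. It assumes every line is a JSON object and that the object body has already been read from storage; the function names and sample records are hypothetical.

```python
import json


def json_type_name(value) -> str:
    """Map a decoded JSON value to a coarse type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # bool must be checked before int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return type(value).__name__


def infer_schema(json_lines: str) -> dict:
    """Return {field: sorted list of type names seen} for top-level fields."""
    observed = {}
    for line in json_lines.splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        for name, value in record.items():
            observed.setdefault(name, set()).add(json_type_name(value))
    return {name: sorted(types) for name, types in observed.items()}


# Hypothetical sample: the second record changes a type and adds a field.
sample = """{"user_id": 1, "email": "a@example.com"}
{"user_id": "2", "email": "b@example.com", "plan": "pro"}
"""
schema = infer_schema(sample)
print(schema)
```

A field that shows more than one type, or a field that appears in only some records, is exactly the kind of silent change described above.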

Primary use cases

Automated schema monitoring for S3 data lakes, pre-ingestion schema validation, schema change impact analysis.
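
Schema change impact analysis can be sketched as a lookup from changed fields to the consumers that read them. The example below assumes such a mapping exists (for instance from lineage metadata); the function name, consumer names, and fields are hypothetical.

```python
def affected_consumers(changed_fields: set, consumer_fields: dict) -> dict:
    """Return {consumer: sorted changed fields it reads}, omitting unaffected ones."""
    impact = {}
    for consumer, fields in consumer_fields.items():
        hit = sorted(changed_fields & set(fields))
        if hit:
            impact[consumer] = hit
    return impact


# Hypothetical mapping of downstream consumers to the fields they depend on.
consumer_fields = {
    "revenue_dashboard": ["order_id", "amount", "currency"],
    "churn_model": ["user_id", "plan"],
    "audit_export": ["order_id"],
}

impact = affected_consumers({"order_id", "currency"}, consumer_fields)
for consumer, fields in impact.items():
    print(f"{consumer}: affected by changes to {', '.join(fields)}")
```

Routing an alert only to the owners of affected consumers keeps drift notifications actionable.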
