Schema Drift Detection
Monitoring S3-stored datasets for unexpected schema changes — new columns, type changes, missing fields, structural shifts — and alerting before downstream consumers break.
Summary
Schema drift detection is the proactive complement to schema evolution. While table formats handle planned schema changes, drift detection catches unplanned changes — a data producer silently adding a column, changing a type, or dropping a field — before they propagate to dashboards and ML models.
- Schema drift is different from schema evolution. Evolution is intentional and managed; drift is unintentional and must be detected. Both need handling, but the tools are different.
- LLM-based drift detection goes beyond structural comparison (which tools like Great Expectations handle). LLMs can detect semantic drift — when a field's meaning changes even if its type does not.
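The structural side of drift detection (a column added, a type changed, a field dropped) can be illustrated with a plain schema comparison. The following is a minimal sketch, not the method of any particular tool: it assumes each schema has already been read into a `{column: type_name}` dictionary, and the names `SchemaDiff` and `diff_schemas` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaDiff:
    """Structural differences between a baseline schema and an observed one."""
    added: dict = field(default_factory=dict)    # column -> new type
    removed: dict = field(default_factory=dict)  # column -> old type
    retyped: dict = field(default_factory=dict)  # column -> (old type, new type)

    @property
    def has_drift(self) -> bool:
        return bool(self.added or self.removed or self.retyped)


def diff_schemas(baseline: dict, observed: dict) -> SchemaDiff:
    """Compare two {column: type_name} mappings and report structural drift."""
    diff = SchemaDiff()
    for name, new_type in observed.items():
        if name not in baseline:
            diff.added[name] = new_type
        elif baseline[name] != new_type:
            diff.retyped[name] = (baseline[name], new_type)
    for name, old_type in baseline.items():
        if name not in observed:
            diff.removed[name] = old_type
    return diff


# Illustrative schemas (made up for this example).
baseline = {"order_id": "int64", "amount": "double", "currency": "string"}
observed = {"order_id": "string", "amount": "double", "channel": "string"}

drift = diff_schemas(baseline, observed)
if drift.has_drift:
    print("added:", drift.added)
    print("removed:", drift.removed)
    print("retyped:", drift.retyped)
```

A comparison like this only sees names and types. A field whose type stays the same while its meaning changes would pass unnoticed, which is the gap semantic detection is meant to cover.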
- solves: Schema Evolution (catches unplanned schema changes)
- augments: Write-Audit-Publish (automated drift check in the audit step)
- depends_on: General-Purpose LLM (for semantic drift detection)
- scoped_to: LLM-Assisted Data Systems, Table Formats
Definition
Using LLMs to continuously monitor S3-stored datasets for unexpected schema changes — new columns, altered types, renamed fields, missing required fields — and alert before downstream pipelines break.
Schema changes in S3-stored data (new JSON fields, altered CSV headers, added Parquet columns) propagate silently through data lakes, because S3 itself does not enforce a schema on the objects it stores. LLM-based detection can reason about what a field means rather than only its name and type, so it can flag breaking changes that rule-based checks miss.
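One way to picture a semantic check is a prompt that shows a model sample values of the same field from two batches and asks whether the meaning changed. The sketch below is an assumption-laden illustration: `ask_llm` is a hypothetical callable that takes a prompt string and returns the model's text reply, the prompt wording and JSON reply format are invented for this example, and `fake_llm` is a canned stand-in so the code runs without any model.

```python
import json
from typing import Callable


def build_semantic_drift_prompt(field_name: str, before: list, after: list) -> str:
    """Render a prompt asking whether a field's meaning changed between two samples."""
    return (
        "You are reviewing a dataset field for semantic drift.\n"
        f"Field: {field_name}\n"
        f"Sample values from the previous batch: {json.dumps(before)}\n"
        f"Sample values from the current batch: {json.dumps(after)}\n"
        "The declared type is unchanged. Has the meaning, unit, or encoding "
        "of this field changed? Answer with JSON: "
        '{"drifted": true or false, "reason": "<short explanation>"}'
    )


def check_semantic_drift(
    field_name: str, before: list, after: list, ask_llm: Callable[[str], str]
) -> dict:
    """Ask the injected model and parse its verdict.

    Replies that are not a JSON object are returned with drifted=None so a
    human can review them instead of the check silently passing.
    """
    reply = ask_llm(build_semantic_drift_prompt(field_name, before, after))
    try:
        verdict = json.loads(reply)
    except json.JSONDecodeError:
        verdict = None
    if not isinstance(verdict, dict):
        return {"drifted": None, "reason": "unparseable model reply"}
    return {"drifted": verdict.get("drifted"), "reason": verdict.get("reason", "")}


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (hypothetical); always returns a canned verdict."""
    return '{"drifted": true, "reason": "values look like cents instead of dollars"}'


# Same type (float) in both batches, but the unit appears to have changed.
result = check_semantic_drift("amount", [19.99, 5.0], [1999.0, 500.0], fake_llm)
print(result)
```

Passing the model in as a callable keeps the check independent of any specific provider, and makes it possible to test the parsing logic with a stub.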
Use cases: automated schema monitoring for S3 data lakes, pre-ingestion schema validation, and schema change impact analysis.
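Pre-ingestion validation and a rough form of impact analysis can be sketched as a gate that infers a schema from a sample of incoming JSON records and sorts the differences by severity. This is an illustration under stated assumptions: the function names are hypothetical, types are taken from Python's own type names, and the policy (missing or retyped fields are breaking, new fields are additive) is an assumption that a real pipeline may want to tune.

```python
def infer_schema(records: list) -> dict:
    """Infer {field: type_name} from a sample of JSON-like dicts.

    A field seen with more than one Python type is reported as "mixed".
    """
    schema = {}
    for record in records:
        for name, value in record.items():
            type_name = type(value).__name__
            if schema.setdefault(name, type_name) != type_name:
                schema[name] = "mixed"
    return schema


def classify_changes(expected: dict, observed: dict) -> dict:
    """Split differences into breaking (missing or retyped) and additive (new fields)."""
    breaking = [f"missing field: {n}" for n in expected if n not in observed]
    breaking += [
        f"type change: {n} {expected[n]} -> {observed[n]}"
        for n in expected
        if n in observed and expected[n] != observed[n]
    ]
    additive = [f"new field: {n}" for n in observed if n not in expected]
    return {"breaking": breaking, "additive": additive}


def should_ingest(expected: dict, records: list) -> tuple:
    """Return (ok, report); ok is False when any breaking change is found."""
    report = classify_changes(expected, infer_schema(records))
    return (not report["breaking"], report)


# Illustrative contract and batch (made up for this example).
expected = {"user_id": "int", "email": "str"}
batch = [
    {"user_id": 1, "email": "a@example.com", "plan": "pro"},
    {"user_id": "2", "email": "b@example.com"},
]

ok, report = should_ingest(expected, batch)
print("ingest" if ok else "hold for review", report)
```

Because the inferred schema is the union of fields across the sample, a field counts as missing only when no sampled record carries it; checking per-record completeness would need a separate pass.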
Resources
Great Expectations documentation for automated schema validation and drift detection on S3-stored datasets.
Databricks Auto Loader schema evolution detection for automatically identifying and handling schema changes in S3 data.