Architecture

AI-Safe Views

The practice of creating constrained, pre-filtered views over lakehouse tables that limit what data AI/LLM systems can access, preventing models from inadvertently reading PII, confidential, or out-of-scope data during RAG retrieval or automated querying.

Summary

Where it fits

AI-Safe Views are the security boundary between LLM-assisted data systems and the full lakehouse. As organizations deploy RAG and text-to-SQL applications against their data lakes, these views ensure that the model's effective query scope is explicitly bounded, regardless of the model's intent or prompt injection attempts.
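The idea of a boundary enforced by the engine, rather than by the prompt, can be sketched with Python's built-in sqlite3 module standing in for a lakehouse engine. This is only an illustration under stated assumptions: the table `customers`, the view `customers_ai_safe`, and the helper names are hypothetical, and a real deployment would rely on catalog grants and row/column security instead of SQLite's authorizer callback.

```python
import sqlite3

SAFE_VIEW = "customers_ai_safe"  # hypothetical view name, for illustration only


def make_authorizer(safe_view):
    """Build an authorizer that only permits reads made through safe_view."""
    def authorizer(action, arg1, arg2, db_name, source):
        # A SELECT statement as such is allowed; column reads are checked below.
        if action == sqlite3.SQLITE_SELECT:
            return sqlite3.SQLITE_OK
        if action == sqlite3.SQLITE_READ:
            # arg1 is the table or view being read; source is the view (if any)
            # on whose behalf the read happens, or None for top-level SQL.
            if arg1 == safe_view or source == safe_view:
                return sqlite3.SQLITE_OK
        # Everything else (direct base-table reads, writes, DDL) is refused.
        return sqlite3.SQLITE_DENY
    return authorizer


def open_agent_connection():
    """Create a toy database, then lock the connection down for the agent."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER, region TEXT, email TEXT);
        INSERT INTO customers VALUES (1, 'EU', 'a@example.com'),
                                     (2, 'US', 'b@example.com');
        CREATE VIEW customers_ai_safe AS SELECT id, region FROM customers;
    """)
    conn.set_authorizer(make_authorizer(SAFE_VIEW))
    return conn


def run_agent_query(conn, sql):
    """Run model-generated SQL; return rows, or None if the engine refuses it."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.DatabaseError:
        return None
```

With this setup, a query against `customers_ai_safe` returns rows, while `SELECT email FROM customers` is rejected by the engine itself, whatever the prompt that produced it said.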

Misconceptions / Traps

AI-Safe Views are not just regular database views. They must be enforced at the catalog/engine level so that LLM-generated queries cannot bypass them by referencing underlying tables directly.
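View definitions must evolve with the underlying table schema. If a view is defined with SELECT *, a schema change that adds a PII column to the base table can expose that column through the view without anyone editing the view, at least on engines that resolve the wildcard at query time. An explicit column list avoids this.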
Performance of views depends on the engine's ability to push predicates through the view definition. Complex views with multiple joins may not benefit from partition pruning.
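The schema-drift trap above lends itself to an automated check. The following is a minimal sketch, not a definitive implementation: the function name, the "safe"/"pii" labels, and the inputs are assumptions made for illustration, and the wildcard detection is a regex heuristic rather than a real SQL parser.

```python
import re
from dataclasses import dataclass

# Heuristic: a bare "*" or "alias.*" in the projection list. Not a SQL parser.
WILDCARD = re.compile(r"(?:select|,)\s*(?:\w+\.)?\*\s*(?:,|from\b)", re.IGNORECASE)


@dataclass
class AuditResult:
    uses_wildcard: bool      # view projects "*" instead of naming columns
    exposed_sensitive: list  # view columns classified as "pii"
    unclassified: list       # base-table columns nobody has classified yet

    @property
    def ok(self):
        return not (self.uses_wildcard or self.exposed_sensitive or self.unclassified)


def audit_view(view_sql, view_columns, base_columns, classification):
    """Check one view against a column classification.

    classification maps column name -> "safe" or "pii" (hypothetical labels).
    """
    return AuditResult(
        uses_wildcard=bool(WILDCARD.search(view_sql)),
        exposed_sensitive=sorted(
            c for c in view_columns if classification.get(c) == "pii"
        ),
        unclassified=sorted(c for c in base_columns if c not in classification),
    )


result = audit_view(
    view_sql="CREATE VIEW customers_ai_safe AS SELECT id, region FROM customers",
    view_columns=["id", "region"],
    base_columns=["id", "region", "email", "phone"],  # "phone" was added later
    classification={"id": "safe", "region": "safe", "email": "pii"},
)
```

Here `result.unclassified` is `["phone"]`, so `result.ok` is false: the view does not leak anything yet, but the new base column has not been reviewed. A check like this could run whenever the base schema changes.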

Key Connections

scoped_to LLM-Assisted Data Systems, Lakehouse — AI access control
depends_on Row / Column Security — underlying access control mechanism
enables RAG over Structured Data — safe retrieval scope for LLM queries
relates_to PII Tokenization — complementary PII protection strategy

Definition

What it is

Logical views over lakehouse tables on S3 that are specifically designed for LLM consumption — filtering out PII, applying column masking, and reshaping data into formats suitable for embedding generation or retrieval-augmented generation.
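One way to picture such a view is as the output of a small declarative column policy. The sketch below is illustrative only: the function, the "keep" / "mask" / "drop" actions, and the table and column names are hypothetical, and the generated SQL is deliberately generic. Real engines offer their own masking functions and policy syntax, which this does not attempt to reproduce.

```python
import re

IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def render_ai_safe_view(view_name, base_table, columns, row_filter=None):
    """Render a CREATE VIEW statement from a column policy.

    columns maps column name -> "keep", "mask", or "drop" (hypothetical actions).
    row_filter is a trusted SQL predicate taken from reviewed configuration.
    """
    for ident in (view_name, base_table, *columns):
        if not IDENTIFIER.match(ident):
            raise ValueError(f"unsafe identifier: {ident!r}")

    projections = []
    for name, action in columns.items():
        if action == "keep":
            projections.append(name)
        elif action == "mask":
            # Replace the value with a constant; the column shape is preserved.
            projections.append(f"'REDACTED' AS {name}")
        elif action != "drop":
            raise ValueError(f"unknown action for {name}: {action!r}")

    if not projections:
        raise ValueError("policy exposes no columns")

    # Explicit column list, never SELECT *, so new base columns stay hidden.
    sql = f"CREATE VIEW {view_name} AS SELECT {', '.join(projections)} FROM {base_table}"
    if row_filter:
        sql += f" WHERE {row_filter}"
    return sql


ddl = render_ai_safe_view(
    "customers_ai_safe",
    "customers",
    {"id": "keep", "region": "keep", "email": "mask", "ssn": "drop"},
    row_filter="region = 'EU'",
)
# ddl == "CREATE VIEW customers_ai_safe AS SELECT id, region, 'REDACTED' AS email
#         FROM customers WHERE region = 'EU'"  (a single line in practice)
```

Because the policy is plain data, it can be stored alongside the rest of the governance configuration and reviewed like any other change.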

Why it exists

Exposing raw lakehouse tables to LLM pipelines risks leaking sensitive data into model contexts, embeddings, or generated outputs. AI-Safe Views provide a governed data access layer that ensures LLMs only see sanitized, policy-compliant data.

Primary use cases

PII-safe data for embedding generation, governed table access for RAG pipelines, policy-compliant LLM data feeds.

Recent developments

Latest signals

AI governance is data governance. The Databricks-led 2026 framing holds that "agent behavior is determined by the data it has access to": what an agent can read, how fresh the data is, and whether sensitive fields are masked are data governance questions, not AI governance questions. AI-safe views are how the data-governance layer reaches into the LLM stack. Per Databricks Blog: Governing AI Agents at Scale with Unity Catalog.
Automated PII detection: new data scanned within 24 hours. Unity Catalog auto-scans new data for PII within 24 hours and applies masking policies automatically, removing the "we forgot to mask the new column" failure mode that historically broke AI-safe views in practice. Per Hoop.dev: Building a PII Catalog and Data Masking in Databricks.
Cross-engine ABAC: define masking once, apply everywhere agents touch data. Unity Catalog ABAC extends row filters + column masks across Databricks + Spark + Iceberg engines. AI agents that bypass the SQL layer (read Parquet directly, for instance) inherit the same masking. Per Databricks Blog — Completing the Lakehouse Vision.
Lakehouse Monitoring: unified quality for Data + AI. Databricks' Lakehouse Monitoring brings data quality, model quality, and agent behavior under one observability surface; drift in any of the three triggers alerts. AI-safe-view drift (e.g., a new unmasked column) shows up alongside data-quality drift. Per Databricks Blog: Lakehouse Monitoring: Unified Solution for Data + AI.
Policy-as-Code for lakehouse governance gaining traction. 2026 trend: AI-safe-view definitions live in Git, deployed via CI/CD, audited by code review — "click-ops" PII masking is retreating to legacy environments. Per DataLakehouseHub — Policy as Code for Lakehouse Governance (May 2026).
Genie Code (Databricks 2026): AI-generated code respects AI-safe views automatically. Genie Code generates SQL/Python that runs under the user's Unity Catalog ABAC policies — agents can't generate code that bypasses the data-governance layer, by construction. Per Databricks Blog — Introducing Genie Code.