Model Class

Classification / Tagging Models

Models that automatically categorize S3 objects by content type, sensitivity level, domain, or business unit — enabling automated governance, routing, and lifecycle management.

Summary

Where it fits

Classification models scale the data governance function across S3 data lakes. They automatically tag objects with metadata that drives downstream processes — routing sensitive data to encrypted tiers, classifying documents for compliance, or tagging assets for search.
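The tag-then-route flow described above can be sketched as follows. This is a minimal illustration, not a reference implementation: `classify` is a keyword-based stand-in for a real model call, and the `Classification` fields, tag keys, and `ROUTES` prefixes are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical mapping from a sensitivity label to a destination prefix.
ROUTES = {
    "confidential": "restricted/",
    "internal": "internal/",
    "public": "public/",
}


@dataclass
class Classification:
    content_type: str
    sensitivity: str
    domain: str


def classify(text: str) -> Classification:
    """Stand-in for a real model call; uses simple keyword rules only."""
    lowered = text.lower()
    if "patient" in lowered or "diagnosis" in lowered:
        return Classification("document", "confidential", "medical")
    if "invoice" in lowered:
        return Classification("document", "internal", "finance")
    return Classification("document", "public", "general")


def to_tag_set(c: Classification) -> list[dict[str, str]]:
    """Build a list of Key/Value pairs in the shape S3 object tagging uses."""
    return [
        {"Key": "content-type", "Value": c.content_type},
        {"Key": "sensitivity", "Value": c.sensitivity},
        {"Key": "domain", "Value": c.domain},
    ]


def route(c: Classification) -> str:
    """Pick a destination prefix from the sensitivity label."""
    return ROUTES[c.sensitivity]


result = classify("Patient diagnosis summary for Q3")
tags = to_tag_set(result)
print(tags)
print(route(result))
```

With boto3, a tag set like this could be written to an object via `put_object_tagging(Bucket=..., Key=..., Tagging={"TagSet": tags})`; that call is left out here so the sketch runs without AWS credentials.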

Misconceptions / Traps
  • Classification accuracy is domain-dependent. A model trained on general documents may perform poorly on domain-specific content (medical, legal, financial). Fine-tuning on in-domain data or using a domain-specific model typically improves accuracy.
  • Classification tags are metadata, not access controls. Tagging data as "confidential" does not by itself prevent access; IAM or bucket policies must enforce the classification, for example by conditioning access on object tags.
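To illustrate the second point, the sketch below builds a bucket policy that denies reads of objects tagged as confidential unless the caller carries a matching principal tag. The bucket name, the `sensitivity` tag key, and the `clearance` principal tag are assumptions made for the example; the condition keys `s3:ExistingObjectTag/<key>` and `aws:PrincipalTag/<key>` are standard AWS policy condition keys.

```python
import json


def deny_confidential_reads(
    bucket: str,
    tag_key: str = "sensitivity",      # hypothetical object tag key
    tag_value: str = "confidential",   # hypothetical tag value
    clearance_tag: str = "clearance",  # hypothetical principal tag key
) -> dict:
    """Return a bucket policy that blocks GetObject on tagged objects
    for principals whose clearance tag does not match."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyConfidentialWithoutClearance",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # Both conditions must hold for the Deny to apply.
                "Condition": {
                    "StringEquals": {
                        f"s3:ExistingObjectTag/{tag_key}": tag_value
                    },
                    "StringNotEquals": {
                        f"aws:PrincipalTag/{clearance_tag}": tag_value
                    },
                },
            }
        ],
    }


policy = deny_confidential_reads("example-data-lake")
print(json.dumps(policy, indent=2))
```

A policy like this only holds if tag changes are restricted too: a principal allowed to call `s3:PutObjectTagging` could retag an object and sidestep the rule.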

Key Connections
  • enables Data Classification — the model class behind automated classification
  • augments Metadata Management — enriches object metadata with classification tags
  • constrained_by High Cloud Inference Cost — per-object classification cost
  • scoped_to LLM-Assisted Data Systems, Metadata Management
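The per-object cost constraint can be made concrete with a rough estimate. The sketch below assumes token-based pricing; the function name, parameters, and the numbers in the usage line are illustrative placeholders, not real prices or measurements.

```python
def estimate_classification_cost(
    object_count: int,
    avg_tokens_per_object: int,
    price_per_1k_tokens: float,
) -> float:
    """Estimate total inference cost for classifying every object once."""
    total_tokens = object_count * avg_tokens_per_object
    return total_tokens / 1000 * price_per_1k_tokens


# Placeholder inputs: 1M objects, 500 tokens each, 0.001 per 1k tokens.
cost = estimate_classification_cost(1_000_000, 500, 0.001)
print(f"Estimated cost: {cost:.2f}")
```

Because cost grows linearly with object count, sampling, or classifying only new and changed objects, is one way to keep the total bounded.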

Definition

What it is

Models that automatically categorize S3-stored objects by content type, sensitivity level, business domain, regulatory category, or custom taxonomies — enabling automated governance and discovery.

Why it exists

S3 buckets accumulate vast quantities of unlabeled data. Classification models enable governance (PII detection), discovery (finding relevant data), and routing (directing data to appropriate pipelines) at a scale that manual review cannot reach.

Primary use cases

PII detection across S3 data lakes, automated sensitivity tagging, content-based data routing, regulatory classification.
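As a rough illustration of PII detection feeding sensitivity tagging, the sketch below scans text with two regular expressions and derives a label. The regex approach is a simplified stand-in for a trained classification model, and the pattern names, labels, and sample text are hypothetical; real PII detection needs far broader coverage and validation.

```python
import re

# Simplified patterns for illustration only.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def detect_pii(text: str) -> dict[str, int]:
    """Return a count per PII type, omitting types with no matches."""
    counts = {name: len(p.findall(text)) for name, p in PII_PATTERNS.items()}
    return {name: n for name, n in counts.items() if n > 0}


def sensitivity_label(findings: dict[str, int]) -> str:
    """Map detection results to a hypothetical sensitivity tag value."""
    return "confidential" if findings else "unclassified"


sample = "Contact jane.doe@example.com, SSN 123-45-6789."
findings = detect_pii(sample)
print(findings)
print(sensitivity_label(findings))
```

The resulting label could then be written as an object tag so that downstream routing and policy enforcement can act on it.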
