Classification / Tagging Models
Models that automatically categorize S3 objects by content type, sensitivity level, domain, or business unit — enabling automated governance, routing, and lifecycle management.
Summary
Classification models let data governance scale across S3 data lakes. They automatically tag objects with metadata that drives downstream processes such as routing sensitive data to encrypted tiers, classifying documents for compliance, or tagging assets for search.
- Classification accuracy is domain-dependent. A model trained on general documents may perform poorly on domain-specific content (medical, legal, financial). Fine-tuning or domain-specific models improve accuracy.
- Classification tags are metadata, not access controls. Tagging data as "confidential" does not prevent access — IAM policies must enforce the classification.
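To make the second caveat concrete, here is a minimal sketch of a bucket policy that turns a classification tag into an actual access rule. It relies on the S3 condition key `s3:ExistingObjectTag/<key>` and the global key `aws:PrincipalTag/<key>`. The tag names `classification` and `clearance`, the bucket name, and the function name are assumptions chosen for illustration, not part of any existing setup.

```python
import json


def deny_without_clearance_policy(
    bucket: str,
    tag_key: str = "classification",   # hypothetical object tag key
    tag_value: str = "confidential",   # hypothetical tag value
    clearance_tag: str = "clearance",  # hypothetical principal tag key
) -> dict:
    """Build a bucket policy that denies s3:GetObject on objects tagged
    tag_key=tag_value unless the caller carries a matching principal tag."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyTaggedObjectsWithoutClearance",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # Both condition operators must hold for the Deny to apply.
                "Condition": {
                    "StringEquals": {
                        f"s3:ExistingObjectTag/{tag_key}": tag_value
                    },
                    "StringNotEquals": {
                        f"aws:PrincipalTag/{clearance_tag}": tag_value
                    },
                },
            }
        ],
    }


policy = deny_without_clearance_policy("example-data-lake")
policy_json = json.dumps(policy, indent=2)
```

The resulting JSON would still have to be attached to the bucket and tested against real principals; the sketch only shows how a tag written by a classifier can be referenced from a policy condition.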
enables: Data Classification (the model class behind automated classification)
augments: Metadata Management (enriches object metadata with classification tags)
constrained_by: High Cloud Inference Cost (per-object classification cost)
scoped_to: LLM-Assisted Data Systems, Metadata Management
Definition
Models that automatically categorize S3-stored objects by content type, sensitivity level, business domain, regulatory category, or custom taxonomies — enabling automated governance and discovery.
S3 buckets accumulate vast quantities of unlabeled data. Classification models enable governance (PII detection), discovery (finding relevant data), and routing (directing data to appropriate pipelines) at scales impossible for manual review.
Typical use cases: PII detection across S3 data lakes, automated sensitivity tagging, content-based data routing, and regulatory classification.
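The tag-then-route flow can be sketched as follows. This is a deliberately simple stand-in: two regular expressions take the place of a trained classification model, and the label values (`confidential`, `general`), tag keys, and tier prefixes are hypothetical. The `TagSet` shape matches what the S3 object tagging API expects (for example the `Tagging` argument of boto3's `put_object_tagging`), but no AWS call is made here.

```python
import re

# Toy detectors standing in for a real PII model (assumption for illustration).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
US_SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def classify_text(text: str) -> dict:
    """Return classification tags for one object's text content."""
    found = []
    if EMAIL_RE.search(text):
        found.append("email")
    if US_SSN_RE.search(text):
        found.append("us_ssn")
    return {
        "sensitivity": "confidential" if found else "general",
        "pii_types": "+".join(found) if found else "none",
    }


def to_tag_set(tags: dict) -> dict:
    """Convert a flat dict into the S3 object-tagging payload shape."""
    return {"TagSet": [{"Key": k, "Value": v} for k, v in tags.items()]}


def route_prefix(tags: dict) -> str:
    """Pick a destination prefix from the sensitivity tag (hypothetical tiers)."""
    if tags["sensitivity"] == "confidential":
        return "encrypted-tier/"
    return "standard-tier/"


tags = classify_text("Contact jane@example.com, SSN 123-45-6789")
tagging_payload = to_tag_set(tags)
destination = route_prefix(tags)
```

In a real pipeline the regexes would be replaced by a model call, and the payload would be written back to the object so that downstream routing and policies can read it.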
Resources
Amazon Comprehend documentation for NLP-based text classification, entity recognition, and sentiment analysis on S3 data.
Grab engineering blog on deploying LLM-powered classification at petabyte scale for PII tagging and sensitivity tiering.