- DatasetManifest expanded: description, owner, sensitivity, columns,
lineage, freshness contract, tags, row_count
- All new fields use #[serde(default)] for backward compatibility
- PII auto-detection: scans column names for email, phone, SSN, salary,
address, DOB, medical terms — flags as PII/PHI/Financial
- Column-level metadata: name, type, sensitivity, is_pii flag
- Lineage tracking: source_system, source_file, ingest_job, timestamp
- Ingest pipeline auto-populates: PII scan, column meta, lineage, row count
- PATCH /catalog/datasets/by-name/{name}/metadata — update metadata
- Catalog responses now include all rich fields
- 25 unit tests passing (5 new PII detection tests)
Per ADR-013: datasets without metadata become mystery files.
This makes every ingested file self-describing from day one.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>