root b03521a506 validator: port FillValidator + EmailValidator from Rust validator crate
Per architecture_comparison.md (universal-win item for the Go side):
ports Rust's crates/validator/src/staffing/ to internal/validator/.
This is the production safety net Go was missing: FillValidator
catches phantom worker IDs plus status/blacklist/geo/role mismatches;
EmailValidator catches SSN-shaped PII, salary disclosure, and a
wrong-target worker name in email/SMS drafts.

Files:
- types.go: Artifact (FillProposal | EmailDraft), Validator interface,
  WorkerLookup interface, ValidationError + Finding + Severity
- lookup.go: InMemoryWorkerLookup with case-insensitive ID lookup
- fill.go: FillValidator — schema → completeness → cross-roster
  (phantom ID / status / blacklist / geo / role)
- email.go: EmailValidator — schema → length → PII (SSN + salary)
  → worker-name consistency
- fill_test.go + email_test.go: 24 tests covering happy path +
  every error variant + the load-bearing edge cases (phone-pattern
  not flagged as SSN, flanking-digit guard rejects extended
  numeric runs)

Validator names match Rust (staffing.fill / staffing.email) so
cross-runtime audit logs share the same identifier. PII scanners
(containsSSNPattern, containsSalaryDisclosure) ported byte-for-byte
so a draft flagged by one runtime is flagged by the other.
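The flanking-digit guard mentioned in the test notes can be illustrated with a standalone scan. This is a hypothetical sketch, not the ported containsSSNPattern: it assumes the SSN shape is exactly ddd-dd-dddd with ASCII hyphens, and it rejects a candidate match when a digit sits immediately before or after it. It scans every start offset by hand instead of using a regexp, because Go's RE2 engine has no lookaround and non-overlapping regexp matches could hide a valid window behind a rejected one.

```go
package main

import "fmt"

func isDigit(b byte) bool { return b >= '0' && b <= '9' }

// looksLikeSSN reports whether s contains a ddd-dd-dddd run that is not
// part of a longer numeric run (the flanking-digit guard).
func looksLikeSSN(s string) bool {
	const n = len("123-45-6789") // 11 bytes
	for i := 0; i+n <= len(s); i++ {
		ok := true
		for j := 0; j < n && ok; j++ {
			c := s[i+j]
			if j == 3 || j == 6 {
				ok = c == '-' // separator positions
			} else {
				ok = isDigit(c)
			}
		}
		if !ok {
			continue
		}
		// Flanking-digit guard: skip windows that extend a longer digit run.
		if i > 0 && isDigit(s[i-1]) {
			continue
		}
		if i+n < len(s) && isDigit(s[i+n]) {
			continue
		}
		return true
	}
	return false
}

func main() {
	for _, s := range []string{
		"SSN: 123-45-6789",   // flagged
		"call 555-123-4567",  // phone shape, not flagged
		"order 1123-45-6789", // leading extra digit, not flagged
		"ref 123-45-67890",   // trailing extra digit, not flagged
	} {
		fmt.Println(s, looksLikeSSN(s))
	}
}
```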

Caveat: the Rust validator crate also has parquet_lookup.rs (loads
workers_500k.parquet at startup) and playbook.rs (additional
checks). Neither was ported in this wave; only the two load-bearing
validators named in the comparison doc were.

Closes one of the two universal-win items for the Go side. The other
(the materializer port) remains deferred: it's a bigger surface change
and depends on transforms.ts source-class adapters.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 04:49:55 -05:00


package validator

import "strings"

// InMemoryWorkerLookup is a zero-deps WorkerLookup useful for tests
// and small-fixture validation. Mirrors Rust's
// `InMemoryWorkerLookup::from_records`.
//
// Lookup is case-insensitive on candidate_id and ignores surrounding
// whitespace. Rust's HashMap compares keys exactly (PartialEq), and the
// source data's casing is inconsistent (some IDs uppercase, some
// lowercase, some mixed), so a case-sensitive lookup misses real
// matches. Keys are trimmed and lower-cased on insert, and Find applies
// the same transform, which keeps the contract.
type InMemoryWorkerLookup struct {
	byID map[string]WorkerRecord
}

// NewInMemoryWorkerLookup builds a lookup from a list of records.
// Duplicate candidate_ids (after normalization): last write wins.
// Empty or whitespace-only candidate_ids are skipped, so they can never
// match an empty ID passed to Find.
func NewInMemoryWorkerLookup(records []WorkerRecord) *InMemoryWorkerLookup {
	m := make(map[string]WorkerRecord, len(records))
	for _, r := range records {
		key := strings.ToLower(strings.TrimSpace(r.CandidateID))
		if key == "" {
			continue
		}
		m[key] = r
	}
	return &InMemoryWorkerLookup{byID: m}
}

// Find satisfies WorkerLookup. It returns (rec, true) on a hit and
// (nil, false) on a miss or when called on a nil receiver.
func (l *InMemoryWorkerLookup) Find(candidateID string) (*WorkerRecord, bool) {
	if l == nil {
		return nil, false
	}
	r, ok := l.byID[strings.ToLower(strings.TrimSpace(candidateID))]
	if !ok {
		return nil, false
	}
	// r is a copy of the map value, so callers can't overwrite the stored
	// record. The copy is shallow: pointer fields (City/State/Role) still
	// alias the stored record's targets.
	return &r, true
}

// Len exposes the size for tests and admin endpoints.
func (l *InMemoryWorkerLookup) Len() int {
	if l == nil {
		return 0
	}
	return len(l.byID)
}

// strPtr is a tiny convenience for tests that need *string fields
// on WorkerRecord.City/State/Role.
func strPtr(s string) *string { return &s }
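The key-normalization contract described in the InMemoryWorkerLookup doc comment (trim, then lower-case, applied on insert and in Find, with the last write winning on collisions) can be checked in isolation. A minimal sketch: `normalizeID` is a hypothetical helper, and plain strings stand in for WorkerRecord values.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeID applies the same transform lookup.go uses for map keys.
func normalizeID(id string) string {
	return strings.ToLower(strings.TrimSpace(id))
}

func main() {
	byID := map[string]string{}
	// Three spellings of one ID collapse to the key "w-0042";
	// the last value inserted wins.
	for _, raw := range []string{"W-0042", "w-0042 ", " W-0042"} {
		byID[normalizeID(raw)] = raw
	}
	fmt.Println(len(byID))                          // 1
	fmt.Printf("%q\n", byID[normalizeID("w-0042")]) // " W-0042"
}
```

Normalizing once at insert keeps Find a single map access. Comparing raw keys with `strings.EqualFold` would give the same answers for ASCII IDs but would require a linear scan over every stored key.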