2 Commits
ebd9ab7c77
validator: Phase 43 v3 — production WorkerLookup backed by workers_500k.parquet
Some checks failed
lakehouse/auditor 13 blocking issues: cloud: claim not backed — "Verified end-to-end:"
Closes the Phase 43 v2 loose end. The validator scaffolds (FillValidator,
EmailValidator) take Arc<dyn WorkerLookup> at construction; this commit
ships the parquet-snapshot impl that production code wires in.
Schema mapping (workers_500k.parquet → WorkerRecord):
  worker_id (int64)       → candidate_id = "W-{id}" (matches what the staffing executor emits)
  name (string)           → name (already concatenated upstream)
  role (string)           → role
  city, state (string)    → city, state
  availability (double)   → status: "active" if > 0, else "inactive"
workers_500k has no `status` column, so we derive it from `availability`:
by this dataset's convention, 0.0 means the worker is vacationing,
suspended, or otherwise unavailable. Once Track A.B's `_safe` view ships
with a proper status column, flip the loader to read it directly. The
schema mapping lives in one function (load_workers_parquet), so the swap
is trivial.
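The per-row rules from the schema mapping above can be sketched as one conversion function. `RawWorkerRow`, `to_worker_record`, and the exact `WorkerRecord` fields are hypothetical stand-ins; the real work happens in `load_workers_parquet`, which reads the parquet file directly. This sketch skips parquet I/O entirely and shows only the `W-{id}` prefix and the availability-to-status derivation.

```rust
/// Hypothetical in-memory stand-in for one row of workers_500k.parquet.
/// The real loader reads these columns from parquet; this sketch does no I/O.
#[derive(Debug, Clone)]
pub struct RawWorkerRow {
    pub worker_id: i64,
    pub name: String,
    pub role: String,
    pub city: String,
    pub state: String,
    pub availability: f64,
}

/// Assumed field layout; the crate's real WorkerRecord may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct WorkerRecord {
    pub candidate_id: String,
    pub name: String,
    pub role: String,
    pub city: String,
    pub state: String,
    pub status: String,
}

/// Applies the mapping rules: candidate_id = "W-{id}", and status derived
/// from availability (strictly greater than 0.0 means "active").
pub fn to_worker_record(row: &RawWorkerRow) -> WorkerRecord {
    let status = if row.availability > 0.0 { "active" } else { "inactive" };
    WorkerRecord {
        candidate_id: format!("W-{}", row.worker_id),
        name: row.name.clone(),
        role: row.role.clone(),
        city: row.city.clone(),
        state: row.state.clone(),
        status: status.to_string(),
    }
}
```

Because the test is `availability > 0.0`, a NaN availability would also map to "inactive" in this sketch. The commit does not say how the real loader treats NaN.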
In-memory snapshot model:
- Loads all 500K rows at startup → ~75MB resident
- Sync .find() — no per-call I/O on the validation hot path
- Refresh = call load_workers_parquet again to rebuild
- Caller-driven refresh (no auto-watch) — operators pick the cadence
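A minimal sketch of this snapshot model, assuming a `WorkerLookup` trait with a single synchronous `find` method. The real trait's signature may differ, and `SnapshotLookup` and `refresh` are hypothetical names. The whole map lives in memory, so `find` is a hash lookup with no I/O. A refresh builds the new map outside the lock and swaps it in, so readers never see a half-built snapshot.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical minimal record; the real WorkerRecord carries more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct WorkerRecord {
    pub candidate_id: String,
    pub status: String,
}

/// Assumed trait shape: synchronous find(), no per-call I/O.
pub trait WorkerLookup: Send + Sync {
    fn find(&self, candidate_id: &str) -> Option<WorkerRecord>;
}

/// Snapshot-backed lookup holding the whole worker set in memory.
pub struct SnapshotLookup {
    snapshot: RwLock<Arc<HashMap<String, WorkerRecord>>>,
}

fn index(records: Vec<WorkerRecord>) -> HashMap<String, WorkerRecord> {
    records.into_iter().map(|r| (r.candidate_id.clone(), r)).collect()
}

impl SnapshotLookup {
    pub fn new(records: Vec<WorkerRecord>) -> Self {
        SnapshotLookup { snapshot: RwLock::new(Arc::new(index(records))) }
    }

    /// Caller-driven refresh: the caller reloads rows (for example by
    /// re-running the parquet loader) and passes them in. Nothing here
    /// watches the file.
    pub fn refresh(&self, records: Vec<WorkerRecord>) {
        let fresh = Arc::new(index(records)); // rebuilt without holding the lock
        *self.snapshot.write().unwrap() = fresh; // brief write lock for the swap
    }
}

impl WorkerLookup for SnapshotLookup {
    fn find(&self, candidate_id: &str) -> Option<WorkerRecord> {
        // Grab the current snapshot under a short read lock, then look up.
        let map = Arc::clone(&*self.snapshot.read().unwrap());
        map.get(candidate_id).cloned()
    }
}
```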
Why workers_500k and not candidates.parquet:
candidates.parquet has the right shape (string candidate_id, status,
first/last_name) but lacks `role`, and the staffing executor follows the
W-* ID convention from the workers_500k_v8 corpus. So the production
data path goes through workers_500k. The schema mismatch between the
two parquets is documented in `reports/staffing/synthetic-data-gap-report.md`
(gap A); resolution is the operator's call.
Errors are typed (LookupLoadError):
- Open: file not found or permission denied
- Parse: invalid parquet
- MissingColumn: schema doesn't have required field
- BadRow: row missing worker_id or name
The schema check happens before iteration, so a wrong-shape file fails
loudly and immediately rather than silently building an empty lookup.
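The typed errors and the up-front schema check might look like the following std-only sketch. The variant payloads (`String` messages, a row index for `BadRow`) and the `check_schema` helper are assumptions. The commit only names the four variants and says the check runs before any rows are read.

```rust
use std::fmt;

/// Sketch of the typed load error; payload types are assumptions.
#[derive(Debug, PartialEq)]
pub enum LookupLoadError {
    Open(String),          // file not found or permission denied
    Parse(String),         // not a valid parquet file
    MissingColumn(String), // schema lacks a required field
    BadRow(usize),         // index of a row missing worker_id or name
}

impl fmt::Display for LookupLoadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LookupLoadError::Open(m) => write!(f, "open failed: {m}"),
            LookupLoadError::Parse(m) => write!(f, "invalid parquet: {m}"),
            LookupLoadError::MissingColumn(c) => write!(f, "missing required column `{c}`"),
            LookupLoadError::BadRow(i) => write!(f, "row {i} missing worker_id or name"),
        }
    }
}

impl std::error::Error for LookupLoadError {}

/// Columns the schema mapping reads from workers_500k.parquet.
const REQUIRED: [&str; 6] = ["worker_id", "name", "role", "city", "state", "availability"];

/// Runs before iterating rows, so a wrong-shape file fails immediately
/// instead of yielding an empty lookup.
pub fn check_schema(columns: &[&str]) -> Result<(), LookupLoadError> {
    for want in REQUIRED.iter() {
        if !columns.contains(want) {
            return Err(LookupLoadError::MissingColumn(want.to_string()));
        }
    }
    Ok(())
}
```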
Verification:
  cargo build -p validator   compiles
  cargo test -p validator    33 pass / 0 fail (was 31; +2 for parquet)
  load_real_workers_500k     smoke test passes against the live 500K-row file:
                             W-1 resolves, with status, role, and city/state
                             all populated.
Phase 43 v3 part 2 (next):
- /v1/validate gateway endpoint that takes a JSON artifact and dispatches
to FillValidator/EmailValidator/PlaybookValidator with a shared
WorkerLookup loaded from the parquet at gateway startup.
- That closes the "any caller can validate" surface; execution-loop
wiring (Phase 43 PRD's "generate → validate → correct → retry")
becomes a thin wrapper on top of /v1/validate.
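The planned gateway shares one lookup across validators. Here is a minimal sketch of that wiring, assuming validators hold an `Arc<dyn WorkerLookup>` as described at the top of this message. `Gateway`, `FixedLookup`, the `exists` method, and the `"fill"`/`"email"` kind strings are all hypothetical, and JSON parsing is left out. The point is that the snapshot is loaded once and each validator holds a cheap `Arc` clone of it.

```rust
use std::sync::Arc;

/// Assumed, simplified trait shape for this sketch.
pub trait WorkerLookup: Send + Sync {
    fn exists(&self, candidate_id: &str) -> bool;
}

/// Hypothetical fixed lookup standing in for the parquet snapshot.
pub struct FixedLookup(pub Vec<String>);

impl WorkerLookup for FixedLookup {
    fn exists(&self, candidate_id: &str) -> bool {
        self.0.iter().any(|id| id == candidate_id)
    }
}

fn check_known(lookup: &dyn WorkerLookup, candidate_id: &str, what: &str) -> Result<(), String> {
    if lookup.exists(candidate_id) {
        Ok(())
    } else {
        Err(format!("{what}: unknown worker {candidate_id}"))
    }
}

/// Both validators take the shared lookup at construction.
pub struct FillValidator { lookup: Arc<dyn WorkerLookup> }
pub struct EmailValidator { lookup: Arc<dyn WorkerLookup> }

impl FillValidator {
    pub fn new(lookup: Arc<dyn WorkerLookup>) -> Self { FillValidator { lookup } }
    pub fn check(&self, id: &str) -> Result<(), String> { check_known(&*self.lookup, id, "fill") }
}

impl EmailValidator {
    pub fn new(lookup: Arc<dyn WorkerLookup>) -> Self { EmailValidator { lookup } }
    pub fn check(&self, id: &str) -> Result<(), String> { check_known(&*self.lookup, id, "email") }
}

/// Hypothetical gateway state built once at startup.
pub struct Gateway { fill: FillValidator, email: EmailValidator }

impl Gateway {
    pub fn new(lookup: Arc<dyn WorkerLookup>) -> Self {
        Gateway {
            fill: FillValidator::new(Arc::clone(&lookup)), // shared, not reloaded
            email: EmailValidator::new(lookup),
        }
    }

    /// Routes on an artifact kind, as a /v1/validate handler might after
    /// parsing the request body.
    pub fn validate(&self, kind: &str, candidate_id: &str) -> Result<(), String> {
        match kind {
            "fill" => self.fill.check(candidate_id),
            "email" => self.email.check(candidate_id),
            other => Err(format!("no validator for artifact kind `{other}`")),
        }
    }
}
```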
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
b5b0c00efe
phase-43: new crates/validator — trait, staffing impls, devops scaffold
Some checks failed
lakehouse/auditor 3 blocking issues: todo!() macro call in tests/real-world/scrum_master_pipeline.ts
Phase 43 PRD (docs/CONTROL_PLANE_PRD.md:161) was the only audit finding
that was truly unimplemented: no crate, no trait, no tests, no workspace
entry. Neither PHASES.md nor the source tree mentioned Phase 43 at all.
A genuine greenfield gap.
Lands the scaffold as a real crate, registered in the workspace Cargo.toml:
crates/validator/
  src/lib.rs                Validator trait, Artifact enum (5 variants: FillProposal,
                            EmailDraft, Playbook, TerraformPlan, AnsiblePlaybook),
                            Report, Finding, Severity, ValidationError
  src/staffing/mod.rs       staffing validators module root
  src/staffing/fill.rs      FillValidator (schema-level: fills array plus per-fill
                            {candidate_id, name}). 4 tests. Worker-existence, status,
                            and geo checks are TODO v2 (need a catalog query handle).
  src/staffing/email.rs     EmailValidator (to/body schema, SMS ≤ 160, email
                            subject ≤ 78). 4 tests. PII scan and name-consistency
                            are TODO v2.
  src/staffing/playbook.rs  PlaybookValidator (operation prefix; endorsed_names
                            non-empty and ≤ target × 2; fingerprint present, per
                            Phase 25). 5 tests.
  src/devops.rs             TerraformValidator + AnsibleValidator scaffolds. Both
                            return Unimplemented, which keeps the dispatcher shape
                            stable and surfaces a clear "phase 43 not wired" signal
                            instead of silently passing or panicking.
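Here is a compressed sketch of the dispatch shape described above: one `Validator` trait, one `Artifact` enum, validators that reject artifacts of the wrong type, and a devops scaffold that returns `Unimplemented`. Payload fields are placeholders. Findings are plain strings instead of the crate's `Finding`/`Severity` types, and only `EmailValidator` and `TerraformValidator` are shown. Counting the length limits in Unicode scalar values (`chars()`) is an assumption, since the commit does not say how length is measured.

```rust
/// Placeholder payloads; not the crate's real field layout.
#[derive(Debug)]
pub enum Artifact {
    FillProposal { fills: Vec<(String, String)> }, // (candidate_id, name)
    EmailDraft { to: String, subject: Option<String>, body: String, sms: bool },
    Playbook { operation: String, endorsed_names: Vec<String>, target: usize },
    TerraformPlan(String),
    AnsiblePlaybook(String),
}

#[derive(Debug, PartialEq)]
pub enum ValidationError {
    WrongArtifact { expected: &'static str },
    Unimplemented(&'static str),
}

/// Simplified report: findings as plain strings.
#[derive(Debug, Default, PartialEq)]
pub struct Report {
    pub findings: Vec<String>,
}

pub trait Validator {
    fn validate(&self, artifact: &Artifact) -> Result<Report, ValidationError>;
}

pub struct EmailValidator;

impl Validator for EmailValidator {
    fn validate(&self, artifact: &Artifact) -> Result<Report, ValidationError> {
        // Dispatch-error path: any other artifact type is rejected outright.
        let Artifact::EmailDraft { to, subject, body, sms } = artifact else {
            return Err(ValidationError::WrongArtifact { expected: "EmailDraft" });
        };
        let mut report = Report::default();
        if to.is_empty() || body.is_empty() {
            report.findings.push("to/body required".into());
        }
        if *sms && body.chars().count() > 160 {
            report.findings.push("SMS exceeds 160 chars".into());
        }
        if let Some(s) = subject {
            if s.chars().count() > 78 {
                report.findings.push("subject exceeds 78 chars".into());
            }
        }
        Ok(report)
    }
}

/// Scaffold: keeps the dispatcher shape stable without pretending to pass.
pub struct TerraformValidator;

impl Validator for TerraformValidator {
    fn validate(&self, _artifact: &Artifact) -> Result<Report, ValidationError> {
        Err(ValidationError::Unimplemented("phase 43 not wired"))
    }
}
```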
Total: 15 tests, all green. They cover the happy paths, the common
failure modes (missing fields, overfull arrays, length violations),
and the dispatch-error path (an artifact of the wrong type passed to a
validator).
Still open from Phase 43 (v2 work, beyond the scaffold):
- FillValidator catalog integration (worker-existence, status,
  geo/role match): needs a catalog handle in the constructor
- EmailValidator PII scan (shared::pii::strip_pii integration) and
  name-consistency cross-check
- Execution loop wiring: generate → validate → observer correction
  and retry (bounded by max_iterations=3); spans crates, so it is a follow-up
- Observer logging: write validation results to data/_observer/ops.jsonl
  and data/_kb/outcomes.jsonl
- Scenario fixture tests against tests/multi-agent/playbooks/*
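The planned execution loop can be sketched as a bounded retry. This sketch assumes a generator that takes the previous round's findings as correction feedback and a validator that returns an empty list on success. `generate_validate_retry` and `LoopOutcome` are hypothetical names, and the observer-correction step is reduced to passing findings back to the generator.

```rust
/// Outcome of one bounded generate → validate → correct cycle (hypothetical).
#[derive(Debug, PartialEq)]
pub enum LoopOutcome<T> {
    Accepted { artifact: T, attempts: u32 },
    Exhausted { last_findings: Vec<String>, attempts: u32 },
}

/// Runs at most `max_iterations` rounds. Each failed round's findings become
/// the feedback for the next generation attempt.
pub fn generate_validate_retry<T, G, V>(
    max_iterations: u32,
    mut generate: G,
    validate: V,
) -> LoopOutcome<T>
where
    G: FnMut(&[String]) -> T,
    V: Fn(&T) -> Vec<String>,
{
    let mut feedback: Vec<String> = Vec::new();
    for attempt in 1..=max_iterations {
        let artifact = generate(feedback.as_slice());
        let findings = validate(&artifact);
        if findings.is_empty() {
            return LoopOutcome::Accepted { artifact, attempts: attempt };
        }
        feedback = findings; // correction signal for the next attempt
    }
    LoopOutcome::Exhausted { last_findings: feedback, attempts: max_iterations }
}
```

Bounding the loop keeps a generator that never converges from retrying forever. After the last round the caller gets the final findings to log or surface.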
Workspace warnings still at 0.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>