phase-43: new crates/validator — trait, staffing impls, devops scaffold
Some checks failed
lakehouse/auditor 3 blocking issues: todo!() macro call in tests/real-world/scrum_master_pipeline.ts

Phase 43 PRD (docs/CONTROL_PLANE_PRD.md:161) was the one audit finding
truly unimplemented — no crate, no trait, no tests, no workspace entry.
Neither PHASES.md nor the source tree had any Phase 43 presence.
Genuine greenfield gap.

Lands the scaffold as a real crate, registered in workspace Cargo.toml:

  crates/validator/
    src/lib.rs            — Validator trait, Artifact enum (5 variants:
                            FillProposal, EmailDraft, Playbook,
                            TerraformPlan, AnsiblePlaybook), Report,
                            Finding, Severity, ValidationError
    src/staffing/mod.rs   — staffing validators module root
    src/staffing/fill.rs  — FillValidator (schema-level: fills array
                            + per-fill {candidate_id, name}). 4 tests.
                            Worker-existence + status + geo checks
                            are TODO v2 (need catalog query handle).
    src/staffing/email.rs — EmailValidator (to/body schema + SMS ≤160
                            + email subject ≤78). 4 tests. PII scan +
                            name-consistency TODO v2.
    src/staffing/playbook.rs — PlaybookValidator (operation prefix,
                            endorsed_names non-empty + ≤ target×2,
                            fingerprint present per Phase 25). 5 tests.
    src/devops.rs         — TerraformValidator + AnsibleValidator
                            scaffolds. Return Unimplemented — keeps
                            dispatcher shape stable, surfaces a clear
                            "phase 43 not wired" signal instead of
                            silently passing or panicking.
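
A minimal sketch of how a dispatcher might consume this layout, assuming it picks a validator from the Artifact variant. `validator_for` is a hypothetical helper, not part of this commit, and the types are trimmed std-only stand-ins (String payloads instead of serde_json::Value, two of the five variants):

```rust
// Trimmed stand-ins for the crate's types so the sketch compiles with std only.
// In the crate itself the payloads are `serde_json::Value`, not `String`.
#[derive(Debug)]
enum Artifact {
    FillProposal(String),
    TerraformPlan(String),
}

#[derive(Debug, PartialEq)]
enum ValidationError {
    Schema { field: String, reason: String },
    Unimplemented { artifact: &'static str },
}

#[derive(Debug, Default, PartialEq)]
struct Report {
    findings: Vec<String>,
}

trait Validator: Send + Sync {
    fn name(&self) -> &'static str;
    fn validate(&self, artifact: &Artifact) -> Result<Report, ValidationError>;
}

struct FillValidator;

impl Validator for FillValidator {
    fn name(&self) -> &'static str { "staffing.fill" }
    fn validate(&self, artifact: &Artifact) -> Result<Report, ValidationError> {
        match artifact {
            // Toy stand-in for the real schema check: look for a `fills` key.
            Artifact::FillProposal(body) if body.contains("\"fills\"") => Ok(Report::default()),
            Artifact::FillProposal(_) => Err(ValidationError::Schema {
                field: "fills".into(),
                reason: "expected top-level `fills` array".into(),
            }),
            other => Err(ValidationError::Schema {
                field: "artifact".into(),
                reason: format!("FillValidator expects FillProposal, got {other:?}"),
            }),
        }
    }
}

struct TerraformValidator;

impl Validator for TerraformValidator {
    fn name(&self) -> &'static str { "devops.terraform" }
    fn validate(&self, _artifact: &Artifact) -> Result<Report, ValidationError> {
        Err(ValidationError::Unimplemented { artifact: "terraform_plan" })
    }
}

/// Hypothetical dispatcher: choose the validator from the artifact variant.
fn validator_for(artifact: &Artifact) -> &'static dyn Validator {
    match artifact {
        Artifact::FillProposal(_) => &FillValidator,
        Artifact::TerraformPlan(_) => &TerraformValidator,
    }
}

fn main() {
    let artifacts = [
        Artifact::FillProposal(r#"{"fills": []}"#.to_string()),
        Artifact::TerraformPlan(String::new()),
    ];
    for artifact in &artifacts {
        let validator = validator_for(artifact);
        match validator.validate(artifact) {
            Ok(report) => println!("{}: ok, {} findings", validator.name(), report.findings.len()),
            Err(err) => println!("{}: {err:?}", validator.name()),
        }
    }
}
```

Because the Terraform arm returns Err(Unimplemented) instead of panicking, the caller handles it through the same Result path as any other validation failure.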

Total: 15 tests (13 staffing + 2 devops), all green. Covers the happy
paths, the common failure modes (missing fields, overfull arrays,
length violations), and the dispatch-error path (wrong artifact type
passed to a validator).

Still open from Phase 43 (v2 work, beyond scaffold):
  - FillValidator catalog integration (worker-existence, status,
    geo/role match) — needs catalog handle in constructor
  - EmailValidator PII scan (shared::pii::strip_pii integration) +
    name-consistency cross-check
  - Execution loop wiring: generate → validate → observer correction
    + retry (bounded by max_iterations=3) — spans crates, follow-up
  - Observer logging: validation results to data/_observer/ops.jsonl
    and data/_kb/outcomes.jsonl
  - Scenario fixture tests against tests/multi-agent/playbooks/*

Workspace warnings still at 0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
root 2026-04-24 13:35:22 -05:00
parent 2f1b9c9768
commit b5b0c00efe
8 changed files with 512 additions and 0 deletions


@@ -15,6 +15,7 @@ members = [
    "crates/lance-bench",
    "crates/vectord-lance",
    "crates/truth",
    "crates/validator",
]

[workspace.dependencies]


@@ -0,0 +1,11 @@
[package]
name = "validator"
version = "0.1.0"
edition = "2024"

[dependencies]
serde = { workspace = true }
serde_json = { workspace = true }
thiserror = { workspace = true }
tokio = { workspace = true }
tracing = { workspace = true }


@@ -0,0 +1,44 @@
//! DevOps validator scaffold — long-horizon.
//!
//! PRD: "scaffold only: stubbed Terraform/Ansible validators
//! (`terraform validate`, `ansible-lint`) for the long-horizon phase."
//! Shipped as Unimplemented stubs so the execution-loop dispatcher
//! has a consistent failure shape to surface ("phase 43 not wired")
//! instead of a missing-impl panic.

use crate::{Artifact, Report, Validator, ValidationError};

pub struct TerraformValidator;

impl Validator for TerraformValidator {
    fn name(&self) -> &'static str { "devops.terraform" }
    fn validate(&self, _artifact: &Artifact) -> Result<Report, ValidationError> {
        Err(ValidationError::Unimplemented { artifact: "terraform_plan" })
    }
}

pub struct AnsibleValidator;

impl Validator for AnsibleValidator {
    fn name(&self) -> &'static str { "devops.ansible" }
    fn validate(&self, _artifact: &Artifact) -> Result<Report, ValidationError> {
        Err(ValidationError::Unimplemented { artifact: "ansible_playbook" })
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn terraform_scaffold_returns_unimplemented() {
        let r = TerraformValidator.validate(&Artifact::TerraformPlan(serde_json::json!({})));
        assert!(matches!(r, Err(ValidationError::Unimplemented { .. })));
    }

    #[test]
    fn ansible_scaffold_returns_unimplemented() {
        let r = AnsibleValidator.validate(&Artifact::AnsiblePlaybook(serde_json::json!({})));
        assert!(matches!(r, Err(ValidationError::Unimplemented { .. })));
    }
}


@@ -0,0 +1,95 @@
//! Phase 43 Validation Pipeline.
//!
//! PRD: "Staffing outputs run through schema / completeness /
//! consistency / policy gates. Plug into Layer 5 execution loop —
//! failure triggers observer-correction iteration."
//!
//! This crate provides the `Validator` trait + `Artifact` enum +
//! Report/ValidationError types. Staffing validators (fill, email,
//! playbook) and the DevOps scaffold live in submodules.
//!
//! Landed 2026-04-24 as a scaffold: the trait, types, and module
//! layout match the PRD. The staffing validators implement schema,
//! length, and basic completeness checks only; the DevOps validators
//! are `Unimplemented` stubs that return a clear "phase 43 not wired"
//! error rather than silently passing. The execution-loop integration
//! (generate → validate → correct → retry) comes in a follow-up
//! commit.

use serde::{Deserialize, Serialize};
use thiserror::Error;

pub mod staffing;
pub mod devops;

/// What a validator saw. One variant per artifact class we validate.
#[derive(Debug, Clone, Serialize, Deserialize)]
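// Adjacently tagged rather than internally tagged: payloads are arbitrary
// JSON, and internal tagging fails for non-object values or for objects
// that already carry a `kind` key (SMS drafts set `kind: "sms"`).
// The `payload` field name is chosen here, not taken from the PRD.
#[serde(tag = "kind", content = "payload")]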
pub enum Artifact {
    /// A fill proposal from the staffing executor. Shape is
    /// `{fills: [{candidate_id, name}]}` per PRD.
    FillProposal(serde_json::Value),
    /// An email/SMS draft for outreach.
    EmailDraft(serde_json::Value),
    /// A playbook being sealed for memory.
    Playbook(serde_json::Value),
    /// Terraform plan output (scaffold, long-horizon).
    TerraformPlan(serde_json::Value),
    /// Ansible playbook (scaffold, long-horizon).
    AnsiblePlaybook(serde_json::Value),
}

/// Success report. Empty `findings` means a clean pass. Populated
/// findings with `Severity::Warning` mean "acceptable but notable":
/// the artifact passes. `Severity::Error` means validation failed;
/// the validator should return `Err(...)` in that case, not `Ok`.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Report {
    pub findings: Vec<Finding>,
    pub elapsed_ms: u64,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Finding {
    pub field: String,
    pub severity: Severity,
    pub message: String,
}

#[derive(Debug, Clone, Copy, Serialize, Deserialize, PartialEq, Eq)]
#[serde(rename_all = "snake_case")]
pub enum Severity {
    Warning,
    Error,
}

/// Validation failure: what went wrong, where, and why. Returned as
/// `Err` from `validate`. Execution loop catches these and feeds them
/// to the observer-correction retry loop.
#[derive(Debug, Clone, Error, Serialize, Deserialize)]
pub enum ValidationError {
    /// Artifact schema doesn't match what we expected.
    #[error("schema mismatch at {field}: {reason}")]
    Schema { field: String, reason: String },
    /// Required data missing (e.g. endorsed count != target count).
    #[error("completeness: {reason}")]
    Completeness { reason: String },
    /// Data that's inconsistent with another source of truth
    /// (e.g. worker_id doesn't exist in the workers table).
    #[error("consistency: {reason}")]
    Consistency { reason: String },
    /// Policy violation: truth rule or access control said no.
    #[error("policy: {reason}")]
    Policy { reason: String },
    /// Validator hasn't been implemented yet (scaffold stub).
    #[error("validator not yet implemented for {artifact} — phase 43 scaffold")]
    Unimplemented { artifact: &'static str },
}

/// Core validation contract. Implementations live in `staffing::*` and
/// `devops::*`. The execution loop dispatches to the right impl based
/// on the Artifact variant.
pub trait Validator: Send + Sync {
    fn validate(&self, artifact: &Artifact) -> Result<Report, ValidationError>;
    /// Human-readable name for logs + Langfuse traces.
    fn name(&self) -> &'static str;
}


@@ -0,0 +1,116 @@
//! Email/SMS draft validator.
//!
//! PRD checks:
//! - Schema (TO/BODY fields present)
//! - Length (SMS ≤ 160 chars; email subject ≤ 78 chars)
//! - PII absence (no SSN / salary leaked into outgoing text)
//! - Worker-name consistency (name in message matches worker record)
//!
//! Scaffold implements schema + length. PII regex (SSN pattern,
//! salary-number pattern) lives in `shared::pii::strip_pii` — plug in
//! a follow-up when the validator caller knows the worker record to
//! cross-check name consistency.

use crate::{Artifact, Report, Validator, ValidationError};
use std::time::Instant;

pub struct EmailValidator;

const SMS_MAX_CHARS: usize = 160;
const EMAIL_SUBJECT_MAX_CHARS: usize = 78;

impl Validator for EmailValidator {
    fn name(&self) -> &'static str { "staffing.email" }

    fn validate(&self, artifact: &Artifact) -> Result<Report, ValidationError> {
        let started = Instant::now();
        let value = match artifact {
            Artifact::EmailDraft(v) => v,
            other => return Err(ValidationError::Schema {
                field: "artifact".into(),
                reason: format!("EmailValidator expects EmailDraft, got {other:?}"),
            }),
        };
        let to = value.get("to").and_then(|v| v.as_str()).ok_or_else(|| {
            ValidationError::Schema {
                field: "to".into(),
                reason: "missing or not a string".into(),
            }
        })?;
        let body = value.get("body").and_then(|v| v.as_str()).ok_or_else(|| {
            ValidationError::Schema {
                field: "body".into(),
                reason: "missing or not a string".into(),
            }
        })?;
        // Limits are stated in characters, so count Unicode scalar values.
        // `str::len` returns bytes and would over-count any non-ASCII text.
        let is_sms = value.get("kind").and_then(|k| k.as_str()) == Some("sms");
        let body_chars = body.chars().count();
        if is_sms && body_chars > SMS_MAX_CHARS {
            return Err(ValidationError::Completeness {
                reason: format!("SMS body is {body_chars} chars, max {SMS_MAX_CHARS}"),
            });
        }
        if let Some(subject) = value.get("subject").and_then(|v| v.as_str()) {
            let subject_chars = subject.chars().count();
            if subject_chars > EMAIL_SUBJECT_MAX_CHARS {
                return Err(ValidationError::Completeness {
                    reason: format!(
                        "email subject is {subject_chars} chars, max {EMAIL_SUBJECT_MAX_CHARS}"
                    ),
                });
            }
        }
        let _ = to; // touched for future name-consistency check
        // TODO(phase-43 v2): PII scan + worker-name consistency.
        Ok(Report {
            findings: vec![],
            elapsed_ms: started.elapsed().as_millis() as u64,
        })
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn long_sms_fails_completeness() {
        let body = "x".repeat(200);
        let r = EmailValidator.validate(&Artifact::EmailDraft(serde_json::json!({
            "to": "+15555550123",
            "body": body,
            "kind": "sms"
        })));
        assert!(matches!(r, Err(ValidationError::Completeness { .. })));
    }

    #[test]
    fn long_email_subject_fails_completeness() {
        let r = EmailValidator.validate(&Artifact::EmailDraft(serde_json::json!({
            "to": "a@b.com",
            "body": "hi",
            "subject": "x".repeat(100)
        })));
        assert!(matches!(r, Err(ValidationError::Completeness { .. })));
    }

    #[test]
    fn missing_to_fails_schema() {
        let r = EmailValidator.validate(&Artifact::EmailDraft(serde_json::json!({"body": "hi"})));
        assert!(matches!(r, Err(ValidationError::Schema { field, .. }) if field == "to"));
    }

    #[test]
    fn well_formed_email_passes() {
        let r = EmailValidator.validate(&Artifact::EmailDraft(serde_json::json!({
            "to": "hiring@example.com",
            "subject": "Interview: Friday 10am",
            "body": "Hi Jane — confirming interview Friday 10am."
        })));
        assert!(r.is_ok(), "well-formed email should pass: {:?}", r);
    }
}


@@ -0,0 +1,103 @@
//! Fill-proposal validator.
//!
//! PRD checks:
//! - Schema compliance (propose_done shape matches
//! `{fills: [{candidate_id, name}]}`)
//! - Completeness (endorsed count == target_count)
//! - Worker existence (every candidate_id present in workers_500k)
//! - Status check (active, not_on_client_blacklist)
//! - Geo/role match (worker city/state/role matches contract)
//!
//! Today this is a scaffold — schema check is real (it's cheap); the
//! worker-existence / status / geo checks need a catalog lookup and
//! land in a follow-up when the catalog query helper is wired into
//! this crate.

use crate::{Artifact, Report, Validator, ValidationError};
use std::time::Instant;

pub struct FillValidator;

impl Validator for FillValidator {
    fn name(&self) -> &'static str { "staffing.fill" }

    fn validate(&self, artifact: &Artifact) -> Result<Report, ValidationError> {
        let started = Instant::now();
        let value = match artifact {
            Artifact::FillProposal(v) => v,
            other => return Err(ValidationError::Schema {
                field: "artifact".into(),
                reason: format!("FillValidator expects FillProposal, got {other:?}"),
            }),
        };
        // Schema check: the only real validation shipped in this
        // scaffold. Catches the common "model emitted prose instead of
        // JSON" failure mode before the consistency checks even run.
        let fills = value.get("fills").and_then(|f| f.as_array()).ok_or_else(|| {
            ValidationError::Schema {
                field: "fills".into(),
                reason: "expected top-level `fills` array".into(),
            }
        })?;
        // Presence check only: the keys must exist, their types are not inspected.
        for (i, fill) in fills.iter().enumerate() {
            if fill.get("candidate_id").is_none() {
                return Err(ValidationError::Schema {
                    field: format!("fills[{i}].candidate_id"),
                    reason: "missing".into(),
                });
            }
            if fill.get("name").is_none() {
                return Err(ValidationError::Schema {
                    field: format!("fills[{i}].name"),
                    reason: "missing".into(),
                });
            }
        }
        // TODO(phase-43 v2): worker-existence / status / geo checks.
        // Need a catalog query handle injected into FillValidator's
        // constructor; out of scope for the scaffold.
        Ok(Report {
            findings: vec![],
            elapsed_ms: started.elapsed().as_millis() as u64,
        })
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn wrong_artifact_type_fails_schema() {
        let r = FillValidator.validate(&Artifact::EmailDraft(serde_json::json!({})));
        assert!(matches!(r, Err(ValidationError::Schema { .. })));
    }

    #[test]
    fn missing_fills_array_fails_schema() {
        let r = FillValidator.validate(&Artifact::FillProposal(serde_json::json!({})));
        assert!(matches!(r, Err(ValidationError::Schema { field, .. }) if field == "fills"));
    }

    #[test]
    fn fill_without_candidate_id_fails() {
        let r = FillValidator.validate(&Artifact::FillProposal(serde_json::json!({
            "fills": [{"name": "Jane"}]
        })));
        assert!(matches!(r, Err(ValidationError::Schema { field, .. }) if field.contains("candidate_id")));
    }

    #[test]
    fn well_formed_proposal_passes_schema() {
        let r = FillValidator.validate(&Artifact::FillProposal(serde_json::json!({
            "fills": [
                {"candidate_id": "W-123", "name": "Jane Doe"},
                {"candidate_id": "W-456", "name": "John Smith"}
            ]
        })));
        assert!(r.is_ok(), "well-formed proposal should pass schema: {:?}", r);
    }
}


@@ -0,0 +1,8 @@
//! Staffing validators — fill proposals, email/SMS drafts, sealed
//! playbooks. Phase 43 PRD: "the 0→85% pattern reproduces on real
//! staffing tasks — the iteration loop with validation in place is
//! what made small models successful."
pub mod fill;
pub mod email;
pub mod playbook;


@@ -0,0 +1,134 @@
//! Sealed playbook validator.
//!
//! PRD checks:
//! - Operation format (`fill: Role xN in City, ST`)
//! - endorsed_names non-empty, ≤ target_count × 2
//! - fingerprint populated (Phase 25 validity window requirement)

use crate::{Artifact, Report, Validator, ValidationError};
use std::time::Instant;

pub struct PlaybookValidator;

impl Validator for PlaybookValidator {
    fn name(&self) -> &'static str { "staffing.playbook" }

    fn validate(&self, artifact: &Artifact) -> Result<Report, ValidationError> {
        let started = Instant::now();
        let value = match artifact {
            Artifact::Playbook(v) => v,
            other => return Err(ValidationError::Schema {
                field: "artifact".into(),
                reason: format!("PlaybookValidator expects Playbook, got {other:?}"),
            }),
        };
        // Operation format: "fill: Role xN in City, ST". Only the
        // `fill:` prefix is checked here. Fuller grammar parse lives in
        // phase 25 code where operations are structured beyond strings.
        let op = value.get("operation").and_then(|v| v.as_str()).ok_or_else(|| {
            ValidationError::Schema {
                field: "operation".into(),
                reason: "missing or not a string".into(),
            }
        })?;
        if !op.starts_with("fill:") {
            return Err(ValidationError::Schema {
                field: "operation".into(),
                reason: format!("expected `fill: ...` prefix, got {op:?}"),
            });
        }
        let endorsed = value.get("endorsed_names").and_then(|v| v.as_array()).ok_or_else(|| {
            ValidationError::Schema {
                field: "endorsed_names".into(),
                reason: "missing or not an array".into(),
            }
        })?;
        if endorsed.is_empty() {
            return Err(ValidationError::Completeness {
                reason: "endorsed_names must be non-empty".into(),
            });
        }
        // The upper bound is enforced only when `target_count` is present.
        if let Some(target) = value.get("target_count").and_then(|v| v.as_u64()) {
            // Saturate so a huge target_count cannot overflow the multiply
            // or truncate in the cast.
            let max = usize::try_from(target.saturating_mul(2)).unwrap_or(usize::MAX);
            if endorsed.len() > max {
                return Err(ValidationError::Completeness {
                    reason: format!(
                        "endorsed_names ({}) exceeds target_count × 2 ({max})",
                        endorsed.len()
                    ),
                });
            }
        }
        // Missing, non-string, and empty fingerprints are all rejected.
        if value.get("fingerprint").and_then(|v| v.as_str()).map_or(true, |s| s.is_empty()) {
            return Err(ValidationError::Schema {
                field: "fingerprint".into(),
                reason: "missing or empty; required for Phase 25 validity window".into(),
            });
        }
        Ok(Report {
            findings: vec![],
            elapsed_ms: started.elapsed().as_millis() as u64,
        })
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn well_formed_playbook_passes() {
        let r = PlaybookValidator.validate(&Artifact::Playbook(serde_json::json!({
            "operation": "fill: Welder x2 in Toledo, OH",
            "endorsed_names": ["W-123", "W-456"],
            "target_count": 2,
            "fingerprint": "abc123"
        })));
        assert!(r.is_ok(), "got {:?}", r);
    }

    #[test]
    fn empty_endorsed_names_fails_completeness() {
        let r = PlaybookValidator.validate(&Artifact::Playbook(serde_json::json!({
            "operation": "fill: Welder x2 in Toledo, OH",
            "endorsed_names": [],
            "fingerprint": "abc"
        })));
        assert!(matches!(r, Err(ValidationError::Completeness { .. })));
    }

    #[test]
    fn overfull_endorsed_names_fails_completeness() {
        let r = PlaybookValidator.validate(&Artifact::Playbook(serde_json::json!({
            "operation": "fill: Welder x1 in Toledo, OH",
            "endorsed_names": ["a", "b", "c"],
            "target_count": 1,
            "fingerprint": "abc"
        })));
        assert!(matches!(r, Err(ValidationError::Completeness { .. })));
    }

    #[test]
    fn missing_fingerprint_fails_schema() {
        let r = PlaybookValidator.validate(&Artifact::Playbook(serde_json::json!({
            "operation": "fill: X x1 in A, B",
            "endorsed_names": ["a"]
        })));
        assert!(matches!(r, Err(ValidationError::Schema { field, .. }) if field == "fingerprint"));
    }

    #[test]
    fn wrong_operation_prefix_fails_schema() {
        let r = PlaybookValidator.validate(&Artifact::Playbook(serde_json::json!({
            "operation": "sms_draft: hello",
            "endorsed_names": ["a"],
            "fingerprint": "x"
        })));
        assert!(matches!(r, Err(ValidationError::Schema { .. })));
    }
}