golangLAKEHOUSE/scripts/validatord_smoke.sh
root 1a3a82aedb validatord: coordinator session JSONL for offline analysis (B follow-up)
Closes the second half of J's 2026-05-02 multi-call observability
concern. Trace-id propagation (commit d6d2fdf) gave us the *live*
view in Langfuse; this gives us the *longitudinal* view for ad-hoc
DuckDB queries over thousands of sessions:

  "show me every session where the model produced a real candidate
   without ever needing a retry"
  "find sessions where validation rejected three times in a row"
  "first-shot success rate per model — did we feed it enough corpus?"

## What's in

internal/validator/session_log.go:
  - SessionRecord type (schema=session.iterate.v1)
  - SessionLogger writer — mutex-guarded append, best-effort posture,
    nil-safe (NewSessionLogger("") = nil = no-op on Append)
  - BuildSessionRecord helper — assembles a row from any
    iterate response/failure/infra-error combination, callable from
    other daemons that wrap iterate (cross-daemon shared schema)
  - 7 unit tests including concurrent-append safety + the three
    code paths (success / max_iter_exhausted / infra_error)

cmd/validatord/main.go:
  - handlers.sessionLog field + wiring from cfg.Validatord.SessionLogPath
  - Iterate handler: build + append a SessionRecord on every call
  - rosterCheckFor("fill") closure stamps grounded_in_roster — the
    load-bearing forensic property J flagged ("we can never
    hallucinate available staff members to contracts")

internal/shared/config.go + lakehouse.toml:
  - [validatord].session_log_path field; empty = disabled
  - Production: /var/lib/lakehouse/validator/sessions.jsonl
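
A minimal sketch of the "empty = disabled" rule as an operator might check it from a shell. The helper names `session_log_path` and `session_log_status` are hypothetical and not part of the repo. The awk parsing handles only simple `key = "value"` lines, not full TOML:

```bash
# Hypothetical helper (not in the repo): print session_log_path from the
# [validatord] table of a TOML file. Simple key = "value" lines only.
session_log_path() {
  awk '
    /^\[/ { active = ($0 == "[validatord]") }   # track which table we are in
    active && /^session_log_path[ \t]*=/ {
      sub(/^[^=]*=[ \t]*"/, "")   # strip the key, the equals sign, the opening quote
      sub(/".*$/, "")             # strip the closing quote and anything after it
      print
      exit
    }
  ' "$1"
}

# Report whether session logging would be enabled for a config file.
# An empty or missing value is treated as disabled.
session_log_status() {
  local path
  path="$(session_log_path "$1")"
  if [ -z "$path" ]; then
    echo "disabled"
  else
    echo "enabled: $path"
  fi
}
```

Usage: `session_log_status lakehouse.toml` prints `disabled` when the field is empty or absent.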

scripts/validatord_smoke.sh:
  - Adds a probe verifying validatord announces session log path on
    startup. Smoke is now 6/6 (was 5/5).

docs/SESSION_LOG.md:
  - Schema reference + 5 worked DuckDB query examples including the
    "alarm" query (sessions where grounded_in_roster=false on an
    accepted fill — should always be empty; if not, something is
    bypassing FillValidator).
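
For hosts without DuckDB, the alarm condition can be approximated with grep. This is a rough stand-in, not the query from docs/SESSION_LOG.md. Only `grounded_in_roster` is named in this commit; the `kind` and `status` keys and the `success` value are placeholders, and the match assumes compact one-object-per-line JSON with no whitespace after colons:

```bash
# Hypothetical alarm check (not in the repo). Prints rows that look like an
# accepted fill with grounded_in_roster=false. The "kind" and "status" keys
# are assumed names; adjust them to the real session.iterate.v1 schema.
alarm_rows() {
  grep -F '"grounded_in_roster":false' "$1" \
    | grep -F '"kind":"fill"' \
    | grep -F '"status":"success"' || true
}

# Exit non-zero and print the offending rows if the alarm bucket is non-empty.
alarm_check() {
  local rows
  rows="$(alarm_rows "$1")"
  if [ -n "$rows" ]; then
    printf 'ALARM: accepted fill without roster grounding:\n%s\n' "$rows" >&2
    return 1
  fi
  echo "ok: no ungrounded accepted fills"
}
```

Usage: `alarm_check /var/lib/lakehouse/validator/sessions.jsonl` could run from cron, with a non-zero exit treated as a page-worthy event.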

## What this is NOT

This is NOT a duplicate of replay_runs.jsonl. They're siblings:
  - replay_runs.jsonl: replay tool's per-task retrieval+model output
  - sessions.jsonl: validatord's per-iterate full retry chain +
    grounded-in-roster verdict

A single coordinator session can produce rows in both streams; the
session_id (= Langfuse trace_id) is the join key.
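
A minimal sketch of using that join key from a shell, assuming both files carry a top-level `"session_id":"..."` string in compact one-object-per-line JSON. The helper names are hypothetical; a real analysis would more likely join in DuckDB:

```bash
# Hypothetical helpers (not in the repo). Print the distinct session_id
# values found in a JSONL file, sorted for use with comm.
session_ids() {
  sed -n 's/.*"session_id":"\([^"]*\)".*/\1/p' "$1" | LC_ALL=C sort -u
}

# List the session ids that appear in both streams,
# e.g. replay_runs.jsonl and sessions.jsonl.
shared_sessions() {
  local a b
  a="$(mktemp)"
  b="$(mktemp)"
  session_ids "$1" > "$a"
  session_ids "$2" > "$b"
  LC_ALL=C comm -12 "$a" "$b"   # keep only lines common to both sorted lists
  rm -f "$a" "$b"
}
```

Usage: `shared_sessions replay_runs.jsonl sessions.jsonl` prints one session id per line.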

## Layered observability now in place

  Live view:  Langfuse trace tree (X-Lakehouse-Trace-Id propagation)
              `iterate.attempt[N]` spans with prompt/raw/verdict
  Offline:    sessions.jsonl (coordinator session log, this commit)
              DuckDB-queryable; longitudinal forensics
  Hard gate:  FillValidator + WorkerLookup (existing)
              phantom IDs structurally rejected, never reach
              session log's grounded_in_roster=true bucket

Per the architecture invariant in STATE_OF_PLAY's DO NOT RELITIGATE
section — these layers are wired; future work targets the data, not
the wiring.

## Verification

- internal/validator: 7 new tests (session_log_test.go) — all PASS
- cmd/validatord: 3 new integration tests covering the success,
  failure, and grounded=false paths — all PASS
- validatord_smoke.sh: 6/6 PASS through gateway :3110
- Full go test ./... green across 33 packages

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 05:22:09 -05:00

#!/usr/bin/env bash
# validatord smoke — Phase 43 PRD parity acceptance gate.
#
# Validates:
#   - validatord boots, reports /health
#   - POST /v1/validate with kind=playbook returns 200 + Report on
#     well-formed input
#   - POST /v1/validate with kind=playbook returns 422 + ValidationError
#     when fingerprint is missing
#   - POST /v1/validate with kind=fill consults the JSONL roster
#     (phantom candidate → 422 Consistency)
#   - POST /v1/validate with unknown kind returns 400
#   - validatord announces its session log path on startup
#   - All HTTP assertions go through gateway :3110 (proxy correct)
#
# Doesn't exercise /iterate — that needs a live chat backend, covered
# by cmd/validatord/main_test.go's fakeChatd helper. CI-friendly.
#
# Usage: ./scripts/validatord_smoke.sh
set -euo pipefail
cd "$(dirname "$0")/.."
export PATH="$PATH:/usr/local/go/bin"
echo "[validatord-smoke] building validatord + gateway..."
go build -o bin/ ./cmd/validatord ./cmd/gateway
pkill -f "bin/(validatord|gateway)$" 2>/dev/null || true
sleep 0.3
PIDS=()
TMP="$(mktemp -d)"
ROSTER="$TMP/roster.jsonl"
CFG="$TMP/validatord.toml"
cleanup() {
  echo "[validatord-smoke] cleanup"
  for p in "${PIDS[@]:-}"; do [ -n "${p:-}" ] && kill "$p" 2>/dev/null || true; done
  rm -rf "$TMP"
}
trap cleanup EXIT INT TERM
# Tiny synthetic roster so /v1/validate fill-kind has something to
# pass / fail against. Two active candidates + one inactive.
cat > "$ROSTER" <<EOF
{"candidate_id":"W-1","name":"Ada","status":"active","city":"Toledo","state":"OH","role":"Welder","blacklisted_clients":[]}
{"candidate_id":"W-2","name":"Bea","status":"active","city":"Toledo","state":"OH","role":"Welder","blacklisted_clients":["C-EVIL"]}
{"candidate_id":"W-3","name":"Cleo","status":"inactive","city":"Toledo","state":"OH","role":"Welder","blacklisted_clients":[]}
EOF
cat > "$CFG" <<EOF
[gateway]
bind = "127.0.0.1:3110"
storaged_url = "http://127.0.0.1:3211"
catalogd_url = "http://127.0.0.1:3212"
ingestd_url = "http://127.0.0.1:3213"
queryd_url = "http://127.0.0.1:3214"
vectord_url = "http://127.0.0.1:3215"
embedd_url = "http://127.0.0.1:3216"
pathwayd_url = "http://127.0.0.1:3217"
matrixd_url = "http://127.0.0.1:3218"
observerd_url = "http://127.0.0.1:3219"
chatd_url = "http://127.0.0.1:3220"
validatord_url = "http://127.0.0.1:3221"
[validatord]
bind = "127.0.0.1:3221"
chatd_url = "http://127.0.0.1:3220"
roster_path = "$ROSTER"
default_max_iterations = 3
default_max_tokens = 4096
chat_timeout_secs = 240
session_log_path = "$TMP/sessions.jsonl"
EOF
poll_health() {
  local port="$1" deadline=$(($(date +%s) + 5))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    if curl -sS --max-time 1 "http://127.0.0.1:$port/health" >/dev/null 2>&1; then return 0; fi
    sleep 0.05
  done
  return 1
}
echo "[validatord-smoke] launching validatord → gateway..."
./bin/validatord -config "$CFG" > /tmp/validatord.log 2>&1 & PIDS+=($!)
poll_health 3221 || { echo "validatord failed"; tail /tmp/validatord.log; exit 1; }
./bin/gateway -config "$CFG" > /tmp/validatord_gateway.log 2>&1 & PIDS+=($!)
poll_health 3110 || { echo "gateway failed"; tail /tmp/validatord_gateway.log; exit 1; }
# 1. Roster loaded with 3 records — surface via the daemon's startup log.
if ! grep -q '"records":3' /tmp/validatord.log && ! grep -q 'records=3' /tmp/validatord.log; then
  echo " ✗ expected validatord to log records=3 from roster; got:"
  grep "validatord roster" /tmp/validatord.log || true
  exit 1
fi
echo " ✓ validatord roster loaded with 3 records"
# 2. /v1/validate playbook happy path → 200
echo "[validatord-smoke] /v1/validate playbook happy path:"
RESP="$(curl -sS -X POST http://127.0.0.1:3110/v1/validate \
  -H 'Content-Type: application/json' \
  -d '{"kind":"playbook","artifact":{"operation":"fill: Welder x2 in Toledo, OH","endorsed_names":["W-1","W-2"],"target_count":2,"fingerprint":"abc123"}}')"
if ! echo "$RESP" | jq -e '.elapsed_ms != null and (.findings | type == "array")' >/dev/null; then
  echo " ✗ unexpected response: $RESP"
  exit 1
fi
echo " ✓ playbook OK ($RESP)"
# 3. /v1/validate playbook schema error → 422 with ValidationError
echo "[validatord-smoke] /v1/validate playbook missing fingerprint → 422:"
STATUS="$(curl -sS -o /tmp/playbook_422.json -w '%{http_code}' -X POST http://127.0.0.1:3110/v1/validate \
  -H 'Content-Type: application/json' \
  -d '{"kind":"playbook","artifact":{"operation":"fill: X x1 in A, B","endorsed_names":["a"]}}')"
if [ "$STATUS" != "422" ]; then
  echo " ✗ expected 422; got $STATUS body=$(cat /tmp/playbook_422.json)"
  exit 1
fi
# Rust serde-tagged-enum shape (parity with crates/validator):
#   {"Schema":{"field":"fingerprint","reason":"..."}}
VARIANT="$(jq -r 'keys[0]' /tmp/playbook_422.json)"
FIELD="$(jq -r '.Schema.field' /tmp/playbook_422.json)"
if [ "$VARIANT" != "Schema" ] || [ "$FIELD" != "fingerprint" ]; then
  echo " ✗ expected variant=Schema field=fingerprint; got variant=$VARIANT field=$FIELD"
  exit 1
fi
echo " ✓ playbook missing fingerprint → 422 Schema/fingerprint"
# 4. /v1/validate fill with phantom candidate → 422 Consistency
echo "[validatord-smoke] /v1/validate fill with phantom candidate → 422:"
STATUS="$(curl -sS -o /tmp/fill_422.json -w '%{http_code}' -X POST http://127.0.0.1:3110/v1/validate \
  -H 'Content-Type: application/json' \
  -d '{"kind":"fill","artifact":{"fills":[{"candidate_id":"W-PHANTOM","name":"Nobody"}]},"context":{"target_count":1,"city":"Toledo","client_id":"C-1"}}')"
if [ "$STATUS" != "422" ]; then
  echo " ✗ expected 422; got $STATUS body=$(cat /tmp/fill_422.json)"
  exit 1
fi
# Rust serde-tagged-enum shape: {"Consistency":{"reason":"..."}}
VARIANT="$(jq -r 'keys[0]' /tmp/fill_422.json)"
if [ "$VARIANT" != "Consistency" ]; then
  echo " ✗ expected variant=Consistency; got variant=$VARIANT body=$(cat /tmp/fill_422.json)"
  exit 1
fi
echo " ✓ phantom candidate W-PHANTOM → 422 Consistency"
# 5. /v1/validate unknown kind → 400
echo "[validatord-smoke] /v1/validate unknown kind → 400:"
STATUS="$(curl -sS -o /tmp/unknown_400.txt -w '%{http_code}' -X POST http://127.0.0.1:3110/v1/validate \
  -H 'Content-Type: application/json' \
  -d '{"kind":"foo","artifact":{}}')"
if [ "$STATUS" != "400" ]; then
  echo " ✗ expected 400; got $STATUS body=$(cat /tmp/unknown_400.txt)"
  exit 1
fi
echo " ✓ unknown kind → 400"
# 6. Session log: every /v1/validate hit was a single-shot validation
#    (not iterate), so the session log isn't populated yet. Verify that
#    the daemon announces the log path at startup (logger initialized);
#    the iterate handler is where rows actually land. This proves the
#    wiring is in place even if the smoke doesn't drive a live
#    iteration end-to-end.
LOG_PATH="$TMP/sessions.jsonl"
echo "[validatord-smoke] session log path wired:"
grep -q "validatord session log" /tmp/validatord.log || {
  echo " ✗ expected validatord to log 'validatord session log' on startup"
  grep validatord /tmp/validatord.log
  exit 1
}
echo " ✓ session_log_path=$LOG_PATH announced at startup (rows land on /v1/iterate calls)"
echo "[validatord-smoke] PASS — 6/6 probes through gateway :3110"
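
The deadline loop in `poll_health` could be generalized to any readiness check, not only `/health`. The sketch below is a hypothetical refactor, not part of the script. `wait_for` is an invented name, and it assumes the same fractional `sleep` support the script already relies on:

```bash
# Hypothetical generalization of poll_health (not in the repo).
# wait_for SECONDS CMD [ARGS...]: retry CMD until it succeeds or the
# deadline passes. Returns 0 on success, 1 on timeout.
wait_for() {
  local timeout="$1"
  shift
  local deadline=$(( $(date +%s) + timeout ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    # Discard the command's output; only its exit status matters here.
    if "$@" >/dev/null 2>&1; then return 0; fi
    sleep 0.05
  done
  return 1
}
```

With this in place, the health probe would read `wait_for 5 curl -sS --max-time 1 http://127.0.0.1:3221/health`, and the startup-log probe could become `wait_for 5 grep -q "validatord session log" /tmp/validatord.log`, which tolerates a daemon that logs the line slightly after `/health` comes up.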