Closes the second half of J's 2026-05-02 multi-call observability
concern. Trace-id propagation (commit d6d2fdf) gave us the *live*
view in Langfuse; this gives us the *longitudinal* view for ad-hoc
DuckDB queries over thousands of sessions:
- "show me every session where the model produced a real candidate
  without ever needing a retry"
- "find sessions where validation rejected three times in a row"
- "first-shot success rate per model: did we feed it enough corpus?"
## What's in
internal/validator/session_log.go:
  - SessionRecord type (schema=session.iterate.v1)
  - SessionLogger writer: mutex-guarded append, best-effort posture,
    nil-safe (NewSessionLogger("") = nil = no-op on Append)
  - BuildSessionRecord helper: assembles a row from any
    iterate response/failure/infra-error combination, callable from
    other daemons that wrap iterate (cross-daemon shared schema)
  - 7 unit tests including concurrent-append safety + the three
    code paths (success / max_iter_exhausted / infra_error)

cmd/validatord/main.go:
  - handlers.sessionLog field + wiring from cfg.Validatord.SessionLogPath
  - Iterate handler: build + append a SessionRecord on every call
  - rosterCheckFor("fill") closure stamps grounded_in_roster, the
    load-bearing forensic property J flagged ("we can never
    hallucinate available staff members to contracts")

internal/shared/config.go + lakehouse.toml:
  - [validatord].session_log_path field; empty = disabled
  - Production: /var/lib/lakehouse/validator/sessions.jsonl

scripts/validatord_smoke.sh:
  - Adds a probe verifying validatord announces session log path on
    startup. Smoke is now 6/6 (was 5/5).

docs/SESSION_LOG.md:
  - Schema reference + 5 worked DuckDB query examples including the
    "alarm" query (sessions where grounded_in_roster=false on an
    accepted fill; this should always be empty, and if it is not,
    something is bypassing FillValidator).
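
The alarm condition can also be spot-checked from the shell when DuckDB is not at hand. The sketch below is a rough approximation, not the query from docs/SESSION_LOG.md. It assumes one JSON object per line, and the `kind` and `accepted` field names are hypothetical (only `grounded_in_roster` is named in this commit). It matches on text rather than parsing JSON, so a hit is a prompt to run the real query, not a verdict.

```bash
# Print session-log rows that look like an accepted fill that was NOT
# grounded in the roster. Expected output: nothing.
# ASSUMPTION: "kind" and "accepted" are illustrative field names; adjust
# them to the real session.iterate.v1 schema.
alarm_rows() {
    local log="$1"
    # A missing or empty log has nothing to flag.
    [ -s "$log" ] || return 0
    grep -E '"kind": *"fill"' "$log" \
        | grep -E '"accepted": *true' \
        | grep -E '"grounded_in_roster": *false' || true
}
```

Usage would be along the lines of `alarm_rows /var/lib/lakehouse/validator/sessions.jsonl | wc -l`, where a count of zero is the healthy state.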
## What this is NOT
This is NOT a duplicate of replay_runs.jsonl. They're siblings:
- replay_runs.jsonl: replay tool's per-task retrieval+model output
- sessions.jsonl: validatord's per-iterate full retry chain +
grounded-in-roster verdict
A single coordinator session can produce rows in both streams; the
session_id (= Langfuse trace_id) is the join key.
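
Because both streams carry the same session_id, finding the sessions present in both is a set intersection. A minimal sketch, assuming each row in both files stores the id as a `"session_id"` string field (the commit names the join key but not the exact JSON layout); the helper names are illustrative only:

```bash
# Print the distinct session ids found in a JSONL file, sorted.
# ASSUMPTION: ids appear as "session_id":"<value>" on each row.
session_ids() {
    { grep -oE '"session_id": *"[^"]*"' "$1" || true; } \
        | sed -E 's/^"session_id": *"([^"]*)"$/\1/' \
        | sort -u
}

# Print ids that appear in both files, e.g. replay_runs.jsonl and
# sessions.jsonl. comm -12 keeps only lines common to both sorted inputs.
shared_sessions() {
    local left right
    left="$(mktemp)"
    right="$(mktemp)"
    session_ids "$1" > "$left"
    session_ids "$2" > "$right"
    comm -12 "$left" "$right"
    rm -f "$left" "$right"
}
```

For anything beyond a quick check, a DuckDB join on session_id over the two files is likely the better tool; this only answers "which sessions show up in both".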
## Layered observability now in place
Live view:  Langfuse trace tree (X-Lakehouse-Trace-Id propagation)
            `iterate.attempt[N]` spans with prompt/raw/verdict
Offline:    coordinator_sessions.jsonl (this commit)
            DuckDB-queryable; longitudinal forensics
Hard gate:  FillValidator + WorkerLookup (existing)
            phantom IDs structurally rejected, never reach
            session log's grounded_in_roster=true bucket
Per the architecture invariant in STATE_OF_PLAY's DO NOT RELITIGATE
section, these layers are wired; future work targets the data, not
the wiring.
## Verification
- internal/validator: 7 new tests (session_log_test.go) — all PASS
- cmd/validatord: 3 new integration tests covering the success,
failure, and grounded=false paths — all PASS
- validatord_smoke.sh: 6/6 PASS through gateway :3110
- Full go test ./... green across 33 packages
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
172 lines
7.0 KiB
Bash
Executable File
#!/usr/bin/env bash
# validatord smoke — Phase 43 PRD parity acceptance gate.
#
# Validates:
#   - validatord boots, reports /health
#   - POST /v1/validate with kind=playbook returns 200 + Report on
#     well-formed input
#   - POST /v1/validate with kind=playbook returns 422 + ValidationError
#     when fingerprint is missing
#   - POST /v1/validate with kind=fill consults the JSONL roster
#     (phantom candidate → 422 Consistency)
#   - POST /v1/validate with unknown kind returns 400
#   - All assertions go through gateway :3110 (proxy correct)
#
# Doesn't exercise /iterate — that needs a live chat backend, covered
# by cmd/validatord/main_test.go's fakeChatd helper. CI-friendly.
#
# Usage: ./scripts/validatord_smoke.sh

set -euo pipefail
cd "$(dirname "$0")/.."

export PATH="$PATH:/usr/local/go/bin"

echo "[validatord-smoke] building validatord + gateway..."
go build -o bin/ ./cmd/validatord ./cmd/gateway

pkill -f "bin/(validatord|gateway)$" 2>/dev/null || true
sleep 0.3

PIDS=()
TMP="$(mktemp -d)"
ROSTER="$TMP/roster.jsonl"
CFG="$TMP/validatord.toml"

cleanup() {
    echo "[validatord-smoke] cleanup"
    for p in "${PIDS[@]:-}"; do [ -n "${p:-}" ] && kill "$p" 2>/dev/null || true; done
    rm -rf "$TMP"
}
trap cleanup EXIT INT TERM

# Tiny synthetic roster so /v1/validate fill-kind has something to
# pass / fail against. Two real candidates + one inactive.
cat > "$ROSTER" <<EOF
{"candidate_id":"W-1","name":"Ada","status":"active","city":"Toledo","state":"OH","role":"Welder","blacklisted_clients":[]}
{"candidate_id":"W-2","name":"Bea","status":"active","city":"Toledo","state":"OH","role":"Welder","blacklisted_clients":["C-EVIL"]}
{"candidate_id":"W-3","name":"Cleo","status":"inactive","city":"Toledo","state":"OH","role":"Welder","blacklisted_clients":[]}
EOF

cat > "$CFG" <<EOF
[gateway]
bind = "127.0.0.1:3110"
storaged_url = "http://127.0.0.1:3211"
catalogd_url = "http://127.0.0.1:3212"
ingestd_url = "http://127.0.0.1:3213"
queryd_url = "http://127.0.0.1:3214"
vectord_url = "http://127.0.0.1:3215"
embedd_url = "http://127.0.0.1:3216"
pathwayd_url = "http://127.0.0.1:3217"
matrixd_url = "http://127.0.0.1:3218"
observerd_url = "http://127.0.0.1:3219"
chatd_url = "http://127.0.0.1:3220"
validatord_url = "http://127.0.0.1:3221"

[validatord]
bind = "127.0.0.1:3221"
chatd_url = "http://127.0.0.1:3220"
roster_path = "$ROSTER"
default_max_iterations = 3
default_max_tokens = 4096
chat_timeout_secs = 240
session_log_path = "$TMP/sessions.jsonl"
EOF

poll_health() {
    local port="$1" deadline=$(($(date +%s) + 5))
    while [ "$(date +%s)" -lt "$deadline" ]; do
        if curl -sS --max-time 1 "http://127.0.0.1:$port/health" >/dev/null 2>&1; then return 0; fi
        sleep 0.05
    done
    return 1
}

echo "[validatord-smoke] launching validatord → gateway..."
./bin/validatord -config "$CFG" > /tmp/validatord.log 2>&1 & PIDS+=($!)
poll_health 3221 || { echo "validatord failed"; tail /tmp/validatord.log; exit 1; }
./bin/gateway -config "$CFG" > /tmp/validatord_gateway.log 2>&1 & PIDS+=($!)
poll_health 3110 || { echo "gateway failed"; tail /tmp/validatord_gateway.log; exit 1; }

# 1. Roster loaded with 3 records — surface via the daemon's startup log.
if ! grep -q '"records":3' /tmp/validatord.log && ! grep -q 'records=3' /tmp/validatord.log; then
    echo " ✗ expected validatord to log records=3 from roster; got:"
    grep "validatord roster" /tmp/validatord.log || true
    exit 1
fi
echo " ✓ validatord roster loaded with 3 records"

# 2. /v1/validate playbook happy path → 200
echo "[validatord-smoke] /v1/validate playbook happy path:"
RESP="$(curl -sS -X POST http://127.0.0.1:3110/v1/validate \
    -H 'Content-Type: application/json' \
    -d '{"kind":"playbook","artifact":{"operation":"fill: Welder x2 in Toledo, OH","endorsed_names":["W-1","W-2"],"target_count":2,"fingerprint":"abc123"}}')"
if ! echo "$RESP" | jq -e '.elapsed_ms != null and (.findings | type == "array")' >/dev/null; then
    echo " ✗ unexpected response: $RESP"
    exit 1
fi
echo " ✓ playbook OK ($RESP)"

# 3. /v1/validate playbook schema error → 422 with ValidationError
echo "[validatord-smoke] /v1/validate playbook missing fingerprint → 422:"
STATUS="$(curl -sS -o /tmp/playbook_422.json -w '%{http_code}' -X POST http://127.0.0.1:3110/v1/validate \
    -H 'Content-Type: application/json' \
    -d '{"kind":"playbook","artifact":{"operation":"fill: X x1 in A, B","endorsed_names":["a"]}}')"
if [ "$STATUS" != "422" ]; then
    echo " ✗ expected 422; got $STATUS body=$(cat /tmp/playbook_422.json)"
    exit 1
fi
# Rust serde-tagged-enum shape (parity with crates/validator):
# {"Schema":{"field":"fingerprint","reason":"..."}}
VARIANT="$(jq -r 'keys[0]' /tmp/playbook_422.json)"
FIELD="$(jq -r '.Schema.field' /tmp/playbook_422.json)"
if [ "$VARIANT" != "Schema" ] || [ "$FIELD" != "fingerprint" ]; then
    echo " ✗ expected variant=Schema field=fingerprint; got variant=$VARIANT field=$FIELD"
    exit 1
fi
echo " ✓ playbook missing fingerprint → 422 Schema/fingerprint"

# 4. /v1/validate fill with phantom candidate → 422 Consistency
echo "[validatord-smoke] /v1/validate fill with phantom candidate → 422:"
STATUS="$(curl -sS -o /tmp/fill_422.json -w '%{http_code}' -X POST http://127.0.0.1:3110/v1/validate \
    -H 'Content-Type: application/json' \
    -d '{"kind":"fill","artifact":{"fills":[{"candidate_id":"W-PHANTOM","name":"Nobody"}]},"context":{"target_count":1,"city":"Toledo","client_id":"C-1"}}')"
if [ "$STATUS" != "422" ]; then
    echo " ✗ expected 422; got $STATUS body=$(cat /tmp/fill_422.json)"
    exit 1
fi
# Rust serde-tagged-enum shape: {"Consistency":{"reason":"..."}}
VARIANT="$(jq -r 'keys[0]' /tmp/fill_422.json)"
if [ "$VARIANT" != "Consistency" ]; then
    echo " ✗ expected variant=Consistency; got variant=$VARIANT body=$(cat /tmp/fill_422.json)"
    exit 1
fi
echo " ✓ phantom candidate W-PHANTOM → 422 Consistency"

# 5. /v1/validate unknown kind → 400
echo "[validatord-smoke] /v1/validate unknown kind → 400:"
STATUS="$(curl -sS -o /tmp/unknown_400.txt -w '%{http_code}' -X POST http://127.0.0.1:3110/v1/validate \
    -H 'Content-Type: application/json' \
    -d '{"kind":"foo","artifact":{}}')"
if [ "$STATUS" != "400" ]; then
    echo " ✗ expected 400; got $STATUS body=$(cat /tmp/unknown_400.txt)"
    exit 1
fi
echo " ✓ unknown kind → 400"

# 6. Session log: every /v1/validate hit was a single-shot validation
# (not iterate), so the session log isn't populated yet. Verify it
# exists as a path (logger initialized) — the iterate handler is
# where rows actually land. This proves the wiring is in place
# even if the smoke doesn't drive a live iteration end-to-end.
LOG_PATH="$TMP/sessions.jsonl"
echo "[validatord-smoke] session log path wired:"
grep -q "validatord session log" /tmp/validatord.log || {
    echo " ✗ expected validatord to log 'validatord session log' on startup"
    grep validatord /tmp/validatord.log
    exit 1
}
echo " ✓ session_log_path=$LOG_PATH announced at startup (rows land on /v1/iterate calls)"

echo "[validatord-smoke] PASS — 6/6 probes through gateway :3110"