# Coordinator session log: `coordinator_sessions.jsonl`

**Last updated:** 2026-05-02 · **Schema:** `session.iterate.v1` · **Writer:** `internal/validator.SessionLogger` · **Producer:** validatord `/v1/iterate`

## Why

The Langfuse trace tree is the live view: per-session, you can scroll
the retry chain and inspect every sub-call. But for **longitudinal
forensics** ("show me every session in the last week where the model
guessed a real worker without a retry," or "find sessions where
validation rejected three times in a row"), Langfuse's UI doesn't
scale; you need a queryable data plane.

This JSONL is that plane. One row per `/v1/iterate` session,
append-only, DuckDB-friendly.

## Where

Configurable via `[validatord].session_log_path` in `lakehouse.toml`.
Empty = disabled (best-effort posture; a missing log never blocks an
iterate request). Production:

```toml
[validatord]
session_log_path = "/var/lib/lakehouse/validator/sessions.jsonl"
```

## Schema (v1)

```jsonc
{
  "schema": "session.iterate.v1",
  "session_id": "",                          // join key to Langfuse
  "timestamp": "2026-05-02T07:30:00.123456Z",
  "daemon": "validatord",
  "kind": "fill | email | playbook",
  "model": "qwen3.5:latest",
  "provider": "ollama",
  "prompt": "produce a fill artifact ...",   // truncated to 4000 chars
  "iterations": 3,                           // attempts spent
  "max_iterations": 3,                       // cap per request
  "final_verdict": "accepted | max_iter_exhausted | infra_error",
  "attempts": [
    {
      "iteration": 0,
      "verdict_kind": "validation_failed",
      "error": "consistency: candidate_id W-X not in roster",
      "span_id": "abc..."
    },
    {
      "iteration": 1,
      "verdict_kind": "validation_failed",
      "error": "consistency: city mismatch",
      "span_id": "def..."
    },
    {
      "iteration": 2,
      "verdict_kind": "accepted",
      "span_id": "ghi..."
    }
  ],
  "artifact": { /* final accepted artifact, omitted on failure */ },
  "grounded_in_roster": true,                // null when N/A (email/playbook)
  "duration_ms": 2840
}
```

### Field semantics

| Field | When set | What it means |
|---|---|---|
| `session_id` | always | Langfuse trace id. Pivot to live trace tree by URL: `${LANGFUSE_URL}/trace/`. |
| `final_verdict=accepted` | success | Loop converged within `max_iterations`. `artifact` is non-null. |
| `final_verdict=max_iter_exhausted` | failure | Loop hit the cap without passing validation. `artifact` is omitted. |
| `final_verdict=infra_error` | failure | Chat hop or other infra crashed. Single attempt with `verdict_kind=infra_error`. |
| `grounded_in_roster=true` | fill kind, success | Every `candidate_id` in the artifact exists in `WorkerLookup`. |
| `grounded_in_roster=false` | fill kind, anomaly | Phantom or otherwise-invalid candidate IDs (shouldn't happen, since FillValidator catches these, but the explicit check defends against future validator weakening). |
| `grounded_in_roster=null` (omitted) | non-fill kinds, or failure | The roster check doesn't apply or wasn't run. |

## DuckDB queries

```sql
-- Read the log directly via DuckDB's read_json_auto.
ATTACH ':memory:' AS sessions;
SELECT * FROM read_json_auto(
  '/var/lib/lakehouse/validator/sessions.jsonl',
  format='newline_delimited'
) LIMIT 10;
```

### "Did the validator catch every phantom worker?"

```sql
-- Accepted sessions that needed at least one retry and had an attempt
-- whose error mentions 'phantom' or 'consistency'. If grounded_in_roster
-- is true on the same session's final state, the model recovered.
SELECT session_id, model, iterations, grounded_in_roster, final_verdict
FROM read_json_auto('sessions.jsonl', format='newline_delimited')
WHERE final_verdict = 'accepted'
  AND iterations > 1
  AND list_contains(
    list_transform(attempts, x -> x.error LIKE '%consistency%' OR x.error LIKE '%phantom%'),
    true
  );
```

### "First-shot success rate per model" (the "did the corpus give it enough" gate)

```sql
SELECT
  model,
  COUNT(*) AS sessions,
  SUM(CASE WHEN iterations = 1 AND final_verdict = 'accepted' THEN 1 ELSE 0 END) AS first_shot,
  ROUND(100.0 * SUM(CASE WHEN iterations = 1 AND final_verdict = 'accepted' THEN 1 ELSE 0 END) / COUNT(*), 1) AS pct
FROM read_json_auto('sessions.jsonl', format='newline_delimited')
WHERE kind = 'fill'
GROUP BY model
ORDER BY pct DESC;
```

### "Sessions that were never grounded" (the alarm query)

```sql
-- Should always be empty. If it isn't, FillValidator has a hole or
-- a different code path is bypassing the roster check.
SELECT session_id, model, iterations, attempts
FROM read_json_auto('sessions.jsonl', format='newline_delimited')
WHERE kind = 'fill'
  AND final_verdict = 'accepted'
  AND grounded_in_roster = false;
```

### "Average retry depth per model"

```sql
SELECT model, AVG(iterations) AS avg_iter, COUNT(*) AS n
FROM read_json_auto('sessions.jsonl', format='newline_delimited')
WHERE kind = 'fill' AND final_verdict = 'accepted'
GROUP BY model
ORDER BY avg_iter ASC;
```

### "What did validation reject?" (failure mode breakdown)

```sql
-- Pull each rejected attempt's error string, classify by prefix.
WITH errors AS (
  SELECT session_id, model, unnest(attempts) AS att
  FROM read_json_auto('sessions.jsonl', format='newline_delimited')
)
SELECT
  model,
  split_part(att.error, ':', 1) AS kind,
  COUNT(*) AS n
FROM errors
WHERE att.verdict_kind = 'validation_failed'
GROUP BY model, kind
ORDER BY n DESC;
```

## Operational notes

- **Append-only.** No row is ever updated; storage grows linearly with iterate calls.
  Operators rotate via cron when the file gets unwieldy (logrotate-style).
- **Best-effort posture.** A failed write is logged via `slog.Warn` but
  never blocks the iterate handler. A full disk drops session rows with
  nothing more than that warning; the iterate response still ships.
- **Schema versioning.** `schema=session.iterate.v1` is the contract.
  Future incompatible changes bump the version; consumers should branch
  on the field.
- **PII consideration.** `prompt` is captured truncated to 4000 chars,
  and the final `artifact` (when present) is captured verbatim. Operators
  handling PII-bearing prompts should place the log on a
  restricted-access volume or filter rows before retention.
- **Cross-runtime parity.** The Rust gateway's `/v1/iterate` does NOT
  yet write this file. If you want a unified longitudinal log across
  runtimes, port the writer to Rust (`crates/gateway/src/v1/iterate.rs`)
  and target the same JSONL path. ~50 LOC.

## See also

- `internal/validator/session_log.go`: writer + record types
- `internal/validator/iterate.go`: `Tracer` callback + Langfuse span emission
- `internal/shared/langfuse_middleware.go`: `X-Lakehouse-Trace-Id` header propagation (the `session_id` join key)
- `data/_kb/replay_runs.jsonl`: the *replay* tool's own JSONL (different shape, different producer); these two streams are siblings, not duplicates