
# Coordinator session log — `coordinator_sessions.jsonl`

**Last updated:** 2026-05-02 · **Schema:** `session.iterate.v1` · **Writer:** `internal/validator.SessionLogger` · **Producer:** validatord `/v1/iterate`

## Why

The Langfuse trace tree is the live view: per-session, you can scroll
the retry chain and inspect every sub-call. But for **longitudinal
forensics** ("show me every session in the last week where the model
guessed a real worker without a retry," or "find sessions where
validation rejected three times in a row"), Langfuse's UI doesn't
scale — you need a queryable data plane.

This JSONL is that plane. One row per `/v1/iterate` session,
append-only, DuckDB-friendly.

## Where

Configurable via `[validatord].session_log_path` in `lakehouse.toml`.
Empty = disabled (best-effort posture; a missing log never blocks an
iterate request). Production:

```toml
[validatord]
session_log_path = "/var/lib/lakehouse/validator/sessions.jsonl"
```
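
The two rules above (an empty path disables logging, and a log failure never blocks a request) could be combined in a writer along these lines. This is a minimal sketch, not the actual `internal/validator.SessionLogger`: the names `bestEffortLog`, `openLog`, `Append`, and `examplePath` are hypothetical, and reopening the file on every call is a simplification chosen for brevity.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log/slog"
	"os"
	"path/filepath"
	"sync"
)

// bestEffortLog is a hypothetical stand-in for the real SessionLogger.
// An empty path means logging is disabled and Append is a no-op.
type bestEffortLog struct {
	mu   sync.Mutex
	path string
}

// openLog returns a logger for path; an empty path disables logging.
func openLog(path string) *bestEffortLog {
	return &bestEffortLog{path: path}
}

// Append writes one JSON row followed by a newline. It reports whether
// the row was written, but callers are expected to ignore the result:
// failures are logged with slog.Warn and never reach the request path.
func (l *bestEffortLog) Append(row any) bool {
	if l.path == "" {
		return false // logging disabled
	}
	line, err := json.Marshal(row)
	if err != nil {
		slog.Warn("session log: marshal failed", "err", err)
		return false
	}
	l.mu.Lock()
	defer l.mu.Unlock()
	f, err := os.OpenFile(l.path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o640)
	if err != nil {
		slog.Warn("session log: open failed", "path", l.path, "err", err)
		return false
	}
	defer f.Close()
	if _, err := f.Write(append(line, '\n')); err != nil {
		slog.Warn("session log: write failed", "path", l.path, "err", err)
		return false
	}
	return true
}

// examplePath returns a throwaway location used only by this example.
func examplePath() string {
	return filepath.Join(os.TempDir(), "sessions_example.jsonl")
}

func main() {
	row := map[string]string{"schema": "session.iterate.v1"}

	// Empty path: nothing is written and nothing fails.
	fmt.Println("disabled wrote:", openLog("").Append(row))

	// Configured path: one JSONL row is appended.
	defer os.Remove(examplePath())
	fmt.Println("enabled wrote:", openLog(examplePath()).Append(row))
}
```

Reopening the file per append is slower than holding a handle, but in this sketch it means a renamed or rotated file is picked up on the next write. Whether the real writer makes the same trade-off is not stated here.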

## Schema (v1)

```jsonc
{
  "schema":          "session.iterate.v1",
  "session_id":      "<Langfuse trace_id>",      // join key to Langfuse
  "timestamp":       "2026-05-02T07:30:00.123456Z",
  "daemon":          "validatord",
  "kind":            "fill | email | playbook",
  "model":           "qwen3.5:latest",
  "provider":        "ollama",
  "prompt":          "produce a fill artifact ...",  // truncated to 4000 chars
  "iterations":      3,                              // attempts spent
  "max_iterations":  3,                              // cap per request
  "final_verdict":   "accepted | max_iter_exhausted | infra_error",
  "attempts": [
    { "iteration": 0, "verdict_kind": "validation_failed",
      "error": "consistency: candidate_id W-X not in roster",
      "span_id": "abc..." },
    { "iteration": 1, "verdict_kind": "validation_failed",
      "error": "consistency: city mismatch", "span_id": "def..." },
    { "iteration": 2, "verdict_kind": "accepted", "span_id": "ghi..." }
  ],
  "artifact":        { /* final accepted artifact, omitted on failure */ },
  "grounded_in_roster": true,                       // null when N/A (email/playbook)
  "duration_ms":     2840
}
```
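
For consumers written in Go, the row above could be mirrored with types roughly like the following. This is a sketch under assumptions: `SessionRecord`, `Attempt`, `truncatePrompt`, and `encodeRow` are illustrative names, the authoritative record types live in `internal/validator/session_log.go` and may differ, and counting the 4000-character limit in runes rather than bytes is a guess.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Attempt mirrors one element of "attempts". Error is omitted on an
// accepted attempt, matching the example row.
type Attempt struct {
	Iteration   int    `json:"iteration"`
	VerdictKind string `json:"verdict_kind"`
	Error       string `json:"error,omitempty"`
	SpanID      string `json:"span_id"`
}

// SessionRecord mirrors one session.iterate.v1 row (hypothetical type).
type SessionRecord struct {
	Schema           string          `json:"schema"`
	SessionID        string          `json:"session_id"`
	Timestamp        time.Time       `json:"timestamp"`
	Daemon           string          `json:"daemon"`
	Kind             string          `json:"kind"`
	Model            string          `json:"model"`
	Provider         string          `json:"provider"`
	Prompt           string          `json:"prompt"`
	Iterations       int             `json:"iterations"`
	MaxIterations    int             `json:"max_iterations"`
	FinalVerdict     string          `json:"final_verdict"`
	Attempts         []Attempt       `json:"attempts"`
	Artifact         json.RawMessage `json:"artifact,omitempty"`           // dropped on failure
	GroundedInRoster *bool           `json:"grounded_in_roster,omitempty"` // nil when N/A
	DurationMS       int64           `json:"duration_ms"`
}

// truncatePrompt keeps at most max runes of s (assumption: the limit is
// counted in runes, so multi-byte characters are never split).
func truncatePrompt(s string, max int) string {
	r := []rune(s)
	if len(r) <= max {
		return s
	}
	return string(r[:max])
}

// encodeRow renders rec as a single newline-terminated JSONL line.
func encodeRow(rec SessionRecord) ([]byte, error) {
	line, err := json.Marshal(rec)
	if err != nil {
		return nil, err
	}
	return append(line, '\n'), nil
}

func main() {
	rec := SessionRecord{
		Schema:        "session.iterate.v1",
		SessionID:     "trace-123", // placeholder trace id
		Timestamp:     time.Now().UTC(),
		Daemon:        "validatord",
		Kind:          "fill",
		Model:         "qwen3.5:latest",
		Provider:      "ollama",
		Prompt:        truncatePrompt("produce a fill artifact ...", 4000),
		Iterations:    1,
		MaxIterations: 1,
		FinalVerdict:  "max_iter_exhausted",
		Attempts: []Attempt{{
			Iteration:   0,
			VerdictKind: "validation_failed",
			Error:       "consistency: city mismatch",
			SpanID:      "abc",
		}},
		DurationMS: 912,
	}
	line, err := encodeRow(rec)
	if err != nil {
		panic(err)
	}
	fmt.Print(string(line)) // no "artifact" or "grounded_in_roster" keys
}
```

Using `json.RawMessage` and `*bool` with `omitempty` is one way to get the "omitted on failure" and "null when N/A" behaviour without inventing sentinel values.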

### Field semantics

| Field | When set | What it means |
|---|---|---|
| `session_id` | always | Langfuse trace id. Pivot to live trace tree by URL: `${LANGFUSE_URL}/trace/<session_id>`. |
| `final_verdict=accepted` | success | Loop converged within `max_iterations`. `artifact` is non-null. |
| `final_verdict=max_iter_exhausted` | failure | Loop hit the cap without passing validation. `artifact` is omitted. |
| `final_verdict=infra_error` | failure | Chat hop or other infra crashed. Single attempt with `verdict_kind=infra_error`. |
| `grounded_in_roster=true` | fill kind, success | Every `candidate_id` in the artifact exists in `WorkerLookup`. |
| `grounded_in_roster=false` | fill kind, anomaly | Phantom or otherwise-invalid candidate IDs (shouldn't happen — FillValidator catches these — but the explicit check defends against future validator weakening). |
| `grounded_in_roster=null` (omitted) | non-fill kinds, or failure | The roster check doesn't apply or wasn't run. |
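
The three `grounded_in_roster` states in the table can be read as a tri-state value. The sketch below is illustrative only: `groundedInRoster` is a hypothetical helper, the `map[string]bool` roster stands in for `WorkerLookup` (whose real interface is not shown in this document), and treating an accepted fill with zero candidates as grounded is an assumption.

```go
package main

import "fmt"

// groundedInRoster returns nil when the roster check does not apply
// (non-fill kinds, or a session that did not end in "accepted"), and
// otherwise a pointer to whether every candidate ID is in the roster.
func groundedInRoster(kind, finalVerdict string, candidateIDs []string, roster map[string]bool) *bool {
	if kind != "fill" || finalVerdict != "accepted" {
		return nil // serialized as omitted / null
	}
	ok := true
	for _, id := range candidateIDs {
		if !roster[id] {
			ok = false // phantom or otherwise unknown candidate
			break
		}
	}
	return &ok
}

// describe renders the tri-state the way it would appear in a row.
func describe(g *bool) string {
	if g == nil {
		return "null"
	}
	return fmt.Sprint(*g)
}

func main() {
	roster := map[string]bool{"W-1": true, "W-2": true}

	fmt.Println(describe(groundedInRoster("fill", "accepted", []string{"W-1", "W-2"}, roster)))
	fmt.Println(describe(groundedInRoster("fill", "accepted", []string{"W-1", "W-X"}, roster)))
	fmt.Println(describe(groundedInRoster("email", "accepted", nil, roster)))
	fmt.Println(describe(groundedInRoster("fill", "max_iter_exhausted", []string{"W-1"}, roster)))
}
```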

## DuckDB queries

```sql
-- Read the log directly via DuckDB's read_json_auto.
ATTACH ':memory:' AS sessions;
SELECT * FROM read_json_auto(
  '/var/lib/lakehouse/validator/sessions.jsonl', format='newline_delimited'
) LIMIT 10;
```

### "Did the validator catch every phantom worker?"
```sql
-- Accepted sessions that needed at least one retry and where some
-- attempt's error mentions 'phantom' or 'consistency'. If
-- grounded_in_roster is true on the same row, the model recovered.
SELECT session_id, model, iterations, grounded_in_roster, final_verdict
FROM read_json_auto('sessions.jsonl', format='newline_delimited')
WHERE final_verdict = 'accepted'
  AND iterations > 1
  AND list_contains(
        list_transform(attempts,
          x -> x.error LIKE '%consistency%' OR x.error LIKE '%phantom%'),
        true
      );
```

### "First-shot success rate per model" (the "did the corpus give it enough" gate)
```sql
SELECT model,
       COUNT(*) AS sessions,
       SUM(CASE WHEN iterations = 1 AND final_verdict = 'accepted' THEN 1 ELSE 0 END) AS first_shot,
       ROUND(100.0 * SUM(CASE WHEN iterations = 1 AND final_verdict = 'accepted' THEN 1 ELSE 0 END) / COUNT(*), 1) AS pct
FROM read_json_auto('sessions.jsonl', format='newline_delimited')
WHERE kind = 'fill'
GROUP BY model
ORDER BY pct DESC;
```

### "Sessions that were never grounded" (the alarm query)
```sql
-- Should always be empty. If it isn't, FillValidator has a hole or
-- a different code path is bypassing the roster check.
SELECT session_id, model, iterations, attempts
FROM read_json_auto('sessions.jsonl', format='newline_delimited')
WHERE kind = 'fill'
  AND final_verdict = 'accepted'
  AND grounded_in_roster = false;
```

### "Average retry depth per model"
```sql
SELECT model, AVG(iterations) AS avg_iter, COUNT(*) AS n
FROM read_json_auto('sessions.jsonl', format='newline_delimited')
WHERE kind = 'fill' AND final_verdict = 'accepted'
GROUP BY model
ORDER BY avg_iter ASC;
```

### "What did validation reject?" (failure mode breakdown)
```sql
-- Pull each rejected attempt's error string, classify by prefix.
WITH errors AS (
  SELECT session_id,
         model,
         unnest(attempts) AS att
  FROM read_json_auto('sessions.jsonl', format='newline_delimited')
)
SELECT model,
       split_part(att.error, ':', 1) AS kind,
       COUNT(*) AS n
FROM errors
WHERE att.verdict_kind = 'validation_failed'
GROUP BY model, kind
ORDER BY n DESC;
```

## Operational notes

- **Append-only.** No row is ever updated; storage grows linearly with iterate calls. Operators rotate via cron when the file gets unwieldy (logrotate-style).
- **Best-effort posture.** A failed write is logged via `slog.Warn` but never blocks the iterate handler. On a full disk, session rows are dropped with only that warning as evidence; the iterate response still ships.
- **Schema versioning.** `schema=session.iterate.v1` is the contract. Future incompatible changes bump the version; consumers should branch on the field.
- **PII consideration.** `prompt` is captured (truncated to 4000 chars) and the final `artifact`, when present, is captured verbatim. Operators handling PII-bearing prompts should point `session_log_path` at a restricted-access volume or filter rows before retention.
- **Cross-runtime parity.** The Rust gateway's `/v1/iterate` does NOT yet write this file. If you want a unified longitudinal log across runtimes, port the writer to Rust (`crates/gateway/src/v1/iterate.rs`) and target the same JSONL path. ~50 LOC.
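
A consumer that branches on `schema`, as the versioning note suggests, might look roughly like this. It is a sketch rather than a supported tool: `scanSessions`, `scanString`, `tally`, and `v1Row` are hypothetical names, the 4 MiB line cap is an arbitrary choice made because a verbatim `artifact` may exceed `bufio.Scanner`'s default token size, and skipping unparseable lines is an assumption about how a reader should treat a partially written row.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// v1Row decodes only the fields this example needs.
type v1Row struct {
	Schema       string `json:"schema"`
	SessionID    string `json:"session_id"`
	FinalVerdict string `json:"final_verdict"`
}

// tally counts rows by how the consumer handled them.
type tally struct {
	V1        int // rows with schema session.iterate.v1
	Unknown   int // valid JSON with a schema this consumer does not know
	Malformed int // lines that are not valid JSON
}

// scanSessions reads newline-delimited JSON from r and branches on the
// schema field instead of assuming every row has the v1 shape.
func scanSessions(r io.Reader) (tally, error) {
	var out tally
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, 64*1024), 4*1024*1024) // allow long rows
	for sc.Scan() {
		line := sc.Bytes()
		if len(line) == 0 {
			continue // ignore blank lines
		}
		var row v1Row
		if err := json.Unmarshal(line, &row); err != nil {
			out.Malformed++
			continue
		}
		switch row.Schema {
		case "session.iterate.v1":
			out.V1++
		default:
			out.Unknown++ // future versions land here until handled
		}
	}
	return out, sc.Err()
}

// scanString is a convenience wrapper for in-memory input.
func scanString(s string) (tally, error) {
	return scanSessions(strings.NewReader(s))
}

func main() {
	input := `{"schema":"session.iterate.v1","session_id":"a","final_verdict":"accepted"}` + "\n" +
		`{"schema":"session.iterate.v2","session_id":"b"}` + "\n" +
		`{"schema":"session.iterate.v1","session_` + "\n"

	got, err := scanString(input)
	if err != nil {
		panic(err)
	}
	fmt.Printf("v1=%d unknown=%d malformed=%d\n", got.V1, got.Unknown, got.Malformed)
}
```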

## See also

- `internal/validator/session_log.go` — writer + record types
- `internal/validator/iterate.go` — `Tracer` callback + Langfuse span emission
- `internal/shared/langfuse_middleware.go` — `X-Lakehouse-Trace-Id` header propagation (the `session_id` join key)
- `data/_kb/replay_runs.jsonl` — the *replay* tool's own JSONL (different shape, different producer); these two streams are siblings, not duplicates