5 Commits

Author SHA1 Message Date
root
6847bbc180 validatord: honor X-Lakehouse-Trace-Id even when Langfuse is off
Surfaced by the 2026-05-02 cross-runtime test: when a caller
forwarded X-Lakehouse-Trace-Id but the langfuse middleware was a
passthrough (no Langfuse env), the header was never read — Go minted
a fallback id, breaking cross-daemon parent-trace linkage.

The middleware only honored the header when its lf client was
non-nil. With LANGFUSE_URL unset on the persistent stack, every
inbound iterate request lost the parent linkage.

Fix: validatord's iterate handler reads the header DIRECTLY (matches
Rust's iterate.rs pattern) before falling through to the ctx value
+ fallback id. Now Go behavior matches Rust regardless of Langfuse
configuration.

Resolution order is:
  1. req.TraceID (caller put it in the JSON body)
  2. X-Lakehouse-Trace-Id header (read directly here)
  3. context value from langfuse middleware (when configured)
  4. fallback to a locally-minted time-ordered hex id
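
The resolution order above can be sketched in Go roughly as follows. This is a minimal illustration, not validatord's actual handler: `resolveTraceID`, `traceCtxKey`, `mintFallbackID`, and `requestWith` are hypothetical names, and the context key used by the real langfuse middleware is not shown in this log.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// traceIDHeader is the header name quoted in the commit message.
const traceIDHeader = "X-Lakehouse-Trace-Id"

// traceCtxKey is a hypothetical context key standing in for whatever
// the langfuse middleware really uses.
type traceCtxKey struct{}

// mintFallbackID is a placeholder for the locally minted time-ordered hex id.
func mintFallbackID() string {
	return fmt.Sprintf("%016x", time.Now().UnixNano())
}

// resolveTraceID applies the four steps in order: JSON body field,
// header read directly, middleware context value, local fallback.
func resolveTraceID(bodyTraceID string, r *http.Request) string {
	if bodyTraceID != "" {
		return bodyTraceID
	}
	if h := r.Header.Get(traceIDHeader); h != "" {
		return h
	}
	if v, ok := r.Context().Value(traceCtxKey{}).(string); ok && v != "" {
		return v
	}
	return mintFallbackID()
}

// requestWith builds a test request carrying an optional header and context id.
func requestWith(header, ctxID string) *http.Request {
	r := httptest.NewRequest(http.MethodPost, "/v1/iterate", nil)
	if header != "" {
		r.Header.Set(traceIDHeader, header)
	}
	if ctxID != "" {
		r = r.WithContext(context.WithValue(r.Context(), traceCtxKey{}, ctxID))
	}
	return r
}

func main() {
	fmt.Println(resolveTraceID("from-body", requestWith("from-header", "from-ctx")))
	fmt.Println(resolveTraceID("", requestWith("go-cmp-fixed", "")))
	fmt.Println(resolveTraceID("", requestWith("", "from-ctx")))
	fmt.Println(len(resolveTraceID("", requestWith("", ""))))
}
```

Reading the header in the handler itself is what makes step 2 independent of whether the middleware is a passthrough.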

Verified end-to-end:
  curl -X POST -H 'X-Lakehouse-Trace-Id: go-cmp-fixed' /v1/iterate
  → response.trace_id = "go-cmp-fixed" ✓
  → sessions.jsonl row session_id = "go-cmp-fixed" ✓

Pre-fix (this commit's parent, run from the /tmp/val-fresh3 binary):
  same call → trace_id minted as 18abbb5a008061b7-008061e9
  (header silently ignored)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 06:16:25 -05:00
root
1263720497 validatord: always populate session_id (fallback when Langfuse off)
Surfaced during the 2026-05-02 deploy + reality wave: the persistent
Go stack runs without LANGFUSE_URL/PUBLIC_KEY/SECRET_KEY env, so
shared.langfuseMiddleware operates as a passthrough — never minting
a trace id, never stashing it on the request context. Result:
session_id was empty on every JSONL row, breaking correlation across
the longitudinal log + replay_runs.jsonl + future Langfuse traces.

The fix: validatord falls back to a locally-generated time-ordered
hex id when both the X-Lakehouse-Trace-Id header AND the middleware
context are empty. Same shape Langfuse accepts, so a future deploy
that turns Langfuse on doesn't break correlation — already-emitted
session_ids stay valid as Langfuse trace ids.
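
A fallback generator along these lines would produce ids of the same shape as the samples in this log (for example "18abbabdc2306a83-c2306aa9"). The format is an assumption inferred only from those samples: 16 hex digits that look like Unix nanoseconds, a dash, then 8 hex digits that look like the low 32 bits of a second clock reading. `formatTraceID` and `newFallbackTraceID` are hypothetical names, not the functions in validatord.

```go
package main

import (
	"fmt"
	"time"
)

// formatTraceID renders two nanosecond readings as "<16 hex>-<8 hex>".
// Zero-padding keeps ids lexicographically sortable by time.
func formatTraceID(firstNanos, secondNanos int64) string {
	return fmt.Sprintf("%016x-%08x", uint64(firstNanos), uint32(secondNanos))
}

// newFallbackTraceID mints an id from the wall clock. Assumption: the
// suffix comes from a second clock read taken right after the first.
func newFallbackTraceID() string {
	first := time.Now().UnixNano()
	second := time.Now().UnixNano()
	return formatTraceID(first, second)
}

func main() {
	// Reproduces the sample id quoted in the commit message.
	fmt.Println(formatTraceID(0x18abbabdc2306a83, 0x18abbabdc2306aa9))
	fmt.Println(newFallbackTraceID())
}
```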

Verified post-deploy by driving 9 /v1/iterate sessions through the
persistent stack at :4110:
  - 6 accepted on iter 0 (qwen2.5:latest first-shot 75%, i.e. 6 of the
    8 sessions excluding the infra_error)
  - 2 max_iter_exhausted (no_json on prose-y prompts)
  - 1 infra_error (chatd cold-start probe timed out at 5s)

Latest row's session_id: "18abbabdc2306a83-c2306aa9" (was: "")

Probe re-runs (validator_parity, session_log_parity) are included as
post-deploy artifacts; they pass 6/6 and 4/4 respectively against the
freshly restarted persistent gateway+validatord binaries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 06:03:43 -05:00
root
1a3a82aedb validatord: coordinator session JSONL for offline analysis (B follow-up)
Closes the second half of J's 2026-05-02 multi-call observability
concern. Trace-id propagation (commit d6d2fdf) gave us the *live*
view in Langfuse; this gives us the *longitudinal* view for ad-hoc
DuckDB queries over thousands of sessions:

  "show me every session where the model produced a real candidate
   without ever needing a retry"
  "find sessions where validation rejected three times in a row"
  "first-shot success rate per model — did we feed it enough corpus?"

## What's in

internal/validator/session_log.go:
  - SessionRecord type (schema=session.iterate.v1)
  - SessionLogger writer — mutex-guarded append, best-effort posture,
    nil-safe (NewSessionLogger("") = nil = no-op on Append)
  - BuildSessionRecord helper — assembles a row from any
    iterate response/failure/infra-error combination, callable from
    other daemons that wrap iterate (cross-daemon shared schema)
  - 7 unit tests including concurrent-append safety + the three
    code paths (success / max_iter_exhausted / infra_error)
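
The writer described above (mutex-guarded append, nil-safe, no-op when the path is empty) might look roughly like this. It is a sketch under assumptions: the `SessionRecord` here carries only three fields named in this log, their Go types are guesses, and the real `Append` signature is not shown.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"sync"
)

// SessionRecord is trimmed to fields mentioned in the log; the real
// session.iterate.v1 schema has more, and these types are assumptions.
type SessionRecord struct {
	Schema           string `json:"schema"`
	SessionID        string `json:"session_id"`
	GroundedInRoster bool   `json:"grounded_in_roster"`
}

// SessionLogger appends one JSON object per line to a file.
type SessionLogger struct {
	mu   sync.Mutex
	path string
}

// NewSessionLogger returns nil for an empty path, which disables logging.
func NewSessionLogger(path string) *SessionLogger {
	if path == "" {
		return nil
	}
	return &SessionLogger{path: path}
}

// Append writes one row. Calling it on a nil logger is a no-op.
func (l *SessionLogger) Append(rec SessionRecord) error {
	if l == nil {
		return nil
	}
	line, err := json.Marshal(rec)
	if err != nil {
		return err
	}
	l.mu.Lock()
	defer l.mu.Unlock()
	f, err := os.OpenFile(l.path, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = f.Write(append(line, '\n'))
	return err
}

func main() {
	tmp, err := os.CreateTemp("", "sessions-*.jsonl")
	if err != nil {
		panic(err)
	}
	tmp.Close()
	defer os.Remove(tmp.Name())

	logger := NewSessionLogger(tmp.Name())
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Best-effort: a failed append must not fail the request.
			_ = logger.Append(SessionRecord{
				Schema:    "session.iterate.v1",
				SessionID: fmt.Sprintf("demo-%d", i),
			})
		}(i)
	}
	wg.Wait()
	data, _ := os.ReadFile(tmp.Name())
	fmt.Print(string(data))
}
```

Returning a nil pointer from the constructor lets the handler call `Append` unconditionally, so the disabled case needs no branch at the call site.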

cmd/validatord/main.go:
  - handlers.sessionLog field + wiring from cfg.Validatord.SessionLogPath
  - Iterate handler: build + append a SessionRecord on every call
  - rosterCheckFor("fill") closure stamps grounded_in_roster — the
    load-bearing forensic property J flagged ("we can never
    hallucinate available staff members to contracts")

internal/shared/config.go + lakehouse.toml:
  - [validatord].session_log_path field; empty = disabled
  - Production: /var/lib/lakehouse/validator/sessions.jsonl

scripts/validatord_smoke.sh:
  - Adds a probe verifying validatord announces session log path on
    startup. Smoke is now 6/6 (was 5/5).

docs/SESSION_LOG.md:
  - Schema reference + 5 worked DuckDB query examples including the
    "alarm" query (sessions where grounded_in_roster=false on an
    accepted fill — should always be empty; if not, something is
    bypassing FillValidator).

## What this is NOT

This is NOT a duplicate of replay_runs.jsonl. They're siblings:
  - replay_runs.jsonl: replay tool's per-task retrieval+model output
  - sessions.jsonl: validatord's per-iterate full retry chain +
    grounded-in-roster verdict

A single coordinator session can produce rows in both streams; the
session_id (= Langfuse trace_id) is the join key.

## Layered observability now in place

  Live view:  Langfuse trace tree (X-Lakehouse-Trace-Id propagation)
              `iterate.attempt[N]` spans with prompt/raw/verdict
  Offline:    sessions.jsonl, the coordinator session log (this commit)
              DuckDB-queryable; longitudinal forensics
  Hard gate:  FillValidator + WorkerLookup (existing)
              phantom IDs structurally rejected, never reach
              session log's grounded_in_roster=true bucket

Per the architecture invariant in STATE_OF_PLAY's DO NOT RELITIGATE
section — these layers are wired; future work targets the data, not
the wiring.

## Verification

- internal/validator: 7 new tests (session_log_test.go) — all PASS
- cmd/validatord: 3 new integration tests covering the success,
  failure, and grounded=false paths — all PASS
- validatord_smoke.sh: 6/6 PASS through gateway :3110
- Full go test ./... green across 33 packages

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 05:22:09 -05:00
root
d6d2fdf81f trace-id propagation through /v1/iterate (multi-call observability)
Closes J's 2026-05-02 multi-call observability gap: a single
/v1/iterate session with N retries used to surface in Langfuse as
N+1 disconnected traces (one per /v1/chat hop + one for the iterate
request itself), with no parent/child linkage. Operators couldn't
scroll the retry chain in one trace tree to spot where grounding
failed.

## Wire-level change

- New header constant `shared.TraceIDHeader = "X-Lakehouse-Trace-Id"`
- `langfuseMiddleware` honors the header on inbound requests: if
  set, reuses that trace id instead of minting a new one. Stashes
  the trace id on the request context so handlers can attach
  application-level child spans.
- `validatord.chatCaller` forwards the header to chatd. Every chat
  hop in an iterate session lands as a child of the parent trace.
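
The header contract above can be illustrated with a small middleware sketch. It shows only the reuse-or-mint and context-stash behaviour, with no Langfuse client involved; `traceMiddleware`, `TraceIDFrom`, `forwardTraceID`, `seenTraceID`, and the `traceKey` type are hypothetical names rather than the ones in `internal/shared`.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// TraceIDHeader matches the constant value quoted in the commit message.
const TraceIDHeader = "X-Lakehouse-Trace-Id"

// traceKey is a hypothetical private context key.
type traceKey struct{}

// TraceIDFrom returns the trace id stashed by the middleware, or "".
func TraceIDFrom(ctx context.Context) string {
	v, _ := ctx.Value(traceKey{}).(string)
	return v
}

// traceMiddleware reuses an inbound trace id when the header is set and
// mints a new one otherwise, then stashes it on the request context.
func traceMiddleware(mint func() string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id := r.Header.Get(TraceIDHeader)
		if id == "" {
			id = mint()
		}
		ctx := context.WithValue(r.Context(), traceKey{}, id)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// forwardTraceID copies the trace id onto an outbound request, the way
// a chat caller would before calling chatd.
func forwardTraceID(ctx context.Context, out *http.Request) {
	if id := TraceIDFrom(ctx); id != "" {
		out.Header.Set(TraceIDHeader, id)
	}
}

// seenTraceID runs one request through the middleware and reports the
// trace id the wrapped handler observed.
func seenTraceID(inboundHeader string) string {
	var seen string
	inner := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen = TraceIDFrom(r.Context())
	})
	h := traceMiddleware(func() string { return "minted" }, inner)
	req := httptest.NewRequest(http.MethodPost, "/v1/iterate", nil)
	if inboundHeader != "" {
		req.Header.Set(TraceIDHeader, inboundHeader)
	}
	h.ServeHTTP(httptest.NewRecorder(), req)
	return seen
}

func main() {
	fmt.Println(seenTraceID("TR-1")) // inbound id is reused
	fmt.Println(seenTraceID(""))     // no header, so an id is minted
}
```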

## Application-level spans

- `validator.IterateConfig` gains `Tracer` (optional callback).
  When wired, each iteration attempt emits one Langfuse span
  via `validator.AttemptSpan`:
    Name: iterate.attempt[N]
    Input: { iteration, model, provider, prompt }
    Output: { verdict, raw, error }
    Level: WARNING when verdict != accepted
- `validatord.iterTracer` is the production hook — bridges
  `validator.Tracer` → `langfuse.Client.Span`.
- `IterateRequest`/`IterateResponse`/`IterateFailure` gain
  `TraceID`; each `IterateAttempt` gains `SpanID`. The /v1/iterate
  caller can pivot from the JSON response straight into the
  Langfuse trace tree.
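
As a rough sketch of the span shape listed above, the types and callback could be modelled as below. The struct layout, the `Tracer` signature, the helper names `buildAttemptSpan` and `emitAttempt`, and the use of "DEFAULT" as the non-warning level are all assumptions; only the span name, the input/output keys, and the WARNING rule come from this commit message.

```go
package main

import "fmt"

// AttemptSpan carries the fields listed in the commit message.
type AttemptSpan struct {
	Name   string
	Input  map[string]any
	Output map[string]any
	Level  string
}

// Tracer is an optional callback that emits a span and returns its id.
type Tracer func(traceID string, span AttemptSpan) string

// buildAttemptSpan assembles one span per iteration attempt and marks
// it WARNING whenever the verdict is anything other than "accepted".
func buildAttemptSpan(iteration int, model, provider, prompt, verdict, raw, errMsg string) AttemptSpan {
	level := "DEFAULT"
	if verdict != "accepted" {
		level = "WARNING"
	}
	return AttemptSpan{
		Name:   fmt.Sprintf("iterate.attempt[%d]", iteration),
		Input:  map[string]any{"iteration": iteration, "model": model, "provider": provider, "prompt": prompt},
		Output: map[string]any{"verdict": verdict, "raw": raw, "error": errMsg},
		Level:  level,
	}
}

// emitAttempt skips emission when no tracer is wired or no trace id is
// known, so no orphan spans are created.
func emitAttempt(t Tracer, traceID string, span AttemptSpan) string {
	if t == nil || traceID == "" {
		return ""
	}
	return t(traceID, span)
}

func main() {
	tracer := func(traceID string, span AttemptSpan) string {
		fmt.Printf("%s %s level=%s\n", traceID, span.Name, span.Level)
		return traceID + "/" + span.Name
	}
	span := buildAttemptSpan(0, "qwen2.5:latest", "example-provider", "prompt text", "validation_failed", "raw output", "schema error")
	fmt.Println(emitAttempt(tracer, "TR-1", span))
	fmt.Println(emitAttempt(tracer, "", span) == "")
}
```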

## What an operator sees post-cutover

  POST /v1/iterate {kind=fill, prompt=...} → Trace TR-1
    ├─ http.request span (from middleware)
    ├─ iterate.attempt[0] span (validator.Iterate emit)
    │     input: prompt+model
    │     output: { verdict: validation_failed, error: ..., raw }
    ├─ chatd /v1/chat call (X-Lakehouse-Trace-Id: TR-1)
    │     ├─ http.request span (chatd middleware)
    │     └─ chatd-internal spans (existing)
    ├─ iterate.attempt[1] span
    └─ ...

All in one Langfuse trace tree, not N+1 separate traces.

## Hallucinated-worker safety net is unchanged

The /v1/iterate flow's hard correctness gate is still
FillValidator + WorkerLookup. Phantom candidate IDs raise
ValidationError::Consistency which 422s and forces the iteration
loop to retry. The trace-id propagation is the OBSERVABILITY layer
on top — it makes the existing safety net's outcomes visible per-call,
not a replacement for it.

## Verification

- internal/validator: 4 new tests
  - TestIterate_TracerEmitsSpanPerAttempt — span/attempt count + SpanID
  - TestIterate_NoTraceIDSkipsTracer — no orphan spans without trace_id
  - TestIterate_ChatCallerReceivesTraceID — propagation contract
  - (existing iterate tests updated for new ChatCaller signature)
- internal/shared: 1 new test
  - TestLangfuseMiddleware_HonorsTraceIDHeader — cross-service linkage
- cmd/validatord: existing HTTP tests still PASS via the dual-shape
  UnmarshalJSON contract.
- validatord_smoke.sh: 5/5 PASS through gateway :3110 (unchanged).
- Full go test ./... green across 33 packages.

## Architecture invariant added

STATE_OF_PLAY "DO NOT RELITIGATE" gains a paragraph documenting
the X-Lakehouse-Trace-Id header contract + the iterate.attempt[N]
span emission. Future-Claude won't re-propose "wire trace-id
propagation" — the header IS the wiring.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 05:13:18 -05:00
root
f9e72412c1 validatord: /v1/validate + /v1/iterate HTTP surface (port 3221)
Closes the last "Go primary" backlog item in
docs/ARCHITECTURE_COMPARISON.md. Go now owns the entire validator
path end-to-end, with no Rust dependency for the staffing safety net.

Architecture: cmd/validatord on :3221 hosts both endpoints. Calls
chatd directly for the iterate loop's LLM hop (no gateway
self-loopback like the Rust shape). Gateway proxies /v1/validate +
/v1/iterate to validatord.

What's in:
- internal/validator/playbook.go — 3rd validator kind (PRD checks:
  fill: prefix, endorsed_names ≤ target_count×2, fingerprint required)
- internal/validator/lookup_jsonl.go — JSONL roster loader (Parquet
  deferred; producer one-liner documented in package comment)
- internal/validator/iterate.go — ExtractJSON helper + Iterate
  orchestrator with ChatCaller seam for unit tests
- cmd/validatord/main.go — HTTP routes, roster load, chat client
- internal/shared/config.go — ValidatordConfig + gateway URL field
- lakehouse.toml — [validatord] section
- cmd/gateway/main.go — proxy routes for /v1/validate + /v1/iterate
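
To illustrate what an ExtractJSON-style helper does with prose-wrapped model output, here is one possible approach: scan for the first balanced `{...}` span that parses as JSON. This is an assumed implementation, and the signature is a guess; the real helper in internal/validator/iterate.go is not shown in this log and may behave differently.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// matchingBrace returns the index of the '}' that closes the '{' at
// start, ignoring braces inside JSON string literals, or -1 if none.
func matchingBrace(s string, start int) int {
	depth, inString, escaped := 0, false, false
	for i := start; i < len(s); i++ {
		c := s[i]
		switch {
		case escaped:
			escaped = false
		case inString && c == '\\':
			escaped = true
		case c == '"':
			inString = !inString
		case inString:
			// Characters inside a string literal never affect depth.
		case c == '{':
			depth++
		case c == '}':
			depth--
			if depth == 0 {
				return i
			}
		}
	}
	return -1
}

// ExtractJSON returns the first balanced object in s that is valid JSON.
func ExtractJSON(s string) (string, bool) {
	start := strings.IndexByte(s, '{')
	for start >= 0 {
		if end := matchingBrace(s, start); end >= 0 {
			candidate := s[start : end+1]
			if json.Valid([]byte(candidate)) {
				return candidate, true
			}
		}
		next := strings.IndexByte(s[start+1:], '{')
		if next < 0 {
			break
		}
		start += next + 1
	}
	return "", false
}

func main() {
	out, ok := ExtractJSON(`Sure, here you go: {"kind":"fill","note":"a } brace"} Hope that helps.`)
	fmt.Println(out, ok)
	_, ok = ExtractJSON("I could not produce a candidate.")
	fmt.Println(ok)
}
```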

Smoke: 5/5 PASS through gateway :3110:
  ✓ playbook happy path
  ✓ playbook missing fingerprint → 422 schema/fingerprint
  ✓ phantom candidate W-PHANTOM → 422 consistency
  ✓ unknown kind → 400
  ✓ roster loaded with 3 records

go test ./... green across 33 packages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 03:53:20 -05:00