validatord: always populate session_id (fallback when Langfuse off)

Surfaced during the 2026-05-02 deploy + reality wave: the persistent
Go stack runs without the LANGFUSE_URL/PUBLIC_KEY/SECRET_KEY env vars,
so shared.langfuseMiddleware operates as a passthrough: it never mints
a trace id and never stashes one on the request context. Result:
session_id was empty on every JSONL row, breaking correlation across
the longitudinal log, replay_runs.jsonl, and future Langfuse traces.

The fix: validatord falls back to a locally generated, time-ordered
hex id when both the X-Lakehouse-Trace-Id header AND the middleware
context are empty. The id has the same shape Langfuse accepts, so a
future deploy that turns Langfuse on doesn't break correlation:
already-emitted session_ids stay valid as Langfuse trace ids.

Verified post-deploy by driving 9 /v1/iterate sessions through the
persistent stack at :4110:
  - 6 accepted on iter 0 (qwen2.5:latest first-shot 75%, i.e. 6 of
    the 8 non-infra_error sessions)
  - 2 max_iter_exhausted (no_json on prose-y prompts)
  - 1 infra_error (chatd cold-start probe timed out at 5s)

Latest row's session_id: "18abbabdc2306a83-c2306aa9" (was: "")

Probe re-runs (validator_parity, session_log_parity) are included as
post-deploy artifacts; they pass 6/6 and 4/4 respectively against the
freshly restarted persistent gateway+validatord binaries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
root 2026-05-02 06:03:43 -05:00
parent fa4e1b4e16
commit 1263720497
3 changed files with 23 additions and 5 deletions

@@ -247,12 +247,18 @@ func (h *handlers) handleIterate(w http.ResponseWriter, r *http.Request) {
 	// Pull the per-request trace id from the langfuse middleware. If
 	// the caller forwarded an upstream trace via X-Lakehouse-Trace-Id
 	// the middleware reuses that one; otherwise it minted a fresh trace
-	// at HTTP entry. Either way, we propagate it so chat hops nest
-	// under the same parent and operators can pivot from the iterate
-	// response's trace_id straight into the full Langfuse tree.
+	// at HTTP entry. When Langfuse isn't configured at all the
+	// middleware skips the mint, so we generate a fallback id locally;
+	// session_id MUST always be populated so the JSONL log + Langfuse
+	// (when later configured) can correlate by id across logs.
+	// Surfaced 2026-05-02 deploy: empty session_id breaks DuckDB
+	// joins on coordinator_sessions.jsonl ↔ replay_runs.jsonl.
 	if req.TraceID == "" {
 		req.TraceID = shared.TraceIDFromCtx(r.Context())
 	}
+	if req.TraceID == "" {
+		req.TraceID = newFallbackTraceID()
+	}

 	chat := h.chatCaller()
 	validate := func(kind string, artifact map[string]any) (validator.Report, error) {
@@ -443,6 +449,18 @@ func (h *handlers) iterTracer(ctx context.Context) validator.Tracer {
 	}
 }

+// newFallbackTraceID generates a time-ordered hex id used when no
+// upstream trace id was forwarded AND the langfuse middleware didn't
+// mint one (Langfuse unconfigured). Same shape Langfuse accepts so
+// future Langfuse-enabled deployments don't break correlation. Avoids
+// pulling in a uuid module dep; nanosecond precision + a clock-derived
+// suffix is unique enough at the rates iterate runs.
+func newFallbackTraceID() string {
+	ns := time.Now().UTC().UnixNano()
+	rand := uint32(time.Now().UnixNano() % (1 << 32))
+	return fmt.Sprintf("%016x-%08x", ns, rand)
+}
+
 func writeJSON(w http.ResponseWriter, status int, body any) {
 	w.Header().Set("Content-Type", "application/json")
 	w.WriteHeader(status)
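
Note that the suffix in newFallbackTraceID comes from a second
time.Now() read, so it stays within a few nanoseconds of the low 32
bits of the timestamp (the sample id "18abbabdc2306a83-c2306aa9" shows
c2306a83 vs c2306aa9). If independent entropy is ever wanted, one
possible variant, which this commit does not ship, draws the suffix
from crypto/rand while keeping the same "%016x-%08x" layout.
newRandomSuffixTraceID and idShape are hypothetical names for this
sketch:

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
	"regexp"
	"time"
)

// idShape matches the "%016x-%08x" layout used by the fallback id.
var idShape = regexp.MustCompile(`^[0-9a-f]{16}-[0-9a-f]{8}$`)

// newRandomSuffixTraceID keeps the time-ordered prefix but fills the
// 32-bit suffix from the OS entropy source instead of the clock.
func newRandomSuffixTraceID() (string, error) {
	var buf [4]byte
	if _, err := rand.Read(buf[:]); err != nil {
		return "", err
	}
	ns := time.Now().UTC().UnixNano()
	return fmt.Sprintf("%016x-%08x", ns, binary.BigEndian.Uint32(buf[:])), nil
}

func main() {
	id, err := newRandomSuffixTraceID()
	if err != nil {
		panic(err)
	}
	// Print the id and whether it matches the expected layout.
	fmt.Println(id, idShape.MatchString(id))
}
```

The trade-off is an error return that callers must handle; the
clock-only version cannot fail, which may be why it was chosen here.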

@@ -1,6 +1,6 @@
# session_log parity probe — Rust gateway vs Go validatord
-**Date:** 2026-05-02T10:38:10Z
+**Date:** 2026-05-02T10:53:00Z
**Rust helper:** `/home/profit/lakehouse/target/release/parity_session_log`
**Go helper:** `./bin/parity_session_log_go`

@@ -1,6 +1,6 @@
# Validator parity probe — Rust :3100 vs Go :4110
-**Date:** 2026-05-02T09:47:49Z
+**Date:** 2026-05-02T10:52:59Z
**Rust gateway:** `http://127.0.0.1:3100` · **Go gateway:** `http://127.0.0.1:4110`
Identical `POST /v1/validate` request → both runtimes. Match