golangLAKEHOUSE

profit/golangLAKEHOUSE

Commit Graph

Author	SHA1	Message	Date
root

1a3a82aedb

validatord: coordinator session JSONL for offline analysis (B follow-up)

Closes the second half of J's 2026-05-02 multi-call observability
concern. Trace-id propagation (commit d6d2fdf) gave us the *live*
view in Langfuse; this gives us the *longitudinal* view for ad-hoc
DuckDB queries over thousands of sessions:

  "show me every session where the model produced a real candidate
   without ever needing a retry"
  "find sessions where validation rejected three times in a row"
  "first-shot success rate per model — did we feed it enough corpus?"

## What's in

internal/validator/session_log.go:
  - SessionRecord type (schema=session.iterate.v1)
  - SessionLogger writer — mutex-guarded append, best-effort posture,
    nil-safe (NewSessionLogger("") = nil = no-op on Append)
  - BuildSessionRecord helper — assembles a row from any
    iterate response/failure/infra-error combination, callable from
    other daemons that wrap iterate (cross-daemon shared schema)
  - 7 unit tests including concurrent-append safety + the three
    code paths (success / max_iter_exhausted / infra_error)

cmd/validatord/main.go:
  - handlers.sessionLog field + wiring from cfg.Validatord.SessionLogPath
  - Iterate handler: build + append a SessionRecord on every call
  - rosterCheckFor("fill") closure stamps grounded_in_roster — the
    load-bearing forensic property J flagged ("we can never
    hallucinate available staff members to contracts")

internal/shared/config.go + lakehouse.toml:
  - [validatord].session_log_path field; empty = disabled
  - Production: /var/lib/lakehouse/validator/sessions.jsonl

scripts/validatord_smoke.sh:
  - Adds a probe verifying validatord announces session log path on
    startup. Smoke is now 6/6 (was 5/5).

docs/SESSION_LOG.md:
  - Schema reference + 5 worked DuckDB query examples including the
    "alarm" query (sessions where grounded_in_roster=false on an
    accepted fill — should always be empty; if not, something is
    bypassing FillValidator).

## What this is NOT

This is NOT a duplicate of replay_runs.jsonl. They're siblings:
  - replay_runs.jsonl: replay tool's per-task retrieval+model output
  - sessions.jsonl: validatord's per-iterate full retry chain +
    grounded-in-roster verdict

A single coordinator session can produce rows in both streams; the
session_id (= Langfuse trace_id) is the join key.

## Layered observability now in place

  Live view:  Langfuse trace tree (X-Lakehouse-Trace-Id propagation)
              `iterate.attempt[N]` spans with prompt/raw/verdict
  Offline:    sessions.jsonl (this commit)
              DuckDB-queryable; longitudinal forensics
  Hard gate:  FillValidator + WorkerLookup (existing)
              phantom IDs structurally rejected, never reach
              session log's grounded_in_roster=true bucket

Per the architecture invariant in STATE_OF_PLAY's DO NOT RELITIGATE
section, these layers are wired; future work targets the data, not
the wiring.

## Verification

- internal/validator: 7 new tests (session_log_test.go) — all PASS
- cmd/validatord: 3 new integration tests covering the success,
  failure, and grounded=false paths — all PASS
- validatord_smoke.sh: 6/6 PASS through gateway :3110
- Full go test ./... green across 33 packages

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-02 05:22:09 -05:00

root

d6d2fdf81f

trace-id propagation through /v1/iterate (multi-call observability)

Closes J's 2026-05-02 multi-call observability gap: a single
/v1/iterate session with N retries used to surface in Langfuse as
N+1 disconnected traces (one per /v1/chat hop + one for the iterate
request itself), with no parent/child linkage. Operators couldn't
scroll the retry chain in one trace tree to spot where grounding
failed.

## Wire-level change

- New header constant `shared.TraceIDHeader = "X-Lakehouse-Trace-Id"`
- `langfuseMiddleware` honors the header on inbound requests: if
  set, reuses that trace id instead of minting a new one. Stashes
  the trace id on the request context so handlers can attach
  application-level child spans.
- `validatord.chatCaller` forwards the header to chatd. Every chat
  hop in an iterate session lands as a child of the parent trace.

## Application-level spans

- `validator.IterateConfig` gains `Tracer` (optional callback).
  When wired, each iteration attempt emits one Langfuse span
  via `validator.AttemptSpan`:
    Name: iterate.attempt[N]
    Input: { iteration, model, provider, prompt }
    Output: { verdict, raw, error }
    Level: WARNING when verdict != accepted
- `validatord.iterTracer` is the production hook — bridges
  `validator.Tracer` → `langfuse.Client.Span`.
- `IterateRequest`/`IterateResponse`/`IterateFailure` gain
  `TraceID`; each `IterateAttempt` gains `SpanID`. The /v1/iterate
  caller can pivot from the JSON response straight into the
  Langfuse trace tree.
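A rough sketch of the per-attempt span emission follows. The type and function shapes here are illustrative assumptions rather than the real `validator.Tracer` or `validator.AttemptSpan` signatures; the span name, the WARNING rule, and the skip-without-trace-id behavior follow the description in this commit, and "DEFAULT" is assumed as the non-warning level.

```go
package main

import "fmt"

// AttemptSpan is a simplified stand-in for the span payload.
type AttemptSpan struct {
	Name    string
	TraceID string
	Verdict string
	Level   string
}

// Tracer is an optional callback that records a span and returns its id.
type Tracer func(span AttemptSpan) (spanID string)

// spanFor builds the span for one attempt; non-accepted verdicts are WARNING.
func spanFor(traceID string, iteration int, verdict string) AttemptSpan {
	level := "DEFAULT"
	if verdict != "accepted" {
		level = "WARNING"
	}
	return AttemptSpan{
		Name:    fmt.Sprintf("iterate.attempt[%d]", iteration),
		TraceID: traceID,
		Verdict: verdict,
		Level:   level,
	}
}

// emitSpans emits one span per attempt and returns the span ids.
// With no tracer or no trace id it emits nothing, avoiding orphan spans.
func emitSpans(traceID string, verdicts []string, t Tracer) []string {
	if t == nil || traceID == "" {
		return nil
	}
	ids := make([]string, 0, len(verdicts))
	for i, v := range verdicts {
		ids = append(ids, t(spanFor(traceID, i, v)))
	}
	return ids
}

func main() {
	tracer := func(s AttemptSpan) string {
		fmt.Printf("%s level=%s verdict=%s\n", s.Name, s.Level, s.Verdict)
		return s.TraceID + "/" + s.Name
	}
	fmt.Println(emitSpans("TR-1", []string{"validation_failed", "accepted"}, tracer))
}
```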

## What an operator sees post-cutover

  GET /v1/iterate {kind=fill, prompt=...} → Trace TR-1
    ├─ http.request span (from middleware)
    ├─ iterate.attempt[0] span (validator.Iterate emit)
    │     input: prompt+model
    │     output: { verdict: validation_failed, error: ..., raw }
    ├─ chatd /v1/chat call (X-Lakehouse-Trace-Id: TR-1)
    │     ├─ http.request span (chatd middleware)
    │     └─ chatd-internal spans (existing)
    ├─ iterate.attempt[1] span
    └─ ...

All in one Langfuse trace tree, not N+1 separate traces.

## Hallucinated-worker safety net is unchanged

The /v1/iterate flow's hard correctness gate is still
FillValidator + WorkerLookup. Phantom candidate IDs raise
ValidationError::Consistency which 422s and forces the iteration
loop to retry. The trace-id propagation is the OBSERVABILITY layer
on top: it makes the existing safety net's outcomes visible per call;
it does not replace that safety net.

## Verification

- internal/validator: 4 new tests
  - TestIterate_TracerEmitsSpanPerAttempt — span/attempt count + SpanID
  - TestIterate_NoTraceIDSkipsTracer — no orphan spans without trace_id
  - TestIterate_ChatCallerReceivesTraceID — propagation contract
  - (existing iterate tests updated for new ChatCaller signature)
- internal/shared: 1 new test
  - TestLangfuseMiddleware_HonorsTraceIDHeader — cross-service linkage
- cmd/validatord: existing HTTP tests still PASS via the dual-shape
  UnmarshalJSON contract.
- validatord_smoke.sh: 5/5 PASS through gateway :3110 (unchanged).
- Full go test ./... green across 33 packages.

## Architecture invariant added

STATE_OF_PLAY "DO NOT RELITIGATE" gains a paragraph documenting
the X-Lakehouse-Trace-Id header contract + the iterate.attempt[N]
span emission. Future-Claude won't re-propose "wire trace-id
propagation" — the header IS the wiring.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-02 05:13:18 -05:00

root

f9e72412c1

validatord: /v1/validate + /v1/iterate HTTP surface (port 3221)

Closes the last "Go primary" backlog item in
docs/ARCHITECTURE_COMPARISON.md. Go now owns the entire validator
path end-to-end — no Rust dep for staffing safety net.

Architecture: cmd/validatord on :3221 hosts both endpoints. Calls
chatd directly for the iterate loop's LLM hop (no gateway
self-loopback like the Rust shape). Gateway proxies /v1/validate +
/v1/iterate to validatord.

What's in:
- internal/validator/playbook.go — 3rd validator kind (PRD checks:
  fill: prefix, endorsed_names ≤ target_count×2, fingerprint required)
- internal/validator/lookup_jsonl.go — JSONL roster loader (Parquet
  deferred; producer one-liner documented in package comment)
- internal/validator/iterate.go — ExtractJSON helper + Iterate
  orchestrator with ChatCaller seam for unit tests
- cmd/validatord/main.go — HTTP routes, roster load, chat client
- internal/shared/config.go — ValidatordConfig + gateway URL field
- lakehouse.toml — [validatord] section
- cmd/gateway/main.go — proxy routes for /v1/validate + /v1/iterate

Smoke: 5/5 PASS through gateway :3110:
  ✓ playbook happy path
  ✓ playbook missing fingerprint → 422 schema/fingerprint
  ✓ phantom candidate W-PHANTOM → 422 consistency
  ✓ unknown kind → 400
  ✓ roster loaded with 3 records

go test ./... green across 33 packages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-02 03:53:20 -05:00

3 Commits