golangLAKEHOUSE

profit/golangLAKEHOUSE

Commit Graph

Author

SHA1

Message

Date

root

d6d2fdf81f

trace-id propagation through /v1/iterate (multi-call observability)

Closes J's 2026-05-02 multi-call observability gap: a single
/v1/iterate session with N retries used to surface in Langfuse as
N+1 disconnected traces (one per /v1/chat hop + one for the iterate
request itself), with no parent/child linkage. Operators couldn't
scroll the retry chain in one trace tree to spot where grounding
failed.

## Wire-level change

- New header constant `shared.TraceIDHeader = "X-Lakehouse-Trace-Id"`
- `langfuseMiddleware` honors the header on inbound requests: if
  set, reuses that trace id instead of minting a new one. Stashes
  the trace id on the request context so handlers can attach
  application-level child spans.
- `validatord.chatCaller` forwards the header to chatd. Every chat
  hop in an iterate session lands as a child of the parent trace.

## Application-level spans

- `validator.IterateConfig` gains `Tracer` (optional callback).
  When wired, each iteration attempt emits one Langfuse span
  via `validator.AttemptSpan`:
    Name: iterate.attempt[N]
    Input: { iteration, model, provider, prompt }
    Output: { verdict, raw, error }
    Level: WARNING when verdict != accepted
- `validatord.iterTracer` is the production hook — bridges
  `validator.Tracer` → `langfuse.Client.Span`.
- `IterateRequest`/`IterateResponse`/`IterateFailure` gain
  `TraceID`; each `IterateAttempt` gains `SpanID`. The /v1/iterate
  caller can pivot from the JSON response straight into the
  Langfuse trace tree.

## What an operator sees post-cutover

  GET /v1/iterate {kind=fill, prompt=...} → Trace TR-1
    ├─ http.request span (from middleware)
    ├─ iterate.attempt[0] span (validator.Iterate emit)
    │     input: prompt+model
    │     output: { verdict: validation_failed, error: ..., raw }
    ├─ chatd /v1/chat call (X-Lakehouse-Trace-Id: TR-1)
    │     ├─ http.request span (chatd middleware)
    │     └─ chatd-internal spans (existing)
    ├─ iterate.attempt[1] span
    └─ ...

All in one Langfuse trace tree, not N+1 separate traces.

## Hallucinated-worker safety net is unchanged

The /v1/iterate flow's hard correctness gate is still
FillValidator + WorkerLookup. Phantom candidate IDs raise
ValidationError::Consistency, which returns a 422 and forces the
iteration loop to retry. The trace-id propagation is the
OBSERVABILITY layer on top: it makes the existing safety net's
outcomes visible per call and does not replace it.

## Verification

- internal/validator: 4 new tests
  - TestIterate_TracerEmitsSpanPerAttempt — span/attempt count + SpanID
  - TestIterate_NoTraceIDSkipsTracer — no orphan spans without trace_id
  - TestIterate_ChatCallerReceivesTraceID — propagation contract
  - (existing iterate tests updated for new ChatCaller signature)
- internal/shared: 1 new test
  - TestLangfuseMiddleware_HonorsTraceIDHeader — cross-service linkage
- cmd/validatord: existing HTTP tests still PASS via the dual-shape
  UnmarshalJSON contract.
- validatord_smoke.sh: 5/5 PASS through gateway :3110 (unchanged).
- Full go test ./... green across 33 packages.

## Architecture invariant added

STATE_OF_PLAY "DO NOT RELITIGATE" gains a paragraph documenting
the X-Lakehouse-Trace-Id header contract + the iterate.attempt[N]
span emission. Future-Claude won't re-propose "wire trace-id
propagation" — the header IS the wiring.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-02 05:13:18 -05:00

root

68d9e554b0

shared: auto-emit Langfuse trace+span per HTTP request — closes OPEN #2

Adds langfuseMiddleware in internal/shared so every daemon's
shared.Run gets free production-traffic trace visibility when
LANGFUSE_URL + LANGFUSE_PUBLIC_KEY + LANGFUSE_SECRET_KEY are set.
Same env names + file shape as the multi_coord_stress driver, so
operators ship one /etc/lakehouse/langfuse.env across the deploy.
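A minimal sketch of the environment gating, assuming a client is built only when all three variables are present. The variable names come from the text above; the `Client` struct and `clientFromEnv` are hypothetical and do not reflect the real Langfuse client type.

```go
package main

import (
	"fmt"
	"os"
)

// Client is a hypothetical stand-in holding only the connection settings.
type Client struct {
	URL       string
	PublicKey string
	SecretKey string
}

// clientFromEnv returns nil when any of the three variables is unset or
// empty, so callers can treat a nil client as "tracing disabled".
// getenv is injected so the function can be exercised without touching the
// process environment.
func clientFromEnv(getenv func(string) string) *Client {
	c := &Client{
		URL:       getenv("LANGFUSE_URL"),
		PublicKey: getenv("LANGFUSE_PUBLIC_KEY"),
		SecretKey: getenv("LANGFUSE_SECRET_KEY"),
	}
	if c.URL == "" || c.PublicKey == "" || c.SecretKey == "" {
		return nil
	}
	return c
}

func main() {
	if clientFromEnv(os.Getenv) == nil {
		fmt.Println("langfuse tracing disabled")
		return
	}
	fmt.Println("langfuse tracing enabled")
}
```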

Wiring is auth-gated: middleware runs INSIDE the RequireAuth group,
so 401s from credential-stuffing don't pollute traces. /health is
exempt so LB probes don't either. Missing env vars → nil client →
middleware is a passthrough no-op (fail-open per ADR-005 5.1).

Bundled deploy:
- langfuse.env.example template (mode 0640, root:lakehouse)
- 11 systemd units gain `EnvironmentFile=-/etc/lakehouse/langfuse.env`
  (leading - so missing file = OK)
- REPLICATION.md bootstrap section documents setup

Tests (4): nil passthrough, /health bypass, real-request emission,
status-writer wrapping. All green.

STATE_OF_PLAY OPEN list: 5 rows → 4 rows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-30 19:55:42 -05:00

2 Commits