2 Commits

Author SHA1 Message Date
root
6c93a38093 scrum multi_coord_phase3: 4 fixes from cross-lineage review
Cross-lineage scrum on bundle 87cbd10..f971e64 (3,652 lines)
produced 4 actionable findings, all defensive hardening.

1. (Opus WARN) internal/langfuse/client.go:queue
   Synchronous Flush at maxBatch threshold blocked the calling
   goroutine for the full 5s HTTP timeout when Langfuse hiccupped,
   defeating the "best-effort, never blocks calling path" contract
   in the package doc. Now fire-and-forget via goroutine.

2. (Opus + Kimi convergent) cmd/observerd/main.go:handleInbox
   - Free-form priority string was accepted; "nonsense" passed
     through unchecked. Now closed enum: urgent|high|medium|low (+
     empty defaults to medium). Tested: TestInbox_RejectsBadPriority.
   - No size cap on body, only emptiness check; multi-MB payloads
     would bloat observer's ring + JSONL. Now 8 KiB cap returns 413.
     Tested: TestInbox_RejectsOversizedBody.
   - Subject/sender/tag concatenated into InputSummary without
     newline stripping; embedded \n could corrupt JSONL line-based
     parsers. New sanitizeInboxField strips \r\n + caps at 256 chars
     before interpolation.

3. (Opus INFO) scripts/multi_coord_stress/main.go
   Removed dead `must[T]` generic — tracedSearch took over the
   fail-fast role for matrix searches, so the helper became unused.

4. (Opus INFO) scripts/multi_coord_stress/main.go:Event
   `JudgeRating int` collapsed "judge errored" and "judge said
   unrated" both to 0. Changed to *int — nil = errored, 1-5 =
   verdict. judgeInboxResult still returns 0 on error; caller
   gates on > 0 before assigning.

Dismissed (with rationale):
- Opus WARN ExcludeIDs ordering: verified by code read — filter
  applies after sort + before top-K truncation as documented;
  no slot waste possible.
- Opus INFO 10 prior-run reports contradict #011: those are
  point-in-time snapshots; intentional history.
- Kimi INFO Langfuse error suppression: design intent (best-effort
  per package doc).
- Kimi INFO contract schema validation: defer until contract count
  grows enough to make hand-edit drift a real risk.
- Kimi INFO paraphrase prompt duplicated across lift + multi_coord:
  defer (lift to internal/paraphrase/ when a third consumer appears).
- Qwen HOLD: single-line, no actionable finding.

go test ./cmd/observerd ./internal/langfuse all green; multi_coord
driver builds clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 17:42:07 -05:00
root
7e6431e4fd langfuse: Go-side client + Phase 1c instrumentation
The Rust side has Langfuse tracing already (gateway/v1/langfuse_trace.rs);
this commit lands Go-side parity so the multi-coord stress harness can
emit traces visible at http://localhost:3001.

internal/langfuse/client.go:
- Minimal Trace + Span + Flush API mirroring what the Rust emitter
  uses. Auth: Basic over public_key:secret_key.
- Best-effort posture: errors are slog.Warn'd, never block calling
  paths. Same fail-open as observerd's persistor (ADR-005 Decision
  5.1) — observability is a witness, not a gate.
- Events buffered until 50, then auto-flushed; explicit Flush() at
  process exit.
- Each Trace/Span returns its id so callers can build hierarchies.

multi_coord_stress driver wiring:
- New --langfuse-env flag (default /etc/lakehouse/langfuse.env).
  Empty / missing / unparseable file → skip tracing with a logged
  warning; run still proceeds.
- Phase 1c (inbox burst) now emits one parent trace + 4 spans per
  inbox event:
    1. observerd.inbox.record  (post to /v1/observer/inbox)
    2. llm.parse_demand        (qwen2.5 → structured fields)
    3. matrix.search           (parsed query → top-K)
    4. llm.judge_top1          (rate top-1 vs original body)
  Each span carries input/output JSON + start/end times so the
  Langfuse UI shows a full waterfall per event.

Run #009 result:
  Trace landed: "multi_coord_stress phase 1c inbox burst"
  Observations attached: 24 (= 6 events × 4 spans)
  Tags: stress, phase-1c, inbox
  Browseable at http://localhost:3001 by tag query.

Other harness metrics: diversity 0.016, determinism 1.000,
verbatim handover 4/4, paraphrase handover 4/4 — all unchanged
by the tracing addition (best-effort post in parallel).

Phase 1c is the proof-of-concept; future commits can wrap other
phases (baseline / merge / handover / split) in traces too. Once
that's done, the entire stress run becomes scrubbable in Langfuse
without grepping the events JSON.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 16:25:03 -05:00