Cross-lineage scrum on bundle 87cbd10..f971e64 (3,652 lines)
produced 4 actionable findings, all defensive hardening.
1. (Opus WARN) internal/langfuse/client.go:queue
Synchronous Flush at the maxBatch threshold blocked the calling
goroutine for up to the full 5s HTTP timeout whenever Langfuse was
slow or unreachable, defeating the "best-effort, never blocks calling
path" contract in the package doc. The threshold flush now runs
fire-and-forget in its own goroutine.
2. (Opus + Kimi convergent) cmd/observerd/main.go:handleInbox
- Free-form priority string was accepted; "nonsense" passed
through unchecked. Now closed enum: urgent|high|medium|low (+
empty defaults to medium). Tested: TestInbox_RejectsBadPriority.
- No size cap on body, only emptiness check; multi-MB payloads
would bloat observer's ring + JSONL. Now 8 KiB cap returns 413.
Tested: TestInbox_RejectsOversizedBody.
- Subject/sender/tag were concatenated into InputSummary without
newline stripping; an embedded \n could corrupt line-based JSONL
parsers. New sanitizeInboxField strips \r and \n and caps each
field at 256 chars before interpolation.
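
The three inbox checks above can be sketched roughly as follows. This is a minimal illustration, not the handler in cmd/observerd/main.go: validateInbox, normalizePriority, and the constant names are hypothetical, the 400 status for a bad priority or empty body is an assumption (only the 413 is stated above), and counting the 256-char cap in runes rather than bytes is also an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"strings"
)

const (
	maxBodyBytes  = 8 << 10 // 8 KiB cap on the inbox body
	maxFieldRunes = 256     // cap on subject/sender/tag (runes: an assumption)
)

var errBadPriority = errors.New("priority must be urgent, high, medium, or low")

// normalizePriority enforces the closed enum; empty defaults to medium.
func normalizePriority(p string) (string, error) {
	switch p {
	case "":
		return "medium", nil
	case "urgent", "high", "medium", "low":
		return p, nil
	}
	return "", errBadPriority
}

// sanitizeInboxField removes CR and LF, then truncates to maxFieldRunes.
func sanitizeInboxField(s string) string {
	s = strings.NewReplacer("\r", "", "\n", "").Replace(s)
	if r := []rune(s); len(r) > maxFieldRunes {
		s = string(r[:maxFieldRunes])
	}
	return s
}

// validateInbox returns the HTTP status a handler might send for the
// given priority and body. The 400 responses are assumed, not confirmed.
func validateInbox(priority, body string) int {
	if _, err := normalizePriority(priority); err != nil {
		return http.StatusBadRequest
	}
	if body == "" {
		return http.StatusBadRequest
	}
	if len(body) > maxBodyBytes {
		return http.StatusRequestEntityTooLarge // 413
	}
	return http.StatusOK
}

func main() {
	fmt.Println(validateInbox("nonsense", "hello")) // 400
	fmt.Println(validateInbox("", "hello"))         // 200
	fmt.Println(sanitizeInboxField("line1\r\nline2")) // line1line2
}
```

Sanitizing before interpolation keeps each JSONL record on a single physical line regardless of what the sender put in the subject.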
3. (Opus INFO) scripts/multi_coord_stress/main.go
Removed dead `must[T]` generic — tracedSearch took over the
fail-fast role for matrix searches, so the helper became unused.
4. (Opus INFO) scripts/multi_coord_stress/main.go:Event
`JudgeRating int` collapsed "judge errored" and "judge returned no
rating" both to 0. Changed to *int: nil = errored, 1-5 = verdict.
judgeInboxResult still returns 0 on error; the caller assigns the
pointer only when the rating is > 0.
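
For finding 4, the nil-versus-verdict distinction might look like the sketch below. The judge function is a hypothetical stand-in for judgeInboxResult, the fixed rating of 4 is made up for the demo, and the JSON field names are assumptions; only the *int field and the > 0 gate come from the description above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Event mirrors only the field discussed above; JSON tags are assumed.
type Event struct {
	ID          string `json:"id"`
	JudgeRating *int   `json:"judge_rating"` // nil = judge errored, 1-5 = verdict
}

// judge is a hypothetical stand-in for judgeInboxResult: it returns 0
// together with an error when the judge call fails.
func judge(fail bool) (int, error) {
	if fail {
		return 0, errors.New("judge unavailable")
	}
	return 4, nil
}

// record assigns the pointer only when the rating is a real verdict.
func record(id string, fail bool) Event {
	ev := Event{ID: id}
	if r, _ := judge(fail); r > 0 {
		ev.JudgeRating = &r
	}
	return ev
}

func main() {
	for _, fail := range []bool{false, true} {
		out, _ := json.Marshal(record("evt", fail))
		fmt.Println(string(out))
	}
	// {"id":"evt","judge_rating":4}
	// {"id":"evt","judge_rating":null}
}
```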
Dismissed (with rationale):
- Opus WARN ExcludeIDs ordering: verified by code read. The filter
applies after sort and before top-K truncation, as documented, so
excluded IDs cannot consume top-K slots.
- Opus INFO 10 prior-run reports contradict #011: those are
point-in-time snapshots; intentional history.
- Kimi INFO Langfuse error suppression: design intent (best-effort
per package doc).
- Kimi INFO contract schema validation: defer until contract count
grows enough to make hand-edit drift a real risk.
- Kimi INFO paraphrase prompt duplicated across lift + multi_coord:
defer (lift to internal/paraphrase/ when a third consumer appears).
- Qwen HOLD: single-line, no actionable finding.
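
To illustrate the ExcludeIDs dismissal, here is a sketch of the ordering it relies on: sort, then filter, then truncate. The Hit type and topK function are hypothetical and are not taken from the matrix search code.

```go
package main

import (
	"fmt"
	"sort"
)

// Hit is a hypothetical search result.
type Hit struct {
	ID    string
	Score float64
}

// topK sorts by descending score, skips excluded IDs, and only then
// truncates to k, so an excluded hit never occupies a result slot.
func topK(hits []Hit, exclude map[string]bool, k int) []Hit {
	if k <= 0 {
		return nil
	}
	sorted := append([]Hit(nil), hits...) // copy; leave the input untouched
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Score > sorted[j].Score
	})
	out := make([]Hit, 0, k)
	for _, h := range sorted {
		if exclude[h.ID] {
			continue
		}
		if len(out) == k {
			break
		}
		out = append(out, h)
	}
	return out
}

func main() {
	hits := []Hit{{"c", 0.7}, {"a", 0.9}, {"d", 0.6}, {"b", 0.8}}
	fmt.Println(topK(hits, map[string]bool{"a": true}, 2)) // [{b 0.8} {c 0.7}]
}
```

Had truncation run before the filter, excluding "a" would have left only one result instead of two.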
go test ./cmd/observerd ./internal/langfuse all green; multi_coord
driver builds clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Rust side has Langfuse tracing already (gateway/v1/langfuse_trace.rs);
this commit lands Go-side parity so the multi-coord stress harness can
emit traces visible at http://localhost:3001.
internal/langfuse/client.go:
- Minimal Trace + Span + Flush API mirroring what the Rust emitter
uses. Auth: Basic over public_key:secret_key.
- Best-effort posture: errors are slog.Warn'd, never block calling
paths. Same fail-open as observerd's persistor (ADR-005 Decision
5.1) — observability is a witness, not a gate.
- Events are buffered until 50 are queued, then auto-flushed;
callers invoke Flush() explicitly at process exit.
- Each Trace/Span returns its id so callers can build hierarchies.
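
The buffering posture described above could be sketched as follows, with the threshold flush handed to a goroutine so the enqueueing caller never waits on the network. This is an illustration under assumptions, not the code in internal/langfuse/client.go: Client, NewClient, Enqueue, and the pluggable send function are hypothetical names, and the real client posts over HTTP with Basic auth rather than calling a callback.

```go
package main

import (
	"fmt"
	"log/slog"
	"sync"
)

const maxBatch = 50 // auto-flush threshold from the description above

// Event is a placeholder for a trace or span payload.
type Event struct{ Name string }

// Client buffers events and ships them through send (hypothetical transport).
type Client struct {
	mu    sync.Mutex
	queue []Event
	wg    sync.WaitGroup
	send  func([]Event) error
}

func NewClient(send func([]Event) error) *Client { return &Client{send: send} }

// Enqueue appends an event; at maxBatch it detaches the batch and posts
// it in a goroutine so the caller returns immediately.
func (c *Client) Enqueue(e Event) {
	c.mu.Lock()
	c.queue = append(c.queue, e)
	var batch []Event
	if len(c.queue) >= maxBatch {
		batch, c.queue = c.queue, nil
	}
	c.mu.Unlock()
	if batch != nil {
		c.wg.Add(1)
		go func() {
			defer c.wg.Done()
			c.post(batch)
		}()
	}
}

// post is best-effort: failures are logged and the batch is dropped.
func (c *Client) post(batch []Event) {
	if err := c.send(batch); err != nil {
		slog.Warn("langfuse flush failed", "events", len(batch), "err", err)
	}
}

// Flush sends the remainder and waits for in-flight batches. It is meant
// for process exit, after all producers have stopped calling Enqueue.
func (c *Client) Flush() {
	c.mu.Lock()
	batch := c.queue
	c.queue = nil
	c.mu.Unlock()
	if len(batch) > 0 {
		c.post(batch)
	}
	c.wg.Wait()
}

func main() {
	c := NewClient(func(b []Event) error {
		fmt.Println("sent", len(b))
		return nil
	})
	for i := 0; i < 60; i++ {
		c.Enqueue(Event{Name: "span"})
	}
	c.Flush() // delivers one batch of 50 and one of 10; print order may vary
}
```

Detaching the batch under the lock and posting outside it keeps the critical section short, so a slow send cannot stall other producers.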
multi_coord_stress driver wiring:
- New --langfuse-env flag (default /etc/lakehouse/langfuse.env).
Empty / missing / unparseable file → skip tracing with a logged
warning; run still proceeds.
- Phase 1c (inbox burst) now emits one parent trace + 4 spans per
inbox event:
1. observerd.inbox.record (post to /v1/observer/inbox)
2. llm.parse_demand (qwen2.5 → structured fields)
3. matrix.search (parsed query → top-K)
4. llm.judge_top1 (rate top-1 vs original body)
Each span carries input/output JSON + start/end times so the
Langfuse UI shows a full waterfall per event.
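
The skip-on-bad-env behaviour of --langfuse-env might be implemented along these lines. parseEnv and loadLangfuseEnv are hypothetical helpers, and the KEY=VALUE file format with # comments is an assumption about what langfuse.env contains.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"log/slog"
	"os"
	"strings"
)

// parseEnv reads KEY=VALUE lines, skipping blank lines and # comments.
// An empty or malformed file yields an error.
func parseEnv(content string) (map[string]string, error) {
	env := map[string]string{}
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		k, v, ok := strings.Cut(line, "=")
		k = strings.TrimSpace(k)
		if !ok || k == "" {
			return nil, fmt.Errorf("malformed line %q", line)
		}
		env[k] = strings.Trim(strings.TrimSpace(v), `"`)
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}
	if len(env) == 0 {
		return nil, errors.New("no keys found")
	}
	return env, nil
}

// loadLangfuseEnv returns ok=false (after a warning) when the file is
// missing, empty, or unparseable, so the caller can skip tracing.
func loadLangfuseEnv(path string) (map[string]string, bool) {
	raw, err := os.ReadFile(path)
	if err == nil {
		var env map[string]string
		if env, err = parseEnv(string(raw)); err == nil {
			return env, true
		}
	}
	slog.Warn("langfuse tracing disabled", "path", path, "err", err)
	return nil, false
}

func main() {
	if _, ok := loadLangfuseEnv("/etc/lakehouse/langfuse.env"); !ok {
		fmt.Println("tracing skipped; run proceeds")
	}
}
```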
Run #009 result:
Trace landed: "multi_coord_stress phase 1c inbox burst"
Observations attached: 24 (= 6 events × 4 spans)
Tags: stress, phase-1c, inbox
Browseable at http://localhost:3001 by tag query.
Other harness metrics: diversity 0.016, determinism 1.000,
verbatim handover 4/4, paraphrase handover 4/4 — all unchanged
by the tracing addition (best-effort post in parallel).
Phase 1c is the proof-of-concept; future commits can wrap other
phases (baseline / merge / handover / split) in traces too. Once
that's done, the entire stress run becomes scrubbable in Langfuse
without grepping the events JSON.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>