Closes the gap J flagged: the observer wraps MCP:3700 while scenarios hit
gateway:3100 directly, leaving the observer idle at 0 ops across 3600+ cycles.
Now scenarios POST per-event outcomes to the observer's new HTTP ingest
on :3800; the observer consumes them alongside MCP-wrapped ops, so the
ERROR_ANALYZER and PLAYBOOK_BUILDER loops see the full picture.
observer.ts:
- Bun.serve() HTTP listener on OBSERVER_PORT (default 3800):
GET /health — basic + ring depth
GET /stats — total / success / failure / by_source / recent
scenario ops digest
POST /event — accept scenario outcome, shape it into ObservedOp
with source="scenario" + staffer_id + sig_hash +
event_kind + role/city/state + rescue flags
- recordExternalOp() — shared ring-buffer insert so the main analyzer
+ playbook builder don't care where the op came from
- ObservedOp extended with provenance fields
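The ingest surface above can be sketched roughly as follows. The routes, the `{accepted, ring_size}` / ring-depth response shapes, and `recordExternalOp()` come from the notes; the exact field set, the ring capacity, and the `startObserver()` wrapper are assumptions:

```typescript
declare const Bun: any; // Bun global, available under the Bun runtime

type ObservedOp = {
  source: "mcp" | "scenario"; // provenance field
  staffer_id?: string;
  sig_hash?: string;
  event_kind?: string;
  success: boolean;
  ts: number;
};

const RING_MAX = 1024; // assumed capacity
const ring: ObservedOp[] = [];

// Shared ring-buffer insert: the analyzer and playbook builder read `ring`
// without caring whether an op arrived via MCP wrapping or HTTP POST.
function recordExternalOp(op: ObservedOp): number {
  ring.push(op);
  if (ring.length > RING_MAX) ring.shift(); // evict the oldest op
  return ring.length;
}

function startObserver() {
  return Bun.serve({
    port: Number(process.env.OBSERVER_PORT ?? 3800),
    async fetch(req: Request) {
      const url = new URL(req.url);
      if (req.method === "GET" && url.pathname === "/health")
        return Response.json({ ok: true, ring_depth: ring.length });
      if (req.method === "GET" && url.pathname === "/stats")
        return Response.json({
          total: ring.length,
          successes: ring.filter((o) => o.success).length,
        });
      if (req.method === "POST" && url.pathname === "/event") {
        // Shape the scenario outcome into an ObservedOp with provenance.
        const body = await req.json();
        const size = recordExternalOp({
          success: true,
          ...body,
          source: "scenario",
          ts: Date.now(),
        });
        return Response.json({ accepted: true, ring_size: size });
      }
      return new Response("not found", { status: 404 });
    },
  });
}
```

`startObserver()` is only a sketch entry point; the real observer.ts presumably starts this listener alongside its existing MCP wrapper.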
- persistOp() FIX — the old path POSTed to /ingest/file?name=observed_operations,
  which REPLACES the dataset (flagged in feedback_ingest_replace_semantics.md),
  so every op silently wiped all prior ops. Replaced with an append to
  data/_observer/ops.jsonl so the historical trace survives analyzer
  cycles and process restarts.
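The append-only persistence path amounts to one JSON object per line. A minimal sketch, assuming only the data/_observer/ops.jsonl path from the notes (the env override, function signatures, and loadOps() helper are hypothetical):

```typescript
import { appendFileSync, mkdirSync, readFileSync } from "node:fs";
import { dirname } from "node:path";

// OPS_PATH env override is an assumption for testability; the notes only
// name the default path.
const OPS_PATH = process.env.OPS_PATH ?? "data/_observer/ops.jsonl";

// Append-only JSONL: each op becomes one line, so writing a new op can
// never replace the prior dataset (the /ingest/file replace-semantics bug).
function persistOp(op: Record<string, unknown>): void {
  mkdirSync(dirname(OPS_PATH), { recursive: true });
  appendFileSync(OPS_PATH, JSON.stringify(op) + "\n");
}

// Rehydrating the durable trace after a restart is a line-by-line parse.
function loadOps(): Record<string, unknown>[] {
  try {
    return readFileSync(OPS_PATH, "utf8")
      .split("\n")
      .filter((l) => l.length > 0)
      .map((l) => JSON.parse(l));
  } catch {
    return []; // no trace written yet
  }
}
```

JSONL keeps each write O(1) and crash-safe at line granularity, which is what makes the trace durable across analyzer cycles.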
scenario.ts:
- OBSERVER_URL env (default http://localhost:3800)
- postObserverEvent() helper with 2s AbortSignal.timeout so observer
being down doesn't block scenario flow
- Per-event POST after ctx.results.push(result), carrying staffer_id,
sig_hash (via imported computeSignature), event_kind + role + city
+ state + count + rescue_attempted / rescue_succeeded + truncated
output_summary
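The scenario-side helper can be sketched like this; the OBSERVER_URL default and the 2s AbortSignal.timeout are from the notes, while the payload shape and the boolean return are assumptions:

```typescript
const OBSERVER_URL = process.env.OBSERVER_URL ?? "http://localhost:3800";

// Best-effort POST: AbortSignal.timeout(2000) caps the wait, and the catch
// swallows network errors, so an absent or slow observer never blocks the
// scenario loop. Returns whether the observer accepted the event.
async function postObserverEvent(
  payload: Record<string, unknown>,
): Promise<boolean> {
  try {
    const res = await fetch(`${OBSERVER_URL}/event`, {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify(payload),
      signal: AbortSignal.timeout(2000), // 2s cap from the notes
    });
    return res.ok;
  } catch {
    return false; // observer down or slow: drop the event, keep moving
  }
}
```

In scenario.ts this fires after ctx.results.push(result) with staffer_id, sig_hash, event_kind, and the rescue flags; the return value can be ignored for fire-and-forget use.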
VERIFIED:
curl POST /event → {"accepted":true,"ring_size":1}
curl GET /stats → {"total":1,"successes":1,"by_source":{"scenario":1},
"recent_scenario_ops":[{...staffer_id,kind,role}]}
Final v3 demo leaderboard (9 runs per staffer, cumulative over 3 batches):
James (local): 92.9% fill, 36.8 cites, score 0.775 — RANK 1
Maria (full): 81.0% fill, 26.2 cites, score 0.727
Sam (basic): 61.9% fill, 28.2 cites, score 0.640
Alex (minimal): 59.5% fill, 32.2 cites, score 0.631
Honest finding: Alex has MORE citations than Sam despite NO T3 and NO
rescue. Playbook inheritance alone is firing hardest when the overseer is
absent. The 59.5% fill rate (up from 0% when qwen2.5 was executor) shows
that cloud-exec + playbook inheritance is the floor the architecture
delivers.
Local gpt-oss:20b T3 outperforms cloud gpt-oss:120b T3 by 12 pts of fill
rate on this workload; the cloud overseer pays latency + variance for no
measurable gain. Worth flagging in the next models.json tune.