# Multi-Coordinator Stress Test: Run 001

**Generated:** 2026-04-30T12:54:09.621556469Z
**Coordinators:** alice / bob / carol (each with own playbook namespace: `playbook_alice` / `playbook_bob` / `playbook_carol`)
**Contracts:** alpha_milwaukee_distribution / beta_indianapolis_manufacturing / gamma_chicago_construction
**Corpora:** `workers,ethereal_workers`
**K per query:** 8
**Total events captured:** 52
**Evidence:** `reports/reality-tests/multi_coord_stress_001.json`

---

## Diversity: is the system locking into scenarios or cycling?

| Metric | Mean Jaccard | n pairs | Interpretation |
|---|---:|---:|---|
| Same role across different contracts | 0 | 0 | Lower = more diverse (different region/cert mix → different workers) |
| Different roles within same contract | 0.003703703703703704 | 18 | Should be near-zero (different roles = different worker pools) |

Note: the same-role row has n pairs = 0, so no same-role comparison across contracts was made in this run. Its mean of 0 is a placeholder, not a measured value.

**Healthy ranges:**
- Same role across contracts: < 0.30 means the system is genuinely picking different workers per region/contract.
- Different roles same contract: < 0.10 means role-specific retrieval is working.
- If either is > 0.50, the system is "cycling" the same handful of workers regardless of query intent.

---

## Determinism: same query reissued, top-K stability

| Metric | Value |
|---|---:|
| Mean Jaccard on retrieval-only reissue | 1 |
| Number of reissue pairs | 12 |

**Interpretation:**
- ≥ 0.95: HNSW retrieval is highly deterministic; reissues land on near-identical top-K. Good: the system locks into a stable view of "best workers for this query."
- 0.80 – 0.95: Some HNSW or embedding variance; acceptable.
- < 0.80: Retrieval is unstable: reissues see substantially different results, suggesting either embedding nondeterminism (Ollama returning slightly different vectors) or vectord nondeterminism (HNSW insertion order affecting recall).

---

## Learning: handover hit rate

Bob takes Alice's contract using Alice's playbook namespace. Did Alice's recorded answers surface in Bob's results?

| Metric | Value |
|---|---:|
| Handover queries run | 4 |
| Alice's recorded answer at Bob's top-1 | 4 |
| Alice's recorded answer in Bob's top-K | 4 |
| **Handover hit rate (top-1)** | **1** |

**Interpretation:**
- Hit rate ≥ 0.5: handover is meaningful: the second coordinator inherits the first's institutional memory.
- Hit rate ≈ 0.0: playbook namespace isolation is working but the playbook itself isn't transferable, OR Bob's queries don't match Alice's recordings closely enough.

---

## Per-event capture

All `matrix.search` responses live in the JSON: top-K with worker IDs, distances, and per-corpus counts. Search by phase:

```bash
jq '.events[] | select(.phase == "merge")' reports/reality-tests/multi_coord_stress_001.json
jq '.events[] | select(.coordinator == "alice" and .phase == "baseline")' reports/reality-tests/multi_coord_stress_001.json
jq '.events[] | select(.role == "warehouse worker") | {phase, contract, top_k_ids: [.top_k[].id]}' reports/reality-tests/multi_coord_stress_001.json
```

---

## What's NOT in this run (Phase 1 deliberately defers)

- **48-hour clock.** Events fire as discrete steps, not on a timeline.
- **Email / SMS ingest.** No endpoints exist on the Go side yet.
- **New-resume injection mid-run.** The corpus is fixed at the start.
- **Langfuse traces.** Need Go-side wiring.

These are Phase 2/3. The Phase 1 substrate is what the time-based runner will mount on top of.
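
---

For readers who want to sanity-check the Jaccard and hit-rate figures reported above, here is a minimal sketch of how such metrics can be computed from lists of top-K worker IDs. It is not the runner's actual implementation: the function names, the all-pairs averaging, and the sample worker IDs are assumptions made for illustration, and the real harness may pair queries differently.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two ID lists: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty result sets
    return len(a & b) / len(union)


def mean_pairwise_jaccard(id_lists):
    """Mean Jaccard over all unordered pairs, plus the pair count.

    Returns (None, 0) when there are no pairs, so an empty comparison
    is not mistaken for a measured value of 0.
    """
    pairs = list(combinations(id_lists, 2))
    if not pairs:
        return None, 0
    total = sum(jaccard(x, y) for x, y in pairs)
    return total / len(pairs), len(pairs)


def top1_hit_rate(results):
    """Fraction of queries whose recorded answer is ranked first.

    `results` is a list of (recorded_id, top_k_ids) tuples
    (a hypothetical shape, not the report's JSON schema).
    """
    if not results:
        return None
    hits = sum(1 for recorded, top_k in results if top_k and top_k[0] == recorded)
    return hits / len(results)


if __name__ == "__main__":
    # Hypothetical worker IDs, for illustration only.
    print(jaccard(["w1", "w2", "w3"], ["w3", "w4", "w5"]))  # 0.2
    print(mean_pairwise_jaccard([["w1", "w2"], ["w1", "w2"]]))  # (1.0, 1)
    print(top1_hit_rate([("w1", ["w1", "w2"]), ("w3", ["w4", "w3"])]))  # 0.5
```

Returning `None` for an empty pair set is a deliberate choice in this sketch: it keeps a row with `n pairs = 0` distinguishable from a genuinely measured mean of 0.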