Three coordinators (alice / bob / carol) with three contracts
(Milwaukee distribution / Indianapolis manufacturing / Chicago
construction). 7-phase scenario runner: baseline → surge → merge →
handover → split → reissue → analysis. Each coord has a separate
playbook namespace (playbook_{name}) so institutional memory stays
isolated by default but transferable on demand.
Phase 1 deliberately skips the 48-hour clock, email/SMS endpoints,
and Langfuse tracing — those are Phase 2/3.
Run #001 (52 events, 4 queries × 3 coords × 2 demand flavors):
Diversity:
Different-roles-same-contract Jaccard = 0.004 (n=18)
→ role-specific retrieval is working perfectly. Different
roles within one contract pull totally different worker
pools. System is NOT cycling; locks into per-role retrieval.
Same-role-across-contracts Jaccard = N/A (n=0)
→ TEST-DESIGN ISSUE: the 3 contracts use distinct role names
per industry (warehouse worker / production worker / general
laborer), so no exact-name overlaps exist. Phase 2 should
either share at least one role across contracts OR add a
skill-based diversity metric.
Determinism: Jaccard = 1.000 (n=12)
→ HNSW + Ollama retrieval is fully deterministic on identical
query text. coder/hnsw + nomic-embed-text are stable.
Learning: handover hit rate = 4/4 = 100%
→ Bob inherits Alice's recordings perfectly when bob runs
identical queries with alice's playbook namespace. CAVEAT:
this tests the trivial verbatim case, not paraphrase handover.
The harder test (bob runs paraphrased queries with alice's
playbook) is Phase 2 work.
Per-event capture in JSON: every matrix.search response is logged
with phase / coordinator / contract / role / query / top-K IDs +
distances + per-corpus counts + boosted/injected counts. Reviewable
via:
jq '.events[] | select(.phase == "merge")'
jq '.events[] | select(.coordinator == "alice")'
jq '.events[] | select(.role == "warehouse worker")'
Notable finding from per-event: carol's "general laborer" and "crane
operator" queries both surface w-1009 as top-1, with crane operator
at distance 0.098 (very tight) and general laborer at 0.297. The
system found a worker who legitimately covers both roles — realistic
for small construction crews.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>