2 Commits

Author SHA1 Message Date
root
6b71c8e9b2 Phase 23 — contract terms + staffer identity + competence-weighted retrieval
Matrix-index the "who handled this" dimension so top staffers become
the training signal and juniors inherit their playbooks automatically
via the boost pipeline. Auto-discovered indicators emerge from
comparing trajectories across staffers on similar contracts — that was
always the architectural point; this wires the last piece.

ContractTerms:
- deadline, budget_total_usd, budget_per_hour_max, local_bonus_per_hour,
  local_bonus_radius_mi, fill_requirement ("paramount" | "preferred")
- Attached to ScenarioSpec, propagated into T3 checkpoint + cloud
  rescue prompts so cloud reasons about trade-offs (pivot within bonus
  radius first; respect per-hour cap; split across cities when
  fill_requirement=paramount).
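
A minimal TypeScript sketch of the shape described above, to make the
trade-off inputs concrete. Field names are from this commit; the exact
types and the contract_terms field name on ScenarioSpec are assumptions:

  // Sketch only: types assumed from the field list above.
  interface ContractTerms {
    deadline: string;                  // assumed ISO-8601 date string
    budget_total_usd: number;
    budget_per_hour_max: number;
    local_bonus_per_hour: number;      // premium paid inside the bonus radius
    local_bonus_radius_mi: number;
    fill_requirement: "paramount" | "preferred";
  }

  interface ScenarioSpec {
    // ...existing fields...
    contract_terms?: ContractTerms;    // hypothetical field name; optional
  }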

Staffer:
- {id, name, tenure_months, role: senior|mid|junior|trainee}
- On ScenarioSpec; logged at scenario start; attached to KB outcome
- Recomputed StafferStats written to data/_kb/staffers.jsonl after
  every run: total_runs, fill_rate, avg_turns, avg_citations,
  rescue_rate, competence_score.
- Competence formula: 0.45*fill_rate + 0.20*turn_efficiency +
  0.20*citation_density + 0.15*rescue_rate. Normalized to 0..1.
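
A sketch of the competence formula in TypeScript; the weights are from
this commit, while the 0..1 normalization of turn_efficiency and
citation_density is assumed to happen upstream:

  // Weights from this commit; inputs assumed pre-normalized to 0..1.
  function competenceScore(s: {
    fill_rate: number;
    turn_efficiency: number;   // assumed: higher = fewer turns per fill
    citation_density: number;  // assumed: avg_citations mapped into 0..1
    rescue_rate: number;
  }): number {
    const raw =
      0.45 * s.fill_rate +
      0.20 * s.turn_efficiency +
      0.20 * s.citation_density +
      0.15 * s.rescue_rate;
    return Math.min(1, Math.max(0, raw)); // clamp to 0..1
  }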

findNeighbors now returns weighted_score = cosine × best_staffer_competence,
with the competence factor floored at 0.3 so high-similarity,
low-competence neighbors still surface. The pathway_recommender prompt
shows the top staffer's identity so cloud knows WHOSE playbook it's
synthesizing from.
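
A sketch of the weighting, assuming the 0.3 floor applies to the
competence factor rather than the product:

  const COMPETENCE_FLOOR = 0.3;

  // Cosine similarity is scaled by the best staffer's competence, but a
  // weak staffer can only dampen a strong match to 30%, not erase it.
  function weightedScore(cosine: number, bestStafferCompetence: number): number {
    return cosine * Math.max(COMPETENCE_FLOOR, bestStafferCompetence);
  }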

Demo infrastructure:
- tests/multi-agent/gen_staffer_demo.ts: 4 personas (Maria senior,
  James mid, Sam junior, Alex trainee) × 3 contracts (Nashville Welder,
  Joliet Warehouse, Indianapolis Assembly). 12 scenarios total.
- scripts/run_staffer_demo.sh: runs the 12 sequentially with
  LH_OVERVIEW_CLOUD=1. Post-run calls kb_staffer_report.py.
- scripts/kb_staffer_report.py: leaderboard + cross-staffer worker
  overlap (names endorsed by ≥2 staffers → auto-discovered high-value
  workers), plus the top-vs-bottom staffer differential. The overlap
  logic is sketched after this list.
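
The report script is Python, but the overlap logic is simple enough to
sketch in TypeScript (the endorsement record shape is an assumption):

  // A worker endorsed by >=2 distinct staffers counts as auto-discovered
  // high-value.
  function highValueWorkers(
    endorsements: { staffer_id: string; worker_name: string }[],
  ): string[] {
    const byWorker = new Map<string, Set<string>>();
    for (const e of endorsements) {
      const s = byWorker.get(e.worker_name) ?? new Set<string>();
      s.add(e.staffer_id);
      byWorker.set(e.worker_name, s);
    }
    return [...byWorker]
      .filter(([, staffers]) => staffers.size >= 2)
      .map(([name]) => name);
  }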

gen_scenarios.ts (Phase 22 generator) also now emits contract terms
on 70% of generated specs — future KB batches populate with realistic
constraint patterns instead of bare role+city+count.

The stress scenario from item A is intentionally NOT the production
test. Real staffing has constraints; the Nashville contract + staffer
demo is the honest test of whether the architecture produces a
measurable differential between coordinator skill levels.

Demo batch launched — 12 runs × ~3min each ≈ 40min unattended. Report
emitted after batch.
2026-04-20 22:16:09 -05:00
root
330cb90f99 Lift k cap, drop ornamental reason field, scenario generator
ITEM 1 — k CAP + REASON FIELD
The hybrid_search default k was hard-coded to 10. For multi-fill events
(5× expansion, 4× emergency) that meant a pool of 10 for a
propose-5-of-10 decision: half the candidates become the answer, with
no room for rejection. The executor prompt now instructs k to scale
with target_count: k = max(count*5, 20), capped at 80. The default in
the helper was bumped 10 → 20.
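
The scaling rule as a one-liner (helper name hypothetical):

  // k = max(count*5, 20), capped at 80.
  function searchK(targetCount: number): number {
    return Math.min(80, Math.max(targetCount * 5, 20));
  }
  // 5x expansion -> k=25; 4x emergency -> k=20; count >= 16 hits the cap.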

Fill.reason dropped from required to optional. Nothing downstream ever
consumed it — resolveWorkerIds, sealSale, and retrospective all use
candidate_id and name. Models loved to write 100-150 char
justifications per fill; on 4+ fills that blew the JSON budget before
the structure closed. Test 1's result after this change: the FIRST
EVER 5/5 on the Riverfront Steel scenario, 13 total turns across 5
events. The event that failed last run (emergency 4×Loader with a
truncated reason-field continuation) now clears in 2 turns.
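
A sketch of the type change, with the surrounding fields assumed:

  // reason drops from required to optional; nothing downstream reads it.
  interface Fill {
    candidate_id: string;
    name: string;
    reason?: string;  // previously required; models burned 100-150 chars here
    // ...other fields elided...
  }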

Progression:
  mistral baseline:                        0/5
  qwen3.5 + continuation + think:false:    4/5
  qwen3.5 + k=20 + no-reason:              5/5 ✓

ITEM 2 — SCENARIO GENERATOR (NOT YET TESTED E2E)
tests/multi-agent/gen_scenarios.ts emits N deterministic ScenarioSpecs
with varied clients (15 companies), cities (20 Midwest cities known to
exist in workers_500k), role mixes (14 industrial staffing roles,
realistically weighted), and event sequences. Each spec gets a unique
sig_hash so the KB populates with distinct neighbor signatures.
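
A sketch of how a unique, deterministic sig_hash could be derived per
spec. The hashing scheme is an assumption; the commit only requires
that signatures be distinct and stable:

  import { createHash } from "node:crypto";

  // Hash the identifying fields of a spec so identical inputs always
  // produce the same signature and varied specs never collide.
  function sigHash(spec: { client: string; city: string; roles: string[] }): string {
    return createHash("sha256").update(JSON.stringify(spec)).digest("hex").slice(0, 12);
  }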

scripts/run_kb_batch.sh runs all generated specs sequentially against
scenario.ts, logs per-scenario outcomes, and reports KB state at the
end. Each run takes ~2-4min; 20-30 scenarios = 1-2hr unattended.

Next: test the generator + batch on a small N (3-5) to verify that the
KB populates correctly and pathway recommendations start getting
neighbor signal instead of cold starts. Then item 3 (Rust re-weighting
of hybrid_search by playbook_memory success).
2026-04-20 20:31:34 -05:00