2 Commits

Author SHA1 Message Date
root
6b71c8e9b2 Phase 23 — contract terms + staffer identity + competence-weighted retrieval
Matrix-index the "who handled this" dimension so top staffers become
the training signal and juniors inherit their playbooks automatically
via the boost pipeline. Auto-discovered indicators emerge from
comparing trajectories across staffers on similar contracts — that was
always the architectural point; this wires the last piece.

ContractTerms:
- deadline, budget_total_usd, budget_per_hour_max, local_bonus_per_hour,
  local_bonus_radius_mi, fill_requirement ("paramount" | "preferred")
- Attached to ScenarioSpec, propagated into T3 checkpoint + cloud
  rescue prompts so cloud reasons about trade-offs (pivot within bonus
  radius first; respect per-hour cap; split across cities when
  fill_requirement=paramount).
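
A minimal TypeScript sketch of the shape described above, to make the
trade-off inputs concrete. Field names are from this commit; the exact
types and the contract_terms field name on ScenarioSpec are assumptions:

  // Sketch only: types assumed from the field list above.
  interface ContractTerms {
    deadline: string;                  // assumed ISO-8601 date string
    budget_total_usd: number;
    budget_per_hour_max: number;
    local_bonus_per_hour: number;      // premium paid inside the bonus radius
    local_bonus_radius_mi: number;
    fill_requirement: "paramount" | "preferred";
  }

  interface ScenarioSpec {
    // ...existing fields...
    contract_terms?: ContractTerms;    // hypothetical field name; optional
  }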

Staffer:
- {id, name, tenure_months, role: senior|mid|junior|trainee}
- On ScenarioSpec; logged at scenario start; attached to KB outcome
- Recomputed StafferStats written to data/_kb/staffers.jsonl after
  every run: total_runs, fill_rate, avg_turns, avg_citations,
  rescue_rate, competence_score.
- Competence formula: 0.45*fill_rate + 0.20*turn_efficiency +
  0.20*citation_density + 0.15*rescue_rate. Normalized to 0..1.
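
A sketch of the competence formula in TypeScript; the weights are from
this commit, while the 0..1 normalization of turn_efficiency and
citation_density is assumed to happen upstream:

  // Weights from this commit; inputs assumed pre-normalized to 0..1.
  function competenceScore(s: {
    fill_rate: number;
    turn_efficiency: number;   // assumed: higher = fewer turns per fill
    citation_density: number;  // assumed: avg_citations mapped into 0..1
    rescue_rate: number;
  }): number {
    const raw =
      0.45 * s.fill_rate +
      0.20 * s.turn_efficiency +
      0.20 * s.citation_density +
      0.15 * s.rescue_rate;
    return Math.min(1, Math.max(0, raw)); // clamp to 0..1
  }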

findNeighbors now returns weighted_score = cosine × best_staffer_competence,
with the competence factor floored at 0.3 so high-similarity,
low-competence neighbors still surface. The pathway_recommender prompt
shows the top staffer's identity so cloud knows WHOSE playbook it's
synthesizing from.
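
A sketch of the weighting, assuming the 0.3 floor applies to the
competence factor rather than the product:

  const COMPETENCE_FLOOR = 0.3;

  // Cosine similarity is scaled by the best staffer's competence, but a
  // weak staffer can only dampen a strong match to 30%, not erase it.
  function weightedScore(cosine: number, bestStafferCompetence: number): number {
    return cosine * Math.max(COMPETENCE_FLOOR, bestStafferCompetence);
  }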

Demo infrastructure:
- tests/multi-agent/gen_staffer_demo.ts: 4 personas (Maria senior,
  James mid, Sam junior, Alex trainee) × 3 contracts (Nashville Welder,
  Joliet Warehouse, Indianapolis Assembly). 12 scenarios total.
- scripts/run_staffer_demo.sh: runs the 12 sequentially with
  LH_OVERVIEW_CLOUD=1. Post-run calls kb_staffer_report.py.
- scripts/kb_staffer_report.py: leaderboard + cross-staffer worker
  overlap (names endorsed by ≥2 staffers → auto-discovered high-value
  workers), plus the top-vs-bottom staffer differential. The overlap
  logic is sketched after this list.
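
The report script is Python, but the overlap logic is simple enough to
sketch in TypeScript (the endorsement record shape is an assumption):

  // A worker endorsed by >=2 distinct staffers counts as auto-discovered
  // high-value.
  function highValueWorkers(
    endorsements: { staffer_id: string; worker_name: string }[],
  ): string[] {
    const byWorker = new Map<string, Set<string>>();
    for (const e of endorsements) {
      const s = byWorker.get(e.worker_name) ?? new Set<string>();
      s.add(e.staffer_id);
      byWorker.set(e.worker_name, s);
    }
    return [...byWorker]
      .filter(([, staffers]) => staffers.size >= 2)
      .map(([name]) => name);
  }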

gen_scenarios.ts (Phase 22 generator) also now emits contract terms
on 70% of generated specs — future KB batches populate with realistic
constraint patterns instead of bare role+city+count.

The stress scenario from item A is intentionally NOT the production
test. Real staffing has constraints; the Nashville contract + staffer
demo is the honest test of whether the architecture produces a
measurable differential between coordinator skill levels.

Demo batch launched — 12 runs × ~3min each ≈ 40min unattended. Report
emitted after batch.
2026-04-20 22:16:09 -05:00
root
330cb90f99 Lift k cap, drop ornamental reason field, scenario generator
ITEM 1 — k CAP + REASON FIELD
The hybrid_search default k was hard-coded to 10. For multi-fill events
(5× expansion, 4× emergency) that meant a pool of 10 for a
propose-5-of-10 decision: half the candidates become the answer, with
no room for rejection. The executor prompt now instructs k to scale
with target_count: k = max(count*5, 20), capped at 80. The default in
the helper was bumped 10 → 20.
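
The scaling rule as a one-liner (helper name hypothetical):

  // k = max(count*5, 20), capped at 80.
  function searchK(targetCount: number): number {
    return Math.min(80, Math.max(targetCount * 5, 20));
  }
  // 5x expansion -> k=25; 4x emergency -> k=20; count >= 16 hits the cap.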

Fill.reason dropped from required to optional. Nothing downstream ever
consumed it — resolveWorkerIds, sealSale, and retrospective all use
candidate_id and name. Models loved to write 100-150 char
justifications per fill; on 4+ fills that blew the JSON budget before
the structure closed. Test 1's result after this change: the FIRST
EVER 5/5 on the Riverfront Steel scenario, 13 total turns across 5
events. The event that failed last run (emergency 4×Loader with a
truncated reason-field continuation) now clears in 2 turns.
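
A sketch of the type change, with the surrounding fields assumed:

  // reason drops from required to optional; nothing downstream reads it.
  interface Fill {
    candidate_id: string;
    name: string;
    reason?: string;  // previously required; models burned 100-150 chars here
    // ...other fields elided...
  }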

Progression:
  mistral baseline:                        0/5
  qwen3.5 + continuation + think:false:    4/5
  qwen3.5 + k=20 + no-reason:              5/5 ✓

ITEM 2 — SCENARIO GENERATOR (NOT YET TESTED E2E)
tests/multi-agent/gen_scenarios.ts emits N deterministic ScenarioSpecs
with varied clients (15 companies), cities (20 Midwest cities known to
exist in workers_500k), role mixes (14 industrial staffing roles,
realistically weighted), and event sequences. Each spec gets a unique
sig_hash so the KB populates with distinct neighbor signatures.
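
A sketch of how a unique, deterministic sig_hash could be derived per
spec. The hashing scheme is an assumption; the commit only requires
that signatures be distinct and stable:

  import { createHash } from "node:crypto";

  // Hash the identifying fields of a spec so identical inputs always
  // produce the same signature and varied specs never collide.
  function sigHash(spec: { client: string; city: string; roles: string[] }): string {
    return createHash("sha256").update(JSON.stringify(spec)).digest("hex").slice(0, 12);
  }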

scripts/run_kb_batch.sh runs all generated specs sequentially against
scenario.ts, logs per-scenario outcomes, and reports KB state at the
end. Each run takes ~2-4min; 20-30 scenarios = 1-2hr unattended.

Next: test the generator + batch on a small N (3-5) to verify that the
KB populates correctly and pathway recommendations start getting
neighbor signal instead of cold starts. Then item 3 (Rust re-weighting
of hybrid_search by playbook_memory success).
2026-04-20 20:31:34 -05:00