Matrix-index the "who handled this" dimension so top staffers become
the training signal and juniors inherit their playbooks automatically
via the boost pipeline. Auto-discovered indicators emerge from
comparing trajectories across staffers on similar contracts — that was
always the architectural point; this wires the last piece.
ContractTerms:
- deadline, budget_total_usd, budget_per_hour_max, local_bonus_per_hour,
local_bonus_radius_mi, fill_requirement ("paramount" | "preferred")
- Attached to ScenarioSpec, propagated into T3 checkpoint + cloud
rescue prompts so cloud reasons about trade-offs (pivot within bonus
radius first; respect per-hour cap; split across cities when
fill_requirement=paramount).
Staffer:
- {id, name, tenure_months, role: senior|mid|junior|trainee}
- On ScenarioSpec; logged at scenario start; attached to KB outcome
- Recomputed StafferStats written to data/_kb/staffers.jsonl after
every run: total_runs, fill_rate, avg_turns, avg_citations,
rescue_rate, competence_score.
- Competence formula: 0.45*fill_rate + 0.20*turn_efficiency +
0.20*citation_density + 0.15*rescue_rate. Normalized to 0..1.
findNeighbors now returns weighted_score = cosine × best_staffer_competence
(floored at 0.3 so high-similarity low-competence neighbors still
surface). pathway_recommender prompt shows the top staffer's identity
so cloud knows WHOSE playbook it's synthesizing from.
Demo infrastructure:
- tests/multi-agent/gen_staffer_demo.ts: 4 personas (Maria senior,
James mid, Sam junior, Alex trainee) × 3 contracts (Nashville Welder,
Joliet Warehouse, Indianapolis Assembly). 12 scenarios total.
- scripts/run_staffer_demo.sh: runs the 12 sequentially with
LH_OVERVIEW_CLOUD=1. Post-run calls kb_staffer_report.py.
- scripts/kb_staffer_report.py: leaderboard + cross-staffer worker
overlap (names endorsed by ≥2 staffers → auto-discovered high-value
workers). Top vs bottom differential.
gen_scenarios.ts (Phase 22 generator) also now emits contract terms
on 70% of generated specs — future KB batches populate with realistic
constraint patterns instead of bare role+city+count.
Stress scenario from item A intentionally NOT the production test.
Real staffing has constraints; Nashville contract + staffer demo is
the honest test of whether the architecture produces measurable
differential between coordinator skill levels.
Demo batch launched — 12 runs × ~3min each ≈ 40min unattended. Report
emitted after batch.
47 lines
1.9 KiB
Bash
Executable File
47 lines
1.9 KiB
Bash
Executable File
#!/usr/bin/env bash
|
||
# Run the 12 staffer-demo scenarios (4 personas × 3 contracts) with
|
||
# cloud T3 enabled. After each run, KB indexes the outcome and
|
||
# recomputes that staffer's competence_score. By the end, the top
|
||
# staffer's playbooks will be surfacing first in neighbor retrieval.
|
||
|
||
set -e
|
||
cd "$(dirname "$0")/.."
|
||
|
||
export OLLAMA_CLOUD_KEY="$(python3 -c "import json; print(json.load(open('/root/llm_team_config.json'))['providers']['ollama_cloud']['api_key'])" 2>/dev/null || echo '')"
|
||
|
||
MANIFEST="tests/multi-agent/scenarios/staffer_demo/manifest.json"
|
||
if [ ! -f "$MANIFEST" ]; then
|
||
echo "✗ no manifest — run: bun tests/multi-agent/gen_staffer_demo.ts"
|
||
exit 1
|
||
fi
|
||
|
||
START_TS=$(date -Iseconds)
|
||
LOG_DIR="/tmp/lakehouse_staffer_demo_$(date +%s)"
|
||
mkdir -p "$LOG_DIR"
|
||
echo "▶ Staffer demo start: $START_TS, logs → $LOG_DIR"
|
||
|
||
python3 -c "
|
||
import json
|
||
m = json.load(open('$MANIFEST'))
|
||
for s in m['scenarios']:
|
||
print(s['file'], '|', s['staffer'], '|', s['contract'])
|
||
" | while IFS='|' read -r SCEN STAFFER CONTRACT; do
|
||
SCEN=$(echo "$SCEN" | xargs)
|
||
STAFFER=$(echo "$STAFFER" | xargs)
|
||
CONTRACT=$(echo "$CONTRACT" | xargs)
|
||
SPEC="tests/multi-agent/scenarios/staffer_demo/$SCEN"
|
||
BASE=$(basename "$SPEC" .json)
|
||
LOG="$LOG_DIR/${BASE}.log"
|
||
echo " ▶ $STAFFER × $CONTRACT"
|
||
LH_OVERVIEW_CLOUD=1 bun tests/multi-agent/scenario.ts "$SPEC" > "$LOG" 2>&1 || true
|
||
OK=$(grep -oP '\d+/\d+ events succeeded' "$LOG" | tail -1 || echo "no-result")
|
||
RESCUES=$(grep -c "cloud rescue requested" "$LOG" || true)
|
||
RESCUE_OK=$(grep -c "retry outcome: ✓" "$LOG" || true)
|
||
SIG=$(grep -oP 'KB indexed: sig=\K[a-f0-9]+' "$LOG" | tail -1 || echo "-")
|
||
echo " → $OK; rescues=$RESCUES (${RESCUE_OK} succeeded); sig=$SIG"
|
||
done
|
||
|
||
echo "▶ Staffer demo done: $(date -Iseconds)"
|
||
echo "▶ Staffer competence leaderboard:"
|
||
python3 scripts/kb_staffer_report.py 2>/dev/null || echo "(run scripts/kb_staffer_report.py after batch completes)"
|