ITEM 1 — k CAP + REASON FIELD

The hybrid_search default k was hard-coded to 10. For multi-fill events (5× expansion, 4× emergency) that meant pool=10 → propose 5-of-10: half the candidates become the answer, with no room for rejection. The executor prompt now instructs k to scale with target_count: k = max(count*5, 20), capped at 80. The default helper was bumped 10 → 20.

Fill.reason dropped from required to optional. Nothing downstream ever consumed it — resolveWorkerIds, sealSale, and the retrospective all use candidate_id and name. Models loved to write 100-150-char justifications per fill; on 4+ fills that blew the JSON budget before the structure closed.

Test run 1 result after this change: FIRST EVER 5/5 on the Riverfront Steel scenario, 13 total turns across 5 events. The event that failed last run (emergency 4×Loader with a truncated reason-field continuation) now clears in 2 turns.

Progression:
  mistral baseline: 0/5
  qwen3.5 + continuation + think:false: 4/5
  qwen3.5 + k=20 + no-reason: 5/5 ✓

ITEM 2 — SCENARIO GENERATOR (NOT YET TESTED E2E)

tests/multi-agent/gen_scenarios.ts emits N deterministic ScenarioSpecs with varied clients (15 companies), cities (20 Midwest cities known to exist in workers_500k), role mixes (14 industrial staffing roles, realistically weighted), and event sequences. Each spec gets a unique sig_hash so the KB populates with distinct neighbor signatures.

scripts/run_kb_batch.sh runs all generated specs sequentially against scenario.ts, logs per-scenario outcomes, and reports KB state at the end. Each run takes ~2-4 min; 20-30 scenarios = 1-2 hr unattended.

Next: test the generator + batch on a small N (3-5) to verify the KB populates correctly and pathway recommendations start getting neighbor signal instead of cold-starting. Then item 3 (Rust re-weighting of hybrid_search by playbook_memory success).
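The item 1 changes can be sketched in a few lines. A minimal sketch, assuming hypothetical names (`scaledK` is illustrative, not the actual helper; the `Fill` field list is abbreviated to the fields the log mentions):

```typescript
// Scale hybrid_search's candidate pool with the number of fills requested:
// k = max(count * 5, 20), capped at 80. The floor of 20 is the new default;
// the cap bounds retrieval cost on very large fill counts.
function scaledK(targetCount: number): number {
  return Math.min(Math.max(targetCount * 5, 20), 80);
}

// Fill.reason demoted from required to optional — downstream consumers
// (resolveWorkerIds, sealSale, retrospective) only read candidate_id/name.
interface Fill {
  candidate_id: string;
  name: string;
  reason?: string; // optional: omitting it keeps multi-fill JSON inside budget
}

// Examples: a 5× expansion now proposes 5-of-25 instead of 5-of-10.
console.log(scaledK(1)); // 20 (floor / new default)
console.log(scaledK(5)); // 25
console.log(scaledK(20)); // 80 (cap)
```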
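For item 2, one way the per-spec signature could work — purely illustrative, since gen_scenarios.ts's actual hashing and field names are not shown here: canonicalize the spec, then hash deterministically (FNV-1a below) so identical specs always land on the same KB signature and varied specs spread out.

```typescript
// Hypothetical shape of a generated spec; real ScenarioSpec fields may differ.
interface ScenarioSpec {
  client: string;
  city: string;
  roles: Record<string, number>; // role → headcount
}

// Deterministic signature: canonical JSON (sorted role keys) hashed with
// FNV-1a 32-bit. Any stable hash works; FNV-1a keeps the sketch dependency-free.
function sigHash(spec: ScenarioSpec): string {
  const canonical = JSON.stringify({
    client: spec.client,
    city: spec.city,
    roles: Object.fromEntries(Object.entries(spec.roles).sort()),
  });
  let h = 0x811c9dc5;
  for (let i = 0; i < canonical.length; i++) {
    h ^= canonical.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}
```

Determinism is what matters for the KB: re-running the generator must reproduce the same signatures, while distinct client/city/role mixes must produce distinct neighbor keys.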
44 lines
1.5 KiB
Bash
Executable File
#!/usr/bin/env bash
# Run all generated scenarios sequentially to populate the KB.
# Reads tests/multi-agent/scenarios/manifest.json and feeds each file
# to scenario.ts. Each scenario indexes into data/_kb/ automatically
# via the end-of-run hook. Exit code: 0 if all scenarios completed
# (event failures are NOT failures for the batch — we want the KB to
# record both successes AND failures).

set -e
cd "$(dirname "$0")/.."

export OLLAMA_CLOUD_KEY="$(python3 -c "import json; print(json.load(open('/root/llm_team_config.json'))['providers']['ollama_cloud']['api_key'])" 2>/dev/null || echo '')"

MANIFEST="tests/multi-agent/scenarios/manifest.json"
if [ ! -f "$MANIFEST" ]; then
  echo "✗ no manifest at $MANIFEST — run: bun tests/multi-agent/gen_scenarios.ts <N>"
  exit 1
fi

START_TS=$(date -Iseconds)
LOG_DIR="/tmp/lakehouse_kb_batch_$(date +%s)"
mkdir -p "$LOG_DIR"
echo "▶ KB batch start: $START_TS, logs → $LOG_DIR"

python3 -c "
import json
m = json.load(open('$MANIFEST'))
for s in m['scenarios']:
    print(s['file'])
" | while read -r SCEN; do
  SPEC="tests/multi-agent/scenarios/$SCEN"
  BASE=$(basename "$SPEC" .json)
  LOG="$LOG_DIR/${BASE}.log"
  echo "  ▶ $SCEN"
  bun tests/multi-agent/scenario.ts "$SPEC" > "$LOG" 2>&1 || true
  OK=$(grep -oP '\d+/\d+ events succeeded' "$LOG" | tail -1 || echo "no-result")
  SIG=$(grep -oP 'KB indexed: sig=\K[a-f0-9]+' "$LOG" | tail -1 || echo "-")
  echo "    → $OK; sig=$SIG"
done

echo "▶ KB batch done: $(date -Iseconds)"
echo "▶ KB state:"
wc -l data/_kb/*.jsonl 2>/dev/null || true