Proves cloud passthrough works end-to-end AND fixes the diagnostic
quality problem that the first run surfaced.
STRESS SCENARIO (tests/multi-agent/scenarios/stress_01.json):
Five genuinely hard events with varied failure modes:
- Gary, IN 5× Electrician: ZERO supply (city not in workers_500k)
- Peoria, IL 8× Safety Coordinator: scarce role, initial pool only 5
- Flint, MI 3× Welder: ZERO supply
- Grand Rapids, MI 4× Tool & Die Maker: scarce but solvable
- Gary, IN 1× Electrician misplacement: repeats event 1's impossibility
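The failure modes above differ mainly in how many candidates the pool query returns for each (role, city) request. The following is a hypothetical TypeScript illustration, not code from the harness. `classifySupply` and its labels are assumptions. The only inputs taken from the text are the pool sizes: 0 for Gary and Flint, and 5 for Peoria's 8 slots.

```typescript
// Hypothetical labels for the stress scenario's failure modes.
type SupplyStatus = "zero_supply" | "short" | "fillable";

// Classify a (role, city) request by the size of the candidate pool returned.
// Illustrative rule only: the real harness may apply availability/reliability
// filters before counting, which this sketch does not model.
function classifySupply(poolSize: number, requested: number): SupplyStatus {
  if (poolSize === 0) return "zero_supply"; // e.g. Gary, Flint: nothing to rank
  if (poolSize < requested) return "short"; // e.g. Peoria: 5 candidates for 8 slots
  return "fillable"; // enough candidates; ranking decides who is dispatched
}

console.log(classifySupply(0, 5), classifySupply(5, 8), classifySupply(5, 4));
// zero_supply short fillable
```

Only a "zero_supply" result calls for a city pivot. A "short" result can still be partly filled locally and topped up from nearby cities.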
FIRST RUN (stress v1) — cloud passthrough works, diagnosis vague:
T3 checkpoint: "Potential drift flags for upcoming role"
Lesson: "Before dispatching, query pool status. Update turn counter..."
Generic tactical advice that doesn't address the real problem (zero
local supply in Gary and Flint).
Root cause: the T3 prompt saw only the outcome summary, not the raw
SQL/pool/drift signals the executor had in its log.
DIAGNOSTIC FIX:
- Added LogEntry[] `sharedLog` parameter to runAgentFill so the caller
retains the trace even when runAgentFill throws drift-abort.
- EventResult gained `diagnostic_log` field populated on both OK and
FAIL paths.
- extractDiagnostics() pulls SQL filters, hybrid_search row counts,
SQL errors, and reviewer drift notes from the log.
- Checkpoint prompt now includes FAILURE FORENSICS block for failed
events: SQL filters attempted, row counts, errors, drift reasons,
and an explicit teaching note about zero-supply detection.
- Cross-day lesson prompt flags each event with [ZERO-SUPPLY: pivot
city needed] tag when drift reasons mention "no match"/"no
candidates"/"0 rows". PRIORITY clause in the prompt tells the model
its lesson MUST name alternate cities when that tag appears.
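The diagnostic path in the bullets above can be sketched roughly as follows. This is a minimal TypeScript illustration under stated assumptions. The names `runAgentFill`, `sharedLog`, `EventResult`, `diagnostic_log` and `extractDiagnostics` come from this commit. The `LogEntry` shape, the stubbed executor body, the `runEvent`, `zeroSupplyTag` and `forensicsBlock` helpers, and the exact matching regex are all hypothetical. The key point is that the caller allocates the log array, so entries appended before a drift-abort throw are still available afterwards.

```typescript
// Hypothetical log shape; the real LogEntry type may differ.
interface LogEntry {
  kind: "sql_filter" | "hybrid_search" | "sql_error" | "drift_note";
  detail: string;
  rows?: number; // row count, set only on hybrid_search entries
}

interface EventResult {
  ok: boolean;
  diagnostic_log: LogEntry[]; // populated on both OK and FAIL paths
}

// Stand-in executor: appends to the caller-owned log, then aborts on drift.
function runAgentFill(sharedLog: LogEntry[]): void {
  sharedLog.push({ kind: "sql_filter", detail: "role = 'Welder' AND city = 'Flint'" });
  sharedLog.push({ kind: "hybrid_search", detail: "Welder x3 near Flint, MI", rows: 0 });
  sharedLog.push({ kind: "drift_note", detail: "no candidates match the filter" });
  throw new Error("drift-abort");
}

// Hypothetical caller: owns the log, so the trace survives the throw.
function runEvent(): EventResult {
  const sharedLog: LogEntry[] = [];
  try {
    runAgentFill(sharedLog);
    return { ok: true, diagnostic_log: sharedLog };
  } catch (err) {
    // Only drift-abort becomes a recorded failure; anything else propagates.
    if (!(err instanceof Error) || err.message !== "drift-abort") throw err;
    return { ok: false, diagnostic_log: sharedLog };
  }
}

interface Diagnostics {
  sqlFilters: string[];
  rowCounts: number[];
  sqlErrors: string[];
  driftReasons: string[];
}

// Pull the forensic signals out of the raw log, grouped by entry kind.
function extractDiagnostics(log: LogEntry[]): Diagnostics {
  return {
    sqlFilters: log.filter((e) => e.kind === "sql_filter").map((e) => e.detail),
    rowCounts: log.filter((e) => e.kind === "hybrid_search").map((e) => e.rows ?? 0),
    sqlErrors: log.filter((e) => e.kind === "sql_error").map((e) => e.detail),
    driftReasons: log.filter((e) => e.kind === "drift_note").map((e) => e.detail),
  };
}

// Zero-supply phrases from the commit; \b keeps "10 rows" from matching "0 rows".
const ZERO_SUPPLY = /no match|no candidates|\b0 rows\b/i;

function zeroSupplyTag(d: Diagnostics): string {
  return d.driftReasons.some((r) => ZERO_SUPPLY.test(r))
    ? "[ZERO-SUPPLY: pivot city needed]"
    : "";
}

// Render a FAILURE FORENSICS section for the checkpoint prompt (layout assumed).
function forensicsBlock(d: Diagnostics): string {
  return [
    "FAILURE FORENSICS",
    `SQL filters attempted: ${d.sqlFilters.join("; ") || "none"}`,
    `Row counts: ${d.rowCounts.join(", ") || "none"}`,
    `SQL errors: ${d.sqlErrors.join("; ") || "none"}`,
    `Drift reasons: ${d.driftReasons.join("; ") || "none"}`,
  ].join("\n");
}
```

Because the array is passed by reference, no special error type is needed to carry the trace out of the throw. The `catch` branch simply returns whatever the executor had already appended.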
SECOND RUN (stress v2 with enriched prompt) — cloud diagnosis sharp:
T3 after Flint: risk="Zero candidate supply for Welder in Flint"
hint="search Welder×3 in Saginaw, MI (≈30 mi) or
expand role to Metal Fabricator"
T3 after Gary: risk="Zero supply for Electrician in Gary, IN"
hint="Pivot to Chicago, IL (≈40 min); broaden to
Electrical Technician within 60 min radius"
Lesson: specific, per-city, with distances, role-broadening
fallback, and pre-loading strategy — actionable for item B retry.
Cloud 120b call latencies consistent: 4.8-8.0s per prompt. Cloud
passthrough proven under stress.
Fill outcomes unchanged at 1/5: three impossible events correctly
rejected, plus one failure from a JSON-emission edge case that
propagated through the retry-pivot reasoning. The knowledge to rescue
them now exists in the lesson; item B wires the retry.
tests/multi-agent/scenarios/stress_01.json (59 lines, 2.1 KiB, JSON):
{
  "client": "Ironclad Industrial",
  "date": "2026-04-22",
  "events": [
    {
      "kind": "baseline_fill",
      "at": "07:00",
      "role": "Electrician",
      "count": 5,
      "city": "Gary",
      "state": "IN",
      "shift_start": "07:00 AM",
      "scenario_note": "Gary IN has ZERO Electricians in the index. Local WILL fail this. Cloud should diagnose no-supply and recommend pivoting to Chicago IL (40min drive) or relaxing to 'Maintenance Tech'."
    },
    {
      "kind": "expansion",
      "at": "09:30",
      "role": "Safety Coordinator",
      "count": 8,
      "city": "Peoria",
      "state": "IL",
      "shift_start": "09:30 AM",
      "scenario_note": "Safety Coordinator is the rarest role overall (~4500 nationally). 8× in a mid-sized city with availability > 0.5 is genuinely tight. Cloud should either confirm or suggest multi-city sourcing."
    },
    {
      "kind": "emergency",
      "at": "11:45",
      "role": "Welder",
      "count": 3,
      "city": "Flint",
      "state": "MI",
      "shift_start": "12:00 PM",
      "deadline": "13:30",
      "scenario_note": "Flint MI has ZERO workers indexed — total data desert. Cloud must flag 'impossible supply' and recommend pivot (Detroit 60mi, Saginaw 40mi)."
    },
    {
      "kind": "expansion",
      "at": "14:00",
      "role": "Tool & Die Maker",
      "count": 4,
      "city": "Grand Rapids",
      "state": "MI",
      "shift_start": "02:00 PM",
      "scenario_note": "Tool & Die Maker is scarce (~9000 total). 4× in Grand Rapids, availability > 0.5 AND reliability > 0.75. Tight but solvable if playbook_memory has history; cloud should prioritize proven performers."
    },
    {
      "kind": "misplacement",
      "at": "15:30",
      "role": "Electrician",
      "count": 1,
      "city": "Gary",
      "state": "IN",
      "shift_start": "03:30 PM",
      "replaces_event": "07:00",
      "scenario_note": "Refilling 1× Electrician in Gary after a no-show. Same data desert as event 1 — cloud should recognize the repeat and recommend the SAME pivot it gave earlier, proving it learns within-run."
    }
  ]
}
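A consumer of the scenario file could describe its shape with a TypeScript type. This is a hypothetical sketch. The field names are copied from stress_01.json, but treating `deadline` and `replaces_event` as optional is inferred from these five events only, and the helper names are illustrative. `resolveReplaced` follows the file's convention that `replaces_event` holds the `at` value of the event being refilled. `twelveHourToMinutes` rejects malformed `shift_start` values such as "14:00 PM".

```typescript
// Hypothetical typing of tests/multi-agent/scenarios/stress_01.json.
type EventKind = "baseline_fill" | "expansion" | "emergency" | "misplacement";

interface ScenarioEvent {
  kind: EventKind;
  at: string; // 24-hour "HH:MM" event time
  role: string;
  count: number;
  city: string;
  state: string;
  shift_start: string; // 12-hour "hh:mm AM/PM"
  deadline?: string; // present on the emergency event only
  replaces_event?: string; // misplacement: the `at` of the event being refilled
  scenario_note: string;
}

interface Scenario {
  client: string;
  date: string;
  events: ScenarioEvent[];
}

// Parse "hh:mm AM/PM" into minutes after midnight; throw on malformed input.
function twelveHourToMinutes(t: string): number {
  const m = /^(\d{2}):(\d{2}) (AM|PM)$/.exec(t);
  if (!m) throw new Error(`malformed time: ${t}`);
  const hours = Number(m[1]);
  const minutes = Number(m[2]);
  if (hours < 1 || hours > 12 || minutes > 59) throw new Error(`malformed time: ${t}`);
  return ((hours % 12) + (m[3] === "PM" ? 12 : 0)) * 60 + minutes;
}

// Find the earlier event that a misplacement refills, matched on its `at` time.
function resolveReplaced(s: Scenario, e: ScenarioEvent): ScenarioEvent | undefined {
  if (e.replaces_event === undefined) return undefined;
  return s.events.find((x) => x.at === e.replaces_event);
}
```

Applied to this file, `resolveReplaced` maps the 15:30 misplacement back to the 07:00 Gary baseline fill. That is the repeat the scenario expects the cloud model to recognize.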