root c21b261877 Item A — stress scenario + enriched T3 diagnostic prompt
Proves cloud passthrough works end-to-end AND fixes the diagnostic
quality problem that the first run surfaced.

STRESS SCENARIO (tests/multi-agent/scenarios/stress_01.json):
Five genuinely hard events with varied failure modes:
- Gary, IN 5× Electrician: ZERO supply (city not in workers_500k)
- Peoria, IL 8× Safety Coordinator: scarce role, initial pool only 5
- Flint, MI 3× Welder: ZERO supply
- Grand Rapids, MI 4× Tool & Die Maker: scarce but solvable
- Gary, IN 1× Electrician misplacement: repeats event 1's impossibility

FIRST RUN (stress v1) — cloud passthrough works, diagnosis vague:
  T3 checkpoint: "Potential drift flags for upcoming role"
  Lesson: "Before dispatching, query pool status. Update turn counter..."
Generic tactical advice that doesn't address the real problem.
Root cause: T3 prompt only saw outcome summary, not the raw
SQL/pool/drift signals the executor had in its log.

DIAGNOSTIC FIX:
- Added LogEntry[] `sharedLog` parameter to runAgentFill so the caller
  retains the trace even when runAgentFill throws drift-abort.
- EventResult gained `diagnostic_log` field populated on both OK and
  FAIL paths.
- extractDiagnostics() pulls SQL filters, hybrid_search row counts,
  SQL errors, and reviewer drift notes from the log.
- Checkpoint prompt now includes FAILURE FORENSICS block for failed
  events: SQL filters attempted, row counts, errors, drift reasons,
  and an explicit teaching note about zero-supply detection.
- Cross-day lesson prompt flags each event with [ZERO-SUPPLY: pivot
  city needed] tag when drift reasons mention "no match"/"no
  candidates"/"0 rows". PRIORITY clause in the prompt tells the model
  its lesson MUST name alternate cities when that tag appears.
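
The fix above can be sketched roughly as follows. This is a minimal illustration, not the actual code: the commit names `LogEntry`, `sharedLog`, `extractDiagnostics`, and the `[ZERO-SUPPLY: pivot city needed]` tag, but the field layout of `LogEntry`, the `Diagnostics` shape, and the helpers `runAgentFillSketch` and `zeroSupplyTag` are assumptions made for the example.

```typescript
// Hypothetical shapes: the real LogEntry definition is not shown in the commit.
interface LogEntry {
  kind: "sql" | "hybrid_search" | "sql_error" | "drift";
  detail: string; // SQL filter text, error message, or reviewer drift note
  rows?: number;  // row count, set on hybrid_search entries
}

interface Diagnostics {
  sqlFilters: string[];
  rowCounts: number[];
  sqlErrors: string[];
  driftReasons: string[];
}

// Stand-in for runAgentFill: it appends to the caller's array, so the trace
// survives even when the function throws drift-abort.
function runAgentFillSketch(role: string, city: string, sharedLog: LogEntry[]): void {
  sharedLog.push({ kind: "sql", detail: `role = '${role}' AND city = '${city}'` });
  sharedLog.push({ kind: "hybrid_search", detail: "candidate lookup", rows: 0 });
  sharedLog.push({ kind: "drift", detail: "no candidates for requested role" });
  throw new Error("drift-abort");
}

// Group log entries by the four signal types the commit lists.
function extractDiagnostics(log: LogEntry[]): Diagnostics {
  const d: Diagnostics = { sqlFilters: [], rowCounts: [], sqlErrors: [], driftReasons: [] };
  for (const e of log) {
    if (e.kind === "sql") d.sqlFilters.push(e.detail);
    else if (e.kind === "hybrid_search") d.rowCounts.push(e.rows ?? 0);
    else if (e.kind === "sql_error") d.sqlErrors.push(e.detail);
    else d.driftReasons.push(e.detail);
  }
  return d;
}

// Tag an event when any drift reason mentions one of the three phrases
// from the commit message. \b keeps "10 rows" from matching "0 rows".
const ZERO_SUPPLY = /no match|no candidates|\b0 rows/i;
function zeroSupplyTag(d: Diagnostics): string {
  return d.driftReasons.some((r) => ZERO_SUPPLY.test(r))
    ? "[ZERO-SUPPLY: pivot city needed]"
    : "";
}

// The caller owns the log, so the FAIL path can still populate diagnostic_log.
const sharedLog: LogEntry[] = [];
let status = "OK";
try {
  runAgentFillSketch("Welder", "Flint", sharedLog);
} catch {
  status = "FAIL";
}
const diagnostics = extractDiagnostics(sharedLog);
const tag = zeroSupplyTag(diagnostics);
```

Passing the array in, rather than returning it, is what keeps the trace available after an abort: a return value would be lost when the function throws.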

SECOND RUN (stress v2 with enriched prompt) — cloud diagnosis sharp:
  T3 after Flint: risk="Zero candidate supply for Welder in Flint"
                  hint="search Welder×3 in Saginaw, MI (≈30 mi) or
                        expand role to Metal Fabricator"
  T3 after Gary:  risk="Zero supply for Electrician in Gary, IN"
                  hint="Pivot to Chicago, IL (≈40 min); broaden to
                        Electrical Technician within 60 min radius"
  Lesson: specific, per-city, with distances, role-broadening
  fallback, and pre-loading strategy — actionable for item B retry.

Cloud 120b call latencies consistent: 4.8-8.0s per prompt. Cloud
passthrough proven under stress.

Fill outcomes unchanged at 1/5: three impossible events correctly
rejected, plus one propagating JSON emission edge case on retry pivot
reasoning. The knowledge to rescue them now exists in the lesson;
item B wires the retry.
2026-04-20 21:54:29 -05:00

59 lines
2.1 KiB
JSON

{
"client": "Ironclad Industrial",
"date": "2026-04-22",
"events": [
{
"kind": "baseline_fill",
"at": "07:00",
"role": "Electrician",
"count": 5,
"city": "Gary",
"state": "IN",
"shift_start": "07:00 AM",
"scenario_note": "Gary IN has ZERO Electricians in the index. Local WILL fail this. Cloud should diagnose no-supply and recommend pivoting to Chicago IL (40min drive) or relaxing to 'Maintenance Tech'."
},
{
"kind": "expansion",
"at": "09:30",
"role": "Safety Coordinator",
"count": 8,
"city": "Peoria",
"state": "IL",
"shift_start": "09:30 AM",
"scenario_note": "Safety Coordinator is the rarest role overall (~4500 nationally). 8× in a mid-sized city with availability > 0.5 is genuinely tight. Cloud should either confirm or suggest multi-city sourcing."
},
{
"kind": "emergency",
"at": "11:45",
"role": "Welder",
"count": 3,
"city": "Flint",
"state": "MI",
"shift_start": "12:00 PM",
"deadline": "13:30",
"scenario_note": "Flint MI has ZERO workers indexed — total data desert. Cloud must flag 'impossible supply' and recommend pivot (Detroit 60mi, Saginaw 40mi)."
},
{
"kind": "expansion",
"at": "14:00",
"role": "Tool & Die Maker",
"count": 4,
"city": "Grand Rapids",
"state": "MI",
"shift_start": "02:00 PM",
"scenario_note": "Tool & Die Maker is scarce (~9000 total). 4× in Grand Rapids, availability > 0.5 AND reliability > 0.75. Tight but solvable if playbook_memory has history; cloud should prioritize proven performers."
},
{
"kind": "misplacement",
"at": "15:30",
"role": "Electrician",
"count": 1,
"city": "Gary",
"state": "IN",
"shift_start": "03:30 PM",
"replaces_event": "07:00",
"scenario_note": "Refilling 1× Electrician in Gary after a no-show. Same data desert as event 1 — cloud should recognize the repeat and recommend the SAME pivot it gave earlier, proving it learns within-run."
}
]
}
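
A scenario file of this shape can be sanity-checked before a run. The sketch below is an assumption-laden example, not part of the harness: `ScenarioEvent` mirrors the fields visible in the file, while `validateEvents`, the 24-hour reading of `at`, the 12-hour reading of `shift_start`, and the idea that `replaces_event` points at an earlier event's `at` value are all inferred from the data rather than documented.

```typescript
// Hypothetical type mirroring the fields visible in stress_01.json.
interface ScenarioEvent {
  kind: string;
  at: string;
  role: string;
  count: number;
  city: string;
  state: string;
  shift_start: string;
  deadline?: string;
  replaces_event?: string;
  scenario_note: string;
}

// Assumed formats: "at" as 24-hour HH:MM, "shift_start" as 12-hour HH:MM AM/PM.
const TWENTY_FOUR_HOUR = /^([01]\d|2[0-3]):[0-5]\d$/;
const TWELVE_HOUR = /^(0[1-9]|1[0-2]):[0-5]\d (AM|PM)$/;

// Returns a list of human-readable problems; an empty list means no issues found.
function validateEvents(events: ScenarioEvent[]): string[] {
  const problems: string[] = [];
  const seenAt = new Set<string>();
  events.forEach((ev, i) => {
    const label = `event ${i + 1}`;
    if (!TWENTY_FOUR_HOUR.test(ev.at)) {
      problems.push(`${label}: invalid at "${ev.at}"`);
    }
    if (!TWELVE_HOUR.test(ev.shift_start)) {
      problems.push(`${label}: invalid shift_start "${ev.shift_start}"`);
    }
    if (ev.count < 1) {
      problems.push(`${label}: count must be at least 1`);
    }
    // Assumption: replaces_event refers to the "at" value of an earlier event.
    if (ev.replaces_event !== undefined && !seenAt.has(ev.replaces_event)) {
      problems.push(`${label}: replaces_event "${ev.replaces_event}" matches no earlier event`);
    }
    seenAt.add(ev.at);
  });
  return problems;
}

// Usage with two made-up events: the second mixes 24-hour digits with a PM suffix.
const sample: ScenarioEvent[] = [
  { kind: "baseline_fill", at: "07:00", role: "Electrician", count: 5,
    city: "Gary", state: "IN", shift_start: "07:00 AM", scenario_note: "" },
  { kind: "misplacement", at: "15:30", role: "Electrician", count: 1,
    city: "Gary", state: "IN", shift_start: "15:30 PM",
    replaces_event: "07:00", scenario_note: "" },
];
const problems = validateEvents(sample);
```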