Proves cloud passthrough works end-to-end AND fixes the diagnostic
quality problem that first run surfaced.
STRESS SCENARIO (tests/multi-agent/scenarios/stress_01.json):
Five genuinely hard events with varied failure modes:
- Gary, IN 5× Electrician: ZERO supply (city not in workers_500k)
- Peoria, IL 8× Safety Coordinator: scarce role, initial pool only 5
- Flint, MI 3× Welder: ZERO supply
- Grand Rapids, MI 4× Tool & Die Maker: scarce but solvable
- Gary, IN 1× Electrician misplacement: repeats event 1's impossibility
FIRST RUN (stress v1) — cloud passthrough works, diagnosis vague:
T3 checkpoint: "Potential drift flags for upcoming role"
Lesson: "Before dispatching, query pool status. Update turn counter..."
Generic tactical advice that doesn't address the real problem.
Root cause: T3 prompt only saw outcome summary, not the raw
SQL/pool/drift signals the executor had in its log.
DIAGNOSTIC FIX:
- Added LogEntry[] `sharedLog` parameter to runAgentFill so the caller
retains the trace even when runAgentFill throws drift-abort.
- EventResult gained `diagnostic_log` field populated on both OK and
FAIL paths.
- extractDiagnostics() pulls SQL filters, hybrid_search row counts,
SQL errors, and reviewer drift notes from the log.
- Checkpoint prompt now includes FAILURE FORENSICS block for failed
events: SQL filters attempted, row counts, errors, drift reasons,
and an explicit teaching note about zero-supply detection.
- Cross-day lesson prompt flags each event with [ZERO-SUPPLY: pivot
city needed] tag when drift reasons mention "no match"/"no
candidates"/"0 rows". PRIORITY clause in the prompt tells the model
its lesson MUST name alternate cities when that tag appears.
SECOND RUN (stress v2 with enriched prompt) — cloud diagnosis sharp:
T3 after Flint: risk="Zero candidate supply for Welder in Flint"
hint="search Welder×3 in Saginaw, MI (≈30 mi) or
expand role to Metal Fabricator"
T3 after Gary: risk="Zero supply for Electrician in Gary, IN"
hint="Pivot to Chicago, IL (≈40 min); broaden to
Electrical Technician within 60 min radius"
Lesson: specific, per-city, with distances, role-broadening
fallback, and pre-loading strategy — actionable for item B retry.
Cloud 120b call latencies consistent: 4.8-8.0s per prompt. Cloud
passthrough proven under stress.
Fill outcomes unchanged (1/5 — correct rejection of three impossible
events + one propagating JSON emission edge case on retry pivot
reasoning). The knowledge to rescue them now exists in the lesson;
item B wires the retry.
Description
Rust-first object storage system
Languages
TypeScript
38.4%
Rust
35.8%
HTML
13.9%
Python
7.8%
Shell
2.1%
Other
2%