When a scenario event fails (drift abort or other error) and
LH_RETRY_ON_FAIL is on (the default when cloud T3 is enabled), ask the
cloud for a concrete pivot (a new city, role, or count), then re-run the
event with the remediation's fields. Capped at one retry per event so a
genuinely impossible scenario can't burn budget.
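A minimal sketch of the capped retry flow. It assumes simplified event/result shapes and two hypothetical hooks: `runEvent` stands in for the scenario runner and `requestRemediation` for requestCloudRemediation. Apart from the remediation fields, these names are illustrative, not the project's actual API. The one-retry cap lives in the loop condition, so a second failure ends the event.

```typescript
// Illustrative shapes only; the real EventResult carries more fields.
interface ScenarioEvent {
  city: string;
  state: string;
  role: string;
  count: number;
}

interface Remediation {
  retry: boolean;
  new_city?: string;
  new_role?: string;
  new_count?: number;
  rationale: string;
}

interface RunResult {
  ok: boolean;
  filled: number;
  error?: string;
}

// Hypothetical hooks: stand-ins for the scenario runner and requestCloudRemediation.
type RunEvent = (event: ScenarioEvent) => Promise<RunResult>;
type RequestRemediation = (event: ScenarioEvent, result: RunResult) => Promise<Remediation>;

// At most one rescue attempt per event, so an impossible scenario can't loop on cloud calls.
const MAX_RETRIES_PER_EVENT = 1;

async function runWithRescue(
  event: ScenarioEvent,
  runEvent: RunEvent,
  requestRemediation: RequestRemediation,
  retryOnFail: boolean, // mirrors LH_RETRY_ON_FAIL
): Promise<{ result: RunResult; attempts: number }> {
  let current = event;
  let result = await runEvent(current);
  let attempts = 0;

  while (!result.ok && retryOnFail && attempts < MAX_RETRIES_PER_EVENT) {
    const remediation = await requestRemediation(current, result);
    if (!remediation.retry) break; // cloud reports no viable pivot
    attempts++;
    // Apply only the fields the remediation actually set.
    current = {
      ...current,
      city: remediation.new_city ?? current.city,
      role: remediation.new_role ?? current.role,
      count: remediation.new_count ?? current.count,
    };
    result = await runEvent(current);
  }
  return { result, attempts };
}
```

With `retryOnFail` false, the first failure comes back untouched and no cloud call is made.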
requestCloudRemediation(event, result):
- Feeds the same diagnostic bundle T3 checkpoints get (SQL filters,
row counts, SQL errors, reviewer drift reasons, gap signals).
- Prompt demands structured JSON: {retry, new_city, new_role,
new_count, rationale}.
- Cloud is instructed to pivot to the NEAREST alternate city when
zero supply is detected, broaden the role when it is uniquely scarce,
reduce the count when it is clearly unachievable, or return
retry=false when no pivot seems viable.
EventResult additions:
- retry_attempt, retry_remediation (with rationale + cloud_model +
duration), retry_result (full inner result shape), original_event.
- If retry succeeded, it becomes the primary result and original_event
preserves what was attempted first. If retry also failed, the
primary stays the failure and retry is recorded alongside.
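The primary/alternate bookkeeping can be sketched as a pure merge function. The retry_* and original_event field names come from the list above. `mergeRetry`, the `event` field, and the unit of `duration` are assumptions for illustration. The sketch also assumes original_event is set only when the retry wins, since on failure the primary already is the original attempt.

```typescript
interface ScenarioEvent { city: string; state: string; role: string; count: number; }
interface InnerResult { ok: boolean; filled: number; error?: string; }

interface RetryRemediation {
  rationale: string;
  cloud_model: string;
  duration: number; // unit assumed (e.g. milliseconds)
  new_city?: string;
  new_role?: string;
  new_count?: number;
}

interface EventResult extends InnerResult {
  event: ScenarioEvent; // hypothetical: which event the primary result belongs to
  retry_attempt?: number;
  retry_remediation?: RetryRemediation;
  retry_result?: InnerResult;
  original_event?: ScenarioEvent;
}

function mergeRetry(
  original: ScenarioEvent,
  first: InnerResult,
  retried: ScenarioEvent,
  remediation: RetryRemediation,
  second: InnerResult,
): EventResult {
  // Recorded in both outcomes so the rationale is always available for forensics.
  const retryFields = { retry_attempt: 1, retry_remediation: remediation, retry_result: second };
  if (second.ok) {
    // Retry won: promote it and keep what was attempted first.
    return { ...second, event: retried, ...retryFields, original_event: original };
  }
  // Retry also failed: the original failure stays primary.
  return { ...first, event: original, ...retryFields };
}
```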
Sanitizer on cloud output: the model sometimes emits "Hammond, IN" in
new_city while also putting "IN" in a new_state field that isn't in the
schema, producing "Hammond, IN, IN" downstream. Fix: split new_city on
the comma, take the first token as the city, and take the state from
the token after the comma if present. The original event's state is
the fallback.
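The sanitizer rule as a small function. This sketch assumes the city arrives as a plain string and the caller passes in the original event's state; `sanitizeCity` is a hypothetical name. Any tokens after the state (as in an already doubled "Hammond, IN, IN") are ignored.

```typescript
interface CityState {
  city: string;
  state: string;
}

// Split "City, ST" into parts; fall back to the original event's state when none is given.
function sanitizeCity(newCity: string, fallbackState: string): CityState {
  const parts = newCity
    .split(",")
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
  const city = parts[0] ?? "";
  const state = parts[1] ?? fallbackState; // tokens beyond the second are dropped
  return { city, state };
}
```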
VERIFIED on stress_01.json with LH_OVERVIEW_CLOUD=1:
Without rescue (item A baseline): 1/5 events ok
With rescue (item B): 3/5 events ok
Gary IN misplacement: drift → cloud proposed South Bend IN → retry
filled 1/1. Rationale stored in retry_remediation for forensics.
Known limits surfaced (future work):
- City-field mangling failed one rescue before the sanitizer landed;
next run will use the fix.
- Cloud picks alternate cities without knowing ground-truth supply.
The Flint → Saginaw pivot ran, but Saginaw also had sparse Welders.
Future: expose a /vectors/supply-estimate endpoint that cloud can
consult before proposing a pivot.
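A hedged sketch of the proposed supply check, run on the caller side before spending the retry. /vectors/supply-estimate does not exist yet, so the estimator below is a hypothetical function parameter rather than an HTTP client. The "estimated supply >= requested count" threshold is also an assumption.

```typescript
interface Pivot {
  city: string;
  state: string;
  role: string;
  count: number;
}

// Hypothetical stand-in for a future /vectors/supply-estimate lookup.
type SupplyEstimator = (city: string, state: string, role: string) => Promise<number>;

// Keep only pivots whose estimated supply covers the requested count, best-supplied first.
async function screenPivots(candidates: Pivot[], estimate: SupplyEstimator): Promise<Pivot[]> {
  const checked = await Promise.all(
    candidates.map(async (p) => ({ p, supply: await estimate(p.city, p.state, p.role) })),
  );
  return checked
    .filter(({ p, supply }) => supply >= p.count)
    .sort((x, y) => y.supply - x.supply)
    .map(({ p }) => p);
}
```

Under this sketch, a Saginaw-style pivot with sparse supply would be filtered out before the single retry is used.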