root 98db129b8f gateway: /v1/iterate — Phase 43 v3 part 3 (generate → validate → retry loop)
Closes the Phase 43 PRD's "iteration loop with validation in place"
structurally: a single endpoint wraps the 0→85% pattern, so any
caller can POST against it without re-implementing the loop.

POST /v1/iterate
  {
    "kind":"fill" | "email" | "playbook",
    "prompt":"...",
    "system":"...",                 (optional)
    "provider":"ollama_cloud",
    "model":"kimi-k2.6",
    "context":{...},                (target_count/city/state/role/...)
    "max_iterations":3,             (default 3)
    "temperature":0.2,              (default 0.2)
    "max_tokens":4096               (default 4096)
  }
→ 200 + IterateResponse  (artifact accepted)
   {artifact, validation, iterations, history:[{iteration,raw,status}]}
→ 422 + IterateFailure   (max iter reached)
   {error, iterations, history}
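A minimal client-side sketch of the request shape, assuming only Python's standard `urllib`. The helper name `build_iterate_request` and the base URL are hypothetical, and sending `context` as `{}` when omitted is an assumption. It fills the documented defaults for any knob the caller leaves out and builds the request without sending it.

```python
import json
import urllib.request
from typing import Optional

# Documented defaults for POST /v1/iterate.
DEFAULTS = {"max_iterations": 3, "temperature": 0.2, "max_tokens": 4096}
KINDS = {"fill", "email", "playbook"}


def build_iterate_request(base_url: str, kind: str, prompt: str,
                          provider: str, model: str,
                          context: Optional[dict] = None,
                          system: Optional[str] = None,
                          **overrides) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/iterate request (hypothetical helper)."""
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind!r}")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise TypeError(f"unexpected fields: {sorted(unknown)}")
    body = {"kind": kind, "prompt": prompt, "provider": provider,
            "model": model,
            "context": context or {}}       # assumption: {} when omitted
    if system is not None:
        body["system"] = system             # optional; left out when unset
    for key, default in DEFAULTS.items():
        body[key] = overrides.get(key, default)  # caller value, else default
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/iterate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(req)` returns the IterateResponse on 200. On 422, `urlopen` raises `urllib.error.HTTPError`, whose `.read()` yields the IterateFailure body.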

The loop:
1. Generate via a gateway-internal HTTP loopback to /v1/chat with the
   given provider/model. The output is free-form text.
2. Extract a JSON object from the output — handles fenced blocks
   (```json ... ```), bare braces, and prose-with-embedded-JSON.
   If no JSON can be extracted, append "your response wasn't valid
   JSON" to the prompt and retry.
3. POST the extracted artifact to /v1/validate (server-side reuse of
   the FillValidator/EmailValidator/PlaybookValidator stack from
   Phase 43 v3 part 2).
4. On 200 + Report: success — return artifact + history.
5. On 422 + ValidationError: append the specific error JSON to the
   prompt as corrective context and retry. This is a simplified form
   of the PRD's "observer correction" piece: the validator's own
   structured error IS the feedback signal.
6. Stop after max_iterations and return 422 + IterateFailure.
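The loop can be sketched as follows. This is an illustration, not the gateway's implementation: `generate`, `extract` and `validate` are hypothetical callables standing in for the /v1/chat loopback, the JSON extractor and /v1/validate, and the corrective-prompt wording is invented. This version re-sends the original prompt plus only the latest correction; the real endpoint may accumulate corrections instead.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Optional, Tuple

NO_JSON_HINT = "Your response wasn't valid JSON. Reply with a single JSON object."


@dataclass
class IterateResult:
    ok: bool                 # True -> 200 IterateResponse, False -> 422 IterateFailure
    iterations: int
    artifact: Optional[dict] = None
    validation: Any = None   # validator report on success
    error: Optional[str] = None
    history: list = field(default_factory=list)


def iterate(kind: str, prompt: str,
            generate: Callable[[str], str],
            extract: Callable[[str], Optional[dict]],
            validate: Callable[[str, dict], Tuple[int, Any]],
            max_iterations: int = 3) -> IterateResult:
    """Generate -> extract -> validate -> retry, capped at max_iterations."""
    history = []
    current = prompt
    for i in range(max_iterations):
        raw = generate(current)                     # step 1: /v1/chat loopback
        artifact = extract(raw)                     # step 2: pull out a JSON object
        if artifact is None:
            history.append({"iteration": i, "raw": raw, "status": "no_json"})
            current = f"{prompt}\n\n{NO_JSON_HINT}"
            continue
        status, payload = validate(kind, artifact)  # step 3: /v1/validate
        history.append({"iteration": i, "raw": raw, "status": status})
        if status == 200:                           # step 4: accepted
            return IterateResult(True, i + 1, artifact, payload, history=history)
        # Step 5: any non-200 is treated as a validation failure here; the
        # validator's structured error becomes the corrective context.
        current = (f"{prompt}\n\nYour previous answer failed validation:\n"
                   f"{json.dumps(payload)}\nFix it and reply with JSON only.")
    # Step 6: cap reached.
    return IterateResult(False, max_iterations,
                         error="max_iterations reached", history=history)
```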

Verified end-to-end with kimi-k2.6 via ollama_cloud:
  Request:  fill 1 Welder in Toledo, model picks W-1 (actually
            Louisville, KY — wrong city)
  iter 0:   model emits {fills:[W-1,"W-1"]} → 422 Consistency
            ("city 'Louisville' doesn't match contract city 'Toledo'")
  iter 1:   prompt now includes the error → model emits the same answer
            (it didn't pick a different worker: the model has no roster
            access; that would need hybrid_search upstream)
  max=2:    max_iterations=2 reached → 422 IterateFailure with full history

The negative test demonstrates that the LOOP MECHANICS work:
- Generation → validation → retry-with-error-context → cap
- The model's failure trace is queryable; downstream tooling can
  inspect history[] to see exactly where each iteration broke
- A production executor would run hybrid_search to find Toledo
  workers before posting; /v1/iterate is the validation+retry
  layer downstream of that search
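As an example of that downstream tooling, a hypothetical `diagnose` helper could summarize an IterateFailure body. It assumes `raw` is a string and treats `status` opaquely, since its exact values aren't specified here. Flagging verbatim repeats catches the Toledo case, where the corrective context did not change the model's answer.

```python
import json
from collections import Counter


def diagnose(failure_body: str) -> dict:
    """Summarize a 422 IterateFailure body: {error, iterations, history}."""
    body = json.loads(failure_body)
    history = sorted(body.get("history", []), key=lambda h: h["iteration"])
    # Status values are stringified and counted without interpreting them.
    counts = Counter(str(h["status"]) for h in history)
    # Iterations whose raw output repeats the previous one verbatim, i.e.
    # the corrective context did not change the model's answer.
    repeated = [cur["iteration"] for prev, cur in zip(history, history[1:])
                if cur["raw"].strip() == prev["raw"].strip()]
    return {"error": body.get("error"),
            "iterations": body.get("iterations"),
            "status_counts": dict(counts),
            "repeated_at": repeated}
```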

JSON extractor handles three shapes:
- Fenced: ```json {...} ```  (preferred — explicit signal)
- Bare:   plain text + {...} + plain text
- Multi:  picks the first balanced {...}

Unit tests cover all three plus the no-JSON fallback.
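A minimal sketch of the three extraction shapes, assuming Python's `re` and `json` modules; `extract_json` and `_balanced_at` are hypothetical names and this is not the gateway's extractor. Fenced blocks are tried before the raw text, and the brace scan ignores `{`/`}` inside JSON string literals. One deliberate deviation: if the first balanced span doesn't parse (e.g. prose like `{braces}`), this sketch moves on to the next one instead of giving up.

```python
import json
import re
from typing import Optional

# Fenced block: three backticks, optional "json" tag, body, three backticks.
_TICKS = "`" * 3
_FENCE = re.compile(re.escape(_TICKS) + r"(?:json)?\s*(.*?)" + re.escape(_TICKS),
                    re.DOTALL)


def _balanced_at(text: str, start: int) -> Optional[str]:
    """Return the balanced {...} span starting at text[start], or None."""
    depth, in_str, escaped = 0, False, False
    for i in range(start, len(text)):
        ch = text[i]
        if in_str:                      # inside a JSON string: braces don't count
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_str = False
        elif ch == '"':
            in_str = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[start:i + 1]
    return None                         # ran off the end: unbalanced


def extract_json(text: str) -> Optional[dict]:
    """Return the first JSON object found, preferring fenced blocks."""
    chunks = [m.group(1) for m in _FENCE.finditer(text)] + [text]
    for chunk in chunks:
        start = chunk.find("{")
        while start != -1:
            span = _balanced_at(chunk, start)
            if span is None:            # unbalanced: try the next opening brace
                start = chunk.find("{", start + 1)
                continue
            try:
                value = json.loads(span)
            except json.JSONDecodeError:
                value = None
            if isinstance(value, dict):
                return value
            # Balanced but not a JSON object: skip past the whole span.
            start = chunk.find("{", start + len(span))
    return None                         # no-JSON fallback: caller retries
```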

Phase 43 closure status:
  v1: scaffolds                    (older commit)
  v2: real validators              00c8408
  v3 part 1: parquet WorkerLookup  ebd9ab7
  v3 part 2: /v1/validate          86123fc
  v3 part 3: /v1/iterate           THIS COMMIT

The "0→85% with iteration" thesis is now testable in production.
Staffing executors can compose hybrid_search → /v1/iterate (with
validation) and are expected to converge on validation-passing
artifacts in 1-2 iterations on average.
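A sketch of that composition, assuming hypothetical `hybrid_search` and `post_iterate` callables and a hypothetical candidate field `city`: search first, keep only in-city candidates, and hand the shortlist to /v1/iterate so the model can no longer pick an out-of-area worker like W-1 in the verification run.

```python
import json
from typing import Callable, Iterable


def fill_with_roster(contract: dict,
                     hybrid_search: Callable[[dict], Iterable[dict]],
                     post_iterate: Callable[[dict], dict],
                     limit: int = 10) -> dict:
    """hybrid_search -> /v1/iterate: give the model a roster to choose from.

    hybrid_search, post_iterate and the candidate "city" field are
    hypothetical; contract keys mirror the request's context field.
    """
    city = contract["city"].casefold()
    # Drop out-of-area candidates before the model ever sees them.
    shortlist = [c for c in hybrid_search(contract)
                 if c.get("city", "").casefold() == city][:limit]
    prompt = (f"Fill {contract['target_count']} {contract['role']} position(s) "
              f"in {contract['city']}, {contract['state']}. Choose only from "
              f"this roster and reply with JSON:\n{json.dumps(shortlist)}")
    return post_iterate({"kind": "fill", "prompt": prompt,
                         "provider": "ollama_cloud", "model": "kimi-k2.6",
                         "context": contract})
```

If the shortlist comes back empty, a caller would likely skip /v1/iterate entirely rather than let the model guess.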

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-27 07:56:43 -05:00