profit a7aba31935 tests/real-world: scrum-master pipeline — composes everything we built
The orchestrator J described: pulls git repo source + PRD +
suggested-changes doc, chunks them, hands each code piece through
the proven escalation ladder with learning context, collects
per-file suggestions in a consolidated handoff report.

Composes ONLY already-shipped primitives — no new core code:
  - chunker with 800-char / 120-overlap windows
  - sidecar /embed for real nomic-embed-text embeddings
  - in-memory cosine retrieval for top-5 PRD + top-5 proposal
    chunks per target file
  - escalation ladder (qwen3.5 → qwen3 → gpt-oss:20b → gpt-oss:120b
    → devstral-2:123b → mistral-large-3:675b)
  - per-attempt learning-context injection (prior failures as
    "do not repeat" block)
  - acceptance rubric (length ≥ 200 chars + structured form)

Live-run (tests/real-world/runs/scrum_moatqkee/):
  targets: 3 files
    - crates/vectord/src/playbook_memory.rs  (920 lines)
    - crates/vectord/src/doc_drift.rs        (163 lines)
    - auditor/audit.ts                        (170 lines)
  resolved: 3/3 on attempt 1 by qwen3.5:latest local 7B
  total duration: 111.7s
  output: scrum_report.md + per-file JSON

Sample from scrum_report.md (playbook_memory.rs review):
  - Alignment score: 9/10 vs PRD Phase 19
  - 4 concrete change suggestions naming specific lines + PLAN/PRD
    chunk offsets
  - 3 gap analyses with PRD-reference citations

Honest findings from this run:
1. Local 7B handled review-style tasks first-try. The escalation
   ladder infrastructure is live but didn't fire — review is an
   easier task shape than strict code-generation (see hard_task
   test which needed devstral-2 specialist).
2. 6KB file-truncation caused one false positive: model claimed
   playbook_memory.rs lacks a `doc_refs` field, but that field
   exists past the 6KB cutoff. Trade-off between context-size
   and review-depth needs tuning per file.
3. Chunk-offset citations are real: model output includes
   `[PRD @27880]` and `[PLAN @16320]` which map to the actual
   byte offsets of retrieved context chunks. Auditor pattern could
   adopt this for traceable claims.

This is the scrum-master-handoff shape J asked for:
  repo + PRD + proposal → chunk → retrieve → escalate → consolidate
  → human-reviewable markdown report

Not shipping: per-PR diff analysis, open-PR integration, Gitea
posting of suggestions. Those compose the same primitives
differently — this proves the core pattern.

Env override: LH_SCRUM_FILES=path1,path2,... to target a different
file set. Default 3 files keeps runtime ~2min.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 20:52:42 -05:00
..