docs: SCRUM_MASTER_SPEC timeline — productization wave + verified live state
Some checks failed
lakehouse/auditor 9 blocking issues: todo!() macro call in tests/real-world/scrum_master_pipeline.ts

Splits the existing 04-25/26 section into two waves:
- experiment wave (mode-runner build-out, pre-productization)
- productization wave (OpenAI-compat, Archon, answers corpus,
  staffing native runner, multi-corpus + downgrade gate, observer
  paid escalation, /v1/chat → observer event wiring)

Adds verified-live block at the end with the numbers a fresh session
needs to anchor on: pathway memory 88 traces / 11 successful replays
at 100% (probation gate crossed), strong-model auto-downgrade firing
on grok-4.1-fast, and the auditor blind spot at static.ts:117 (now
fixed in 107a682).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
root 2026-04-26 20:50:05 -05:00
parent 107a68224d
commit f753e11157

View File

@ -27,7 +27,7 @@ baseline below. Read the changes top-down to understand current shape:
- Paid OpenRouter ladder + Kimi K2.6 + Gemini 2.5 (commit `4ac5656`)
- Goal-driven autonomous loop harness (commit `e79e51e`)
### 2026-04-25 → 26 (architectural through-line — read these next)
### 2026-04-25 → 26 morning (mode-runner experiment wave)
- **Observer health-probe TypeConfusion fix** (`54689d5`) — `r.json()` on text/plain `/health` was crash-looping the observer; sealed in pathway_memory as `TypeConfusion:fetch-health-json`.
- **Adjacency-pollution relevance filter** (`0115a60`) — observer `/relevance` endpoint + scrum wiring (`LH_RELEVANCE_FILTER` / `LH_RELEVANCE_THRESHOLD`). Drops chunks about symbols the focus file IMPORTS but doesn't define.
- **Audit-consensus → retire wire** (`626f18d`) — when observer rejects a hot-swap-recommended attempt, immediately call `/vectors/pathway/retire` on the trace. `HotSwapCandidate` gained `trace_uid` for single-trace precision. Confidence ≥0.7 gate avoids retiring on heuristic-fallback verdicts.
@ -35,6 +35,23 @@ baseline below. Read the changes top-down to understand current shape:
- **Native enrichment runner** (`86f63a0`) — `codereview_lakehouse` mode that COMPOSES every primitive (focus file + bug fingerprints + relevance-filtered matrix + adversarial framing) into ONE prompt for one-shot success. `POST /v1/mode/execute`. Modes-as-prompt-molders, not model-pickers — see ★ Insight from session 2026-04-26.
- **Parameterized runner + 5 experiment modes** (`7c47734`) — `codereview_lakehouse|null|isolation|matrix_only|playbook_only`. Each isolates one architectural axis. `scripts/mode_experiment.ts` sweeps files × modes; `scripts/mode_compare.ts` aggregates with grounding check (catches confabulation by comparing cited symbols to real file content).
- **Scrum mode-runner fast path** (`7c47734`) — gated by `LH_USE_MODE_RUNNER=1`, scrum tries `/v1/mode/execute` BEFORE the 9-rung ladder. Falls through to ladder if response < `LH_MODE_MIN_CHARS` or anything errors. Off by default until A/B-validated.
- **Mode-compare grounding column** (`52bb216`) — emoji-tolerant section regex + control-flag tagging. Caught `playbook_only` confabulation that hand-grading also found.
### 2026-04-26 evening (productization wave)
- **Override knobs + staffing native runner** (`56bf30c`) — pass 2/3/4 harnesses, mode runner now serves `staffing.fill` task class natively, not just code review.
- **Multi-corpus runner + variance harness + strong-model downgrade gate** (`2dbc8db`) — three corpora (arch / findings / symbols) selectable per mode. Paid models auto-downgrade: skip matrix corpus, isolation framing only. Driven by `feedback_composed_corpora_anti_additive.md` (composed corpora LOST 5/5 vs isolation on grok-4.1-fast, p=0.031).
- **OpenAI-compat alias + smart provider routing** (`3a0b37e`) — gateway is now a drop-in middleware for any OpenAI SDK consumer. Three routing flavors verified via `/tmp/archon-test/sdk-test.ts`: `openai/gpt-4o-mini`, bare `gpt-4o-mini`, `x-ai/grok-4.1-fast`.
- **OpenAI multimodal content shape** (`540a9a2`) — accepts `content: [...]` array-of-parts.
- **`/v1/chat` fires observer event** (`d1d97a0`) — every chat call now lands both Langfuse trace AND observer `/event` (was Langfuse only).
- **Archon workflow** (`69919d9`) — `.archon/workflows/lakehouse-architect-review.yaml`. 3 Pi nodes (shape → weakness → improvement) using `openrouter/x-ai/grok-4.1-fast` through the gateway.
- **Observer KB enrichment preamble** (`d9bd4c9`) — observer prepends KB context to escalation prompts (was raw failure cluster).
- **Observer escalation → paid OpenRouter** (`340fca2`) — `deepseek-v3.1-terminus` instead of free-tier rescue. Verified: diagnoses cite architectural patterns (circuit breaker, adapter files) instead of generic timeouts.
- **Gold-standard answer corpus** (`0844206`) — `scripts/build_answers_corpus.ts` indexes `lakehouse_answers_v1` from `scrum_reviews.jsonl + observer_escalations.jsonl`. Doc ID prefixes (`review:` vs `escalation:`) let consumers same-file-gate or broaden. Auto-rebuilds from scrum epilogue (`LH_SCRUM_SKIP_ANSWERS_REBUILD=1` to disable). Observer `buildKbPreamble` now blends three sources (pathway + arch + answers); preamble grew 416 → 727 chars.
### Verified live state (2026-04-26 ~23:30)
- Pathway memory: **88 traces, 11/11 successful replays = 100%** — hot-swap probation gate crossed; live recommendations firing.
- Strong-model auto-downgrade verified: scrum on grok-4.1-fast → matrix corpus dropped, isolation mode auto-selected, 3 files accepted on attempt 1, ~27s each.
- Auditor verdict on PR #11 head `0844206`: **block** on 8 false positives — `auditor/checks/static.ts:117` "field added but never read" check doesn't follow serde derives. Fix is in the auditor, not the code.
### Verified architectural insights (2026-04-26 experiment)
- `codereview_lakehouse` produces 100% grounded findings, beats every challenger.