diff --git a/docs/SCRUM_MASTER_SPEC.md b/docs/SCRUM_MASTER_SPEC.md index 06a2f88..50f52c6 100644 --- a/docs/SCRUM_MASTER_SPEC.md +++ b/docs/SCRUM_MASTER_SPEC.md @@ -27,7 +27,7 @@ baseline below. Read the changes top-down to understand current shape: - Paid OpenRouter ladder + Kimi K2.6 + Gemini 2.5 (commit `4ac5656`) - Goal-driven autonomous loop harness (commit `e79e51e`) -### 2026-04-25 → 26 (architectural through-line — read these next) +### 2026-04-25 → 26 morning (mode-runner experiment wave) - **Observer health-probe TypeConfusion fix** (`54689d5`) — `r.json()` on text/plain `/health` was crash-looping the observer; sealed in pathway_memory as `TypeConfusion:fetch-health-json`. - **Adjacency-pollution relevance filter** (`0115a60`) — observer `/relevance` endpoint + scrum wiring (`LH_RELEVANCE_FILTER` / `LH_RELEVANCE_THRESHOLD`). Drops chunks about symbols the focus file IMPORTS but doesn't define. - **Audit-consensus → retire wire** (`626f18d`) — when observer rejects a hot-swap-recommended attempt, immediately call `/vectors/pathway/retire` on the trace. `HotSwapCandidate` gained `trace_uid` for single-trace precision. Confidence ≥0.7 gate avoids retiring on heuristic-fallback verdicts. @@ -35,6 +35,23 @@ baseline below. Read the changes top-down to understand current shape: - **Native enrichment runner** (`86f63a0`) — `codereview_lakehouse` mode that COMPOSES every primitive (focus file + bug fingerprints + relevance-filtered matrix + adversarial framing) into ONE prompt for one-shot success. `POST /v1/mode/execute`. Modes-as-prompt-molders, not model-pickers — see ★ Insight from session 2026-04-26. - **Parameterized runner + 5 experiment modes** (`7c47734`) — `codereview_lakehouse|null|isolation|matrix_only|playbook_only`. Each isolates one architectural axis. `scripts/mode_experiment.ts` sweeps files × modes; `scripts/mode_compare.ts` aggregates with grounding check (catches confabulation by comparing cited symbols to real file content). - **Scrum mode-runner fast path** (`7c47734`) — gated by `LH_USE_MODE_RUNNER=1`, scrum tries `/v1/mode/execute` BEFORE the 9-rung ladder. Falls through to ladder if response < `LH_MODE_MIN_CHARS` or anything errors. Off by default until A/B-validated. +- **Mode-compare grounding column** (`52bb216`) — emoji-tolerant section regex + control-flag tagging. Caught `playbook_only` confabulation that hand-grading also found. + +### 2026-04-26 evening (productization wave) +- **Override knobs + staffing native runner** (`56bf30c`) — pass 2/3/4 harnesses, mode runner now serves `staffing.fill` task class natively, not just code review. +- **Multi-corpus runner + variance harness + strong-model downgrade gate** (`2dbc8db`) — three corpora (arch / findings / symbols) selectable per mode. Paid models auto-downgrade: skip matrix corpus, isolation framing only. Driven by `feedback_composed_corpora_anti_additive.md` (composed corpora LOST 5/5 vs isolation on grok-4.1-fast, p=0.031). +- **OpenAI-compat alias + smart provider routing** (`3a0b37e`) — gateway is now a drop-in middleware for any OpenAI SDK consumer. Three routing flavors verified via `/tmp/archon-test/sdk-test.ts`: `openai/gpt-4o-mini`, bare `gpt-4o-mini`, `x-ai/grok-4.1-fast`. +- **OpenAI multimodal content shape** (`540a9a2`) — accepts `content: [...]` array-of-parts. +- **`/v1/chat` fires observer event** (`d1d97a0`) — every chat call now lands both Langfuse trace AND observer `/event` (was Langfuse only). +- **Archon workflow** (`69919d9`) — `.archon/workflows/lakehouse-architect-review.yaml`. 3 Pi nodes (shape → weakness → improvement) using `openrouter/x-ai/grok-4.1-fast` through the gateway. +- **Observer KB enrichment preamble** (`d9bd4c9`) — observer prepends KB context to escalation prompts (was raw failure cluster). +- **Observer escalation → paid OpenRouter** (`340fca2`) — `deepseek-v3.1-terminus` instead of free-tier rescue. Verified: diagnoses cite architectural patterns (circuit breaker, adapter files) instead of generic timeouts. +- **Gold-standard answer corpus** (`0844206`) — `scripts/build_answers_corpus.ts` indexes `lakehouse_answers_v1` from `scrum_reviews.jsonl + observer_escalations.jsonl`. Doc ID prefixes (`review:` vs `escalation:`) let consumers same-file-gate or broaden. Auto-rebuilds from scrum epilogue (`LH_SCRUM_SKIP_ANSWERS_REBUILD=1` to disable). Observer `buildKbPreamble` now blends three sources (pathway + arch + answers); preamble grew 416 → 727 chars. + +### Verified live state (2026-04-26 ~23:30) +- Pathway memory: **88 traces, 11/11 successful replays = 100%** — hot-swap probation gate crossed; live recommendations firing. +- Strong-model auto-downgrade verified: scrum on grok-4.1-fast → matrix corpus dropped, isolation mode auto-selected, 3 files accepted on attempt 1, ~27s each. +- Auditor verdict on PR #11 head `0844206`: **block** on 8 false positives — `auditor/checks/static.ts:117` "field added but never read" check doesn't follow serde derives. Fix is in the auditor, not the code. ### Verified architectural insights (2026-04-26 experiment) - `codereview_lakehouse` produces 100% grounded findings, beats every challenger.