From fd92a9a0d0437d9383a1cfe2b9ee071b7c331608 Mon Sep 17 00:00:00 2001
From: root
Date: Fri, 24 Apr 2026 06:15:53 -0500
Subject: [PATCH] docs: SCRUM_MASTER_SPEC.md — single handoff artifact for the scrum loop
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Fresh-session artifact so work is recoverable if the branch is reopened in a new Claude Code session without context.

Covers:
- 9-rung ladder (kimi-k2:1t through local qwen3.5:latest)
- tree-split reducer (files >6KB sharded + map→reduce)
- schema_v4 KB rows in data/_kb/scrum_reviews.jsonl
- auto-applier 5 hardened gates (confidence, size, cargo-green, warning-count, rationale-diff)
- pathway_memory (ADR-021) — narrow fingerprint + hot-swap gate + semantic-correctness layer (SemanticFlag, BugFingerprint)
- HTTP surface on gateway (/vectors/pathway/*)
- current state (12 traces, 11 fingerprints, 0 hot-swaps — probation)
- commit history on scrum/auto-apply-19814 since iter-5 baseline
- how-to-run (env vars, service restarts)
- where things live (code pointers table)
- known gotchas (LLM Team mode registry, restart requirements)

Paired updates (not in this commit, live outside the repo):
- /home/profit/CLAUDE.md — active workstream pointer + notes
- /root/.claude/skills/read-mem/SKILL.md — SCRUM_MASTER_SPEC.md added to the loading list + ADR-021 glossary
- memory/project_scrum_pipeline.md — updated with iter-9 state
- memory/feedback_semantic_correctness_via_matrix.md — updated with end-to-end proof evidence

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 docs/SCRUM_MASTER_SPEC.md | 275 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 275 insertions(+)
 create mode 100644 docs/SCRUM_MASTER_SPEC.md

diff --git a/docs/SCRUM_MASTER_SPEC.md b/docs/SCRUM_MASTER_SPEC.md
new file mode 100644
index 0000000..021fe27
--- /dev/null
+++ b/docs/SCRUM_MASTER_SPEC.md
@@ -0,0 +1,275 @@
+# Scrum Master Pipeline — Spec + Current State
+
+**Status:** Active iteration on branch `scrum/auto-apply-19814` → PR #11 at git.agentview.dev/profit/lakehouse
+**Last iter:** 9 (2026-04-24)
+**Branch commit head:** `f4cff66` (ADR-021 Phase D fix)
+
+This doc is the single handoff artifact for the scrum-master + auto-apply + pathway-memory loop built during the 2026-04-24 sessions. A fresh Claude Code session that reads this doc, `docs/DECISIONS.md` (specifically ADR-020 and ADR-021), and `MEMORY.md` should have the same context as the session that wrote it.
+
+## 1. What the loop is
+
+An autonomous review-and-commit pipeline that:
+
+1. **Scrum master** (`tests/real-world/scrum_master_pipeline.ts`) — walks a target-file list, asks a 9-rung, cloud-first escalation ladder of models to produce a forensic audit against the PRD and a change-proposal doc, retries with learning context until a response is accepted, and emits a structured review row.
+2. **Pathway memory** (`crates/vectord/src/pathway_memory.rs`) — stores the full backtrack context of each review (attempts, KB chunks, flags, bug fingerprints) indexed by a narrow fingerprint (`task_class + file_prefix + signal_class`). On every new review, it prepends historical bug patterns as a preamble so the reviewer can preempt recurrences. Retired pathways auto-exclude themselves from hot-swap eligibility.
+3. **Auto-applier** (`tests/real-world/scrum_applier.ts`) — filters schema_v4 review rows by gradient_tier + confidence, asks `qwen3-coder:480b` for concrete `old_string/new_string` patches, runs `cargo check --workspace`, then commits on green OR reverts on red/warning-count-up/rationale-mismatch.
+4. **Observer** (`mcp-server/observer.ts`) — receives per-file `/event` emissions and escalates failure clusters to LLM Team via `/v1/chat` with `qwen3-coder:480b`.
+5. **Auditor** (`auditor/audit.ts`) — external N=3 consensus re-check of scrum findings; writes to `data/_kb/audit_facts.jsonl`.
+
+The guiding principle: **every KB write has a reader, every PR claim is diff-verifiable.**
+
+## 2. The 9-rung ladder (cloud-first, strongest-model-first)
+
+Defined in `tests/real-world/scrum_master_pipeline.ts` at `const LADDER`:
+
+| # | Provider | Model | Role |
+|---|---|---|---|
+| 1 | ollama_cloud | `kimi-k2:1t` | flagship, 1T params |
+| 2 | ollama_cloud | `qwen3-coder:480b` | coding specialist, 480B |
+| 3 | ollama_cloud | `deepseek-v3.1:671b` | reasoning, 671B |
+| 4 | ollama_cloud | `mistral-large-3:675b` | deep analysis, 675B |
+| 5 | ollama_cloud | `gpt-oss:120b` | reliable workhorse |
+| 6 | ollama_cloud | `qwen3.5:397b` | dense 397B, final thinker |
+| 7 | openrouter | `openai/gpt-oss-120b:free` | free-tier rescue |
+| 8 | openrouter | `google/gemma-3-27b-it:free` | fastest rescue |
+| 9 | ollama | `qwen3.5:latest` | last-resort local |
+
+**Each attempt is evaluated by `isAcceptable()`** (chars ≥ 3800 AND not a malformed JSON-only dump). On rejection, the next rung sees a learning preamble with the prior rejection reason.
+
+## 3. Tree-split reducer
+
+Files larger than `FILE_TREE_SPLIT_THRESHOLD = 6000` bytes are chunked into `FILE_SHARD_SIZE = 3500`-byte shards. Each shard is summarized by a fast rung, and the summaries are concatenated with internal `§N§` markers into a SCRATCHPAD for the reviewer. The `§N§` markers are stripped before the reviewer sees the merged context, so it cannot cite "(shard 3)" in finding titles.
+
+Failure mode this fixed: in pre-tree-split iters, reviewers claimed fields were "missing" because the field sat past the 6KB context cutoff, not because it was actually absent.
+
+## 4. Schema v4 KB rows
+
+`data/_kb/scrum_reviews.jsonl` — one row per accepted review. Fields:
+
+```jsonc
+{
+  "file": "crates/queryd/src/service.rs",
+  "reviewed_at": "2026-04-24T11:06:56Z",
+  "accepted_model": "ollama_cloud/kimi-k2:1t",
+  "accepted_on_attempt": 1,
+  "attempts_made": 1,
+  "tree_split_fired": true,
+  "suggestions_preview": "",
+  "confidences_per_finding": [92, 90, 88, 85, 75],
+  "confidence_avg": 86,
+  "confidence_min": 75,
+  "findings_count": 5,
+  "gradient_tier": "dry_run", // auto ≥90 / dry_run ≥70 / simulation ≥50 / block <50
+  "gradient_tier_avg": "dry_run",
+  "alignment_score": 3, // 1-10 self-rated
+  "output_format": "forensic_json",
+  "verdict": "fail", // pass | needs_patch | fail
+  "critical_failures_count": 3,
+  "pseudocode_flags_count": 0,
+  "prd_mismatches_count": 4,
+  "missing_components_count": 6,
+  "verified_components_count": 2,
+  "risk_points_count": 3,
+  "schema_version": 4,
+  "scrum_master_reviewed": true,
+  // ADR-021 fields on pathway trace (NOT this row, see pathway_memory state.json)
+  "pathway_hot_swap_hit": false,
+  "pathway_id": null,
+  "pathway_similarity": null,
+  "pathway_success_rate": null,
+  "rungs_saved": 0
+}
+```
+
+## 5. Applier hardened gates (landed 5e8d87b)
+
+`tests/real-world/scrum_applier.ts` has **5 gates** between emitter output and commit. A patch must pass ALL of them:
+
+1. **Confidence gate** — the emitter's self-reported `confidence >= MIN_CONF` (default 90; 85 with `LH_APPLIER_MIN_CONF=85`). Rejected patches log the reason `confidence NN < MM`.
+2. **Size gate** — max 6 lines changed per patch. Prevents cascading cross-file refactors.
+3. **Cargo-green gate** — `cargo check --workspace` must pass. Red build → `git checkout -- file`.
+4. **Warning-count gate** — the workspace baseline warning count is measured at start; after the patch, the new count must be `≤ baseline`. Catches unused-import additions and dead_code-after-allow-removed patterns. **THIS GATE CAUGHT 96b46cd's HashSet noise.**
+5. **Rationale-diff token alignment** — the rationale text must share at least one non-stopword token with the patch's new_string. Catches "Add destructive SQL filter" claims on `use tracing;` diffs.
+
+Plus:
+- **Deny-list**: files under `config/`, `ops/`, `auditor/`, `docs/`, `data/`, `mcp-server/`, `ui/`, `sidecar/`, `scripts/` can't be auto-applied (human review required).
+- **Branch guard**: refuses to run on `main`.
+- **Dry-run workspace revert**: in dry-run mode (`LH_APPLIER_COMMIT=0`), the file is reverted after the check regardless of outcome — no state pollution between runs.
+
+Every decision logs to `data/_kb/auto_apply.jsonl` (action: `committed` / `build_red_reverted` / `warnings_increased_reverted` / `rationale_mismatch_reverted` / `all_rejected` / `no_patches` / `dry_run_would_commit`).
+
+## 6. Pathway memory (ADR-021)
+
+**Full spec: `docs/DECISIONS.md` ADR-021. Code: `crates/vectord/src/pathway_memory.rs`.**
+
+Three-layer matrix index for compounding semantic-correctness signal:
+
+### Fingerprint (narrow)
+`pathway_id = SHA256(task_class + "|" + file_prefix + "|" + signal_class)`, where `file_prefix` is the first 2 path segments (e.g. `crates/queryd`), so related files in the same crate share pathways.
+
+### Embedding (similarity vector)
+32-bucket L2-normalized token hash. Tokens include: task_class, file_path, signal_class, per-attempt model+rung+accepted flag, KB chunk source_docs, observer class, bridge libraries, sub-pipeline calls, **semantic_flags**, and **bug_fingerprints (flag+pattern_key)**.
+
+**TS and Rust implementations byte-match** — verified by a smoke test showing cosine = 1.0 on identical input tokens. This is load-bearing: TS-written traces are only searchable against the Rust-indexed space because both sides produce identical vectors.
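The fingerprint and embedding described above can be sketched in TypeScript as follows. This is a minimal illustration, not the code in `pathway_memory.rs` or `scrum_master_pipeline.ts`: the helper names (`filePrefix`, `pathwayId`, `embedTokens`, `cosine`) are hypothetical, repo-relative paths are assumed, and 32-bit FNV-1a is an assumed stand-in for whatever token hash the real implementation uses. Byte-matching TS and Rust (cosine = 1.0) depends on both sides sharing the same hash, tokenization, and bucket count.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: first 2 path segments, e.g. "crates/queryd".
// Assumes repo-relative paths; absolute paths would need trimming first.
function filePrefix(filePath: string): string {
  return filePath.split("/").filter((s) => s.length > 0).slice(0, 2).join("/");
}

// Narrow fingerprint: SHA256 over the "|"-joined triple, hex-encoded.
function pathwayId(taskClass: string, filePath: string, signalClass: string): string {
  return createHash("sha256")
    .update(`${taskClass}|${filePrefix(filePath)}|${signalClass}`)
    .digest("hex");
}

// Assumption: 32-bit FNV-1a over UTF-8 bytes as the token hash.
function fnv1a32(s: string): number {
  const bytes = new TextEncoder().encode(s);
  let h = 0x811c9dc5;
  for (let i = 0; i < bytes.length; i++) {
    h ^= bytes[i];
    h = Math.imul(h, 0x01000193) >>> 0; // keep as unsigned 32-bit
  }
  return h;
}

const BUCKETS = 32;

// Hashed bag of tokens: count each token into bucket (hash % 32), then L2-normalize.
function embedTokens(tokens: string[]): number[] {
  const v = new Array<number>(BUCKETS).fill(0);
  for (const t of tokens) v[fnv1a32(t) % BUCKETS] += 1;
  const norm = Math.sqrt(v.reduce((acc, x) => acc + x * x, 0));
  return norm === 0 ? v : v.map((x) => x / norm);
}

// Vectors are unit-length, so cosine similarity reduces to a dot product.
function cosine(a: number[], b: number[]): number {
  return a.reduce((acc, x, i) => acc + x * b[i], 0);
}
```

Because `pathwayId` only sees the crate-level prefix, `crates/queryd/src/delta.rs` and `crates/queryd/src/service.rs` map to the same pathway, while the embedding can still tell them apart through the full `file_path` token.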
+
+### Hot-swap gate (6-factor AND)
+```
+narrow_fingerprint_matches
+AND audit_consensus.pass != false (null OK during bootstrap)
+AND replay_count >= 3 (probation)
+AND success_rate >= 0.80
+AND NOT retired
+AND similarity(query_vec, stored.pathway_vec) >= 0.90
+```
+
+Replay bookkeeping: on hot-swap, `replay_count++`; if the recommended model succeeded, `replays_succeeded++`; if `replay_count >= 3 AND success_rate < 0.80` → `retired = true` (sticky — prevents oscillation on noise).
+
+### Semantic-correctness layer (ADR-021)
+Each `PathwayTrace` carries:
+- `semantic_flags: Vec<SemanticFlag>` — each flag is one of 9 variants: `UnitMismatch`, `TypeConfusion`, `NullableConfusion`, `OffByOne`, `StaleReference`, `PseudoImpl`, `DeadCode`, `WarningNoise`, `BoundaryViolation`
+- `bug_fingerprints: Vec<BugFingerprint>` — `{flag, pattern_key, example, occurrences}` where `pattern_key = "{Flag}:{sorted-top-3-identifiers-joined-by-hyphen}"`. Stable across prose variation.
+- `type_hints_used: Vec` — `{source, symbol, type_repr}`. Phase E (not yet populated).
+
+**Pre-review enrichment**: scrum calls `POST /vectors/pathway/bug_fingerprints` with `{task_class, file_path, signal_class, limit}`, which returns aggregated fingerprints sorted by occurrences, descending. If any are returned, a `📚 PATHWAY MEMORY` preamble is prepended to the reviewer prompt: "this file area had these patterns before — check for recurrences."
+
+**Post-review extractor** (Phase D, `scrum_master_pipeline.ts`): walks the reviewer markdown line by line, finds lines containing a `SemanticFlag` variant, extracts identifier-shaped backtick-quoted tokens, filters out flag names and Rust keywords (`self`, `mut`, `async`, etc.), sorts and takes the top 3, and builds `pattern_key = "{Flag}:{tokens}"`.
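A sketch of the Phase D extractor as described above, under these assumptions: identifiers are plain `[A-Za-z_][A-Za-z0-9_]*` spans inside backticks, "sorts and takes top 3" means an alphabetical sort after de-duplication, the keyword filter is a hypothetical subset of Rust keywords, lines with no surviving identifiers produce no fingerprint, and repeated keys are folded into one `BugFingerprint` with an `occurrences` count. The function name `extractFingerprints` is hypothetical; the real extractor in `scrum_master_pipeline.ts` may differ in any of these details.

```typescript
const SEMANTIC_FLAGS = [
  "UnitMismatch", "TypeConfusion", "NullableConfusion", "OffByOne", "StaleReference",
  "PseudoImpl", "DeadCode", "WarningNoise", "BoundaryViolation",
] as const;
type SemanticFlag = (typeof SEMANTIC_FLAGS)[number];

// Assumption: a subset of Rust keywords; the real filter list may differ.
const RUST_KEYWORDS = new Set([
  "self", "Self", "mut", "async", "await", "fn", "let", "pub", "impl", "use",
  "mod", "crate", "super", "struct", "enum", "trait", "match", "if", "else",
  "return", "ref", "move", "const", "static", "where", "dyn", "true", "false",
]);

interface BugFingerprint {
  flag: SemanticFlag;
  pattern_key: string;
  example: string;
  occurrences: number;
}

const IDENT = /^[A-Za-z_][A-Za-z0-9_]*$/;
const FLAG_NAMES = new Set<string>(SEMANTIC_FLAGS);

function extractFingerprints(markdown: string): BugFingerprint[] {
  const byKey = new Map<string, BugFingerprint>();
  for (const line of markdown.split("\n")) {
    const flags = SEMANTIC_FLAGS.filter((f) => line.includes(f));
    if (flags.length === 0) continue;
    // Backtick-quoted spans, kept only if identifier-shaped and not a flag name or keyword.
    const idents = [...line.matchAll(/`([^`]+)`/g)]
      .map((m) => m[1].trim())
      .filter((t) => IDENT.test(t) && !FLAG_NAMES.has(t) && !RUST_KEYWORDS.has(t));
    const top3 = [...new Set(idents)].sort().slice(0, 3);
    if (top3.length === 0) continue;
    for (const flag of flags) {
      const key = `${flag}:${top3.join("-")}`;
      const existing = byKey.get(key);
      if (existing) existing.occurrences += 1;
      else byKey.set(key, { flag, pattern_key: key, example: line.trim(), occurrences: 1 });
    }
  }
  // Most frequent first, like the bug_fingerprints endpoint's ordering.
  return [...byKey.values()].sort((a, b) => b.occurrences - a.occurrences);
}
```

Filtering flag names out of the token set is what avoids keys like `DeadCode:DeadCode`: a reviewer line that quotes `DeadCode` next to `unused_fn` yields `DeadCode:unused_fn` instead.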
+
+### HTTP surface (on gateway port 3100)
+| Endpoint | Purpose |
+|---|---|
+| `POST /vectors/pathway/insert` | write a full PathwayTrace |
+| `POST /vectors/pathway/query` | hot-swap candidate check (returns `{candidate: null}` or `{candidate: {...}}`) |
+| `POST /vectors/pathway/record_replay` | update replay_count + success_rate after hot-swap |
+| `GET /vectors/pathway/stats` | totals + reuse_rate + replay_success_rate |
+| `POST /vectors/pathway/bug_fingerprints` | aggregated fingerprints by narrow fingerprint (for pre-review preamble) |
+
+### State persistence
+`data/_pathway_memory/state.json` — JSON dump of all buckets. Loaded at gateway boot (`crates/gateway/src/main.rs` calls `pwm.load_from_storage().await`).
+
+## 7. Current state (2026-04-24 end of session)
+
+### Commits on branch `scrum/auto-apply-19814` since iter-5 baseline
+
+| # | SHA | Subject |
+|---|---|---|
+| 1 | `25ea3de` | observer fix — route LLM Team escalation to `/v1/chat` qwen3-coder |
+| 2 | `8b77d67` | OpenRouter rescue ladder + tree-split reducer + first auto-applier |
+| 3 | `96b46cd` | first auto-applied commit (later found misleading) |
+| 4 | `5e8d87b` | cleanup + applier hardening (warning + rationale + dry-run gates) |
+| 5 | `9cc0ceb` | P42-002 — truth gate into queryd `/sql` + `/paged` paths |
+| 6 | `2f8b347` | pathway_memory base (PathwayTrace, hot-swap, 18 tests) |
+| 7 | `86901f8` | queryd/delta.rs 6-line unit-mismatch fix |
+| 8 | `92df0e9` | ADR-021 spec |
+| 9 | `0a0843b` | ADR-021 Phases A+B+C (semantic_flags, prompt tags, preamble endpoint) |
+| 10 | `ee31424` | ADR-021 Phase D (fingerprint extractor) |
+| 11 | `f4cff66` | Phase D fix — strip flag names + Rust keywords from pattern_keys |
+
+### Matrix index state
+- **12 pathway traces** in `data/_pathway_memory/state.json`
+- **11 distinct bug fingerprints** across 4 Flag categories on the `crates/queryd` narrow fingerprint (1 manually seeded + 10 extracted)
+- **0 hot-swaps fired** (probation requires ≥3 replays per pathway; none has reached that yet)
+
+### Active in-flight
+- Iter 9 complete; iter 10+ will use the cleaner fingerprint extractor (`f4cff66`)
+- 4 "noisy" pattern_keys from the pre-fix run on iter 9's first file (e.g., `DeadCode:DeadCode`) — dormant, won't match future output, acceptable dead entries
+
+### Queued (not yet implemented)
+- **Phase E** — `type_hints_used` population from `catalogd` column types, Arrow `RecordBatch.schema()`, and Rust struct field types. Feeds typed context to the reviewer prompt.
+- **Auditor → pathway audit_consensus wire** — activates the strict-audit gate (currently lenient: null bootstraps, only an explicit `false` blocks).
+- **VCP UI cards** for "top bug fingerprints in last N iters" + "new patterns learned this iter"
+
+## 8. How to run a new iteration
+
+```bash
+# Default 3 files (playbook_memory.rs, doc_drift.rs, auditor/audit.ts)
+LH_SCRUM_FORENSIC=/home/profit/lakehouse/docs/SCRUM_FORENSIC_PROMPT.md \
+LH_SCRUM_PROPOSAL=/home/profit/lakehouse/docs/SCRUM_FIX_WAVE.md \
+bun run tests/real-world/scrum_master_pipeline.ts
+
+# Targeted files:
+LH_SCRUM_FILES="/home/profit/lakehouse/crates/queryd/src/delta.rs,/home/profit/lakehouse/crates/queryd/src/service.rs" \
+LH_SCRUM_FORENSIC=... LH_SCRUM_PROPOSAL=... \
+bun run tests/real-world/scrum_master_pipeline.ts
+
+# Dry-run auto-applier against the latest scrum output:
+LH_APPLIER_MIN_CONF=85 LH_APPLIER_MAX_FILES=10 \
+LH_APPLIER_MODEL=qwen3-coder:480b \
+LH_APPLIER_BRANCH=scrum/auto-apply-19814 \
+bun run tests/real-world/scrum_applier.ts
+
+# Actually commit (ONLY after dry-run looks clean):
+LH_APPLIER_COMMIT=1 LH_APPLIER_MIN_CONF=85 LH_APPLIER_MAX_FILES=10 \
+LH_APPLIER_MODEL=qwen3-coder:480b \
+LH_APPLIER_BRANCH=scrum/auto-apply-19814 \
+bun run tests/real-world/scrum_applier.ts
+```
+
+## 9. Verify services before running
+
+```bash
+# Gateway (port 3100) — must be up; pathway endpoints are here
+curl -s http://localhost:3100/health  # "lakehouse ok"
+curl -s http://localhost:3100/vectors/pathway/stats  # pathway memory totals
+
+# UI (port 3950) — VCP dashboard + /data/pathway_stats aggregation
+curl -s http://localhost:3950/data/pathway_stats
+
+# Observer (port 3800) — event receiver + LLM Team escalation
+curl -s http://localhost:3800/health 2>/dev/null || true
+
+# Sidecar (port 3200) — Python embed
+curl -s http://localhost:3200/health 2>/dev/null || true
+
+# LLM Team (port 5000) — /api/run?mode=extract is the ONLY registered mode
+# (others like code_review/patch/refactor return "Unknown mode")
+curl -s http://localhost:5000/health 2>/dev/null || true
+```
+
+If the gateway is missing new routes after a code change: `cargo build --release -p gateway && sudo systemctl restart lakehouse.service`.
+
+If the UI is missing new routes: kill the old `bun run ui/server.ts` process and restart it (it is not a systemd service right now).
+
+## 10. Where things live (code pointers)
+
+| Concern | File |
+|---|---|
+| Scrum orchestrator | `tests/real-world/scrum_master_pipeline.ts` |
+| Scrum ladder constant | same file, `const LADDER` line ~92 |
+| Tree-split reducer | same file, `async function treeSplitFile` |
+| Forensic prompt preamble (loaded via env) | `docs/SCRUM_FORENSIC_PROMPT.md` |
+| Fix-wave proposal preamble | `docs/SCRUM_FIX_WAVE.md` |
+| Scrum iter notes | `docs/SCRUM_LOOP_NOTES.md` |
+| Auto-applier | `tests/real-world/scrum_applier.ts` |
+| Applier audit trail | `data/_kb/auto_apply.jsonl` |
+| Scrum reviews KB | `data/_kb/scrum_reviews.jsonl` |
+| Model trust journal | `data/_kb/model_trust.jsonl` |
+| Pathway memory module | `crates/vectord/src/pathway_memory.rs` |
+| Pathway HTTP handlers | `crates/vectord/src/service.rs` (bottom) |
+| Pathway state on disk | `data/_pathway_memory/state.json` |
+| VCP UI server | `ui/server.ts` |
+| VCP UI client | `ui/ui.js` + `ui/ui.css` + `ui/index.html` |
+| Observer | `mcp-server/observer.ts` |
+| Auditor | `auditor/audit.ts` |
+| LLM Team extract client | `auditor/fact_extractor.ts` |
+| ADR-021 spec | `docs/DECISIONS.md` ADR-021 |
+
+## 11. Key memory files a fresh session should read
+
+From `/root/.claude/projects/-home-profit/memory/`:
+
+- `project_scrum_pipeline.md` — updated state of the scrum iterations
+- `project_first_auto_apply.md` — 96b46cd story + cleanup + hardening evidence from iter 7
+- `feedback_semantic_correctness_via_matrix.md` — J's insight on compounding, the ADR-021 rule
+- `feedback_endpoint_probe_discipline.md` — GET 405 is not endpoint validation
+- `reference_llm_team_modes.md` — only `extract` is registered on port 5000
+- `feedback_scrum_cloud_first.md` — scrum/audit/enrich pipelines use cloud first
+- `feedback_cloud_determinism.md` — cloud N=3 consensus + qwen3-coder tie-breaker
+
+## 12. Known gotchas
+
+- **Gateway restart needed after Rust route additions.** Rebuild with `cargo build --release -p gateway`, then `sudo systemctl restart lakehouse.service` — the service is systemd-managed.
+- **UI server needs a manual restart** after `ui/server.ts` changes (no systemd unit). Kill the old `bun` process, then restart with `bun run ui/server.ts &`.
+- **LLM Team mode `code_review` doesn't exist** — only `extract` is registered in `/root/llm_team_ui.py`. Don't wire new features to "Unknown mode" endpoints. See `reference_llm_team_modes.md`.
+- **OpenRouter free-tier 429s during consensus probes** are normal (rate-limited upstream). In the production ladder, the OpenRouter rungs are only reached as last-resort rescue, seconds to minutes apart, which is a different traffic pattern from rapid-fire consensus runs.
+- **OpenRouter `minimax-m2.5:free` has a 45s timeout** — not in the ladder; used only for one-off probes.
+- **Probation period is 3 replays** before hot-swap can fire. On a fresh install, no hot-swap fires until a pathway has been re-visited ≥3 times.
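The probation rule is one clause of the hot-swap gate in section 6. The sketch below models that gate and the replay bookkeeping as described there. The `StoredPathway` shape, the function names, and treating `success_rate` as 0 before any replay are assumptions, not the `pathway_memory.rs` API. It also assumes stored vectors are unit-length (the embedding is L2-normalized), so similarity is a plain dot product.

```typescript
// Hypothetical view of a stored pathway's gate-relevant state.
interface StoredPathway {
  auditPass: boolean | null; // audit_consensus.pass; null during bootstrap
  replayCount: number;
  replaysSucceeded: number;
  retired: boolean;
  vec: number[]; // unit-length pathway_vec
}

const PROBATION_REPLAYS = 3;
const MIN_SUCCESS_RATE = 0.8;
const MIN_SIMILARITY = 0.9;

// Assumption: no replays yet means a success rate of 0.
function successRate(p: StoredPathway): number {
  return p.replayCount === 0 ? 0 : p.replaysSucceeded / p.replayCount;
}

function dot(a: number[], b: number[]): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

// All six conditions must hold; only an explicit audit `false` blocks.
function hotSwapEligible(fingerprintMatches: boolean, queryVec: number[], p: StoredPathway): boolean {
  return (
    fingerprintMatches &&
    p.auditPass !== false &&
    p.replayCount >= PROBATION_REPLAYS &&
    successRate(p) >= MIN_SUCCESS_RATE &&
    !p.retired &&
    dot(queryVec, p.vec) >= MIN_SIMILARITY
  );
}

// Bookkeeping after a replay; retirement is sticky once set.
function recordReplay(p: StoredPathway, succeeded: boolean): void {
  p.replayCount += 1;
  if (succeeded) p.replaysSucceeded += 1;
  if (p.replayCount >= PROBATION_REPLAYS && successRate(p) < MIN_SUCCESS_RATE) {
    p.retired = true;
  }
}
```

One consequence of the thresholds as written: at exactly 3 replays the possible success rates are 0, 1/3, 2/3 and 1, so only a 3-for-3 record clears 0.80, and any failure in the first three replays trips the sticky `retired` flag.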