All checks were successful
lakehouse/auditor all checks passed (3 findings, all info)
Three artifacts in one PR:
1. docs/PYTHON_INVENTORY.md — every .py file in the repo classified:
Production (sidecar routers + 3 systemd services), Documented
(kb_measure, kb_staffer_report), Manual (one-off tools), Dead
(sidecar/sidecar/lab_ui.py + pipeline_lab.py are genuinely
not imported anywhere).
2. docs/COHESION_INTEGRATION_PLAN.md — the "smarter DB" loop J
called out as missing. Six phases A-F. Phase A ships here; B-F
are named + sequenced for follow-up PRs. Each phase adds ONE
wire of the loop; no single PR does them all.
3. Phase A wire (auditor verdicts → observer + KB):
- auditor/audit.ts: after assembleVerdict, fire-and-forget POST
to :3800/event with source="auditor" AND append to
data/_kb/outcomes.jsonl with kind="audit". Errors log + drop
— the verdict is still on disk at _auditor/verdicts/.
- mcp-server/observer.ts: extend source union to include
"auditor" | "bot" (was "mcp" | "scenario" only, which silently
coerced my first auditor POST to source="scenario"). Accept
body.ok OR body.success. Accept body.audit_duration_ms as a
fallback for duration_ms. Uses body.one_liner as
output_summary when set.
Live-verified after observer restart:
re-audit PR #6 → verdict=request_changes, 4 findings (1 warn)
observer: by_source={'auditor': 1} (previously coerced to 'scenario')
_kb/outcomes.jsonl tail: kind=audit sig=pr6-7fe47bab
pr=6 overall=request_changes
The shape of the loop is now visible to downstream consumers. Phase
B (auditor's kb_query check reads these audit rows for history)
lands in a follow-up PR. Phase C-F similar.
NOT in this PR:
- Actually deleting lab_ui.py + pipeline_lab.py (operator decision,
called out in the inventory doc)
- Cleaning up the 5 overlapping Python scripts (same)
- Phases B-F of the cohesion plan (separate PRs per wire)
- Integration test that asserts "smarter DB" across runs (Phase F)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
127 lines
6.5 KiB
Markdown
127 lines
6.5 KiB
Markdown
# Cohesion integration plan — the "smarter DB" loop
|
|
|
|
**Written:** 2026-04-22, after J flagged that the system has good parts but they don't compose into the self-improving loop promised in the control-plane thesis (`project_control_plane_thesis.md` memory: 0→85% via hyperfocus-then-escalate).
|
|
|
|
## The gap
|
|
|
|
Each piece works in isolation. What's not wired:
|
|
|
|
```
|
|
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
|
|
│ observer │──✓──│ data/_kb/ │──?──│ auditor │
|
|
│ :3800 │ │ outcomes. │ │ kb_query │
|
|
│ │ │ jsonl │ │ check │
|
|
└─────────────┘ └─────────────┘ └─────────────┘
|
|
▲ │
|
|
│? ▼
|
|
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
|
|
│ auditor │─────│ data/_audit │ │ cloud │
|
|
│ verdicts │ │or/verdicts/ │ │ inference │
|
|
│ │ │ │ │ │
|
|
└─────────────┘ └─────────────┘ └─────────────┘
|
|
│?
|
|
▼
|
|
┌─────────────┐
|
|
│ hybrid_srch │
|
|
│ playbook_mem│
|
|
│ context7 │
|
|
└─────────────┘
|
|
```
|
|
|
|
- **✓** solid: observer captures scenarios, writes `_kb/outcomes.jsonl`.
|
|
- **?** missing:
|
|
- auditor verdicts go to `_auditor/verdicts/` — not to observer or KB
|
|
- auditor kb_query reads `_kb/outcomes.jsonl` but doesn't query hybrid_search, playbook_memory, or context7
|
|
- auditor inference sends claim+diff to cloud — but WITHOUT KB neighbors, WITHOUT drift context, WITHOUT tool-aware retrieval
|
|
- Result: every audit is stateless-ish. The DB doesn't get smarter across audits.
|
|
|
|
## The target loop
|
|
|
|
```
|
|
PR opened
|
|
↓
|
|
auditor fetches diff + claims
|
|
↓
|
|
auditor enriches with:
|
|
— KB neighbors: past verdicts on similar claim-hashes
|
|
— hybrid_search: playbooks for task classes the diff touches
|
|
— context7 drift status: for any tool named in commit messages
|
|
— MCP tools: agent-style queries if relevant
|
|
↓
|
|
cloud inference sees: diff + claims + enrichment
|
|
↓
|
|
verdict posted to Gitea
|
|
↓
|
|
verdict ALSO persisted to:
|
|
— observer :3800/event (source=auditor)
|
|
— _kb/outcomes.jsonl (kind=audit, sig_hash=pr+sha)
|
|
↓
|
|
next audit on similar PR:
|
|
— KB neighbors find this prior verdict
|
|
— cloud sees "similar PRs ended in request_changes for reason X"
|
|
— verdict is more calibrated
|
|
```
|
|
|
|
## Phases of closure
|
|
|
|
Building this in order from least-invasive:
|
|
|
|
### Phase A — Verdict indexing (shipped in this PR)
|
|
|
|
After every `auditPr()` completes, in addition to persisting to `_auditor/verdicts/`:
|
|
|
|
1. POST the verdict to observer `:3800/event` with `source: "auditor"`, `ok: verdict === "approve"`, `event_kind: "audit"`, `sig_hash: <stable hash of pr_number + head_sha>`.
|
|
2. Append a simplified outcome to `data/_kb/outcomes.jsonl` with a special `kind: "audit"` row so the KB surface treats audits alongside scenarios.
|
|
|
|
Minimal surgery. Doesn't change the verdict itself. Just makes it visible to downstream consumers.
|
|
|
|
**Test:** after this PR lands, re-run the auditor once; observer stats should show `by_source.auditor > 0`; `_kb/outcomes.jsonl` should have one new row per audited SHA.
|
|
|
|
### Phase B — KB query sees auditor history
|
|
|
|
Extend `auditor/checks/kb_query.ts` to:
|
|
|
|
1. Read `_kb/outcomes.jsonl`, filter `kind === "audit"`, find prior audits with matching `sig_hash` (same PR across SHAs) or similar claim hashes.
|
|
2. Emit finding: "prior N audits on this PR ended in [approve, block, request_changes], last reason was X."
|
|
|
|
Gives the auditor memory across re-audits of the same PR.
|
|
|
|
### Phase C — Hybrid search in kb_query
|
|
|
|
When a PR touches `crates/vectord/src/playbook_memory.rs` (task-class matching), auditor calls `POST /vectors/hybrid` to find playbooks semantically related to the diff. Surfaces as "N playbooks in production rely on this code path, consider backward-compat."
|
|
|
|
Requires:
|
|
- Task-class extractor from diff paths
|
|
- Cloud-free hybrid_search call (we have local Ollama for embeddings)
|
|
|
|
### Phase D — Context7 drift awareness in auditor
|
|
|
|
If the diff / commit message names a tool that's in any playbook's `doc_refs`, auditor calls the context7 bridge to check current drift status. Surfaces as "tool X has drifted since N playbooks referenced it; this PR may need to update those."
|
|
|
|
### Phase E — Inference sees enrichment
|
|
|
|
The cloud inference check currently sends `diff + claims`. Extend to send `diff + claims + kb_neighbors + drift_context + hybrid_search_matches`. The prompt becomes context-rich — exactly what makes the cloud model competent (per the control-plane thesis).
|
|
|
|
### Phase F — Full integration test
|
|
|
|
An auditor self-test that:
|
|
|
|
1. Creates a synthetic PR with known-good and known-bad claims
|
|
2. Runs the enriched auditor
|
|
3. Asserts the verdict found the planted issues
|
|
4. Runs a second identical-claim PR
|
|
5. Asserts the SECOND verdict references the FIRST audit (via KB neighbor retrieval)
|
|
6. "Smarter DB" proof: two runs, measurable context gain.
|
|
|
|
## Sequence
|
|
|
|
Phase A lands in this PR alongside the inventory. Phases B-F are follow-up PRs, each with their own auditor gate. Order matters: A → B (reads A's output) → C+D (in parallel) → E (consumes B/C/D) → F (asserts E's behavior).
|
|
|
|
## Not in this plan (deliberate)
|
|
|
|
- **Rewriting the auditor to use a different architecture.** Current 4-check model stays; we just enrich the checks.
|
|
- **Tuning cloud inference precision.** The false-positive rate is a prompt-engineering concern; this plan is about context enrichment, which is separate.
|
|
- **Branch protection enforcement.** Stays off until Phase F passes.
|
|
|
|
The overall bet: this is the "putting it all together coherently" J said was a real problem. Six phases over however many PRs it takes. Each one ships one wire of the loop; no single PR tries to do them all.
|