lakehouse/docs/COHESION_INTEGRATION_PLAN.md
profit 6e39d8778f
All checks were successful
lakehouse/auditor all checks passed (3 findings, all info)
Cohesion: Python inventory + integration plan + Phase A verdict indexing
Three artifacts in one PR:

1. docs/PYTHON_INVENTORY.md — every .py file in the repo classified:
   Production (sidecar routers + 3 systemd services), Documented
   (kb_measure, kb_staffer_report), Manual (one-off tools), Dead
   (sidecar/sidecar/lab_ui.py + pipeline_lab.py are genuinely
   not imported anywhere).

2. docs/COHESION_INTEGRATION_PLAN.md — the "smarter DB" loop J
   called out as missing. Six phases A-F. Phase A ships here; B-F
   are named + sequenced for follow-up PRs. Each phase adds ONE
   wire of the loop; no single PR does them all.

3. Phase A wire (auditor verdicts → observer + KB):
   - auditor/audit.ts: after assembleVerdict, fire-and-forget POST
     to :3800/event with source="auditor" AND append to
     data/_kb/outcomes.jsonl with kind="audit". Errors log + drop
     — the verdict is still on disk at _auditor/verdicts/.
   - mcp-server/observer.ts: extend source union to include
     "auditor" | "bot" (was "mcp" | "scenario" only, which silently
     coerced my first auditor POST to source="scenario"). Accept
     body.ok OR body.success. Accept body.audit_duration_ms as a
     fallback for duration_ms. Use body.one_liner as
     output_summary when set.

Live-verified after observer restart:
   re-audit PR #6 → verdict=request_changes, 4 findings (1 warn)
     observer: by_source={'auditor': 1}  (previously coerced to 'scenario')
     _kb/outcomes.jsonl tail: kind=audit sig=pr6-7fe47bab
       pr=6 overall=request_changes

The shape of the loop is now visible to downstream consumers. Phase
B (auditor's kb_query check reads these audit rows for history)
lands in a follow-up PR. Phases C-F follow the same pattern.

NOT in this PR:
- Actually deleting lab_ui.py + pipeline_lab.py (operator decision,
  called out in the inventory doc)
- Cleaning up the 5 overlapping Python scripts (same)
- Phases B-F of the cohesion plan (separate PRs per wire)
- Integration test that asserts "smarter DB" across runs (Phase F)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-22 17:22:42 -05:00

Cohesion integration plan — the "smarter DB" loop

Written: 2026-04-22, after J flagged that the system has good parts but they don't compose into the self-improving loop promised in the control-plane thesis (project_control_plane_thesis.md memory: 0→85% via hyperfocus-then-escalate).

The gap

Each piece works in isolation. What's not wired:

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   observer  │──✓──│  data/_kb/  │──?──│   auditor   │
│    :3800    │     │ outcomes.   │     │  kb_query   │
│             │     │   jsonl     │     │   check     │
└─────────────┘     └─────────────┘     └─────────────┘
      ▲                                         │
      │?                                        ▼
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  auditor    │─────│ data/_audit │     │   cloud     │
│  verdicts   │     │or/verdicts/ │     │  inference  │
│             │     │             │     │             │
└─────────────┘     └─────────────┘     └─────────────┘
                                                │?
                                                ▼
                                         ┌─────────────┐
                                         │ hybrid_srch │
                                         │ playbook_mem│
                                         │  context7   │
                                         └─────────────┘
  • ✓ solid: observer captures scenarios, writes _kb/outcomes.jsonl.
  • ? missing:
    • auditor verdicts go to _auditor/verdicts/ — not to observer or KB
    • auditor kb_query reads _kb/outcomes.jsonl but doesn't query hybrid_search, playbook_memory, or context7
    • auditor inference sends claim+diff to cloud — but WITHOUT KB neighbors, WITHOUT drift context, WITHOUT tool-aware retrieval
  • Result: every audit is stateless-ish. The DB doesn't get smarter across audits.

The target loop

PR opened
  ↓
auditor fetches diff + claims
  ↓
auditor enriches with:
  — KB neighbors: past verdicts on similar claim-hashes
  — hybrid_search: playbooks for task classes the diff touches
  — context7 drift status: for any tool named in commit messages
  — MCP tools: agent-style queries if relevant
  ↓
cloud inference sees: diff + claims + enrichment
  ↓
verdict posted to Gitea
  ↓
verdict ALSO persisted to:
  — observer :3800/event (source=auditor)
  — _kb/outcomes.jsonl (kind=audit, sig_hash=pr+sha)
  ↓
next audit on similar PR:
  — KB neighbors find this prior verdict
  — cloud sees "similar PRs ended in request_changes for reason X"
  — verdict is more calibrated

Phases of closure

Building this in order, least invasive first:

Phase A — Verdict indexing (shipped in this PR)

After every auditPr() completes, in addition to persisting to _auditor/verdicts/:

  1. POST the verdict to observer :3800/event with source: "auditor", ok: verdict === "approve", event_kind: "audit", sig_hash: <stable hash of pr_number + head_sha>.
  2. Append a simplified outcome row to data/_kb/outcomes.jsonl with kind: "audit", so the KB surface treats audits alongside scenarios.

Minimal surgery. Doesn't change the verdict itself. Just makes it visible to downstream consumers.
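
The two steps above can be sketched roughly as follows. This is a minimal illustration, not the code in auditor/audit.ts: the `Verdict` shape, the helper names, and the injected `post`/`append` callbacks are all hypothetical, and FNV-1a stands in for the "stable hash", which the plan does not specify. Only the emitted field names (`source`, `ok`, `event_kind`, `sig_hash`, `kind`) come from the text.

```typescript
// Hypothetical verdict shape; only the emitted field names come from the plan.
interface Verdict {
  prNumber: number;
  headSha: string;
  overall: "approve" | "request_changes" | "block";
}

interface ObserverEvent {
  source: string;
  ok: boolean;
  event_kind: string;
  sig_hash: string;
}

interface OutcomeRow {
  kind: string;
  sig_hash: string;
  pr: number;
  overall: string;
}

// FNV-1a (32-bit) is an assumed stand-in for the unspecified stable hash.
// Inputs here are ASCII (digits, ':' and a hex SHA).
function fnv1a32(input: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return ("00000000" + (h >>> 0).toString(16)).slice(-8);
}

// The same PR number and head SHA always produce the same signature.
function sigHash(prNumber: number, headSha: string): string {
  return "pr" + prNumber + "-" + fnv1a32(prNumber + ":" + headSha);
}

function toObserverEvent(v: Verdict): ObserverEvent {
  return {
    source: "auditor",
    ok: v.overall === "approve",
    event_kind: "audit",
    sig_hash: sigHash(v.prNumber, v.headSha),
  };
}

function toOutcomeRow(v: Verdict): OutcomeRow {
  return {
    kind: "audit",
    sig_hash: sigHash(v.prNumber, v.headSha),
    pr: v.prNumber,
    overall: v.overall,
  };
}

// Never throws: each failure is logged and dropped, because the verdict
// is already persisted under _auditor/verdicts/.
async function indexVerdict(
  v: Verdict,
  post: (event: ObserverEvent) => Promise<void>,
  append: (jsonlLine: string) => Promise<void>,
): Promise<void> {
  try {
    await post(toObserverEvent(v));
  } catch (err) {
    console.error("observer POST failed, dropping:", err);
  }
  try {
    await append(JSON.stringify(toOutcomeRow(v)) + "\n");
  } catch (err) {
    console.error("outcomes.jsonl append failed, dropping:", err);
  }
}
```

A caller that wants fire-and-forget behavior would invoke `void indexVerdict(...)` without awaiting it, so a slow or down observer never delays the verdict itself.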

Test: after this PR lands, re-run the auditor once; observer stats should show by_source.auditor > 0; _kb/outcomes.jsonl should have one new row per audited SHA.
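
For `by_source.auditor` to count correctly, the observer has to accept the new source instead of coercing it to "scenario". Below is a hedged sketch of that normalization, based on the behavior described in the commit message (extended source union, `ok` or `success`, `audit_duration_ms` fallback, `one_liner` as summary). The function names, the `NormalizedEvent` shape, and the choice to keep "scenario" as the fallback for unknown sources are assumptions, not the actual contents of mcp-server/observer.ts.

```typescript
type Source = "mcp" | "scenario" | "auditor" | "bot";

const KNOWN_SOURCES: readonly Source[] = ["mcp", "scenario", "auditor", "bot"];

// Hypothetical normalized shape used only for this sketch.
interface NormalizedEvent {
  source: Source;
  ok: boolean;
  duration_ms: number | null;
  output_summary: string | null;
}

function normalizeEvent(body: Record<string, unknown>): NormalizedEvent {
  // Assumption: unknown sources still fall back to "scenario".
  const source = KNOWN_SOURCES.find((s) => s === body.source) ?? "scenario";

  // Accept body.ok, or body.success when ok is absent.
  const ok = typeof body.ok === "boolean" ? body.ok : body.success === true;

  // audit_duration_ms is only a fallback for duration_ms.
  let duration_ms: number | null = null;
  if (typeof body.duration_ms === "number") {
    duration_ms = body.duration_ms;
  } else if (typeof body.audit_duration_ms === "number") {
    duration_ms = body.audit_duration_ms;
  }

  // one_liner wins as the summary when it is set.
  let output_summary: string | null = null;
  if (typeof body.one_liner === "string" && body.one_liner !== "") {
    output_summary = body.one_liner;
  } else if (typeof body.output_summary === "string") {
    output_summary = body.output_summary;
  }

  return { source, ok, duration_ms, output_summary };
}

// Tally events per source, in the spirit of the observer's by_source stat.
function countBySource(events: NormalizedEvent[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const e of events) {
    counts[e.source] = (counts[e.source] ?? 0) + 1;
  }
  return counts;
}
```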

Phase B — KB query sees auditor history

Extend auditor/checks/kb_query.ts to:

  1. Read _kb/outcomes.jsonl, filter kind === "audit", find prior audits with matching sig_hash (same PR across SHAs) or similar claim hashes.
  2. Emit finding: "prior N audits on this PR ended in [approve, block, request_changes], last reason was X."

Gives the auditor memory across re-audits of the same PR.
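
One possible shape for this lookup is sketched below. It matches on the `pr` field that appears in the live-verified row, since `sig_hash` as described in Phase A also encodes the head SHA. The `reason` field and all function names are hypothetical, and claim-hash similarity is left out.

```typescript
// Assumed row shape; `reason` is hypothetical and may not exist in real rows.
interface AuditRow {
  kind: "audit";
  pr: number;
  sig_hash: string;
  overall: string;
  reason?: string;
}

function isAuditRow(x: unknown): x is AuditRow {
  if (typeof x !== "object" || x === null) return false;
  const r = x as Record<string, unknown>;
  return (
    r.kind === "audit" &&
    typeof r.pr === "number" &&
    typeof r.sig_hash === "string" &&
    typeof r.overall === "string"
  );
}

// Parse JSONL text, keeping only well-formed audit rows.
// Malformed lines and non-audit rows are skipped rather than failing the check.
function parseAuditRows(jsonl: string): AuditRow[] {
  const rows: AuditRow[] = [];
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch (_err) {
      continue;
    }
    if (isAuditRow(parsed)) rows.push(parsed);
  }
  return rows;
}

// Build the finding text for one PR, or null when there is no history.
function priorAuditFinding(rows: AuditRow[], prNumber: number): string | null {
  const prior = rows.filter((r) => r.pr === prNumber);
  if (prior.length === 0) return null;
  const outcomes = prior.map((r) => r.overall).join(", ");
  const last = prior[prior.length - 1];
  const reason = last?.reason ?? "unrecorded";
  return (
    "prior " + prior.length + " audits on this PR ended in [" + outcomes +
    "], last reason was " + reason + "."
  );
}
```

Rows are assumed to be in append order, so the last matching row is treated as the most recent audit.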

Phase C — Hybrid search in kb_query

When a PR touches crates/vectord/src/playbook_memory.rs (task-class matching), auditor calls POST /vectors/hybrid to find playbooks semantically related to the diff. Surfaces as "N playbooks in production rely on this code path, consider backward-compat."

Requires:

  • Task-class extractor from diff paths
  • Cloud-free hybrid_search call (we have local Ollama for embeddings)
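
The first requirement, a task-class extractor, might look something like the sketch below. The rule table is hypothetical: only the playbook_memory.rs path and its "task-class matching" label come from this plan, and a real table would need one entry per code path that playbooks depend on. The request body for POST /vectors/hybrid is not specified here, so the sketch stops at producing the task classes.

```typescript
// Hypothetical rule table mapping path prefixes to task classes.
// Only the first entry is grounded in the plan text.
const TASK_CLASS_RULES: ReadonlyArray<{ prefix: string; taskClass: string }> = [
  { prefix: "crates/vectord/src/playbook_memory.rs", taskClass: "task-class-matching" },
];

// Return the distinct task classes touched by a diff, sorted for stable output.
function extractTaskClasses(diffPaths: string[]): string[] {
  const found = new Set<string>();
  for (const path of diffPaths) {
    for (const rule of TASK_CLASS_RULES) {
      if (path.startsWith(rule.prefix)) found.add(rule.taskClass);
    }
  }
  return Array.from(found).sort();
}
```

An empty result would let the auditor skip the hybrid search entirely, which keeps the check cheap for PRs that touch no playbook-relevant code.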

Phase D — Context7 drift awareness in auditor

If the diff / commit message names a tool that's in any playbook's doc_refs, auditor calls the context7 bridge to check current drift status. Surfaces as "tool X has drifted since N playbooks referenced it; this PR may need to update those."
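
A rough sketch of the matching step follows, under the assumption that each playbook's `doc_refs` is a list of plain tool names (the real field may hold something richer, such as URLs or structured refs). The `Playbook` shape and function names are hypothetical, and the call to the context7 bridge itself is omitted because its interface is not described here.

```typescript
// Assumed playbook shape; doc_refs is treated as a list of tool names.
interface Playbook {
  id: string;
  doc_refs: string[];
}

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Case-insensitive match on a whole token, so "polars" does not match "polarsx".
function mentionsTool(text: string, tool: string): boolean {
  const pattern = "(^|[^A-Za-z0-9_])" + escapeRegExp(tool) + "($|[^A-Za-z0-9_])";
  return new RegExp(pattern, "i").test(text);
}

// Map each tool named in the text to the ids of playbooks that reference it.
// These are the tools whose drift status would be worth checking.
function toolsToCheck(text: string, playbooks: Playbook[]): Record<string, string[]> {
  const result: Record<string, string[]> = {};
  for (const pb of playbooks) {
    for (const tool of pb.doc_refs) {
      if (!mentionsTool(text, tool)) continue;
      const ids = result[tool] ?? [];
      if (ids.indexOf(pb.id) === -1) ids.push(pb.id);
      result[tool] = ids;
    }
  }
  return result;
}
```

The playbook ids per tool give the "N playbooks referenced it" count for the finding text.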

Phase E — Inference sees enrichment

The cloud inference check currently sends diff + claims. Extend to send diff + claims + kb_neighbors + drift_context + hybrid_search_matches. The prompt becomes context-rich — exactly what makes the cloud model competent (per the control-plane thesis).
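
As an illustration, the enriched input could be assembled along these lines. The five field names come from the sentence above; representing each enrichment as a list of strings, and the prompt layout itself, are assumptions made for the sketch.

```typescript
// Field names follow the plan; string-list payloads are an assumption.
interface InferenceInput {
  diff: string;
  claims: string[];
  kb_neighbors: string[];
  drift_context: string[];
  hybrid_search_matches: string[];
}

// Render one labeled section; empty enrichment is stated explicitly
// so the model can tell "nothing found" from "not checked".
function section(title: string, items: string[]): string {
  const body = items.length > 0 ? items.map((i) => "- " + i).join("\n") : "(none)";
  return "## " + title + "\n" + body;
}

function buildInferencePrompt(input: InferenceInput): string {
  return [
    "## diff\n" + input.diff,
    section("claims", input.claims),
    section("kb_neighbors", input.kb_neighbors),
    section("drift_context", input.drift_context),
    section("hybrid_search_matches", input.hybrid_search_matches),
  ].join("\n\n");
}
```

Keeping each enrichment in its own section means Phases B, C, and D can land independently: a phase that has not shipped yet simply contributes an empty list.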

Phase F — Full integration test

An auditor self-test that:

  1. Creates a synthetic PR with known-good and known-bad claims
  2. Runs the enriched auditor
  3. Asserts the verdict found the planted issues
  4. Runs a second identical-claim PR
  5. Asserts the SECOND verdict references the FIRST audit (via KB neighbor retrieval)
  6. "Smarter DB" proof: two runs, measurable context gain.
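
The assertions in steps 3 and 5 can be illustrated with a toy stand-in for the enriched auditor. Everything here is hypothetical: `stubAudit`, the `PLANTED_BUG` marker, and the in-memory KB replace the real auditor, real planted issues, and _kb/outcomes.jsonl. The sketch only shows what "the second verdict references the first" would mean as a check.

```typescript
interface KbAuditRow {
  sig_hash: string;
  claim_hash: string;
  overall: string;
}

interface SyntheticPr {
  sig_hash: string;
  claim_hash: string;
  diff: string;
}

interface StubVerdict {
  sig_hash: string;
  overall: string;
  references: string[]; // sig_hashes of prior audits surfaced as KB neighbors
}

// Toy auditor: flags a planted marker, cites prior audits that share the
// claim hash, then records its own verdict in the in-memory KB.
function stubAudit(pr: SyntheticPr, kb: KbAuditRow[]): StubVerdict {
  const references = kb
    .filter((row) => row.claim_hash === pr.claim_hash)
    .map((row) => row.sig_hash);
  const overall = pr.diff.indexOf("PLANTED_BUG") !== -1 ? "request_changes" : "approve";
  kb.push({ sig_hash: pr.sig_hash, claim_hash: pr.claim_hash, overall });
  return { sig_hash: pr.sig_hash, overall, references };
}

// Two runs with identical claims against one shared KB.
const kb: KbAuditRow[] = [];
const first = stubAudit(
  { sig_hash: "pr100-aaaaaaaa", claim_hash: "c1", diff: "PLANTED_BUG" },
  kb,
);
const second = stubAudit(
  { sig_hash: "pr101-bbbbbbbb", claim_hash: "c1", diff: "PLANTED_BUG" },
  kb,
);
```

The measurable context gain is that `first.references` is empty while `second.references` contains the first audit's `sig_hash`.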

Sequence

Phase A lands in this PR alongside the inventory. Phases B-F are follow-up PRs, each with its own auditor gate. Order matters: A → B (reads A's output) → C+D (in parallel) → E (consumes B/C/D) → F (asserts E's behavior).

Not in this plan (deliberate)

  • Rewriting the auditor to use a different architecture. Current 4-check model stays; we just enrich the checks.
  • Tuning cloud inference precision. The false-positive rate is a prompt-engineering concern; this plan is about context enrichment, which is separate.
  • Branch protection enforcement. Stays off until Phase F passes.

The overall bet: this is the "putting it all together coherently" J said was a real problem. Six phases over however many PRs it takes. Each one ships one wire of the loop; no single PR tries to do them all.