profit ac01fffd9a checkpoint: matrix-agent-validated (2026-04-25)
Architectural snapshot of the lakehouse codebase at the point where the
full matrix-driven agent loop with Mem0 versioning + deletion was
validated end-to-end.

WHAT THIS REPO IS
A clean single-commit snapshot of the lakehouse code. Heavy test data
(.parquet datasets, vector indexes) excluded — see REPLICATION.md for
regen path. Full lakehouse history at git.agentview.dev/profit/lakehouse.

WHAT WAS PROVEN
- Vector retrieval across multi-corpora matrix (chicago_permits + entity
  briefs + sec_tickers + distilled procedural + llm_team runs)
- Observer hand-review (cloud + heuristic fallback) gating each candidate
- Local-model agent loop (qwen3.5:latest) with tool use + scratchpad
- Playbook seal on success → next-iter retrieval surfaces it as preamble
- Mem0 versioning + deletion in pathway_memory:
    * UPSERT: ADD on new workflow, UPDATE bumps replay_count on identical
    * REVISE: chains versions, parent.superseded_at + superseded_by stamped
    * RETIRE: marks specific trace retired with reason, excluded from retrieval
    * HISTORY: walks chain root→tip, cycle-safe

KEY DIRECTORIES
- crates/vectord/src/pathway_memory.rs — Mem0 ops live here
- crates/vectord/src/playbook_memory.rs — original Mem0 reference
- tests/agent_test/ — local-model agent harness + PRD + session archives
- scripts/dump_raw_corpus.sh — MinIO bucket dump (raw test corpus)
- scripts/vectorize_raw_corpus.ts — corpus → vector indexes
- scripts/analyze_chicago_contracts.ts — real inference pipeline
- scripts/seal_agent_playbook.ts — Mem0 upsert from agent traces

Replication: see REPLICATION.md for Debian 13 clean install + cloud-only
adaptation (no local Ollama).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 19:43:27 -05:00


# Lakehouse PR-Bot
A local sub-agent that reads the PRD, asks a cloud model for a small change
proposal, applies it, runs tests, and opens a draft PR on Gitea. Manual-only
right now — no systemd, no cron. Run one cycle at a time and watch.
## Run one cycle
```bash
cd /home/profit/lakehouse
bun run bot/cycle.ts
```
Prerequisites:
- Working tree must be clean (`git status` shows nothing). The bot refuses to run on a dirty tree so its changes don't mix with in-flight work.
- Sidecar running on :3200, gateway on :3100, observer on :3800.
- At least one PRD line tagged `[bot-eligible]` (see below).
- Gitea PAT configured in `~/.git-credentials` (already set up).
## Tag a gap as bot-eligible
Add `[bot-eligible]` to any PRD line the bot is allowed to work on:
```md
- [ ] Add a unit test for parse_city_state covering "South Bend, IN" edge case [bot-eligible]
```
The bot scans `docs/PRD.md` for these tags. Each tagged line becomes a candidate.
Start small — one tag at a time — until you trust the loop.
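A minimal sketch of what the tag scan could look like, assuming a candidate is simply any line containing the literal `[bot-eligible]`. The `Candidate` shape and `findEligibleGaps` are hypothetical names for illustration, not the bot's actual code.

```ts
// Hypothetical shape for a candidate gap; the real bot may track more fields.
interface Candidate {
  line: number; // 1-based line number in the PRD
  text: string; // line content with the tag and checkbox stripped
}

const TAG = "[bot-eligible]";

// Return every PRD line that carries the tag, in document order.
function findEligibleGaps(prd: string): Candidate[] {
  const out: Candidate[] = [];
  prd.split("\n").forEach((raw, i) => {
    if (!raw.includes(TAG)) return;
    const text = raw
      .replace(TAG, "")
      .replace(/^\s*-\s*\[[ xX]\]\s*/, "") // drop a leading "- [ ]" checkbox
      .trim();
    out.push({ line: i + 1, text });
  });
  return out;
}

const samplePrd = [
  "# PRD",
  '- [ ] Add a unit test for parse_city_state covering "South Bend, IN" edge case [bot-eligible]',
  "- [ ] Rewrite the ingest pipeline",
].join("\n");

console.log(findEligibleGaps(samplePrd));
```

Only the tagged line is returned, so an untagged PRD yields an empty list, which corresponds to the `skipped_no_gap` outcome.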
## Stop it mid-cycle
Create the pause file:
```bash
touch /home/profit/lakehouse/bot.paused
```
The next `bot/cycle.ts` invocation exits immediately with `skipped_pause`.
(It does NOT kill a cycle already in-flight — use `Ctrl-C` or `pkill -f bot/cycle.ts` for that.)
## Budget
- **20 cloud calls/day, 160k tokens/day** (hard ceiling, see `bot/cost.ts`).
- Tracked in `data/_bot/cost-YYYY-MM-DD.json`.
- Resets at UTC midnight.
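As a rough illustration of the budget rule, the sketch below derives the cost file name from the UTC date and checks both ceilings. The `DailyCost` shape and both function names are assumptions; the authoritative logic is in `bot/cost.ts`.

```ts
// Caps as stated above; the real values live in bot/cost.ts.
const MAX_CALLS_PER_DAY = 20;
const MAX_TOKENS_PER_DAY = 160000;

// Hypothetical shape of the daily cost file.
interface DailyCost {
  calls: number;
  tokens: number;
}

// Cost files are keyed by UTC date, which is why the budget resets at UTC midnight.
function costFileFor(now: Date): string {
  const day = now.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  return `data/_bot/cost-${day}.json`;
}

// True while usage is still under both ceilings.
function hasBudget(used: DailyCost): boolean {
  return used.calls < MAX_CALLS_PER_DAY && used.tokens < MAX_TOKENS_PER_DAY;
}

console.log(costFileFor(new Date("2026-04-25T23:59:59Z")));
console.log(hasBudget({ calls: 20, tokens: 1000 }));
```

Because the key comes from `toISOString()`, a run late in the evening in a western timezone may already be counting against the next UTC day.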
## Cycle outcomes
Every run writes a result to `data/_bot/cycles/{cycle_id}.json` and POSTs an event to the observer on :3800.
Possible outcomes:

| Outcome | Meaning |
|---|---|
| `ok` | PR opened |
| `cycle_noop` | proposal applied, every file was **identical to current content** (Mem0 NOOP); no test run, no PR |
| `skipped_pause` | pause file present |
| `skipped_cost` | daily budget exhausted |
| `skipped_policy` | `policy.shouldRunCycle` said no |
| `skipped_dirty_tree` | uncommitted changes present |
| `skipped_no_gap` | no `[bot-eligible]` tags in PRD |
| `model_failed` | sidecar/cloud model errored or returned unparseable JSON |
| `proposal_rejected` | `policy.scoreProposal` rejected it (size, path, etc.) |
| `apply_failed` | file write errored |
| `tests_failed` | `cargo test` or `bun test` failed |
| `pr_skipped_by_policy` | green tests, but `policy.shouldOpenPR` said no |
| `pr_failed` | Gitea API or git push errored |
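For anyone consuming the cycle JSON files, the outcome names in the table map naturally onto a string union. This is a sketch with hypothetical names (`OUTCOMES`, `CycleOutcome`, `parseOutcome`, `isSkip`); the bot's own types may differ.

```ts
// Outcome names copied from the table above.
const OUTCOMES = [
  "ok",
  "cycle_noop",
  "skipped_pause",
  "skipped_cost",
  "skipped_policy",
  "skipped_dirty_tree",
  "skipped_no_gap",
  "model_failed",
  "proposal_rejected",
  "apply_failed",
  "tests_failed",
  "pr_skipped_by_policy",
  "pr_failed",
] as const;

type CycleOutcome = (typeof OUTCOMES)[number];

// Narrow an untrusted string (e.g. read from a cycle JSON file) to a known outcome.
function parseOutcome(value: string): CycleOutcome | undefined {
  return (OUTCOMES as readonly string[]).includes(value)
    ? (value as CycleOutcome)
    : undefined;
}

// True for the skipped_* outcomes only; pr_skipped_by_policy is not one of them.
function isSkip(outcome: CycleOutcome): boolean {
  return outcome.startsWith("skipped_");
}

console.log(parseOutcome("tests_failed"), isSkip("skipped_cost"));
```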
## Mem0-aligned apply semantics
`apply.ts` categorizes every proposed file into one of three modes:

| Mode | Trigger | Action |
|---|---|---|
| **ADD** | `is_new: true` and file doesn't exist | Create the file |
| **UPDATE** | `is_new: false`, file exists, content **differs** from current | Overwrite |
| **NOOP** | `is_new: false`, file exists, content **matches** current exactly | Skip (no write, no diff) |

If every file in the proposal is NOOP, the cycle short-circuits to `cycle_noop` **before** running tests or opening a PR. Mismatched shapes (`is_new:true` but file exists, or `is_new:false` but file missing) become `apply_failed`: model state confusion is surfaced, not papered over.

PR bodies report the three counts separately so reviewers see what actually changed vs. what was confirmed identical.
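The classification rules above can be sketched as a pure function. This is an illustration under stated assumptions, not the code in `apply.ts`: `Classification`, `classify`, and `isCycleNoop` are hypothetical names, and treating an empty proposal as "not a no-op" is a choice made here, not something the bot is known to do.

```ts
type ApplyMode = "ADD" | "UPDATE" | "NOOP";

// Hypothetical result type: a mode, or a mismatch that maps to apply_failed.
type Classification =
  | { kind: "mode"; mode: ApplyMode }
  | { kind: "mismatch"; reason: string };

// `current` is the file's content on disk, or null when the file is missing.
function classify(isNew: boolean, current: string | null, proposed: string): Classification {
  if (isNew) {
    return current === null
      ? { kind: "mode", mode: "ADD" }
      : { kind: "mismatch", reason: "is_new:true but file exists" };
  }
  if (current === null) {
    return { kind: "mismatch", reason: "is_new:false but file missing" };
  }
  return { kind: "mode", mode: current === proposed ? "NOOP" : "UPDATE" };
}

// The cycle short-circuits to cycle_noop only when every file is NOOP.
// Assumption: an empty proposal does not count as a no-op.
function isCycleNoop(results: Classification[]): boolean {
  return results.length > 0 && results.every((r) => r.kind === "mode" && r.mode === "NOOP");
}

console.log(classify(false, "fn main() {}", "fn main() {}"));
```

Keeping the mismatch cases as an explicit variant, rather than coercing them to ADD or UPDATE, mirrors the stated intent of surfacing model state confusion.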
## How the bot compounds over cycles
Every finished cycle persists a `CycleResult` to `data/_bot/cycles/{cycle_id}.json`. At the **start** of the next cycle, `bot/kb.ts::loadHistory(gap_id)` scans that directory, filters to prior cycles on the same gap, and returns up to five of the most recent outcomes (`ok`, `tests_failed`, `proposal_rejected`, `cycle_noop`, etc.) with their reasons and touched files.
Those outcomes are summarized into a compact block and **injected into the cloud prompt** before asking for a new proposal. The model sees things like:
```
Prior attempts on this gap (3 most recent):
- 2026-04-22 03:15 UTC — tests_failed
reason: cargo test -p vectord::lance failed on field_type_coerce
- 2026-04-22 02:48 UTC — proposal_rejected
reason: touched forbidden path (docs/ADR-): docs/ADR-019-update.md
- 2026-04-22 01:23 UTC — ok PR: https://git.agentview.dev/profit/lakehouse/pulls/142
reason: PR #142 opened
Learn from these: build on what worked, avoid paths that failed.
```
This is the same compounding pattern `scenario.ts` uses via `kb.loadRecommendation` — the bot's cycles are the bot's memory. No embedding, no separate jsonl, no cross-run orchestration required. First cycle on a new gap skips the block cleanly.
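One way the history block could be assembled is sketched below. `PriorAttempt` is a hypothetical, trimmed-down view of a persisted `CycleResult`, `formatHistory` is an illustrative name, and newest-first ordering of the input is assumed.

```ts
// Hypothetical, trimmed-down view of a persisted CycleResult.
interface PriorAttempt {
  finishedAt: string; // e.g. "2026-04-22 03:15 UTC"
  outcome: string;
  reason: string;
}

// Build the prompt preamble. An empty history yields an empty string, so the
// first cycle on a gap adds nothing to the prompt.
function formatHistory(attempts: PriorAttempt[], limit = 5): string {
  if (attempts.length === 0) return "";
  const recent = attempts.slice(0, limit); // assumes newest-first input
  const lines = [`Prior attempts on this gap (${recent.length} most recent):`];
  for (const a of recent) {
    // \u2014 is the dash separator used in the example block above.
    lines.push(`- ${a.finishedAt} \u2014 ${a.outcome}`);
    lines.push(`  reason: ${a.reason}`);
  }
  lines.push("Learn from these: build on what worked, avoid paths that failed.");
  return lines.join("\n");
}

console.log(
  formatHistory([
    { finishedAt: "2026-04-22 03:15 UTC", outcome: "tests_failed", reason: "cargo test failed" },
  ]),
);
```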
The observer events POSTed on every cycle carry the same data, so `GET :3800/stats?source=bot` aggregates "% of bot cycles on similar gaps that landed a PR" without extra plumbing.
## Where YOU edit
`bot/policy.ts` — four small functions that define what the bot does. The rest
(`propose.ts`, `apply.ts`, `test.ts`, `pr.ts`, `cycle.ts`, `kb.ts`) is mechanical
orchestration — you shouldn't need to touch it unless a pipeline changes.
One policy upgrade the history opens up: bailing out when the same gap has
failed N times in a row. `shouldRunCycle` looks like the obvious hook, but it
runs before the gap is picked, so `scoreProposal` is the more natural place:
```ts
import { loadHistory, statsFor } from "./kb.ts";
// inside shouldRunCycle — but you'd need the gap, which is picked later.
// A more natural place is scoreProposal: if prior N failures on this
// gap's path+summary pattern, reject before testing.
```
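A self-contained sketch of the "failed N times in a row" check follows. It deliberately avoids guessing the signatures of `loadHistory` or `scoreProposal`: it takes a plain newest-first list of outcome strings, and the set of outcomes counted as failures, the default threshold of 3, and both function names are assumptions.

```ts
// Outcomes counted as failures in this sketch (an assumption; adjust to taste).
const FAILURE_OUTCOMES = new Set([
  "tests_failed",
  "proposal_rejected",
  "apply_failed",
  "model_failed",
]);

// Count failures at the head of a newest-first outcome list, stopping at the
// first outcome that is not a failure.
function consecutiveFailures(outcomes: string[]): number {
  let n = 0;
  for (const o of outcomes) {
    if (!FAILURE_OUTCOMES.has(o)) break;
    n++;
  }
  return n;
}

// Hypothetical guard a policy function could call before accepting a proposal.
function tooManyFailures(outcomes: string[], maxFailures = 3): boolean {
  return consecutiveFailures(outcomes) >= maxFailures;
}

console.log(tooManyFailures(["tests_failed", "tests_failed", "tests_failed", "ok"]));
```

Counting only the leading run means a single `ok` resets the streak, so a gap that recently landed a PR is not penalized for older failures.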
## Hard-coded guardrails (not in policy — can't be disabled)
- Bot **never deletes files** (`apply.ts` has no delete path).
- Bot **never touches** `.git/`, `secrets`, `lakehouse.toml`, `docs/ADR-*`, `docs/DECISIONS.md`, `docs/PRD.md`, `/etc/`, `/root/`, `Cargo.lock`.
- All paths are validated against traversal (`path/../`) and repo-escape.
- PRs always open as **draft** — never auto-merge.
- The budget check runs before any policy function, so there is no way to override the daily cap from `policy.ts`.
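To make the path guardrails concrete, here is a minimal sketch of a validator over repo-relative paths. It is not the bot's implementation: `checkPath` is a hypothetical name, root-level exact matching for `lakehouse.toml` and `Cargo.lock` is an assumption, and so is matching `secrets` as a substring of any path segment. Rejecting absolute paths covers `/etc/` and `/root/`.

```ts
// Forbidden entries taken from the guardrail list above.
const FORBIDDEN_PREFIXES = [".git/", "docs/ADR-"];
const FORBIDDEN_FILES = ["lakehouse.toml", "docs/DECISIONS.md", "docs/PRD.md", "Cargo.lock"];

// Returns a rejection reason, or null when the path looks safe to write.
function checkPath(path: string): string | null {
  // Absolute paths (including /etc/ and /root/) would escape the repo.
  if (path.startsWith("/")) return "absolute path escapes the repo";

  // Drop empty and "." segments so "./.git/config" is seen as ".git/config".
  const segments = path.split("/").filter((s) => s !== "" && s !== ".");
  if (segments.includes("..")) return "path traversal";

  // Assumption: any segment containing "secrets" is off limits.
  if (segments.some((s) => s.includes("secrets"))) return "forbidden path: secrets";

  const clean = segments.join("/");
  const hit =
    FORBIDDEN_PREFIXES.find((p) => clean.startsWith(p)) ??
    FORBIDDEN_FILES.find((f) => clean === f);
  return hit ? `forbidden path: ${hit}` : null;
}

console.log(checkPath("docs/ADR-019-update.md"));
console.log(checkPath("crates/vectord/src/pathway_memory.rs"));
```

Rejecting any `..` segment outright is stricter than resolving the path and checking where it lands, but it is simpler to reason about for a guard that cannot be disabled.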