Architectural snapshot of the lakehouse codebase at the point where the
full matrix-driven agent loop with Mem0 versioning + deletion was
validated end-to-end.
WHAT THIS REPO IS
A clean single-commit snapshot of the lakehouse code. Heavy test data
(.parquet datasets, vector indexes) is excluded; see REPLICATION.md for
the regeneration path. Full lakehouse history lives at
git.agentview.dev/profit/lakehouse.
WHAT WAS PROVEN
- Vector retrieval across a multi-corpus matrix (chicago_permits + entity
briefs + sec_tickers + distilled procedural + llm_team runs)
- Observer hand-review (cloud + heuristic fallback) gating each candidate
- Local-model agent loop (qwen3.5:latest) with tool use + scratchpad
- Playbook seal on success → next-iter retrieval surfaces it as preamble
- Mem0 versioning + deletion in pathway_memory:
  * UPSERT: ADD on a new workflow; UPDATE bumps replay_count on an identical one
  * REVISE: chains versions; stamps parent.superseded_at + superseded_by
  * RETIRE: marks a specific trace retired with a reason; it is excluded from retrieval
  * HISTORY: walks the chain root→tip, cycle-safe
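
The REVISE/RETIRE/HISTORY semantics above fit in a small model. The block below is a hypothetical in-memory TypeScript sketch, not the Rust code in crates/vectord/src/pathway_memory.rs. Several details are assumptions: the `Trace` fields, the `t1`/`t2` id scheme, the starting `replay_count` of 0, and matching on a `workflow` string.

```typescript
// Hypothetical model of the pathway_memory ops; field names are assumptions.
interface Trace {
  id: string;
  workflow: string; // assumed key for "identical workflow"
  replay_count: number;
  parent?: string;
  superseded_at?: string;
  superseded_by?: string;
  retired_reason?: string;
}

class PathwayMemory {
  private traces = new Map<string, Trace>();
  private seq = 0;

  // UPSERT: UPDATE (bump replay_count) on an identical live workflow, else ADD.
  upsert(workflow: string): Trace {
    for (const t of this.traces.values()) {
      if (t.workflow === workflow && !t.superseded_by && !t.retired_reason) {
        t.replay_count++;
        return t;
      }
    }
    return this.add(workflow);
  }

  // REVISE: new version chained to its parent; parent gets supersede stamps.
  revise(parentId: string, workflow: string, now = new Date().toISOString()): Trace {
    const parent = this.get(parentId);
    const child = this.add(workflow, parentId);
    parent.superseded_at = now;
    parent.superseded_by = child.id;
    return child;
  }

  // RETIRE: mark one trace retired with a reason.
  retire(id: string, reason: string): void {
    this.get(id).retired_reason = reason;
  }

  // Retrieval skips retired traces.
  retrievable(): Trace[] {
    return [...this.traces.values()].filter((t) => !t.retired_reason);
  }

  // HISTORY: climb to the root, then walk superseded_by links to the tip.
  // Both walks track visited ids, so a corrupted cyclic chain cannot loop forever.
  history(id: string): Trace[] {
    const seen = new Set<string>();
    let root = this.get(id);
    while (root.parent && !seen.has(root.parent)) {
      seen.add(root.id);
      root = this.get(root.parent);
    }
    const chain: Trace[] = [];
    const visited = new Set<string>();
    let cur: Trace | undefined = root;
    while (cur && !visited.has(cur.id)) {
      visited.add(cur.id);
      chain.push(cur);
      cur = cur.superseded_by ? this.traces.get(cur.superseded_by) : undefined;
    }
    return chain;
  }

  private add(workflow: string, parent?: string): Trace {
    const t: Trace = { id: `t${++this.seq}`, workflow, replay_count: 0, parent };
    this.traces.set(t.id, t);
    return t;
  }

  private get(id: string): Trace {
    const t = this.traces.get(id);
    if (!t) throw new Error(`unknown trace ${id}`);
    return t;
  }
}
```
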
KEY DIRECTORIES
- crates/vectord/src/pathway_memory.rs — Mem0 ops live here
- crates/vectord/src/playbook_memory.rs — original Mem0 reference
- tests/agent_test/ — local-model agent harness + PRD + session archives
- scripts/dump_raw_corpus.sh — MinIO bucket dump (raw test corpus)
- scripts/vectorize_raw_corpus.ts — corpus → vector indexes
- scripts/analyze_chicago_contracts.ts — real inference pipeline
- scripts/seal_agent_playbook.ts — Mem0 upsert from agent traces
Replication: see REPLICATION.md for Debian 13 clean install + cloud-only
adaptation (no local Ollama).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
128 lines
5.7 KiB
Markdown
# Lakehouse PR-Bot

A local sub-agent that reads the PRD, asks a cloud model for a small change
proposal, applies it, runs tests, and opens a draft PR on Gitea. Manual-only
right now: no systemd, no cron. Run one cycle at a time and watch.

## Run one cycle

```bash
cd /home/profit/lakehouse
bun run bot/cycle.ts
```

Prerequisites:

- Working tree must be clean (`git status` shows nothing). The bot refuses to run on a dirty tree so its changes don't mix with in-flight work.
- Sidecar running on :3200, gateway on :3100, observer on :3800.
- At least one PRD line tagged `[bot-eligible]` (see below).
- Gitea PAT configured in `~/.git-credentials` (already set up).

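The first two prerequisites can be checked before a cycle starts. Below is a hypothetical preflight sketch; it is not part of `bot/cycle.ts`. It shells out to `git status --porcelain` and probes the three service ports with a plain TCP connect. A successful connect only proves that something is listening, not that the service is healthy. The `SERVICES` map and the helper names are illustrative.

```typescript
import { execFileSync } from "node:child_process";
import { createConnection } from "node:net";

// Ports taken from the prerequisites list; names are illustrative.
const SERVICES = { sidecar: 3200, gateway: 3100, observer: 3800 } as const;

// `git status --porcelain` prints nothing when the working tree is clean.
function isCleanTree(porcelain: string): boolean {
  return porcelain.trim() === "";
}

function gitPorcelain(repoDir: string): string {
  return execFileSync("git", ["status", "--porcelain"], { cwd: repoDir, encoding: "utf8" });
}

// Resolves true if something accepts a TCP connection on the port in time.
function portOpen(port: number, host = "127.0.0.1", timeoutMs = 500): Promise<boolean> {
  return new Promise((resolve) => {
    const socket = createConnection({ port, host });
    const finish = (ok: boolean) => {
      socket.destroy();
      resolve(ok);
    };
    socket.setTimeout(timeoutMs, () => finish(false));
    socket.once("connect", () => finish(true));
    socket.once("error", () => finish(false));
  });
}

// Returns a list of problems; empty means the checks passed.
async function preflight(repoDir: string): Promise<string[]> {
  const problems: string[] = [];
  if (!isCleanTree(gitPorcelain(repoDir))) problems.push("working tree is dirty");
  for (const [name, port] of Object.entries(SERVICES)) {
    if (!(await portOpen(port))) problems.push(`${name} not reachable on :${port}`);
  }
  return problems;
}
```

For example, `preflight("/home/profit/lakehouse").then(console.log)` prints an empty array when the tree is clean and all three services accept connections.
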
## Tag a gap as bot-eligible

Add `[bot-eligible]` to any PRD line the bot is allowed to work on:

```md
- [ ] Add a unit test for parse_city_state covering "South Bend, IN" edge case [bot-eligible]
```

The bot scans `docs/PRD.md` for these tags, and each tagged line becomes a candidate.
Start small (one tag at a time) until you trust the loop.

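Here is a minimal sketch of the scan. It assumes tags appear on Markdown checkbox lines like the example. The `Gap` shape and the slug-based `gap_id` are hypothetical; the real bot may identify gaps differently.

```typescript
// Hypothetical PRD scan; the real scanner and its gap_id scheme may differ.
const TAG = "[bot-eligible]";

interface Gap {
  gap_id: string; // assumption: a slug of the line text
  line: number;   // 1-based line number in docs/PRD.md
  text: string;   // the line with the checkbox prefix and tag stripped
}

function slugify(s: string): string {
  return s.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

function findEligibleGaps(prd: string): Gap[] {
  const gaps: Gap[] = [];
  prd.split("\n").forEach((raw, i) => {
    if (!raw.includes(TAG)) return;
    const text = raw
      .replace(TAG, "")
      .replace(/^\s*[-*]\s+\[[ xX]\]\s*/, "") // drop the "- [ ] " list/checkbox prefix
      .trim();
    gaps.push({ gap_id: slugify(text), line: i + 1, text });
  });
  return gaps;
}
```

Calling `findEligibleGaps(readFileSync("docs/PRD.md", "utf8"))` would return one candidate per tagged line.
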
## Stop it mid-cycle

Create the pause file:

```bash
touch /home/profit/lakehouse/bot.paused
```

The next `bot/cycle.ts` invocation exits immediately with `skipped_pause`.
(It does NOT kill a cycle already in flight; use `Ctrl-C` or `pkill -f bot/cycle.ts` for that.)

## Budget

- **20 cloud calls/day, 160k tokens/day** (hard ceiling; see `bot/cost.ts`).
- Tracked in `data/_bot/cost-YYYY-MM-DD.json`.
- Resets at UTC midnight.

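Here is a sketch of the daily cap. It assumes a ledger with `calls` and `tokens` counters; the real schema lives in `bot/cost.ts` and may differ. Because the ledger file is keyed by UTC date, the midnight reset happens on its own: a new day means a new, empty file.

```typescript
// Hypothetical budget gate; ledger field names are assumptions.
interface DailyLedger {
  calls: number;
  tokens: number;
}

const MAX_CALLS_PER_DAY = 20;
const MAX_TOKENS_PER_DAY = 160_000;

// One ledger per UTC day: toISOString() is always UTC.
function ledgerPath(now: Date): string {
  const day = now.toISOString().slice(0, 10); // YYYY-MM-DD
  return `data/_bot/cost-${day}.json`;
}

// Exhausted as soon as either ceiling is reached.
function budgetExhausted(l: DailyLedger): boolean {
  return l.calls >= MAX_CALLS_PER_DAY || l.tokens >= MAX_TOKENS_PER_DAY;
}

// Returns an updated copy after one cloud call.
function recordCall(l: DailyLedger, tokensUsed: number): DailyLedger {
  return { calls: l.calls + 1, tokens: l.tokens + tokensUsed };
}
```
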
## Cycle outcomes

Every run writes a result to `data/_bot/cycles/{cycle_id}.json` and POSTs an event to the observer on :3800. Possible outcomes:

| Outcome | Meaning |
|---|---|
| `ok` | PR opened |
| `cycle_noop` | apply step ran, but every file was **identical to current content** (Mem0 NOOP); nothing written, no test run, no PR |
| `skipped_pause` | pause file present |
| `skipped_cost` | daily budget exhausted |
| `skipped_policy` | `policy.shouldRunCycle` said no |
| `skipped_dirty_tree` | uncommitted changes present |
| `skipped_no_gap` | no `[bot-eligible]` tags in PRD |
| `model_failed` | sidecar/cloud model errored or returned unparseable JSON |
| `proposal_rejected` | `policy.scoreProposal` rejected it (size, path, etc.) |
| `apply_failed` | file write errored, or a file's `is_new` flag contradicted the filesystem |
| `tests_failed` | `cargo test` or `bun test` failed |
| `pr_skipped_by_policy` | tests green, but `policy.shouldOpenPR` said no |
| `pr_failed` | Gitea API or git push errored |

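The table maps naturally onto a string-literal union. The sketch below is hypothetical: the `CycleResult` field names and the `Gates` interface are assumptions. The order of the early-exit checks follows the table, which also keeps the budget check ahead of the policy check.

```typescript
// Hypothetical types mirroring the outcome table; real field names may differ.
type CycleOutcome =
  | "ok"
  | "cycle_noop"
  | "skipped_pause"
  | "skipped_cost"
  | "skipped_policy"
  | "skipped_dirty_tree"
  | "skipped_no_gap"
  | "model_failed"
  | "proposal_rejected"
  | "apply_failed"
  | "tests_failed"
  | "pr_skipped_by_policy"
  | "pr_failed";

interface CycleResult {
  cycle_id: string;
  gap_id: string | null; // null if the cycle stopped before picking a gap
  outcome: CycleOutcome;
  reason: string;
  files: string[];       // paths the proposal touched
  finished_at: string;   // ISO-8601 UTC
}

// Assumed early-exit checks, in table order (budget before policy).
interface Gates {
  paused(): boolean;
  budgetExhausted(): boolean;
  policyAllows(): boolean;
  treeClean(): boolean;
  hasGap(): boolean;
}

function gateOutcome(g: Gates): CycleOutcome | null {
  if (g.paused()) return "skipped_pause";
  if (g.budgetExhausted()) return "skipped_cost";
  if (!g.policyAllows()) return "skipped_policy";
  if (!g.treeClean()) return "skipped_dirty_tree";
  if (!g.hasGap()) return "skipped_no_gap";
  return null; // all gates passed: ask the model for a proposal
}

function resultPath(r: Pick<CycleResult, "cycle_id">): string {
  return `data/_bot/cycles/${r.cycle_id}.json`;
}
```
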
## Mem0-aligned apply semantics

`apply.ts` categorizes every proposed file into one of three modes:

| Mode | Trigger | Action |
|---|---|---|
| **ADD** | `is_new: true` and file doesn't exist | Create the file |
| **UPDATE** | `is_new: false`, file exists, content **differs** from current | Overwrite |
| **NOOP** | `is_new: false`, file exists, content **matches** current exactly | Skip (no write, no diff) |

If every file in the proposal is NOOP, the cycle short-circuits to `cycle_noop` **before** running tests or opening a PR. Mismatched shapes (`is_new: true` but the file exists, or `is_new: false` but the file is missing) become `apply_failed`: model state confusion is surfaced, not papered over.

PR bodies report the three counts separately, so reviewers see what actually changed versus what was confirmed identical.

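Here is a hypothetical sketch of the classification step. It assumes each proposed file arrives as `{ path, content, is_new }`. `planApply` and the `"write"` outcome (meaning: go ahead, write the files, then run tests) are illustrative names, not the real `apply.ts` API. Reading through an injected `read` function keeps the logic testable without touching disk.

```typescript
import { existsSync, readFileSync } from "node:fs";

// Hypothetical shapes; the real proposal format may differ.
interface ProposedFile {
  path: string;
  content: string;
  is_new: boolean;
}

type Mode = "ADD" | "UPDATE" | "NOOP";
interface Counts { add: number; update: number; noop: number }

type Plan =
  | { outcome: "apply_failed"; reason: string }
  | { outcome: "cycle_noop" | "write"; counts: Counts };

// `current` is the file's content on disk, or null if it does not exist.
function classify(f: ProposedFile, current: string | null): { mode: Mode } | { error: string } {
  if (f.is_new) {
    return current === null ? { mode: "ADD" } : { error: `is_new:true but ${f.path} exists` };
  }
  if (current === null) return { error: `is_new:false but ${f.path} is missing` };
  return current === f.content ? { mode: "NOOP" } : { mode: "UPDATE" };
}

function planApply(files: ProposedFile[], read: (path: string) => string | null): Plan {
  const counts: Counts = { add: 0, update: 0, noop: 0 };
  for (const f of files) {
    const c = classify(f, read(f.path));
    if ("error" in c) return { outcome: "apply_failed", reason: c.error }; // surface confusion
    if (c.mode === "ADD") counts.add++;
    else if (c.mode === "UPDATE") counts.update++;
    else counts.noop++;
  }
  // Every file NOOP: short-circuit before tests or a PR.
  return { outcome: counts.noop === files.length ? "cycle_noop" : "write", counts };
}

// Disk-backed reader for real use.
const readCurrent = (path: string): string | null =>
  existsSync(path) ? readFileSync(path, "utf8") : null;
```

The `counts` object carries the three numbers a PR body would report.
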
## How the bot compounds over cycles

Every finished cycle persists a `CycleResult` to `data/_bot/cycles/{cycle_id}.json`. At the **start** of the next cycle, `bot/kb.ts::loadHistory(gap_id)` scans that directory, filters to prior cycles on the same gap, and returns the five most recent outcomes (`ok`, `tests_failed`, `proposal_rejected`, `cycle_noop`, etc.) with their reasons and touched files.

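Here is a minimal sketch of that lookup, in the spirit of `bot/kb.ts::loadHistory` but not its actual code. It assumes each cycle file carries `gap_id`, `outcome`, `reason`, `files`, and an ISO-8601 `finished_at` timestamp; those field names are guesses.

```typescript
import { existsSync, readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical record shape for a persisted cycle.
interface PriorCycle {
  cycle_id: string;
  gap_id: string | null;
  outcome: string;
  reason: string;
  files: string[];
  finished_at: string; // ISO-8601 UTC
}

// Same gap only, newest first, capped at `limit` (five by default).
function recentForGap(all: PriorCycle[], gapId: string, limit = 5): PriorCycle[] {
  return all
    .filter((c) => c.gap_id === gapId)
    .sort((a, b) => Date.parse(b.finished_at) - Date.parse(a.finished_at))
    .slice(0, limit);
}

function loadHistory(gapId: string, dir = "data/_bot/cycles"): PriorCycle[] {
  if (!existsSync(dir)) return []; // no cycles recorded yet
  const all: PriorCycle[] = [];
  for (const name of readdirSync(dir)) {
    if (!name.endsWith(".json")) continue;
    try {
      all.push(JSON.parse(readFileSync(join(dir, name), "utf8")) as PriorCycle);
    } catch {
      // Skip unreadable or half-written files rather than failing the cycle.
    }
  }
  return recentForGap(all, gapId);
}
```
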
Those outcomes are summarized into a compact block and **injected into the cloud prompt** before asking for a new proposal. The model sees things like:

```
Prior attempts on this gap (3 most recent):
- 2026-04-22 03:15 UTC — tests_failed
  reason: cargo test -p vectord::lance failed on field_type_coerce
- 2026-04-22 02:48 UTC — proposal_rejected
  reason: touched forbidden path (docs/ADR-): docs/ADR-019-update.md
- 2026-04-22 01:23 UTC — ok PR: https://git.agentview.dev/profit/lakehouse/pulls/142
  reason: PR #142 opened

Learn from these: build on what worked, avoid paths that failed.
```

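The summarization step could look roughly like the hypothetical renderer below, which approximates the sample prompt. The `Attempt` shape is assumed, and the timestamp is cut to minute precision to match the sample.

```typescript
// Hypothetical renderer; the bot's exact wording and layout may differ.
interface Attempt {
  outcome: string;
  reason: string;
  finished_at: string; // ISO-8601 UTC, e.g. "2026-04-22T03:15:00Z"
  pr_url?: string;
}

const EM_DASH = "\u2014"; // separator used in the sample block

function formatHistoryBlock(attempts: Attempt[]): string {
  if (attempts.length === 0) return ""; // first cycle on a gap: no block at all
  const lines = [`Prior attempts on this gap (${attempts.length} most recent):`];
  for (const a of attempts) {
    const when = `${a.finished_at.slice(0, 16).replace("T", " ")} UTC`;
    const pr = a.pr_url ? ` PR: ${a.pr_url}` : "";
    lines.push(`- ${when} ${EM_DASH} ${a.outcome}${pr}`);
    lines.push(`  reason: ${a.reason}`);
  }
  lines.push("", "Learn from these: build on what worked, avoid paths that failed.");
  return lines.join("\n");
}
```
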
This is the same compounding pattern `scenario.ts` uses via `kb.loadRecommendation`: the bot's past cycles are its memory. No embeddings, no separate JSONL file, and no cross-run orchestration are required. On the first cycle for a new gap there is no history, so the block is simply omitted.

The observer events POSTed on every cycle carry the same data, so `GET :3800/stats?source=bot` aggregates "% of bot cycles on similar gaps that landed a PR" without extra plumbing.

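The observer's `/stats` response format is not documented here. As an illustration, this hypothetical helper computes the same kind of ratio directly from cycle events. "Similar" is left to a caller-supplied predicate because the README does not define it.

```typescript
// Hypothetical event shape; the observer's real payload may differ.
interface BotEvent {
  gap_id: string | null;
  outcome: string;
}

// Fraction of cycles on matching gaps whose outcome was "ok" (PR opened).
function landingRate(events: BotEvent[], similar: (gapId: string) => boolean): number | null {
  const relevant = events.filter((e) => e.gap_id !== null && similar(e.gap_id));
  if (relevant.length === 0) return null; // no data, which is not the same as 0%
  const landed = relevant.filter((e) => e.outcome === "ok").length;
  return landed / relevant.length;
}
```
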
## Where YOU edit

`bot/policy.ts` holds four small functions that define what the bot does. The rest
(`propose.ts`, `apply.ts`, `test.ts`, `pr.ts`, `cycle.ts`, `kb.ts`) is mechanical
orchestration; you shouldn't need to touch it unless a pipeline changes.

One policy upgrade the history opens up: bailing out when the same gap has failed
N times in a row. The obvious spot, `shouldRunCycle`, runs before the gap is picked,
so the notes below point at `scoreProposal` instead:

```ts
import { loadHistory, statsFor } from "./kb.ts";
// Inside shouldRunCycle, but you'd need the gap, which is picked later.
// A more natural place is scoreProposal: if there are N prior failures on this
// gap's path+summary pattern, reject before testing.
```

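Following that suggestion, here is a hypothetical helper for `bot/policy.ts` that counts the unbroken run of failures at the head of a gap's history. It assumes the history is ordered newest first and that each entry has an `outcome` field. Which outcomes count as failures, and the threshold of 3, are judgment calls.

```typescript
// Outcomes treated as failures (assumption; cycle_noop and policy skips excluded).
const FAILURES = new Set([
  "model_failed",
  "proposal_rejected",
  "apply_failed",
  "tests_failed",
  "pr_failed",
]);

// Length of the unbroken run of failures at the head of a newest-first history.
function failureStreak(history: { outcome: string }[]): number {
  let n = 0;
  for (const h of history) {
    if (!FAILURES.has(h.outcome)) break;
    n++;
  }
  return n;
}

// N = 3 is an arbitrary example. loadHistory returns at most five entries,
// so a threshold above 5 could never trigger.
function shouldGiveUp(history: { outcome: string }[], maxStreak = 3): boolean {
  return failureStreak(history) >= maxStreak;
}

// Possible use inside scoreProposal (hypothetical; its real signature is not shown):
//   if (shouldGiveUp(loadHistory(gapId))) { /* reject the proposal */ }
```
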
## Hard-coded guardrails (not in policy — can't be disabled)

- Bot **never deletes files** (`apply.ts` has no delete path).
- Bot **never touches** `.git/`, `secrets`, `lakehouse.toml`, `docs/ADR-*`, `docs/DECISIONS.md`, `docs/PRD.md`, `/etc/`, `/root/`, `Cargo.lock`.
- All paths are validated against traversal (`path/../`) and repo escape.
- PRs always open as **draft**, never auto-merge.
- The budget check runs before the policy function, so there is no way to override the daily cap from `policy.ts`.

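One way to implement the path rules is sketched below. This is hypothetical code, not the bot's. Two parts are assumptions: treating any path segment that contains `secrets` as forbidden, and matching the listed files by repo-relative path. Absolute paths are rejected outright, which also covers `/etc/` and `/root/`.

```typescript
import { isAbsolute, relative, resolve, sep } from "node:path";

// Hypothetical forbidden list, taken from the guardrails above.
const FORBIDDEN_FILES = new Set(["lakehouse.toml", "docs/DECISIONS.md", "docs/PRD.md", "Cargo.lock"]);

// Returns null if the path is allowed, otherwise a rejection reason.
function checkPath(repoRoot: string, p: string): string | null {
  if (p.split(/[\\/]+/).includes("..")) return `traversal: ${p}`;
  if (isAbsolute(p)) return `absolute path: ${p}`; // covers /etc/ and /root/
  const rel = relative(repoRoot, resolve(repoRoot, p));
  if (rel === "" || rel === ".." || rel.startsWith(`..${sep}`) || isAbsolute(rel)) {
    return `outside repo: ${p}`; // defense in depth after the checks above
  }
  const posix = rel.split(sep).join("/");
  const segs = posix.split("/");
  if (segs.includes(".git")) return `forbidden: ${posix}`;
  if (segs.some((s) => s.includes("secrets"))) return `forbidden: ${posix}`;
  if (posix.startsWith("docs/ADR-")) return `forbidden: ${posix}`;
  if (FORBIDDEN_FILES.has(posix)) return `forbidden: ${posix}`;
  return null;
}
```
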