
# Lakehouse PR-Bot

A local sub-agent that reads the PRD, asks a cloud model for a small change
proposal, applies it, runs tests, and opens a draft PR on Gitea. Manual-only
right now — no systemd, no cron. Run one cycle at a time and watch.

## Run one cycle

```bash
cd /home/profit/lakehouse
bun run bot/cycle.ts
```

Prerequisites:
- Working tree must be clean (`git status --porcelain` prints nothing). The bot refuses to run on a dirty tree so its changes don't mix with in-flight work.
- Sidecar running on :3200, gateway on :3100, observer on :3800.
- At least one PRD line tagged `[bot-eligible]` (see below).
- Gitea PAT configured in `~/.git-credentials` (already set up).

## Tag a gap as bot-eligible

Add `[bot-eligible]` to any PRD line the bot is allowed to work on:

```md
- [ ] Add a unit test for parse_city_state covering "South Bend, IN" edge case [bot-eligible]
```

The bot scans `docs/PRD.md` for these tags. Each tagged line becomes a candidate.
Start small — one tag at a time — until you trust the loop.
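The tag scan can be pictured roughly as follows. This is a minimal sketch, not the bot's actual code: `findEligibleGaps`, `GapCandidate`, and the way list markup is stripped are all hypothetical, chosen only to illustrate "one tagged line, one candidate".

```typescript
// Hypothetical candidate shape; the bot's real type may differ.
interface GapCandidate {
  line: number; // 1-based line number in the PRD
  text: string; // line content with the tag and list markup stripped
}

const BOT_TAG = "[bot-eligible]";

// Return one candidate per PRD line that carries the tag.
function findEligibleGaps(prd: string): GapCandidate[] {
  const found: GapCandidate[] = [];
  prd.split(/\r?\n/).forEach((raw, i) => {
    if (!raw.includes(BOT_TAG)) return;
    const text = raw
      .replace(BOT_TAG, "")
      .replace(/^\s*[-*]\s*(\[[ xX]\]\s*)?/, "") // drop "- [ ] " style prefixes
      .trim();
    found.push({ line: i + 1, text });
  });
  return found;
}
```

An empty result corresponds to the `skipped_no_gap` outcome described further down.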

## Stop it mid-cycle

Create the pause file:

```bash
touch /home/profit/lakehouse/bot.paused
```

The pause file only blocks new cycles: the next `bot/cycle.ts` invocation exits immediately with `skipped_pause`. Delete the file to resume.
It does NOT kill a cycle already in flight; use `Ctrl-C` or `pkill -f bot/cycle.ts` for that.

## Budget

- **20 cloud calls/day, 160k tokens/day** (hard ceiling, see `bot/cost.ts`).
- Tracked in `data/_bot/cost-YYYY-MM-DD.json`.
- Resets at UTC midnight.
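As a rough illustration of how a per-day ledger gives the UTC-midnight reset for free, here is a minimal sketch. The limits are the ones quoted above; `DailyCost`, `costFileFor`, and `hasBudget` are hypothetical names, and the real logic in `bot/cost.ts` may differ.

```typescript
// Limits quoted from this README; the real constants live in bot/cost.ts.
const MAX_CALLS_PER_DAY = 20;
const MAX_TOKENS_PER_DAY = 160_000;

// Hypothetical ledger shape for one UTC day.
interface DailyCost {
  calls: number;
  tokens: number;
}

// Ledger path for the UTC day containing `now`. A new day means a new file,
// so the counters start from zero at UTC midnight.
function costFileFor(now: Date): string {
  const day = now.toISOString().slice(0, 10); // YYYY-MM-DD in UTC
  return `data/_bot/cost-${day}.json`;
}

// True while both ceilings still have headroom.
function hasBudget(used: DailyCost): boolean {
  return used.calls < MAX_CALLS_PER_DAY && used.tokens < MAX_TOKENS_PER_DAY;
}
```

When `hasBudget` would return false, the cycle ends with `skipped_cost`.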

## Cycle outcomes

Every run writes a result to `data/_bot/cycles/{cycle_id}.json` and POSTs an event to the observer on :3800.
Possible outcomes:

| Outcome | Meaning |
|---|---|
| `ok` | PR opened |
| `cycle_noop` | every file in the proposal was **identical to current content** (Mem0 NOOP); nothing written, no test run, no PR |
| `skipped_pause` | pause file present |
| `skipped_cost` | daily budget exhausted |
| `skipped_policy` | `policy.shouldRunCycle` said no |
| `skipped_dirty_tree` | uncommitted changes present |
| `skipped_no_gap` | no `[bot-eligible]` tags in PRD |
| `model_failed` | sidecar/cloud model errored or returned unparseable JSON |
| `proposal_rejected` | `policy.scoreProposal` rejected it (size, path, etc.) |
| `apply_failed` | file write errored |
| `tests_failed` | `cargo test` or `bun test` failed |
| `pr_skipped_by_policy` | green tests, but `policy.shouldOpenPR` said no |
| `pr_failed` | Gitea API or git push errored |
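The `skipped_*` outcomes can be read as a chain of gates checked before the model is ever called. The sketch below is illustrative only: `PreflightState` and `firstSkip` are hypothetical, and the order of the checks is an assumption, apart from the budget check running before `policy.shouldRunCycle`, which this README states.

```typescript
type SkipOutcome =
  | "skipped_pause"
  | "skipped_cost"
  | "skipped_policy"
  | "skipped_dirty_tree"
  | "skipped_no_gap";

// Hypothetical snapshot of the checks a cycle makes before calling the model.
interface PreflightState {
  paused: boolean;       // pause file present
  budgetLeft: boolean;   // daily caps not yet reached
  policyAllows: boolean; // result of policy.shouldRunCycle
  treeClean: boolean;    // no uncommitted changes
  eligibleGaps: number;  // count of [bot-eligible] lines
}

// Return the first skip outcome that applies, or null when the cycle may proceed.
// Assumed order; only "budget before policy" is taken from the README.
function firstSkip(s: PreflightState): SkipOutcome | null {
  if (s.paused) return "skipped_pause";
  if (!s.budgetLeft) return "skipped_cost";
  if (!s.policyAllows) return "skipped_policy";
  if (!s.treeClean) return "skipped_dirty_tree";
  if (s.eligibleGaps === 0) return "skipped_no_gap";
  return null;
}
```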

## Mem0-aligned apply semantics

`apply.ts` categorizes every proposed file into one of three modes:

| Mode | Trigger | Action |
|---|---|---|
| **ADD** | `is_new: true` and file doesn't exist | Create the file |
| **UPDATE** | `is_new: false`, file exists, content **differs** from current | Overwrite |
| **NOOP** | `is_new: false`, file exists, content **matches** current exactly | Skip (no write, no diff) |

If every file in the proposal is NOOP, the cycle short-circuits to `cycle_noop` **before** running tests or opening a PR. Mismatched shapes (`is_new:true` but file exists, or `is_new:false` but file missing) become `apply_failed` — model state confusion is surfaced, not papered over.

PR bodies report the three counts separately so reviewers see what actually changed vs. what was confirmed identical.
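The mode table translates into a small decision function. This is a minimal sketch under stated assumptions, not the code in `apply.ts`: `classifyFile` and `isCycleNoop` are hypothetical names, the thrown error stands in for the `apply_failed` outcome, and treating an empty proposal as "not a no-op" is my own choice.

```typescript
type ApplyMode = "ADD" | "UPDATE" | "NOOP";

// Classify one proposed file. `current` is the file's content on disk,
// or null if the file does not exist.
function classifyFile(isNew: boolean, proposed: string, current: string | null): ApplyMode {
  const exists = current !== null;
  if (isNew && !exists) return "ADD";
  if (!isNew && exists) return proposed === current ? "NOOP" : "UPDATE";
  // is_new:true with an existing file, or is_new:false with a missing file.
  // The bot reports this as apply_failed; here it is simply an error.
  throw new Error(`shape mismatch: is_new=${isNew}, exists=${exists}`);
}

// A proposal is a cycle-level no-op only when every file is NOOP.
// Assumption: an empty proposal is not counted as a no-op.
function isCycleNoop(modes: ApplyMode[]): boolean {
  return modes.length > 0 && modes.every((m) => m === "NOOP");
}
```

Comparing content exactly, rather than by timestamp or size, is what lets a repeated proposal fall through to `cycle_noop` instead of producing an empty diff.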

## How the bot compounds over cycles

Every finished cycle persists a `CycleResult` to `data/_bot/cycles/{cycle_id}.json`. At the **start** of the next cycle, `bot/kb.ts::loadHistory(gap_id)` scans that directory, filters to prior cycles on the same gap, and returns up to five of the most recent outcomes (`ok`, `tests_failed`, `proposal_rejected`, `cycle_noop`, etc.) with their reasons and touched files.

Those outcomes are summarized into a compact block and **injected into the cloud prompt** before asking for a new proposal. The model sees things like:

```
Prior attempts on this gap (3 most recent):
- 2026-04-22 03:15 UTC — tests_failed
    reason: cargo test -p vectord::lance failed on field_type_coerce
- 2026-04-22 02:48 UTC — proposal_rejected
    reason: touched forbidden path (docs/ADR-): docs/ADR-019-update.md
- 2026-04-22 01:23 UTC — ok PR: https://git.agentview.dev/profit/lakehouse/pulls/142
    reason: PR #142 opened

Learn from these: build on what worked, avoid paths that failed.
```

This is the same compounding pattern `scenario.ts` uses via `kb.loadRecommendation` — the bot's cycles are the bot's memory. No embedding, no separate jsonl, no cross-run orchestration required. First cycle on a new gap skips the block cleanly.
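The filter-and-summarize step might look roughly like this. It is a sketch, not the contents of `bot/kb.ts`: `CycleRecord` is a hypothetical subset of `CycleResult`, `recentForGap` and `historyBlock` are made-up names, and it assumes timestamps are ISO-8601 strings in UTC so that string order matches time order.

```typescript
// Hypothetical subset of CycleResult; the real type carries more fields.
interface CycleRecord {
  gap_id: string;
  finished_at: string; // ISO-8601 UTC timestamp (assumed)
  outcome: string;
  reason: string;
}

// Keep only cycles for this gap, newest first, capped at `limit`.
function recentForGap(all: CycleRecord[], gapId: string, limit = 5): CycleRecord[] {
  return all
    .filter((c) => c.gap_id === gapId) // filter() copies, so sort() leaves `all` untouched
    .sort((a, b) => b.finished_at.localeCompare(a.finished_at))
    .slice(0, limit);
}

const SEP = "\u2014"; // same separator as in the sample block above

// Render the block injected into the prompt; empty string on a gap's first cycle.
function historyBlock(records: CycleRecord[]): string {
  if (records.length === 0) return "";
  const entries = records.map(
    (r) => `- ${r.finished_at} ${SEP} ${r.outcome}\n    reason: ${r.reason}`,
  );
  return [`Prior attempts on this gap (${records.length} most recent):`, ...entries].join("\n");
}
```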

The observer events POSTed on every cycle carry the same data, so `GET :3800/stats?source=bot` aggregates "% of bot cycles on similar gaps that landed a PR" without extra plumbing.

## Where YOU edit

`bot/policy.ts` — four small functions that define what the bot does. The rest
(`propose.ts`, `apply.ts`, `test.ts`, `pr.ts`, `cycle.ts`, `kb.ts`) is mechanical
orchestration — you shouldn't need to touch it unless a pipeline changes.

One policy upgrade the history opens up: bailing out when the same gap has
failed N times in a row. `shouldRunCycle` looks like the obvious home for it,
but the gap is picked later, so `scoreProposal` is the more natural place:

```ts
import { loadHistory, statsFor } from "./kb.ts";
// inside shouldRunCycle — but you'd need the gap, which is picked later.
// A more natural place is scoreProposal: if prior N failures on this
// gap's path+summary pattern, reject before testing.
```
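One way to express the "N failures in a row" rule is a streak counter over the outcomes that history returns. This is a sketch only: `failureStreak` and `tooManyFailures` are hypothetical helpers, the set of outcomes treated as failures is a policy choice I made for illustration, and the input is assumed to be ordered newest first.

```typescript
// Outcomes counted as failures here; which ones to include is up to your policy.
const FAILURE_OUTCOMES = ["tests_failed", "proposal_rejected", "apply_failed", "model_failed"];

// Count failures at the head of a newest-first outcome list,
// stopping at the first outcome that is not a failure.
function failureStreak(outcomesNewestFirst: string[]): number {
  let streak = 0;
  for (const outcome of outcomesNewestFirst) {
    if (!FAILURE_OUTCOMES.includes(outcome)) break;
    streak++;
  }
  return streak;
}

// Hypothetical guard for scoreProposal: reject once the streak reaches maxFailures.
function tooManyFailures(outcomesNewestFirst: string[], maxFailures = 3): boolean {
  return failureStreak(outcomesNewestFirst) >= maxFailures;
}
```

Because the streak stops at the first `ok` or `cycle_noop`, one success resets the count and the gap becomes eligible again.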

## Hard-coded guardrails (not in policy — can't be disabled)

- Bot **never deletes files** (`apply.ts` has no delete path).
- Bot **never touches** `.git/`, `secrets`, `lakehouse.toml`, `docs/ADR-*`, `docs/DECISIONS.md`, `docs/PRD.md`, `/etc/`, `/root/`, `Cargo.lock`.
- All paths are validated against traversal (`path/../`) and repo escape.
- PRs always open as **draft** — never auto-merge.
- The budget check runs before the policy function, so `policy.ts` cannot override the daily cap.
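For a feel of what the path guardrails amount to, here is a minimal sketch. The forbidden names are the ones listed above; `isPathAllowed` is a hypothetical function, and the matching rules (prefix match, exact match, substring match for `secrets`) are assumptions rather than the bot's actual checks.

```typescript
// Names quoted from the guardrail list; how they are matched is an assumption.
const FORBIDDEN_PREFIXES = [".git/", "docs/ADR-"];
const FORBIDDEN_FILES = ["lakehouse.toml", "docs/DECISIONS.md", "docs/PRD.md", "Cargo.lock"];

// Decide whether a repo-relative path may be written by the bot.
function isPathAllowed(p: string): boolean {
  const norm = p.replace(/\\/g, "/");
  if (norm.length === 0) return false;
  if (norm.startsWith("/")) return false;            // absolute paths (/etc/, /root/) escape the repo
  if (norm.split("/").includes("..")) return false;  // traversal such as path/../
  if (norm.includes("secrets")) return false;        // assumed: substring match
  if (FORBIDDEN_PREFIXES.some((pre) => norm.startsWith(pre))) return false;
  if (FORBIDDEN_FILES.includes(norm)) return false;
  return true;
}
```

Rejecting any `..` segment outright is stricter than resolving the path and checking that it stays inside the repo, but it is simpler to reason about for a bot that has no need to write such paths.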