root a00e9bb438
infra: replace gpt-oss with Ollama Pro + OpenCode Zen across hot paths
Ollama Pro plan went live today (39-model fleet on the same
OLLAMA_CLOUD_KEY) and OpenCode Zen was already wired in the gateway
but not consumed. Routing every gpt-oss call site to faster /
stronger replacements:

| Site | gpt-oss → replacement | Why |
|---|---|---|
| ollama_cloud default | gpt-oss:120b → deepseek-v3.2 | newest DeepSeek revision; live-probed `pong` |
| openrouter default | openai/gpt-oss-120b:free → x-ai/grok-4.1-fast | already the scrum LADDER's PRIMARY |
| modes.toml staffing_inference | openai/gpt-oss-120b:free → kimi-k2.6 | coding-specialized, on Ollama Pro |
| modes.toml doc_drift_check | gpt-oss:120b → gemini-3-flash-preview | speed leader for factual checks |
| scrum_master_pipeline tree-split MAP+REDUCE | gpt-oss:120b → gemini-3-flash-preview | latency-dominated path (5-20× per file) |
| bot/propose.ts CLOUD_MODEL | gpt-oss:120b → deepseek-v3.2 | same Ollama key, faster |
| mcp-server/observer.ts overseer label fallback | gpt-oss:120b → claude-opus-4-7 | matches new overseer model |
| crates/gateway/src/execution_loop overseer escalation | ollama_cloud/gpt-oss:120b → opencode/claude-opus-4-7 | frontier reasoning matters here — fires only after local self-correct fails twice; Zen pay-per-token cost is bounded |

Verification:
- `cargo check -p gateway --tests` — clean
- Live probes through localhost:3100/v1/chat:
  - `opencode/claude-opus-4-7` → "pong"
  - `gemini-3-flash-preview` (ollama_cloud) → "pong"
  - `kimi-k2.6` (ollama_cloud) → "pong"
  - `deepseek-v3.2` (ollama_cloud) → "Pong! 🏓"

Notes:
- kimi-k2:1t still upstream-broken (HTTP 500 on Ollama Pro probe today,
  matches yesterday's memory). Replacement table never picks it.
- The Rust changes need a `systemctl restart lakehouse.service` to
  take effect on the running gateway. TS callers reload on next run.
- aibridge/src/context.rs still has gpt-oss:{20b,120b} in its window-
  size lookup table; harmless and kept for callers that pass it
  explicitly as an override.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 06:13:48 -05:00

Lakehouse PR-Bot

A local sub-agent that reads the PRD, asks a cloud model for a small change proposal, applies it, runs tests, and opens a draft PR on Gitea. Manual-only right now — no systemd, no cron. Run one cycle at a time and watch.

Run one cycle

cd /home/profit/lakehouse
bun run bot/cycle.ts

Prerequisites:

  • Working tree must be clean (git status shows nothing). The bot refuses on a dirty tree so its changes don't mix with in-flight work.
  • Sidecar running on :3200, gateway on :3100, observer on :3800.
  • At least one PRD line tagged [bot-eligible] (see below).
  • Gitea PAT configured in ~/.git-credentials (already set up).
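
The clean-tree refusal in the first bullet boils down to one git invocation: `git status --porcelain` must print nothing. A minimal sketch (hypothetical helper names; the actual guard in cycle.ts may differ):

```typescript
// Sketch only: the real check lives in cycle.ts. The dirty-tree rule
// amounts to "git status --porcelain prints nothing".
import { execSync } from "node:child_process";

// Pure helper so the porcelain-output interpretation is testable in isolation.
function isCleanTree(porcelain: string): boolean {
  return porcelain.trim() === "";
}

function assertCleanTree(repo: string): void {
  const out = execSync("git status --porcelain", { cwd: repo }).toString();
  if (!isCleanTree(out)) {
    throw new Error("refusing to run on a dirty tree");
  }
}
```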

Tag a gap as bot-eligible

Add [bot-eligible] to any PRD line the bot is allowed to work on:

- [ ] Add a unit test for parse_city_state covering "South Bend, IN" edge case [bot-eligible]

The bot scans docs/PRD.md for these tags. Each tagged line becomes a candidate. Start small — one tag at a time — until you trust the loop.
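
The scan itself is simple enough to sketch. This is illustrative, not the bot's actual code; in particular, restricting candidates to unchecked boxes (`- [ ]`) is an assumption:

```typescript
// Illustrative gap scan over docs/PRD.md content.
const TAG = "[bot-eligible]";

function scanEligible(prd: string): string[] {
  return prd
    .split("\n")
    // Assumption: only unchecked checklist lines count as open gaps.
    .filter((line) => line.includes(TAG) && line.trimStart().startsWith("- [ ]"))
    .map((line) => line.replace(TAG, "").trim());
}
```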

Stop it mid-cycle

Create the pause file:

touch /home/profit/lakehouse/bot.paused

The next bot/cycle.ts invocation exits immediately with skipped_pause. (It does NOT kill a cycle already in-flight — use Ctrl-C or pkill -f bot/cycle.ts for that.)
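
The gate is a bare existence check at cycle start (sketch; the real check sits in cycle.ts):

```typescript
// Illustrative pause gate. existsSync keeps it synchronous, since the
// check runs exactly once at the top of a cycle.
import { existsSync } from "node:fs";

const PAUSE_FILE = "/home/profit/lakehouse/bot.paused";

function pauseOutcome(path: string = PAUSE_FILE): "skipped_pause" | null {
  return existsSync(path) ? "skipped_pause" : null;
}
```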

Budget

  • 20 cloud calls/day, 160k tokens/day (hard ceiling, see bot/cost.ts).
  • Tracked in data/_bot/cost-YYYY-MM-DD.json.
  • Resets at UTC midnight.
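
The cap plus UTC reset can be sketched as a date-keyed ledger: key the counts by the UTC date and the midnight reset falls out of the key changing. Illustrative only; the real accounting is in bot/cost.ts:

```typescript
// Caps from the bullets above: 20 cloud calls/day, 160k tokens/day.
interface Ledger { calls: number; tokens: number; }

const MAX_CALLS = 20;
const MAX_TOKENS = 160_000;

// YYYY-MM-DD in UTC, matching the cost-YYYY-MM-DD.json naming.
function ledgerKey(now: Date): string {
  return now.toISOString().slice(0, 10);
}

function underBudget(l: Ledger): boolean {
  return l.calls < MAX_CALLS && l.tokens < MAX_TOKENS;
}
```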

Cycle outcomes

Every run writes a result to data/_bot/cycles/{cycle_id}.json and POSTs an event to the observer on :3800. Possible outcomes:

| Outcome | Meaning |
|---|---|
| ok | PR opened |
| cycle_noop | proposal processed, but every file was identical to current content (Mem0 NOOP); no tests run, no PR |
| skipped_pause | pause file present |
| skipped_cost | daily budget exhausted |
| skipped_policy | policy.shouldRunCycle said no |
| skipped_dirty_tree | uncommitted changes present |
| skipped_no_gap | no [bot-eligible] tags in PRD |
| model_failed | sidecar/cloud model errored or returned unparseable JSON |
| proposal_rejected | policy.scoreProposal rejected it (size, path, etc.) |
| apply_failed | file write errored |
| tests_failed | cargo or bun test red |
| pr_skipped_by_policy | tests green, but policy.shouldOpenPR said no |
| pr_failed | Gitea API or git push errored |
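
For consumers of data/_bot/cycles/*.json, the outcome set above forms a closed union (names taken from the table; the actual type in the repo may differ):

```typescript
// Closed union of cycle outcomes, handy for exhaustive switches.
type CycleOutcome =
  | "ok"
  | "cycle_noop"
  | "skipped_pause"
  | "skipped_cost"
  | "skipped_policy"
  | "skipped_dirty_tree"
  | "skipped_no_gap"
  | "model_failed"
  | "proposal_rejected"
  | "apply_failed"
  | "tests_failed"
  | "pr_skipped_by_policy"
  | "pr_failed";

// Only "ok" means a PR actually landed on Gitea.
function landedPR(o: CycleOutcome): boolean {
  return o === "ok";
}
```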

Mem0-aligned apply semantics

apply.ts categorizes every proposed file into one of three modes:

| Mode | Trigger | Action |
|---|---|---|
| ADD | is_new: true and file doesn't exist | Create the file |
| UPDATE | is_new: false, file exists, content differs from current | Overwrite |
| NOOP | is_new: false, file exists, content matches current exactly | Skip (no write, no diff) |

If every file in the proposal is NOOP, the cycle short-circuits to cycle_noop before running tests or opening a PR. Mismatched shapes (is_new:true but file exists, or is_new:false but file missing) become apply_failed — model state confusion is surfaced, not papered over.
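
The three-way split plus the mismatch rule can be sketched as one function (function and field names mirror the prose here, not necessarily apply.ts itself):

```typescript
type ApplyMode = "ADD" | "UPDATE" | "NOOP";

interface ProposedFile { path: string; content: string; is_new: boolean; }

// current === null means the file does not exist on disk.
// Mismatched shapes throw, mirroring the apply_failed outcome.
function categorize(p: ProposedFile, current: string | null): ApplyMode {
  if (p.is_new) {
    if (current !== null) throw new Error(`apply_failed: ${p.path} already exists`);
    return "ADD";
  }
  if (current === null) throw new Error(`apply_failed: ${p.path} is missing`);
  return current === p.content ? "NOOP" : "UPDATE";
}
```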

PR bodies report the three counts separately so reviewers see what actually changed vs. what was confirmed identical.

How the bot compounds over cycles

Every finished cycle persists a CycleResult to data/_bot/cycles/{cycle_id}.json. At the start of the next cycle, bot/kb.ts::loadHistory(gap_id) scans that directory, filters to prior cycles on the same gap, and returns the five most recent outcomes (ok, tests_failed, proposal_rejected, cycle_noop, etc.) with their reasons and touched files.

Those outcomes are summarized into a compact block and injected into the cloud prompt before asking for a new proposal. The model sees things like:

Prior attempts on this gap (3 most recent):
- 2026-04-22 03:15 UTC — tests_failed
    reason: cargo test -p vectord::lance failed on field_type_coerce
- 2026-04-22 02:48 UTC — proposal_rejected
    reason: touched forbidden path (docs/ADR-): docs/ADR-019-update.md
- 2026-04-22 01:23 UTC — ok PR: https://git.agentview.dev/profit/lakehouse/pulls/142
    reason: PR #142 opened

Learn from these: build on what worked, avoid paths that failed.

This is the same compounding pattern scenario.ts uses via kb.loadRecommendation — the bot's cycles are the bot's memory. No embedding, no separate jsonl, no cross-run orchestration required. First cycle on a new gap skips the block cleanly.
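
Rendering the injected block is a small pure function. A sketch of the shape (the real implementation is bot/kb.ts::loadHistory plus its prompt formatter; field names here are assumed):

```typescript
interface PriorCycle { ts: string; outcome: string; reason: string; }

// cycles are newest-first; n caps how many attempts reach the prompt.
function renderHistory(cycles: PriorCycle[], n = 5): string {
  const recent = cycles.slice(0, n);
  if (recent.length === 0) return ""; // first cycle on a gap: no block at all
  const lines = recent.map((c) => `- ${c.ts} — ${c.outcome}\n    reason: ${c.reason}`);
  return [
    `Prior attempts on this gap (${recent.length} most recent):`,
    ...lines,
    "",
    "Learn from these: build on what worked, avoid paths that failed.",
  ].join("\n");
}
```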

The observer events POSTed on every cycle carry the same data, so GET :3800/stats?source=bot aggregates "% of bot cycles on similar gaps that landed a PR" without extra plumbing.

Where YOU edit

bot/policy.ts — four small functions that define what the bot does. The rest (propose.ts, apply.ts, test.ts, pr.ts, cycle.ts, kb.ts) is mechanical orchestration — you shouldn't need to touch it unless a pipeline changes.

One policy upgrade the history opens up: in shouldRunCycle, you can now bail out if the same gap has failed N times in a row. Example addition:

import { loadHistory, statsFor } from "./kb.ts";
// inside shouldRunCycle — but you'd need the gap, which is picked later.
// A more natural place is scoreProposal: if prior N failures on this
// gap's path+summary pattern, reject before testing.
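
In scoreProposal form, the bail-out could look like this (sketch only; the real policy signatures may differ, and the failure-streak arithmetic is illustrative):

```typescript
// Outcomes that count toward a failure streak; set membership is a judgment call.
const FAILURES = new Set(["tests_failed", "proposal_rejected", "apply_failed", "model_failed"]);

// outcomes are newest-first; count leading consecutive failures.
function failureStreak(outcomes: string[]): number {
  let n = 0;
  for (const o of outcomes) {
    if (!FAILURES.has(o)) break;
    n++;
  }
  return n;
}

function shouldRejectForStreak(outcomes: string[], maxStreak = 3): boolean {
  return failureStreak(outcomes) >= maxStreak;
}
```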

Hard-coded guardrails (not in policy — can't be disabled)

  • Bot never deletes files (apply.ts has no delete path).
  • Bot never touches .git/, secrets, lakehouse.toml, docs/ADR-*, docs/DECISIONS.md, docs/PRD.md, /etc/, /root/, Cargo.lock.
  • All paths validated for traversal (path/../) and repo-escape.
  • PRs always open as draft — never auto-merge.
  • The budget check runs before any policy function — no way to override the daily cap from policy.ts.
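
The path rules compose into a single predicate. An illustrative version (the real validation is in apply.ts; the forbidden-prefix list below is abbreviated from the bullets above, and the blanket ".." rejection is deliberately conservative):

```typescript
import { resolve, sep } from "node:path";

// Abbreviated from the guardrails list; the real list also covers secrets etc.
const FORBIDDEN = [".git/", "docs/ADR-", "docs/DECISIONS.md", "docs/PRD.md", "Cargo.lock", "lakehouse.toml"];

function isAllowedPath(repoRoot: string, relPath: string): boolean {
  if (relPath.includes("..")) return false; // traversal (conservative: any "..")
  const abs = resolve(repoRoot, relPath);
  if (!abs.startsWith(resolve(repoRoot) + sep)) return false; // repo escape, incl. /etc/, /root/
  return !FORBIDDEN.some((p) => relPath === p || relPath.startsWith(p));
}
```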