From 42448c7db510d54ba8387732c21224ed4d43e2d9 Mon Sep 17 00:00:00 2001
From: profit
Date: Sat, 25 Apr 2026 19:44:46 -0500
Subject: [PATCH] =?UTF-8?q?REPLICATION.md=20=E2=80=94=20Debian=2013=20clea?=
 =?UTF-8?q?n=20install=20+=20cloud-only=20adaptation?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Captures everything needed to stand this architecture up on a fresh Debian 13
box with NO local AI (cloud-only via OpenRouter for generation +
OpenAI/Voyage/Cohere for embeddings). Includes:

- Required external accounts (OpenRouter, OpenAI for embeddings, MinIO,
  Postgres+pgvector, optional Langfuse)
- The cloud-only embedding decision (nomic-embed-text via local Ollama is the
  one piece that MUST be swapped — recommended OpenAI text-embedding-3-small
  as the default cloud path)
- System packages, toolchains (Rust + Bun), Postgres setup
- All required env vars for gateway, sidecar, observer
- Configuration files (lakehouse.toml, providers.toml, secrets.toml)
- systemd unit for the gateway
- Validation steps (curl probes for gateway, sidecar, observer, /v1/chat
  through OpenRouter, embedding round-trip, vectorize a small corpus, run the
  agent test)
- Exact code spots to modify for cloud-only port (5 files, none fundamental —
  Phase 39 ProviderAdapter makes this provider-agnostic by design)

Heavy test data (.parquet files ~470 MB) deliberately excluded from this
snapshot — REPLICATION.md documents how to regenerate via the
dump_raw_corpus + vectorize_raw_corpus scripts.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 REPLICATION.md | 281 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 281 insertions(+)
 create mode 100644 REPLICATION.md

diff --git a/REPLICATION.md b/REPLICATION.md
new file mode 100644
index 0000000..a11e119
--- /dev/null
+++ b/REPLICATION.md
@@ -0,0 +1,281 @@
+# Replication Guide — Debian 13 Clean Install
+
+This snapshot validates a matrix-driven agent loop with Mem0 versioning.
+This guide gets a clean Debian 13 box running the architecture **with
+cloud-only LLMs (no local Ollama)** as the stated default.
+
+---
+
+## Required external accounts (cloud-only)
+
+- **OpenRouter** — primary LLM gateway. Sign up, generate an API key, top up credits. Models we use:
+  - `x-ai/grok-4.1-fast` ($0.20/$0.50 per M tokens, 2M ctx) — primary scrum + observer review
+  - `deepseek/deepseek-v4-flash` ($0.14/$0.28 per M, 1M ctx) — fallback
+  - `qwen/qwen3-235b-a22b-2507` ($0.07/$0.10 per M, 262K ctx) — last fallback
+  - `moonshotai/kimi-k2.6` ($0.74/$4.66 per M, 256K ctx) — meta-overseer with 25/hr rate cap
+- **MinIO or any S3** — raw bucket for test corpora. Self-host MinIO via Docker, or use AWS/Backblaze/Wasabi/etc.
+- **Postgres + pgvector** — for the `knowledge_base` DB (LLM Team history) and the future pgvector backend
+- **Langfuse** (optional) — observability. Self-host via Docker; `langfuse_bridge.ts` forwards traces to the observer
+- **Gitea** (optional) — for the auditor service
+
+## Cloud-only embedding decision
+
+**Critical gap:** the current code uses `nomic-embed-text` via local Ollama for ALL vector indexing + matrix retrieval. This MUST be swapped on a cloud-only box. Options:
+
+1. **OpenAI `text-embedding-3-small`** — $0.02/1M tokens, 1536 dim. Modify `crates/aibridge/src/client.rs::embed` to call OpenAI directly when `EMBED_PROVIDER=openai`.
+2. **Cohere `embed-english-v3.0`** — 1024 dim, available via OpenRouter
+3. **Voyage AI `voyage-3-lite`** — 512 dim, very cheap
+4. **Run a small embedding model locally without Ollama** — `sentence-transformers/all-MiniLM-L6-v2` via Python, ~100MB, no GPU needed
+
+For replication, option 1 (OpenAI direct) is recommended.
It requires:
+- An `OPENAI_API_KEY` env var
+- A one-line modification in `sidecar/sidecar/embed.py` to call OpenAI when the env var is set
+- Rebuilding all 6 vector indexes (the existing pre-built ones used 768-dim nomic; OpenAI is 1536-dim — incompatible)
+
+---
+
+## System packages (Debian 13)
+
+```bash
+sudo apt update
+sudo apt install -y \
+  build-essential pkg-config libssl-dev \
+  postgresql-17 postgresql-17-pgvector \
+  python3 python3-pip python3-venv \
+  curl git nginx \
+  redis-server  # optional, for caching
+
+# MinIO via Docker
+sudo apt install -y docker.io
+docker run -d --name minio \
+  -p 9000:9000 -p 9001:9001 \
+  -e MINIO_ROOT_USER=minioadmin \
+  -e MINIO_ROOT_PASSWORD=minioadmin \
+  -v /var/lib/minio:/data \
+  quay.io/minio/minio server /data --console-address ":9001"
+
+# mc (MinIO client)
+sudo curl -fsSL https://dl.min.io/client/mc/release/linux-amd64/mc -o /usr/local/bin/mc
+sudo chmod +x /usr/local/bin/mc
+mc alias set local http://localhost:9000 minioadmin minioadmin
+```
+
+## Toolchains
+
+```bash
+# Rust (gateway compile)
+curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --profile minimal
+source $HOME/.cargo/env
+rustup default stable
+
+# Bun (TypeScript runtime — agent harness, MCP server, scripts)
+curl -fsSL https://bun.sh/install | bash
+export PATH="$HOME/.bun/bin:$PATH"
+```
+
+---
+
+## Postgres setup
+
+```bash
+sudo systemctl enable --now postgresql
+sudo -u postgres psql <<'EOF'
+CREATE DATABASE knowledge_base;
+\c knowledge_base
+CREATE EXTENSION IF NOT EXISTS vector;
+EOF
+```
+
+## Gateway systemd unit
+
+```bash
+sudo tee /etc/systemd/system/lakehouse.service << 'EOF'
+[Unit]
+Description=Lakehouse Gateway
+After=network.target postgresql.service
+
+[Service]
+Type=simple
+User=lakehouse
+WorkingDirectory=/home/profit/matrix-agent-validated
+ExecStart=/usr/local/bin/lakehouse-gateway
+Environment=RUST_LOG=info
+Environment=AWS_ACCESS_KEY_ID=minioadmin
+Environment=AWS_SECRET_ACCESS_KEY=minioadmin
+Environment=AWS_ENDPOINT=http://localhost:9000
+Environment=AWS_ALLOW_HTTP=true
+Environment=AWS_DEFAULT_REGION=us-east-1
+Environment=OPENROUTER_API_KEY=sk-or-v1-...
+Environment=OPENAI_API_KEY=sk-...
+Environment=EMBED_PROVIDER=openai
+EnvironmentFile=-/etc/lakehouse/langfuse.env
+Restart=on-failure
+RestartSec=5
+
+[Install]
+WantedBy=multi-user.target
+EOF
+
+sudo systemctl daemon-reload
+sudo systemctl enable --now lakehouse.service
+curl http://localhost:3100/health  # → "lakehouse ok"
+```
+
+## Sidecar (Python) — required for embed proxy
+
+```bash
+cd sidecar
+python3 -m venv .venv
+source .venv/bin/activate
+pip install -r requirements.txt  # fastapi + uvicorn + httpx
+# Modify sidecar/sidecar/embed.py to call OpenAI when EMBED_PROVIDER=openai
+uvicorn sidecar.main:app --host 0.0.0.0 --port 3200 &
+```
+
+Or run it as the systemd unit `lakehouse-sidecar.service`.
+
+## Observer (Bun) — required for agent hand-review
+
+```bash
+cd mcp-server
+bun install
+OBSERVER_PORT=3800 bun run observer.ts &
+```
+
+Or run it as a systemd unit.
+
+## Optional services
+
+- `lakehouse-langfuse-bridge.service` — only if you set up Langfuse
+- `lakehouse-observer.service` — same as `bun run observer.ts`, but managed by systemd
+- LLM Team UI on :5000 — only if you want the human review UI
+
+---
+
+## Validate replication
+
+```bash
+# 1. Health checks
+curl http://localhost:3100/health  # gateway
+curl http://localhost:3200/health  # sidecar
+curl http://localhost:3800/health  # observer
+
+# 2. Cloud LLM works through the gateway
+curl -X POST http://localhost:3100/v1/chat -H "Content-Type: application/json" \
+  -d '{"provider":"openrouter","model":"x-ai/grok-4.1-fast","messages":[{"role":"user","content":"reply OK"}]}'
+
+# 3. Embedding works (cloud)
+curl -X POST http://localhost:3200/embed -H "Content-Type: application/json" \
+  -d '{"texts":["hello world"]}'
+
+# 4. Vectorize a tiny corpus (raw bucket setup)
+bash scripts/dump_raw_corpus.sh  # populates s3://raw/...
+bun run scripts/vectorize_raw_corpus.ts chicago entities sec
+
+# 5. Run the agent test
+bun run tests/agent_test/agent_harness.ts
+```
+
+If all pass, you have the validated architecture running cloud-only.
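
Before rebuilding the six vector indexes, it is worth asserting that whatever the embed endpoint returns has the dimension you expect, since 768-dim nomic vectors and 1536-dim OpenAI vectors are silently incompatible at query time. A minimal sketch — the `EXPECTED_DIMS` map and the helper name are illustrative, not part of the codebase:

```python
# Hypothetical guard: fail fast if embedding vectors do not match the
# provider's known dimension (OpenAI text-embedding-3-small = 1536,
# nomic-embed-text via Ollama = 768).
EXPECTED_DIMS = {"openai": 1536, "ollama": 768}

def check_embedding_dim(vectors: list[list[float]], provider: str) -> int:
    """Return the expected dimension, or raise if any vector differs."""
    expected = EXPECTED_DIMS[provider]
    for i, vec in enumerate(vectors):
        if len(vec) != expected:
            raise ValueError(
                f"vector {i} has dim {len(vec)}, expected {expected} for {provider!r}"
            )
    return expected

# Example: a 1536-dim vector passes for the OpenAI provider.
check_embedding_dim([[0.0] * 1536], "openai")  # returns 1536
```

Running this against the output of validation step 3 catches a half-migrated setup (e.g. the sidecar still pointed at Ollama) before an expensive re-vectorization pass.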
+
+---
+
+## What's NOT included in this snapshot
+
+- **Heavy test data** (.parquet datasets, vector indexes ~470 MB) — gitignored. Regenerate via `scripts/dump_raw_corpus.sh` + `vectorize_raw_corpus.ts`.
+- **Full lakehouse history** — the full repo lives at `https://git.agentview.dev/profit/lakehouse`
+- **Existing pre-built vector indexes** — would be incompatible across embedding models anyway
+
+## Known cloud-only adjustments needed
+
+These are the exact code spots to modify when porting to a no-local-LLM environment. None are fundamental — just provider swaps:
+
+| File | Change |
+|---|---|
+| `sidecar/sidecar/embed.py` | Add an OpenAI/Voyage path when `EMBED_PROVIDER` is set |
+| `crates/aibridge/src/providers/ollama.rs` | Mark unhealthy if `OLLAMA_DISABLED=1` is set |
+| `tests/agent_test/agent_harness.ts` | Change the `AGENT_MODEL` default from `qwen3.5:latest` to `openrouter/x-ai/grok-4.1-fast`; route via `/v1/chat`, not `/generate` |
+| `config/providers.toml` | Comment out the `ollama` and `ollama_cloud` provider blocks |
+| `lakehouse.toml` `[ai]` | Set `embed_model` and `gen_model` to cloud variants |
+
+After these changes, `cargo check` should still pass. The architecture is provider-agnostic by design (the Phase 39 ProviderAdapter trait); the cloud-only path just unwires Ollama and wires in a cloud embedding source.
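
The `embed.py` swap could look like the sketch below. The function names, URL constants, and module layout are assumptions (the real sidecar uses httpx and its own structure; stdlib `urllib` is used here only to keep the sketch self-contained) — but the OpenAI `/v1/embeddings` request/response shape and Ollama's `/api/embed` shape are the services' documented formats:

```python
# Sketch of an EMBED_PROVIDER branch for sidecar/sidecar/embed.py.
# Hypothetical names; only the wire formats are standard.
import json
import os
import urllib.request

OPENAI_EMBED_URL = "https://api.openai.com/v1/embeddings"  # OpenAI embeddings endpoint
OLLAMA_EMBED_URL = "http://localhost:11434/api/embed"      # default local Ollama endpoint

def pick_provider() -> str:
    # Cloud path is opt-in: default to local Ollama when EMBED_PROVIDER is unset.
    return os.environ.get("EMBED_PROVIDER", "ollama")

def _post_json(url: str, payload: dict, headers: dict) -> dict:
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", **headers},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)

def embed(texts: list[str]) -> list[list[float]]:
    if pick_provider() == "openai":
        # OpenAI text-embedding-3-small -> 1536-dim vectors.
        body = _post_json(
            OPENAI_EMBED_URL,
            {"model": "text-embedding-3-small", "input": texts},
            {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        )
        return [item["embedding"] for item in body["data"]]
    # Default: nomic-embed-text via local Ollama -> 768-dim vectors.
    body = _post_json(
        OLLAMA_EMBED_URL, {"model": "nomic-embed-text", "input": texts}, {}
    )
    return body["embeddings"]
```

Keeping the branch behind a single env var means the gateway, indexes, and harness never need to know which backend produced the vectors — consistent with the ProviderAdapter design noted above.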