# Replication Guide — Debian 13 Clean Install

This snapshot validates a matrix-driven agent loop with Mem0 versioning. This guide gets a clean Debian 13 box running the architecture **with cloud-only LLMs (no local Ollama)** as the J-stated default.

---

## Required external accounts (cloud-only)

- **OpenRouter** — primary LLM gateway. Sign up, generate an API key, top up credits. Models we use:
  - `x-ai/grok-4.1-fast` ($0.20/$0.50 per M tokens, 2M ctx) — primary scrum + observer review
  - `deepseek/deepseek-v4-flash` ($0.14/$0.28 per M, 1M ctx) — fallback
  - `qwen/qwen3-235b-a22b-2507` ($0.07/$0.10 per M, 262K ctx) — last fallback
  - `moonshotai/kimi-k2.6` ($0.74/$4.66 per M, 256K ctx) — meta-overseer with 25/hr rate cap
- **MinIO or any S3** — raw bucket for test corpora. Self-host MinIO via Docker, or use AWS/Backblaze/Wasabi/etc.
- **Postgres + pgvector** — for the `knowledge_base` DB (LLM Team history) and the future pgvector backend
- **Langfuse** (optional) — observability. Self-host via Docker; `langfuse_bridge.ts` forwards traces to the observer
- **Gitea** (optional) — for the auditor service

## Cloud-only embedding decision

**Critical gap:** the current code uses `nomic-embed-text` via local Ollama for ALL vector indexing + matrix retrieval. This MUST be swapped on a cloud-only box. Options:

1. **OpenAI `text-embedding-3-small`** — $0.02/1M tokens, 1536 dim. Modify `crates/aibridge/src/client.rs::embed` to call OpenAI directly when `EMBED_PROVIDER=openai`.
2. **Cohere `embed-english-v3.0`** — 1024 dim, available via OpenRouter
3. **Voyage AI `voyage-3-lite`** — 512 dim, very cheap
4. **Run a small embedding model locally without Ollama** — `sentence-transformers/all-MiniLM-L6-v2` via Python, ~100MB, no GPU needed

For replication we recommend option 1 (OpenAI direct). It requires:

- An `OPENAI_API_KEY` env var
- A one-line modification in `sidecar/sidecar/embed.py` to call OpenAI when the env var is set (see the sketch after this list)
- Rebuilding all 6 vector indexes (existing pre-built ones used 768-dim nomic; OpenAI is 1536-dim — incompatible)
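A minimal sketch of what that `embed.py` branch could look like, assuming the sidecar exposes a FastAPI `/embed` route and already depends on `httpx` (per its requirements). The route, function names, and response shape here are illustrative, not the file's actual contents:

```python
# Hypothetical sketch for sidecar/sidecar/embed.py — real route/function names
# in the repo may differ. Assumes EMBED_PROVIDER and OPENAI_API_KEY env vars.
import os
import httpx
from fastapi import APIRouter
from pydantic import BaseModel

router = APIRouter()

class EmbedRequest(BaseModel):
    texts: list[str]

@router.post("/embed")
async def embed(req: EmbedRequest) -> dict:
    if os.environ.get("EMBED_PROVIDER") == "openai":
        # Cloud path: OpenAI text-embedding-3-small (1536-dim)
        async with httpx.AsyncClient(timeout=30.0) as client:
            resp = await client.post(
                "https://api.openai.com/v1/embeddings",
                headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
                json={"model": "text-embedding-3-small", "input": req.texts},
            )
            resp.raise_for_status()
            data = resp.json()["data"]
        return {"embeddings": [item["embedding"] for item in data]}
    # Default path: local Ollama nomic-embed-text (768-dim)
    return await embed_via_ollama(req.texts)  # existing local path; name assumed
```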
---

## System packages (Debian 13)

```bash
sudo apt update
sudo apt install -y \
  build-essential pkg-config libssl-dev \
  postgresql-17 postgresql-17-pgvector \
  python3 python3-pip python3-venv \
  curl git nginx \
  redis-server   # optional, for caching

# MinIO via Docker
sudo apt install -y docker.io
docker run -d --name minio \
  -p 9000:9000 -p 9001:9001 \
  -e MINIO_ROOT_USER=minioadmin \
  -e MINIO_ROOT_PASSWORD=minioadmin \
  -v /var/lib/minio:/data \
  quay.io/minio/minio server /data --console-address ":9001"

# mc (MinIO client)
sudo curl -fsSL https://dl.min.io/client/mc/release/linux-amd64/mc -o /usr/local/bin/mc
sudo chmod +x /usr/local/bin/mc
mc alias set local http://localhost:9000 minioadmin minioadmin
```

## Toolchains

```bash
# Rust (gateway compile)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --profile minimal
source "$HOME/.cargo/env"
rustup default stable

# Bun (TypeScript runtime — agent harness, MCP server, scripts)
curl -fsSL https://bun.sh/install | bash
export PATH="$HOME/.bun/bin:$PATH"
```

---

## Postgres setup

```bash
sudo systemctl enable --now postgresql

# Create the knowledge_base DB (name per the accounts section) + pgvector
sudo -u postgres psql <<'EOF'
CREATE DATABASE knowledge_base;
\c knowledge_base
CREATE EXTENSION IF NOT EXISTS vector;
EOF
```

## Gateway (Rust) — build + systemd unit

```bash
# Build and install the gateway binary (install path matches ExecStart below)
cargo build --release
sudo install -m 755 target/release/lakehouse-gateway /usr/local/bin/lakehouse-gateway

sudo tee /etc/systemd/system/lakehouse.service <<'EOF'
[Unit]
Description=Lakehouse Gateway
After=network.target postgresql.service

[Service]
Type=simple
User=lakehouse
WorkingDirectory=/home/profit/matrix-agent-validated
ExecStart=/usr/local/bin/lakehouse-gateway
Environment=RUST_LOG=info
Environment=AWS_ACCESS_KEY_ID=minioadmin
Environment=AWS_SECRET_ACCESS_KEY=minioadmin
Environment=AWS_ENDPOINT=http://localhost:9000
Environment=AWS_ALLOW_HTTP=true
Environment=AWS_DEFAULT_REGION=us-east-1
Environment=OPENROUTER_API_KEY=sk-or-v1-...
Environment=OPENAI_API_KEY=sk-...
Environment=EMBED_PROVIDER=openai
EnvironmentFile=-/etc/lakehouse/langfuse.env
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now lakehouse.service
curl http://localhost:3100/health   # → "lakehouse ok"
```

## Sidecar (Python) — required for embed proxy

```bash
cd sidecar
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt   # fastapi + uvicorn + httpx

# Modify sidecar/sidecar/embed.py to call OpenAI when EMBED_PROVIDER=openai
uvicorn sidecar.main:app --host 0.0.0.0 --port 3200 &
```

Or run it as the systemd unit `lakehouse-sidecar.service`.

## Observer (Bun) — required for agent hand-review

```bash
cd mcp-server
bun install
OBSERVER_PORT=3800 bun run observer.ts &
```

Or as a systemd unit.

## Optional services

- `lakehouse-langfuse-bridge.service` — only if you set up Langfuse
- `lakehouse-observer.service` — same as `bun run observer.ts` but as systemd
- LLM Team UI on :5000 — only if you want the human review UI

---

## Validate replication

```bash
# 1. Health checks
curl http://localhost:3100/health   # gateway
curl http://localhost:3200/health   # sidecar
curl http://localhost:3800/health   # observer

# 2. Cloud LLM works through gateway
curl -X POST http://localhost:3100/v1/chat -H "Content-Type: application/json" \
  -d '{"provider":"openrouter","model":"x-ai/grok-4.1-fast","messages":[{"role":"user","content":"reply OK"}]}'

# 3. Embedding works (cloud)
curl -X POST http://localhost:3200/embed -H "Content-Type: application/json" \
  -d '{"texts":["hello world"]}'

# 4. Vectorize a tiny corpus (raw bucket setup)
bash scripts/dump_raw_corpus.sh   # populates s3://raw/...
bun run scripts/vectorize_raw_corpus.ts chicago entities sec

# 5. Run the agent test
bun run tests/agent_test/agent_harness.ts
```
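If you'd rather script steps 1–3, here is a minimal smoke test. It assumes the health endpoints return 200 and that `/embed` returns an `embeddings` array as in the sidecar sketch earlier; adjust if your response shapes differ:

```python
#!/usr/bin/env python3
"""Smoke test for validation steps 1-3. Response shapes are assumptions:
health endpoints only need to return 200; /embed is assumed to return
{"embeddings": [[...], ...]}."""
import httpx

SERVICES = {
    "gateway": "http://localhost:3100/health",
    "sidecar": "http://localhost:3200/health",
    "observer": "http://localhost:3800/health",
}

# 1. Health checks
for name, url in SERVICES.items():
    httpx.get(url, timeout=5.0).raise_for_status()
    print(f"{name}: ok")

# 3. Embedding sanity check: 1536 dims expected for text-embedding-3-small
r = httpx.post("http://localhost:3200/embed",
               json={"texts": ["hello world"]}, timeout=30.0)
r.raise_for_status()
dim = len(r.json()["embeddings"][0])
assert dim == 1536, f"expected 1536-dim OpenAI embeddings, got {dim}"
print(f"embedding dim: {dim} (OpenAI text-embedding-3-small)")
```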
If all pass, you have the validated architecture running cloud-only.

---

## What's NOT included in this snapshot

- **Heavy test data** (.parquet datasets, vector indexes ~470 MB) — gitignored. Regen via `scripts/dump_raw_corpus.sh` + `vectorize_raw_corpus.ts`.
- **Full lakehouse history** — full repo at `https://git.agentview.dev/profit/lakehouse`
- **Existing pre-built vector indexes** — would be incompatible across embedding models anyway

## Known cloud-only adjustments needed

These are the exact code spots to modify when porting to a no-local-LLM environment. None are fundamental — just provider swaps:

| File | Change |
|---|---|
| `sidecar/sidecar/embed.py` | Add OpenAI/Voyage path when `EMBED_PROVIDER` is set |
| `crates/aibridge/src/providers/ollama.rs` | Mark unhealthy if `OLLAMA_DISABLED=1` set |
| `tests/agent_test/agent_harness.ts` | Change `AGENT_MODEL` default from `qwen3.5:latest` to `openrouter/x-ai/grok-4.1-fast`; route via `/v1/chat` not `/generate` |
| `config/providers.toml` | Comment out `ollama` and `ollama_cloud` provider blocks |
| `lakehouse.toml` `[ai]` | `embed_model`, `gen_model` to cloud variants |

After these changes, `cargo check` should still pass. The architecture is provider-agnostic by design (Phase 39 ProviderAdapter trait); the cloud-only path just unwires Ollama and wires a cloud embedding source.
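To illustrate the cloud-only generation path end to end, here is a client-side sketch of the fallback order from the accounts list, assuming the gateway's `/v1/chat` accepts the payload shown in validation step 2 and does not already chain providers server-side (if it does, this wrapper is redundant):

```python
#!/usr/bin/env python3
"""Client-side fallback over the OpenRouter models listed at the top.
Assumption: the gateway does not retry across models itself. The payload
shape matches step 2 of the validation section."""
import httpx

GATEWAY = "http://localhost:3100/v1/chat"
FALLBACK_CHAIN = [
    "x-ai/grok-4.1-fast",          # primary
    "deepseek/deepseek-v4-flash",  # fallback
    "qwen/qwen3-235b-a22b-2507",   # last fallback
]

def chat(messages: list[dict]) -> dict:
    last_err: Exception | None = None
    for model in FALLBACK_CHAIN:
        try:
            r = httpx.post(
                GATEWAY,
                json={"provider": "openrouter", "model": model, "messages": messages},
                timeout=120.0,
            )
            r.raise_for_status()
            return r.json()
        except httpx.HTTPError as err:
            last_err = err  # try the next model in the chain
    raise RuntimeError("all models in the fallback chain failed") from last_err

if __name__ == "__main__":
    print(chat([{"role": "user", "content": "reply OK"}]))
```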