# Replication Guide — Debian 13 Clean Install

This snapshot validates a matrix-driven agent loop with Mem0 versioning.

This guide gets a clean Debian 13 box running the architecture **with cloud-only LLMs (no local Ollama)** as the stated default.

---

## Required external accounts (cloud-only)

- **OpenRouter** — primary LLM gateway. Sign up, generate an API key, top up credits. Models we use:
  - `x-ai/grok-4.1-fast` ($0.20/$0.50 per M tokens, 2M ctx) — primary scrum + observer review
  - `deepseek/deepseek-v4-flash` ($0.14/$0.28 per M, 1M ctx) — fallback
  - `qwen/qwen3-235b-a22b-2507` ($0.07/$0.10 per M, 262K ctx) — last fallback
  - `moonshotai/kimi-k2.6` ($0.74/$4.66 per M, 256K ctx) — meta-overseer with a 25/hr rate cap
- **MinIO or any S3** — raw bucket for test corpora. Self-host MinIO via Docker, or use AWS/Backblaze/Wasabi/etc.
- **Postgres + pgvector** — backs the `knowledge_base` DB (LLM Team history) and the future pgvector backend
- **Langfuse** (optional) — observability. Self-host via Docker; `langfuse_bridge.ts` forwards traces to the observer
- **Gitea** (optional) — for the auditor service
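
Before wiring anything up, you can sanity-check the OpenRouter key and the fallback order above with a standalone probe. This is a sketch only (the real routing lives in the gateway); it assumes `httpx` is installed (`pip install httpx`) and just walks the model tiers until one answers:

```python
# fallback_probe.py: try each model tier in order until one responds.
# Standalone sketch; it does NOT reuse the gateway's routing code.
import os

import httpx

MODELS = [
    "x-ai/grok-4.1-fast",          # primary
    "deepseek/deepseek-v4-flash",  # fallback
    "qwen/qwen3-235b-a22b-2507",   # last fallback
]

def first_responding(prompt: str) -> tuple[str, str]:
    """Return (model, reply) from the first tier that answers."""
    headers = {"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"}
    for model in MODELS:
        resp = httpx.post(
            "https://openrouter.ai/api/v1/chat/completions",
            headers=headers,
            json={"model": model,
                  "messages": [{"role": "user", "content": prompt}]},
            timeout=60.0,
        )
        if resp.status_code == 200:
            return model, resp.json()["choices"][0]["message"]["content"]
    raise RuntimeError("no model in the chain responded")

if __name__ == "__main__":
    model, reply = first_responding("reply OK")
    print(f"{model}: {reply}")
```
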

## Cloud-only embedding decision

**Critical gap:** the current code uses `nomic-embed-text` via local Ollama for ALL vector indexing + matrix retrieval. This MUST be swapped on a cloud-only box. Options:

1. **OpenAI `text-embedding-3-small`** — $0.02/1M tokens, 1536 dims. Modify `crates/aibridge/src/client.rs::embed` to call OpenAI directly when `EMBED_PROVIDER=openai`.
2. **Cohere `embed-english-v3.0`** — 1024 dims, available via OpenRouter
3. **Voyage AI `voyage-3-lite`** — 512 dims, very cheap
4. **Run a small embedding model locally without Ollama** — `sentence-transformers/all-MiniLM-L6-v2` via Python, ~100 MB, no GPU needed

For replication we recommend option 1 (OpenAI direct). It requires:

- an `OPENAI_API_KEY` env var
- a small modification in `sidecar/sidecar/embed.py` to call OpenAI when the env var is set (see the sketch below)
- rebuilding all 6 vector indexes (the existing pre-built ones used 768-dim nomic vectors; OpenAI's are 1536-dim — incompatible)
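
A minimal sketch of what that modification could look like. The function name and return shape here are assumptions, not the file's actual structure; fold the branch into the real embed handler in `sidecar/sidecar/embed.py`:

```python
# Sketch: cloud embedding branch for sidecar/sidecar/embed.py.
# embed() signature and return shape are illustrative assumptions.
import os

import httpx

def embed(texts: list[str]) -> list[list[float]]:
    """Return one embedding per input text, in input order."""
    if os.environ.get("EMBED_PROVIDER") == "openai":
        resp = httpx.post(
            "https://api.openai.com/v1/embeddings",
            headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
            json={"model": "text-embedding-3-small", "input": texts},
            timeout=120.0,
        )
        resp.raise_for_status()
        # OpenAI returns 1536-dim vectors for text-embedding-3-small;
        # sort by index to guarantee input order.
        data = sorted(resp.json()["data"], key=lambda d: d["index"])
        return [item["embedding"] for item in data]
    # ...the existing local Ollama (nomic-embed-text) path stays here...
    raise RuntimeError("EMBED_PROVIDER not set and no local Ollama available")
```
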

---

## System packages (Debian 13)

```bash
sudo apt update
sudo apt install -y \
  build-essential pkg-config libssl-dev \
  postgresql-17 postgresql-17-pgvector \
  python3 python3-pip python3-venv \
  curl git nginx \
  redis-server   # optional, for caching

# MinIO via Docker
sudo apt install -y docker.io
sudo docker run -d --name minio \
  -p 9000:9000 -p 9001:9001 \
  -e MINIO_ROOT_USER=minioadmin \
  -e MINIO_ROOT_PASSWORD=minioadmin \
  -v /var/lib/minio:/data \
  quay.io/minio/minio server /data --console-address ":9001"

# mc (MinIO client)
sudo curl -fsSL https://dl.min.io/client/mc/release/linux-amd64/mc -o /usr/local/bin/mc
sudo chmod +x /usr/local/bin/mc
mc alias set local http://localhost:9000 minioadmin minioadmin

# create the buckets referenced later (names from lakehouse.toml and the
# corpus scripts; harmless if the scripts already create them)
mc mb --ignore-existing local/lakehouse local/raw
```

## Toolchains

```bash
# Rust (gateway compile)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --profile minimal
source "$HOME/.cargo/env"
rustup default stable

# Bun (TypeScript runtime — agent harness, MCP server, scripts)
curl -fsSL https://bun.sh/install | bash
export PATH="$HOME/.bun/bin:$PATH"
```

---

## Postgres setup

```bash
sudo systemctl enable --now postgresql
sudo -u postgres psql <<EOF
CREATE DATABASE knowledge_base;
\c knowledge_base
CREATE EXTENSION IF NOT EXISTS vector;
EOF
```

The `knowledge_base` DB stores LLM Team history. The schema lives in `/root/llm-team-ui/schema.sql` (not in this repo; install it separately if you want the LLM Team UI).

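To verify pgvector end-to-end (not just that `CREATE EXTENSION` succeeded), a quick round-trip. This sketch assumes `psycopg` (`pip install "psycopg[binary]"`) and Debian's default peer auth, so run it as the postgres OS user:

```python
# vec_check.py: pgvector round-trip in knowledge_base.
# Run as: sudo -u postgres python3 vec_check.py  (Debian peer auth)
import psycopg

with psycopg.connect("dbname=knowledge_base") as conn:
    with conn.cursor() as cur:
        # temp table vanishes at session end, so no cleanup needed
        cur.execute("CREATE TEMP TABLE vec_smoke (id serial PRIMARY KEY, v vector(3))")
        cur.execute("INSERT INTO vec_smoke (v) VALUES ('[1,2,3]'), ('[4,5,6]')")
        # nearest neighbour by L2 distance (<-> is pgvector's distance operator)
        cur.execute(
            "SELECT id, v <-> '[3,1,2]'::vector AS dist "
            "FROM vec_smoke ORDER BY dist LIMIT 1"
        )
        row = cur.fetchone()
        print(f"nearest id={row[0]} dist={row[1]:.3f}")
```
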
---

## Required environment variables

### Gateway (`/etc/systemd/system/lakehouse.service` `Environment=` lines or a systemd `EnvironmentFile`)

```bash
RUST_LOG=info

# S3 / MinIO
AWS_ACCESS_KEY_ID=minioadmin
AWS_SECRET_ACCESS_KEY=minioadmin
AWS_ENDPOINT=http://localhost:9000
AWS_ALLOW_HTTP=true
AWS_DEFAULT_REGION=us-east-1

# OpenRouter (primary LLM gateway — used by aibridge)
OPENROUTER_API_KEY=sk-or-v1-...

# OpenAI (for embeddings — cloud-only path)
OPENAI_API_KEY=sk-...
EMBED_PROVIDER=openai   # custom env var we add for cloud-only mode

# Postgres for the LLM Team history dump
DATABASE_URL=postgres://postgres@localhost/knowledge_base
```

`EnvironmentFile=-/etc/lakehouse/langfuse.env` (optional):

```bash
LANGFUSE_URL=http://localhost:3001
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
```

### `/etc/lakehouse/secrets.toml` (chmod 600)

```toml
[minio-lakehouse]
access_key = "minioadmin"
secret_key = "minioadmin"
```

### `~/.env` (used by some Bun scripts as a fallback for the OpenRouter key)

```bash
OPENROUTER_API_KEY=sk-or-v1-...
```

---

## Configuration files

All config files are in the repo at `lakehouse.toml` and `config/`. Edit `lakehouse.toml`:

- `[gateway]` — `host`, `port` (default 3100)
- `[storage.buckets]` — at least `primary` (local) + `s3:lakehouse` (MinIO)
- `[ai]` — `embed_model = "text-embedding-3-small"` (CLOUD), `gen_model = "x-ai/grok-4.1-fast"`
- `[agent]` — autotune defaults

`config/providers.toml`:

```toml
[[provider]]
name = "openrouter"
base_url = "https://openrouter.ai/api/v1"
auth = "bearer"
auth_env = "OPENROUTER_API_KEY"
default_model = "x-ai/grok-4.1-fast"

# DROP these for cloud-only:
# [[provider]] name = "ollama"        — no local Ollama
# [[provider]] name = "ollama_cloud"  — keep only if you have an Ollama Cloud account
```

`config/routing.toml` — leave as-is; all model-prefix routes auto-resolve.

---

## Build + start the gateway

```bash
cd /home/profit/matrix-agent-validated
cargo build --release -p gateway
sudo cp target/release/gateway /usr/local/bin/lakehouse-gateway

# Systemd unit (place at /etc/systemd/system/lakehouse.service)
sudo tee /etc/systemd/system/lakehouse.service > /dev/null << 'EOF'
[Unit]
Description=Lakehouse Gateway
After=network.target postgresql.service

[Service]
Type=simple
User=lakehouse
WorkingDirectory=/home/profit/matrix-agent-validated
ExecStart=/usr/local/bin/lakehouse-gateway
Environment=RUST_LOG=info
Environment=AWS_ACCESS_KEY_ID=minioadmin
Environment=AWS_SECRET_ACCESS_KEY=minioadmin
Environment=AWS_ENDPOINT=http://localhost:9000
Environment=AWS_ALLOW_HTTP=true
Environment=AWS_DEFAULT_REGION=us-east-1
Environment=OPENROUTER_API_KEY=sk-or-v1-...
Environment=OPENAI_API_KEY=sk-...
Environment=EMBED_PROVIDER=openai
EnvironmentFile=-/etc/lakehouse/langfuse.env
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now lakehouse.service
curl http://localhost:3100/health   # → "lakehouse ok"
```

## Sidecar (Python) — required for the embed proxy

```bash
cd sidecar
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt   # fastapi + uvicorn + httpx
# Modify sidecar/sidecar/embed.py to call OpenAI when EMBED_PROVIDER=openai
# (see the sketch in "Cloud-only embedding decision" above)
uvicorn sidecar.main:app --host 0.0.0.0 --port 3200 &
```

Or run it as a systemd unit, `lakehouse-sidecar.service`.

## Observer (Bun) — required for agent hand-review

```bash
cd mcp-server
bun install
OBSERVER_PORT=3800 bun run observer.ts &
```

Or run it as a systemd unit.

## Optional services

- `lakehouse-langfuse-bridge.service` — only if you set up Langfuse
- `lakehouse-observer.service` — same as `bun run observer.ts`, but managed by systemd
- LLM Team UI on :5000 — only if you want the human review UI

---

## Validate replication

```bash
# 1. Health checks
curl http://localhost:3100/health   # gateway
curl http://localhost:3200/health   # sidecar
curl http://localhost:3800/health   # observer

# 2. Cloud LLM works through the gateway
curl -X POST http://localhost:3100/v1/chat -H "Content-Type: application/json" \
  -d '{"provider":"openrouter","model":"x-ai/grok-4.1-fast","messages":[{"role":"user","content":"reply OK"}]}'

# 3. Embedding works (cloud)
curl -X POST http://localhost:3200/embed -H "Content-Type: application/json" \
  -d '{"texts":["hello world"]}'

# 4. Vectorize a tiny corpus (raw bucket setup)
bash scripts/dump_raw_corpus.sh                               # populates s3://raw/...
bun run scripts/vectorize_raw_corpus.ts chicago entities sec

# 5. Run the agent test
bun run tests/agent_test/agent_harness.ts
```
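
One extra check worth running after the embedding swap: assert the vector dimension actually changed, since stale 768-dim indexes only fail at query time. The `embeddings` response key below is an assumption; match it to the sidecar's actual response schema:

```python
# dim_check.py: assert the sidecar now serves 1536-dim OpenAI vectors,
# not 768-dim nomic ones.
import httpx

resp = httpx.post("http://localhost:3200/embed", json={"texts": ["hello world"]})
resp.raise_for_status()
vec = resp.json()["embeddings"][0]  # key assumed; adjust to your sidecar's schema
assert len(vec) == 1536, f"expected 1536 dims, got {len(vec)}: old indexes incompatible"
print("embedding dimension OK:", len(vec))
```
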

If all pass, you have the validated architecture running cloud-only.

---

## What's NOT included in this snapshot

- **Heavy test data** (.parquet datasets, vector indexes, ~470 MB) — gitignored. Regenerate via `scripts/dump_raw_corpus.sh` + `vectorize_raw_corpus.ts`.
- **Full lakehouse history** — the full repo lives at `https://git.agentview.dev/profit/lakehouse`
- **Existing pre-built vector indexes** — they would be incompatible across embedding models anyway

## Known cloud-only adjustments needed

These are the exact code spots to modify when porting to a no-local-LLM environment. None are fundamental — just provider swaps:

| File | Change |
|---|---|
| `sidecar/sidecar/embed.py` | Add an OpenAI/Voyage path when `EMBED_PROVIDER` is set |
| `crates/aibridge/src/providers/ollama.rs` | Mark the provider unhealthy when `OLLAMA_DISABLED=1` is set |
| `tests/agent_test/agent_harness.ts` | Change the `AGENT_MODEL` default from `qwen3.5:latest` to `openrouter/x-ai/grok-4.1-fast`; route via `/v1/chat`, not `/generate` |
| `config/providers.toml` | Comment out the `ollama` and `ollama_cloud` provider blocks |
| `lakehouse.toml` `[ai]` | Point `embed_model` and `gen_model` at cloud variants |

After these changes, `cargo check` should still pass. The architecture is provider-agnostic by design (the Phase 39 `ProviderAdapter` trait); the cloud-only port just unwires Ollama and wires in a cloud embedding source.