matrix-agent-validated/REPLICATION.md
profit 42448c7db5 REPLICATION.md — Debian 13 clean install + cloud-only adaptation
Captures everything needed to stand this architecture up on a fresh
Debian 13 box with NO local AI (cloud-only via OpenRouter for
generation + OpenAI/Voyage/Cohere for embeddings).

Includes:
- Required external accounts (OpenRouter, OpenAI for embeddings,
  MinIO, Postgres+pgvector, optional Langfuse)
- The cloud-only embedding decision (nomic-embed-text via local
  Ollama is the one piece that MUST be swapped — we recommend OpenAI
  text-embedding-3-small as the default cloud path)
- System packages, toolchains (Rust + Bun), Postgres setup
- All required env vars for gateway, sidecar, observer
- Configuration files (lakehouse.toml, providers.toml, secrets.toml)
- systemd unit for the gateway
- Validation steps (curl probes for gateway, sidecar, observer,
  /v1/chat through OpenRouter, embedding round-trip, vectorize a
  small corpus, run the agent test)
- Exact code spots to modify for cloud-only port (5 files, none
  fundamental — Phase 39 ProviderAdapter makes this provider-agnostic
  by design)

Heavy test data (.parquet files ~470 MB) deliberately excluded from
this snapshot — REPLICATION.md documents how to regenerate via the
dump_raw_corpus + vectorize_raw_corpus scripts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 19:44:46 -05:00


Replication Guide — Debian 13 Clean Install

This snapshot validates a matrix-driven agent loop with Mem0 versioning. This guide gets a clean Debian 13 box running the architecture with cloud-only LLMs (no local Ollama) as the default configuration.


Required external accounts (cloud-only)

  • OpenRouter — primary LLM gateway. Sign up, generate API key, top up credits. Models we use:
    • x-ai/grok-4.1-fast ($0.20/$0.50 per M tokens, 2M ctx) — primary scrum + observer review
    • deepseek/deepseek-v4-flash ($0.14/$0.28 per M, 1M ctx) — fallback
    • qwen/qwen3-235b-a22b-2507 ($0.07/$0.10 per M, 262K ctx) — last fallback
    • moonshotai/kimi-k2.6 ($0.74/$4.66 per M, 256K ctx) — meta-overseer with 25/hr rate cap
  • MinIO or any S3 — raw bucket for test corpora. Self-host MinIO via Docker, or use AWS/Backblaze/Wasabi/etc.
  • Postgres + pgvector — for knowledge_base DB (LLM Team history) and future pgvector backend
  • Langfuse (optional) — observability. Self-host via Docker; langfuse_bridge.ts forwards traces to the observer
  • Gitea (optional) — for the auditor service
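The OpenRouter fallback order above can be sketched as a small helper. This is illustrative only — the real routing lives in the gateway (config/routing.toml), and `complete_with_fallback` is a hypothetical name, not part of the codebase:

```python
# Illustrative sketch of the model fallback order described above.
# The gateway handles this for real; this only shows the try-order.
FALLBACK_CHAIN = [
    "x-ai/grok-4.1-fast",          # primary
    "deepseek/deepseek-v4-flash",  # fallback
    "qwen/qwen3-235b-a22b-2507",   # last fallback
]

def complete_with_fallback(call, messages):
    """Try each model in order; `call(model, messages)` raises on failure."""
    last_err = None
    for model in FALLBACK_CHAIN:
        try:
            return model, call(model, messages)
        except Exception as err:
            last_err = err
    raise RuntimeError("all models in the fallback chain failed") from last_err
```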

Cloud-only embedding decision

Critical gap: the current code uses nomic-embed-text via local Ollama for ALL vector indexing + matrix retrieval. This MUST be swapped on a cloud-only box. Options:

  1. OpenAI text-embedding-3-small — $0.02/1M tokens, 1536 dim. Modify crates/aibridge/src/client.rs::embed to call OpenAI directly when EMBED_PROVIDER=openai.
  2. Cohere embed-english-v3.0 — 1024 dim, available via OpenRouter
  3. Voyage AI voyage-3-lite — 512 dim, very cheap
  4. Run a small embedding model locally without Ollama: sentence-transformers/all-MiniLM-L6-v2 via Python, ~100 MB, no GPU needed

For replication, we recommend option 1 (OpenAI direct). It requires:

  • An OPENAI_API_KEY env var
  • One-line modification in sidecar/sidecar/embed.py to call OpenAI when env set
  • All 6 vector indexes will need rebuilding (existing pre-built ones used 768-dim nomic; OpenAI is 1536-dim — incompatible)
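A minimal sketch of that sidecar change, assuming the proxy exposes an embed helper (function names here are illustrative; only EMBED_PROVIDER, OPENAI_API_KEY, and the model choice come from this guide):

```python
import os

OPENAI_EMBED_URL = "https://api.openai.com/v1/embeddings"

def build_openai_request(texts, model="text-embedding-3-small"):
    """JSON body for OpenAI's /v1/embeddings endpoint."""
    return {"model": model, "input": texts}

def embed_texts(texts):
    """Return one embedding vector (1536 floats) per input text.

    Routes to OpenAI when EMBED_PROVIDER=openai; otherwise the existing
    local path (elided here) stays in effect.
    """
    if os.environ.get("EMBED_PROVIDER") == "openai":
        import httpx  # already in the sidecar's requirements
        resp = httpx.post(
            OPENAI_EMBED_URL,
            headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
            json=build_openai_request(texts),
            timeout=30.0,
        )
        resp.raise_for_status()
        # Re-sort by index defensively before extracting the vectors.
        data = sorted(resp.json()["data"], key=lambda d: d["index"])
        return [d["embedding"] for d in data]
    raise NotImplementedError("local embedding path unchanged")
```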

System packages (Debian 13)

sudo apt update
sudo apt install -y \
  build-essential pkg-config libssl-dev \
  postgresql-17 postgresql-17-pgvector \
  python3 python3-pip python3-venv \
  curl git nginx \
  redis-server  # optional, for caching

# MinIO via Docker
sudo apt install -y docker.io
docker run -d --name minio \
  -p 9000:9000 -p 9001:9001 \
  -e MINIO_ROOT_USER=minioadmin \
  -e MINIO_ROOT_PASSWORD=minioadmin \
  -v /var/lib/minio:/data \
  quay.io/minio/minio server /data --console-address ":9001"

# mc (MinIO client)
sudo curl -fsSL https://dl.min.io/client/mc/release/linux-amd64/mc -o /usr/local/bin/mc
sudo chmod +x /usr/local/bin/mc
mc alias set local http://localhost:9000 minioadmin minioadmin

Toolchains

# Rust (gateway compile)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --profile minimal
source $HOME/.cargo/env
rustup default stable

# Bun (TypeScript runtime — agent harness, MCP server, scripts)
curl -fsSL https://bun.sh/install | bash
export PATH="$HOME/.bun/bin:$PATH"

Postgres setup

sudo systemctl enable --now postgresql
sudo -u postgres psql <<EOF
CREATE DATABASE knowledge_base;
\c knowledge_base
CREATE EXTENSION IF NOT EXISTS vector;
EOF

The knowledge_base DB stores LLM team history. Schema in /root/llm-team-ui/schema.sql (not in this repo — install that separately if you want LLM Team UI).


Required environment variables

Gateway (/etc/systemd/system/lakehouse.service Environment= or systemd EnvironmentFile)

RUST_LOG=info
# S3 / MinIO
AWS_ACCESS_KEY_ID=minioadmin
AWS_SECRET_ACCESS_KEY=minioadmin
AWS_ENDPOINT=http://localhost:9000
AWS_ALLOW_HTTP=true
AWS_DEFAULT_REGION=us-east-1
# OpenRouter (primary LLM gateway — used by aibridge)
OPENROUTER_API_KEY=sk-or-v1-...
# OpenAI (for embeddings — cloud-only path)
OPENAI_API_KEY=sk-...
EMBED_PROVIDER=openai            # custom env we add for cloud-only mode
# Postgres for LLM team history dump
DATABASE_URL=postgres://postgres@localhost/knowledge_base

EnvironmentFile=-/etc/lakehouse/langfuse.env (optional):

LANGFUSE_URL=http://localhost:3001
LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...

/etc/lakehouse/secrets.toml (chmod 600)

[minio-lakehouse]
access_key = "minioadmin"
secret_key = "minioadmin"

~/.env (used by some Bun scripts as fallback for OpenRouter key)

OPENROUTER_API_KEY=sk-or-v1-...

Configuration files

All config files are in the repo at lakehouse.toml and config/. Edit lakehouse.toml:

  • [gateway] — host, port (default 3100)
  • [storage.buckets] — at least primary (local) + s3:lakehouse (MinIO)
  • [ai] — embed_model = "text-embedding-3-small" (CLOUD), gen_model = "x-ai/grok-4.1-fast"
  • [agent] — autotune defaults

config/providers.toml:

[[provider]]
name = "openrouter"
base_url = "https://openrouter.ai/api/v1"
auth = "bearer"
auth_env = "OPENROUTER_API_KEY"
default_model = "x-ai/grok-4.1-fast"

# DROP these for cloud-only:
# [[provider]] name = "ollama"          # — no local Ollama
# [[provider]] name = "ollama_cloud"    # — keep only if you have an Ollama Cloud account

config/routing.toml — leave as-is, all model-prefix routes auto-resolve.


Build + start the gateway

cd /home/profit/matrix-agent-validated
cargo build --release -p gateway
sudo cp target/release/gateway /usr/local/bin/lakehouse-gateway

# Systemd unit (place at /etc/systemd/system/lakehouse.service)
sudo tee /etc/systemd/system/lakehouse.service > /dev/null << 'EOF'
[Unit]
Description=Lakehouse Gateway
After=network.target postgresql.service

[Service]
Type=simple
User=lakehouse
WorkingDirectory=/home/profit/matrix-agent-validated
ExecStart=/usr/local/bin/lakehouse-gateway
Environment=RUST_LOG=info
Environment=AWS_ACCESS_KEY_ID=minioadmin
Environment=AWS_SECRET_ACCESS_KEY=minioadmin
Environment=AWS_ENDPOINT=http://localhost:9000
Environment=AWS_ALLOW_HTTP=true
Environment=AWS_DEFAULT_REGION=us-east-1
Environment=OPENROUTER_API_KEY=sk-or-v1-...
Environment=OPENAI_API_KEY=sk-...
Environment=EMBED_PROVIDER=openai
EnvironmentFile=-/etc/lakehouse/langfuse.env
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now lakehouse.service
curl http://localhost:3100/health   # → "lakehouse ok"

Sidecar (Python) — required for embed proxy

cd sidecar
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt   # fastapi + uvicorn + httpx
# Modify sidecar/sidecar/embed.py to call OpenAI when EMBED_PROVIDER=openai
uvicorn sidecar.main:app --host 0.0.0.0 --port 3200 &

Or systemd unit lakehouse-sidecar.service.

Observer (Bun) — required for agent hand-review

cd mcp-server
bun install
OBSERVER_PORT=3800 bun run observer.ts &

Or systemd unit.

Optional services

  • lakehouse-langfuse-bridge.service — only if you set up Langfuse
  • lakehouse-observer.service — same as bun run observer.ts but as systemd
  • LLM Team UI on :5000 — only if you want the human review UI

Validate replication

# 1. Health checks
curl http://localhost:3100/health           # gateway
curl http://localhost:3200/health           # sidecar
curl http://localhost:3800/health           # observer

# 2. Cloud LLM works through gateway
curl -X POST http://localhost:3100/v1/chat -H "Content-Type: application/json" \
  -d '{"provider":"openrouter","model":"x-ai/grok-4.1-fast","messages":[{"role":"user","content":"reply OK"}]}'

# 3. Embedding works (cloud)
curl -X POST http://localhost:3200/embed -H "Content-Type: application/json" \
  -d '{"texts":["hello world"]}'

# 4. Vectorize a tiny corpus (raw bucket setup)
bash scripts/dump_raw_corpus.sh             # populates s3://raw/...
bun run scripts/vectorize_raw_corpus.ts chicago entities sec

# 5. Run the agent test
bun run tests/agent_test/agent_harness.ts

If all pass, you have the validated architecture running cloud-only.


What's NOT included in this snapshot

  • Heavy test data (.parquet datasets, vector indexes ~470 MB) — gitignored. Regen via scripts/dump_raw_corpus.sh + vectorize_raw_corpus.ts.
  • Full lakehouse history — full repo at https://git.agentview.dev/profit/lakehouse
  • Existing pre-built vector indexes — would be incompatible across embedding models anyway
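The incompatibility is purely dimensional, so it is cheap to guard against before loading an index. A hypothetical check (dimensions taken from the models discussed in this guide; `index_compatible` is illustrative, not part of the codebase):

```python
# Embedding dimensions for the models discussed in this guide.
EMBED_DIMS = {
    "nomic-embed-text": 768,         # original local path
    "text-embedding-3-small": 1536,  # recommended cloud path
    "embed-english-v3.0": 1024,      # Cohere
    "voyage-3-lite": 512,            # Voyage AI
}

def index_compatible(index_dim, embed_model):
    """True only if vectors from `embed_model` fit an index built at `index_dim`."""
    return EMBED_DIMS.get(embed_model) == index_dim
```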

Known cloud-only adjustments needed

These are the exact code spots to modify when porting to a no-local-LLM environment. None are fundamental — just provider swaps:

  • sidecar/sidecar/embed.py — add an OpenAI/Voyage path when EMBED_PROVIDER is set
  • crates/aibridge/src/providers/ollama.rs — mark unhealthy if OLLAMA_DISABLED=1 is set
  • tests/agent_test/agent_harness.ts — change the AGENT_MODEL default from qwen3.5:latest to openrouter/x-ai/grok-4.1-fast; route via /v1/chat, not /generate
  • config/providers.toml — comment out the ollama and ollama_cloud provider blocks
  • lakehouse.toml — switch [ai] embed_model and gen_model to cloud variants

After these changes, cargo check should still pass. The architecture is provider-agnostic by design (Phase 39 ProviderAdapter trait); the cloud-only path just unwires Ollama and wires a cloud embedding source.