profit 42448c7db5 REPLICATION.md — Debian 13 clean install + cloud-only adaptation
Captures everything needed to stand this architecture up on a fresh
Debian 13 box with NO local AI (cloud-only via OpenRouter for
generation + OpenAI/Voyage/Cohere for embeddings).

Includes:
- Required external accounts and services (OpenRouter, OpenAI for
  embeddings, MinIO, Postgres+pgvector, optional Langfuse)
- The cloud-only embedding decision (nomic-embed-text via local
  Ollama is the one piece that MUST be swapped; OpenAI
  text-embedding-3-small is the recommended default cloud path)
- System packages, toolchains (Rust + Bun), Postgres setup
- All required env vars for gateway, sidecar, observer
- Configuration files (lakehouse.toml, providers.toml, secrets.toml)
- systemd unit for the gateway
- Validation steps (curl probes for gateway, sidecar, observer,
  /v1/chat through OpenRouter, embedding round-trip, vectorize a
  small corpus, run the agent test)
- Exact code spots to modify for cloud-only port (5 files, none
  fundamental — Phase 39 ProviderAdapter makes this provider-agnostic
  by design)
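
As an illustration of the embedding swap listed above, here is a minimal
sketch of a cloud embedding call against OpenAI's `/v1/embeddings`
endpoint. It is not taken from the repository: the function names, the
`OPENAI_API_KEY` variable name, and the choice of 768 dimensions are
assumptions for illustration. The one point it is meant to show is that
nomic-embed-text emits 768-dimensional vectors while
text-embedding-3-small defaults to 1536, so either the pgvector column
width changes or the request asks for a shorter vector via the
`dimensions` parameter.

```python
import json
import os
import urllib.request

OPENAI_EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings"


def build_embedding_request(texts, api_key, model="text-embedding-3-small",
                            dimensions=None):
    """Build (but do not send) a POST request for the embeddings endpoint.

    Hypothetical helper: `dimensions` is optional and, when set, asks the
    API to return shortened vectors (supported by text-embedding-3 models).
    """
    body = {"model": model, "input": list(texts)}
    if dimensions is not None:
        body["dimensions"] = dimensions
    return urllib.request.Request(
        OPENAI_EMBEDDINGS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def embed(texts, dimensions=None):
    """Send the request and return one vector per input, in input order.

    Assumes the key lives in the OPENAI_API_KEY environment variable; the
    project's secrets.toml may name it differently.
    """
    req = build_embedding_request(
        texts, os.environ["OPENAI_API_KEY"], dimensions=dimensions
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        payload = json.load(resp)
    # Each item in "data" carries its input position in "index".
    ordered = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in ordered]
```

Calling `embed(["permit issued for 123 W Main St"], dimensions=768)` would
keep vectors the same width as the nomic-embed-text output. Vectors from
the two models are still not comparable, so an existing corpus would need
to be re-vectorized after the swap.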

Heavy test data (.parquet files, ~470 MB) is deliberately excluded from
this snapshot; REPLICATION.md documents how to regenerate it via the
dump_raw_corpus + vectorize_raw_corpus scripts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 19:44:46 -05:00
Description
Architectural checkpoint: matrix-driven agent loop with Mem0 versioning + deletion validated end-to-end on Chicago permit data
868 MiB
Languages: Rust 44.2%, TypeScript 30.4%, HTML 12.5%, Python 8.6%, JavaScript 1.7%, Other 2.6%