The startup health check in observer.ts:645 called fetch().then(r => r.json())
against the gateway /health endpoint, which returns text/plain "lakehouse ok",
not JSON. r.json() throws, the .catch swallows the error to null, and the
observer concludes "gateway unreachable" and exits. Combined with systemd
Restart=on-failure this produced a 5-second crash loop on every boot of
matrix-test.
Fix: r.ok ? r.text() : null — keeps the same null-on-failure contract for
the existing if (!health) guard while accepting the actual content type.
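A minimal sketch of the fixed probe. The gateway URL, the probeHealth name,
and the injectable fetchImpl parameter are assumptions for illustration; only
the r.ok ? r.text() : null line is from the actual fix.

```typescript
// Hypothetical reconstruction of the observer startup health probe.
const GATEWAY_URL = "http://127.0.0.1:8080"; // assumed; not from the commit

// Before: .then((r) => r.json()) threw on the text/plain body and the
// .catch collapsed every failure mode to null.
// After: accept any 2xx body as health text, keep null-on-failure.
async function probeHealth(
  fetchImpl: typeof fetch = fetch,
): Promise<string | null> {
  return fetchImpl(`${GATEWAY_URL}/health`)
    .then((r) => (r.ok ? r.text() : null))
    .catch(() => null);
}

// The existing guard is unchanged:
//   const health = await probeHealth();
//   if (!health) { /* report gateway unreachable and exit */ }
```

Returning null on both non-2xx and network errors preserves the single
if (!health) code path the observer already has.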
Sealed in pathway_memory as TypeConfusion:fetch-health-json (trace at
matrix_handover_validate|mcp-server/observer.ts|service_crash_loop) so
the matrix index preempts this on any future deploy probe.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures everything needed to stand this architecture up on a fresh
Debian 13 box with NO local AI (cloud-only via OpenRouter for
generation + OpenAI/Voyage/Cohere for embeddings).
Includes:
- Required external accounts (OpenRouter, OpenAI for embeddings,
MinIO, Postgres+pgvector, optional Langfuse)
- The cloud-only embedding decision (nomic-embed-text via local
  Ollama is the one piece that MUST be swapped; OpenAI
  text-embedding-3-small is the recommended default cloud path)
- System packages, toolchains (Rust + Bun), Postgres setup
- All required env vars for gateway, sidecar, observer
- Configuration files (lakehouse.toml, providers.toml, secrets.toml)
- systemd unit for the gateway
- Validation steps (curl probes for gateway, sidecar, observer,
/v1/chat through OpenRouter, embedding round-trip, vectorize a
small corpus, run the agent test)
- Exact code spots to modify for cloud-only port (5 files, none
fundamental — Phase 39 ProviderAdapter makes this provider-agnostic
by design)
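The embedding round-trip from the validation steps can be sketched as a small
helper. The endpoint and response shape follow the public OpenAI embeddings
API; the embed name and the injectable fetchImpl parameter are illustrative,
not code from this repo.

```typescript
// Sketch: cloud-only embedding round-trip via text-embedding-3-small.
// Requires OPENAI_API_KEY in the environment for real use.
async function embed(
  texts: string[],
  fetchImpl: typeof fetch = fetch,
): Promise<number[][]> {
  const res = await fetchImpl("https://api.openai.com/v1/embeddings", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "text-embedding-3-small", input: texts }),
  });
  if (!res.ok) throw new Error(`embedding request failed: ${res.status}`);
  const json = await res.json();
  // Each data[i].embedding is a float vector (1536 dims for this model).
  return json.data.map((d: { embedding: number[] }) => d.embedding);
}
```

A non-empty vector of the expected dimensionality coming back is the
round-trip check before vectorizing a small corpus.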
Heavy test data (.parquet files ~470 MB) deliberately excluded from
this snapshot — REPLICATION.md documents how to regenerate via the
dump_raw_corpus + vectorize_raw_corpus scripts.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Architectural snapshot of the lakehouse codebase at the point where the
full matrix-driven agent loop with Mem0 versioning + deletion was
validated end-to-end.
WHAT THIS REPO IS
A clean single-commit snapshot of the lakehouse code. Heavy test data
(.parquet datasets, vector indexes) excluded — see REPLICATION.md for
regen path. Full lakehouse history at git.agentview.dev/profit/lakehouse.
WHAT WAS PROVEN
- Vector retrieval across multi-corpora matrix (chicago_permits + entity
briefs + sec_tickers + distilled procedural + llm_team runs)
- Observer hand-review (cloud + heuristic fallback) gating each candidate
- Local-model agent loop (qwen3.5:latest) with tool use + scratchpad
- Playbook seal on success → next-iter retrieval surfaces it as preamble
- Mem0 versioning + deletion in pathway_memory:
* UPSERT: ADD on new workflow, UPDATE bumps replay_count on identical
* REVISE: chains versions, parent.superseded_at + superseded_by stamped
* RETIRE: marks specific trace retired with reason, excluded from retrieval
* HISTORY: walks chain root→tip, cycle-safe
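The four operations above can be modeled as follows. The real implementation
is Rust (crates/vectord/src/pathway_memory.rs); this TypeScript sketch only
illustrates the semantics, and every name in it is hypothetical.

```typescript
// Illustrative model of the pathway_memory versioning semantics.
interface Trace {
  id: string;
  workflow: string;
  replayCount: number;
  supersededAt: number | null; // stamped by REVISE on the parent
  supersededBy: string | null; // child trace id, forms the version chain
  retiredReason: string | null; // set by RETIRE, excluded from retrieval
}

class PathwayMemory {
  private traces = new Map<string, Trace>();
  private nextId = 0;

  private fresh(workflow: string): Trace {
    const t: Trace = {
      id: `t${this.nextId++}`,
      workflow,
      replayCount: 1,
      supersededAt: null,
      supersededBy: null,
      retiredReason: null,
    };
    this.traces.set(t.id, t);
    return t;
  }

  // UPSERT: ADD on a new workflow, UPDATE bumps replay_count on an
  // identical live (non-superseded, non-retired) one.
  upsert(workflow: string): Trace {
    for (const t of this.traces.values()) {
      if (t.workflow === workflow && !t.supersededAt && !t.retiredReason) {
        t.replayCount += 1;
        return t;
      }
    }
    return this.fresh(workflow);
  }

  // REVISE: chain a new version; stamp parent.superseded_at + superseded_by.
  revise(parentId: string, workflow: string): Trace {
    const parent = this.traces.get(parentId);
    if (!parent) throw new Error(`unknown trace ${parentId}`);
    const child = this.fresh(workflow);
    parent.supersededAt = Date.now();
    parent.supersededBy = child.id;
    return child;
  }

  // RETIRE: mark a specific trace retired with a reason.
  retire(id: string, reason: string): void {
    const t = this.traces.get(id);
    if (t) t.retiredReason = reason;
  }

  // HISTORY: walk the chain root→tip; the seen set makes it cycle-safe.
  history(rootId: string): Trace[] {
    const out: Trace[] = [];
    const seen = new Set<string>();
    let cur = this.traces.get(rootId);
    while (cur && !seen.has(cur.id)) {
      seen.add(cur.id);
      out.push(cur);
      cur = cur.supersededBy ? this.traces.get(cur.supersededBy) : undefined;
    }
    return out;
  }
}
```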
KEY DIRECTORIES
- crates/vectord/src/pathway_memory.rs — Mem0 ops live here
- crates/vectord/src/playbook_memory.rs — original Mem0 reference
- tests/agent_test/ — local-model agent harness + PRD + session archives
- scripts/dump_raw_corpus.sh — MinIO bucket dump (raw test corpus)
- scripts/vectorize_raw_corpus.ts — corpus → vector indexes
- scripts/analyze_chicago_contracts.ts — real inference pipeline
- scripts/seal_agent_playbook.ts — Mem0 upsert from agent traces
Replication: see REPLICATION.md for Debian 13 clean install + cloud-only
adaptation (no local Ollama).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>