matrix-agent-validated

2 Commits 1 Branch 0 Tags

Author	SHA1	Message	Date

profit

42448c7db5

REPLICATION.md — Debian 13 clean install + cloud-only adaptation

Captures everything needed to stand this architecture up on a fresh
Debian 13 box with NO local AI (cloud-only via OpenRouter for
generation + OpenAI/Voyage/Cohere for embeddings).

Includes:
- Required external accounts (OpenRouter, OpenAI for embeddings,
  MinIO, Postgres+pgvector, optional Langfuse)
- The cloud-only embedding decision (nomic-embed-text via local
  Ollama is the one piece that MUST be swapped — recommended OpenAI
  text-embedding-3-small as the default cloud path)
- System packages, toolchains (Rust + Bun), Postgres setup
- All required env vars for gateway, sidecar, observer
- Configuration files (lakehouse.toml, providers.toml, secrets.toml)
- systemd unit for the gateway
- Validation steps (curl probes for gateway, sidecar, observer,
  /v1/chat through OpenRouter, embedding round-trip, vectorize a
  small corpus, run the agent test)
- Exact code spots to modify for cloud-only port (5 files, none
  fundamental — Phase 39 ProviderAdapter makes this provider-agnostic
  by design)
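
As an illustration of why the embedding swap is the one piece that needs care, here is a minimal sketch, not the repository's actual Phase 39 ProviderAdapter. The trait `EmbeddingProvider`, the struct `StubProvider`, and the function `check_index_compat` are all hypothetical names. The sketch assumes the commonly documented default output widths of 768 for nomic-embed-text and 1536 for text-embedding-3-small, so a vector index built with one model cannot be reused with the other without re-vectorizing.

```rust
/// Hypothetical adapter trait; the real ProviderAdapter may look different.
trait EmbeddingProvider {
    fn name(&self) -> &str;
    fn dimensions(&self) -> usize;
    fn embed(&self, text: &str) -> Result<Vec<f32>, String>;
}

/// Stand-in provider that makes no network calls (illustration only).
struct StubProvider {
    name: &'static str,
    dims: usize,
}

impl EmbeddingProvider for StubProvider {
    fn name(&self) -> &str {
        self.name
    }

    fn dimensions(&self) -> usize {
        self.dims
    }

    fn embed(&self, _text: &str) -> Result<Vec<f32>, String> {
        // Placeholder zero vector of the right width, NOT a real embedding.
        Ok(vec![0.0; self.dims])
    }
}

/// Refuse to reuse an index whose vector width differs from the provider's.
fn check_index_compat(
    provider: &dyn EmbeddingProvider,
    index_dims: usize,
) -> Result<(), String> {
    if provider.dimensions() == index_dims {
        Ok(())
    } else {
        Err(format!(
            "{} emits {} dims but the index stores {}; re-vectorize the corpus",
            provider.name(),
            provider.dimensions(),
            index_dims
        ))
    }
}
```

Under these assumptions, a check of this kind would fail fast when a cloud provider is pointed at an index built with the local model, which is consistent with the message's note that the corpus has to be regenerated and re-vectorized.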

Heavy test data (.parquet files ~470 MB) deliberately excluded from
this snapshot — REPLICATION.md documents how to regenerate via the
dump_raw_corpus + vectorize_raw_corpus scripts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-25 19:44:46 -05:00

profit

ac01fffd9a

checkpoint: matrix-agent-validated (2026-04-25)

Architectural snapshot of the lakehouse codebase at the point where the
full matrix-driven agent loop with Mem0 versioning + deletion was
validated end-to-end.

WHAT THIS REPO IS
A clean single-commit snapshot of the lakehouse code. Heavy test data
(.parquet datasets, vector indexes) excluded — see REPLICATION.md for
regen path. Full lakehouse history at git.agentview.dev/profit/lakehouse.

WHAT WAS PROVEN
- Vector retrieval across multi-corpora matrix (chicago_permits + entity
  briefs + sec_tickers + distilled procedural + llm_team runs)
- Observer hand-review (cloud + heuristic fallback) gating each candidate
- Local-model agent loop (qwen3.5:latest) with tool use + scratchpad
- Playbook seal on success → next-iter retrieval surfaces it as preamble
- Mem0 versioning + deletion in pathway_memory:
    * UPSERT: ADD on a new workflow, UPDATE bumps replay_count on an identical one
    * REVISE: chains versions, parent.superseded_at + superseded_by stamped
    * RETIRE: marks specific trace retired with reason, excluded from retrieval
    * HISTORY: walks chain root→tip, cycle-safe
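
The four operations above can be sketched roughly as follows. This is an in-memory illustration, not the code in crates/vectord/src/pathway_memory.rs. The names `Trace`, `PathwayMemory`, `retrievable`, and every field other than replay_count, superseded_at, and superseded_by are assumptions. Matching "identical" by comparing a workflow string and starting replay_count at 0 are also assumptions.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical trace record (field layout is an assumption).
#[derive(Debug, Clone)]
struct Trace {
    id: u64,
    workflow: String,
    replay_count: u32,
    parent: Option<u64>,
    superseded_at: Option<u64>,
    superseded_by: Option<u64>,
    retired_reason: Option<String>,
}

#[derive(Default)]
struct PathwayMemory {
    traces: HashMap<u64, Trace>,
    next_id: u64,
}

impl PathwayMemory {
    fn insert(&mut self, workflow: &str, parent: Option<u64>) -> u64 {
        self.next_id += 1;
        let id = self.next_id;
        self.traces.insert(id, Trace {
            id,
            workflow: workflow.to_string(),
            replay_count: 0,
            parent,
            superseded_at: None,
            superseded_by: None,
            retired_reason: None,
        });
        id
    }

    /// UPSERT: UPDATE (bump replay_count) on an identical workflow, else ADD.
    fn upsert(&mut self, workflow: &str) -> u64 {
        let hit = self
            .traces
            .values_mut()
            .find(|t| t.workflow == workflow && t.retired_reason.is_none());
        if let Some(t) = hit {
            t.replay_count += 1;
            return t.id;
        }
        self.insert(workflow, None)
    }

    /// REVISE: add a new version and stamp the parent as superseded.
    fn revise(&mut self, parent_id: u64, workflow: &str, now: u64) -> Option<u64> {
        if !self.traces.contains_key(&parent_id) {
            return None;
        }
        let child = self.insert(workflow, Some(parent_id));
        let parent = self.traces.get_mut(&parent_id)?;
        parent.superseded_at = Some(now);
        parent.superseded_by = Some(child);
        Some(child)
    }

    /// RETIRE: mark one trace retired with a reason.
    fn retire(&mut self, id: u64, reason: &str) -> bool {
        match self.traces.get_mut(&id) {
            Some(t) => {
                t.retired_reason = Some(reason.to_string());
                true
            }
            None => false,
        }
    }

    /// Retrieval candidates: retired traces are excluded.
    fn retrievable(&self) -> Vec<u64> {
        let mut ids: Vec<u64> = self
            .traces
            .values()
            .filter(|t| t.retired_reason.is_none())
            .map(|t| t.id)
            .collect();
        ids.sort();
        ids
    }

    /// HISTORY: walk from root to tip; visited sets stop on cycles.
    fn history(&self, id: u64) -> Vec<u64> {
        let mut seen = HashSet::new();
        let mut root = id;
        while let Some(t) = self.traces.get(&root) {
            if !seen.insert(root) {
                break;
            }
            match t.parent {
                Some(p) if self.traces.contains_key(&p) => root = p,
                _ => break,
            }
        }
        let mut chain = Vec::new();
        let mut visited = HashSet::new();
        let mut cur = Some(root);
        while let Some(c) = cur {
            if !visited.insert(c) {
                break;
            }
            let Some(t) = self.traces.get(&c) else { break };
            chain.push(c);
            cur = t.superseded_by;
        }
        chain
    }
}
```

In this sketch only retired traces are filtered out of retrieval, matching the RETIRE bullet. Whether superseded versions are also hidden is not stated in the message, so that is left open here.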

KEY DIRECTORIES
- crates/vectord/src/pathway_memory.rs — Mem0 ops live here
- crates/vectord/src/playbook_memory.rs — original Mem0 reference
- tests/agent_test/ — local-model agent harness + PRD + session archives
- scripts/dump_raw_corpus.sh — MinIO bucket dump (raw test corpus)
- scripts/vectorize_raw_corpus.ts — corpus → vector indexes
- scripts/analyze_chicago_contracts.ts — real inference pipeline
- scripts/seal_agent_playbook.ts — Mem0 upsert from agent traces

Replication: see REPLICATION.md for Debian 13 clean install + cloud-only
adaptation (no local Ollama).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-25 19:43:27 -05:00