profit ac01fffd9a checkpoint: matrix-agent-validated (2026-04-25)
Architectural snapshot of the lakehouse codebase at the point where the
full matrix-driven agent loop with Mem0 versioning + deletion was
validated end-to-end.

WHAT THIS REPO IS
A clean single-commit snapshot of the lakehouse code. Heavy test data
(.parquet datasets, vector indexes) excluded — see REPLICATION.md for
regen path. Full lakehouse history at git.agentview.dev/profit/lakehouse.

WHAT WAS PROVEN
- Vector retrieval across a multi-corpus matrix (chicago_permits + entity
  briefs + sec_tickers + distilled procedural + llm_team runs)
- Observer hand-review (cloud + heuristic fallback) gating each candidate
- Local-model agent loop (qwen3.5:latest) with tool use + scratchpad
- Playbook seal on success → next-iter retrieval surfaces it as preamble
- Mem0 versioning + deletion in pathway_memory:
    * UPSERT: ADD on new workflow, UPDATE bumps replay_count on identical
    * REVISE: chains versions, parent.superseded_at + superseded_by stamped
    * RETIRE: marks specific trace retired with reason, excluded from retrieval
    * HISTORY: walks chain root→tip, cycle-safe
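The UPSERT/REVISE/RETIRE/HISTORY semantics above can be sketched as a small in-memory model. This is a hypothetical Python illustration only; the actual implementation is the Rust code in crates/vectord/src/pathway_memory.rs, and the class and field names here are invented for the sketch:

```python
from dataclasses import dataclass
from typing import Optional
import itertools
import time

_ids = itertools.count(1)

@dataclass
class Trace:
    id: int
    workflow: str
    replay_count: int = 1
    parent: Optional[int] = None          # previous version in the chain
    superseded_by: Optional[int] = None   # next version, stamped on REVISE
    superseded_at: Optional[float] = None
    retired: bool = False
    retire_reason: Optional[str] = None

class PathwayMemory:
    def __init__(self) -> None:
        self.traces: dict[int, Trace] = {}

    def upsert(self, workflow: str) -> Trace:
        # UPDATE bumps replay_count on an identical live workflow, else ADD
        for t in self.traces.values():
            if t.workflow == workflow and not t.retired and t.superseded_by is None:
                t.replay_count += 1
                return t
        t = Trace(id=next(_ids), workflow=workflow)
        self.traces[t.id] = t
        return t

    def revise(self, old_id: int, workflow: str) -> Trace:
        # REVISE chains versions; the parent gets superseded_at/by stamped
        parent = self.traces[old_id]
        child = Trace(id=next(_ids), workflow=workflow, parent=old_id)
        parent.superseded_by = child.id
        parent.superseded_at = time.time()
        self.traces[child.id] = child
        return child

    def retire(self, trace_id: int, reason: str) -> None:
        # RETIRE: record the reason; retrieval filters on the flag
        t = self.traces[trace_id]
        t.retired = True
        t.retire_reason = reason

    def history(self, trace_id: int) -> list[int]:
        # HISTORY walks the chain root→tip; the seen set makes it cycle-safe
        t = self.traces[trace_id]
        while t.parent is not None:
            t = self.traces[t.parent]
        chain: list[int] = []
        seen: set[int] = set()
        while t is not None and t.id not in seen:
            seen.add(t.id)
            chain.append(t.id)
            t = self.traces.get(t.superseded_by) if t.superseded_by else None
        return chain
```

The key design point the snapshot validated: revision never mutates the old trace's payload, it only stamps supersession metadata, so HISTORY can always replay the full root→tip chain.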

KEY DIRECTORIES
- crates/vectord/src/pathway_memory.rs — Mem0 ops live here
- crates/vectord/src/playbook_memory.rs — original Mem0 reference
- tests/agent_test/ — local-model agent harness + PRD + session archives
- scripts/dump_raw_corpus.sh — MinIO bucket dump (raw test corpus)
- scripts/vectorize_raw_corpus.ts — corpus → vector indexes
- scripts/analyze_chicago_contracts.ts — real inference pipeline
- scripts/seal_agent_playbook.ts — Mem0 upsert from agent traces

Replication: see REPLICATION.md for Debian 13 clean install + cloud-only
adaptation (no local Ollama).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 19:43:27 -05:00


#!/usr/bin/env bash
# A/B test of T3 overseer: does it actually make subsequent runs better?
# Chains Run B (T3 seed) → Run C (T3 + read-back) → Run D (T3 cloud).
# Run A is assumed already complete (launched separately). Aggregates
# metrics at the end into ab_scorecard.json.
set -e
cd "$(dirname "$0")/.."
export OLLAMA_CLOUD_KEY="$(python3 -c "import json; print(json.load(open('/root/llm_team_config.json'))['providers']['ollama_cloud']['api_key'])")"
echo "▶ A/B test start at $(date -Iseconds)"
echo "▶ prior lessons dir: $(ls data/_playbook_lessons 2>/dev/null | wc -l) files"
# Run B — T3 enabled local, no prior lessons should exist yet
echo "──── RUN B: T3 local, seeds first lesson ────"
bun tests/multi-agent/scenario.ts > /tmp/lakehouse_ab_B.log 2>&1 && rc=0 || rc=$?
echo " B exit=$rc"
ls data/_playbook_lessons/*.json 2>/dev/null | head -5
# Run C — T3 enabled local, B's lesson should load
echo "──── RUN C: T3 local, reads B's lesson ────"
bun tests/multi-agent/scenario.ts > /tmp/lakehouse_ab_C.log 2>&1 && rc=0 || rc=$?
echo " C exit=$rc"
# Run D — T3 enabled CLOUD (gpt-oss:120b), reads B+C lessons
echo "──── RUN D: T3 cloud, reads B+C lessons ────"
LH_OVERVIEW_CLOUD=1 bun tests/multi-agent/scenario.ts > /tmp/lakehouse_ab_D.log 2>&1 && rc=0 || rc=$?
echo " D exit=$rc"
echo "▶ all runs done at $(date -Iseconds)"
echo "▶ scorecard:"
ls -1dt tests/multi-agent/playbooks/scenario-* | head -4 | tac | python3 -c "
import sys, os, json, datetime
# stdin is the 4 most recent run dirs, oldest first: A, B, C, D
# (Run A was launched separately before this script; B/C/D ran above.)
runs = [l.strip() for l in sys.stdin if l.strip()]
labels = ['A(no-T3)', 'B(T3-seed)', 'C(T3-read)', 'D(T3-cloud)']
rows = []
for i, path in enumerate(runs):
    try:
        results = json.load(open(os.path.join(path, 'results.json')))
    except FileNotFoundError:
        continue
    ok = sum(1 for r in results if r.get('ok'))
    turns = sum(r.get('turns', 0) for r in results)
    gaps = sum(len(r.get('gap_signals', [])) for r in results)
    cites = sum(len(r.get('playbook_citations') or []) for r in results)
    prior = []
    try:
        prior = json.load(open(os.path.join(path, 'prior_lessons.json')))
    except FileNotFoundError:
        pass
    rows.append({
        'label': labels[i] if i < len(labels) else f'run{i}',
        'path': path,
        'ok_events': ok,
        'total_events': len(results),
        'total_turns': turns,
        'total_gaps': gaps,
        'total_citations': cites,
        'prior_lessons_loaded': len(prior),
    })
scorecard = {'generated_at': datetime.datetime.utcnow().isoformat() + 'Z', 'runs': rows}
with open('tests/multi-agent/playbooks/ab_scorecard.json', 'w') as f:
    f.write(json.dumps(scorecard, indent=2))
print(json.dumps(scorecard, indent=2))
"
echo "▶ saved: tests/multi-agent/playbooks/ab_scorecard.json"
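After the script finishes, the per-run rows in ab_scorecard.json can be compared side by side. A minimal sketch of such a summary, assuming the scorecard has the shape written by the script above (the `summarize` helper name is invented here):

```python
import json

def summarize(scorecard: dict) -> list[str]:
    """One summary line per run, in the order the scorecard lists them."""
    lines = []
    for run in scorecard['runs']:
        # guard against an empty results.json so the rate never divides by zero
        rate = run['ok_events'] / max(run['total_events'], 1)
        lines.append(
            f"{run['label']:<12} ok={rate:.0%} turns={run['total_turns']} "
            f"gaps={run['total_gaps']} cites={run['total_citations']} "
            f"lessons={run['prior_lessons_loaded']}"
        )
    return lines

if __name__ == '__main__':
    with open('tests/multi-agent/playbooks/ab_scorecard.json') as f:
        print('\n'.join(summarize(json.load(f))))
```

If T3 is helping, the expectation is that C and D show fewer gap signals and more playbook citations than B, with A as the no-T3 baseline.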