Architectural snapshot of the lakehouse codebase at the point where the
full matrix-driven agent loop with Mem0 versioning + deletion was
validated end-to-end.

WHAT THIS REPO IS
A clean single-commit snapshot of the lakehouse code. Heavy test data
(.parquet datasets, vector indexes) excluded — see REPLICATION.md for
the regeneration path. Full lakehouse history at git.agentview.dev/profit/lakehouse.

WHAT WAS PROVEN
- Vector retrieval across the multi-corpus matrix (chicago_permits + entity
briefs + sec_tickers + distilled procedural + llm_team runs)
- Observer hand-review (cloud + heuristic fallback) gating each candidate
- Local-model agent loop (qwen3.5:latest) with tool use + scratchpad
- Playbook seal on success → next-iter retrieval surfaces it as preamble
- Mem0 versioning + deletion in pathway_memory:
* UPSERT: ADD on new workflow, UPDATE bumps replay_count on identical
* REVISE: chains versions, parent.superseded_at + superseded_by stamped
* RETIRE: marks specific trace retired with reason, excluded from retrieval
* HISTORY: walks chain root→tip, cycle-safe
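The four pathway_memory operations above can be sketched as a small version-chain store. This is an illustrative Python model only, not the actual Rust API in crates/vectord; the names (Trace, PathwayMemory, upsert, revise, retire, history) and field layout are assumptions made for the sketch.

```python
# Hypothetical model of the pathway_memory version chain; names are illustrative.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Trace:
    trace_id: str
    workflow: str
    replay_count: int = 1
    parent: Optional[str] = None          # previous version in the chain
    superseded_by: Optional[str] = None   # next version in the chain
    superseded_at: Optional[int] = None
    retired_reason: Optional[str] = None

class PathwayMemory:
    def __init__(self) -> None:
        self.traces: dict[str, Trace] = {}
        self.by_workflow: dict[str, str] = {}  # workflow -> current tip

    def upsert(self, trace_id: str, workflow: str) -> str:
        # ADD on a new workflow; UPDATE bumps replay_count on an identical one
        tip = self.by_workflow.get(workflow)
        if tip:
            self.traces[tip].replay_count += 1
            return tip
        self.traces[trace_id] = Trace(trace_id, workflow)
        self.by_workflow[workflow] = trace_id
        return trace_id

    def revise(self, old_id: str, new_id: str, workflow: str, now: int = 0) -> str:
        # REVISE: chain versions, stamping parent.superseded_at / superseded_by
        old = self.traces[old_id]
        old.superseded_by, old.superseded_at = new_id, now
        self.traces[new_id] = Trace(new_id, workflow, parent=old_id)
        self.by_workflow[workflow] = new_id
        return new_id

    def retire(self, trace_id: str, reason: str) -> None:
        # RETIRE: mark a specific trace retired with a reason
        self.traces[trace_id].retired_reason = reason

    def retrievable(self) -> list:
        # retired and superseded traces are excluded from retrieval
        return [t for t in self.traces.values()
                if t.retired_reason is None and t.superseded_by is None]

    def history(self, trace_id: str) -> list:
        # HISTORY: walk the chain root -> tip; seen-sets make it cycle-safe
        t, seen = self.traces[trace_id], set()
        while t.parent and t.parent not in seen:  # rewind to root
            seen.add(t.trace_id)
            t = self.traces[t.parent]
        chain, seen = [], set()
        while t and t.trace_id not in seen:       # walk forward to tip
            seen.add(t.trace_id)
            chain.append(t.trace_id)
            t = self.traces.get(t.superseded_by) if t.superseded_by else None
        return chain
```

The key invariant the sketch encodes: only chain tips that are not retired are retrievable, while history remains walkable from any version.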

KEY DIRECTORIES
- crates/vectord/src/pathway_memory.rs — Mem0 ops live here
- crates/vectord/src/playbook_memory.rs — original Mem0 reference
- tests/agent_test/ — local-model agent harness + PRD + session archives
- scripts/dump_raw_corpus.sh — MinIO bucket dump (raw test corpus)
- scripts/vectorize_raw_corpus.ts — corpus → vector indexes
- scripts/analyze_chicago_contracts.ts — real inference pipeline
- scripts/seal_agent_playbook.ts — Mem0 upsert from agent traces

Replication: see REPLICATION.md for Debian 13 clean install + cloud-only
adaptation (no local Ollama).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
58 lines
1.9 KiB
Bash
#!/usr/bin/env bash
# End-to-end test: upload Parquet → register dataset → SQL query
set -e

BASE="http://localhost:3100"

echo "=== Generate test Parquet file ==="
python3 -c "
# Minimal Parquet via pyarrow if available; otherwise fail fast
# (exit 1 aborts the whole script because of set -e above)
try:
    import pyarrow as pa
    import pyarrow.parquet as pq
    table = pa.table({
        'id': [1, 2, 3, 4, 5],
        'name': ['alice', 'bob', 'carol', 'dave', 'eve'],
        'score': [9.5, 8.2, 7.8, 6.1, 9.9],
    })
    pq.write_table(table, '/tmp/test_data.parquet')
    print('generated with pyarrow')
except ImportError:
    print('pyarrow not available, aborting')
    exit(1)
"

echo "=== Upload Parquet to storage ==="
curl -s -X PUT "$BASE/storage/objects/datasets/scores.parquet" \
  --data-binary @/tmp/test_data.parquet
echo ""

echo "=== Register dataset in catalog ==="
SIZE=$(stat -c%s /tmp/test_data.parquet)
curl -s -X POST "$BASE/catalog/datasets" \
  -H "Content-Type: application/json" \
  -d "{\"name\":\"scores\",\"schema_fingerprint\":\"test\",\"objects\":[{\"bucket\":\"data\",\"key\":\"datasets/scores.parquet\",\"size_bytes\":$SIZE}]}" | python3 -m json.tool
echo ""

echo "=== SQL: SELECT * FROM scores ==="
curl -s -X POST "$BASE/query/sql" \
  -H "Content-Type: application/json" \
  -d '{"sql":"SELECT * FROM scores"}' | python3 -m json.tool
echo ""

echo "=== SQL: SELECT name, score FROM scores WHERE score > 8.0 ORDER BY score DESC ==="
curl -s -X POST "$BASE/query/sql" \
  -H "Content-Type: application/json" \
  -d '{"sql":"SELECT name, score FROM scores WHERE score > 8.0 ORDER BY score DESC"}' | python3 -m json.tool
echo ""

echo "=== SQL: SELECT COUNT(*), AVG(score) FROM scores ==="
curl -s -X POST "$BASE/query/sql" \
  -H "Content-Type: application/json" \
  -d '{"sql":"SELECT COUNT(*) as cnt, AVG(score) as avg_score FROM scores"}' | python3 -m json.tool
echo ""

echo "=== DONE ==="