Architectural snapshot of the lakehouse codebase at the point where the
full matrix-driven agent loop with Mem0 versioning + deletion was
validated end-to-end.
WHAT THIS REPO IS
A clean single-commit snapshot of the lakehouse code. Heavy test data
(.parquet datasets, vector indexes) is excluded; see REPLICATION.md for
the regeneration path. The full lakehouse history lives at
git.agentview.dev/profit/lakehouse.
WHAT WAS PROVEN
- Vector retrieval across a multi-corpus matrix (chicago_permits + entity
  briefs + sec_tickers + distilled procedural + llm_team runs)
- Observer hand-review (cloud + heuristic fallback) gating each candidate
- Local-model agent loop (qwen3.5:latest) with tool use + scratchpad
- Playbook seal on success → next-iter retrieval surfaces it as preamble
- Mem0 versioning + deletion in pathway_memory:
  * UPSERT: ADD on a new workflow, UPDATE bumps replay_count on an identical one
  * REVISE: chains versions, parent.superseded_at + superseded_by stamped
  * RETIRE: marks a specific trace retired with a reason, excluded from retrieval
  * HISTORY: walks the chain root→tip, cycle-safe
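The four Mem0 operations listed above can be sketched as a small in-memory store. This is a minimal illustration only, not the code in crates/vectord/src/pathway_memory.rs: the names `PathwayMemory`, `Trace`, `UpsertOutcome`, `retrievable`, and `get` are hypothetical, as are the integer ids and the logical clock. Only `replay_count`, `superseded_at`, and `superseded_by` come from the list above.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical trace record. `replay_count`, `superseded_at` and
/// `superseded_by` are named in the commit message; the rest is assumed.
#[derive(Debug, Clone)]
pub struct Trace {
    pub id: u64,
    pub workflow: String,
    pub replay_count: u32,
    pub parent: Option<u64>,
    pub superseded_by: Option<u64>,
    pub superseded_at: Option<u64>, // logical timestamp (assumption)
    pub retired_reason: Option<String>,
}

#[derive(Debug, PartialEq)]
pub enum UpsertOutcome {
    Added(u64),
    Updated(u64),
}

#[derive(Default)]
pub struct PathwayMemory {
    traces: HashMap<u64, Trace>,
    next_id: u64,
    clock: u64,
}

impl PathwayMemory {
    fn insert(&mut self, workflow: &str, parent: Option<u64>) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.traces.insert(id, Trace {
            id, workflow: workflow.to_string(), replay_count: 0, parent,
            superseded_by: None, superseded_at: None, retired_reason: None,
        });
        id
    }

    /// UPSERT: ADD on a new workflow, UPDATE bumps replay_count on an identical live one.
    pub fn upsert(&mut self, workflow: &str) -> UpsertOutcome {
        let hit = self.traces.values()
            .find(|t| t.workflow == workflow && t.superseded_by.is_none() && t.retired_reason.is_none())
            .map(|t| t.id);
        match hit {
            Some(id) => {
                if let Some(t) = self.traces.get_mut(&id) { t.replay_count += 1; }
                UpsertOutcome::Updated(id)
            }
            None => UpsertOutcome::Added(self.insert(workflow, None)),
        }
    }

    /// REVISE: create a child version and stamp the parent as superseded.
    /// Returns None if the parent is unknown or already superseded.
    pub fn revise(&mut self, parent_id: u64, workflow: &str) -> Option<u64> {
        if self.traces.get(&parent_id)?.superseded_by.is_some() { return None; }
        let child = self.insert(workflow, Some(parent_id));
        self.clock += 1;
        let now = self.clock;
        let parent = self.traces.get_mut(&parent_id)?;
        parent.superseded_by = Some(child);
        parent.superseded_at = Some(now);
        Some(child)
    }

    /// RETIRE: mark one trace retired with a reason.
    pub fn retire(&mut self, id: u64, reason: &str) -> bool {
        match self.traces.get_mut(&id) {
            Some(t) => { t.retired_reason = Some(reason.to_string()); true }
            None => false,
        }
    }

    /// Ids eligible for retrieval: everything that has not been retired.
    pub fn retrievable(&self) -> Vec<u64> {
        let mut ids: Vec<u64> = self.traces.values()
            .filter(|t| t.retired_reason.is_none()).map(|t| t.id).collect();
        ids.sort_unstable();
        ids
    }

    pub fn get(&self, id: u64) -> Option<&Trace> { self.traces.get(&id) }

    /// HISTORY: walk back to the root, then forward to the tip.
    /// Both walks track visited ids so a corrupted chain cannot loop forever.
    pub fn history(&self, id: u64) -> Vec<u64> {
        let mut seen = HashSet::new();
        let mut root = id;
        while let Some(p) = self.traces.get(&root).and_then(|t| t.parent) {
            if !seen.insert(root) { break; }
            root = p;
        }
        let (mut chain, mut visited, mut cur) = (Vec::new(), HashSet::new(), Some(root));
        while let Some(c) = cur {
            if !visited.insert(c) { break; }
            let Some(t) = self.traces.get(&c) else { break };
            chain.push(c);
            cur = t.superseded_by;
        }
        chain
    }
}
```

In this sketch a retired trace stays in the store and in `history`, so the version chain remains auditable; it is only filtered out of `retrievable`. Whether superseded versions should also be hidden from retrieval is not stated above, so the sketch leaves them visible.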
KEY DIRECTORIES
- crates/vectord/src/pathway_memory.rs — Mem0 ops live here
- crates/vectord/src/playbook_memory.rs — original Mem0 reference
- tests/agent_test/ — local-model agent harness + PRD + session archives
- scripts/dump_raw_corpus.sh — MinIO bucket dump (raw test corpus)
- scripts/vectorize_raw_corpus.ts — corpus → vector indexes
- scripts/analyze_chicago_contracts.ts — real inference pipeline
- scripts/seal_agent_playbook.ts — Mem0 upsert from agent traces
Replication: see REPLICATION.md for Debian 13 clean install + cloud-only
adaptation (no local Ollama).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
43 lines
1.5 KiB
TOML
[package]
name = "lance-bench"
version = "0.1.0"
edition = "2024"

# Standalone pilot for Phase B (see docs/EXECUTION_PLAN.md).
# Deliberately NOT sharing workspace deps: Lance 4.x pulls in its own
# DataFusion and Arrow versions incompatible with the rest of the stack.
# Isolating the pilot means we don't force a workspace-wide upgrade until
# we've decided Lance is worth it.

[dependencies]
# Only the features we actually need; the default brings in AWS/Azure/GCP/HF etc.,
# which is ~200 extra crates we don't care about for a local pilot.
lance = { version = "4.0", default-features = false }
# Lance exposes DatasetIndexExt, IndexType, and IvfBuildParams through
# its sub-crates, which must be imported directly; lance itself doesn't
# re-export them at a convenient path.
lance-index = { version = "4.0", default-features = false }
lance-linalg = { version = "4.0", default-features = false }

# Arrow re-exported by Lance; pin to a range Lance picks so types match.
arrow = "57"
arrow-array = "57"
arrow-schema = "57"

# Also need to read the EXISTING Parquet vector files so we can compare.
# These live in data/vectors/*.parquet. Lance's internal Parquet reading
# might differ from ours; using our format's Arrow/Parquet versions for
# the read side keeps the inputs identical.
parquet = "57"

tokio = { version = "1", features = ["full"] }
futures = "0.3"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
anyhow = "1"
bytes = "1"

[[bin]]
name = "lance-bench"
path = "src/main.rs"