profit ac01fffd9a checkpoint: matrix-agent-validated (2026-04-25)
Architectural snapshot of the lakehouse codebase at the point where the
full matrix-driven agent loop with Mem0 versioning + deletion was
validated end-to-end.

WHAT THIS REPO IS
A clean single-commit snapshot of the lakehouse code. Heavy test data
(.parquet datasets, vector indexes) is excluded; see REPLICATION.md for
how to regenerate it. The full lakehouse history lives at
git.agentview.dev/profit/lakehouse.

WHAT WAS PROVEN
- Vector retrieval across a multi-corpus matrix (chicago_permits + entity
  briefs + sec_tickers + distilled procedural + llm_team runs)
- Observer hand-review (cloud + heuristic fallback) gating each candidate
- Local-model agent loop (qwen3.5:latest) with tool use + scratchpad
- Playbook seal on success → the next iteration's retrieval surfaces it as a preamble
- Mem0 versioning + deletion in pathway_memory:
    * UPSERT: ADD on a new workflow; UPDATE bumps replay_count on an identical one
    * REVISE: chains versions; parent.superseded_at and superseded_by are stamped
    * RETIRE: marks a specific trace retired with a reason; it is excluded from retrieval
    * HISTORY: walks the chain root→tip, cycle-safe
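
The four operations above can be sketched as a small in-memory model. This
is an illustration only, not the code in pathway_memory.rs: the Trace and
PathwayMemory types, the integer ids, the logical clock, and every method
signature are assumptions made for the example. Only replay_count,
superseded_at, superseded_by, and the operation names come from the notes
above. Matching "identical" workflows by string equality is also an
assumption.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical trace record (layout assumed for illustration).
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct Trace {
    id: u64,
    workflow: String,
    replay_count: u32,
    parent: Option<u64>,
    superseded_at: Option<u64>, // logical timestamp, not wall-clock time
    superseded_by: Option<u64>,
    retired_reason: Option<String>,
}

/// Hypothetical in-memory store standing in for pathway memory.
#[derive(Default)]
struct PathwayMemory {
    traces: HashMap<u64, Trace>,
    next_id: u64,
    clock: u64,
}

impl PathwayMemory {
    fn insert(&mut self, workflow: &str, parent: Option<u64>) -> u64 {
        self.next_id += 1;
        let id = self.next_id;
        self.traces.insert(id, Trace {
            id,
            workflow: workflow.to_string(),
            replay_count: 1,
            parent,
            superseded_at: None,
            superseded_by: None,
            retired_reason: None,
        });
        id
    }

    /// UPSERT: UPDATE (bump replay_count) on an identical, non-retired
    /// workflow; otherwise ADD a new trace.
    fn upsert(&mut self, workflow: &str) -> u64 {
        let existing = self.traces.values()
            .find(|t| t.workflow == workflow && t.retired_reason.is_none())
            .map(|t| t.id);
        match existing {
            Some(id) => {
                if let Some(t) = self.traces.get_mut(&id) {
                    t.replay_count += 1;
                }
                id
            }
            None => self.insert(workflow, None),
        }
    }

    /// REVISE: add a child version and stamp the parent as superseded.
    fn revise(&mut self, parent_id: u64, workflow: &str) -> Option<u64> {
        if !self.traces.contains_key(&parent_id) {
            return None;
        }
        let child = self.insert(workflow, Some(parent_id));
        self.clock += 1;
        let now = self.clock;
        let parent = self.traces.get_mut(&parent_id)?;
        parent.superseded_at = Some(now);
        parent.superseded_by = Some(child);
        Some(child)
    }

    /// RETIRE: record a reason; retired traces drop out of retrieval.
    fn retire(&mut self, id: u64, reason: &str) -> bool {
        match self.traces.get_mut(&id) {
            Some(t) => { t.retired_reason = Some(reason.to_string()); true }
            None => false,
        }
    }

    /// Ids still eligible for retrieval (everything not retired), sorted.
    fn retrievable(&self) -> Vec<u64> {
        let mut ids: Vec<u64> = self.traces.values()
            .filter(|t| t.retired_reason.is_none())
            .map(|t| t.id)
            .collect();
        ids.sort();
        ids
    }

    /// HISTORY: climb parent links to the root, then walk root→tip via
    /// superseded_by. The visited sets stop the walk if links form a cycle.
    fn history(&self, id: u64) -> Vec<u64> {
        let mut seen = HashSet::new();
        let mut root = id;
        while let Some(p) = self.traces.get(&root).and_then(|t| t.parent) {
            if !seen.insert(root) { break; }
            root = p;
        }
        let mut chain = Vec::new();
        let mut visited = HashSet::new();
        let mut cur = Some(root);
        while let Some(c) = cur {
            if !self.traces.contains_key(&c) || !visited.insert(c) { break; }
            chain.push(c);
            cur = self.traces[&c].superseded_by;
        }
        chain
    }
}
```

In this sketch a superseded parent stays retrievable until it is explicitly
retired; whether the real implementation also hides superseded versions is
not stated here.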

KEY DIRECTORIES
- crates/vectord/src/pathway_memory.rs — Mem0 ops live here
- crates/vectord/src/playbook_memory.rs — original Mem0 reference
- tests/agent_test/ — local-model agent harness + PRD + session archives
- scripts/dump_raw_corpus.sh — MinIO bucket dump (raw test corpus)
- scripts/vectorize_raw_corpus.ts — corpus → vector indexes
- scripts/analyze_chicago_contracts.ts — real inference pipeline
- scripts/seal_agent_playbook.ts — Mem0 upsert from agent traces

Replication: see REPLICATION.md for Debian 13 clean install + cloud-only
adaptation (no local Ollama).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 19:43:27 -05:00

[package]
name = "lance-bench"
version = "0.1.0"
edition = "2024"
# Standalone pilot for Phase B (see docs/EXECUTION_PLAN.md).
# Deliberately NOT sharing workspace deps — Lance 4.x pulls in its own
# DataFusion and Arrow versions incompatible with the rest of the stack.
# Isolating the pilot means we don't force a workspace-wide upgrade until
# we've decided Lance is worth it.
[dependencies]
# Only the features we actually need. The default set brings in AWS/Azure/GCP/HF etc.,
# which is ~200 extra crates we don't care about for a local pilot.
lance = { version = "4.0", default-features = false }
# Lance exposes DatasetIndexExt, IndexType, and IvfBuildParams through
# its sub-crates, which must be imported directly because lance itself
# doesn't re-export them at a convenient path.
lance-index = { version = "4.0", default-features = false }
lance-linalg = { version = "4.0", default-features = false }
# Arrow is re-exported by Lance; pin to the same major version Lance uses so types match.
arrow = "57"
arrow-array = "57"
arrow-schema = "57"
# Also need to read the EXISTING Parquet vector files so we can compare.
# These live in data/vectors/*.parquet. Lance's internal Parquet reading
# might differ from ours; using our format's Arrow/Parquet versions for
# the read side keeps the inputs identical.
parquet = "57"
tokio = { version = "1", features = ["full"] }
futures = "0.3"
serde = { version = "1", features = ["derive"] }
serde_json = "1"
anyhow = "1"
bytes = "1"
[[bin]]
name = "lance-bench"
path = "src/main.rs"