matrix-agent-validated/docs/SYSTEM_EVOLUTION_LAYERS.md
profit ac01fffd9a checkpoint: matrix-agent-validated (2026-04-25)
Architectural snapshot of the lakehouse codebase at the point where the
full matrix-driven agent loop with Mem0 versioning + deletion was
validated end-to-end.

WHAT THIS REPO IS
A clean single-commit snapshot of the lakehouse code. Heavy test data
(.parquet datasets, vector indexes) excluded — see REPLICATION.md for
regen path. Full lakehouse history at git.agentview.dev/profit/lakehouse.

WHAT WAS PROVEN
- Vector retrieval across multi-corpora matrix (chicago_permits + entity
  briefs + sec_tickers + distilled procedural + llm_team runs)
- Observer hand-review (cloud + heuristic fallback) gating each candidate
- Local-model agent loop (qwen3.5:latest) with tool use + scratchpad
- Playbook seal on success → next-iter retrieval surfaces it as preamble
- Mem0 versioning + deletion in pathway_memory:
    * UPSERT: ADD on new workflow, UPDATE bumps replay_count on identical
    * REVISE: chains versions, parent.superseded_at + superseded_by stamped
    * RETIRE: marks specific trace retired with reason, excluded from retrieval
    * HISTORY: walks chain root→tip, cycle-safe

KEY DIRECTORIES
- crates/vectord/src/pathway_memory.rs — Mem0 ops live here
- crates/vectord/src/playbook_memory.rs — original Mem0 reference
- tests/agent_test/ — local-model agent harness + PRD + session archives
- scripts/dump_raw_corpus.sh — MinIO bucket dump (raw test corpus)
- scripts/vectorize_raw_corpus.ts — corpus → vector indexes
- scripts/analyze_chicago_contracts.ts — real inference pipeline
- scripts/seal_agent_playbook.ts — Mem0 upsert from agent traces

Replication: see REPLICATION.md for Debian 13 clean install + cloud-only
adaptation (no local Ollama).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 19:43:27 -05:00


# Future Expansion — Advanced System Evolution Layers
Adopted 2026-04-24 from J. The system stops optimizing for task completion. It optimizes for **provable execution, repeatable outcomes, resilience under drift, failure, and adversarial conditions.**
## Layer roster + iteration mapping
| # | Layer | Short form | Target iter |
|---|---|---|---:|
| 1 | Counterfactual Execution | Generate synthetic failure variants from each success | iter 5 |
| 2 | Model Trust Profiling | Per-(model, task_type) success rate → routing weight | **iter 3** |
| 3 | Execution DNA | Compress successful runs into reusable patterns | iter 4 |
| 4 | Drift Sentinel | Re-validate historical tasks on a schedule | iter 5 |
| 5 | Adversarial Injection | Inject poisoned context / malformed outputs / conflicts | iter 6 |
| 6 | Permission Gradient | Confidence → execution tier (≥0.9 full, ≥0.7 dry-run, ≥0.5 sim, <0.5 block) | **iter 3** |
| 7 | Multi-Agent Disagreement | Planner/Critic/Validator disagreement = signal | iter 4 |
| 8 | Temporal Context | Time-aware memory with decay_score + last_validated_at | iter 4 |
| 9 | Execution Cost Intelligence | Tokens, iterations, cloud_calls, latency per task | **iter 3** |
| 10 | Human Override as Data | Capture manual fixes as jsonl rows | **iter 3** |
## Detail (J's original framing preserved)
### 1. Counterfactual Execution Layer
Simulate alternate failure paths for every successful task. Real Execution Success → Generate Variations (env, version, inputs) → Simulate Failure Cases → Store Synthetic Failure Signatures. **Purpose:** pre-train against unseen failures before real exposure.
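The variant-generation step above can be sketched as follows. This is a minimal illustration, not code from the repo: the `ExecutionTrace` and `FailureSignature` shapes and the two perturbation strategies (dropping env vars, nulling inputs) are assumptions about what "Generate Variations" could look like.

```typescript
// Hypothetical shapes — illustrative only, not from crates/vectord.
interface ExecutionTrace {
  taskSignature: string;
  env: Record<string, string>;
  inputs: Record<string, unknown>;
}

interface FailureSignature {
  parent: string;          // task_signature of the real success
  perturbation: string;    // what was varied to produce this variant
  expectedFailure: string; // predicted failure mode to pre-train against
}

function generateVariants(trace: ExecutionTrace): FailureSignature[] {
  const variants: FailureSignature[] = [];
  // Vary environment: each removed env var is one synthetic failure case.
  for (const key of Object.keys(trace.env)) {
    variants.push({
      parent: trace.taskSignature,
      perturbation: `env:${key}=missing`,
      expectedFailure: "missing_dependency",
    });
  }
  // Vary inputs: null out each input field.
  for (const key of Object.keys(trace.inputs)) {
    variants.push({
      parent: trace.taskSignature,
      perturbation: `input:${key}=null`,
      expectedFailure: "malformed_input",
    });
  }
  return variants;
}
```

Each signature would then be stored alongside the real success so retrieval can surface known-bad neighborhoods before execution.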
### 2. Model Trust Profiling ← iter 3
Per-(model, task_type) performance tracking.
```
{ "model": "...", "task_type": "...", "success_rate": 0.0, "failure_modes": [], "trust_score": 0.0 }
```
**Usage:** route by trust score, adjust validation strictness dynamically, per-model risk budgets.
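A minimal sketch of trust-based routing, assuming the record shape above; `routeByTrust` and its highest-score-wins policy are illustrative, and a real router would also apply the per-model risk budgets mentioned above.

```typescript
// Mirrors the trust record shape from the section above.
interface TrustProfile {
  model: string;
  task_type: string;
  success_rate: number;
  failure_modes: string[];
  trust_score: number;
}

// Pick the most trusted model for a task type; null if nothing matches.
function routeByTrust(taskType: string, profiles: TrustProfile[]): string | null {
  const candidates = profiles.filter((p) => p.task_type === taskType);
  if (candidates.length === 0) return null;
  candidates.sort((a, b) => b.trust_score - a.trust_score);
  return candidates[0].model;
}
```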
### 3. Execution DNA (Trace Compression)
Compress successful executions into reusable fragments.
```
{ "dna_id": "hash", "task_signature": "...", "critical_steps": [], "failure_avoidance": [] }
```
Replaces doc retrieval with pattern retrieval; faster convergence on similar tasks.
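Pattern retrieval over DNA fragments can be sketched like this. Token overlap on `task_signature` stands in for the real vector similarity the system would use; the function names are hypothetical.

```typescript
// Mirrors the DNA record shape from the section above.
interface ExecutionDna {
  dna_id: string;
  task_signature: string;
  critical_steps: string[];
  failure_avoidance: string[];
}

// Crude similarity: count of shared word tokens (stand-in for vector search).
function overlap(a: string, b: string): number {
  const tb = new Set(b.toLowerCase().split(/\W+/).filter(Boolean));
  const ta = Array.from(new Set(a.toLowerCase().split(/\W+/).filter(Boolean)));
  return ta.filter((t) => tb.has(t)).length;
}

function retrievePattern(query: string, store: ExecutionDna[]): ExecutionDna | null {
  let best: ExecutionDna | null = null;
  let bestScore = 0;
  for (const dna of store) {
    const s = overlap(query, dna.task_signature);
    if (s > bestScore) {
      bestScore = s;
      best = dna;
    }
  }
  return best;
}
```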
### 4. Drift Sentinel
Select Historical Task → Re-run in Current Env → Compare → If Failure → Mark Drifted → Trigger Re-learning. Detect silent decay; maintain long-term reliability.
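The drift check reduces to re-running and comparing. In this sketch `rerun` is an injected function standing in for replaying the task in the current environment; the `HistoricalTask` shape is an assumption.

```typescript
// Hypothetical record for a previously successful task.
interface HistoricalTask {
  id: string;
  expectedOutput: string;
  drifted?: boolean;
}

// Re-run the task in the current environment and compare against the recorded
// output; any mismatch marks the task drifted and should trigger re-learning.
function checkDrift(
  task: HistoricalTask,
  rerun: (id: string) => string
): HistoricalTask {
  const current = rerun(task.id);
  return { ...task, drifted: current !== task.expectedOutput };
}
```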
### 5. Adversarial Injection Engine
Inject malformed outputs / outdated docs / conflicting instructions / poisoned memory. Verify validation catches, execution blocks unsafe actions, memory rejects corrupted data. Build system immunity.
### 6. Permission Gradient Execution ← iter 3
Confidence-based control replacing binary:
- confidence ≥ 0.9 → full execution
- confidence ≥ 0.7 → dry-run + diff
- confidence ≥ 0.5 → simulation only
- confidence < 0.5 → block
Inputs: validation score, model trust score, memory match confidence. Risk-aware control; reduced catastrophic-failure surface.
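The tier mapping is a straight threshold ladder. In this minimal sketch the equal weighting of the three inputs is an assumption; the thresholds come from the table above.

```typescript
type ExecutionTier = "full" | "dry-run" | "simulation" | "block";

// Thresholds as stated: ≥0.9 full, ≥0.7 dry-run, ≥0.5 simulation, <0.5 block.
function permissionTier(confidence: number): ExecutionTier {
  if (confidence >= 0.9) return "full";
  if (confidence >= 0.7) return "dry-run";
  if (confidence >= 0.5) return "simulation";
  return "block";
}

// Combine the three stated inputs into one confidence value.
// Equal weights are an illustrative assumption, not a system constant.
function combinedConfidence(
  validationScore: number,
  modelTrust: number,
  memoryMatch: number
): number {
  return (validationScore + modelTrust + memoryMatch) / 3;
}
```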
### 7. Multi-Agent Disagreement Engine
Planner / Critic / Validator; disagreement triggers more context, bigger model, stricter validation. Disagreement is signal, not noise.
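One way to operationalize "disagreement is signal" is to escalate by how split the three agents are. The mapping below (unanimous → proceed, partial split → more context, three-way split → bigger model) is an illustrative assumption; the original text lists the escalation options without fixing an order.

```typescript
interface Verdicts {
  planner: string;
  critic: string;
  validator: string;
}

type Escalation = "proceed" | "more_context" | "bigger_model";

// More distinct verdicts → stronger escalation. Stricter validation could be
// layered on top of either escalated path.
function escalate(v: Verdicts): Escalation {
  const distinct = new Set([v.planner, v.critic, v.validator]).size;
  if (distinct === 1) return "proceed";      // unanimous
  if (distinct === 2) return "more_context"; // partial disagreement
  return "bigger_model";                     // full three-way split
}
```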
### 8. Temporal Context Layer
```
{ "created_at": "ts", "last_validated_at": "ts", "decay_score": 0.0 }
```
Retrieval priority: recent + validated + high success rate. Avoid stale knowledge.
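A sketch of decay-weighted retrieval priority, assuming the fields above. Exponential decay with a one-week half-life is an illustrative choice, not a value from the codebase.

```typescript
// Mirrors the temporal record shape above (timestamps as ms since epoch).
interface MemoryEntry {
  id: string;
  last_validated_at: number;
  success_rate: number;
}

// Halve the score for every half-life elapsed since last validation.
function decayScore(entry: MemoryEntry, now: number, halfLifeMs: number): number {
  return Math.pow(0.5, (now - entry.last_validated_at) / halfLifeMs);
}

// Recent + validated + high success rate rank highest.
function retrievalPriority(entry: MemoryEntry, now: number): number {
  const halfLifeMs = 7 * 24 * 3600 * 1000; // assumed one-week half-life
  return decayScore(entry, now, halfLifeMs) * entry.success_rate;
}
```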
### 9. Execution Cost Intelligence ← iter 3
```
{ "task": "...", "tokens_used": 0, "iterations": 0, "cloud_calls": 0, "latency_ms": 0 }
```
Optimize local vs cloud; reduce unnecessary iterations.
### 10. Human Override as Data ← iter 3
```
{ "human_fix": "...", "reason": "...", "task_signature": "...", "validated": true }
```
Manual fixes become reusable knowledge.
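Capturing overrides as jsonl is a round-trip through one-object-per-line JSON. A minimal sketch using the record shape above; the helper names are hypothetical.

```typescript
// Mirrors the override record shape from the section above.
interface HumanOverride {
  human_fix: string;
  reason: string;
  task_signature: string;
  validated: boolean;
}

// One compact JSON object per line — the jsonl contract.
function toJsonlRow(fix: HumanOverride): string {
  return JSON.stringify(fix);
}

// Read a jsonl blob back into override records, skipping blank lines.
function parseJsonl(blob: string): HumanOverride[] {
  return blob
    .split("\n")
    .filter(Boolean)
    .map((line) => JSON.parse(line) as HumanOverride);
}
```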
## Final Principle
Memory is not passive recall. It is operational substrate:
- failures become structured knowledge
- successes become reusable execution patterns
- all outputs are validated before reuse
## System Directive
Not speed. Not convenience. **Correctness. Verifiability. Resilience under change.**