lakehouse

Author	SHA1	Message	Date
root	626f18d491	pathway_memory: audit-consensus → retire wire
Some checks failed: lakehouse/auditor: 1 blocking issue: todo!() macro call in tests/real-world/scrum_master_pipeline.ts

When observer's hand-review explicitly rejects the output of a hot-swap-recommended model, the matrix's recommendation was wrong for this context. Auto-retire the trace so future agents don't get the same poisoned recommendation in their preamble.

crates/vectord/src/pathway_memory.rs: add `trace_uid` to HotSwapCandidate response and populate from the matched trace. This gives consumers single-trace precision for /pathway/retire.

tests/real-world/scrum_master_pipeline.ts:
- HotSwapCandidate interface gains trace_uid
- new retirePathwayTrace() helper (fire-and-forget, fall-open)
- in the obsVerdict reject branch: if hotSwap was active AND the rejected model is the hot-swap-recommended one AND observer confidence ≥0.7, fire retire and null hotSwap so post-loop replay bookkeeping doesn't double-process.
- hotSwap declared `let` (was const) so it can be nulled

Cycle verdicts ("needs different angle") don't trigger retire; only outright rejects do. Confidence gate avoids retiring on heuristic-fallback verdicts that come back without a confidence number.

Closes the "audit-consensus → retire" item from HANDOVER.md.

Live-tested: insert synthetic trace → /pathway/retire by trace_uid → retired counter 1 → 2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 00:01:20 -05:00
root	6ac7f61819	pathway_memory: Mem0 versioning + deletion (upsert/revise/retire/history)

Per J 2026-04-25: pathway_memory was append-only: every agent run added a new trace, bad/failed runs polluted the matrix forever, no notion of "this is the canonical evolved playbook." Ported playbook_memory's Phase 25/27 patterns into pathway_memory so the agent loop's matrix converges on best-known approaches per task class instead of bloating.

Fields added to PathwayTrace (all #[serde(default)] for back-compat):
- trace_uid: stable UUID per individual trace within a bucket
- version: u32 default 1
- parent_trace_uid, superseded_at, superseded_by_trace_uid
- retirement_reason (paired with existing retired:bool)

Methods added to PathwayMemory:
- upsert(trace) → PathwayUpsertOutcome {Added|Updated|Noop}
  Workflow-fingerprint dedup: ladder_attempts + final_verdict hash. Identical workflow → bumps existing replay_count instead of duplicating.
- revise(parent_uid, new_trace) → PathwayReviseOutcome
  Chains versions; rejects retired or already-superseded parents.
- retire(trace_uid, reason) → bool
  Marks specific trace retired with reason. Idempotent.
- history(trace_uid) → Vec<PathwayTrace>
  Walks parent_trace_uid back to root, then superseded_by forward to tip. Cycle-safe via visited set.

Retrieval gates updated:
- query_hot_swap skips superseded_at.is_some()
- bug_fingerprints_for skips both retired AND superseded

HTTP endpoints in service.rs:
- POST /vectors/pathway/upsert
- POST /vectors/pathway/retire
- POST /vectors/pathway/revise
- GET /vectors/pathway/history/{trace_uid}

scripts/seal_agent_playbook.ts switched insert→upsert + accepts SESSION_DIR arg so it can seal any archived session, not just iter4.

Verified live (4/4 ops):
- UPSERT first run: Added trace_uid 542ae53f
- UPSERT identical: Updated, replay_count bumped 0→1 (no duplicate)
- REVISE 542ae53f→87a70a61: parent stamped superseded_at, v2 created
- HISTORY of v2: chain_len=2, v1 superseded, v2 tip
- RETIRE iter-6 broken trace: retired=true, retirement_reason preserved
- pathway_memory.stats: total=79, retired=1, reuse_rate=0.0127

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 19:31:44 -05:00
root	0a0843b605	ADR-021: semantic-correctness layer lands in pathway_memory (A+B+C)
Some checks failed: lakehouse/auditor: 4 blocking issues: todo!() macro call in tests/real-world/scrum_master_pipeline.ts

Phase A: data model (vectord/src/pathway_memory.rs)
+ SemanticFlag enum (9 variants: UnitMismatch, TypeConfusion, NullableConfusion, OffByOne, StaleReference, PseudoImpl, DeadCode, WarningNoise, BoundaryViolation) as #[serde(tag = "kind")]
+ TypeHint { source, symbol, type_repr }
+ BugFingerprint { flag, pattern_key, example, occurrences }
+ PathwayTrace gains semantic_flags, type_hints_used, bug_fingerprints, all #[serde(default)] for back-compat deserialization of pre-ADR-021 traces on disk
+ build_pathway_vec now tokenizes flag:{variant} + bug:{flag}:{key} so traces with different bug histories cluster separately in the similarity gate (proven by pathway_vec_differs_when_bug_fingerprint_added test)

Phase B: producer (scrum_master_pipeline.ts)
+ Prompt addendum: each finding must carry `Flag: <CATEGORY>` tag alongside the existing Confidence: NN% tag. 9 category choices plus `None` for improvements that aren't bug-shaped.
+ Parser extracts tagged flags from reviewer markdown; falls back to bare-word match if reviewer omits the label. Deduplicated per trace.
+ PathwayTracePayload gains semantic_flags / type_hints_used / bug_fingerprints fields. Wire format matches Rust serde tagged enum so TS and Rust interop directly.

Phase C: pre-review enrichment
+ new `/vectors/pathway/bug_fingerprints` endpoint aggregates occurrences by (flag, pattern_key) across traces sharing a narrow fingerprint, sorts by frequency, returns top-K.
+ scrum calls it before the ladder and prepends a PATHWAY MEMORY preamble to the reviewer prompt ("these patterns appeared N times on this file area before: check for recurrences"). Empty on fresh install; grows as the matrix index learns.

Tests: 27 pathway_memory tests green (was 18).
New tests:
- pathway_trace_deserializes_without_new_fields_backcompat
- semantic_flag_serializes_as_tagged_enum
- bug_fingerprint_roundtrips_through_serde
- pathway_vec_differs_when_bug_fingerprint_added
- semantic_flag_discriminates_by_variant
- bug_fingerprints_aggregate_by_pattern_key (sums occurrences, sorts desc)
- bug_fingerprints_empty_for_unseen_fingerprint
- bug_fingerprints_respects_limit
- insert_preserves_semantic_fields (roundtrip via persist + reload)

Workspace warnings unchanged at 11.

What's still queued (not this commit):
- type_hints_used population from catalogd column types + Arrow schema
- bug_fingerprint extraction from reviewer output (Phase D; for now semantic_flags populate but the fingerprint key requires parsing code-shape from the finding; next iteration's work)
- auditor → pathway audit_consensus update wire (explicit-fail gate)

Why this commit matters: the mechanical applier's gates are syntactic (warning count, patch size, rationale-token alignment). The queryd/delta.rs base_rows bug (86901f8) was found by human reading: a unit mismatch between row counts and file counts. At 100 bugs this deep, humans can't catch them all; the matrix index has to learn the shapes. This commit gives it the fields to learn into and the surface to read from.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 05:49:10 -05:00
root	2f8b347f37	pathway_memory: consensus-designed sidecar + hot-swap learning loop
Some checks failed: lakehouse/auditor: 11 warnings, see review

10-probe N=3 consensus (kimi-k2:1t / gpt-oss:120b / qwen3.5:latest / deepseek-v3.1:671b / qwen3-coder:480b / mistral-large-3:675b / qwen3.5:397b + 2 stability re-probes; 2 openrouter probes 429'd) locked the design across three rounds. Full JSON responses in data/_kb/consensus_reducer_design_{mocq3akn,mocq6pi1,mocqatik}.json.

What it does

Preserves FULL backtrack context per reviewed file (ladder attempts + latencies + reject reasons, KB chunks with provenance + cosine + rank, observer signals, context7 bridge hits, sub-pipeline calls, audit consensus) and indexes them by narrow fingerprint for hot-swap of proven review pathways.

When scrum reviews a file:
1. narrow fingerprint = task_class + file_prefix + signal_class
2. query_hot_swap checks pathway memory for a match that passes probation (≥3 replays @ ≥80% success) + audit gate + similarity (≥0.90 cosine on normalized-metadata-token embedding)
3. if hot-swap eligible, recommended model tried first in the ladder
4. replay outcome reported back, updating the pathway's success_rate
5. pathways below 0.80 after ≥3 replays retire permanently (sticky)
6. full PathwayTrace always inserted at end of review; hot-swap grows with use, it doesn't bootstrap from nothing

Gate design is load-bearing:
- narrow fingerprint (6 of 8 consensus models converged on the same 3-field composition; lock): enables generalization within crate
- probation ≥3 replays: binomial tail at 80% is ~5%, below is noise
- success rate ≥0.80: mistral + qwen3-coder independently proposed this exact threshold across two rounds
- similarity ≥0.90: middle of the 0.85/0.95 consensus spread
- bootstrap: null audit_consensus ALLOWED (auditor → pathway update not wired yet; probation + success_rate gates alone enforce safety during bootstrap; explicit audit FAIL still blocks)
- retirement is sticky: prevents oscillation on noise

Files

+ crates/vectord/src/pathway_memory.rs (new, 600 lines + 18 tests)
  PathwayTrace, LadderAttempt, KbChunkRef, ObserverSignal, BridgeHit, SubPipelineCall, AuditConsensus, HotSwapCandidate, PathwayMemory, PathwayMemoryStats. 18/18 tests green. Cosine + 32-bucket L2-normalized embedding; mirror of TS impl.
M crates/vectord/src/lib.rs
  pub mod pathway_memory;
M crates/vectord/src/service.rs
  VectorState grows pathway_memory field; 4 HTTP handlers (/pathway/insert, /pathway/query, /pathway/record_replay, /pathway/stats).
M crates/gateway/src/main.rs
  Construct PathwayMemory + load from storage on boot, wire into VectorState.
M tests/real-world/scrum_master_pipeline.ts
  Byte-matching TS bucket-hash (verified same bucket indices as Rust); pre-ladder hot-swap query; ladder reorder on hit; per-attempt latency capture; post-accept trace insert (fire-and-forget); replay outcome recording; observer /event emits pathway_hot_swap_hit, pathway_similarity, rungs_saved per review for the VCP UI.
M ui/server.ts
  /data/pathway_stats aggregates /vectors/pathway/stats + scrum_reviews.jsonl window for the value metric.
M ui/ui.js
  Three new metric cards:
  · pathway reuse rate (activity: is it firing?)
  · avg rungs saved (value: is it earning its keep?)
  · pathways tracked (stability: retirement = learning)

What's not in this commit (queued)
- auditor → pathway audit_consensus update wire (explicit audit-fail block activates when this lands)
- bridge_hits + sub_pipeline_calls population from context7 / LLM Team extract results (fields wired, callers not yet)
- replay log (PathwayReplayOutcome {matched_id, succeeded, ts}) as a separate jsonl for forensic audit of why specific replays failed

Why > summarization

Summaries discard the causal chain. With this, auditor can verify citation provenance, applier can distinguish lucky from learned paths, and the matrix indexing actually stores end-to-end pathways instead of just RAG chunks, which is what J meant by "why aren't we using it for everything."

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 05:15:32 -05:00
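
The hot-swap gates listed in commit 2f8b347f37 (probation, success rate, similarity, audit gate, sticky retirement) can be sketched roughly as follows. This is an illustrative sketch only, not the code in crates/vectord/src/pathway_memory.rs: the `Pathway` struct, its field names, and the functions `hot_swap_eligible` and `record_replay` are hypothetical stand-ins, and only the thresholds (3 replays, 0.80, 0.90) come from the commit message.

```rust
/// Hypothetical, trimmed-down view of a stored pathway.
/// The real PathwayTrace carries many more fields.
#[derive(Debug, Clone)]
struct Pathway {
    replay_count: u32,
    success_count: u32,
    retired: bool,
    /// None = no audit consensus yet (allowed during bootstrap);
    /// Some(false) = explicit audit FAIL, which always blocks.
    audit_passed: Option<bool>,
    embedding: Vec<f32>,
}

// Thresholds quoted in the commit message.
const MIN_REPLAYS: u32 = 3;
const MIN_SUCCESS_RATE: f32 = 0.80;
const MIN_SIMILARITY: f32 = 0.90;

/// Cosine similarity; returns 0.0 for mismatched lengths or zero vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

fn success_rate(p: &Pathway) -> f32 {
    if p.replay_count == 0 {
        0.0
    } else {
        p.success_count as f32 / p.replay_count as f32
    }
}

/// A pathway is recommended only if every gate passes.
fn hot_swap_eligible(p: &Pathway, query: &[f32]) -> bool {
    !p.retired
        && p.audit_passed != Some(false)
        && p.replay_count >= MIN_REPLAYS
        && success_rate(p) >= MIN_SUCCESS_RATE
        && cosine(&p.embedding, query) >= MIN_SIMILARITY
}

/// Record a replay outcome; retirement is sticky and never cleared here.
fn record_replay(p: &mut Pathway, succeeded: bool) {
    p.replay_count += 1;
    if succeeded {
        p.success_count += 1;
    }
    if p.replay_count >= MIN_REPLAYS && success_rate(p) < MIN_SUCCESS_RATE {
        p.retired = true;
    }
}
```

In this sketch a pathway with 3 of 3 successful replays becomes eligible, one further failure drops it to 0.75 and retires it, and later successes do not bring it back, which matches the "prevents oscillation on noise" rationale above.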

4 Commits
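
The history(trace_uid) walk described in commit 6ac7f61819 (back along parent_trace_uid to the root, then forward along superseded_by to the tip, with a visited set for cycle safety) could look something like the sketch below. The `Trace` struct and the HashMap-based store are assumptions made for illustration; the actual PathwayMemory storage layout is not shown in the log.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical subset of PathwayTrace: only the version-chain fields.
#[derive(Debug, Clone, Default)]
struct Trace {
    trace_uid: String,
    version: u32,
    parent_trace_uid: Option<String>,
    superseded_by_trace_uid: Option<String>,
}

/// Returns the version chain containing `start_uid`, ordered root to tip.
/// Returns an empty Vec if the uid is unknown.
fn history(traces: &HashMap<String, Trace>, start_uid: &str) -> Vec<Trace> {
    let mut root = match traces.get(start_uid) {
        Some(t) => t,
        None => return Vec::new(),
    };

    // Phase 1: walk parent links back to the root, stopping on a repeat.
    let mut visited: HashSet<String> = HashSet::new();
    visited.insert(root.trace_uid.clone());
    while let Some(parent) = root
        .parent_trace_uid
        .as_ref()
        .and_then(|uid| traces.get(uid))
    {
        if !visited.insert(parent.trace_uid.clone()) {
            break; // cycle in parent links
        }
        root = parent;
    }

    // Phase 2: walk superseded_by links forward to the tip.
    let mut seen: HashSet<String> = HashSet::new();
    let mut chain = Vec::new();
    let mut cur = Some(root);
    while let Some(t) = cur {
        if !seen.insert(t.trace_uid.clone()) {
            break; // cycle in superseded_by links
        }
        chain.push(t.clone());
        cur = t
            .superseded_by_trace_uid
            .as_ref()
            .and_then(|uid| traces.get(uid));
    }
    chain
}
```

Called on either member of a two-version chain, this returns both traces with the superseded v1 first and the v2 tip last, consistent with the "chain_len=2, v1 superseded, v2 tip" check in the commit message.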