2 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
0a0843b605 |
ADR-021: semantic-correctness layer lands in pathway_memory (A+B+C)
Some checks failed
lakehouse/auditor 4 blocking issues: todo!() macro call in tests/real-world/scrum_master_pipeline.ts
Phase A — data model (vectord/src/pathway_memory.rs):
+ SemanticFlag enum (9 variants: UnitMismatch, TypeConfusion,
NullableConfusion, OffByOne, StaleReference, PseudoImpl, DeadCode,
WarningNoise, BoundaryViolation) as #[serde(tag = "kind")]
+ TypeHint { source, symbol, type_repr }
+ BugFingerprint { flag, pattern_key, example, occurrences }
+ PathwayTrace gains semantic_flags, type_hints_used, bug_fingerprints
all #[serde(default)] for back-compat deserialization of pre-ADR-021
traces on disk
+ build_pathway_vec now tokenizes flag:{variant} + bug:{flag}:{key}
so traces with different bug histories cluster separately in the
similarity gate (proven by pathway_vec_differs_when_bug_fingerprint_added
test)
Phase B — producer (scrum_master_pipeline.ts):
+ Prompt addendum: each finding must carry `**Flag: <CATEGORY>**` tag
alongside the existing Confidence: NN% tag. 9 category choices plus
`None` for improvements that aren't bug-shaped.
+ Parser extracts tagged flags from reviewer markdown; falls back to
bare-word match if reviewer omits the label. Deduplicated per trace.
+ PathwayTracePayload gains semantic_flags / type_hints_used /
bug_fingerprints fields. Wire format matches Rust serde tagged enum
so TS and Rust interop directly.
Phase C — pre-review enrichment:
+ new `/vectors/pathway/bug_fingerprints` endpoint aggregates
occurrences by (flag, pattern_key) across traces sharing a narrow
fingerprint, sorts by frequency, returns top-K.
+ scrum calls it before the ladder and prepends a PATHWAY MEMORY
preamble to the reviewer prompt ("these patterns appeared N times
on this file area before — check for recurrences"). Empty on
fresh install; grows as the matrix index learns.
Tests: 27 pathway_memory tests green (was 18). New tests:
- pathway_trace_deserializes_without_new_fields_backcompat
- semantic_flag_serializes_as_tagged_enum
- bug_fingerprint_roundtrips_through_serde
- pathway_vec_differs_when_bug_fingerprint_added
- semantic_flag_discriminates_by_variant
- bug_fingerprints_aggregate_by_pattern_key (sums occurrences, sorts desc)
- bug_fingerprints_empty_for_unseen_fingerprint
- bug_fingerprints_respects_limit
- insert_preserves_semantic_fields (roundtrip via persist + reload)
Workspace warnings unchanged at 11.
What's still queued (not this commit):
- type_hints_used population from catalogd column types + Arrow schema
- bug_fingerprint extraction from reviewer output (Phase D — for now
semantic_flags populate but the fingerprint key requires parsing
code-shape from the finding; next iteration's work)
- auditor → pathway audit_consensus update wire (explicit-fail gate)
Why this commit matters: the mechanical applier's gates are syntactic
(warning count, patch size, rationale-token alignment). The
queryd/delta.rs base_rows bug (86901f8) was found by human reading —
unit mismatch between row counts and file counts. At 100 bugs this
deep, humans can't catch them all; the matrix index has to learn the
shapes. This commit gives it the fields to learn into and the surface
to read from.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
2f8b347f37 |
pathway_memory: consensus-designed sidecar + hot-swap learning loop
Some checks failed
lakehouse/auditor 11 warnings — see review
10-probe N=3 consensus (kimi-k2:1t / gpt-oss:120b / qwen3.5:latest /
deepseek-v3.1:671b / qwen3-coder:480b / mistral-large-3:675b /
qwen3.5:397b + 2 stability re-probes; 2 openrouter probes 429'd) locked
the design across three rounds. Full JSON responses in
data/_kb/consensus_reducer_design_{mocq3akn,mocq6pi1,mocqatik}.json.
What it does
Preserves FULL backtrack context per reviewed file (ladder attempts +
latencies + reject reasons, KB chunks with provenance + cosine + rank,
observer signals, context7 bridge hits, sub-pipeline calls, audit
consensus) and indexes them by narrow fingerprint for hot-swap of
proven review pathways.
When scrum reviews a file:
1. narrow fingerprint = task_class + file_prefix + signal_class
2. query_hot_swap checks pathway memory for a match that passes
probation (≥3 replays @ ≥80% success) + audit gate + similarity
(≥0.90 cosine on normalized-metadata-token embedding)
3. if hot-swap eligible, recommended model tried first in the ladder
4. replay outcome reported back, updating the pathway's success_rate
5. pathways below 0.80 after ≥3 replays retire permanently (sticky)
6. full PathwayTrace always inserted at end of review — hot-swap
grows with use, it doesn't bootstrap from nothing
Gate design is load-bearing:
- narrow fingerprint (6 of 8 consensus models converged on the same
3-field composition; lock) — enables generalization within crate
- probation ≥3 replays — binomial tail at 80% is ~5%, below is noise
- success rate ≥0.80 — mistral + qwen3-coder independently proposed
this exact threshold across two rounds
- similarity ≥0.90 — middle of the 0.85/0.95 consensus spread
- bootstrap: null audit_consensus ALLOWED (auditor → pathway update
not wired yet; probation + success_rate gates alone enforce safety
during bootstrap; explicit audit FAIL still blocks)
- retirement is sticky — prevents oscillation on noise
Files
+ crates/vectord/src/pathway_memory.rs (new, 600 lines + 18 tests)
PathwayTrace, LadderAttempt, KbChunkRef, ObserverSignal, BridgeHit,
SubPipelineCall, AuditConsensus, HotSwapCandidate, PathwayMemory,
PathwayMemoryStats. 18/18 tests green.
Cosine + 32-bucket L2-normalized embedding; mirror of TS impl.
M crates/vectord/src/lib.rs
pub mod pathway_memory;
M crates/vectord/src/service.rs
VectorState grows pathway_memory field;
4 HTTP handlers (/pathway/insert, /pathway/query,
/pathway/record_replay, /pathway/stats).
M crates/gateway/src/main.rs
Construct PathwayMemory + load from storage on boot,
wire into VectorState.
M tests/real-world/scrum_master_pipeline.ts
Byte-matching TS bucket-hash (verified same bucket indices as
Rust); pre-ladder hot-swap query; ladder reorder on hit;
per-attempt latency capture; post-accept trace insert
(fire-and-forget); replay outcome recording;
observer /event emits pathway_hot_swap_hit, pathway_similarity,
rungs_saved per review for the VCP UI.
M ui/server.ts
/data/pathway_stats aggregates /vectors/pathway/stats +
scrum_reviews.jsonl window for the value metric.
M ui/ui.js
Three new metric cards:
· pathway reuse rate (activity: is it firing?)
· avg rungs saved (value: is it earning its keep?)
· pathways tracked (stability: retirement = learning)
What's not in this commit (queued)
- auditor → pathway audit_consensus update wire (explicit audit-fail
block activates when this lands)
- bridge_hits + sub_pipeline_calls population from context7 / LLM
Team extract results (fields wired, callers not yet)
- replay log (PathwayReplayOutcome {matched_id, succeeded, ts}) as
a separate jsonl for forensic audit of why specific replays failed
Why > summarization
Summaries discard the causal chain. With this, auditor can verify
citation provenance, applier can distinguish lucky from learned paths,
and the matrix indexing actually stores end-to-end pathways instead of
just RAG chunks — which is what J meant by "why aren't we using it
for everything."
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|