root 154a72ea5e matrix: Shape B — inject playbook misses + 6/6 paraphrase recovery
The v0 boost-only stance documented in internal/matrix/playbook.go:22-27
("the boost only re-ranks results that ALREADY surfaced from the regular
retrieval") couldn't promote recorded answers that dropped out of a
paraphrase's top-K. playbook_lift_002 surfaced exactly that gap: 0/2
paraphrase recoveries because the recorded answers weren't in regular
retrieval at all (rank=-1).

Shape B: when warm-pass retrieval doesn't surface a playbook hit's
answer, inject a synthetic Result for it directly. Distance =
playbook_hit_distance × BoostFactor — same formula as the boost path so
injections land in comparable distance space. Caller re-sorts +
truncates after both boost and inject have run.
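
A minimal, self-contained sketch of the inject-then-rank step described
above. The `result` and `hit` types, the `injectAndRank` helper, and the
worker IDs are hypothetical stand-ins, not the package's real API. Unlike
InjectPlaybookMisses, this simplified version keeps the first hit per
answer rather than the highest-score one.

```go
package main

import (
	"fmt"
	"sort"
)

// result and hit are hypothetical, reduced stand-ins for the package's
// Result and PlaybookHit types.
type result struct {
	ID       string
	Distance float32
}

type hit struct {
	AnswerID string
	Distance float32 // distance between current query and recorded query
	Score    float64 // recorded outcome score in [0, 1]
}

// boostFactor mirrors the documented formula 1 - 0.5*score, clamped so
// the factor stays in [0.5, 1.0].
func boostFactor(score float64) float64 {
	if score < 0 {
		score = 0
	}
	if score > 1 {
		score = 1
	}
	return 1.0 - 0.5*score
}

// injectAndRank appends a synthetic result for every hit whose answer is
// missing from results, then re-sorts ascending by distance and
// truncates to k.
func injectAndRank(results []result, hits []hit, k int) []result {
	present := make(map[string]bool, len(results))
	for _, r := range results {
		present[r.ID] = true
	}
	for _, h := range hits {
		if present[h.AnswerID] {
			continue
		}
		present[h.AnswerID] = true
		injected := h.Distance * float32(boostFactor(h.Score))
		results = append(results, result{ID: h.AnswerID, Distance: injected})
	}
	sort.SliceStable(results, func(i, j int) bool {
		return results[i].Distance < results[j].Distance
	})
	if len(results) > k {
		results = results[:k]
	}
	return results
}

func main() {
	cold := []result{{"w-1", 0.30}, {"w-2", 0.35}}
	hits := []hit{{AnswerID: "w-9", Distance: 0.2, Score: 1.0}}
	for _, r := range injectAndRank(cold, hits, 2) {
		fmt.Printf("%s %.2f\n", r.ID, r.Distance)
	}
	// Prints:
	// w-9 0.10
	// w-1 0.30
}
```

With a hit distance of 0.2 and a score of 1.0, the injected distance is
0.2 × 0.5 = 0.1, which sorts ahead of both cold results.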

Result on playbook_lift_003 (Shape B + paraphrase pass):

  Verbatim discovery        6
  Verbatim lift             2 / 6
  **Paraphrase top-1**      **6 / 6**
  Paraphrase any-rank in K  6 / 6
  Mean Δ top-1 distance     -0.1637 (warm closer than cold)

Every paraphrase the judge generated landed the v1-recorded answer at
top-1 of the new query's results. The learning property holds — cosine
on embed(paraphrase) finds the recorded query's vector within
DefaultPlaybookMaxDistance (0.5), and Shape B injects the answer.
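
The threshold check can be sketched as follows. This assumes cosine
distance is defined as 1 minus cosine similarity; how vectord actually
computes it is not shown in this commit, and `cosineDistance`,
`withinThreshold`, and the toy vectors are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// maxDistance mirrors DefaultPlaybookMaxDistance from playbook.go.
const maxDistance = 0.5

// cosineDistance returns 1 - cosine similarity. It returns 1 (treated
// here as "unrelated") for mismatched lengths or zero-norm vectors.
func cosineDistance(a, b []float64) float64 {
	if len(a) != len(b) {
		return 1
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 1
	}
	return 1 - dot/(math.Sqrt(na)*math.Sqrt(nb))
}

// withinThreshold reports whether a recorded query vector is close
// enough to the current query vector to count as a playbook hit.
func withinThreshold(query, recorded []float64) bool {
	return cosineDistance(query, recorded) <= maxDistance
}

func main() {
	recorded := []float64{1, 0}
	paraphrase := []float64{1, 1} // 45 degrees away: distance ≈ 0.293
	unrelated := []float64{0, 1}  // orthogonal: distance = 1
	fmt.Println(withinThreshold(paraphrase, recorded)) // true
	fmt.Println(withinThreshold(unrelated, recorded))  // false
}
```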

Verbatim lift dropped from v1's 7/8 to 2/6 because Shape B cross-pollinates
recorded answers across queries. w-4435 (Q2's recording) appears as
warm top-1 for several other queries because their embeddings are
within the playbook hit threshold of "OSHA-30 forklift Wisconsin." This
is a feature, not a bug: the matrix layer's purpose is to share
knowledge across queries. But the lift metric only counts "warm top-1
== cold judge best," so cross-pollinated lifts don't register. A v3
metric would re-judge the warm pass to measure true judge improvement.

Tests:
- TestInjectPlaybookMisses_AddsMissingAnswers — primary claim
- TestInjectPlaybookMisses_SkipsAnswersAlreadyPresent — no double-inject
- TestInjectPlaybookMisses_DedupesPerAnswer — multi-hit same answer
- TestInjectPlaybookMisses_EmptyHits — fast-path no-op

Driver fix: ParaphraseRecordedRank int → *int. The `omitempty` int
silently dropped rank=0 (top-1, the WANTED value) from JSON, making the
v003 report show "null" instead of "0" for every successful recovery.
Pointer keeps nil/rank-0 distinguishable.
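
The `omitempty` behavior behind this fix can be reproduced in isolation.
The struct names and the JSON key below are illustrative assumptions,
not the driver's actual types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rowInt and rowPtr are hypothetical report rows that differ only in
// whether the rank is a plain int or a pointer.
type rowInt struct {
	Rank int `json:"paraphrase_recorded_rank,omitempty"`
}

type rowPtr struct {
	Rank *int `json:"paraphrase_recorded_rank,omitempty"`
}

// encode marshals v and returns the JSON text, or the error message.
func encode(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	zero := 0
	// A plain int with omitempty drops the zero value, so rank 0 vanishes.
	fmt.Println(encode(rowInt{Rank: 0})) // {}
	// A non-nil pointer to 0 is not "empty", so rank 0 is emitted.
	fmt.Println(encode(rowPtr{Rank: &zero})) // {"paraphrase_recorded_rank":0}
	// A nil pointer is still omitted, which encodes "no rank recorded".
	fmt.Println(encode(rowPtr{})) // {}
}
```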

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 07:06:13 -05:00


package matrix
// Playbook memory — SPEC §3.4 component 5 (learning-loop integration).
//
// Concept: every time an external system confirms "(query → answer_id)
// was a successful match," record it. Future similar queries get that
// answer's score boosted, so the matrix indexer learns from outcomes
// rather than relying solely on the base embedder's geometry.
//
// Per feedback_meta_index_vision.md: this is the north star — a
// meta-index that LEARNS from playbooks over time, not a static
// hybrid search engine.
//
// Storage shape: a vectord index named DefaultPlaybookCorpus where:
// - The vector is embed(query_text)
// - The metadata is a serialized PlaybookEntry
// Retrieval shape: at /matrix/search time, when use_playbook=true,
// matrixd searches the playbook corpus with the same query vector,
// looks up each hit's answer_id, and if that answer is in the current
// matrix-search results, applies a boost to its distance.
//
// Composition: this layer is additive on top of the existing
// retrieve+merge — when use_playbook=false, behavior is unchanged.
// The boost only re-ranks results that ALREADY surfaced from the
// regular retrieval. Shape B (InjectPlaybookMisses, below) covers the
// other case: it injects playbook hits whose answers were not in the
// top-K, where the v0 boost-only stance had nothing to re-rank.
import (
	"encoding/json"
	"errors"
	"sort"
	"time"
)
// DefaultPlaybookCorpus is the vectord index name where playbook
// entries land by default. Callers can override per-request, but
// having one default makes the system observable from the outside
// (operator hits /vectors/index and sees this corpus in the list).
const DefaultPlaybookCorpus = "playbook_memory"
// DefaultPlaybookTopK is how many similar past queries to consider
// when applying boost. 3 keeps the influence focused — we want the
// boost to reward consistent matches, not let one stale playbook
// dominate. Caller can override.
const DefaultPlaybookTopK = 3
// DefaultPlaybookMaxDistance is the cosine ceiling for "this past
// query is similar enough to count." 0.5 lets in genuinely related
// queries while excluding pure-coincidence neighbors. Caller can
// override per-request as we learn what works for staffing data.
const DefaultPlaybookMaxDistance = 0.5
// PlaybookEntry is what gets stored as metadata on each playbook
// vector. RecordedAtNs is captured at write time; callers should not
// set it (NewPlaybookEntry fills it in).
type PlaybookEntry struct {
	QueryText    string   `json:"query_text"`
	AnswerID     string   `json:"answer_id"`
	AnswerCorpus string   `json:"answer_corpus"`
	Score        float64  `json:"score"` // 0..1; higher = better outcome
	RecordedAtNs int64    `json:"recorded_at_ns"`
	Tags         []string `json:"tags,omitempty"`
}
// Validate returns an error if the entry is missing required fields.
// Callers should validate before storage so bad data doesn't pollute
// the corpus.
func (p PlaybookEntry) Validate() error {
	if p.QueryText == "" {
		return errors.New("playbook: query_text required")
	}
	if p.AnswerID == "" {
		return errors.New("playbook: answer_id required")
	}
	if p.AnswerCorpus == "" {
		return errors.New("playbook: answer_corpus required")
	}
	if p.Score < 0 || p.Score > 1 {
		return errors.New("playbook: score must be in [0, 1]")
	}
	return nil
}
// BoostFactor returns the multiplier applied to a result's distance
// when this playbook entry matches it. Lower is better:
//
//	score = 0   → 1.0  (no boost)
//	score = 0.5 → 0.75 (mild boost)
//	score = 1.0 → 0.5  (halve the distance; strong boost)
//
// Math: 1 - 0.5*score. Capped to [0.5, 1.0] for safety.
//
// Why halving as the maximum boost: a perfect-confidence playbook
// entry shouldn't completely override the base embedding (that
// invites runaway feedback loops where one early playbook
// dominates forever). Halving is enough to move a mid-rank result
// to the top in most cases without erasing the base ranking
// signal.
func (p PlaybookEntry) BoostFactor() float64 {
	score := p.Score
	if score < 0 {
		score = 0
	}
	if score > 1 {
		score = 1
	}
	return 1.0 - 0.5*score
}
// MarshalMetadata serializes the entry as the JSON RawMessage that
// vectord stores per item. Convenience for the recorder.
func (p PlaybookEntry) MarshalMetadata() (json.RawMessage, error) {
	return json.Marshal(p)
}
// UnmarshalPlaybookMetadata is the inverse — used when fetching
// playbook hits to decode their metadata back into entries.
func UnmarshalPlaybookMetadata(raw json.RawMessage) (PlaybookEntry, error) {
	var e PlaybookEntry
	if len(raw) == 0 {
		return e, errors.New("playbook: empty metadata")
	}
	if err := json.Unmarshal(raw, &e); err != nil {
		return e, err
	}
	return e, nil
}
// NewPlaybookEntry stamps RecordedAtNs to now and returns the entry.
// Validation happens at storage; this is just construction.
func NewPlaybookEntry(query, answerID, answerCorpus string, score float64, tags []string) PlaybookEntry {
	return PlaybookEntry{
		QueryText:    query,
		AnswerID:     answerID,
		AnswerCorpus: answerCorpus,
		Score:        score,
		RecordedAtNs: time.Now().UnixNano(),
		Tags:         tags,
	}
}
// PlaybookHit is one similarity-search result from the playbook
// corpus, paired with its decoded entry. Distance is the cosine
// distance between the current query and this past playbook's
// query vector — used by the caller to filter out "too far"
// matches via PlaybookMaxDistance.
type PlaybookHit struct {
	PlaybookID string        `json:"playbook_id"`
	Distance   float32       `json:"distance"`
	Entry      PlaybookEntry `json:"entry"`
}
// InjectPlaybookMisses appends synthetic Results for playbook hits
// whose (AnswerCorpus, AnswerID) doesn't already appear in results.
// This is "Shape B" from the doc comment at the top of this file:
// the v0 boost-only stance (ApplyPlaybookBoost) can't promote a
// recorded answer that wasn't already in the regular retrieval's
// top-K. Paraphrase queries broke this — different embedding ⇒
// different top-K ⇒ recorded answer drops out ⇒ no boost can save
// it. Reality test playbook_lift_002 showed 0/2 paraphrase top-1
// lifts because of exactly that.
//
// Synthetic distance = playbook_hit_distance × BoostFactor — same
// formula as ApplyPlaybookBoost, applied to the playbook hit's own
// distance instead of a result's. Lower playbook hit distance
// (current query is similar to recorded query) AND higher score
// (recorded outcome was strong) push the injection toward top-1.
//
// fetchPlaybookHits has already filtered hits to those within
// DefaultPlaybookMaxDistance (0.5), so injected results land in the
// same distance range as regular retrieval — they don't dominate
// top-K from out-of-distribution playbooks.
//
// Returns the (possibly extended) results slice and how many synthetic
// rows were appended. Caller MUST re-sort + truncate to K afterwards.
func InjectPlaybookMisses(results []Result, hits []PlaybookHit) ([]Result, int) {
	if len(hits) == 0 {
		return results, 0
	}
	present := make(map[string]bool, len(results))
	for _, r := range results {
		present[r.Corpus+"|"+r.ID] = true
	}
	// For each (corpus, id) NOT in results, keep the playbook hit
	// with the largest boost (lowest BoostFactor = highest score).
	// Multiple hits to the same answer collapse to one injection.
	bestForKey := make(map[string]PlaybookHit)
	for _, h := range hits {
		key := h.Entry.AnswerCorpus + "|" + h.Entry.AnswerID
		if present[key] {
			continue
		}
		if existing, ok := bestForKey[key]; !ok || h.Entry.BoostFactor() < existing.Entry.BoostFactor() {
			bestForKey[key] = h
		}
	}
	// Go randomizes map iteration order, so walk the keys in sorted
	// order. That keeps the appended rows, and any distance ties after
	// the caller's stable re-sort, deterministic across runs.
	keys := make([]string, 0, len(bestForKey))
	for key := range bestForKey {
		keys = append(keys, key)
	}
	sort.Strings(keys)
	for _, key := range keys {
		h := bestForKey[key]
		injectedDist := h.Distance * float32(h.Entry.BoostFactor())
		// Synthesize metadata that flags the injection so callers
		// (driver/UI/observer) can distinguish "regular retrieval"
		// from "playbook injection." Production consumers needing
		// the actual worker metadata can fetch from vectord by
		// (Corpus, ID); synthetic results carry only provenance.
		meta, _ := json.Marshal(map[string]any{
			"playbook_injected":       true,
			"playbook_id":             h.PlaybookID,
			"playbook_score":          h.Entry.Score,
			"playbook_query_text":     h.Entry.QueryText,
			"playbook_recorded_at_ns": h.Entry.RecordedAtNs,
			"playbook_hit_distance":   h.Distance,
		})
		results = append(results, Result{
			ID:       h.Entry.AnswerID,
			Corpus:   h.Entry.AnswerCorpus,
			Distance: injectedDist,
			Metadata: meta,
		})
	}
	return results, len(bestForKey)
}
// ApplyPlaybookBoost re-ranks results in place using matched
// playbook hits. For each hit whose (AnswerID, AnswerCorpus)
// matches a result, multiply that result's distance by the hit's
// BoostFactor. If multiple hits match the same result, the highest-
// score one wins (greatest reduction in distance).
//
// After applying boosts, results are re-sorted ascending by
// distance.
//
// Returns the number of distinct results that received a boost.
// Callers can log this as a signal of "how much the playbook
// influenced this query."
func ApplyPlaybookBoost(results []Result, hits []PlaybookHit) int {
	if len(hits) == 0 || len(results) == 0 {
		return 0
	}
	// For each result, find the hit with the lowest BoostFactor
	// (= largest boost = highest score, since BoostFactor is
	// 1-0.5*score and we minimize).
	bestBoost := make(map[int]float64, len(results))
	for i, r := range results {
		for _, h := range hits {
			if h.Entry.AnswerID != r.ID || h.Entry.AnswerCorpus != r.Corpus {
				continue
			}
			bf := h.Entry.BoostFactor()
			if cur, ok := bestBoost[i]; !ok || bf < cur {
				bestBoost[i] = bf
			}
		}
	}
	for i, bf := range bestBoost {
		results[i].Distance = float32(float64(results[i].Distance) * bf)
	}
	sort.SliceStable(results, func(i, j int) bool {
		return results[i].Distance < results[j].Distance
	})
	return len(bestBoost)
}