root 89ca72d471 materializer + replay ports + vectord substrate fix verified at scale
Two threads landing together — the doc edits interleave so they ship
in a single commit.

1. **vectord substrate fix verified at original scale** (closes the
   2026-05-01 thread). Re-ran multitier 5min @ conc=50: 132,211
   scenarios at 438/sec, 6/6 classes at 0% failure (was 4/6 pre-fix).
   Throughput dropped 1,115 → 438/sec because previously-broken
   scenarios now do real HNSW Add work — honest cost of correctness.
   The fix (i.vectors side-store + safeGraphAdd recover wrappers +
   smallIndexRebuildThreshold=32 + saveTask coalescing) holds at the
   footprint that originally surfaced the bug.

2. **Materializer port** — internal/materializer + cmd/materializer +
   scripts/materializer_smoke.sh. Ports scripts/distillation/transforms.ts
   (12 transforms) + build_evidence_index.ts (idempotency, day-partition,
   receipt). On-wire JSON shape matches TS so Bun and Go runs are
   interchangeable. 14 tests green.

3. **Replay port** — internal/replay + cmd/replay +
   scripts/replay_smoke.sh. Ports scripts/distillation/replay.ts
   (retrieve → bundle → /v1/chat → validate → log). Closes audit-FULL
   phase 7 live invocation on the Go side. Both runtimes append to the
   same data/_kb/replay_runs.jsonl (schema=replay_run.v1). 14 tests green.

Side effect on internal/distillation/types.go: EvidenceRecord gained
prompt_tokens, completion_tokens, and metadata fields to mirror the TS
shape the materializer transforms produce.

STATE_OF_PLAY refreshed to 2026-05-02; ARCHITECTURE_COMPARISON decisions
tracker moves the materializer + replay items from _open_ to DONE and
adds the substrate-fix scale verification row.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 03:31:02 -05:00

67 lines
2.0 KiB
Go

package replay
import (
"fmt"
"regexp"
"strings"
)
// fillerPatterns are the hedge phrases the spec rejects. Compiled once
// per package — the gate runs on every replay call.
var fillerPatterns = []*regexp.Regexp{
regexp.MustCompile(`(?i)as an ai`),
regexp.MustCompile(`(?i)i cannot`),
regexp.MustCompile(`(?i)i'?m sorry, but`),
regexp.MustCompile(`(?i)i don'?t have access`),
regexp.MustCompile(`(?i)i am unable to`),
}
// ValidateResponse runs the deterministic gate on a model response.
// Empty / too-short / hedge-bearing / context-disconnected responses
// fail. Matches replay.ts:validateResponse one-to-one.
func ValidateResponse(response string, bundle *ContextBundle) ValidationResult {
trimmed := strings.TrimSpace(response)
var reasons []string
if len(trimmed) == 0 {
return ValidationResult{Passed: false, Reasons: []string{"empty response"}}
}
if len(trimmed) < 80 {
reasons = append(reasons, fmt.Sprintf("response too short (%d chars; min 80)", len(trimmed)))
}
for _, re := range fillerPatterns {
if re.MatchString(trimmed) {
reasons = append(reasons, fmt.Sprintf("filler/hedge phrase detected: %s", re.String()))
}
}
// Soft anchor: if a validation checklist was supplied, the response
// should share at least one token with it (≥3 chars per tokenize()).
if bundle != nil && len(bundle.ValidationSteps) > 0 {
checklistTokens := map[string]struct{}{}
for _, s := range bundle.ValidationSteps {
for t := range tokenize(s) {
checklistTokens[t] = struct{}{}
}
}
respTokens := tokenize(trimmed)
overlap := 0
for t := range checklistTokens {
if _, ok := respTokens[t]; ok {
overlap++
}
}
if len(checklistTokens) > 0 && overlap == 0 {
reasons = append(reasons, "response shares no tokens with validation checklist (may not have followed prior patterns)")
}
}
return ValidationResult{Passed: len(reasons) == 0, Reasons: reasonsOrEmpty(reasons)}
}
func reasonsOrEmpty(r []string) []string {
if r == nil {
return []string{}
}
return r
}