Two threads landing together; the doc edits interleave, so they ship in a single commit.

1. **vectord substrate fix verified at original scale** (closes the 2026-05-01 thread). Re-ran multitier 5min @ conc=50: 132,211 scenarios at 438/sec, 6/6 classes at 0% failure (was 4/6 pre-fix). Throughput dropped 1,115 → 438/sec because previously-broken scenarios now do real HNSW Add work: the honest cost of correctness. The fix (i.vectors side-store + safeGraphAdd recover wrappers + smallIndexRebuildThreshold=32 + saveTask coalescing) holds at the footprint that originally surfaced the bug.

2. **Materializer port**: internal/materializer + cmd/materializer + scripts/materializer_smoke.sh. Ports scripts/distillation/transforms.ts (12 transforms) + build_evidence_index.ts (idempotency, day-partition, receipt). On-wire JSON shape matches TS so Bun and Go runs are interchangeable. 14 tests green.

3. **Replay port**: internal/replay + cmd/replay + scripts/replay_smoke.sh. Ports scripts/distillation/replay.ts (retrieve → bundle → /v1/chat → validate → log). Closes audit-FULL phase 7 live invocation on the Go side. Both runtimes append to the same data/_kb/replay_runs.jsonl (schema=replay_run.v1). 14 tests green.

Side effect on internal/distillation/types.go: EvidenceRecord gained prompt_tokens, completion_tokens, and metadata fields to mirror the TS shape the materializer transforms produce.

STATE_OF_PLAY refreshed to 2026-05-02; ARCHITECTURE_COMPARISON decisions tracker moves the materializer + replay items from _open_ to DONE and adds the substrate-fix scale verification row.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
67 lines
2.0 KiB
Go
package replay

import (
	"fmt"
	"regexp"
	"strings"
)

// fillerPatterns are the hedge phrases the spec rejects. Compiled once
// per package; the gate runs on every replay call.
var fillerPatterns = []*regexp.Regexp{
	regexp.MustCompile(`(?i)as an ai`),
	regexp.MustCompile(`(?i)i cannot`),
	regexp.MustCompile(`(?i)i'?m sorry, but`),
	regexp.MustCompile(`(?i)i don'?t have access`),
	regexp.MustCompile(`(?i)i am unable to`),
}

// ValidateResponse runs the deterministic gate on a model response.
// Empty / too-short / hedge-bearing / context-disconnected responses
// fail. Matches replay.ts:validateResponse one-to-one.
func ValidateResponse(response string, bundle *ContextBundle) ValidationResult {
	trimmed := strings.TrimSpace(response)
	var reasons []string

	if len(trimmed) == 0 {
		return ValidationResult{Passed: false, Reasons: []string{"empty response"}}
	}
	// len counts bytes, not runes, so multi-byte text reaches 80 sooner.
	if len(trimmed) < 80 {
		reasons = append(reasons, fmt.Sprintf("response too short (%d chars; min 80)", len(trimmed)))
	}
	for _, re := range fillerPatterns {
		if re.MatchString(trimmed) {
			reasons = append(reasons, fmt.Sprintf("filler/hedge phrase detected: %s", re.String()))
		}
	}
	// Soft anchor: if a validation checklist was supplied, the response
	// should share at least one token with it (≥3 chars per tokenize()).
	if bundle != nil && len(bundle.ValidationSteps) > 0 {
		checklistTokens := map[string]struct{}{}
		for _, s := range bundle.ValidationSteps {
			for t := range tokenize(s) {
				checklistTokens[t] = struct{}{}
			}
		}
		respTokens := tokenize(trimmed)
		overlap := 0
		for t := range checklistTokens {
			if _, ok := respTokens[t]; ok {
				overlap++
			}
		}
		if len(checklistTokens) > 0 && overlap == 0 {
			reasons = append(reasons, "response shares no tokens with validation checklist (may not have followed prior patterns)")
		}
	}

	return ValidationResult{Passed: len(reasons) == 0, Reasons: reasonsOrEmpty(reasons)}
}

// reasonsOrEmpty returns a non-nil slice: encoding/json marshals a nil
// slice as null and an empty one as [].
func reasonsOrEmpty(r []string) []string {
	if r == nil {
		return []string{}
	}
	return r
}
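
The soft-anchor step above depends on `tokenize`, which is defined elsewhere in the package and is not shown in this file. The following is a minimal standalone sketch of that check under stated assumptions: the `tokenize` here is a hypothetical stand-in (lowercase, split on non-alphanumerics, keep tokens of at least 3 characters, return a set), and `sharesToken` is an illustrative helper name, not part of the package. The real tokenizer may normalize differently.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// tokenize is a HYPOTHETICAL stand-in for the package's tokenizer.
// Assumption: lowercase, split on anything that is not a letter or
// digit, keep tokens of at least 3 characters, return them as a set.
func tokenize(s string) map[string]struct{} {
	out := map[string]struct{}{}
	fields := strings.FieldsFunc(strings.ToLower(s), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	for _, f := range fields {
		if len([]rune(f)) >= 3 {
			out[f] = struct{}{}
		}
	}
	return out
}

// sharesToken (illustrative name) reports whether response has at least
// one token in common with any checklist step. The gate skips this check
// entirely when no steps are supplied; callers here should do the same.
func sharesToken(response string, steps []string) bool {
	respTokens := tokenize(response)
	for _, s := range steps {
		for t := range tokenize(s) {
			if _, ok := respTokens[t]; ok {
				return true
			}
		}
	}
	return false
}

func main() {
	steps := []string{"Verify the index rebuild completed", "Check receipt hash"}
	fmt.Println(sharesToken("The rebuild finished and the receipt was written.", steps))
	fmt.Println(sharesToken("All good, no issues.", steps))
}
```

Under this sketch, a single common word is enough to satisfy the anchor, which is why the check is soft: it flags responses that are entirely disconnected from the checklist rather than proving the checklist was followed.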