Two threads landing together; the doc edits interleave, so they ship in a single commit.

1. **vectord substrate fix verified at original scale** (closes the 2026-05-01 thread). Re-ran multitier 5min @ conc=50: 132,211 scenarios at 438/sec, 6/6 classes at 0% failure (was 4/6 pre-fix). Throughput dropped 1,115 → 438/sec because previously-broken scenarios now do real HNSW Add work: the honest cost of correctness. The fix (i.vectors side-store + safeGraphAdd recover wrappers + smallIndexRebuildThreshold=32 + saveTask coalescing) holds at the footprint that originally surfaced the bug.

2. **Materializer port**: internal/materializer + cmd/materializer + scripts/materializer_smoke.sh. Ports scripts/distillation/transforms.ts (12 transforms) + build_evidence_index.ts (idempotency, day-partition, receipt). On-wire JSON shape matches TS, so Bun and Go runs are interchangeable. 14 tests green.

3. **Replay port**: internal/replay + cmd/replay + scripts/replay_smoke.sh. Ports scripts/distillation/replay.ts (retrieve → bundle → /v1/chat → validate → log). Closes audit-FULL phase 7 live invocation on the Go side. Both runtimes append to the same data/_kb/replay_runs.jsonl (schema=replay_run.v1). 14 tests green.

Side effect on internal/distillation/types.go: EvidenceRecord gained prompt_tokens, completion_tokens, and metadata fields to mirror the TS shape the materializer transforms produce.

STATE_OF_PLAY refreshed to 2026-05-02; ARCHITECTURE_COMPARISON decisions tracker moves the materializer + replay items from _open_ to DONE and adds the substrate-fix scale verification row.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
99 lines
4.0 KiB
Go
// Package replay ports scripts/distillation/replay.ts to Go.
//
// Replay takes a task → retrieves matching playbooks/RAG records →
// builds a context bundle → calls a LOCAL model via the gateway's
// /v1/chat → validates → escalates to a stronger model if needed →
// logs the run as new evidence in `data/_kb/replay_runs.jsonl`.
//
// Spec invariants (carry over from replay.ts):
//   - never bypass retrieval (unless caller passes NoRetrieval)
//   - never discard provenance
//   - never allow free-form hallucinated output (validation gate)
//   - log every run as new evidence
//
// This is NOT training; it's runtime behavior shaping via retrieval.
package replay

// ReplayRequest mirrors the TS interface. NoRetrieval skips the
// context bundle entirely (baseline mode for A/B tests). DryRun returns
// a deterministic synthetic response without calling the gateway;
// tests use it to exercise retrieval/validation without an LLM.
type ReplayRequest struct {
	Task            string
	LocalOnly       bool
	AllowEscalation bool
	NoRetrieval     bool
	DryRun          bool
	GatewayURL      string // overrides $LH_GATEWAY_URL
	LocalModel      string // overrides default
	EscalationModel string // overrides default
}

// RagSample is one record in exports/rag/playbooks.jsonl.
type RagSample struct {
	ID             string   `json:"id"`
	Title          string   `json:"title"`
	Content        string   `json:"content"`
	Tags           []string `json:"tags"`
	SourceRunID    string   `json:"source_run_id"`
	SuccessScore   string   `json:"success_score"`
	SourceCategory string   `json:"source_category"`
}

// RetrievedArtifact is one playbook surfaced into a ContextBundle.
type RetrievedArtifact struct {
	RagID          string   `json:"rag_id"`
	SourceRunID    string   `json:"source_run_id"`
	Title          string   `json:"title"`
	ContentPreview string   `json:"content_preview"` // first 240 chars
	SuccessScore   string   `json:"success_score"`
	Tags           []string `json:"tags"`
	Score          float64  `json:"score"`
}

// ContextBundle is what the prompt builder consumes. Empty bundles
// (no retrieved playbooks) still pass through; buildPrompt downgrades
// to a no-context prompt when both accepted and warnings are empty.
type ContextBundle struct {
	RetrievedPlaybooks     []RetrievedArtifact `json:"retrieved_playbooks"`
	PriorSuccessfulOutputs []RetrievedArtifact `json:"prior_successful_outputs"`
	FailurePatterns        []RetrievedArtifact `json:"failure_patterns"`
	ValidationSteps        []string            `json:"validation_steps"`
	BundleTokenEstimate    int                 `json:"bundle_token_estimate"`
}

// ValidationResult is the deterministic gate's verdict. Reasons is
// always non-nil so JSON consumers can iterate without a nil check.
type ValidationResult struct {
	Passed  bool     `json:"passed"`
	Reasons []string `json:"reasons"`
}

// ReplayResult is what Replay returns. Mirrors the TS type one-to-one
// so JSONL emitted by either runtime parses identically.
type ReplayResult struct {
	InputTask          string           `json:"input_task"`
	TaskHash           string           `json:"task_hash"`
	RetrievedArtifacts RetrievedIDs     `json:"retrieved_artifacts"`
	ContextBundle      *ContextBundle   `json:"context_bundle"`
	ModelResponse      string           `json:"model_response"`
	ModelUsed          string           `json:"model_used"`
	EscalationPath     []string         `json:"escalation_path"`
	ValidationResult   ValidationResult `json:"validation_result"`
	RecordedRunID      string           `json:"recorded_run_id"`
	RecordedAt         string           `json:"recorded_at"`
	DurationMs         int64            `json:"duration_ms"`
}

// RetrievedIDs is the {rag_ids} envelope the TS shape uses.
type RetrievedIDs struct {
	RagIDs []string `json:"rag_ids"`
}

// Defaults match replay.ts. Override via env or ReplayRequest fields.
const (
	DefaultLocalModel      = "qwen3.5:latest"
	DefaultEscalationModel = "deepseek-v3.1:671b"
	DefaultGatewayURL      = "http://localhost:3110"
)