root 89ca72d471 materializer + replay ports + vectord substrate fix verified at scale
Two threads landing together — the doc edits interleave so they ship
in a single commit.

1. **vectord substrate fix verified at original scale** (closes the
   2026-05-01 thread). Re-ran multitier 5min @ conc=50: 132,211
   scenarios at 438/sec, 6/6 classes at 0% failure (was 4/6 pre-fix).
   Throughput dropped 1,115 → 438/sec because previously-broken
   scenarios now do real HNSW Add work — honest cost of correctness.
   The fix (i.vectors side-store + safeGraphAdd recover wrappers +
   smallIndexRebuildThreshold=32 + saveTask coalescing) holds at the
   footprint that originally surfaced the bug.

2. **Materializer port** — internal/materializer + cmd/materializer +
   scripts/materializer_smoke.sh. Ports scripts/distillation/transforms.ts
   (12 transforms) + build_evidence_index.ts (idempotency, day-partition,
   receipt). On-wire JSON shape matches TS so Bun and Go runs are
   interchangeable. 14 tests green.

3. **Replay port** — internal/replay + cmd/replay +
   scripts/replay_smoke.sh. Ports scripts/distillation/replay.ts
   (retrieve → bundle → /v1/chat → validate → log). Closes audit-FULL
   phase 7 live invocation on the Go side. Both runtimes append to the
   same data/_kb/replay_runs.jsonl (schema=replay_run.v1). 14 tests green.

Side effect on internal/distillation/types.go: EvidenceRecord gained
prompt_tokens, completion_tokens, and metadata fields to mirror the TS
shape the materializer transforms produce.

STATE_OF_PLAY refreshed to 2026-05-02; ARCHITECTURE_COMPARISON decisions
tracker moves the materializer + replay items from _open_ to DONE and
adds the substrate-fix scale verification row.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 03:31:02 -05:00

// Package replay ports scripts/distillation/replay.ts to Go.
//
// Replay takes a task → retrieves matching playbooks/RAG records →
// builds a context bundle → calls a LOCAL model via the gateway's
// /v1/chat → validates → escalates to a stronger model if needed →
// logs the run as new evidence in `data/_kb/replay_runs.jsonl`.
//
// Spec invariants (carry over from replay.ts):
// - never bypass retrieval (unless caller passes NoRetrieval)
// - never discard provenance
// - never allow free-form hallucinated output (validation gate)
// - log every run as new evidence
//
// This is NOT training — it's runtime behavior shaping via retrieval.
package replay

// ReplayRequest mirrors the TS interface. NoRetrieval skips the
// context bundle entirely (baseline mode for A/B tests). DryRun returns
// a deterministic synthetic response without calling the gateway; it is
// used by tests to exercise retrieval/validation without an LLM.
type ReplayRequest struct {
	Task            string
	LocalOnly       bool
	AllowEscalation bool
	NoRetrieval     bool
	DryRun          bool
	GatewayURL      string // overrides $LH_GATEWAY_URL
	LocalModel      string // overrides default
	EscalationModel string // overrides default
}

// RagSample is one record in exports/rag/playbooks.jsonl.
type RagSample struct {
	ID             string   `json:"id"`
	Title          string   `json:"title"`
	Content        string   `json:"content"`
	Tags           []string `json:"tags"`
	SourceRunID    string   `json:"source_run_id"`
	SuccessScore   string   `json:"success_score"`
	SourceCategory string   `json:"source_category"`
}

// RetrievedArtifact is one playbook surfaced into a ContextBundle.
type RetrievedArtifact struct {
	RagID          string   `json:"rag_id"`
	SourceRunID    string   `json:"source_run_id"`
	Title          string   `json:"title"`
	ContentPreview string   `json:"content_preview"` // first 240 chars
	SuccessScore   string   `json:"success_score"`
	Tags           []string `json:"tags"`
	Score          float64  `json:"score"`
}

// ContextBundle is what the prompt builder consumes. Empty bundles
// (no retrieved playbooks) still pass through: buildPrompt downgrades
// to a no-context prompt when both accepted and warnings are empty.
type ContextBundle struct {
	RetrievedPlaybooks     []RetrievedArtifact `json:"retrieved_playbooks"`
	PriorSuccessfulOutputs []RetrievedArtifact `json:"prior_successful_outputs"`
	FailurePatterns        []RetrievedArtifact `json:"failure_patterns"`
	ValidationSteps        []string            `json:"validation_steps"`
	BundleTokenEstimate    int                 `json:"bundle_token_estimate"`
}

// ValidationResult is the deterministic gate's verdict. Reasons is
// always non-nil so JSON consumers can iterate without a nil check.
type ValidationResult struct {
	Passed  bool     `json:"passed"`
	Reasons []string `json:"reasons"`
}

// ReplayResult is what Replay returns. Mirrors the TS type one-to-one
// so JSONL emitted by either runtime parses identically.
type ReplayResult struct {
	InputTask          string           `json:"input_task"`
	TaskHash           string           `json:"task_hash"`
	RetrievedArtifacts RetrievedIDs     `json:"retrieved_artifacts"`
	ContextBundle      *ContextBundle   `json:"context_bundle"`
	ModelResponse      string           `json:"model_response"`
	ModelUsed          string           `json:"model_used"`
	EscalationPath     []string         `json:"escalation_path"`
	ValidationResult   ValidationResult `json:"validation_result"`
	RecordedRunID      string           `json:"recorded_run_id"`
	RecordedAt         string           `json:"recorded_at"`
	DurationMs         int64            `json:"duration_ms"`
}

// RetrievedIDs is the {rag_ids} envelope the TS shape uses.
type RetrievedIDs struct {
	RagIDs []string `json:"rag_ids"`
}

// Defaults match replay.ts. Override via env or ReplayRequest fields.
const (
	DefaultLocalModel      = "qwen3.5:latest"
	DefaultEscalationModel = "deepseek-v3.1:671b"
	DefaultGatewayURL      = "http://localhost:3110"
)