Two threads landing together; the doc edits interleave, so they ship in a single commit.

1. **vectord substrate fix verified at original scale** (closes the 2026-05-01 thread). Re-ran multitier for 5 min at conc=50: 132,211 scenarios at 438/sec, 6/6 classes at 0% failure (was 4/6 pre-fix). Throughput dropped from 1,115 to 438/sec because the previously broken scenarios now do real HNSW Add work, which is the honest cost of correctness. The fix (i.vectors side-store + safeGraphAdd recover wrappers + smallIndexRebuildThreshold=32 + saveTask coalescing) holds at the footprint that originally surfaced the bug.

2. **Materializer port**: internal/materializer + cmd/materializer + scripts/materializer_smoke.sh. Ports scripts/distillation/transforms.ts (12 transforms) + build_evidence_index.ts (idempotency, day-partition, receipt). The on-wire JSON shape matches TS, so Bun and Go runs are interchangeable. 14 tests green.

3. **Replay port**: internal/replay + cmd/replay + scripts/replay_smoke.sh. Ports scripts/distillation/replay.ts (retrieve → bundle → /v1/chat → validate → log). Closes audit-FULL phase 7 live invocation on the Go side. Both runtimes append to the same data/_kb/replay_runs.jsonl (schema=replay_run.v1). 14 tests green.

Side effect on internal/distillation/types.go: EvidenceRecord gained prompt_tokens, completion_tokens, and metadata fields to mirror the TS shape that the materializer transforms produce.

STATE_OF_PLAY refreshed to 2026-05-02. The ARCHITECTURE_COMPARISON decisions tracker moves the materializer + replay items from _open_ to DONE and adds the substrate-fix scale verification row.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
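Item 1 credits part of the fix to "safeGraphAdd recover wrappers". The commit does not show that code. The usual Go shape for such a wrapper is a deferred `recover()` that turns a panic inside the index insert into an ordinary error. That way one bad insert fails a single call instead of the whole process.

The sketch below is minimal and makes some assumptions. The `graph` interface and its `Add(id string, vec []float32)` method are hypothetical. The `fragileGraph` test double is also hypothetical. The real vectord index type and the actual `safeGraphAdd` signature may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// graph is a hypothetical stand-in for the HNSW index; the real vectord
// type and its Add signature are assumptions.
type graph interface {
	Add(id string, vec []float32)
}

// safeGraphAdd calls g.Add and converts any panic raised inside it into an
// error via the named return value, so the caller can log or retry.
func safeGraphAdd(g graph, id string, vec []float32) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("graph add %q: recovered panic: %v", id, r)
		}
	}()
	g.Add(id, vec)
	return nil
}

// fragileGraph (hypothetical) panics on a dimension mismatch, mimicking a
// library that signals bad input by panicking instead of returning an error.
type fragileGraph struct {
	dim int
	ids []string
}

func (f *fragileGraph) Add(id string, vec []float32) {
	if len(vec) != f.dim {
		panic(errors.New("dimension mismatch"))
	}
	f.ids = append(f.ids, id)
}

func main() {
	g := &fragileGraph{dim: 3}
	fmt.Println(safeGraphAdd(g, "a", []float32{1, 2, 3})) // succeeds
	fmt.Println(safeGraphAdd(g, "b", []float32{1, 2}))    // panic becomes an error
	fmt.Println(len(g.ids))                                // only "a" was stored
}
```

`recover` only converts the failure into an error. It does not undo any partial mutation that the panicking call already made to the index.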
79 lines · 2.2 KiB · Go
// materializer is the Go-side build_evidence_index runner. It reads source
// JSONL streams in `data/_kb/`, transforms each row to an
// EvidenceRecord, and writes day-partitioned output under `data/evidence/`
// plus an audit-grade receipt under `reports/distillation/<ts>/`.
//
// It mirrors the Bun runner at scripts/distillation/build_evidence_index.ts.
// Both runtimes can run against the same root and produce
// interoperable outputs (per ADR-001 #4: same logic, on-wire
// JSON shape preserved).
//
// Usage:
//
//	materializer                               # full run, write outputs
//	materializer -dry-run                      # count, no writes
//	materializer -root /home/profit/lakehouse  # custom repo root
package main

import (
	"flag"
	"fmt"
	"log"
	"os"
	"time"

	"git.agentview.dev/profit/golangLAKEHOUSE/internal/materializer"
)

func main() {
	root := flag.String("root", defaultRoot(), "lakehouse repo root (defaults to $LH_DISTILL_ROOT or current dir)")
	dryRun := flag.Bool("dry-run", false, "count rows but do not write outputs")
	flag.Parse()

	// Single run timestamp, passed to MaterializeAll as RecordedAt.
	recordedAt := time.Now().UTC().Format(time.RFC3339Nano)

	res, err := materializer.MaterializeAll(materializer.MaterializeOptions{
		Root:       *root,
		Transforms: materializer.Transforms,
		RecordedAt: recordedAt,
		DryRun:     *dryRun,
	})
	if err != nil {
		log.Fatalf("materializer: %v", err)
	}

	suffix := ""
	if *dryRun {
		suffix = " (DRY RUN)"
	}
	fmt.Printf("[evidence_index] %d read · %d written · %d skipped · %d deduped%s\n",
		res.Totals.RowsRead, res.Totals.RowsWritten, res.Totals.RowsSkipped, res.Totals.RowsDeduped, suffix)

	// Per-source breakdown; a missing source file is reported, not treated as an error.
	for _, s := range res.Sources {
		if !s.RowsPresent {
			fmt.Printf(" %s: (missing — skipped)\n", s.SourceFileRelPath)
			continue
		}
		fmt.Printf(" %s: read=%d wrote=%d skip=%d dedup=%d\n",
			s.SourceFileRelPath, s.RowsRead, s.RowsWritten, s.RowsSkipped, s.RowsDeduped)
	}

	if !*dryRun {
		fmt.Printf("[evidence_index] receipt: %s\n", res.ReceiptPath)
		fmt.Printf("[evidence_index] validation_pass=%v\n", res.Receipt.ValidationPass)
	}

	// Exit non-zero when the receipt's validation check fails.
	if !res.Receipt.ValidationPass {
		os.Exit(1)
	}
}

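// defaultRoot resolves the repo root: $LH_DISTILL_ROOT if set, otherwise the
// current working directory, falling back to "." if that cannot be read.
func defaultRoot() string {
	if r := os.Getenv("LH_DISTILL_ROOT"); r != "" {
		return r
	}
	if cwd, err := os.Getwd(); err == nil {
		return cwd
	}
	return "."
}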