lakehouse/reports/kimi/audit-last-week.md
root 41b0a99ed2 chore: add real content that was sitting untracked
Surfaced by today's untracked-files audit. None of these are accidents —
multiple are referenced by name in CLAUDE.md and memory files but were
never added.

Categories:
- docs/PHASE_AUDIT_GUIDE.md (106 LOC) — Claude Code phase audit guidance
- ops/systemd/lakehouse-langfuse-bridge.service — Langfuse bridge unit
- package.json — top-level npm manifest
- scripts/e2e_pipeline_check.sh + production_smoke.sh — real test scripts
- reports/kimi/audit-last-week*.md — the "Two reports live" CLAUDE.md cites
- tests/multi-agent/scenarios/ — 44 staffing scenarios (cutover decision A)
- tests/multi-agent/playbooks/ — 102 playbook records
- tests/battery/, tests/agent_test/PRD.md, tests/real-world/* — real tests
- sidecar/sidecar/{lab_ui,pipeline_lab}.py — 888 LOC dev-only UIs that
  remain in service post-sidecar-drop (commit ba928b1 explicitly kept them)

Sensitivity check: scenarios use synthetic company names ("Heritage Foods",
"Cornerstone Fabrication"); audit reports describe code findings only;
no PII or secrets surfaced.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 22:22:10 -05:00

Kimi Forensic Audit — distillation v1.0.0 (last week)

Generated: 2026-04-27 by kimi-for-coding via gateway /v1/chat
Latency: 157.6s | finish: stop | usage: {'prompt_tokens': 14014, 'completion_tokens': 6356, 'total_tokens': 20370}
Input: /tmp/kimi-audit-input.md (56k chars · 12 commits · 6 files)


Verdict

hold — Runtime lock-in, integration mismatches, and truncated source files in the v1.0.0 payload make the tag unshippable without rework.

What's solid

  • scorer.ts is a pure, deterministic function with no I/O, no LLM calls, and an explicit version stamp (scorer.ts:22).
  • SFT export enforces defense-in-depth contamination firewalls via SFT_NEVER and schema validators (export_sft.ts:49-50; sft_sample.ts:43-48).
  • Evidence materialization is idempotent across reruns using sig_hash deduplication (build_evidence_index.ts:114-126).
  • Mode router falls back to a safe built-in default if config parsing fails (mode.rs:208-228).
  • Quarantine writer abstraction isolates bad records instead of failing the export (export_sft.ts).
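
The idempotency point above can be illustrated with a minimal sketch of deduplication keyed on sig_hash. The EvidenceRow shape, the payload field, and the materialize function are hypothetical names chosen for illustration; the audit confirms only that build_evidence_index.ts deduplicates on sig_hash, not how it does so.

```typescript
// Hypothetical row shape: only `sig_hash` is named in the audit.
interface EvidenceRow {
  sig_hash: string;
  payload: string;
}

// Merge incoming rows into an existing index, skipping any row whose
// sig_hash is already present. Rerunning with the same input adds nothing.
function materialize(existing: EvidenceRow[], incoming: EvidenceRow[]): EvidenceRow[] {
  const seen = new Set(existing.map((r) => r.sig_hash));
  const out = [...existing];
  for (const row of incoming) {
    if (seen.has(row.sig_hash)) continue; // duplicate signature: skip
    seen.add(row.sig_hash);
    out.push(row);
  }
  return out;
}

const batch: EvidenceRow[] = [
  { sig_hash: "a1", payload: "first" },
  { sig_hash: "b2", payload: "second" },
  { sig_hash: "a1", payload: "duplicate of first" },
];

const firstRun = materialize([], batch);
const secondRun = materialize(firstRun, batch); // rerun over the same batch
console.log(firstRun.length, secondRun.length); // 2 2
```

The second run returns an index identical to the first, which is the property the audit credits to the evidence materializer.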

What's risky

  • Integration mismatch: replay.ts posts to /v1/chat, but the provided gateway only declares /v1/mode and /v1/mode/execute (replay.ts:186 vs mode.rs:13-18), suggesting an undocumented or broken proxy contract.
  • Bun runtime lock-in: Multiple files depend on Bun.CryptoHasher, which throws in Node.js (export_sft.ts:235; build_evidence_index.ts:89).
  • Unauditable files in scope: Critical files listed in the diff—transforms.ts, receipts.ts, quarantine.ts, score_runs.ts—were not provided, so their logic is unseen.
  • Every shown implementation file is truncated: scorer.ts, export_sft.ts, build_evidence_index.ts, replay.ts, and mode.rs all end mid-block, hiding error handling, receipt finalization, and gateway dispatch code.
  • Type safety escape: (ev as any).contractor in SFT synthesis bypasses the schema layer (export_sft.ts:138).
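
One way to close the type-safety escape is a narrow runtime guard that routes bad records to quarantine instead of interpolating them. This is a sketch under stated assumptions: the EvidenceRecord shape, the Outcome type, and the instruction wording are all hypothetical, since export_sft.ts was only partially visible to the audit.

```typescript
// Hypothetical shape: the audit names only the `contractor` field.
interface EvidenceRecord {
  id: string;
  contractor?: unknown;
}

// Narrow `contractor` to a non-empty string, or report its absence.
function contractorName(ev: EvidenceRecord): string | null {
  const c = ev.contractor;
  return typeof c === "string" && c.trim().length > 0 ? c.trim() : null;
}

type Outcome =
  | { kind: "sample"; instruction: string }
  | { kind: "quarantine"; reason: string };

// Build the instruction only when the field is valid; otherwise quarantine.
// The template text is illustrative, not the real SFT template.
function buildInstruction(ev: EvidenceRecord): Outcome {
  const name = contractorName(ev);
  if (name === null) {
    return { kind: "quarantine", reason: `record ${ev.id}: missing or non-string contractor` };
  }
  return { kind: "sample", instruction: `Summarize the staffing outcome for ${name}.` };
}

const ok = buildInstruction({ id: "r1", contractor: "Heritage Foods" });
const bad = buildInstruction({ id: "r2" });
console.log(ok.kind, bad.kind); // sample quarantine
```

With a guard like this, a malformed record can never place the literal string "undefined" into a training sample.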

Specific findings

  1. scripts/distillation/scorer.ts:22: SCORER_VERSION reads from process.env, introducing environment-dependent output drift that contradicts the file's “identical input → identical output forever” contract.
  2. scripts/distillation/export_sft.ts:138: (ev as any).contractor is an unguarded any cast; a malformed EvidenceRecord will inject the string "undefined" or crash at runtime inside the SFT instruction template.
  3. scripts/distillation/export_sft.ts:235: new Bun.CryptoHasher("sha256") is a Bun-only API; this path will fail under Node.js/Deno and makes the substrate non-portable.
  4. scripts/distillation/build_evidence_index.ts:89: Same Bun crypto lock-in in sha256OfFile, fragmenting the hashing implementation (here Bun.CryptoHasher, elsewhere canonicalSha256).
  5. scripts/distillation/replay.ts:178: Provider routing relies on fragile string heuristics (model.includes("/"), prefix lists); models with unexpected names will route to the wrong backend or hit the ollama default incorrectly.
  6. scripts/distillation/replay.ts:186: fetch(`${gatewayUrl()}/v1/chat`) targets an endpoint absent from the provided mode.rs router; without the missing gateway dispatch code, this call will 404.
  7. crates/gateway/src/v1/mode.rs:141: deserialize_string_or_vec uses serde_json::Value::deserialize against a TOML source, which is non-idiomatic and risks mis-handling TOML-specific types (datetime, inline tables) compared to a native toml::Value.
  8. scripts/distillation/build_evidence_index.ts:185: await canonicalSha256(row) is async, yet sha256OfFile is sync; the mixing of sync/async crypto calls in the same module hints at inconsistent I/O boundaries.
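
Finding 1 can be made concrete with a small sketch in which the version stamp is a compile-time constant rather than an environment read. The RunSummary and Score types and the pass-ratio rule are hypothetical stand-ins; the real scoring logic in scorer.ts was truncated in the audit input.

```typescript
// Pinned in source: changing the version is a reviewed code change,
// not something a deployment environment can alter.
const SCORER_VERSION = "1.0.0" as const;

// Hypothetical input and output shapes, not the real scorer.ts types.
interface RunSummary {
  passed: number;
  total: number;
}

interface Score {
  version: string;
  value: number;
}

// Pure function: no I/O, no environment reads, no clock.
function score(run: RunSummary): Score {
  const value = run.total === 0 ? 0 : run.passed / run.total;
  return { version: SCORER_VERSION, value };
}

const a = score({ passed: 3, total: 4 });
const b = score({ passed: 3, total: 4 });
console.log(JSON.stringify(a) === JSON.stringify(b)); // true
```

Because nothing outside the arguments influences the result, two hosts with different environments produce byte-identical scores for the same input.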

Direction recommendation

Keep the substrate architecture, but do not expand staffing audit work on top of v1.0.0 until three blockers are fixed: (1) replace Bun.CryptoHasher with portable WebCrypto or Node crypto so the build is runtime-agnostic; (2) align replay.ts to the actual gateway contract (/v1/mode/execute) or document the /v1/chat proxy route; and (3) eliminate all "as any" casts in the export path. The schema firewalls, deterministic scorer, and receipt provenance are the right foundation; rework the runtime/contract gaps rather than rebuilding from scratch.
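
Blocker (1) could be approached with a single synchronous helper built on node:crypto, which Node.js provides and which Bun and Deno expose through their Node compatibility layers. This is only a sketch: sha256Hex, canonicalize, and sha256OfRow are hypothetical names, and the key-sorting canonicalization is an assumption, because the real canonicalSha256 was not among the files provided.

```typescript
import { createHash } from "node:crypto";

// One hashing entry point for both file bytes and serialized rows.
function sha256Hex(data: string | Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// Assumed canonical form: object keys sorted recursively, arrays kept in order.
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const sorted: Record<string, unknown> = {};
    for (const key of Object.keys(obj).sort()) sorted[key] = canonicalize(obj[key]);
    return sorted;
  }
  return value;
}

// Hash a row independently of the key order it arrived in.
function sha256OfRow(row: unknown): string {
  return sha256Hex(JSON.stringify(canonicalize(row)));
}

console.log(sha256Hex("abc"));
// ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
console.log(sha256OfRow({ a: 1, b: 2 }) === sha256OfRow({ b: 2, a: 1 })); // true
```

Routing both file hashing and row hashing through one synchronous function would also remove the sync/async split noted in finding 8.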