Surfaced by today's untracked-files audit. None of these are accidents —
multiple are referenced by name in CLAUDE.md and memory files but were
never added.
Categories:
- docs/PHASE_AUDIT_GUIDE.md (106 LOC) — Claude Code phase audit guidance
- ops/systemd/lakehouse-langfuse-bridge.service — Langfuse bridge unit
- package.json — top-level npm manifest
- scripts/e2e_pipeline_check.sh + production_smoke.sh — real test scripts
- reports/kimi/audit-last-week*.md — the "Two reports live" CLAUDE.md cites
- tests/multi-agent/scenarios/ — 44 staffing scenarios (cutover decision A)
- tests/multi-agent/playbooks/ — 102 playbook records
- tests/battery/, tests/agent_test/PRD.md, tests/real-world/* — real tests
- sidecar/sidecar/{lab_ui,pipeline_lab}.py — 888 LOC dev-only UIs that
remain in service post-sidecar-drop (commit ba928b1 explicitly kept them)
Sensitivity check: scenarios use synthetic company names ("Heritage Foods",
"Cornerstone Fabrication"); audit reports describe code findings only;
no PII or secrets surfaced.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Kimi Forensic Audit — distillation v1.0.0 (last week)
Generated: 2026-04-27 by kimi-for-coding via gateway /v1/chat
Latency: 157.6s | finish: stop | usage: {'prompt_tokens': 14014, 'completion_tokens': 6356, 'total_tokens': 20370}
Input: /tmp/kimi-audit-input.md (56k chars · 12 commits · 6 files)
Verdict
hold — Runtime lock-in, integration mismatches, and truncated source files in the v1.0.0 payload make the tag unshippable without rework.
What's solid
- scorer.ts is a pure, deterministic function with no I/O, no LLM calls, and an explicit version stamp (scorer.ts:22).
- SFT export enforces defense-in-depth contamination firewalls via SFT_NEVER and schema validators (export_sft.ts:49-50; sft_sample.ts:43-48).
- Evidence materialization is idempotent across reruns using sig_hash deduplication (build_evidence_index.ts:114-126).
- Mode router falls back to a safe built-in default if config parsing fails (mode.rs:208-228).
- Quarantine writer abstraction isolates bad records instead of failing the export (export_sft.ts).
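
To illustrate the idempotency property noted above, here is a minimal sketch of deduplication keyed on sig_hash. The actual code in build_evidence_index.ts was not fully visible to this audit, so the `EvidenceRow` shape, the `materialize` helper, and the in-memory `Map` are all hypothetical stand-ins, not the project's implementation.

```typescript
// Hypothetical record shape: only the sig_hash field is named in the report.
interface EvidenceRow {
  sig_hash: string;
  payload: string;
}

// Merge incoming rows into an existing index, skipping any row whose
// sig_hash is already present. Re-running with the same input adds nothing.
function materialize(
  index: Map<string, EvidenceRow>,
  incoming: EvidenceRow[],
): number {
  let added = 0;
  for (const row of incoming) {
    if (index.has(row.sig_hash)) continue; // already materialized
    index.set(row.sig_hash, row);
    added += 1;
  }
  return added;
}

const index = new Map<string, EvidenceRow>();
const batch: EvidenceRow[] = [
  { sig_hash: "a1", payload: "first" },
  { sig_hash: "b2", payload: "second" },
  { sig_hash: "a1", payload: "first" }, // duplicate within the same batch
];

const firstRun = materialize(index, batch); // adds the two distinct rows
const secondRun = materialize(index, batch); // rerun adds nothing
console.log(firstRun, secondRun, index.size);
```

The second call returning zero additions is what "idempotent across reruns" means in practice: the index ends in the same state however many times the same batch is replayed.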
What's risky
- Integration mismatch: replay.ts posts to /v1/chat, but the provided gateway only declares /v1/mode and /v1/mode/execute (replay.ts:186 vs mode.rs:13-18), suggesting an undocumented or broken proxy contract.
- Bun runtime lock-in: Multiple files depend on Bun.CryptoHasher, which throws in Node.js (export_sft.ts:235; build_evidence_index.ts:89).
- Unauditable files in scope: Critical files listed in the diff (transforms.ts, receipts.ts, quarantine.ts, score_runs.ts) were not provided, so their logic is unseen.
- Every shown implementation file is truncated: scorer.ts, export_sft.ts, build_evidence_index.ts, replay.ts, and mode.rs all end mid-block, hiding error handling, receipt finalization, and gateway dispatch code.
- Type safety escape: (ev as any).contractor in SFT synthesis bypasses the schema layer (export_sft.ts:138).
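
As one possible way out of the Bun lock-in, the hashing could go through the `node:crypto` module, which Node.js provides natively and which Bun and Deno expose through their Node compatibility layers. The sketch below is an assumption about what a replacement might look like: `sha256Hex` is a hypothetical helper name, and `sha256OfFile` borrows the name cited in the findings without claiming to match its real signature.

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

// Hypothetical helper: hex-encoded SHA-256 of a string or byte array,
// with no dependency on the Bun global.
function sha256Hex(data: string | Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// Synchronous file hash. The name mirrors the function cited in the
// findings; its real signature was not visible to the audit.
function sha256OfFile(path: string): string {
  return sha256Hex(readFileSync(path));
}

// Known SHA-256 test vector for the ASCII string "abc".
const digest = sha256Hex("abc");
console.log(digest);
// ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```

Routing both the file hash and any canonical-record hash through a single helper like this would also address the fragmentation concern, since there would be one hashing implementation to audit.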
Specific findings
- scripts/distillation/scorer.ts:22: SCORER_VERSION reads from process.env, introducing environment-dependent output drift that contradicts the file's "identical input → identical output forever" contract.
- scripts/distillation/export_sft.ts:138: (ev as any).contractor is an unguarded any cast; a malformed EvidenceRecord will inject the string "undefined" or crash at runtime inside the SFT instruction template.
- scripts/distillation/export_sft.ts:235: new Bun.CryptoHasher("sha256") is a Bun-only API; this path will fail under Node.js/Deno and makes the substrate non-portable.
- scripts/distillation/build_evidence_index.ts:89: Same Bun crypto lock-in in sha256OfFile, fragmenting the hashing implementation (here Bun.CryptoHasher, elsewhere canonicalSha256).
- scripts/distillation/replay.ts:178: Provider routing relies on fragile string heuristics (model.includes("/"), prefix lists); models with unexpected names will route to the wrong backend or hit the ollama default incorrectly.
- scripts/distillation/replay.ts:186: the fetch to `${gatewayUrl()}/v1/chat` targets an endpoint absent from the provided mode.rs router; without the missing gateway dispatch code, this call will 404.
- crates/gateway/src/v1/mode.rs:141: deserialize_string_or_vec uses serde_json::Value::deserialize against a TOML source, which is non-idiomatic and risks mis-handling TOML-specific types (datetime, inline tables) compared to a native toml::Value.
- scripts/distillation/build_evidence_index.ts:185: await canonicalSha256(row) is async, yet sha256OfFile is sync; the mixing of sync/async crypto calls in the same module hints at inconsistent I/O boundaries.
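
The any cast at export_sft.ts:138 could be replaced by a type guard that rejects malformed records before they reach the instruction template. The following is a sketch under stated assumptions: only the contractor field is known from the audited code, so `EvidenceWithContractor`, `hasContractor`, `buildInstruction`, and the template wording are hypothetical.

```typescript
// Hypothetical shape: the report names only the contractor field.
interface EvidenceWithContractor {
  contractor: string;
}

// Type guard: true only when ev is an object carrying a non-empty
// string contractor, so the template never interpolates undefined.
function hasContractor(ev: unknown): ev is EvidenceWithContractor {
  if (typeof ev !== "object" || ev === null) return false;
  const value = (ev as Record<string, unknown>).contractor;
  return typeof value === "string" && value !== "";
}

type InstructionResult =
  | { ok: true; instruction: string }
  | { ok: false; reason: string };

// Hypothetical template builder: returns a failure value instead of
// emitting a broken instruction, so the caller can quarantine the record.
function buildInstruction(ev: unknown): InstructionResult {
  if (!hasContractor(ev)) {
    return { ok: false, reason: "missing or non-string contractor" };
  }
  return { ok: true, instruction: `Summarize the record for ${ev.contractor}.` };
}

const good = buildInstruction({ contractor: "Heritage Foods" });
const bad = buildInstruction({ contractor: undefined });
console.log(good, bad);
```

Returning a failure value rather than throwing fits the quarantine behaviour praised earlier in the report: a bad record is set aside without failing the whole export.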
Direction recommendation
Keep the substrate architecture, but do not expand staffing audit work on top of v1.0.0 until three blockers are fixed: (1) replace Bun.CryptoHasher with portable WebCrypto or Node crypto so the build is runtime-agnostic; (2) align replay.ts to the actual gateway contract (/v1/mode/execute) or document the /v1/chat proxy route; and (3) eliminate any casts in the export path. The schema firewalls, deterministic scorer, and receipt provenance are the right foundation—rework the runtime/contract gaps rather than rebuilding from scratch.