lakehouse/reports/kimi/audit-last-week.md
root 41b0a99ed2 chore: add real content that was sitting untracked
Surfaced by today's untracked-files audit. None of these are accidents —
multiple are referenced by name in CLAUDE.md and memory files but were
never added.

Categories:
- docs/PHASE_AUDIT_GUIDE.md (106 LOC) — Claude Code phase audit guidance
- ops/systemd/lakehouse-langfuse-bridge.service — Langfuse bridge unit
- package.json — top-level npm manifest
- scripts/e2e_pipeline_check.sh + production_smoke.sh — real test scripts
- reports/kimi/audit-last-week*.md — the "Two reports live" CLAUDE.md cites
- tests/multi-agent/scenarios/ — 44 staffing scenarios (cutover decision A)
- tests/multi-agent/playbooks/ — 102 playbook records
- tests/battery/, tests/agent_test/PRD.md, tests/real-world/* — real tests
- sidecar/sidecar/{lab_ui,pipeline_lab}.py — 888 LOC dev-only UIs that
  remain in service post-sidecar-drop (commit ba928b1 explicitly kept them)

Sensitivity check: scenarios use synthetic company names ("Heritage Foods",
"Cornerstone Fabrication"); audit reports describe code findings only;
no PII or secrets surfaced.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 22:22:10 -05:00


# Kimi Forensic Audit — distillation v1.0.0 (last week)
**Generated:** 2026-04-27 by `kimi-for-coding` via gateway `/v1/chat`
**Latency:** 157.6s | **finish:** stop | **usage:** 14,014 prompt + 6,356 completion = 20,370 total tokens
**Input:** /tmp/kimi-audit-input.md (56k chars · 12 commits · 6 files)
---
## Verdict
**hold** — Runtime lock-in, integration mismatches, and truncated source files in the v1.0.0 payload make the tag unshippable without rework.
## What's solid
- `scorer.ts` is a pure, deterministic function with no I/O, no LLM calls, and an explicit version stamp (`scorer.ts:22`).
- SFT export enforces defense-in-depth contamination firewalls via `SFT_NEVER` and schema validators (`export_sft.ts:49-50`; `sft_sample.ts:43-48`).
- Evidence materialization is idempotent across reruns using `sig_hash` deduplication (`build_evidence_index.ts:114-126`).
- Mode router falls back to a safe built-in default if config parsing fails (`mode.rs:208-228`).
- Quarantine writer abstraction isolates bad records instead of failing the export (`export_sft.ts`).
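The `sig_hash` deduplication praised above can be sketched as a pure function. This is a hypothetical illustration, not the project's code: the `EvidenceRow` shape and the function name are assumptions.

```typescript
// Hypothetical sketch of sig_hash-keyed deduplication in the spirit of
// build_evidence_index.ts. The EvidenceRow shape is an assumption.
interface EvidenceRow {
  sig_hash: string; // content-derived signature used as the dedup key
  payload: string;
}

// Keying on sig_hash makes materialization idempotent: re-running over
// already-materialized rows adds nothing new.
function dedupeBySigHash(rows: EvidenceRow[]): EvidenceRow[] {
  const seen = new Set<string>();
  const out: EvidenceRow[] = [];
  for (const row of rows) {
    if (seen.has(row.sig_hash)) continue;
    seen.add(row.sig_hash);
    out.push(row);
  }
  return out;
}
```

Running the function over its own output (or a rerun's duplicates) yields the same set, which is the idempotence property the audit credits.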
## What's risky
- **Integration mismatch**: `replay.ts` posts to `/v1/chat`, but the provided gateway only declares `/v1/mode` and `/v1/mode/execute` (`replay.ts:186` vs `mode.rs:13-18`), suggesting an undocumented or broken proxy contract.
- **Bun runtime lock-in**: Multiple files depend on `Bun.CryptoHasher`, which throws in Node.js (`export_sft.ts:235`; `build_evidence_index.ts:89`).
- **Unauditable files in scope**: Critical files listed in the diff—`transforms.ts`, `receipts.ts`, `quarantine.ts`, `score_runs.ts`—were not provided, so their logic could not be reviewed.
- **Every shown implementation file is truncated**: `scorer.ts`, `export_sft.ts`, `build_evidence_index.ts`, `replay.ts`, and `mode.rs` all end mid-block, hiding error handling, receipt finalization, and gateway dispatch code.
- **Type safety escape**: `(ev as any).contractor` in SFT synthesis bypasses the schema layer (`export_sft.ts:138`).
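The `(ev as any).contractor` escape could be closed with a narrowing guard that routes bad records to quarantine instead of silently interpolating `"undefined"`. A minimal sketch, assuming a `contractor` string field; the real `EvidenceRecord` schema may differ.

```typescript
// Assumed shape for illustration only; the project's actual EvidenceRecord
// schema is not visible in the audited payload.
interface EvidenceRecord {
  id: string;
  contractor?: string;
}

// Guarded accessor replacing the flagged `(ev as any).contractor` cast:
// a missing or empty field raises instead of leaking into the SFT template.
function contractorOf(ev: EvidenceRecord): string {
  if (typeof ev.contractor === "string" && ev.contractor.length > 0) {
    return ev.contractor;
  }
  throw new Error(`evidence ${ev.id}: missing contractor; route to quarantine`);
}
```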
## Specific findings
1. `scripts/distillation/scorer.ts:22` — `SCORER_VERSION` reads from `process.env`, introducing environment-dependent output drift that contradicts the file's “identical input → identical output forever” contract.
2. `scripts/distillation/export_sft.ts:138` — `(ev as any).contractor` is an unguarded `any` cast; a malformed `EvidenceRecord` will inject the string `"undefined"` or crash at runtime inside the SFT instruction template.
3. `scripts/distillation/export_sft.ts:235` — `new Bun.CryptoHasher("sha256")` is a Bun-only API; this path will fail under Node.js/Deno and makes the substrate non-portable.
4. `scripts/distillation/build_evidence_index.ts:89` — Same Bun crypto lock-in in `sha256OfFile`, fragmenting the hashing implementation (here `Bun.CryptoHasher`, elsewhere `canonicalSha256`).
5. `scripts/distillation/replay.ts:178` — Provider routing relies on fragile string heuristics (`model.includes("/")`, prefix lists); models with unexpected names will route to the wrong backend or hit the `ollama` default incorrectly.
6. `scripts/distillation/replay.ts:186` — ``fetch(`${gatewayUrl()}/v1/chat`)`` targets an endpoint absent from the provided `mode.rs` router; without the missing gateway dispatch code, this call will 404.
7. `crates/gateway/src/v1/mode.rs:141` — `deserialize_string_or_vec` uses `serde_json::Value::deserialize` against a TOML source, which is non-idiomatic and risks mishandling TOML-specific types (datetimes, inline tables) compared to a native `toml::Value`.
8. `scripts/distillation/build_evidence_index.ts:185` — `await canonicalSha256(row)` is async, yet `sha256OfFile` is sync; mixing sync and async crypto calls in the same module hints at inconsistent I/O boundaries.
## Direction recommendation
Keep the substrate architecture, but **do not expand staffing audit work on top of v1.0.0 until three blockers are fixed**: (1) replace `Bun.CryptoHasher` with portable WebCrypto or Node `crypto` so the build is runtime-agnostic; (2) align `replay.ts` to the actual gateway contract (`/v1/mode/execute`) or document the `/v1/chat` proxy route; and (3) eliminate `any` casts in the export path. The schema firewalls, deterministic scorer, and receipt provenance are the right foundation—rework the runtime/contract gaps rather than rebuilding from scratch.
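Blocker (2) could be staged by isolating endpoint construction in one pure function, so the contract fix lands in a single place. A sketch under stated assumptions: the `{ mode, input }` body shape is invented for illustration and is not the real gateway contract declared by `mode.rs`.

```typescript
// Hypothetical request builder targeting the route mode.rs actually
// declares (/v1/mode/execute) rather than the missing /v1/chat proxy.
// The JSON body shape { mode, input } is an assumption.
function buildModeExecuteRequest(gatewayUrl: string, mode: string, input: string) {
  return {
    url: `${gatewayUrl.replace(/\/+$/, "")}/v1/mode/execute`,
    method: "POST" as const,
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ mode, input }),
  };
}
```

With construction centralized, `replay.ts` would call `fetch(req.url, req)` and the string-heuristic provider routing (finding 5) could be audited at the same seam.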