profit 9c893fbb8c Auditor: hybrid fixture — found a pre-existing bug on first live run
auditor/fixtures/hybrid_38_40_45.ts — the never-before-run hybrid
test. Exercises Phase 38 /v1/chat → Phase 40 Langfuse → Phase 45
slice 1 seed+doc_refs → Phase 45 slice 2 bridge drift → (expected-
fail) Phase 45 slice 3 drift-check endpoint.

auditor/fixtures/cli.ts — standalone runner. Human-readable summary
to stderr, machine-readable JSON to stdout, exit code 0/1/2 for
pass / fail / partial_pass.

Live run results — honest measurements, not hand-waved:
  ✓ Phase 38     /v1/chat returns 9 visible tokens, 6.7s latency
                 ("docker run is a common Docker command.")
  ✓ Phase 40     Langfuse trace 18a8a0b7 landed in 2.5s
  ✗ Phase 45.1   seed endpoint returns empty reply — discovered a
                 PRE-EXISTING BUG unrelated to doc_refs:

                 playbook_memory.rs:257 UpsertOutcome has newtype
                 variants Added(String) and Noop(String) under
                 #[serde(tag="mode")]. serde cannot internally tag a
                 bare string, so serialization fails at runtime and
                 the handler panics:

                 panicked at crates/vectord/src/service.rs:2323:
                 Error("cannot serialize tagged newtype variant
                 UpsertOutcome::Added containing a string")

                 Reproduced: curl /seed with AND without doc_refs
                 both get "Empty reply from server" (socket closed
                 mid-response). This bug has existed since Phase 26
                 shipped (commit 640db8c, 2026-04-21). No test or
                 caller in the repo exercised the response path live
                 against the gateway until this fixture did.

  ✓ Phase 45.2   context7 bridge confirms drift: current hash
                 475a0396ca436bba vs our stale input, upstream last
                 updated 2026-04-20
  ✗ Phase 45.3   /doc_drift/check endpoint: unreachable, as expected,
                 because the Phase 45.1 seed failure (layer 3) left us
                 without a playbook_id; independent of that, the
                 endpoint does not exist yet
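
The Phase 45.1 finding above comes down to how internal tagging works: the tag is merged into the payload's own fields, so the payload must be a struct or map. The sketch below models that rule in TypeScript to show why a bare string fails. `tagInternally`, the `playbook_id` field placement, and the `"pb-example"` value are hypothetical illustrations; this is not the repo's code and not the fix tracked in task #11.

```typescript
// Hypothetical model of serde-style internal tagging, for illustration only.
function tagInternally(
  tagKey: string,
  variant: string,
  payload: unknown,
): Record<string, unknown> {
  // Internal tagging merges the tag into the payload's own fields,
  // so anything that is not a plain object has nowhere to put the tag.
  if (typeof payload !== "object" || payload === null || Array.isArray(payload)) {
    const kind =
      payload === null ? "null" : Array.isArray(payload) ? "array" : typeof payload;
    throw new Error(
      `cannot serialize tagged newtype variant ${variant} containing a ${kind}`,
    );
  }
  return { [tagKey]: variant, ...(payload as Record<string, unknown>) };
}

// A struct-like payload has fields, so the tag can sit next to them.
const tagged = tagInternally("mode", "Added", { playbook_id: "pb-example" });

// A bare string mirrors the failing Added(String) case.
let failure = "";
try {
  tagInternally("mode", "Added", "pb-example");
} catch (e) {
  failure = (e as Error).message;
}
```

Under this model, a variant serializes only when its payload carries named fields, which is why the struct-like payload succeeds and the string payload throws.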

Real numbers published: per-layer latency_ms, token counts,
trace_age_ms, library_id, current_hash_length. All stored in the
JSON output for downstream audit.
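
For readers consuming that JSON, here is a rough sketch of its shape, inferred only from the fields cli.ts reads. The real definition lives in hybrid_38_40_45.ts and presumably carries more fields (token counts, trace_age_ms, and so on). The `...Sketch` type names, the `summarize` helper, and the sample values are hypothetical and are not taken from the live run.

```typescript
// Inferred from cli.ts usage; field types are assumptions.
interface LayerResultSketch {
  phase: string | number;
  layer: string;
  ok: boolean;
  latency_ms: number;
  evidence?: string;
  error?: string;
}

interface FixtureResultSketch {
  fixture: string;
  overall: "pass" | "partial_pass" | "fail";
  shipped_phases: Array<string | number>;
  placeholder_phases: Array<string | number>;
  layers: LayerResultSketch[];
  notes: string[];
}

// Hypothetical downstream-audit helper: total latency plus the failed phases.
function summarize(result: FixtureResultSketch) {
  const totalLatencyMs = result.layers.reduce((sum, l) => sum + l.latency_ms, 0);
  const failedPhases = result.layers.filter((l) => !l.ok).map((l) => String(l.phase));
  return { totalLatencyMs, failedPhases };
}

// Made-up sample input, not the measured output.
const sample: FixtureResultSketch = {
  fixture: "hybrid_38_40_45",
  overall: "partial_pass",
  shipped_phases: ["38", "40"],
  placeholder_phases: ["45.3"],
  layers: [
    { phase: "38", layer: "chat", ok: true, latency_ms: 10, evidence: "placeholder" },
    { phase: "45.1", layer: "seed", ok: false, latency_ms: 20, error: "placeholder" },
  ],
  notes: [],
};

const summary = summarize(sample);
```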

Value delivered: the fixture's first live run found a bug that
unit tests, compile checks, and my own "phase shipped" commits all
missed. Exactly the gap J called out — the auditor is doing what
it's supposed to do.

Bug fix is a SEPARATE concern: new task #11 tracks a separate PR
(fix/upsert-outcome-serde) so the audit finding and the fix stay
cleanly attributed.
2026-04-22 03:34:20 -05:00

// Standalone runner for auditor fixtures. Invoke:
//
// bun run auditor/fixtures/cli.ts hybrid_38_40_45
//
// Prints human-readable per-layer breakdown + JSON result so the
// output can be captured by the dynamic check without re-running.
import { runHybridFixture, type FixtureResult } from "./hybrid_38_40_45.ts";
const fixtureArg = process.argv[2] ?? "hybrid_38_40_45";
let result: FixtureResult;
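switch (fixtureArg) {
  case "hybrid_38_40_45":
    result = await runHybridFixture();
    break;
  default:
    // Note: exit code 2 is also the partial_pass code below. An unknown
    // fixture prints no JSON on stdout, which is how callers can tell.
    console.error(`unknown fixture: ${fixtureArg}`);
    process.exit(2);
}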
// Human-readable summary
console.error(""); // blank line to stderr so stdout JSON stays clean
console.error(`─── Fixture: ${result.fixture} ───`);
console.error(` overall: ${result.overall.toUpperCase()}`);
console.error(` shipped: [${result.shipped_phases.join(", ") || "—"}]`);
console.error(` placeholder: [${result.placeholder_phases.join(", ") || "—"}]`);
console.error("");
for (const l of result.layers) {
  const mark = l.ok ? "✓" : "✗";
  const phaseStr = `Phase ${l.phase}`.padEnd(11);
  console.error(` ${mark} ${phaseStr} ${l.layer.padEnd(30)} ${String(l.latency_ms).padStart(6)}ms`);
  if (l.ok) {
    console.error(` ${l.evidence.slice(0, 200)}`);
  } else {
    console.error(` ERROR: ${l.error?.slice(0, 240) ?? "unknown"}`);
  }
}
console.error("");
for (const n of result.notes) console.error(` note: ${n}`);
console.error("");
// Machine-readable output on stdout.
console.log(JSON.stringify(result, null, 2));
// Exit code reflects overall: 0 pass, 2 partial, 1 fail.
// Dynamic check reads this AND the JSON; partial-pass is treated
// as informative (some layers shipped), not blocking on its own.
process.exit(result.overall === "pass" ? 0 : result.overall === "partial_pass" ? 2 : 1);
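
The exit-code contract above can be read from the consumer side as well. The following is a minimal sketch of how a caller might combine the exit code with the captured stdout, assuming it has both. `exitCodeFor` restates the mapping used by the runner, while `interpret` and the `"runner_error"` verdict are hypothetical. The actual dynamic check is not shown in this file.

```typescript
type Overall = "pass" | "partial_pass" | "fail";

// Same mapping as the runner's process.exit call: 0 pass, 2 partial, 1 fail.
function exitCodeFor(overall: Overall): number {
  return overall === "pass" ? 0 : overall === "partial_pass" ? 2 : 1;
}

// Hypothetical consumer: trust the exit code only when stdout agrees with it.
function interpret(
  code: number,
  stdout: string,
): { verdict: Overall | "runner_error"; blocking: boolean } {
  let overall: unknown;
  try {
    const parsed = JSON.parse(stdout);
    overall = parsed !== null && typeof parsed === "object" ? parsed.overall : undefined;
  } catch {
    overall = undefined; // empty or malformed stdout
  }
  const known = overall === "pass" || overall === "partial_pass" || overall === "fail";
  // Covers the unknown-fixture path, which exits 2 without printing JSON.
  if (!known || exitCodeFor(overall as Overall) !== code) {
    return { verdict: "runner_error", blocking: true };
  }
  // partial_pass is informative rather than blocking on its own.
  return { verdict: overall as Overall, blocking: overall === "fail" };
}

const passRun = interpret(0, '{"overall":"pass"}');
const partialRun = interpret(2, '{"overall":"partial_pass"}');
const failRun = interpret(1, '{"overall":"fail"}');
const unknownFixtureRun = interpret(2, "");
```

Cross-checking the code against the JSON is one way to keep the two meanings of exit code 2 apart without changing the runner.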