lakehouse/STATE_OF_PLAY.md
root 87b034f5f9 phase 1.6: ops dashboard + consent_versions allowlist + subject timeline tool
Closes the afternoon's "all four" wave (per J's request to do all the
items in one pass instead of picking one of the options):

(1) Live demo on WORKER-100 — full lifecycle exercised end-to-end
    against the running gateway. 3 audit rows landed in correct
    order (consent_grant → biometric_collection →
    consent_withdrawal), chain_verified=true, photo on disk at
    data/biometric/uploads/WORKER-100/1778011967957907731_027b6bb1.jpg
    (180 bytes JFIF). retention_until=2026-06-04 (30d from
    withdrawal per consent template v1 §2).
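    The two retention clocks in play here (grant sets retention_until =
    given_at + 18mo per retention schedule v1 §4; withdrawal accelerates it
    to +30d per consent template v1 §2) reduce to simple UTC date math. A
    minimal sketch — function names are illustrative, not the Rust API:

    ```typescript
    // Withdrawal clock: consent template v1 §2 — 30 days from withdrawal.
    function retentionAfterWithdrawal(withdrawnAt: Date): Date {
      const d = new Date(withdrawnAt);
      d.setUTCDate(d.getUTCDate() + 30); // JS Date rolls month-end correctly
      return d;
    }

    // Grant clock: retention schedule v1 §4 — 18 months from consent grant.
    function retentionAfterGrant(givenAt: Date): Date {
      const d = new Date(givenAt);
      d.setUTCMonth(d.getUTCMonth() + 18);
      return d;
    }
    ```

    With the demo's withdrawal date of 2026-05-05 this lands on 2026-06-04,
    matching the value recorded above.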

(2) GET /biometric/stats — read-only aggregate over all subjects.
    Returns counts by biometric.status + subject.status, photo
    count, oldest_active_retention_until, and the last 20
    state-change events (consent_grant / collection / withdrawal /
    erasure — validator_lookup and other noise filtered out).
    Walks per-subject audit logs via the existing writer; cheap
    for 100 subjects, would want an event-stream index at 100k.
    Legal-tier auth (same posture as /audit). 4 unit tests.
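    The recent-events portion of the aggregate can be sketched as a filter
    over per-subject audit rows — kind names below follow the event list
    above; the shape is illustrative, not the actual stats handler:

    ```typescript
    // State-change kinds surfaced by /biometric/stats; validator_lookup and
    // other read-only noise is filtered out before the last-20 cut.
    const STATE_CHANGE = new Set([
      "biometric_consent_grant",
      "biometric_collection",
      "biometric_consent_withdrawal",
      "biometric_erasure",
    ]);

    function recentStateChanges(
      events: { kind: string; ts: string }[],
      limit = 20,
    ): { kind: string; ts: string }[] {
      return events
        .filter((e) => STATE_CHANGE.has(e.kind))
        .sort((a, b) => (a.ts < b.ts ? 1 : -1)) // newest first (ISO ts sorts lexically)
        .slice(0, limit);
    }
    ```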

(3) /biometric/dashboard mcp-server frontend. Auto-refreshes
    /biometric/stats every 15s, neo-brutalist tile layout for
    the per-status counts + retention horizon block + recent
    events table with kind badges + event-kind breakdown pills.
    sessionStorage-backed token; logout button clears state.
    DOM-built throughout (textContent + createElement) — never
    innerHTML on audit-row values, since trace_id et al. could
    in theory carry operator-supplied strings.

(4) consent_versions allowlist. BiometricEndpointState gains
    `allowed_consent_versions: Option<Arc<HashSet<String>>>`,
    loaded at startup from /etc/lakehouse/consent_versions.json
    (override via LH_CONSENT_VERSIONS_FILE). process_consent
    refuses unknown hashes with HTTP 400 consent_version_unknown
    when configured. Resolution semantics:
      - Missing file → permissive (v1 compat, warn-log)
      - Parse error → permissive (error-log; broken config
        silently going strict would be worse)
      - Empty array → strict, refuse all (deliberate freeze
        mode for "counsel hasn't signed v1 yet")
      - Populated → strict, lowercase-normalized comparison
    5 unit tests (known/unknown/case/empty/none-permissive).
    Example template at ops/consent_versions.example.json with
    a counsel-tier deployment note.
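    The resolution semantics above read as a small decision table. A
    TypeScript sketch of that table — the real implementation is Rust in
    `BiometricEndpointState`; names and shapes here are illustrative:

    ```typescript
    // raw = file contents, or null when the file is missing.
    function loadAllowlist(raw: string | null): { strict: boolean; set: Set<string> } {
      if (raw === null) {
        return { strict: false, set: new Set() }; // missing file → permissive (v1 compat)
      }
      let parsed: unknown;
      try {
        parsed = JSON.parse(raw);
      } catch {
        return { strict: false, set: new Set() }; // parse error → permissive, error-log
      }
      if (!Array.isArray(parsed)) {
        return { strict: false, set: new Set() }; // malformed shape treated like parse error
      }
      // Empty array → strict refuse-all (freeze mode); populated → strict,
      // lowercase-normalized comparison.
      return { strict: true, set: new Set(parsed.map((h) => String(h).toLowerCase())) };
    }

    function consentVersionAllowed(raw: string | null, hash: string): boolean {
      const { strict, set } = loadAllowlist(raw);
      return strict ? set.has(hash.toLowerCase()) : true;
    }
    ```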

(5) scripts/staffing/subject_timeline.sh — operator one-shot
    pretty-print of any subject's full BIPA lifecycle. Curls
    /audit/subject/{id} with legal token; renders manifest
    summary + on-disk photo state + chronological audit chain
    with kind badges + chain verification status. Smoke-tested
    on WORKER-100 (3 rows verified).

(6) STATE_OF_PLAY.md refresh. New section "afternoon wave"
    captures all four commits (76cb5ac, 7f0f500, 68d226c, this
    one) + the live demo evidence + the v1 endpoint matrix +
    UI/CLI inventory + the production-cutover blocking set
    (counsel calendar only — eng substrate is done).

Verified live post-restart:
- /audit/health + /biometric/health both 200
- /biometric/stats returns 100 subjects, 2 withdrawn (WORKER-2 from
  earlier scrum + WORKER-100 from today's demo), 1 photo on record,
  6 recent state-change events
- /biometric/intake + /biometric/withdraw + /biometric/dashboard
  all 200 on mcp-server :3700
- subject_timeline.sh on WORKER-100: chain_verified=true,
  chain_root=a47563ff937d50de…
- 88/88 catalogd lib tests + 55/55 biometric_endpoint tests green

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 15:27:52 -05:00


# STATE OF PLAY — Lakehouse
**Last verified:** 2026-05-05 morning CDT
**Verified by:** live probe (`/audit/health` 200, `/biometric/subject/{id}/erase` 21-test substrate + `/audit/subject/{id}` legal-tier endpoint live verified against WORKER-100; new `verify_biometric_erasure.sh` + `biometric_destruction_report.sh` + `bundle_counsel_packet.sh` smoke-tested clean against live data) — not memory.
> **Read this FIRST.** When the user says "we're working on lakehouse," they mean the working code captured below — NOT what `git log` framed as "the cutover" or what memory snapshots from 2 days ago suggest. If memory contradicts this file, this file wins. Update it when something is verified working — not when a phase finishes.
---
## WHAT LANDED 2026-05-05 (afternoon wave — full BIPA lifecycle endpoints + UIs + ops dashboard)
After the morning's doc-reconciliation wave (described below), the afternoon wave shipped the full **operational lifecycle** for biometric data. Phase 1.6 now has the complete end-to-end runtime: candidate consent flow → photo upload → withdrawal → retention sweep flag → erase. Five commits, all on `demo/post-pr11-polish-2026-04-28`.
| Commit | What | Verified |
|---|---|---|
| `76cb5ac` | `POST /biometric/subject/{id}/consent` — Gate 2 backend. Flips status NeverCollected→Given, computes retention_until = given_at + 18mo (per retention schedule v1 §4), captures consent_version_hash + collection_method (closed set: electronic_signature/paper/click_acceptance) + operator + evidence path to audit row. State machine: 409 already_given, 409 post-withdrawal-requires-erase, 403 subject_inactive. | 12 unit tests; live POST returns 401/400/404 on guards |
| `7f0f500` | Candidate intake UI at `/biometric/intake?candidate_id=XXX` (mcp-server :3700, exposed via nginx at devop.live/lakehouse/biometric/intake). 4-screen flow: operator auth → consent template render + click-accept → photo capture (file or webcam getUserMedia) → confirmation showing audit hmac. SHA-256 of rendered consent block becomes consent_version_hash. sessionStorage-backed token (clears on tab close), neo-brutalist style matching onboard.html. | 22631-byte HTML, mcp-server route returns 200 |
| `68d226c` | Three things in one wave: <br>(a) `POST /biometric/subject/{id}/withdraw` — BIPA right of withdrawal. Flips Given→Withdrawn, accelerates retention_until from 18mo→30d (per consent template v1 §2 SLA), audit kind=biometric_consent_withdrawal. State machine 409s on NeverCollected/Pending (nothing_to_withdraw), Withdrawn (already_withdrawn), Expired (already_expired). 12 unit tests. <br>(b) Withdraw UI at `/biometric/withdraw` — 3-screen operator flow (token+name auth → reason+evidence form → confirmation showing 30-day clock + verify curl recipe). <br>(c) `lakehouse-retention-sweep.{service,timer}` systemd units in `ops/systemd/`. Daily 03:00 UTC, Persistent=true, install.sh updated to handle paired timer+oneshot service. <br>Plus operator_of_record bug fix in intake UI (was hardcoded `'intake_ui_operator'`). | 46/46 biometric_endpoint + 71/71 catalogd lib tests; manual sweep run: 100 subjects, 0 overdue, exit 0 |
| **(current HEAD post-this-wave)** | Stats endpoint `GET /biometric/stats` (legal-tier auth, returns subject counts by status + photo count + oldest active retention + last 20 state-change events with anonymizable trace_ids) + ops dashboard at `/biometric/dashboard` (single-page, polls /stats every 15s, table + status tiles, XSS-safe DOM construction not innerHTML). Plus consent_versions allowlist: `BiometricEndpointState.allowed_consent_versions: Option<Arc<HashSet<String>>>`, loaded from `/etc/lakehouse/consent_versions.json` (`LH_CONSENT_VERSIONS_FILE` override), missing-file = permissive (v1 compat), present + populated = strict mode (refuses unknown hashes with 400 consent_version_unknown). Plus `scripts/staffing/subject_timeline.sh` — pretty-prints any subject's full BIPA lifecycle from /audit/subject/{id} (manifest + on-disk photo + chronological audit chain + verification status). | 5 new allowlist unit tests + 4 stats tests; live demo on WORKER-100 ran end-to-end (consent → photo → withdraw, chain verified=true, chain_root=a47563ff…) |
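The `76cb5ac` consent state machine's guard set can be sketched as below. Status names follow the table; the exact wire error strings are assumptions here, not verified against the Rust handler:

```typescript
type ConsentStatus = "NeverCollected" | "Pending" | "Given" | "Withdrawn" | "Expired";

// Guard outcomes for POST /biometric/subject/{id}/consent (illustrative).
function consentGuard(
  status: ConsentStatus,
  subjectActive: boolean,
): { http: number; error?: string } {
  if (!subjectActive) return { http: 403, error: "subject_inactive" };
  switch (status) {
    case "Given":
      return { http: 409, error: "already_given" };
    case "Withdrawn":
      // Post-withdrawal re-consent requires an explicit erase first.
      return { http: 409, error: "post_withdrawal_requires_erase" };
    default:
      return { http: 200 }; // NeverCollected/Pending → Given
  }
}
```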
### Live demo evidence (WORKER-100, 2026-05-05 20:12 UTC)
The full lifecycle was exercised against the live gateway as a verification artifact. The audit chain on WORKER-100 now contains 3 rows:
```
20:12:33.054 BIOMETRIC_CONSENT_GRANT result=given hmac=9c6f4153341e97d2… trace=live-demo-2026-05-05
20:12:47.957 BIOMETRIC_COLLECTION result=success hmac=856be6173c88277c… trace=live-demo-2026-05-05
20:15:27.298 BIOMETRIC_CONSENT_WITHDRAWAL result=withdrawn hmac=a47563ff937d50de… trace=live-demo-2026-05-05
```
`chain_verified=true`, chain_root = a47563ff937d50de43b09a0c903cff954233836c219a928ee8ca2aa6792272dd. Photo file at `data/biometric/uploads/WORKER-100/1778011967957907731_027b6bb1.jpg` (180 bytes — a minimal real JFIF JPEG), retention_until=2026-06-04 (= 30 days from withdrawal). Retention sweep will flag this subject on or after that date; operator runs `/biometric/subject/WORKER-100/erase` to destroy.
To re-verify: `./scripts/staffing/subject_timeline.sh WORKER-100`.
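The mechanics behind `chain_verified` / `chain_root` can be sketched minimally: each row's HMAC covers the previous row's HMAC plus the row payload, so the tip commits to the entire history. This is a toy fold, not catalogd's actual canonical-JSON row format or keying:

```typescript
import { createHmac } from "node:crypto";

// tip = fold of HMAC(key, prev_tip || row) over rows, hex-encoded.
// Verification recomputes the fold and compares it to the stored chain_root.
function chainTip(key: Buffer, rows: string[]): string {
  let tip = "";
  for (const row of rows) {
    tip = createHmac("sha256", key).update(tip).update(row).digest("hex");
  }
  return tip;
}
```

Editing or reordering any earlier row changes every later tip, which is why a single matching `chain_root` on WORKER-100 vouches for all 3 rows at once.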
### Endpoint matrix (v1 BIPA lifecycle complete)
| Event | Endpoint | Method | Auth | Status flip | retention_until |
|---|---|---|---|---|---|
| consent given | `/biometric/subject/{id}/consent` | POST | legal | NeverCollected/Pending → Given | now + 18mo |
| photo collected | `/biometric/subject/{id}/photo` | POST | legal + consent gate | (no change) | (no change) |
| consent withdrawn | `/biometric/subject/{id}/withdraw` | POST | legal | Given → Withdrawn | now + 30d |
| destruction | `/biometric/subject/{id}/erase` | POST | legal | (manifest cleared) | n/a |
| audit read | `/audit/subject/{id}` | GET | legal | (read-only) | (read-only) |
| ops aggregates | `/biometric/stats` | GET | legal | (read-only) | (read-only) |
UIs:
- `/biometric/intake?candidate_id=X` — operator-driven consent + photo
- `/biometric/withdraw` — operator-driven withdrawal recording
- `/biometric/dashboard` — read-only ops aggregate, auto-refresh
CLI tools:
- `scripts/staffing/verify_biometric_erasure.sh <id>` — post-erasure verification
- `scripts/staffing/biometric_destruction_report.sh --month YYYY-MM` — anonymized monthly report
- `scripts/staffing/subject_timeline.sh <id>` — full lifecycle pretty-print (NEW 2026-05-05)
- `scripts/staffing/bundle_counsel_packet.sh` — counsel review tarball
- `scripts/staffing/attest_pre_identityd_biometric_state.sh` — defense attestation generator
### What's blocking production cutover NOW
**Counsel calendar.** Engineering substrate is done end-to-end: every state transition has a defensible endpoint, every endpoint has tests + live verification, every UI is reachable, retention sweep is scheduled, allowlist hardening is wired. The remaining work is signature/review:
1. Counsel review of consent template v1 (revised for Option C — classifications deferred)
2. Counsel review of retention schedule v1 (revised for Option C)
3. Counsel review of destruction runbook
4. Counsel + J signatures on §2 attestation
5. Once counsel signs the consent template, populate `/etc/lakehouse/consent_versions.json` with the signed hash to flip the gateway from permissive to strict mode
Counsel-review packet at `reports/counsel/counsel_packet_2026-05-05.tar.gz` (regenerable via `bundle_counsel_packet.sh` to pick up the latest doc state).
---
## WHAT LANDED 2026-05-05 (morning wave — doc reconciliation + Gate 3b decision + counsel packet)
This was a **doc-only wave**, not code. Background: J asked for an audit of the BIPA/biometric documentation before production cutover. Audit found moderate fragmentation between docs and shipped code (post-`identityd` collapse, post-Gate-3a-ship, pre-Gate-3b-decision). Closed it in one pass.
| Item | What changed | Status |
|---|---|---|
| **Gate 3b — DECIDED: Option C (defer classifications)** | `BiometricCollection.classifications` stays `Option<JSON> = None` in v1. `docs/specs/GATE_3B_DEEPFACE_DESIGN.md` status flipped from "draft / awaits product" to "DECIDED 2026-05-05". | Locked |
| **Endpoint-path drift** | `PHASE_1_6_BIPA_GATES.md` (3 spots), `BIPA_DESTRUCTION_RUNBOOK.md` (2 spots), `biometric_retention_schedule_v1.md` (1 spot) updated from legacy `/v1/identity/subjects/*` (proposed under separate identityd daemon, never shipped) to actual `/biometric/subject/*` (catalogd-local, shipped `848a458`). Schema block in `PHASE_1_6_BIPA_GATES.md` rewritten to reflect JSON `SubjectManifest.biometric_collection` substrate (not the proposed Postgres `subjects` table). | Reconciled |
| **Consent template + retention schedule** | Both revised for Option C: removed all "automated facial-classification" / "deepface" language so disclosed scope matches implemented scope. Pending counsel review — they were already eng-staged with ⚖ markers. | Eng-staged for counsel |
| **`scripts/staffing/verify_biometric_erasure.sh`** (NEW) | Operator-side verification of an erasure event. Curls `/audit/subject/{id}` with legal-tier token, checks: manifest.biometric_collection null, uploads dir empty, last audit row is `biometric_erasure`/`full_erasure` with `erased`/`success`, chain_verified=true. Writes a hashed report to `reports/biometric/`. | Smoke-tested live |
| **`scripts/staffing/biometric_destruction_report.sh`** (NEW) | Monthly destruction-event aggregation. Anonymizes candidate IDs (sha256-12 prefix), counts by scope + trigger, flags anomalies. Smoke-test on May 2026 data found 1 historical `biometric_erasure`/`consent_withdrawal` event (test fixture). | Smoke-tested live |
| **`docs/runbooks/LEGAL_AUDIT_KEY_ROTATION.md`** (NEW) | Captures the rotation procedure operationalized after the 2026-05-05 `/tmp` wipe incident. Covers: when to rotate, pre-rotation snapshot, atomic-swap procedure, post-rotation verification (incl. expected pre-rotation chain tamper-detect under new key), recovery from lost key, ⚖ counsel notes. | Authored |
| **`docs/counsel/COUNSEL_REVIEW_PACKET_2026-05-05.md`** + `bundle_counsel_packet.sh` (NEW) | Cover note bundling all eng-staged BIPA docs for counsel review with per-doc questions, sign-off checklist, recommended review sequence. Bundler script tarballs the 8 referenced files + emits a SHA-256 manifest. Tarball ready for transmission: `reports/counsel/counsel_packet_2026-05-05.tar.gz`. | Bundled, ready to send |
### Eng follow-up that this wave surfaced
- **Double-upload file leak — DIAGNOSED + FIXED** (2026-05-05 same wave). `verify_biometric_erasure.sh` smoked WORKER-2 and surfaced a stranded photo file. Investigation showed:
  - The file was 13 bytes of test fixture (`ff d8 ff d9 + ASCII "TESTBYTES"`), byte-identical to the unit-test fixture at `biometric_endpoint.rs:841`. NO PII, NO biometric content, NO synthetic-face content. Came from manual integration testing on 2026-05-03.
  - Audit log timeline showed two consecutive uploads (09:54, 10:04) followed by one erasure (10:22). The erasure unlinked only the SECOND file (which the manifest pointed at by then); the first file was orphaned because the second upload had silently overwritten `manifest.data_path`.
  - **Real bug found**: the upload handler did NOT refuse a second upload to a subject with `biometric_collection.is_some()`. Patched `process_upload` to return HTTP 409 + `error: "biometric_already_collected"` when a re-upload is attempted; operator must explicitly POST `/biometric/subject/{id}/erase` first.
  - Stranded test file removed (`rm` of the 13-byte fixture).
  - New unit test `second_upload_without_erase_returns_409` asserts the 409 + that the first photo's data_path remains unchanged + that the first file remains untouched on disk.
  - Existing `repeated_uploads_grow_the_chain` replaced with `upload_erase_upload_grows_the_chain_cleanly` (covers the legitimate re-collection cycle: upload → erase → upload, chain grows to 3 rows).
  - Existing `content_type_with_parameters_accepted` test updated to use two distinct subjects (it had used one subject for two uploads to test content-type parsing — now would 409).
  - **22 biometric_endpoint tests + 59 catalogd lib tests all green** post-patch (was 21+58 pre-patch).
  - Production posture: gateway binary needs rebuild (`cargo build --release`) + `systemctl restart lakehouse.service` to pick up the new 409 behavior in live traffic.
- **Pre-rotation chain tamper-detect (expected, not a bug).** WORKER-{1..5} had pre-2026-05-05 audit chains under the prior `LH_SUBJECT_AUDIT_KEY`. Under the new key (post-`/tmp` wipe rotation), those chains correctly tamper-detect. The rotation runbook §4.4 documents this as expected; a §2.2 pre-rotation snapshot is what would prove they were intact pre-rotation if defensibility ever needs it.
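The double-upload fix reduces to a single guard in the upload path — sketched here with illustrative names (the real check is Rust, `biometric_collection.is_some()` in `process_upload`):

```typescript
type Manifest = { biometric_collection: { data_path: string } | null };

// Refuse a second upload while a collection is already on record; the
// operator must erase first, so no manifest pointer is ever silently
// overwritten and no file is orphaned.
function uploadGuard(m: Manifest): { http: number; error?: string } {
  if (m.biometric_collection !== null) {
    return { http: 409, error: "biometric_already_collected" };
  }
  return { http: 200 };
}
```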
### What's blocking production cutover NOW (after this wave)
- **Counsel calendar:** the four sign-off items in `COUNSEL_REVIEW_PACKET_2026-05-05.md` (retention schedule, consent template, destruction runbook, pre-identityd attestation). The packet tarball is ready; ⚖ counsel is the bottleneck.
- **Nothing else.** Engineering is no longer the long pole.
### Phase 1.6 BIPA gates — status table (this is the final post-Option-C state)
| # | Gate | Status |
|---|---|---|
| 1 | Public retention schedule | **eng-staged**, revised for Option C, ready for counsel |
| 2 | Informed consent template | **eng-staged**, revised for Option C, ready for counsel |
| 3a | Photo upload endpoint | **DONE** (shipped `f1fa6e4`, 11 unit tests, live verified) |
| 3b | Deepface classification | **DECIDED 2026-05-05: Option C (defer)** |
| 4 | Name → ethnicity inference removal | **DONE** (shipped, 4/4 mcp-server tests pass) |
| 5 | Destruction runbook + erasure endpoint | **eng-DONE** (`848a458`, 21 tests). Runbook scripts (verify + report) shipped 2026-05-05. Counsel review pending. |
| §2 | Pre-identityd attestation | **eng-DONE** (3/3 evidence checks). Awaits J + counsel signature. |
| §3 | Employee training | **deferred** (consolidated into runbook §7 acknowledgment for current operator population) |
---
## WHAT LANDED 2026-05-03 (16 commits this wave — local-first audit substrate + Phase 1.6 BIPA gates)
The dominant work today: **`docs/specs/SUBJECT_MANIFESTS_ON_CATALOGD.md` Steps 1-8 SHIPPED end-to-end** + **5 of 7 Phase 1.6 BIPA pre-launch gates** + **6th cross-runtime parity probe**. Wave was structured as eight ship-then-scrum cycles — every wave caught real bugs, every fix wave landed within the same session.
| Commit | What | Verified |
|---|---|---|
| `d259909` | catalogd: Step 1 — `SubjectManifest` type + Registry CRUD | 17 catalogd subject tests PASS |
| `d16131b` | catalogd: Step 2 — `SubjectAuditWriter` HMAC-SHA256 chain + per-subject Mutex + canonical-JSON via BTreeMap | tamper-detection + concurrent-append race tests PASS |
| `bce6dfd` | catalogd: Step 3 — `bin/backfill_subjects` (BIPA-defensible defaults: vertical=unknown, consent=pending_backfill_review, retention=4yr) | 100 subjects loaded into live catalog |
| `fef1efd` | gateway: Step 4 — wire SubjectAuditWriter into `/v1/chat` tool dispatch + `audit_subject_hits_in` (inline, not spawn) | tool calls log accessor.kind="gateway_lookup" |
| `cd8c59a` | gateway: Step 5 — `AuditingWorkerLookup` decorator wraps validator's WorkerLookup; spawns audit on every find() | live `/v1/validate` produces audit rows |
| `e38f357` | subjects Steps 1-4 fixes from cross-lineage scrum (concurrency race, schema-evolution HMAC drift, hardcoded "success" classifier) | 41 tests green |
| `15cfd76` | catalogd + gateway: Step 6 — `/audit/subject/{id}` legal-tier endpoint with constant-time-eq token check + tampering detection | live curl returns chain_verified=true |
| `2a4b316` | subjects 2nd scrum fix wave (token min 16→32, chain_root from full chain via `chain_tip()`, rebuild collision warn, tightened result-state heuristic) | 17 catalogd + 6 gateway tests PASS |
| `8fc6238` | catalogd: Step 7 — `bin/retention_sweep` (BIPA-aware on biometric clock, idempotent across daily runs, no auto-mutation) | 8 sweep tests PASS, live verified at `--as-of 2031-06-01` flagging 100/100 expired |
| `2413c96` | catalogd: Step 8 — `bin/parity_subject_audit` (Rust side of cross-runtime parity probe) | known-answer + verify modes match Go byte-for-byte |
| `2222227` | parity helper hardening (panic-noise → die() helper, abs path stripped from doc) from scrum | parity probe still 6/6 |
| `4708717` | **Phase 1.6 BIPA wave** — Gate 4 absence test (4/4 with bypass coverage), §2 attestation script (3/3 evidence checks), Gate 1/2/5 doc scaffolds with ⚖ COUNSEL markers | 4/4 mcp-server Bun tests, 3/3 evidence on live data |
| `c7aa607` | Phase 1.6 scrum fixes — schema fingerprint hashes name+type+nullable, Gate 4 catches object-literal + class-field bypasses, pyarrow dep gate, item 7 deferral rationale | 4/4 + 3/3 still pass |
| `f1fa6e4` | **Phase 1.6 Gate 3a** — `crates/catalogd/src/biometric_endpoint.rs`: `POST /biometric/subject/{id}/photo` with consent gate, quarantined storage (mode 0700/0600), audit chain link, `BiometricCollection` field on SubjectManifest | 11 unit tests PASS, live roundtrip 200 |
| `3708e6a` | Gate 3a scrum fixes — transactional rollback on audit failure (BIPA convergent BLOCK), Content-Type parameter handling, relative data_path, ts+uuid filename, dead code removed | 11 tests + cross-runtime parity 6/6 |
| `7e0112b` | retention_sweep: stray indent fix on biometric_collection field | sweep tests still 8/8 |
| `848a458` | **Phase 1.6 Gate 5** — `POST /biometric/subject/{id}/erase` per BIPA destruction runbook. Two scopes (biometric_only / full); audit row appended BEFORE photo unlink so the chain has legal proof of intent even if file delete fails; manifest rolled back on audit failure. Trigger taxonomy: retention_expiry / consent_withdrawal / rtbf / court_order. | 21 unit tests (10 erasure-specific) PASS |
| `8ec43e0` | **Phase 1.6 Gate 3b** — deepface integration design doc (Option A subprocess / Option B ONNX-in-Rust / **Option C defer**). Recommends C: BIPA-safest, classifications field stays None, all load-bearing surfaces (consent + audit + retention + erasure) ship without it. Forces "do we actually need classifications" to be answered by product, not spec inertia. | doc-only |
**Cross-runtime parity (post-this-wave):** 6 probes, 38/38 byte-identical assertions —
`validator(6/6) + extract_json(12/12) + session_log(4/4) + materializer(2/2) + embed(8/8) + subject_audit(6/6)`.
Run `cd /home/profit/golangLAKEHOUSE && for p in scripts/cutover/parity/*.sh; do bash "$p"; done` to re-verify.
**Three runtime-divergence classes caught + fixed by the parity probe authoring loop** (cataloged because they recur):
1. Go `omitempty` on string fields strips empty values that Rust serde always emits → broken HMAC
2. `time.RFC3339Nano` truncates trailing-zero nanoseconds where chrono `AutoSi` keeps 9 digits → broken HMAC
3. Go `json.Marshal` HTML-escapes `<>&` where serde keeps literal → broken HMAC on any field with those chars
All three have regression-locked tests; structural impossibility going forward.
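All three classes share one root cause: the HMAC is computed over serialized bytes, so any byte-level divergence between runtimes breaks parity even when the JSON is semantically equal. A minimal illustration of class 1 (the omitted-empty-field case; field names and key are made up for the demo):

```typescript
import { createHmac } from "node:crypto";

// Two renderings of the same event: one emits the empty `result` field
// (serde-style), one drops it (Go omitempty-style).
const withEmpty = JSON.stringify({ kind: "lookup", result: "" }); // {"kind":"lookup","result":""}
const dropped = JSON.stringify({ kind: "lookup" });               // {"kind":"lookup"}

const key = Buffer.from("demo-key-not-the-real-one");
const hmacOf = (s: string) => createHmac("sha256", key).update(s).digest("hex");

// Same event, different bytes ⇒ different HMACs ⇒ broken chain parity.
const divergent = hmacOf(withEmpty) !== hmacOf(dropped);
```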
**Phase 1.6 BIPA pre-launch gates — status table:**
| Item | Status | Evidence |
|---|---|---|
| Gate 1 — public retention schedule | eng-staged, ⚖ counsel pending | `docs/policies/consent/biometric_retention_schedule_v1.md` |
| Gate 2 — informed consent template | eng-staged, ⚖ counsel pending | `docs/policies/consent/biometric_consent_template_v1.md` |
| Gate 3a — photo-upload endpoint | **DONE** | 11 unit tests + live `POST /biometric/subject/{id}/photo` |
| Gate 3b — deepface classification | **design doc shipped** (`8ec43e0`) — recommends Option C (defer); awaits product confirmation | `docs/PHASE_1_6_BIPA_GATES.md` Gate 3b section |
| Gate 4 — name→ethnicity removal | **DONE** | `mcp-server/phase_1_6_gate_4.test.ts` 4/4 with bypass coverage |
| Gate 5 — destruction runbook + erasure endpoint | **eng-DONE** (`848a458`); ⚖ counsel review of runbook still pending | `docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md` + `POST /biometric/subject/{id}/erase` (21 tests) |
| §2 cryptographic attestation | eng-DONE, signature pending | `docs/attestations/BIPA_PRE_IDENTITYD_ATTESTATION_2026-05-03.md` (SHA-256 evidence hash, 3/3 checks pass on live data) |
| §3 employee training | deferred | conditional on operator population size |
**Calendar bottleneck:** counsel review of items 1/2/5-runbook/§2 attestation. Engineering long pole is Gate 3b (deepface) — design doc landed (`8ec43e0`); needs product confirmation that classifications are required before engineering starts. Recommendation in doc is Option C (defer) on BIPA-safety grounds.
**Operational state:**
- `LH_SUBJECT_AUDIT_KEY=/etc/lakehouse/subject_audit.key` (32-byte HMAC signing key, mode 0400) loaded into systemd unit. **Moved off /tmp 2026-05-05** — /tmp wipes on reboot, which on May 5 disabled `/audit` + `/biometric` endpoints (gateway fails-closed at `crates/gateway/src/main.rs:459` if signing key is absent). Persistent path is per spec line 112.
- `LH_LEGAL_AUDIT_TOKEN_FILE=/etc/lakehouse/legal_audit.token` (44-char legal-tier token, mode 0400) loaded into systemd unit
- **Key rotation 2026-05-05:** prior key was lost when /tmp wiped on reboot. New key generated at canonical path. The 5 pre-rotation audit chains for `WORKER-{1..5}` (backfill data with `consent=pending_backfill_review`) will tamper-detect under the new key — expected and correct behavior on key rotation, not a bug. New chain entries from 2026-05-05 forward verify cleanly.
- `data/_catalog/subjects/` holds 100 backfilled `WORKER-N.json` manifests + per-subject `WORKER-N.audit.jsonl` HMAC chains
- `data/biometric/uploads/<safe_id>/<ts>_<uuid>.<ext>` quarantined photo storage (mode 0700 dir / 0600 file). 2 photos uploaded for WORKER-2 during live verify.
- `/audit/subject/{id}` mounted on gateway with chain_verified=true on every probe
- `/biometric/subject/{id}/photo` mounted on gateway, returns 403 unless `consent.biometric.status="given"`
---
## WHAT LANDED 2026-05-01 → 2026-05-02 (10 commits — Lance gauntlet + cross-runtime parity wave)
| Commit | What | Verified |
|---|---|---|
| `5d30b3d` | lance: auto-build doc_id btree in `lance_migrate` handler | doc-fetch ~5ms (was ~100ms full scan) on scale_test_10m |
| `044650a` | lance-bench: same scalar build post-IVF (matches gateway) | cargo check clean |
| `7594725` | lance: 4-pack — `sanitize_lance_err` + 7 unit tests + 9-probe smoke + 10M re-bench | smoke 9/9 PASS, tests 7/7 PASS |
| `98b6647` | gateway: `IterateResponse.trace_id` echoed; session_log_path enabled | parity probes see one unified JSONL |
| `57bde63` | gateway: trace-id propagation + coordinator session JSONL (Rust parity with Go wave) | session_log_parity 4/4 |
| `ba928b1` | aibridge: drop Python sidecar from hot path; AiClient → direct Ollama | aibridge tests 32/32 PASS, /ai/embed live 768d |
| `654797a` | gateway: pub `extract_json` + `parity_extract_json` bin | extract_json_parity 12/12 |
| `c5654d4` | docs: pointer to `golangLAKEHOUSE/docs/ARCHITECTURE_COMPARISON.md` | — |
| `150cc3b` | aibridge: LRU embed cache, 236× RPS warm (78ms → 129us p50) | load test |
| `9eed982` | mcp-server: /_go/* pass-through for G5 cutover slice | — |
| `6e34ef7` | gitignore: stop tracking 100+ runtime ephemera (data/_*, lance, logs, node_modules) | untracked dropped 100+ → 0 |
| `41b0a99` | chore: add 33 real items that were sitting untracked (scripts, scenarios, kimi reports, dev UIs) | clean working tree |
**Cross-runtime parity (post-this-wave):** 32/32 across 5 probes — `validator(6/6) + extract_json(12/12) + session_log(4/4) + materializer(2/2) + embed(8/8)`. Run `cd /home/profit/golangLAKEHOUSE && for p in scripts/cutover/parity/*.sh; do bash "$p"; done` to re-verify.
**Lance backend (was untested 5 days ago, now gauntlet-ready):**
- `cargo test -p vectord-lance --release` → 7/7 PASS
- `./scripts/lance_smoke.sh` → 9/9 PASS against live gateway
- `reports/lance_10m_rebench_2026-05-02.md` — search warm ~20ms / cold ~46ms median, doc-fetch ~5ms post-btree
---
## VERIFIED WORKING RIGHT NOW
### The client demo (Staffing Co-Pilot)
**Public URL:** `https://devop.live/lakehouse/` — 200, "Staffing Co-Pilot" (159 KB SPA, leaflet maps, dark theme).
**Local URL:** `http://localhost:3700/` — same page, served by `mcp-server/index.ts` (PID 1271, started 09:48 CDT today).
**The staffers console** (the one the client was thoroughly impressed with):
- `https://devop.live/lakehouse/console` — 200, "Lakehouse — What Your Staffing System Would Do" (26 KB)
- Pulls project index via `/api/catalog/datasets` (36 datasets) + playbook memory via `/api/vectors/playbook_memory/stats` (4,701 entries with embeddings, real ops like *"fill: Maintenance Tech x2 in Milwaukee, WI"*)
Client-visible flow that works end-to-end on the public URL:
| Endpoint | Sample output |
|---|---|
| `GET /api/catalog/datasets` | 36 datasets indexed: timesheets 1M, call_log 800K, workers_500k 500K, email_log 500K, workers_100k 100K, candidates 100K, placements 50K, job_orders 15K, successful_playbooks_live 2,077 |
| `GET /api/vectors/playbook_memory/stats` | 4,701 fill operations with embeddings |
| `GET /system/summary` | 36 datasets, 2.98M rows, 60 indexes, 500K workers loaded, 1K candidates |
| `POST /intelligence/staffing_forecast` | 744 Production Workers needed in 30d, 11,281 bench (4,687 reliable), coverage 1,444%, risk=ok. Same for Electrician (need 32, bench 2,440) and Maintenance Tech (need 17, bench 5,004). |
| `POST /intelligence/permit_contracts` | permit `3442956` $500K → 3 Production Workers, 886-candidate pool, 95% fill, $36K gross. 5 more Chicago permits with 8 workers each, same pool, 95% fill, $96K each. |
| `POST /intelligence/market` | major Chicago permits ranked: $730M O'Hare, $615M 307 N Michigan, $580M casino, $445M Loop transit (real geo coords). |
| `POST /intelligence/permit_entities` | architects + contractors from permit contacts (e.g. "KACPRZYNSKI, ANDY", "SLS ELECTRICAL SERVICE"). |
| `POST /intelligence/activity` + `/intelligence/arch_signals` + `/intelligence/chat` | all 200 |
The demo tells the story: *"upcoming Chicago contracts → workers needed → coverage from the bench → architects/contractors involved → revenue and margin."* That's the "live data + anticipating contracts + complete workflow" pitch — working as of right now.
### Backend, verified live this session
| Surface | State |
|---|---|
| Gateway `:3100` | up, 4 providers configured, `/v1/health` 200 with 500K workers loaded |
| MCP server `:3700` (Co-Pilot demo) | up, all `/intelligence/*` endpoints respond |
| VCP UI `:3950` | started this session, `/data/*` 200, real numbers |
| Observer `:3800` | ring full (2,000/2,000) — older events evicted, query Langfuse for 24h-ago state |
| Sidecar `:3200` | up |
| Langfuse `:3001` | recording, `gw:/log` + `v1.chat:openrouter` traces visible |
| LLM Team UI `:5000` | up, only `extract` mode registered |
| OpenCode fleet | **40 models reachable through one `sk-*` key** (verified live `GET https://opencode.ai/zen/v1/models`) |
OpenCode catalog (live):
- Claude: opus-4-7, opus-4-6, opus-4-5, opus-4-1, sonnet-4-6, sonnet-4-5, sonnet-4, haiku-4-5
- GPT-5: 5.5-pro, 5.5, 5.4-pro, 5.4, 5.4-mini, 5.4-nano, 5.3-codex-spark, 5.3-codex, 5.2, 5.2-codex, 5.1-codex-max, 5.1-codex, 5.1-codex-mini, 5.1, 5-codex, 5-nano, 5
- Gemini: 3.1-pro, 3-flash
- GLM: 5.1, 5
- Minimax: m2.7, m2.5
- Kimi: k2.6, k2.5
- Qwen: 3.6-plus, 3.5-plus
- Other: BIG-PKL (was a typo-prone name in the catalog, model id starts with "big-pkl-something")
- Free tier: minimax-m2.5-free, hy3-preview-free, ling-2.6-flash-free, trinity-large-preview-free
### The substrate (frozen — do not re-architect)
- Distillation v1.0.0 at tag `e7636f2` — **145/145 bun tests pass, 22/22 acceptance, 16/16 audit-full**
- Output: `data/_kb/distilled_{facts,procedures,config_hints}.jsonl` + `data/vectors/distilled_{factual,procedural,config_hint}_v20260423102847.parquet`
- Auditor cross-lineage: Kimi K2.6 ↔ Haiku 4.5 alternation, Opus auto-promote on diffs >100k chars, **per-PR cap=3 with auto-reset on new head SHA**
- Pathway memory: 88 traces, 11/11 successful replays (probation gate crossed)
- Mode runner: 5 native modes; `codereview_isolation` is default; composed-corpus auto-downgrade verified Apr 26 (composed lost 5/5 vs isolation, p=0.031)
### Matrix indexer
30+ live corpora including:
- 5 versions of `workers_500k_v1..v9` (50K embedded chunks each)
- 11 batched 2K-row shards `w500k_b3..b17`
- `chicago_permits_v1` (3,420), `resumes_100k_v2` (100K candidates), `ethereal_workers_v1` (10K)
- `lakehouse_arch_v1` (2,119), `lakehouse_symbols_v1` (2,470), `lakehouse_answers_v1` (1,269), `scrum_findings_v1` (1,260)
- `kb_team_runs_v1` (12,693) + `kb_team_runs_agent` (4,407) — LLM-team play history embedded
- `distilled_factual_v20260423102507` (8) — distillation output
### Code health
- `cargo check --workspace`: **0 warnings, 0 errors**
- `bun test auditor + tests/distillation`: **145/145 pass**
- `ui/server.ts` + `auditor.ts` bundle clean
---
## DO NOT RELITIGATE
- **PR #11 is merged into `origin/main` as `ed57eda`** — do not "still need to merge PR #11."
- **Distillation tag `distillation-v1.0.0` at `e7636f2` is FROZEN** — do not re-architect schemas, scorer rules, audit fixtures.
- **Kimi forensic HOLD verdict (2026-04-27) was 2/8 false + 6/8 latent** — do not re-debate, see `reports/kimi/audit-last-week-full.md`.
- **`candidates_safe` `vertical` column bug** — fixed at catalog metadata layer in commit `c3c9c21`. Do not "discover" it again.
- **Decisions A/B/C/D from `synthetic-data-gap-report.md`** — all four scripts shipped today (`d56f08e`, `940737d`, `c3c9c21`). Do not "ask J for approval."
- **`workers_500k.phone` type fixup** — already string. The fixup script is idempotent; running it is a no-op.
- **`client_workerskjkk` typo dataset** — was breaking every SQL query (catalog had it registered, file didn't exist). Removed via `DELETE /catalog/datasets/by-name/client_workerskjkk` this session. Do not re-add. Adding a startup gate that errors on unrecognized parquet names is the long-term fix per now.md Step 2C.
- **Python sidecar dropped from hot path 2026-05-02 (`ba928b1`)** — AiClient calls Ollama directly. Do not "wire python embedding back in." `lab_ui.py` + `pipeline_lab.py` keep running as dev-only UIs (not on the runtime path).
- **Lance backend gauntlet (2026-05-02)** — sanitizer over all 5 routes, 7 unit tests, 9-probe smoke, 10M re-bench. The `doc_id` btree auto-builds inside `lance_migrate` AND `lance-bench`. Do not "discover" the missing scalar index again or the leaked filesystem paths in error bodies.
- **Cross-runtime parity = 32/32** across 5 probes in `golangLAKEHOUSE/scripts/cutover/parity/`. Do not "build a parity probe for X" without checking — validator, extract_json, session_log, materializer, and embed are all already covered.
- **Decisions tracker is `golangLAKEHOUSE/docs/ARCHITECTURE_COMPARISON.md`** — single living source of truth for cross-runtime decisions. As of 2026-05-02 it has 0 `_open_` code work items; only 2 strategic items left (Lance vs Parquet+HNSW-with-spilling, Go-vs-Rust primary cutover).
- **PRD line 70 is load-bearing — "Everything runs locally, no cloud APIs."** Yesterday's PR #13 violated this by routing customer hot-path inference to opencode/openrouter/ollama_cloud. **REVERTED 2026-05-03 (`d054c0b`).** The customer hot path (modes.toml staffing_inference, doc_drift_check; execution_loop overseer escalation) is now local Ollama (qwen3.5:latest). Cloud providers stay configured in providers.toml for **explicit dev-tool opt-in only** (scrum, auditor, bot/propose). Do NOT re-add cloud models to the hot path defaults — the customer demo runs on local + free.
- **`/v1/usage` shows by_provider=ollama only** for any customer-shape request. If you see by_provider including kimi/openrouter/opencode/ollama_cloud for normal /v1/iterate or /v1/respond traffic, something has regressed. Verify with: `curl -sf http://127.0.0.1:3100/v1/usage | jq .by_provider`.
- **`./scrum` is a TOOL, not architecture.** Lives at repo root. J runs it manually. Outputs findings to `data/_kb/scrum_findings.jsonl` (the KB). Findings inform development; they do NOT auto-fold into PRDs/design docs/code. If you find yourself proposing to "wire scrum into [some pipeline]" — stop. It's J's tool.
- **Test code in main is ACTIVELY being cleaned out.** 2026-05-03 commits `6aafd41` + `f4ebd22` removed 12 orphaned dev experiments (~2900 LOC) from `tests/real-world/` and `scripts/`. If you find more zero-reference experimental files, surface them — don't auto-delete unless the pattern is clearly a one-time-experiment-with-zero-consumers like the ones already removed.
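The `/v1/usage` provider check above is easy to script. A minimal sketch (the guard function is hypothetical; the key shape follows the `by_provider` object the endpoint returns):

```typescript
// Hypothetical guard: customer-shape traffic must only ever bill to ollama.
// Any other provider key under by_provider means the local-only hot path regressed.
function hotPathIsLocal(byProvider: Record<string, unknown>): boolean {
  return Object.keys(byProvider).every((p) => p === "ollama");
}

hotPathIsLocal({ ollama: { requests: 42 } });              // true — healthy
hotPathIsLocal({ ollama: {}, openrouter: { cost: 0.1 } }); // false — regression
```

Feed it the parsed output of `curl -sf http://127.0.0.1:3100/v1/usage | jq .by_provider`.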
---
## FIXES MADE THIS SESSION (2026-04-27 evening)
1. **`crates/gateway/src/v1/iterate.rs:93`** — `state` → `_state` (cleared the one cargo warning).
2. **`lakehouse-ui.service` (Dioxus)** — disabled. Was failing 7,242 times against a missing `target/dx/ui/debug/web/public` build dir. `systemctl stop && disable`.
3. **VCP UI on `:3950`** — started `bun run ui/server.ts` (PID 1162212, log `/tmp/lakehouse_ui.log`). `/data/*` endpoints now 200 with real data.
4. **`client_workerskjkk` catalog entry** — `DELETE /catalog/datasets/by-name/client_workerskjkk` removed the dead manifest. **This was the actual root cause** of `/system/summary` reporting `workers_500k_rows: 0` and the demo showing zero bench. Every SQL query was failing schema inference on the missing file before reaching its target table. Fixed → `workers_500k_rows: 500000`, `candidates_rows: 1000`, demo coverage flipped from "critical 0%" to actual percentages on devop.live/lakehouse.
## FIXES MADE THIS SESSION (2026-04-28 early — face pool)
5. **Synthetic StyleGAN face pool — 1000 faces, gender+race+age tagged.** `scripts/staffing/fetch_face_pool.py` fetches from thispersondoesnotexist.com; `scripts/staffing/tag_face_pool.py --min-age 22` runs deepface and excludes minors. `data/headshots/manifest.jsonl` now has gender (494 men / 458 women), race (caucasian 662 · east_asian 128 · hispanic 86 · middle_eastern 59 · black 14 · south_asian 3), age, and 48 minor exclusions. Server pool = 952 servable faces.
6. **`mcp-server/index.ts:1308` `/headshots/:key` route** — gender×race×age intersection bucketing with graceful fallback (gender-only → all). Same key always returns same face; different keys spread evenly.
7. **`/headshots/_thumbs/` pre-resized 384×384 webp** (~53× smaller: 587 KB → ~11 KB). Without this, 40-card grids overran Chrome's parallel-connection budget and ~75% of tiles never finished decoding. Generated via parallel ffmpeg (`xargs -P 8`); `.gitignore`d.
8. **`mcp-server/search.html` + `console.html`** — dropped `img.loading='lazy'`. With 11KB thumbs, eager load is cheap (~500KB for 50 cards) and avoids the off-screen race that lazy decode produced.
9. **ComfyUI on-demand uniqueness — `serve_imagegen.py:32`** added `seed` to `_cache_key()` (was caching by prompt only — 3 different worker seeds collapsed to 1 cached image). Verified: seed=839185194/195/196 → 3 distinct md5s.
10. **`mcp-server/index.ts:1234` `/headshots/generate/:key`** — ComfyUI hot-path that derives a deterministic-per-worker seed via djb2-style hash; cold ~1.5s, cached ~1ms. Worker prompt format: `professional corporate headshot portrait of a {age}-year-old {race} {gender}, {role}, neutral expression, plain studio background, soft natural lighting, sharp focus, photorealistic, dslr`. Cache at `data/headshots_gen/` (gitignored, regeneratable).
11. **Confidence-default name resolution** in `search.html`: `genderFor()` and `guessEthnicityFromFirstName()` lookup tables (FEMALE_NAMES, MALE_NAMES, NAMES_HISPANIC, NAMES_BLACK, NAMES_SOUTH_ASIAN, NAMES_EAST_ASIAN, NAMES_MIDDLE_EASTERN). Xavier → man+hispanic, Aisha → woman+black, etc. Every worker resolves to a face-pool bucket.
End-to-end verified: playwright run on `https://devop.live/lakehouse/?q=forklift+operators+IL` → 21/21 cards loaded, 0 broken, all 384×384 webp thumbs.
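The deterministic-seed idea behind fixes 9–10 can be sketched as follows (djb2 shown for illustration; the actual hash in `mcp-server/index.ts` may differ in detail):

```typescript
// Illustrative djb2 hash: the same worker key always yields the same 32-bit
// seed, so the ComfyUI cache key (prompt + seed) stays stable per worker while
// distinct workers get distinct seeds and therefore distinct cached images.
function djb2(s: string): number {
  let h = 5381;
  for (let i = 0; i < s.length; i++) {
    h = (Math.imul(h, 33) + s.charCodeAt(i)) >>> 0; // classic h*33 + c, kept in uint32
  }
  return h;
}

const seed = djb2("WORKER-100"); // deterministic: re-deriving gives the same seed
```

This is why adding `seed` to `_cache_key()` in fix 9 was necessary — without it, three distinct seeds still collapsed to one cached image.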
---
## ⚠ PRODUCTION-READY BLOCKER (2026-05-03)
**Audit-trail capability is the gate to client signature.** Smoke + parity tests prove the surface compiles; they do NOT prove an audit response can be produced for a specific person. Staffing client won't sign without defensible discrimination-claim response capability.
**Authoritative document:** `docs/AUDIT_TRAIL_PRD.md` — drafted 2026-05-03. Defines worked example (John Martinez at Warehouse B), the per-decision audit row schema, the surface map of where decisions happen today, current-state-vs-target gap table, and 9-phase implementation sequence.
**Phase 1 (discovery walk) requires NO J approval — it's read-only.** Phases 2+ have explicit open questions in §10 of the PRD that need J's call before they can start.
Until phase 9 exit criterion is met, **do not claim "production-ready" on customer-facing surfaces.** Internal substrate (Lance, sidecar drop, parity probes) is solid; subject-of-record audit story is not.
---
## OPEN — but not blocking the demo
| Item | What | When to act |
|---|---|---|
| `modes.toml` `staffing_inference.matrix_corpus` | still says `workers_500k_v8`. v9 in vector index is from Apr 17 (raw-sourced, not safe-view). The new `build_workers_v9.sh` rebuilds from `workers_safe`. | Run when you have 30+ min for the rebuild. |
| PRs #6, #7, #10 | All closed 2026-05-02 — superseded / empty / stale. PR #12 merged 2026-05-03 (`a5d9070`). PR #13 merged 2026-05-03 (`feb638e`). | Done. |
| `test/enrich-prd-pipeline` branch | 35 unmerged commits, includes more-evolved auditor/inference.ts (666 vs main's 580 lines), curation+fact-extractor wiring | Reconcile or formally archive — see `memory/project_unmerged_architecture_work.md`. |
| `federation-hnsw-trials` stash | Lance + S3/MinIO prototype, `aws-config` crate added, 708 insertions | Phase B from EXECUTION_PLAN.md — revisit when Parquet vector ceiling actually hurts. |
| `candidates` manifest drift | manifest 100K vs SQL 1K. Cosmetic. | Run a metadata resync if it matters. |
---
## RUNTIME CHEATSHEET
```bash
# Verify the demo (public + local both work)
curl -sS https://devop.live/lakehouse/ # Co-Pilot HTML
curl -sS https://devop.live/lakehouse/console # staffers console
curl -sS -X POST https://devop.live/lakehouse/intelligence/staffing_forecast \
-d '{}' -H 'content-type: application/json' \
| jq '.forecast[] | {role, demand_workers, bench_total, coverage_pct, risk}'
# Restart sequence (after Rust changes)
sudo systemctl restart lakehouse.service # gateway :3100
sudo systemctl restart lakehouse-auditor # auditor daemon
sudo systemctl restart lakehouse-observer # observer :3800
# UI bun on :3950 is NOT systemd-managed (lakehouse-ui.service is disabled).
# Restart manually: kill <pid>; nohup bun run ui/server.ts > /tmp/lakehouse_ui.log 2>&1 &
# Health checks
curl -sS http://localhost:3100/v1/health | jq # workers_count, providers
curl -sS http://localhost:3100/vectors/pathway/stats | jq
curl -sS http://localhost:3100/v1/usage | jq # since-restart cost
curl -sS http://localhost:3700/system/summary | jq # dataset counts
```
---
## VISION — what we're actually building (not what's done)
J's framing for the legacy staffing company:
- Pull live data, anticipate contracts based on Chicago permits → real architect/contractor associations, headcount, time period, money, scope.
- Hybrid + memory index → search large corpora cheaply.
- Email comes in → verify against contract; SMS comes in → alert when index changes.
- Real-time.
- Invent metrics nobody else has using the hybrid index.
- Next stage: workers download an app → geolocation clock-in → automatic responsiveness measurement, no user effort, with incentives for using it.
- Find people getting certificates (passive cert tracking).
- Pull union data → bring contracts that work for **employees**, not just employers.
- All metrics visible, nothing hidden, value-aligned with what each side actually needs.
If a future session is shaving away from this vision toward "fix the cutover" or "land Phase X," the vision wins. Phases are scaffolding for the vision, not the goal.
---
## CURRENT PLAN — fix the demo for the legacy staffing client
Built from playwright audit of the live demo (2026-04-27 evening). Each item ends in something the client can SEE, not internal cleanups.
**Demo state is anchored by git tag `demo-2026-04-27`** (commit `ed57eda`, the merge of PR #11). To restore code state: `git checkout demo-2026-04-27`. To restore runtime state: `DELETE /catalog/datasets/by-name/client_workerskjkk` (catalog hot-fix is not in git).
### P1 — Search box that actually filters (highest visible impact)
**Problem:** typing in `#sq` and pressing Enter fires `POST /intelligence/chat` with body `{"message":"<query>"}`. The state (`#sst`) and role (`#srl`) selects are ignored — never sent in the body. So every search returns a generic chat completion, never a SQL+vector hybrid filter against `workers_500k`. That is the "cached/generic response" the client sees.
**Fix:** in `mcp-server/search.html`, change the search-submit handler to call the real worker search endpoint with `{query, state, role, top_k}`. The MCP `search_workers` tool surface already exists; route the form there. Render returned worker rows in the existing card grid.
**Done when:** typing "forklift" + state IL + role "Forklift Operator" returns ≤ top_k IL Forklift Operators, and changing state to WI returns different workers.
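A minimal sketch of the request-body side of the P1 fix (field names assumed; confirm against the actual `search_workers` tool surface before wiring):

```typescript
// Build the worker-search body from the form controls. The key change vs the
// current handler: state/role are sent as filters instead of being dropped,
// and the target is the worker-search surface, not /intelligence/chat.
function buildSearchRequest(
  query: string,
  state: string,
  role: string,
  topK = 25,
): Record<string, unknown> {
  const body: Record<string, unknown> = { query, top_k: topK };
  if (state) body.state = state; // omit empty filters so server defaults apply
  if (role) body.role = role;
  return body;
}
```

The submit handler would then `fetch` the search endpoint with this JSON body and render the returned worker rows into the existing card grid.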
### P2 — Contractor-name click → `/contractor` profile page
**Problem:** clicking a contractor name in any rendered card stays on `/lakehouse/`. URL doesn't change.
**Fix:** wrap contractor names in `<a href="/contractor?name=<encoded>">`. The page `mcp-server/contractor.html` (14.8 KB, "Contractor Profile · Staffing Co-Pilot") already exists at `/contractor` and the data endpoint `/intelligence/contractor_profile` already returns rich data.
**Then check contractor.html actually shows:** full history of every record the database has on that contractor + heat map of locations underneath + relevant info (per J 2026-04-27). If the page is incomplete, finish it. Otherwise just wire the link.
**Done when:** clicking "KACPRZYNSKI, ANDY" opens a profile with: every Chicago permit they're contact_1 or contact_2 on, a leaflet map with markers for each address, and any matched workers from prior placements at their sites.
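The link-building half of P2 is one line, but the encoding matters because contractor names carry commas and spaces. A sketch (in `search.html` the `<a>` itself would be built via `createElement` + `textContent`, consistent with the no-innerHTML posture elsewhere in the stack):

```typescript
// Encode the contractor name so "KACPRZYNSKI, ANDY" survives the round trip
// to /contractor?name=... without breaking the query string.
function contractorHref(name: string): string {
  return "/contractor?name=" + encodeURIComponent(name);
}

contractorHref("KACPRZYNSKI, ANDY"); // "/contractor?name=KACPRZYNSKI%2C%20ANDY"
```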
### P3 — Substrate signal at the bottom shows the right numbers
**Problem:** J reports the bottom panel says "playbook memory empty, 80 traces 0 replies." Reality from the live endpoints: `/api/vectors/playbook_memory/stats` = 4,701 entries with embeddings; `/vectors/pathway/stats` = 88 traces, 11/11 replays.
**Fix:** find the renderer in search.html that builds the substrate signal panel; verify it's hitting the right endpoints and reading the right keys; fix shape mismatches.
**Done when:** bottom panel shows real numbers (4,701 playbooks, 88 traces, 11/11 replays) and references at least one specific recent operation from the playbook stats sample.
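One defensive pattern worth using while fixing the P3 renderer — tolerate key renames so a server-side shape change degrades visibly instead of silently showing 0 (key names below are assumptions; check the actual stats payloads):

```typescript
// Try several candidate keys for a stat; return null (render as "—") rather
// than a misleading 0 when none of them is a number.
function readStat(
  obj: Record<string, unknown>,
  ...keys: string[]
): number | null {
  for (const k of keys) {
    const v = obj[k];
    if (typeof v === "number") return v;
  }
  return null;
}

readStat({ entries: 4701 }, "entry_count", "entries"); // 4701
```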
### P4 — Top nav reflects today's architecture
**Problem:** Walkthrough/Architecture/Spec/Onboard/Alerts/Workspaces tabs all return 200 but content is from old architecture. Doesn't mention: gateway scratchpad, memory indexer, ranker, mode runner, OpenCode 40-model fleet, distillation substrate, auditor cross-lineage.
**Fix:** rewrite `mcp-server/proof.html` (or add a single new page "What's running" that replaces Architecture+Spec) to describe what's actually shipped as of `demo-2026-04-27`. Keep one architecture page, drop redundancy. Either complete or hide Onboard/Alerts/Workspaces — J's call which.
**Done when:** the architecture page tells a non-technical reader, in 2 minutes, what each piece does in coordinator-relatable terms ("intern that read every email", not "3-stage adversarial inference pipeline").
### P5 — Caching for the project-index build_signal (J flagged unfinished)
**Problem:** "we never finished our caching for project index build signal it's not pulling new information." Need to find what `build_signal` refers to. Likely a scrum/auditor signal that should rebuild the `lakehouse_arch_v1` corpus on commit but isn't wired up.
**Fix:** identify the build-signal pipeline (likely in `auditor/` or `crates/vectord/`), wire its emit to a corpus rebuild, verify by making a test commit and watching the new chunk appear in `/vectors/indexes` for `lakehouse_arch_v1`.
**Done when:** committing a new file to `crates/` causes `lakehouse_arch_v1` chunk_count to increase within N minutes.
### P0 — Anchor the demo state (DONE)
Tagged `ed57eda` as `demo-2026-04-27`. Future sessions: `git checkout demo-2026-04-27` to land in this exact code state.
---
## EXECUTION ORDER
1. **P1 first** — biggest visible bug, ~30-60 min
2. **P2 next** — contractor click is the second-biggest "doesn't work" the client sees, ~20 min if profile is mostly done
3. **P3** — small fix, big "looks alive" win
4. **P4** — biggest scope; might split across sessions
5. **P5** — feature work, only after the visible bugs are fixed
Each item commits independently with the format `demo: P<n> — <one-line>` so the commit log doubles as a progress journal. After each merge to main, re-tag `demo-latest` to point at the new HEAD.
Stop here and let J pick which item to start with. Do not silently extend scope.