diff --git a/STATE_OF_PLAY.md b/STATE_OF_PLAY.md
index 799dd1f..0923e72 100644
--- a/STATE_OF_PLAY.md
+++ b/STATE_OF_PLAY.md
@@ -1,13 +1,71 @@
 # STATE OF PLAY — Lakehouse
 
-**Last verified:** 2026-05-02 evening CDT
-**Verified by:** live probe (smoke 9/9, parity 32/32, gateway restarted), not memory.
+**Last verified:** 2026-05-03 evening CDT
+**Verified by:** live probe (gateway restarted 2x, all 11 catalogd subject tests + 11 biometric tests + 6 audit tests + 4 mcp-server Gate-4 tests green; cross-runtime parity 6/6 byte-identical against live audit logs; live curl roundtrip on /biometric returned 200 + chained audit row), not memory.
 
 > **Read this FIRST.** When the user says "we're working on lakehouse," they mean the working code captured below — NOT what `git log` framed as "the cutover" or what memory snapshots from 2 days ago suggest. If memory contradicts this file, this file wins. Update it when something is verified working — not when a phase finishes.
 
 ---
 
-## WHAT LANDED 2026-05-01 → 2026-05-02 (10 commits this wave)
+## WHAT LANDED 2026-05-03 (13 commits this wave — local-first audit substrate + Phase 1.6 BIPA gates)
+
+The dominant work today: **`docs/specs/SUBJECT_MANIFESTS_ON_CATALOGD.md` Steps 1-8 SHIPPED end-to-end** + **5 of 7 Phase 1.6 BIPA pre-launch gates** + **6th cross-runtime parity probe**. Wave was structured as eight ship-then-scrum cycles — every wave caught real bugs, every fix wave landed within the same session.
+
+| Commit | What | Verified |
+|---|---|---|
+| `d259909` | catalogd: Step 1 — `SubjectManifest` type + Registry CRUD | 17 catalogd subject tests PASS |
+| `d16131b` | catalogd: Step 2 — `SubjectAuditWriter` HMAC-SHA256 chain + per-subject Mutex + canonical-JSON via BTreeMap | tamper-detection + concurrent-append race tests PASS |
+| `bce6dfd` | catalogd: Step 3 — `bin/backfill_subjects` (BIPA-defensible defaults: vertical=unknown, consent=pending_backfill_review, retention=4yr) | 100 subjects loaded into live catalog |
+| `fef1efd` | gateway: Step 4 — wire SubjectAuditWriter into `/v1/chat` tool dispatch + `audit_subject_hits_in` (inline, not spawn) | tool calls log accessor.kind="gateway_lookup" |
+| `cd8c59a` | gateway: Step 5 — `AuditingWorkerLookup` decorator wraps validator's WorkerLookup; spawns audit on every find() | live `/v1/validate` produces audit rows |
+| `e38f357` | subjects Steps 1-4 fixes from cross-lineage scrum (concurrency race, schema-evolution HMAC drift, hardcoded "success" classifier) | 41 tests green |
+| `15cfd76` | catalogd + gateway: Step 6 — `/audit/subject/{id}` legal-tier endpoint with constant-time-eq token check + tampering detection | live curl returns chain_verified=true |
+| `2a4b316` | subjects 2nd scrum fix wave (token min 16→32, chain_root from full chain via `chain_tip()`, rebuild collision warn, tightened result-state heuristic) | 17 catalogd + 6 gateway tests PASS |
+| `8fc6238` | catalogd: Step 7 — `bin/retention_sweep` (BIPA-aware on biometric clock, idempotent across daily runs, no auto-mutation) | 8 sweep tests PASS, live verified at `--as-of 2031-06-01` flagging 100/100 expired |
+| `2413c96` | catalogd: Step 8 — `bin/parity_subject_audit` (Rust side of cross-runtime parity probe) | known-answer + verify modes match Go byte-for-byte |
+| `2222227` | parity helper hardening (panic-noise → die() helper, abs path stripped from doc) from scrum | parity probe still 6/6 |
+| `4708717` | **Phase 1.6 BIPA wave** — Gate 4 absence test (4/4 with bypass coverage), §2 attestation script (3/3 evidence checks), Gate 1/2/5 doc scaffolds with ⚖ COUNSEL markers | 4/4 mcp-server Bun tests, 3/3 evidence on live data |
+| `c7aa607` | Phase 1.6 scrum fixes — schema fingerprint hashes name+type+nullable, Gate 4 catches object-literal + class-field bypasses, pyarrow dep gate, item 7 deferral rationale | 4/4 + 3/3 still pass |
+| `f1fa6e4` | **Phase 1.6 Gate 3a** — `crates/catalogd/src/biometric_endpoint.rs`: `POST /biometric/subject/{id}/photo` with consent gate, quarantined storage (mode 0700/0600), audit chain link, `BiometricCollection` field on SubjectManifest | 11 unit tests PASS, live roundtrip 200 |
+| `3708e6a` | Gate 3a scrum fixes — transactional rollback on audit failure (BIPA convergent BLOCK), Content-Type parameter handling, relative data_path, ts+uuid filename, dead code removed | 11 tests + cross-runtime parity 6/6 |
+
+**Cross-runtime parity (post-this-wave):** 6 probes, 38/38 byte-identical assertions —
+`validator(6/6) + extract_json(12/12) + session_log(4/4) + materializer(2/2) + embed(8/8) + subject_audit(6/6)`.
+Run `cd /home/profit/golangLAKEHOUSE && for p in scripts/cutover/parity/*.sh; do bash "$p"; done` to re-verify.
+
+**Three runtime-divergence classes caught + fixed by the parity probe authoring loop** (cataloged because they recur):
+1. Go `omitempty` on string fields strips empty values that Rust serde always emits → broken HMAC
+2. `time.RFC3339Nano` truncates trailing-zero nanoseconds where chrono `AutoSi` keeps 9 digits → broken HMAC
+3. Go `json.Marshal` HTML-escapes `<>&` where serde keeps literal → broken HMAC on any field with those chars
+
+All three have regression-locked tests; structural impossibility going forward.
+
+**Phase 1.6 BIPA pre-launch gates — status table:**
+
+| Item | Status | Evidence |
+|---|---|---|
+| Gate 1 — public retention schedule | eng-staged, ⚖ counsel pending | `docs/policies/consent/biometric_retention_schedule_v1.md` |
+| Gate 2 — informed consent template | eng-staged, ⚖ counsel pending | `docs/policies/consent/biometric_consent_template_v1.md` |
+| Gate 3a — photo-upload endpoint | **DONE** | 11 unit tests + live `POST /biometric/subject/{id}/photo` |
+| Gate 3b — deepface classification | deferred | needs Python subprocess design (sidecar dropped 2026-05-02) |
+| Gate 4 — name→ethnicity removal | **DONE** | `mcp-server/phase_1_6_gate_4.test.ts` 4/4 with bypass coverage |
+| Gate 5 — destruction runbook | eng-staged, ⚖ counsel pending | `docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md` |
+| §2 cryptographic attestation | eng-DONE, signature pending | `docs/attestations/BIPA_PRE_IDENTITYD_ATTESTATION_2026-05-03.md` (SHA-256 evidence hash, 3/3 checks pass on live data) |
+| §3 employee training | deferred | conditional on operator population size |
+
+**Calendar bottleneck:** counsel review of items 1/2/5/6. Engineering long pole is Gate 3b (deepface) — needs design conversation before engineering starts.
+
+**Operational state:**
+- `LH_SUBJECT_AUDIT_KEY=/tmp/lakehouse_audit/subject_audit.key` (32-byte HMAC signing key) loaded into systemd unit
+- `LH_LEGAL_AUDIT_TOKEN_FILE=/tmp/lakehouse_audit/legal_audit.token` (44-char legal-tier token) loaded into systemd unit
+- `data/_catalog/subjects/` holds 100 backfilled `WORKER-N.json` manifests + per-subject `WORKER-N.audit.jsonl` HMAC chains
+- `data/biometric/uploads//_.` quarantined photo storage (mode 0700 dir / 0600 file). 2 photos uploaded for WORKER-2 during live verify.
+- `/audit/subject/{id}` mounted on gateway with chain_verified=true on every probe
+- `/biometric/subject/{id}/photo` mounted on gateway, refuses 403 without `consent.biometric.status="given"`
+
+---
+
+## WHAT LANDED 2026-05-01 → 2026-05-02 (10 commits — Lance gauntlet + cross-runtime parity wave)
 
 | Commit | What | Verified |
 |---|---|---|