STATE_OF_PLAY: 2026-05-03 evening wave (subject-manifest substrate + Phase 1.6 BIPA)
Documents the 13-commit wave end-to-end:
- SUBJECT_MANIFESTS_ON_CATALOGD spec Steps 1-8 SHIPPED
- 5/7 Phase 1.6 BIPA gates engineering-complete or eng-staged
- 6th cross-runtime parity probe (subject_audit, 6/6 byte-identical)

Status table for Phase 1.6 with evidence pointers per item. Operational state captures the LH_SUBJECT_AUDIT_KEY + LH_LEGAL_AUDIT_TOKEN_FILE systemd configuration so next session knows what's in place.

Cataloged the three runtime-divergence classes the parity probe loop caught + structurally killed (omitempty stripping, time.RFC3339Nano truncation, json.Marshal HTML-escape). Future Go↔Rust work can reference these patterns instead of rediscovering them.

Calendar bottleneck is now counsel review of items 1/2/5/6; engineering has staged everything it can without legal sign-off. Engineering long pole is Gate 3b (deepface), deferred for design conversation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
parent 3708e6abf1
commit 47a26fdaa8
@ -1,13 +1,71 @@
# STATE OF PLAY — Lakehouse

**Last verified:** 2026-05-02 evening CDT
**Last verified:** 2026-05-03 evening CDT
**Verified by:** live probe (smoke 9/9, parity 32/32, gateway restarted), not memory.
**Verified by:** live probe (gateway restarted 2x, all 11 catalogd subject tests + 11 biometric tests + 6 audit tests + 4 mcp-server Gate-4 tests green; cross-runtime parity 6/6 byte-identical against live audit logs; live curl roundtrip on /biometric returned 200 + chained audit row), not memory.

> **Read this FIRST.** When the user says "we're working on lakehouse," they mean the working code captured below — NOT what `git log` framed as "the cutover" or what memory snapshots from 2 days ago suggest. If memory contradicts this file, this file wins. Update it when something is verified working — not when a phase finishes.
---
## WHAT LANDED 2026-05-01 → 2026-05-02 (10 commits this wave)
## WHAT LANDED 2026-05-03 (13 commits this wave — local-first audit substrate + Phase 1.6 BIPA gates)
The dominant work today: **`docs/specs/SUBJECT_MANIFESTS_ON_CATALOGD.md` Steps 1-8 SHIPPED end-to-end** + **5 of 7 Phase 1.6 BIPA pre-launch gates** + **6th cross-runtime parity probe**. The wave was structured as eight ship-then-scrum cycles: every cycle caught real bugs, and every fix wave landed within the same session.

| Commit | What | Verified |
|---|---|---|
| `d259909` | catalogd: Step 1 — `SubjectManifest` type + Registry CRUD | 17 catalogd subject tests PASS |
| `d16131b` | catalogd: Step 2 — `SubjectAuditWriter` HMAC-SHA256 chain + per-subject Mutex + canonical-JSON via BTreeMap | tamper-detection + concurrent-append race tests PASS |
| `bce6dfd` | catalogd: Step 3 — `bin/backfill_subjects` (BIPA-defensible defaults: vertical=unknown, consent=pending_backfill_review, retention=4yr) | 100 subjects loaded into live catalog |
| `fef1efd` | gateway: Step 4 — wire SubjectAuditWriter into `/v1/chat` tool dispatch + `audit_subject_hits_in` (inline, not spawn) | tool calls log accessor.kind="gateway_lookup" |
| `cd8c59a` | gateway: Step 5 — `AuditingWorkerLookup` decorator wraps validator's WorkerLookup; spawns audit on every find() | live `/v1/validate` produces audit rows |
| `e38f357` | subjects Steps 1-4 fixes from cross-lineage scrum (concurrency race, schema-evolution HMAC drift, hardcoded "success" classifier) | 41 tests green |
| `15cfd76` | catalogd + gateway: Step 6 — `/audit/subject/{id}` legal-tier endpoint with constant-time-eq token check + tampering detection | live curl returns chain_verified=true |
| `2a4b316` | subjects 2nd scrum fix wave (token min 16→32, chain_root from full chain via `chain_tip()`, rebuild collision warn, tightened result-state heuristic) | 17 catalogd + 6 gateway tests PASS |
| `8fc6238` | catalogd: Step 7 — `bin/retention_sweep` (BIPA-aware on biometric clock, idempotent across daily runs, no auto-mutation) | 8 sweep tests PASS, live verified at `--as-of 2031-06-01` flagging 100/100 expired |
| `2413c96` | catalogd: Step 8 — `bin/parity_subject_audit` (Rust side of cross-runtime parity probe) | known-answer + verify modes match Go byte-for-byte |
| `2222227` | parity helper hardening (panic-noise → die() helper, abs path stripped from doc) from scrum | parity probe still 6/6 |
| `4708717` | **Phase 1.6 BIPA wave** — Gate 4 absence test (4/4 with bypass coverage), §2 attestation script (3/3 evidence checks), Gate 1/2/5 doc scaffolds with ⚖ COUNSEL markers | 4/4 mcp-server Bun tests, 3/3 evidence on live data |
| `c7aa607` | Phase 1.6 scrum fixes — schema fingerprint hashes name+type+nullable, Gate 4 catches object-literal + class-field bypasses, pyarrow dep gate, item 7 deferral rationale | 4/4 + 3/3 still pass |
| `f1fa6e4` | **Phase 1.6 Gate 3a** — `crates/catalogd/src/biometric_endpoint.rs`: `POST /biometric/subject/{id}/photo` with consent gate, quarantined storage (mode 0700/0600), audit chain link, `BiometricCollection` field on SubjectManifest | 11 unit tests PASS, live roundtrip 200 |
| `3708e6a` | Gate 3a scrum fixes — transactional rollback on audit failure (BIPA convergent BLOCK), Content-Type parameter handling, relative data_path, ts+uuid filename, dead code removed | 11 tests + cross-runtime parity 6/6 |

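For readers new to the Step 2 design, a minimal Go sketch of an HMAC-SHA256 chain with tamper detection follows. It is illustrative only: `AuditRow`, `rowMAC`, `appendRow`, and `verifyChain` are hypothetical names and the row layout is assumed. The shipped `SubjectAuditWriter` lives in catalogd, also serializes appends with a per-subject Mutex, and is not reproduced here. Go's `json.Marshal` sorts map keys, which stands in for the BTreeMap-based canonical JSON.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// AuditRow is a hypothetical row shape, not the real on-disk schema.
type AuditRow struct {
	Payload map[string]string `json:"payload"`
	PrevMAC string            `json:"prev_mac"`
	MAC     string            `json:"mac"`
}

// rowMAC returns hex(HMAC-SHA256(key, prevMAC || json(payload))).
// json.Marshal emits map keys in sorted order, so the bytes are stable.
func rowMAC(key []byte, prevMAC string, payload map[string]string) (string, error) {
	body, err := json.Marshal(payload)
	if err != nil {
		return "", err
	}
	m := hmac.New(sha256.New, key)
	m.Write([]byte(prevMAC))
	m.Write(body)
	return hex.EncodeToString(m.Sum(nil)), nil
}

// appendRow links a new row to the MAC of the current chain tip.
func appendRow(key []byte, chain []AuditRow, payload map[string]string) ([]AuditRow, error) {
	prev := ""
	if n := len(chain); n > 0 {
		prev = chain[n-1].MAC
	}
	mac, err := rowMAC(key, prev, payload)
	if err != nil {
		return nil, err
	}
	return append(chain, AuditRow{Payload: payload, PrevMAC: prev, MAC: mac}), nil
}

// verifyChain recomputes every MAC from the first row and checks each link.
func verifyChain(key []byte, chain []AuditRow) bool {
	prev := ""
	for _, row := range chain {
		want, err := rowMAC(key, prev, row.Payload)
		if err != nil || row.PrevMAC != prev || !hmac.Equal([]byte(want), []byte(row.MAC)) {
			return false
		}
		prev = row.MAC
	}
	return true
}

func main() {
	key := []byte("0123456789abcdef0123456789abcdef") // 32-byte demo key, not a real secret
	var chain []AuditRow
	chain, _ = appendRow(key, chain, map[string]string{"subject": "WORKER-2", "kind": "gateway_lookup"})
	chain, _ = appendRow(key, chain, map[string]string{"subject": "WORKER-2", "kind": "demo_second_row"})
	fmt.Println("intact chain verifies:", verifyChain(key, chain))

	chain[0].Payload["kind"] = "edited" // simulate tampering with an earlier row
	fmt.Println("tampered chain verifies:", verifyChain(key, chain))
}
```

Editing any earlier row changes its recomputed MAC, so verification fails from that row onward.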

**Cross-runtime parity (post-this-wave):** 6 probes, 38/38 byte-identical assertions:
`validator(6/6) + extract_json(12/12) + session_log(4/4) + materializer(2/2) + embed(8/8) + subject_audit(6/6)`.
Run `cd /home/profit/golangLAKEHOUSE && for p in scripts/cutover/parity/*.sh; do bash "$p"; done` to re-verify.

**Three runtime-divergence classes caught + fixed by the parity probe authoring loop** (cataloged because they recur):

1. Go `omitempty` on string fields strips empty values that Rust serde always emits → broken HMAC
2. Go `time.RFC3339Nano` trims trailing zeros from the fractional seconds where chrono `AutoSi` pads to 3, 6, or 9 digits → broken HMAC
3. Go `json.Marshal` HTML-escapes `<>&` where serde keeps them literal → broken HMAC on any field with those chars

All three have regression-locked tests, which makes recurrence structurally impossible going forward.

**Phase 1.6 BIPA pre-launch gates, status table:**

| Item | Status | Evidence |
|---|---|---|
| Gate 1 — public retention schedule | eng-staged, ⚖ counsel pending | `docs/policies/consent/biometric_retention_schedule_v1.md` |
| Gate 2 — informed consent template | eng-staged, ⚖ counsel pending | `docs/policies/consent/biometric_consent_template_v1.md` |
| Gate 3a — photo-upload endpoint | **DONE** | 11 unit tests + live `POST /biometric/subject/{id}/photo` |
| Gate 3b — deepface classification | deferred | needs Python subprocess design (sidecar dropped 2026-05-02) |
| Gate 4 — name→ethnicity removal | **DONE** | `mcp-server/phase_1_6_gate_4.test.ts` 4/4 with bypass coverage |
| Gate 5 — destruction runbook | eng-staged, ⚖ counsel pending | `docs/runbooks/BIPA_DESTRUCTION_RUNBOOK.md` |
| §2 cryptographic attestation | eng-DONE, signature pending | `docs/attestations/BIPA_PRE_IDENTITYD_ATTESTATION_2026-05-03.md` (SHA-256 evidence hash, 3/3 checks pass on live data) |
| §3 employee training | deferred | conditional on operator population size |

**Calendar bottleneck:** counsel review of items 1/2/5/6. The engineering long pole is Gate 3b (deepface), which needs a design conversation before engineering starts.

**Operational state:**

- `LH_SUBJECT_AUDIT_KEY=/tmp/lakehouse_audit/subject_audit.key` (32-byte HMAC signing key) loaded into systemd unit
- `LH_LEGAL_AUDIT_TOKEN_FILE=/tmp/lakehouse_audit/legal_audit.token` (44-char legal-tier token) loaded into systemd unit
- `data/_catalog/subjects/` holds 100 backfilled `WORKER-N.json` manifests + per-subject `WORKER-N.audit.jsonl` HMAC chains
- `data/biometric/uploads/<safe_id>/<ts>_<uuid>.<ext>` is the quarantined photo storage (mode 0700 dir / 0600 file). Two photos were uploaded for WORKER-2 during live verify.
- `/audit/subject/{id}` mounted on gateway, returning chain_verified=true on every probe
- `/biometric/subject/{id}/photo` mounted on gateway, returns 403 unless `consent.biometric.status="given"`

---
## WHAT LANDED 2026-05-01 → 2026-05-02 (10 commits — Lance gauntlet + cross-runtime parity wave)
| Commit | What | Verified |
|---|---|---|