phase 1.6 BIPA: scrum-driven fixes

Per 2026-05-03 phase_1_6_bipa_gates scrum (13 findings, 0 convergent).
1 BLOCK verified as a false positive; 4 real fixes shipped:

False positive (verified):
- opus BLOCK on attest:55 — claimed `set -uo pipefail` without `-e`
  makes the post-python3 `if [ $? -ne 0 ]` check unreachable. Verified
  WRONG: `X=$(false); echo $?` prints 1. On a bare assignment line,
  Bash sets $? to the exit status of the last command substitution,
  so the check IS the python3 exit gate. Inline comment added to the
  script noting the false positive so future scrums don't re-flag.
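The `$?` behavior behind the false-positive verdict can be reproduced directly. The sketch below is Python driving `bash` through `subprocess`. It assumes a `bash` binary on PATH, and the helper name `run_bash` is hypothetical. It runs the one-liner under the script's `set -uo pipefail`, plus a variant with a comment between the assignment and the `if [ $? -ne 0 ]` test. That layout mirrors the script, where an inline comment now sits at exactly that spot.

```python
import subprocess


def run_bash(script: str) -> str:
    """Run `script` via `bash -c` and return its stdout, stripped.

    Hypothetical helper; assumes a `bash` binary is on PATH.
    """
    result = subprocess.run(
        ["bash", "-c", script],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip()


# Same options as the attestation script: -u and pipefail, but no -e,
# so execution continues past the failing command substitution.
ONE_LINER = "set -uo pipefail; X=$(false); echo $?"

# Mirrors the script's layout: assignment, a comment, then the $? test.
# Comments are not commands, so they leave $? untouched.
GATE = """set -uo pipefail
SCHEMA=$(exit 3)
# explanatory comment between the assignment and the check
if [ $? -ne 0 ]; then echo caught; else echo missed; fi
"""

print(run_bash(ONE_LINER))  # prints 1
print(run_bash(GATE))       # prints caught
```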

Real fixes:
1. opus WARN attestation:18 — schema fingerprint hashed names ONLY,
   missing column-type changes. A column repurposed to hold base64
   photo bytes under its existing name would pass undetected. Now
   hashes "name<TAB>type<TAB>nullable=bool" per row. Re-run produced
   evidence SHA-256 1fdcc9f1... (vs old 230fffeb..., reflecting the
   broader fingerprint scope).

2. opus WARN gate_4_test:60 — definition regex didn't catch
   object-literal property forms (`const t = { FEMALE_NAMES: [...] }`)
   or TypeScript class fields (`class L { public NAMES_X: string[] = [] }`).
   Added two new patterns + a regression test
   (Gate 4: object-literal and class-field bypasses are caught) that
   exercises 5 bypass forms. 4/4 tests green; 1 minor regex tweak
   needed mid-fix to handle single-line class bodies.

3. kimi WARN python3-reliance — script assumed pyarrow installed and
   would emit a stack trace into the attestation if not. Added
   `python3 -c "import pyarrow"` gate at top with clean install
   instructions on failure.

4. opus INFO PHASE_1_6:200 — item 7 (training) silently dropped from
   blocking set with bare "deferred" rationale. Now explicitly states
   the deferral is conditional on small operator population (J + 1-2
   named ops); item 7 re-promotes to blocking if population grows.
   ⚖ COUNSEL marker added.
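Fix 1 can be illustrated without pyarrow. The sketch below hashes hypothetical `(name, type, nullable)` rows two ways. The first uses names only (the old scope). The second uses the new `name<TAB>type<TAB>nullable=bool` per-row form. The exact bytes the attestation script feeds to SHA-256 (line endings, trailing newline, how the digest is taken) are assumptions here. These digests will therefore not match the recorded schema or evidence SHA-256 values.

```python
import hashlib

# (name, arrow type string, nullable): hypothetical stand-ins for the
# fields pyarrow's read_schema() exposes as f.name, f.type, f.nullable.
Column = tuple[str, str, bool]


def _sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def names_only_fingerprint(columns: list[Column]) -> str:
    """Old scope: one column name per line."""
    return _sha256("".join(f"{name}\n" for name, _, _ in columns))


def typed_fingerprint(columns: list[Column]) -> str:
    """New scope: name<TAB>type<TAB>nullable=bool per line."""
    return _sha256(
        "".join(f"{name}\t{typ}\tnullable={nullable}\n" for name, typ, nullable in columns)
    )


before = [("worker_id", "int64", True), ("resume_text", "string", True)]
# Same names, but resume_text switched to raw bytes (hypothetical repurpose).
after = [("worker_id", "int64", True), ("resume_text", "binary", True)]

print(names_only_fingerprint(before) == names_only_fingerprint(after))  # True
print(typed_fingerprint(before) == typed_fingerprint(after))            # False
```

The typed fingerprint only changes when a declared type or nullability changes. Base64 text written into the existing `string` column would leave both digests unchanged. Catching that case would take a check on column contents rather than on the schema.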
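The two patterns from fix 2 can be exercised outside the test runner. The sketch below ports them to Python's `re` for illustration only. The gate itself builds JavaScript `RegExp` objects, and the sketch sticks to constructs that behave the same in both engines for these inputs. `findOffenders` is not reproduced. `is_definition`, `AS_COMMITTED` and `GROUPED` are hypothetical names. The symbol goes through `re.escape`, while the TypeScript interpolates it raw; that difference is harmless for plain identifiers.

```python
import re

# Modifier prefix exactly as committed: the trailing \s+ binds to `static` only.
AS_COMMITTED = r"(?:public|private|protected|readonly|static\s+)*\s*"
# Hypothetical variant: every modifier must be followed by whitespace.
GROUPED = r"(?:(?:public|private|protected|readonly|static)\s+)*"


def object_literal_pattern(symbol: str) -> re.Pattern:
    # { SYMBOL: [...] }, SYMBOL: {...}, or SYMBOL: new Set/Map/Array(...)
    return re.compile(
        r"(?:^|[,{])\s*" + re.escape(symbol)
        + r"\s*:\s*(?:\[|\{|new\s+(?:Set|Map|Array)\b)",
        re.M,
    )


def class_field_pattern(symbol: str, modifiers: str = AS_COMMITTED) -> re.Pattern:
    # Optional modifiers, then SYMBOL: <type> = [ ... or SYMBOL: <type> = { ...
    return re.compile(
        r"(?:^|[{};])\s*" + modifiers + re.escape(symbol)
        + r"\s*:\s*[^=;{]+=\s*[\[{]",
        re.M,
    )


def is_definition(source: str, symbol: str, modifiers: str = AS_COMMITTED) -> bool:
    patterns = (object_literal_pattern(symbol), class_field_pattern(symbol, modifiers))
    return any(p.search(source) for p in patterns)


# The five bypass forms from the regression test, keyed by the symbol each defines.
BYPASS_FORMS = {
    "FEMALE_NAMES": 'const tables = { FEMALE_NAMES: ["Maria"] };',
    "NAMES_BLACK": 'const lookups = { NAMES_BLACK: new Set(["X"]) };',
    "NAMES_HISPANIC": "const all = {\n NAMES_HISPANIC: [...],\n};",
    "SURNAMES_BLACK": 'class Lookup {\n SURNAMES_BLACK: string[] = ["X"];\n}',
    "NAMES_EAST_ASIAN": "class L { public NAMES_EAST_ASIAN: string[] = []; }",
}

for symbol, source in BYPASS_FORMS.items():
    assert is_definition(source, symbol), f"bypass not caught: {source!r}"

# A stacked-modifier field, not one of the committed bypass forms.
STACKED = "class M {\n  private readonly NAMES_X: string[] = [];\n}"
print(is_definition(STACKED, "NAMES_X"))           # False
print(is_definition(STACKED, "NAMES_X", GROUPED))  # True
```

The last two lines expose a gap worth checking. As committed, the modifier group attaches `\s+` only to `static`, so stacked modifiers such as `private readonly` or `public static` do not match. Grouping the alternation before `\s+` (the hypothetical `GROUPED` variant) closes that gap, and the five committed bypass forms still match.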
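The fail-fast check in fix 3 has a straightforward in-process equivalent. The sketch below uses a hypothetical helper, `require_module`. It performs a real import, as the script's `python3 -c "import pyarrow"` does, rather than only locating the package. That way an install that cannot actually be imported also fails the gate. It returns 2 to match the script's `exit 2`.

```python
import importlib
import sys


def require_module(name: str, install_hint: str) -> int:
    """Import `name`; on failure print a short diagnostic and return 2.

    Hypothetical helper. Returns 0 when the import succeeds, so a caller
    can hand the result straight to sys.exit().
    """
    try:
        importlib.import_module(name)
    except ImportError:
        # Report without a traceback so nothing noisy lands in captured output.
        print(f"[attest] FAIL: import {name} failed.", file=sys.stderr)
        print(f"[attest] Install with: {install_hint}", file=sys.stderr)
        return 2
    return 0
```

A caller would run `sys.exit(require_module("pyarrow", "pip install pyarrow"))` before any schema read. That ordering keeps a missing dependency from ever reaching the attestation output.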

Skipped (acceptable as ⚖ COUNSEL placeholders by design):
- kimi WARN consent template:30-day-SLA (counsel decides number)
- kimi WARN consent template:email-placeholder (counsel supplies)
- kimi WARN parquet absence (env override exists; redeployment-aware)
- kimi INFO runbook manual-erasure (marked TODO when /erase ships)
- qwen INFO doc path/status nits (already addressed by file moves)

Tests: 4/4 Gate 4 absence tests (incl. the new bypass-coverage test)
and 3/3 attestation evidence checks pass on live data.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
root 2026-05-03 04:43:17 -05:00
parent 4708717f6b
commit c7aa607ae4
4 changed files with 92 additions and 23 deletions


@@ -199,7 +199,17 @@ PLUS:
 | 6 | Cryptographic attestation pre-identityd | DONE — `scripts/staffing/attest_pre_identityd_biometric_state.sh` + `docs/attestations/BIPA_PRE_IDENTITYD_ATTESTATION_2026-05-03.md` (3/3 evidence checks pass; signature lines pending) | pending signature | **eng-DONE, signature-pending** |
 | 7 | Employee training material | scaffold deferred — Gate 5 runbook §7 acknowledgment may serve as substrate | pending | **deferred** |
-Until items 1-5 + 6 are checked off, **identity service backfill (Phase 2 §5 Step 5) cannot proceed.**
+**Blocking set for Phase 2 backfill:** items **1, 2, 3, 4, 5, 6** must
+all be DONE. Item 7 (employee training) is reduced from blocking to
+"deferred" because the Gate 5 destruction runbook §7 already requires
+operator acknowledgment before legal-tier credentials are issued —
+that acknowledgment is procedurally equivalent to the training-record
+requirement when the operator population is small (J + 1-2 named
+operators). If the operator population grows beyond that, item 7
+re-promotes to blocking and a separate training program must be authored.
+⚖ COUNSEL — confirm whether item 7 deferral is acceptable for the
+expected operator population size, or restore it to the blocking set.
 **Calendar bottleneck:** Items 1, 2, 5, 6 (and #7) await counsel
 review of the engineering scaffolds. Gate 3 (photo-upload endpoint)


@@ -22,27 +22,27 @@ tamper-evident store (filesystem with backups + version control).
 **Schema columns** (18 total):
 ```
-worker_id
-name
-role
-email
-phone
-city
-state
-zip
-skills
-certifications
-archetype
-reliability
-responsiveness
-engagement
-compliance
-availability
-communications
-resume_text
+worker_id int64 nullable=True
+name string nullable=True
+role string nullable=True
+email string nullable=True
+phone string nullable=True
+city string nullable=True
+state string nullable=True
+zip int64 nullable=True
+skills string nullable=True
+certifications string nullable=True
+archetype string nullable=True
+reliability double nullable=True
+responsiveness double nullable=True
+engagement double nullable=True
+compliance double nullable=True
+availability double nullable=True
+communications string nullable=True
+resume_text string nullable=True
 ```
-**Schema SHA-256:** `4ba17870ce25a186a62bdfc29a3b336947dc2fba8a62c42ca249c81f41d32e30`
+**Schema SHA-256:** `973b9abe56420de8f88122278b633e813f90a64cf0ddaac6a9811dc0940be676`
 - PASS: no biometric / photo / face / image column present
@@ -81,7 +81,7 @@ No biometric identifiers or biometric information from real
 candidates have been collected, processed, or stored prior to
 the deployment of the Phase 1.6 BIPA pre-launch gates.
-**Evidence SHA-256:** `230fffeb77b502717bcd7161cc74d5a3401b8722acc8d6ed3d524f93e261cd0b`
+**Evidence SHA-256:** `1fdcc9f1682de27e1a0556d698ce221b74c1e71cf54128763828b4bca7b5c1bf`
 ---


@@ -65,6 +65,12 @@ function* walkSource(dir: string): Generator<string> {
 // definitionPatternsFor: returns regexes that match common DEFINITION
 // forms in JS/TS/HTML embedded scripts. A bare reference inside a
 // comment is intentionally NOT matched.
+//
+// 2026-05-03 opus scrum WARN (gate_4_test:60) added object-literal +
+// class-field patterns: a developer wrapping the lookup tables in
+// `const tables = { FEMALE_NAMES: [...] }` or a TypeScript class
+// field `FEMALE_NAMES: string[] = [...]` would have bypassed the
+// original 4 patterns silently.
 function definitionPatternsFor(symbol: string): RegExp[] {
   return [
     // var / const / let SYMBOL =
@@ -73,8 +79,16 @@ function definitionPatternsFor(symbol: string): RegExp[] {
     new RegExp(`\\bfunction\\s+${symbol}\\s*\\(`),
     // SYMBOL = function( OR SYMBOL = (...) =>
     new RegExp(`\\b${symbol}\\s*=\\s*(?:function\\s*\\(|\\([^)]*\\)\\s*=>|async\\s*(?:\\(|function))`),
-    // class member: SYMBOL(...) { (a method declaration)
+    // class method: SYMBOL(...) {
     new RegExp(`^\\s*${symbol}\\s*\\([^)]*\\)\\s*\\{`, "m"),
+    // object-literal property assigned to an array OR object value:
+    // { SYMBOL: [...] } or SYMBOL: {...} or SYMBOL: new Set(...)
+    new RegExp(`(?:^|[,{])\\s*${symbol}\\s*:\\s*(?:\\[|\\{|new\\s+(?:Set|Map|Array)\\b)`, "m"),
+    // TypeScript / class field with type annotation. The boundary
+    // before SYMBOL is start-of-line OR `{`/`;`/`}` so single-line
+    // class bodies (`class L { public NAMES_X: string[] = []; }`)
+    // are caught alongside multi-line ones.
+    new RegExp(`(?:^|[{};])\\s*(?:public|private|protected|readonly|static\\s+)*\\s*${symbol}\\s*:\\s*[^=;{]+=\\s*[\\[\\{]`, "m"),
   ];
 }
@@ -128,3 +142,29 @@ test("Gate 4: regex catches a synthetic positive (defense in depth)", () => {
   expect(offenders.some((o) => o.includes("NAMES_HISPANIC"))).toBe(true);
   expect(offenders.some((o) => o.includes("guessEthnicityFromFirstName"))).toBe(true);
 });
+// 2026-05-03 opus scrum WARN regression: the bypass forms a developer
+// might use to wrap the lookup tables without tripping the original
+// four patterns. All of these MUST trip a definition regex.
+test("Gate 4: object-literal and class-field bypasses are caught", () => {
+  const bypassForms = [
+    // Inline object-literal property → array
+    `const tables = { FEMALE_NAMES: ["Maria"] };`,
+    // Object property → Set/Map constructor
+    `const lookups = { NAMES_BLACK: new Set(["X"]) };`,
+    // Multi-line object literal
+    `const all = {\n NAMES_HISPANIC: [...],\n};`,
+    // TypeScript class field with type annotation + initializer
+    `class Lookup {\n SURNAMES_BLACK: string[] = ["X"];\n}`,
+    // TypeScript public field
+    `class L { public NAMES_EAST_ASIAN: string[] = []; }`,
+  ];
+  for (const synthetic of bypassForms) {
+    const offenders = findOffenders("bypass_synthetic", synthetic);
+    if (offenders.length === 0) {
+      throw new Error(
+        `Gate 4 bypass not caught — pattern would slip past:\n ${synthetic}`,
+      );
+    }
+  }
+});


@@ -34,6 +34,17 @@
 set -uo pipefail
 cd "$(dirname "$0")/../.."
+# Dependency gate: pyarrow is required to read the parquet schema. Fail
+# fast with a clear message rather than letting python3 -c emit a stack
+# trace that gets captured into the attestation as "evidence". (Caught
+# 2026-05-03 kimi scrum WARN python3-reliance.)
+if ! python3 -c "import pyarrow" 2>/dev/null; then
+  echo "[attest] FAIL: python3 -c 'import pyarrow' failed." >&2
+  echo "[attest] pyarrow is required to verify workers_500k.parquet schema." >&2
+  echo "[attest] Install with: pip install pyarrow" >&2
+  exit 2
+fi
 DATE="${OVERRIDE_DATE:-$(date -u +%Y-%m-%d)}"
 OUT_DIR="docs/attestations"
 OUT="$OUT_DIR/BIPA_PRE_IDENTITYD_ATTESTATION_${DATE}.md"
@@ -62,12 +73,20 @@ if [ ! -r "$WORKERS_PARQUET" ]; then
   rm -f "$EVIDENCE"
   exit 2
 fi
+# Hash NAME + TYPE + nullability per column, not just names. A schema
+# fingerprint over names alone would not invalidate if a column got
+# repurposed (e.g. resume_text reused to hold base64 photo bytes under
+# its existing name). Including types catches that class of evasion.
+# (Caught 2026-05-03 opus scrum WARN on attestation:18.)
 SCHEMA=$(python3 -c "
 import sys, pyarrow.parquet as pq
 schema = pq.read_schema('$WORKERS_PARQUET')
 for f in schema:
-    print(f.name)
+    print(f'{f.name}\t{f.type}\tnullable={f.nullable}')
 " 2>&1)
+# Bash assigns + propagates the substitution's exit through \$?.
+# Verified: X=\$(false); echo \$? -> 1. opus 2026-05-03 BLOCK on this
+# location was a false positive — the check IS the python3 exit gate.
 if [ $? -ne 0 ]; then
   echo "[attest] FAIL: schema read error: $SCHEMA" >&2
   rm -f "$EVIDENCE"