phase 1.6 BIPA: scrum-driven fixes
Per 2026-05-03 phase_1_6_bipa_gates scrum (13 findings, 0 convergent).
1 BLOCK verified false positive, 4 real fixes shipped:
False positive (verified):
- opus BLOCK on attest:55 — claimed `set -uo pipefail` without `-e`
makes the post-python3 `if [ $? -ne 0 ]` check unreachable. Verified
WRONG: `X=$(false); echo $?` prints 1. Bash propagates command-
substitution exit through $? on the assignment line. The check IS
the python3 exit gate. Inline comment added to the script noting
the false positive so future scrums don't re-flag.
Real fixes:
1. opus WARN attestation:18 — schema fingerprint hashed names ONLY,
missing column-type changes. A column repurposed to hold base64
photo bytes under its existing name would pass undetected. Now
hashes "name<TAB>type<TAB>nullable=bool" per row. Re-run produced
evidence SHA-256 1fdcc9f1... (vs old 230fffeb..., reflecting the
broader fingerprint scope).
2. opus WARN gate_4_test:60 — definition regex didn't catch
object-literal property forms (`const t = { FEMALE_NAMES: [...] }`)
or TypeScript class fields (`class L { public NAMES_X: string[] = [] }`).
Added two new patterns + a regression test
(Gate 4: object-literal and class-field bypasses are caught) that
exercises 5 bypass forms. 4/4 tests green; 1 minor regex tweak
needed mid-fix to handle single-line class bodies.
3. kimi WARN python3-reliance — script assumed pyarrow installed and
would emit a stack trace into the attestation if not. Added
`python3 -c "import pyarrow"` gate at top with clean install
instructions on failure.
4. opus INFO PHASE_1_6:200 — item 7 (training) silently dropped from
blocking set with bare "deferred" rationale. Now explicitly states
the deferral is conditional on small operator population (J + 1-2
named ops); item 7 re-promotes to blocking if population grows.
⚖ COUNSEL marker added.
Skipped (acceptable as ⚖ COUNSEL placeholders by design):
- kimi WARN consent template:30-day-SLA (counsel decides number)
- kimi WARN consent template:email-placeholder (counsel supplies)
- kimi WARN parquet absence (env override exists; redeployment-aware)
- kimi INFO runbook manual-erasure (marked TODO when /erase ships)
- qwen INFO doc path/status nits (already addressed by file moves)
Tests: 4/4 Gate 4 absence test (incl. new bypass-coverage), 3/3
attestation evidence checks pass on live data.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
parent
4708717f6b
commit
c7aa607ae4
@ -199,7 +199,17 @@ PLUS:
|
||||
| 6 | Cryptographic attestation pre-identityd | DONE — `scripts/staffing/attest_pre_identityd_biometric_state.sh` + `docs/attestations/BIPA_PRE_IDENTITYD_ATTESTATION_2026-05-03.md` (3/3 evidence checks pass; signature lines pending) | pending signature | **eng-DONE, signature-pending** |
|
||||
| 7 | Employee training material | scaffold deferred — Gate 5 runbook §7 acknowledgment may serve as substrate | pending | **deferred** |
|
||||
|
||||
Until items 1-5 + 6 are checked off, **identity service backfill (Phase 2 §5 Step 5) cannot proceed.**
|
||||
**Blocking set for Phase 2 backfill:** items **1, 2, 3, 4, 5, 6** must
|
||||
all be DONE. Item 7 (employee training) is reduced from blocking to
|
||||
"deferred" because the Gate 5 destruction runbook §7 already requires
|
||||
operator acknowledgment before legal-tier credentials are issued —
|
||||
that acknowledgment is procedurally equivalent to the training-record
|
||||
requirement when the operator population is small (J + 1-2 named
|
||||
operators). If the operator population grows beyond that, item 7
|
||||
re-promotes to blocking and a separate training program must be authored.
|
||||
|
||||
⚖ COUNSEL — confirm whether item 7 deferral is acceptable for the
|
||||
expected operator population size, or restore it to the blocking set.
|
||||
|
||||
**Calendar bottleneck:** Items 1, 2, 5, 6 (and #7) await counsel
|
||||
review of the engineering scaffolds. Gate 3 (photo-upload endpoint)
|
||||
|
||||
@ -22,27 +22,27 @@ tamper-evident store (filesystem with backups + version control).
|
||||
**Schema columns** (18 total):
|
||||
|
||||
```
|
||||
worker_id
|
||||
name
|
||||
role
|
||||
email
|
||||
phone
|
||||
city
|
||||
state
|
||||
zip
|
||||
skills
|
||||
certifications
|
||||
archetype
|
||||
reliability
|
||||
responsiveness
|
||||
engagement
|
||||
compliance
|
||||
availability
|
||||
communications
|
||||
resume_text
|
||||
worker_id int64 nullable=True
|
||||
name string nullable=True
|
||||
role string nullable=True
|
||||
email string nullable=True
|
||||
phone string nullable=True
|
||||
city string nullable=True
|
||||
state string nullable=True
|
||||
zip int64 nullable=True
|
||||
skills string nullable=True
|
||||
certifications string nullable=True
|
||||
archetype string nullable=True
|
||||
reliability double nullable=True
|
||||
responsiveness double nullable=True
|
||||
engagement double nullable=True
|
||||
compliance double nullable=True
|
||||
availability double nullable=True
|
||||
communications string nullable=True
|
||||
resume_text string nullable=True
|
||||
```
|
||||
|
||||
**Schema SHA-256:** `4ba17870ce25a186a62bdfc29a3b336947dc2fba8a62c42ca249c81f41d32e30`
|
||||
**Schema SHA-256:** `973b9abe56420de8f88122278b633e813f90a64cf0ddaac6a9811dc0940be676`
|
||||
|
||||
- PASS: no biometric / photo / face / image column present
|
||||
|
||||
@ -81,7 +81,7 @@ No biometric identifiers or biometric information from real
|
||||
candidates have been collected, processed, or stored prior to
|
||||
the deployment of the Phase 1.6 BIPA pre-launch gates.
|
||||
|
||||
**Evidence SHA-256:** `230fffeb77b502717bcd7161cc74d5a3401b8722acc8d6ed3d524f93e261cd0b`
|
||||
**Evidence SHA-256:** `1fdcc9f1682de27e1a0556d698ce221b74c1e71cf54128763828b4bca7b5c1bf`
|
||||
|
||||
---
|
||||
|
||||
|
||||
@ -65,6 +65,12 @@ function* walkSource(dir: string): Generator<string> {
|
||||
// definitionPatternsFor: returns regexes that match common DEFINITION
|
||||
// forms in JS/TS/HTML embedded scripts. A bare reference inside a
|
||||
// comment is intentionally NOT matched.
|
||||
//
|
||||
// 2026-05-03 opus scrum WARN (gate_4_test:60) added object-literal +
|
||||
// class-field patterns: a developer wrapping the lookup tables in
|
||||
// `const tables = { FEMALE_NAMES: [...] }` or a TypeScript class
|
||||
// field `FEMALE_NAMES: string[] = [...]` would have bypassed the
|
||||
// original 4 patterns silently.
|
||||
function definitionPatternsFor(symbol: string): RegExp[] {
|
||||
return [
|
||||
// var / const / let SYMBOL =
|
||||
@ -73,8 +79,16 @@ function definitionPatternsFor(symbol: string): RegExp[] {
|
||||
new RegExp(`\\bfunction\\s+${symbol}\\s*\\(`),
|
||||
// SYMBOL = function( OR SYMBOL = (...) =>
|
||||
new RegExp(`\\b${symbol}\\s*=\\s*(?:function\\s*\\(|\\([^)]*\\)\\s*=>|async\\s*(?:\\(|function))`),
|
||||
// class member: SYMBOL(...) { (a method declaration)
|
||||
// class method: SYMBOL(...) {
|
||||
new RegExp(`^\\s*${symbol}\\s*\\([^)]*\\)\\s*\\{`, "m"),
|
||||
// object-literal property assigned to an array OR object value:
|
||||
// { SYMBOL: [...] } or SYMBOL: {...} or SYMBOL: new Set(...)
|
||||
new RegExp(`(?:^|[,{])\\s*${symbol}\\s*:\\s*(?:\\[|\\{|new\\s+(?:Set|Map|Array)\\b)`, "m"),
|
||||
// TypeScript / class field with type annotation. The boundary
|
||||
// before SYMBOL is start-of-line OR `{`/`;`/`}` so single-line
|
||||
// class bodies (`class L { public NAMES_X: string[] = []; }`)
|
||||
// are caught alongside multi-line ones.
|
||||
new RegExp(`(?:^|[{};])\\s*(?:public|private|protected|readonly|static\\s+)*\\s*${symbol}\\s*:\\s*[^=;{]+=\\s*[\\[\\{]`, "m"),
|
||||
];
|
||||
}
|
||||
|
||||
@ -128,3 +142,29 @@ test("Gate 4: regex catches a synthetic positive (defense in depth)", () => {
|
||||
expect(offenders.some((o) => o.includes("NAMES_HISPANIC"))).toBe(true);
|
||||
expect(offenders.some((o) => o.includes("guessEthnicityFromFirstName"))).toBe(true);
|
||||
});
|
||||
|
||||
// 2026-05-03 opus scrum WARN regression: the bypass forms a developer
|
||||
// might use to wrap the lookup tables without tripping the original
|
||||
// four patterns. All of these MUST trip a definition regex.
|
||||
test("Gate 4: object-literal and class-field bypasses are caught", () => {
|
||||
const bypassForms = [
|
||||
// Inline object-literal property → array
|
||||
`const tables = { FEMALE_NAMES: ["Maria"] };`,
|
||||
// Object property → Set/Map constructor
|
||||
`const lookups = { NAMES_BLACK: new Set(["X"]) };`,
|
||||
// Multi-line object literal
|
||||
`const all = {\n NAMES_HISPANIC: [...],\n};`,
|
||||
// TypeScript class field with type annotation + initializer
|
||||
`class Lookup {\n SURNAMES_BLACK: string[] = ["X"];\n}`,
|
||||
// TypeScript public field
|
||||
`class L { public NAMES_EAST_ASIAN: string[] = []; }`,
|
||||
];
|
||||
for (const synthetic of bypassForms) {
|
||||
const offenders = findOffenders("bypass_synthetic", synthetic);
|
||||
if (offenders.length === 0) {
|
||||
throw new Error(
|
||||
`Gate 4 bypass not caught — pattern would slip past:\n ${synthetic}`,
|
||||
);
|
||||
}
|
||||
}
|
||||
});
|
||||
|
||||
@ -34,6 +34,17 @@
|
||||
set -uo pipefail
|
||||
cd "$(dirname "$0")/../.."
|
||||
|
||||
# Dependency gate: pyarrow is required to read the parquet schema. Fail
|
||||
# fast with a clear message rather than letting python3 -c emit a stack
|
||||
# trace that gets captured into the attestation as "evidence". (Caught
|
||||
# 2026-05-03 kimi scrum WARN python3-reliance.)
|
||||
if ! python3 -c "import pyarrow" 2>/dev/null; then
|
||||
echo "[attest] FAIL: python3 -c 'import pyarrow' failed." >&2
|
||||
echo "[attest] pyarrow is required to verify workers_500k.parquet schema." >&2
|
||||
echo "[attest] Install with: pip install pyarrow" >&2
|
||||
exit 2
|
||||
fi
|
||||
|
||||
DATE="${OVERRIDE_DATE:-$(date -u +%Y-%m-%d)}"
|
||||
OUT_DIR="docs/attestations"
|
||||
OUT="$OUT_DIR/BIPA_PRE_IDENTITYD_ATTESTATION_${DATE}.md"
|
||||
@ -62,12 +73,20 @@ if [ ! -r "$WORKERS_PARQUET" ]; then
|
||||
rm -f "$EVIDENCE"
|
||||
exit 2
|
||||
fi
|
||||
# Hash NAME + TYPE + nullability per column, not just names. A schema
|
||||
# fingerprint over names alone would not invalidate if a column got
|
||||
# repurposed (e.g. resume_text reused to hold base64 photo bytes under
|
||||
# its existing name). Including types catches that class of evasion.
|
||||
# (Caught 2026-05-03 opus scrum WARN on attestation:18.)
|
||||
SCHEMA=$(python3 -c "
|
||||
import sys, pyarrow.parquet as pq
|
||||
schema = pq.read_schema('$WORKERS_PARQUET')
|
||||
for f in schema:
|
||||
print(f.name)
|
||||
print(f'{f.name}\t{f.type}\tnullable={f.nullable}')
|
||||
" 2>&1)
|
||||
# Bash assigns + propagates the substitution's exit through \$?.
|
||||
# Verified: X=\$(false); echo \$? -> 1. opus 2026-05-03 BLOCK on this
|
||||
# location was a false positive — the check IS the python3 exit gate.
|
||||
if [ $? -ne 0 ]; then
|
||||
echo "[attest] FAIL: schema read error: $SCHEMA" >&2
|
||||
rm -f "$EVIDENCE"
|
||||
|
||||
Loading…
x
Reference in New Issue
Block a user