Per docs/PHASE_1_6_BIPA_GATES.md Gate 4 and the AUDIT_TRAIL_PRD §4
protected-attribute exclusion rule. The lookup tables and inference
functions in search.html (lines 3375-3499) and console.html (lines
245-311) were dead code in the rendering path: disabling headshot
rendering on 2026-04-28 left these functions defined but unused.
Removing them forecloses both the Title VII
discriminatory-feature-engineering argument and the BIPA argument
that biometric information was derived from a biometric identifier.
Removed:
- FEMALE_NAMES, MALE_NAMES, NAMES_HISPANIC, NAMES_BLACK,
NAMES_SOUTH_ASIAN, NAMES_EAST_ASIAN, NAMES_MIDDLE_EASTERN
- SURNAMES_HISPANIC, SURNAMES_SOUTH_ASIAN, SURNAMES_EAST_ASIAN,
SURNAMES_MIDDLE_EASTERN, SURNAMES_BLACK
- guessGenderFromFirstName(), guessEthnicityFromName(),
guessEthnicityFromFirstName(), genderFor()
From both search.html and console.html. Replacement: deprecation
comment block referencing the BIPA gates doc.
Verified: zero live consumers anywhere in mcp-server/. Searched for
genderFor()/guessEthnicityFromName()/guessEthnicityFromFirstName()/
guessGenderFromFirstName() call sites — none remain.
Per J (2026-05-03), this is exactly the kind of test code leaked
into main that J wants cleaned up. The face-pool inference was meant
as a testing tool for synthetic icon generation but ended up as
production-shape inference logic in the customer-facing UI.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Search results no longer pop in as a single block. New behavior:
- Skeleton list pre-claims the vertical space results will occupy
with shimmering placeholder cards, so arriving results fade in
over the skeleton instead of pushing layout. Sweep is staggered
per row for a "rolling wave" not "everything blinking together".
- Domain-language stage caption ("matching against permits",
  "ranking by reliability") rotates on a fixed schedule (see the
  sketch after this list) so users read progress, not a stuck
  spinner.
- @keyframes card-in: real worker cards rise 4px and fade in over
350ms with nth-child stagger across the first ~12 rows. Honors
prefers-reduced-motion.
- Avatar imgs are filtered through grayscale + slight contrast/blur
  to mute the SDXL Turbo color cast (which screams "AI generated" at
  small sizes). Cert icons get the same treatment.
- Once-per-session hero takeover compresses the Section ⓪ strip
("Not a CRM — an index that learns from you") into a centered
hero on first paint, dismissed by clicking anywhere. Stats
hydrate from live endpoints.
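The caption rotator is a few lines of front-end code; a sketch, with
illustrative strings and an assumed 1200ms cadence:

    const CAPTIONS = ['matching against permits', 'ranking by reliability'];

    // Rotate on a fixed schedule, independent of fetch progress; the
    // returned function stops the rotation once results land.
    function startStageCaptions(el: HTMLElement): () => void {
      let i = 0;
      el.textContent = CAPTIONS[0];
      const t = setInterval(() => {
        i = (i + 1) % CAPTIONS.length;
        el.textContent = CAPTIONS[i];
      }, 1200);
      return () => clearInterval(t);
    }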
console.html: mirrors the avatar B&W filter for visual consistency,
and removes the headshot insertion entirely — back to monogram
initials. The console (internal staffer view) doesn't need synthetic
faces; the public demo at /lakehouse/ does.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three problems J flagged ("not matching properly", "same faces", "still
showing old icons") had three different roots:
1. MISMATCH: front-end was first-name only, so "Anna Cruz" / "Patricia
Garcia" / "John Jimenez" all defaulted to caucasian. Added
SURNAMES_HISPANIC / _SOUTH_ASIAN / _EAST_ASIAN / _MIDDLE_EASTERN
dicts to both search.html and console.html. Surname is checked
FIRST (a stronger signal than first names for the hispanic and
asian buckets), with first-name fallback. Cruz → hispanic,
Patel → south_asian, Nguyen → east_asian, regardless of first name.
2. SAME FACES: pool buckets are uneven — woman/south_asian=3,
man/black=4, woman/middle_eastern=2 — so any worker in those
buckets collapses to 2-4 photos no matter how good the hash is.
/headshots/:key now 302-redirects to /headshots/generate/:key
when the gender × race intersection is below 30 faces. ComfyUI
on-demand gives infinite uniqueness for the sparse buckets
(deterministic-per-worker via djb2 seed). Dense buckets still
serve from the pool — no GPU cost there.
3. STALE CACHE: Cache-Control was max-age=86400, immutable — pinned
old photos in browsers for 24h after any server-side update.
Dropped to max-age=3600, must-revalidate, and added a v=2
cache-buster query param to all front-end /headshots/ URLs so
existing cached entries are bypassed on next page load.
Also surfacing X-Face-Pool-Bucket / Bucket-Size headers for
diagnosis; the sketch below shows the combined decision path
(bucket pick, sparse redirect, cache headers).
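A minimal Bun-style sketch of that path. The pool shape and hash()
stand-in are assumptions, SPARSE_MIN mirrors the 30-face threshold
above, and the Bucket-Size header spelling is a guess at the exact
name:

    const SPARSE_MIN = 30;  // below this, 302 to on-demand generation

    function hash(s: string): number {  // stand-in for the djb2 mixer in index.ts
      let h = 5381;
      for (let i = 0; i < s.length; i++) h = ((h * 33) ^ s.charCodeAt(i)) >>> 0;
      return h;
    }

    function serveHeadshot(key: string, bucket: string,
                           pool: Map<string, string[]>): Response {
      const faces = pool.get(bucket) ?? [];
      if (faces.length < SPARSE_MIN) {
        // sparse intersection: hand off to ComfyUI generation
        return Response.redirect('/headshots/generate/' + encodeURIComponent(key), 302);
      }
      return new Response(Bun.file(faces[hash(key) % faces.length]), {
        headers: {
          'Cache-Control': 'max-age=3600, must-revalidate',  // was 86400, immutable
          'X-Face-Pool-Bucket': bucket,
          'X-Face-Pool-Bucket-Size': String(faces.length),
        },
      });
    }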
Verified: playwright run shows surname routing correct (Torres,
Rivera, Alvarez, Gutierrez, Patel, Nguyen, Omar all bucketed
correctly), sparse buckets 302 to ComfyUI, dense buckets stay on
the thumb pool.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Worker cards now ship a real photo per person instead of monogram tiles:
- fetch_face_pool.py pulls 1000 faces from thispersondoesnotexist.com
- tag_face_pool.py runs deepface for gender/race/age, excludes <22yo
- manifest.jsonl: 952 servable, gender/race buckets populated
- /headshots/_thumbs/ pre-resized to 384px webp (587KB -> 11KB,
  ~53x smaller; without this, Chrome's parallel-connection budget
  drops ~75% of tiles in a 40-card grid)
- /headshots/:key gender x race x age intersection bucketing with
gender-only fallback when intersection is sparse
- /headshots/generate/:key ComfyUI on-demand for the contractor
profile spotlight (cold ~1.5s, cached ~1ms; worker-derived
djb2 seed makes faces deterministic-per-worker but unique
across workers sharing the same prompt)
- serve_imagegen.py _cache_key() now includes seed (was caching
  by prompt only, so 3 different worker seeds collapsed to 1
  cached image; the fix, sketched after this list, verifiably
  produces 3 distinct md5s)
- confidence-default name resolution: Xavier->man+hispanic,
Aisha->woman+black, etc. Every worker resolves to a bucket.
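The _cache_key() fix lives in serve_imagegen.py (Python); the same
idea in TypeScript, with cacheKey as a hypothetical name:

    import { createHash } from 'node:crypto';

    // Before: hashing the prompt alone, so 3 worker seeds -> 1 cached file.
    // After: one cache entry per (prompt, seed) pair.
    function cacheKey(prompt: string, seed: number): string {
      return createHash('md5').update(prompt + '|' + seed).digest('hex');
    }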
End-to-end: playwright run on /?q=forklift+operators+IL -> 21/21
cards loaded, 0 broken, all 384px webp.
Cache + binary pool gitignored; manifest tracked.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three layers shipped:
1. SCRIPT — scripts/staffing/fetch_face_pool.py
Pulls N synthetic StyleGAN faces from thispersondoesnotexist.com
into data/headshots/face_NNNN.jpg, writes manifest.jsonl. Idempotent:
re-running skips existing files. Optional gender tagging via deepface
(currently unavailable on this box; the script handles ImportError
gracefully and tags everything as untagged). Fetched 198 faces with
concurrency=3 in ~67s.
2. SERVER — /headshots/:key route in mcp-server/index.ts
Loads manifest at first hit, caches in globalThis._faces. Hashes the
key with djb2-style mixing → pool index → returns the JPG. Same
key always gets the same face (deterministic). Accepts
?g=man|woman&e=caucasian|black|hispanic|south_asian|east_asian|middle_eastern
to bias pool selection — the gender/ethnicity buckets fall back to
the full pool when no tagged matches exist. Cache-Control:
86400 immutable so faces ride the browser cache after first hit.
/headshots/__reload re-reads the manifest without restart.
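The mapping in sketch form; "djb2-style mixing" is all this commit
claims, so the textbook constants here are an assumption:

    // djb2: h = h * 33 ^ byte, seeded at 5381; >>> 0 clamps to uint32.
    function djb2(key: string): number {
      let h = 5381;
      for (let i = 0; i < key.length; i++) {
        h = ((h * 33) ^ key.charCodeAt(i)) >>> 0;
      }
      return h;
    }

    // Same key -> same index -> same face, for a fixed pool size.
    const pickFace = (key: string, pool: string[]) => pool[djb2(key) % pool.length];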
3. UI — search.html + console.html worker cards
Re-added overlay <img> on top of the monogram .av circle. img.src
= /headshots/<encoded-key>?g=<hint>&e=<hint>. img.onerror removes
the failed image so the monogram stays visible if the face pool
isn't fetched / CDN is blocked. .av now has overflow:hidden +
position:relative to clip the img to a perfect circle.
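The overlay + fallback wiring, sketched with addFaceOverlay as a
hypothetical helper name:

    function addFaceOverlay(av: HTMLElement, key: string, g: string, e: string): void {
      const img = document.createElement('img');
      img.src = '/headshots/' + encodeURIComponent(key) + '?g=' + g + '&e=' + e;
      img.onerror = () => img.remove();  // pool missing / CDN blocked: monogram stays
      av.appendChild(img);               // .av clips it to a circle (overflow:hidden)
    }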
Forced-confident name resolution (J: "we're CREATING the profile,
created as though you truly have the information Xavier is more
likely Hispanic and he's a male"):
genderFor(name): looks up MALE_NAMES + FEMALE_NAMES, falling back to
a deterministic hash split so unknown names spread ~50/50. The sets
now include cross-cultural names: Alejandro/Andres/Mateo/Santiago/
Joaquin/Cesar/Hugo/Felipe/Gerardo/Salvador/Ramon (Hispanic),
Raj/Anil/Vikram/Krishna/Pradeep (South Asian), Wei/Yi/Hiroshi/Akira/
Hyun (East Asian), Demetrius/Kareem/DaQuan/Khalil (Black),
Omar/Khalid/Hassan/Ahmed/Bilal (Middle Eastern). FEMALE_NAMES
extended in parallel.
guessEthnicityFromFirstName(name): confident default of 'caucasian'
for any name not in the cultural buckets, so every worker resolves
to a category the face pool can be biased toward. Match order:
ME → Black → Hispanic → South Asian → East Asian → Caucasian (order
matters where names overlap, e.g. Aisha appears in both ME + Black
and biases toward ME for visual fit).
Both helpers also ported into console.html so the triage backfills
and try-it-yourself rendering get the same hint stack.
Privacy note in the script + route comments: the synthetic data uses
the worker's name as the seed; production should hash worker_id (not
name) to avoid leaking PII to a third-party CDN. The fetch URL itself
is referenced once per pool build, not per-worker.
.gitignore — added data/headshots/face_*.jpg (~100MB for 198 faces;
the manifest + script are tracked). Re-running the script on a fresh
checkout rebuilds the pool from scratch.
Verified end-to-end via playwright on devop.live/lakehouse:
forklift query → 10 worker cards
10/10 with face images (real synthetic headshots, not monograms)
0/10 broken
Alejandro G. Nelson → ?g=man&e=hispanic
Patricia K. Garcia → ?g=woman&e=caucasian
Each name → unique face, deterministic across loads.
Console triage backfills get the same treatment.
J: "can you update Staffer's Console too the same look." Console
rendered worker rows in three places (Chapter 4 permit-contract
candidates, Chapter 8 triage backfills, Chapter 9 try-it-yourself
results) with the original 28px square avatar + flat backgrounds —
inconsistent with the new dashboard design.
Three changes:
1. CSS — .worker now has a 3px left-edge border that color-codes the
role family, and .av is a 32px circle with a muted dark background
+ 1px ring + monogram initials. Five role-band colors mirror
search.html: warehouse blue / production amber / trades purple /
driver green / lead orange. Plus a .role-pill style matching the
dashboard's small uppercase chip.
2. Helpers — added ROLE_BANDS regex table + roleBand() classifier and
   a new workerRow(name, role, detail, opts) builder, both sketched
   after this list. Same regex patterns as search.html so a "Forklift
   Operator" classifies identically on every page. opts.endorsed adds
   the green endorsed chip; opts.score appends a rank badge.
3. Replaced the three inline avatar+row constructors with workerRow()
calls. Net: console.html lost ~20 lines of duplicated DOM building
while gaining role bands + pills.
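A sketch of the change-2 helpers; these regex patterns and the
default band are illustrative, not the shipped ROLE_BANDS table:

    const ROLE_BANDS: Array<[RegExp, string]> = [
      [/forklift|warehouse|picker|packer/i, 'warehouse'],   // blue
      [/production|assembl|machine/i,       'production'],  // amber
      [/electric|plumb|weld|carpent/i,      'trades'],      // purple
      [/driver|cdl|delivery/i,              'driver'],      // green
      [/lead|supervisor|foreman/i,          'lead'],        // orange
    ];

    function roleBand(role: string): string {
      for (const [re, band] of ROLE_BANDS) if (re.test(role)) return band;
      return 'warehouse';  // illustrative default
    }

    function workerRow(name: string, role: string, detail: string,
                       opts: { endorsed?: boolean; score?: number } = {}): HTMLElement {
      const row = document.createElement('div');
      row.className = 'worker band-' + roleBand(role);  // 3px left-edge color band
      // ...avatar circle + monogram, role pill, endorsed chip, rank badge per opts
      return row;
    }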
Verified end-to-end via playwright on devop.live/lakehouse/console:
Chapter 8 triage scenario "Marcus running late site 4422":
5 backfill rows render with [warehouse] band + WAREHOUSE pill +
monogram avatars (SBC, ETW, SHC, WMG, MEB).
Same sober look as the dashboard worker cards. No emojis, no
cartoons, color-coded role family on the left edge.
J: "it's outdated." Console walkthrough was stuck on the original 6
chapters (legacy-bridge / permits / catalog / ranking demo / playbook
memory / try-it-yourself). Three weeks of new work weren't visible.
Three new chapters added between the existing playbook-memory chapter
and the input box; all pull live data from the running system:
Chapter 6 — Three coordinators, three views of the same corpus
Renders Maria/Devon/Aisha cards from /staffers with their
territories. Frames the per-staffer hot-swap as the relevance
gradient that compounds independently per coordinator. Same query
"forklift operators" returns 89 IN / 16 WI / 167 IL workers
depending on who's acting.
Chapter 7 — The hidden signal — public issuers in your contractor graph
Pulls /intelligence/profiler_index, builds the basket, shows
issuer count + attributed build value + contractor count as the
three top metrics. Lists top 8 issuers with attribution counts
and direct-link to the profiler. This is the BAI / Signal Engine
pitch in walkthrough form: every contractor name is also a forward
indicator on a public equity. Cross-metro replication explicit
in the closing paragraph.
Chapter 8 — When something breaks — triage in one shot
Live triage demo against /intelligence/chat with body
{message:"Marcus running late site 4422"}. Renders the worker
card + draft SMS + 5 backfills + duration_ms. The 250ms-vs-20min
moment, made concrete with real Quincy IL workers.
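Chapter 8's live call, sketched; the response field names beyond
what's stated above are assumptions about the payload shape:

    async function runTriageDemo(): Promise<void> {
      const res = await fetch('/intelligence/chat', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ message: 'Marcus running late site 4422' }),
      });
      const { worker, draft_sms, backfills, duration_ms } = await res.json();
      // render worker card + draft SMS + backfill rows + timing chip
      console.log(worker, draft_sms, backfills?.length, duration_ms);
    }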
Chapter 9 (was 6) — Try it yourself
Updated input examples to demonstrate each new route:
"8 production workers near 60607" → headcount + zip parser
"Marcus running late site 4422" → triage handler
"Marcus" → bare-name lookup
"what came in last night" → temporal route
"reliable forklift operators with OSHA certs" → hybrid SQL+vector
Each is a click-to-run link beneath the input.
Two new accent classes: .accent-g (green for issuer-count) and
.accent-r (red for triage event).
Verified end-to-end on devop.live/lakehouse/console: 9 chapters
render, ch6 shows 3 staffer personas, ch7 shows 11 issuers / $347M /
200 contractors, ch8 shows Marcus V. Campbell + draft SMS + 5
backfills.
J asked for "a profiler index that shows a history of everyone." This
is a /profiler directory page (also reachable via /contractors) that
ranks every contractor who's filed a Chicago permit, by total permit
value. Rows are clickable into the full /contractor profile.
Defaults: since 2025-06-01, min permit cost $250K, top 200 contractors
by total_cost. Server pulls two Socrata GROUP BY queries (one keyed on
contact_1_name, one on contact_2_name), merges them so contractors
listed in either applicant or contractor slot appear once with combined
counts/cost. ~300ms cold.
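The merge step in sketch form, assuming rows of {name, permits,
total_cost}; mergeContacts is a hypothetical name:

    type Agg = { name: string; permits: number; total_cost: number };

    function mergeContacts(asContact1: Agg[], asContact2: Agg[]): Agg[] {
      const byName = new Map<string, Agg>();
      for (const row of [...asContact1, ...asContact2]) {
        const prev = byName.get(row.name);
        if (prev) {
          prev.permits += row.permits;        // listed in both slots:
          prev.total_cost += row.total_cost;  // combine, keep one row
        } else {
          byName.set(row.name, { ...row });
        }
      }
      return [...byName.values()].sort((a, b) => b.total_cost - a.total_cost);
    }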
UI: live search box, since-date selector, min-cost selector, sortable
columns (name / permits / total_cost / last_filed). Live numbers as of
this writing: 200 contractors, 1,702 permits, $14.22B aggregate. Filter
"Target" returns TARGET CORPORATION + CORPORATION TARGET (name variants
from Socrata).
Also fixes J's other complaint — "no new contracts, Target is gone":
/intelligence/permit_contracts was hard-capped at $limit=6 and
returned only the most recent 6 permits over $250K, so any day with
6 fresh permits would push older contractors (Target) off the panel
entirely. Now defaults
to 24 (caller can pass body.limit up to 100), so 2-3 days of permits
stay on the panel. Added body.contractor — passes a name into the
WHERE so the staffer can pin a specific contractor to the panel
("Target Corporation" → 3 of their permits over $250K).
Server-side:
- new POST /intelligence/profiler_index — paginated contractor index
(since, min_cost, search, limit) with merged contact_1+contact_2
aggregations
- /intelligence/permit_contracts — body.limit + body.contractor
- /profiler and /contractors routes serve profiler.html
Front-end:
- new mcp-server/profiler.html — sortable table, live filter, deep
links to /contractor?name=... (prefix-aware via P, so /lakehouse
works on devop.live)
- search.html + console.html nav: added "Profiler" link
Verified end-to-end via playwright on the public URL.
J reported https://devop.live/contractor?name=3115%20W%20POLK%20ST.%20LLC
returned 404. Cause: the anchor href was a bare /contractor, which on
devop.live routes to the LLM Team UI (port 5000) at the main site root,
not the lakehouse mcp-server (which lives under /lakehouse/*).
Every page that renders a contractor link now uses the same prefix
detector the dashboard already had:
var P = location.pathname.indexOf('/lakehouse') >= 0 ? '/lakehouse' : '';
Files updated:
- search.html: entity-brief anchor + preview anchor → P+/contractor
- console.html: permit-card contractor list → P+/contractor
- contractor.html: history.replaceState + back-link + the
/intelligence/contractor_profile fetch all use P prefix. The page
is reachable at /lakehouse/contractor on the public URL and bare
/contractor on localhost; both work without further config.
Verified:
https://devop.live/lakehouse/contractor?name=3115%20W%20POLK%20ST.%20LLC
→ 200, 29.9 KB, full profile renders. Contractor has 1 permit on
file (a small LLC), 1 geocoded so the heat map plots one marker.
The contractor.html click-target J asked for: a separate page (not a
modal, not a fall-through search) showing every angle on a contractor.
Reachable from the Co-Pilot dashboard, the staffers console, and the
search box — all anchor-wrap contractor names to /contractor?name=...
What's new on the page:
1. PROJECT INDEX — build-signal score
Single 0-100 number with the drivers laid out beneath. Driver list
is staffer-readable: "59 Chicago permits in 180d (+30) · OSHA 20
inspections (-25) · federal contractor (+15)". Score weights are
placeholders to be replaced by an ML model once the 12 awaiting
sources ship — the current 6 wired signals would not give a real
model enough features.
2. HEAT MAP — every Chicago permit they've been contact_1 or contact_2
   on, last 24 months, plotted on a leaflet dark map. Color by cost
   (green <$100K, amber $100K-$1M, red ≥$1M), radius proportional to
   cost so the staffer sees where money + activity concentrates
   (marker styling sketched after this list). Click a marker for
   permit detail (cost, date, work type, address, permit ID). All 50
   of Turner Construction's geocoded recent permits in Chicago plot
   end-to-end.
3. ACTIVITY TIMELINE — monthly permit count, bar chart, with the
first/last month labels so the staffer sees momentum. Tooltip on
each bar gives the count and total cost for that month.
4. 12 AWAITING SOURCES — placeholder cards for the public datasets
that would 3× the build-signal feature count. Each card has:
- source name (real, e.g. DOL Wage & Hour, EPA ECHO, MSHA, BBB)
- one-liner in coordinator language ("Has this contractor stiffed
workers? Will they pay our staffing invoices?")
- "Would show:" sample shape so the engineering scope is concrete
Order is staffing-decision relevance:
1. DOL Wage & Hour (WHD violations)
2. State Licensure Boards (active license + expiry)
3. Surety Bond Capacity (bonding ceiling)
4. EPA ECHO Compliance (env violations at sites)
5. DOT/FMCSA Carrier Safety (crash + OOS rates)
6. BBB Complaints + Rating
7. PACER Civil Suits (FLSA / Title VII / ADA)
8. UCC Lien Filings (cash flow distress)
9. D&B / Credit Bureau (PAYDEX, payment behavior)
10. State UI Employer Claims (workforce stability)
11. MSHA Mine Safety (excavation / aggregate / heavy)
12. Registered Apprenticeships (DOL RAPIDS pipeline)
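The section-2 marker styling in sketch form; thresholds and colors
are the documented ones, the radius formula is an assumption:

    declare const L: any;  // leaflet global from the <head> script tag

    function plotPermit(map: unknown, p: { lat: number; lng: number; cost: number }) {
      const color = p.cost < 100_000 ? 'green'
                  : p.cost < 1_000_000 ? 'orange'  // "amber"
                  : 'red';
      L.circleMarker([p.lat, p.lng], {
        color,
        radius: Math.max(4, Math.log10(Math.max(p.cost, 10)) * 2),  // grows with cost
      }).addTo(map);
    }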
Server-side: entity.ts fetchContractorHistory now pulls the 50 most
recent permits with id + lat/lng + work_description, so the heat map
and timeline have what they need without a second SQL hop. The
ContractorHistory.recent_permits type gained the optional fields.
Front-end: contractor.html got 4 new render sections, leaflet wiring
(stylesheet + script in head), placeholder grid CSS, and a PLACEHOLDERS
const at the bottom with the 12 sources. All popup HTML is built via
DOM construction (textContent + appendChild) — no innerHTML, no XSS.
console.html: contractor names from /intelligence/permit_contracts now
anchor-wrapped to /contractor?name=... so the click-through J described
works from the staffers console too. Click stops propagation so the
permit details element doesn't toggle on the same click.
Verified end-to-end via playwright — Turner Construction profile shows:
PIX score "Mixed signals — review drivers below"
Heat map: "50 permits plotted · green/amber/red"
4 section labels in order
12 placeholder cards in the documented order
Phase 8.5 was fully built on the Rust side (WorkspaceManager with
create/handoff/search/shortlist/activity/get/list, persisted to
object storage, zero-copy handoff between agents). Nothing surfaced
it in the recruiter UI. This page closes that gap.
/workspaces — split-pane UI:
Left: scrollable list of all workspaces, sorted by updated_at.
Each card shows name, tier pill (daily/weekly/monthly/pinned),
current owner, count of shortlisted candidates + activity events.
Right: selected workspace detail with five sections:
1. Header — name, tier, owner, created/updated dates, description,
previous-owners audit trail (each handoff is preserved)
2. Actions row — Hand off, Shortlist candidate, Save search, Log activity
3. Shortlist — candidates flagged with dataset + record_id + notes
4. Saved searches — named SQL queries the staffer wants to rerun
5. Activity — chronological (newest first) log of what happened
Modal forms cover the add/edit actions (create, handoff, shortlist,
save-search, log-activity). All of them POST through the existing
/api/* passthrough to the gateway's /workspaces/* routes.
End-to-end verified live:
1. Sarah creates 'Demo: Toledo Week 17' workspace
2. Shortlists Helen Sanchez (W500K-4661) with notes about prior endorsements
3. Logs activity: 'called — Helen confirmed Tuesday 7am shift'
4. Hands off to Kim with reason 'end of shift'
5. Kim opens the workspace: owner=kim, previous_owners=[{sarah→kim}],
sees all 3 prior events + the shortlisted Helen
— no data copy, pointer swap only (Phase 8.5 design)
Security: all dynamic content built via el(tag,cls,text) DOM helper.
Zero innerHTML on API-derived strings. Modal close-on-backdrop-click
is guarded to the backdrop element.
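The helper itself is tiny; a sketch of the shape described above:

    function el(tag: string, cls?: string, text?: string): HTMLElement {
      const node = document.createElement(tag);
      if (cls) node.className = cls;
      if (text !== undefined) node.textContent = text;  // never innerHTML
      return node;
    }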
Nav updated across all 7 pages. Workspaces is the 7th tab.
Dashboard · Walkthrough · Architecture · Spec · Onboard · Alerts · Workspaces.
Converts the app from 'dashboard you visit' to 'system that finds you.'
Critical for the phone-first staffing shop that won't open a URL —
the system reaches out when something matters.
Daemon:
- Starts once per Bun process (guarded via globalThis sentinel)
- Default interval 15 min (configurable, min 1, max 1440)
- On each cycle, buildDigest() compares current state against prior
snapshot persisted in mcp-server/data/notification_state.json
- Events detected:
- risk_escalation: role moved to tight or critical (was ok/watch)
- deadline_approaching: staffing window falls within warn window
(default 7 days) AND deadline date differs from prior
- memory_growth: playbook_memory entries grew by >= 5 since last run
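One comparison cycle in sketch form; the snapshot field names are
assumptions, the thresholds are the documented ones:

    type Snapshot = { playbookCount: number; roleStatus: Record<string, string> };

    function detectEvents(prev: Snapshot, cur: Snapshot): string[] {
      const events: string[] = [];
      for (const [role, status] of Object.entries(cur.roleStatus)) {
        const was = prev.roleStatus[role];
        if ((status === 'tight' || status === 'critical') &&
            (was === 'ok' || was === 'watch')) {
          events.push('risk_escalation: ' + role + ' ' + was + ' -> ' + status);
        }
      }
      const growth = cur.playbookCount - prev.playbookCount;
      if (growth >= 5) events.push('memory_growth: +' + growth);
      return events;
    }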
Channels (all opt-out individually via config):
- console: always on, logged to journalctl -u lakehouse-agent
- file: always on, appends JSONL to mcp-server/data/notifications.jsonl
- webhook: optional, POSTs {text, digest} to configured URL
(Slack incoming-webhook / Discord webhook / any custom endpoint)
Digest format (human-readable, fits in a Slack message):
LAKEHOUSE DIGEST — 2026-04-20 23:24
3 staffing deadlines within window:
• Production Worker — 2d to 2026-04-23 · demand 724
• Maintenance Tech — 4d to 2026-04-25 · demand 32
• Electrician — 5d to 2026-04-26 · demand 34
+779 new playbooks (total 779, 2204 endorsed names)
snapshot: 0 critical · 0 tight · $275,599,326 pipeline
/alerts page:
- Current status table (daemon state, interval, webhook, last run)
- Config form: enable toggle, interval, deadline warn window, webhook
URL + label (saved to data/notification_config.json)
- 'Fire a test digest now' button — force a cycle without waiting
- Recent digests panel shows the last 10 dispatches with full text
End-to-end verified live:
- Daemon armed successfully on startup
- First-run digest dispatched to console + file in <1s
- Events detected correctly: 3 deadlines within 7 days from real
Chicago permit data; 779 playbook entries surfaced as memory growth
- Digest text format is Slack-pastable
- Dispatch records appear in /alerts recent list
Nav added to all 6 pages:
Dashboard · Walkthrough · Architecture · Spec · Onboard · Alerts.
TDZ caveat: the startAlertsDaemon() invocation moved to the end of
the module so every const/let in the alerts block is evaluated before
the daemon reads it. Previously this failed with 'Cannot access X
before initialization' when the call lived near the top of the file.
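A minimal reproduction of the failure and the shipped fix; names are
illustrative:

    // Broken ordering (module scope): the function declaration hoists,
    // so the call ran before `alertConfig` left the temporal dead zone:
    //
    //   startAlertsDaemon();          // ReferenceError: Cannot access
    //   const alertConfig = load();   // 'alertConfig' before initialization
    //
    // Fix: invoke after every binding the daemon reads has initialized.
    const alertConfig = { intervalMin: 15 };  // stand-in for the alerts consts
    function startAlertsDaemon(): void { void alertConfig.intervalMin; }
    startAlertsDaemon();                      // safe at end of module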
New /onboard page. Client-facing wizard for getting real data into
the system without engineering help.
Flow:
1. Drop a CSV (or click 'Use the sample as my data' — ships a 25-row
realistic staffing roster under /samples/staffing_roster_sample.csv)
2. Browser parses client-side. Columns auto-typed (text/int/decimal/
date). PII flagged by name hint AND content regex (emails, phones).
First rows previewed. Read-only — nothing written yet.
3. Name the dataset (lowercase+underscores). Commit.
4. Post-commit: dataset is live. Shows 4 next steps the operator can
take (SQL query, vector index, dashboard search, playbook training).
Backend:
- /onboard serves onboard.html
- /samples/*.csv serves CSV files from mcp-server/samples/ with
  filename validation (only [a-zA-Z0-9_.-]+\.csv, which prevents
  path traversal)
- /onboard/ingest forwards multipart/form-data to gateway /ingest/file
preserving the boundary. The generic /api/* passthrough breaks
multipart because it reads as text and forwards as JSON; this route
uses arrayBuffer + original Content-Type.
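The boundary-preserving forward in sketch form; GATEWAY is a
placeholder for the configured gateway address:

    const GATEWAY = 'http://127.0.0.1:8080';  // assumption

    async function forwardIngest(req: Request): Promise<Response> {
      const body = await req.arrayBuffer();  // bytes, not text: boundary survives
      return fetch(GATEWAY + '/ingest/file', {
        method: 'POST',
        headers: { 'Content-Type': req.headers.get('Content-Type') ?? '' },
        body,
      });
    }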
Verified end-to-end: upload sample roster (25 rows, 12 columns) →
parse in browser → show columns + PII flags + preview → commit →
gateway writes Parquet, registers in catalog → immediately queryable:
SELECT * FROM onboard_demo2 LIMIT 3
→ Sarah Johnson, Forklift Operator, Chicago, IL, 0.92
Round-trip <1 second.
Nav updated on all pages to link Onboard. Shipped with a sample CSV
so the full flow is demonstrable without real client data.
When a real client shows up, same path — they upload their CSV.
No engineering ticket, no code change, no schema pre-definition.
Security: sample filename regex prevents path traversal. CSV parse
is client-side pure JS (no DOM injection). Commit uses existing
/ingest/file validation (schema fingerprint, PII server-side,
content-hash dedup).
J's ask: explain the full architecture so someone reading a README
can dispute it or recreate it. The repo isn't public yet; this page
IS the spec until it is.
Ch1 Repository layout — 13 crates + tests/multi-agent + docs + data,
with owned responsibility and file path per crate.
Ch2 Data ingest pipeline (8 steps) — sources (file/inbox/DB/cron),
parse+normalize with ADR-010 conservative typing, PII auto-tag,
dedup, Parquet write, catalog register with fingerprint gate,
mark embeddings stale, queryable immediately.
Ch3 Measurement & indexing — row count / fingerprint / owner /
sensitivity / freshness / lineage per dataset. HNSW vs Lance
tradeoff table with measured numbers (ADR-019). Autotune loop.
Per-profile scoping (Phase 17).
Ch4 Contract inference from external signal — Chicago permit feed
→ role mapping → worker count heuristic → timeline → hybrid
search with boost → pattern discovery → rendered card. All
pre-computed before staffer opens UI.
Ch5 What a CRM can't do — 11-row comparison table of capabilities.
Ch6 How it gets better over time — three paths:
- Phase 19 playbook boost (full math)
- Pattern discovery meta-index
- Autotune agent
Ch7 Scale story: 20 staffers, 300 contracts, midday +20/+1M surge
- Async gateway + per-staffer profile isolation + client blacklists
- 7-step surge handling flow (ingest, stale-mark, incremental refresh,
degradation, hot-swap, autotune re-enter)
- Known pain points: Ollama inference serial, RAM ceiling ~5M on
HNSW (mitigated by Lance), VRAM 1-2 models sequential,
playbook_memory unbounded.
Ch8 Error surfaces & recovery — 10-row table covering ingest schema
conflicts, bucket failures, ghost names, dual-agent drift,
empty searches, Ollama down, gateway restart, schema fingerprint
divergence. Every failure has a named surface and recovery path.
Ch9 Per-staffer context — active profile, workspace, client blacklist,
audit trail, daily summary. How 20 staffers don't see the same UI.
Ch10 Day in the life — 07:00 housekeeping → 07:30 refresh → 08:00
staffer opens → 08:15 drill down → 08:30 Call click → 09:00
second staffer shares memory → 12:30 surge → 14:00 no-show →
15:00 new embeddings live → 17:00 retrospective → 22:00
overnight trials.
Ch11 Known limits & non-goals — deferred (rate/margin, push, confidence
calibration, neural re-ranker, pm compaction, call_log cross-ref)
and explicitly out-of-scope (cloud, ACID, streaming, CRM replace,
proprietary formats, hard multi-tenant).
Also: nav updated on /dashboard, /console, /proof to link /spec.
Every architectural claim in the spec cites either a code path, an
ADR number, or a phase reference so someone skeptical can target
the specific artifact.
Old console was a chat playground. New console is a guided,
chapter-based explanation that a non-technical staffer can read
top-down and finish convinced, without needing to understand any of
the underlying technology.
Six chapters, each loading live data:
1. Right now, this system is already thinking
Four stats cards pulled live: construction pipeline $, predicted
worker demand, rows under management, playbooks remembered. Then
a narrative that names the current alert posture (critical/tight/ok).
2. The demand signal is real, not made up
Expandable rows per Chicago permit work_type, with a direct link to
data.cityofchicago.org for verification. Pill labeled LIVE ·
DATA.CITYOFCHICAGO.ORG leaves no ambiguity.
3. Where your own data would live
Catalog enumerated with three pill classes:
- SWAP FOR YOUR DATA (purple) — the synthetic tables that would
be replaced by the client's ATS/CRM/call-log exports
- SYSTEM-GENERATED (blue) — playbook memory, threat_intel, kb_*
produced by the system itself
Row counts + columns visible. Names it honestly.
4. Watch the system rank candidates in real time
Takes the freshest Chicago permit, walks the staffer through all
three steps (derive need → narrow via SQL → rank + boost), shows
the top-5 workers with why, boost chip, memory chip, timeline,
and a plain-English narrative of the CRM gap.
5. Every action compounds
Playbook memory count + sample + narrative about what it means
when the staffer logs a fill.
6. Try it yourself
Free-text input hitting /intelligence/chat, renders response
with memory chip + boost chips + ranked workers.
Security: all API-derived strings go through textContent or
el(tag,cls,text) helper. Zero innerHTML usage on dynamic content.
Passes security reminder hook.
File size: 419 → ~500 lines. Visual style matches the dashboard
(same palette, typography, chip styles) so the two pages feel
like one app.
"find me a warehouse worker available today near Nashville" now:
- Parses: role=warehouse, city=Nashville, available=true
- Builds SQL: role LIKE '%warehouse%' AND city='Nashville' AND availability>0.5
- Returns: 12 Nashville warehouse workers with ZIP codes, availability %,
reliability %, skills, certs, and archetype
- Shows understanding tags so the user sees what the system parsed
- 414ms, 12 records — not a generic search, a targeted answer
Recognizes 20 role keywords, 40+ cities, 10 states, availability/reliability
signals from natural language. Falls through to vector search for anything
the parser doesn't catch.
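The parse path in sketch form, with toy keyword tables standing in
for the shipped 20-role / 40-city / 10-state tables; the workers
table name and columns are assumptions:

    function parseStaffingQuery(raw: string) {
      const q = raw.toLowerCase();
      const ROLES = ['warehouse', 'forklift', 'production'];
      const CITIES = ['Nashville', 'Chicago', 'Toledo'];
      const role = ROLES.find(r => q.includes(r));
      const city = CITIES.find(c => q.includes(c.toLowerCase()));
      const available = /\b(available|today)\b/.test(q);
      if (!role && !city && !available) return null;  // fall through to vector search
      const where = [
        role && "role LIKE '%" + role + "%'",
        city && "city = '" + city + "'",
        available && 'availability > 0.5',
      ].filter(Boolean).join(' AND ');
      return { role, city, available, sql: 'SELECT * FROM workers WHERE ' + where };
    }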
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New page at /lakehouse/console — a $200/hr consultant's intelligence product:
Morning Brief (auto-loads in ~120ms across 500K profiles):
- Workforce Pulse: total, reliable %, elite %, archetype breakdown
- Geographic Bench: state-by-state reliable % with weakest-state alert
- Comeback Watch: 15K improving workers who crossed 80% reliability
- Risk Watch: 5K erratic + 5K silent workers flagged automatically
- Ready & Waiting: available + reliable workers to call first
- Role Supply: 20 roles with supply/available/reliability
Conversational Chat with 5 intelligent routes:
- "Find someone like [Name] but in OH" → vector similarity search
- "Who could handle industrial electrical work?" → semantic role discovery
(finds workers for roles that DON'T EXIST in the database)
- "What if we lose our top 5 forklift operators?" → scenario analysis
with risk rating, bench depth, state-by-state breakdown
- "Which workers should we stop placing?" → risk flagging
- Default: hybrid SQL+vector search with LLM summary
Every response shows: query steps, records scanned, response time.
Transparency kills the "AI is making it up" argument.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>