Compare commits — 10 commits, `d571d62e9b...f4dc1b29e3`

Commits in range: f4dc1b29e3, f892230699, 4b92d1da91, 1745881426, a05174d2fa, f9a408e4c4, a3b65f314e, 10ed3bc630, cdf5f5926a, f92b55615f

`.gitignore` (vendored) — 8 lines added
```diff
@@ -4,3 +4,11 @@
 .env
 __pycache__/
 *.pyc
+
+# Headshot pool — binary face JPGs are fetched by scripts/staffing/fetch_face_pool.py
+# (synthetic StyleGAN, ~580MB for 1000 faces). Manifest + fetch script are tracked.
+data/headshots/face_*.jpg
+data/headshots/_thumbs/
+# ComfyUI on-demand generated portraits (per-worker unique). Cached on first
+# request; fully regeneratable via /headshots/generate/:key.
+data/headshots_gen/
```
`STATE_OF_PLAY.md` (new file, +239 lines)
# STATE OF PLAY — Lakehouse

**Last verified:** 2026-04-27 ~20:35 CDT
**Verified by:** live probe, not memory.

> **Read this FIRST.** When the user says "we're working on lakehouse," they mean the working code captured below — NOT what `git log` framed as "the cutover" or what memory snapshots from 2 days ago suggest. If memory contradicts this file, this file wins. Update it when something is verified working — not when a phase finishes.

---

## VERIFIED WORKING RIGHT NOW

### The client demo (Staffing Co-Pilot)

**Public URL:** `https://devop.live/lakehouse/` — 200, "Staffing Co-Pilot" (159 KB SPA, leaflet maps, dark theme).
**Local URL:** `http://localhost:3700/` — same page, served by `mcp-server/index.ts` (PID 1271, started 09:48 CDT today).

**The staffers console** (the one the client was thoroughly impressed with):

- `https://devop.live/lakehouse/console` — 200, "Lakehouse — What Your Staffing System Would Do" (26 KB)
- Pulls the project index via `/api/catalog/datasets` (36 datasets) + playbook memory via `/api/vectors/playbook_memory/stats` (4,701 entries with embeddings, real ops like *"fill: Maintenance Tech x2 in Milwaukee, WI"*)

Client-visible flow that works end-to-end on the public URL:

| Endpoint | Sample output |
|---|---|
| `GET /api/catalog/datasets` | 36 datasets indexed: timesheets 1M, call_log 800K, workers_500k 500K, email_log 500K, workers_100k 100K, candidates 100K, placements 50K, job_orders 15K, successful_playbooks_live 2,077 |
| `GET /api/vectors/playbook_memory/stats` | 4,701 fill operations with embeddings |
| `GET /system/summary` | 36 datasets, 2.98M rows, 60 indexes, 500K workers loaded, 1K candidates |
| `POST /intelligence/staffing_forecast` | 744 Production Workers needed in 30d, 11,281 bench (4,687 reliable), coverage 1,444%, risk=ok. Same for Electrician (need 32, bench 2,440) and Maintenance Tech (need 17, bench 5,004). |
| `POST /intelligence/permit_contracts` | permit `3442956` $500K → 3 Production Workers, 886-candidate pool, 95% fill, $36K gross. 5 more Chicago permits with 8 workers each, same pool, 95% fill, $96K each. |
| `POST /intelligence/market` | major Chicago permits ranked: $730M O'Hare, $615M 307 N Michigan, $580M casino, $445M Loop transit (real geo coords). |
| `POST /intelligence/permit_entities` | architects + contractors from permit contacts (e.g. "KACPRZYNSKI, ANDY", "SLS ELECTRICAL SERVICE"). |
| `POST /intelligence/activity` + `/intelligence/arch_signals` + `/intelligence/chat` | all 200 |

The demo tells the story: *"upcoming Chicago contracts → workers needed → coverage from the bench → architects/contractors involved → revenue and margin."* That's the "live data + anticipating contracts + complete workflow" pitch — working as of right now.

### Backend, verified live this session

| Surface | State |
|---|---|
| Gateway `:3100` | up, 4 providers configured, `/v1/health` 200 with 500K workers loaded |
| MCP server `:3700` (Co-Pilot demo) | up, all `/intelligence/*` endpoints respond |
| VCP UI `:3950` | started this session, `/data/*` 200, real numbers |
| Observer `:3800` | ring full (2,000/2,000) — older events evicted, query Langfuse for 24h-ago state |
| Sidecar `:3200` | up |
| Langfuse `:3001` | recording, `gw:/log` + `v1.chat:openrouter` traces visible |
| LLM Team UI `:5000` | up, only `extract` mode registered |
| OpenCode fleet | **40 models reachable through one `sk-*` key** (verified live `GET https://opencode.ai/zen/v1/models`) |

OpenCode catalog (live):

- Claude: opus-4-7, opus-4-6, opus-4-5, opus-4-1, sonnet-4-6, sonnet-4-5, sonnet-4, haiku-4-5
- GPT-5: 5.5-pro, 5.5, 5.4-pro, 5.4, 5.4-mini, 5.4-nano, 5.3-codex-spark, 5.3-codex, 5.2, 5.2-codex, 5.1-codex-max, 5.1-codex, 5.1-codex-mini, 5.1, 5-codex, 5-nano, 5
- Gemini: 3.1-pro, 3-flash
- GLM: 5.1, 5
- Minimax: m2.7, m2.5
- Kimi: k2.6, k2.5
- Qwen: 3.6-plus, 3.5-plus
- Other: BIG-PKL (was a typo-prone name in the catalog, model id starts with "big-pkl-something")
- Free tier: minimax-m2.5-free, hy3-preview-free, ling-2.6-flash-free, trinity-large-preview-free

### The substrate (frozen — do not re-architect)

- Distillation v1.0.0 at tag `e7636f2` — **145/145 bun tests pass, 22/22 acceptance, 16/16 audit-full**
- Output: `data/_kb/distilled_{facts,procedures,config_hints}.jsonl` + `data/vectors/distilled_{factual,procedural,config_hint}_v20260423102847.parquet`
- Auditor cross-lineage: Kimi K2.6 ↔ Haiku 4.5 alternation, Opus auto-promote on diffs >100k chars, **per-PR cap=3 with auto-reset on new head SHA**
- Pathway memory: 88 traces, 11/11 successful replays (probation gate crossed)
- Mode runner: 5 native modes; `codereview_isolation` is default; composed-corpus auto-downgrade verified Apr 26 (composed lost 5/5 vs isolation, p=0.031)

### Matrix indexer

30+ live corpora including:

- 5 versions of `workers_500k_v1..v9` (50K embedded chunks each)
- 11 batched 2K-row shards `w500k_b3..b17`
- `chicago_permits_v1` (3,420), `resumes_100k_v2` (100K candidates), `ethereal_workers_v1` (10K)
- `lakehouse_arch_v1` (2,119), `lakehouse_symbols_v1` (2,470), `lakehouse_answers_v1` (1,269), `scrum_findings_v1` (1,260)
- `kb_team_runs_v1` (12,693) + `kb_team_runs_agent` (4,407) — LLM-team play history embedded
- `distilled_factual_v20260423102507` (8) — distillation output

### Code health

- `cargo check --workspace` → **0 warnings, 0 errors**
- `bun test auditor + tests/distillation` → **145/145 pass**
- `ui/server.ts` + `auditor.ts` bundle clean

---

## DO NOT RELITIGATE

- **PR #11 is merged into `origin/main` as `ed57eda`** — do not "still need to merge PR #11."
- **Distillation tag `distillation-v1.0.0` at `e7636f2` is FROZEN** — do not re-architect schemas, scorer rules, audit fixtures.
- **Kimi forensic HOLD verdict (2026-04-27) was 2/8 false + 6/8 latent** — do not re-debate, see `reports/kimi/audit-last-week-full.md`.
- **`candidates_safe` `vertical` column bug** — fixed at catalog metadata layer in commit `c3c9c21`. Do not "discover" it again.
- **Decisions A/B/C/D from `synthetic-data-gap-report.md`** — all four scripts shipped today (`d56f08e`, `940737d`, `c3c9c21`). Do not "ask J for approval."
- **`workers_500k.phone` type fixup** — already string. The fixup script is idempotent; running it is a no-op.
- **`client_workerskjkk` typo dataset** — was breaking every SQL query (catalog had it registered, file didn't exist). Removed via `DELETE /catalog/datasets/by-name/client_workerskjkk` this session. Do not re-add. Adding a startup gate that errors on unrecognized parquet names is the long-term fix per now.md Step 2C.

---

## FIXES MADE THIS SESSION (2026-04-27 evening)

1. **`crates/gateway/src/v1/iterate.rs:93`** — `state` → `_state` (cleared the one cargo warning).
2. **`lakehouse-ui.service` (Dioxus)** — disabled. Was failing 7,242 times against a missing `target/dx/ui/debug/web/public` build dir. `systemctl stop && disable`.
3. **VCP UI on `:3950`** — started `bun run ui/server.ts` (PID 1162212, log `/tmp/lakehouse_ui.log`). `/data/*` endpoints now 200 with real data.
4. **`client_workerskjkk` catalog entry** — `DELETE /catalog/datasets/by-name/client_workerskjkk` removed the dead manifest. **This was the actual root cause** of `/system/summary` reporting `workers_500k_rows: 0` and the demo showing zero bench. Every SQL query was failing schema inference on the missing file before reaching its target table. Fixed → `workers_500k_rows: 500000`, `candidates_rows: 1000`, demo coverage flipped from "critical 0%" to actual percentages on devop.live/lakehouse.

## FIXES MADE THIS SESSION (2026-04-28 early — face pool)

5. **Synthetic StyleGAN face pool — 1000 faces, gender+race+age tagged.** `scripts/staffing/fetch_face_pool.py` fetches from thispersondoesnotexist.com; `scripts/staffing/tag_face_pool.py --min-age 22` runs deepface and excludes minors. `data/headshots/manifest.jsonl` now has gender (494 men / 458 women), race (caucasian 662 · east_asian 128 · hispanic 86 · middle_eastern 59 · black 14 · south_asian 3), age, and 48 minor exclusions. Server pool = 952 servable faces.
6. **`mcp-server/index.ts:1308` `/headshots/:key` route** — gender×race×age intersection bucketing with graceful fallback (gender-only → all). Same key always returns same face; different keys spread evenly.
7. **`/headshots/_thumbs/` pre-resized 384×384 webp** (60× smaller: 587KB → ~11KB). Without this, 40-card grids overran Chrome's parallel-connection budget and ~75% of tiles never finished decoding. Generated via parallel ffmpeg (`xargs -P 8`); `.gitignore`d.
8. **`mcp-server/search.html` + `console.html`** — dropped `img.loading='lazy'`. With 11KB thumbs, eager load is cheap (~500KB for 50 cards) and avoids the off-screen race that lazy decode produced.
9. **ComfyUI on-demand uniqueness — `serve_imagegen.py:32`** added `seed` to `_cache_key()` (was caching by prompt only — 3 different worker seeds collapsed to 1 cached image). Verified: seed=839185194/195/196 → 3 distinct md5s.
10. **`mcp-server/index.ts:1234` `/headshots/generate/:key`** — ComfyUI hot-path that derives a deterministic-per-worker seed via djb2-style hash; cold ~1.5s, cached ~1ms. Worker prompt format: `professional corporate headshot portrait of a {age}-year-old {race} {gender}, {role}, neutral expression, plain studio background, soft natural lighting, sharp focus, photorealistic, dslr`. Cache at `data/headshots_gen/` (gitignored, regeneratable).
11. **Confidence-default name resolution** in `search.html` — `genderFor()` and `guessEthnicityFromFirstName()` lookup tables (FEMALE_NAMES, MALE_NAMES, NAMES_HISPANIC, NAMES_BLACK, NAMES_SOUTH_ASIAN, NAMES_EAST_ASIAN, NAMES_MIDDLE_EASTERN). Xavier → man+hispanic, Aisha → woman+black, etc. Every worker resolves to a face-pool bucket.

End-to-end verified: playwright run on `https://devop.live/lakehouse/?q=forklift+operators+IL` → 21/21 cards loaded, 0 broken, all 384×384 webp thumbs.

---

## OPEN — but not blocking the demo

| Item | What | When to act |
|---|---|---|
| `modes.toml` `staffing_inference.matrix_corpus` | still says `workers_500k_v8`. v9 in vector index is from Apr 17 (raw-sourced, not safe-view). The new `build_workers_v9.sh` rebuilds from `workers_safe`. | Run when you have 30+ min for the rebuild. |
| Open PRs #6, #7, #10 | sitting since Apr 22-24, auditor verdicts on disk at `data/_auditor/kimi_verdicts/{6,7,10}-*.json` | Read verdicts, decide reconcile/close. |
| `test/enrich-prd-pipeline` branch | 35 unmerged commits, includes more-evolved auditor/inference.ts (666 vs main's 580 lines), curation+fact-extractor wiring | Reconcile or formally archive — see `memory/project_unmerged_architecture_work.md`. |
| `federation-hnsw-trials` stash | Lance + S3/MinIO prototype, `aws-config` crate added, 708 insertions | Phase B from EXECUTION_PLAN.md — revisit when Parquet vector ceiling actually hurts. |
| `candidates` manifest drift | manifest 100K vs SQL 1K. Cosmetic. | Run a metadata resync if it matters. |

---

## RUNTIME CHEATSHEET

```bash
# Verify the demo (public + local both work)
curl -sS https://devop.live/lakehouse/           # Co-Pilot HTML
curl -sS https://devop.live/lakehouse/console    # staffers console
curl -sS -X POST https://devop.live/lakehouse/intelligence/staffing_forecast \
  -d '{}' -H 'content-type: application/json' \
  | jq '.forecast[] | {role, demand_workers, bench_total, coverage_pct, risk}'

# Restart sequence (after Rust changes)
sudo systemctl restart lakehouse.service     # gateway :3100
sudo systemctl restart lakehouse-auditor     # auditor daemon
sudo systemctl restart lakehouse-observer    # observer :3800
# UI bun on :3950 is NOT systemd-managed (lakehouse-ui.service is disabled).
# Restart manually: kill <pid>; nohup bun run ui/server.ts > /tmp/lakehouse_ui.log 2>&1 &

# Health checks
curl -sS http://localhost:3100/v1/health | jq           # workers_count, providers
curl -sS http://localhost:3100/vectors/pathway/stats | jq
curl -sS http://localhost:3100/v1/usage | jq            # since-restart cost
curl -sS http://localhost:3700/system/summary | jq      # dataset counts
```

---

## VISION — what we're actually building (not what's done)

J's framing for the legacy staffing company:

- Pull live data, anticipate contracts based on Chicago permits → real architect/contractor associations, headcount, time period, money, scope.
- Hybrid + memory index → search large corpora cheaply.
- Email comes in → verify against contract; SMS comes in → alert when index changes.
- Real-time.
- Invent metrics nobody else has using the hybrid index.
- Next stage: workers download an app → geolocation clock-in → automatic responsiveness measurement, no user effort, with incentives for using it.
- Find people getting certificates (passive cert tracking).
- Pull union data → bring contracts that work for **employees**, not just employers.
- All metrics visible, nothing hidden, value-aligned with what each side actually needs.

If a future session is shaving away from this vision toward "fix the cutover" or "land Phase X," the vision wins. Phases are scaffolding for the vision, not the goal.

---

## CURRENT PLAN — fix the demo for the legacy staffing client

Built from playwright audit of the live demo (2026-04-27 evening). Each item ends in something the client can SEE, not internal cleanups.

**Demo state is anchored by git tag `demo-2026-04-27`** (commit `ed57eda`, the merge of PR #11). To restore code state: `git checkout demo-2026-04-27`. To restore runtime state: `DELETE /catalog/datasets/by-name/client_workerskjkk` (catalog hot-fix is not in git).

### P1 — Search box that actually filters (highest visible impact)

**Problem:** typing in `#sq` and pressing Enter fires `POST /intelligence/chat` with body `{"message":"<query>"}`. The state (`#sst`) and role (`#srl`) selects are ignored — never sent in the body. So every search returns a generic chat completion, never a SQL+vector hybrid filter against `workers_500k`. That is the "cached/generic response" the client sees.

**Fix:** in `mcp-server/search.html`, change the search-submit handler to call the real worker search endpoint with `{query, state, role, top_k}`. The MCP `search_workers` tool surface already exists; route the form there. Render returned worker rows in the existing card grid.
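
A minimal sketch of the corrected submit payload — the route string and exact field names here are assumptions taken from the plan text, not the verified `search_workers` surface:

```javascript
// Sketch: build the request the submit handler should send instead of
// the {"message": "..."} chat shape. Route path is an assumption.
function buildSearchRequest(query, state, role, topK) {
  return {
    url: '/api/tools/search_workers', // assumed route for the MCP tool
    body: { query: query, state: state, role: role, top_k: topK },
  };
}

// The handler would then do something like:
//   const req = buildSearchRequest(sq.value, sst.value, srl.value, 25);
//   fetch(A + req.url, { method: 'POST',
//     headers: { 'content-type': 'application/json' },
//     body: JSON.stringify(req.body) })
//     .then((r) => r.json()).then(renderCards);
```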

**Done when:** typing "forklift" + state IL + role "Forklift Operator" returns ≤ top_k IL Forklift Operators, and changing state to WI returns different workers.

### P2 — Contractor-name click → `/contractor` profile page

**Problem:** clicking a contractor name in any rendered card stays on `/lakehouse/`. URL doesn't change.

**Fix:** wrap contractor names in `<a href="/contractor?name=<encoded>">`. The page `mcp-server/contractor.html` (14.8 KB, "Contractor Profile · Staffing Co-Pilot") already exists at `/contractor` and the data endpoint `/intelligence/contractor_profile` already returns rich data.
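
The link wiring itself is one line; the only subtlety is URL-encoding names that contain commas and spaces (permit contacts are stored `"LAST, FIRST"`). A sketch:

```javascript
// Sketch of the href builder for contractor cards. encodeURIComponent
// handles the comma/space in permit-contact names like "KACPRZYNSKI, ANDY".
function contractorLink(name) {
  return '/contractor?name=' + encodeURIComponent(name);
}
```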

**Then check contractor.html actually shows:** full history of every record the database has on that contractor + heat map of locations underneath + relevant info (per J 2026-04-27). If the page is incomplete, finish it. Otherwise just wire the link.

**Done when:** clicking "KACPRZYNSKI, ANDY" opens a profile with: every Chicago permit they're contact_1 or contact_2 on, a leaflet map with markers for each address, and any matched workers from prior placements at their sites.

### P3 — Substrate signal at the bottom shows the right numbers

**Problem:** J reports the bottom panel says "playbook memory empty, 80 traces 0 replies." Reality from the live endpoints: `/api/vectors/playbook_memory/stats` = 4,701 entries with embeddings; `/vectors/pathway/stats` = 88 traces, 11/11 replays.

**Fix:** find the renderer in search.html that builds the substrate signal panel; verify it's hitting the right endpoints and reading the right keys; fix shape mismatches.

**Done when:** bottom panel shows real numbers (4,701 playbooks, 88 traces, 11/11 replays) and references at least one specific recent operation from the playbook stats sample.

### P4 — Top nav reflects today's architecture

**Problem:** Walkthrough/Architecture/Spec/Onboard/Alerts/Workspaces tabs all return 200 but the content is from the old architecture. Doesn't mention: gateway scratchpad, memory indexer, ranker, mode runner, OpenCode 40-model fleet, distillation substrate, auditor cross-lineage.

**Fix:** rewrite `mcp-server/proof.html` (or add a single new page "What's running" that replaces Architecture+Spec) to describe what's actually shipped as of `demo-2026-04-27`. Keep one architecture page, drop redundancy. Either complete or hide Onboard/Alerts/Workspaces — J's call which.

**Done when:** the architecture page tells a non-technical reader, in 2 minutes, what each piece does in coordinator-relatable terms ("intern that read every email", not "3-stage adversarial inference pipeline").

### P5 — Caching for the project-index build_signal (J flagged unfinished)

**Problem:** "we never finished our caching for project index build signal it's not pulling new information." Need to find what `build_signal` refers to. Likely a scrum/auditor signal that should rebuild the `lakehouse_arch_v1` corpus on commit but isn't wired up.

**Fix:** identify the build-signal pipeline (likely in `auditor/` or `crates/vectord/`), wire its emit to a corpus rebuild, verify by making a test commit and watching the new chunk appear in `/vectors/indexes` for `lakehouse_arch_v1`.

**Done when:** committing a new file to `crates/` causes `lakehouse_arch_v1` chunk_count to increase within N minutes.
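
The "done when" check can be scripted; a minimal sketch of the comparison step, assuming `/vectors/indexes` returns entries carrying `name` and `chunk_count` fields (a polling loop would fetch the endpoint and call this until it flips true):

```javascript
// Sketch: has `corpus` grown past `baseline` chunks in the index list?
// The {name, chunk_count} entry shape is an assumption about the
// /vectors/indexes response, not a verified schema.
function chunkCountGrew(indexes, corpus, baseline) {
  const entry = indexes.find((ix) => ix.name === corpus);
  return Boolean(entry && entry.chunk_count > baseline);
}
```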

### P0 — Anchor the demo state (DONE)

Tagged `ed57eda` as `demo-2026-04-27`. Future sessions: `git checkout demo-2026-04-27` to land in this exact code state.

---

## EXECUTION ORDER

1. **P1 first** — biggest visible bug, ~30-60 min
2. **P2 next** — contractor click is the second-biggest "doesn't work" the client sees, ~20 min if profile is mostly done
3. **P3** — small fix, big "looks alive" win
4. **P4** — biggest scope; might split across sessions
5. **P5** — feature work, only after the visible bugs are fixed

Each item commits independently with the format `demo: P<n> — <one-line>` so the commit log doubles as a progress journal. After each merge to main, re-tag `demo-latest` to point at the new HEAD.

Stop here and let J pick which item to start with. Do not silently extend scope.
`data/headshots/manifest.jsonl` (new file, 1000 lines) — diff suppressed because it is too large
```diff
@@ -54,8 +54,25 @@ details .body{padding-top:10px;font-size:12px;color:#8b949e}
 .accent-g{border-left:3px solid #3fb950}
 .accent-r{border-left:3px solid #f85149}
 
-.worker{display:flex;align-items:center;gap:10px;padding:8px 10px;background:#161b22;border-radius:6px;margin-bottom:4px;font-size:12px}
-.worker .av{width:28px;height:28px;border-radius:6px;background:#1a2744;display:flex;align-items:center;justify-content:center;font-weight:600;color:#e6edf3;font-size:10px;flex-shrink:0}
+.worker{display:flex;align-items:center;gap:10px;padding:8px 10px;background:#161b22;border-radius:6px;margin-bottom:4px;font-size:12px;border-left:3px solid #30363d}
+.worker .av{width:32px;height:32px;border-radius:50%;background:#0d1117;border:1px solid #21262d;display:flex;align-items:center;justify-content:center;font-weight:600;color:#c9d1d9;font-size:11px;flex-shrink:0;letter-spacing:0.5px;overflow:hidden;position:relative}
+.worker .av img{position:absolute;inset:0;width:100%;height:100%;object-fit:cover;display:block;
+  /* Softening — mirror of search.html. Pulls saturation + contrast off
+     the SDXL Turbo over-render so faces feel less "AI-generated".
+     If you tweak one, tweak the other. */
+  filter: saturate(0.86) contrast(0.93) brightness(1.02) blur(0.3px);
+}
+.worker[data-role-band="warehouse"]{border-left-color:#58a6ff}
+.worker[data-role-band="production"]{border-left-color:#d29922}
+.worker[data-role-band="trades"]{border-left-color:#bc8cff}
+.worker[data-role-band="driver"]{border-left-color:#3fb950}
+.worker[data-role-band="lead"]{border-left-color:#f0883e}
+.role-pill{display:inline-block;font-size:9px;padding:1px 7px;border-radius:3px;background:#0d1117;color:#8b949e;margin-right:6px;font-weight:600;letter-spacing:0.4px;text-transform:uppercase;border-left:2px solid #30363d;vertical-align:1px}
+.role-pill[data-rb="warehouse"]{border-left-color:#58a6ff;color:#79c0ff}
+.role-pill[data-rb="production"]{border-left-color:#d29922;color:#e3b341}
+.role-pill[data-rb="trades"]{border-left-color:#bc8cff;color:#d2a8ff}
+.role-pill[data-rb="driver"]{border-left-color:#3fb950;color:#56d364}
+.role-pill[data-rb="lead"]{border-left-color:#f0883e;color:#ffa657}
 .worker .info{flex:1;min-width:0}
 .worker .nm{color:#e6edf3;font-weight:500}
 .worker .why{color:#545d68;font-size:11px;margin-top:1px}
```
@ -199,6 +216,132 @@ var A=location.origin+P;
|
||||
// DOM helpers — all dynamic content goes through these. No innerHTML
|
||||
// anywhere in the script; every API-derived string passes through
|
||||
// textContent so no injection path regardless of upstream data.
|
||||
// Role classification — mirrors search.html, no emojis. Maps role
|
||||
// strings to a band+label used by the worker-card border + role pill.
|
||||
var ROLE_BANDS = [
|
||||
{ match: /forklift|warehouse|associate|material\s*handler|loader|loading|packag|shipping|logistics|inventory|sanitation|janit/i, band: 'warehouse', label: 'Warehouse' },
|
||||
{ match: /production|assembl/i, band: 'production', label: 'Production' },
|
||||
{ match: /welder|weld|electric|maint(enance)?\s*tech|cnc|machine\s*op|hvac|plumb|carpenter|mason/i, band: 'trades', label: 'Skilled Trade' },
|
||||
{ match: /driver|truck|haul|cdl/i, band: 'driver', label: 'Driver' },
|
||||
{ match: /line\s*lead|supervisor|foreman|coordinator/i, band: 'lead', label: 'Lead' },
|
||||
{ match: /quality/i, band: 'production', label: 'Quality' },
|
||||
];
|
||||
function roleBand(role){
|
||||
if(!role) return { band: 'warehouse', label: '' };
|
||||
for (var i = 0; i < ROLE_BANDS.length; i++) {
|
||||
if (ROLE_BANDS[i].match.test(role)) return ROLE_BANDS[i];
|
||||
}
|
||||
return { band: 'warehouse', label: role.split(' ')[0].toUpperCase().slice(0, 12) };
|
||||
}
|
||||
|
||||
// Build a sober worker card: monogram avatar + colored role band on
|
||||
// the left edge + uppercase role pill in the detail line. Used by
|
||||
// every chapter that renders worker rows. `name` and `role` drive the
|
||||
// classification; `detail` is the full text after the pill.
|
||||
// Quick first-name → gender hint for face-pool selection. Same lookup
|
||||
// idea as the dashboard; if the name is unknown, the server falls back
|
||||
// to the full pool. Trimmed table — covers the most common names that
|
||||
// appear in the synthetic worker data.
|
||||
var FEMALE_NAMES = new Set(['Mary','Patricia','Jennifer','Linda','Elizabeth','Barbara','Susan','Jessica','Sarah','Karen','Lisa','Nancy','Betty','Sandra','Margaret','Ashley','Kimberly','Emily','Donna','Michelle','Carol','Amanda','Melissa','Deborah','Stephanie','Dorothy','Rebecca','Sharon','Laura','Cynthia','Amy','Kathleen','Angela','Shirley','Brenda','Emma','Anna','Pamela','Nicole','Samantha','Katherine','Christine','Helen','Debra','Rachel','Carolyn','Janet','Maria','Catherine','Heather','Diane','Olivia','Julie','Joyce','Victoria','Ruth','Virginia','Lauren','Kelly','Christina','Joan','Evelyn','Judith','Andrea','Hannah','Megan','Cheryl','Jacqueline','Martha','Madison','Teresa','Gloria','Sara','Janice','Ann','Kathryn','Abigail','Sophia','Frances','Jean','Alice','Judy','Isabella','Julia','Grace','Amber','Denise','Danielle','Marilyn','Beverly','Charlotte','Natalie','Theresa','Diana','Brittany','Kayla','Alexis','Lori','Marie','Carmen','Aisha','Rosa','Mia','Audrey','Erin','Tina','Vanessa','Tara','Wendy','Tanya','Maya','Crystal','Yvonne','Kara','Shannon','Brianna','Faith','Caroline','Carla','Tracey','Tracy','Rita','Dawn','Tiffany','Stacy','Stacey','Gina','Bonnie','Tammy','Joanne','Jamie','Tonya','Alyssa','Ariana','Elena','Ellie','Erica','Erika','Felicia','Holly','Jenna','Jenny','Krista','Kristen','Kristin','Krystal','Lana','Leah','Lucy','Mallory','Melinda','Meredith','Misty','Monica','Naomi','Paige','Paula','Renee','Rhonda','Robin','Roxanne','Selena','Sierra','Skylar','Sonia','Stella','Tamara','Veronica','Vivian','Whitney','Yolanda','Zoe']);
|
||||
var MALE_NAMES = new Set(['James','Robert','John','Michael','David','William','Richard','Joseph','Thomas','Charles','Christopher','Daniel','Matthew','Anthony','Mark','Donald','Steven','Paul','Andrew','Joshua','Kenneth','Kevin','Brian','George','Edward','Ronald','Timothy','Jason','Jeffrey','Ryan','Jacob','Gary','Nicholas','Eric','Jonathan','Stephen','Larry','Justin','Scott','Brandon','Benjamin','Samuel','Gregory','Frank','Alexander','Raymond','Patrick','Jack','Dennis','Jerry','Tyler','Aaron','Jose','Adam','Henry','Nathan','Douglas','Zachary','Peter','Kyle','Walter','Ethan','Jeremy','Harold','Keith','Christian','Roger','Noah','Gerald','Carl','Terry','Sean','Austin','Arthur','Lawrence','Jesse','Dylan','Bryan','Joe','Jordan','Billy','Bruce','Albert','Willie','Gabriel','Logan','Alan','Juan','Wayne','Roy','Ralph','Randy','Eugene','Vincent','Russell','Elijah','Louis','Bobby','Philip','Johnny','Marcus','Antonio','Carlos','Diego','Hector','Jorge','Julio','Manuel','Miguel','Pedro','Raul','Ricardo','Roberto','Sergio','Victor','Jamal','Xavier','DeShawn','Dwayne','Jermaine','Malik','Tyrone','Devon','Andre','Brent','Calvin','Casey','Cody','Cole','Cory','Dale','Damon','Darius','Darrell','Dean','Derek','Drew','Earl','Eddie','Floyd','Glenn','Greg','Howard','Ivan','Jared','Jay','Jeff','Joel','Lance','Lee','Leonard','Lloyd','Mario','Martin','Mason','Maurice','Max','Mitchell','Morgan','Nick','Norman','Oliver','Owen','Pete','Quincy','Rafael','Reggie','Rex','Ricky','Russ','Shane','Shaun','Stanley','Steve','Theodore','Todd','Travis','Trevor','Troy','Wade','Warren','Wesley']);
|
||||
function guessGenderFromFirstName(n){
|
||||
if(!n) return null;
|
||||
var clean=n.replace(/[^A-Za-z]/g,'');
|
||||
if(!clean) return null;
|
||||
var c=clean[0].toUpperCase()+clean.slice(1).toLowerCase();
|
||||
if(FEMALE_NAMES.has(c)) return 'woman';
|
||||
if(MALE_NAMES.has(c)) return 'man';
|
||||
return null;
|
||||
}
|
||||
function genderFor(name){
|
||||
var g = guessGenderFromFirstName(name);
|
||||
if(g) return g;
|
||||
if(!name) return 'man';
|
||||
var s=String(name); var h=0;
|
||||
for(var i=0;i<s.length;i++) h=(h*31+s.charCodeAt(i))|0;
|
||||
return (Math.abs(h)&1)?'man':'woman';
|
||||
}
|
||||
// Confident first-name → ethnicity. Synthetic data — we own the call.
|
||||
var NAMES_SOUTH_ASIAN_C=new Set(['Raj','Anil','Rohan','Vikram','Arjun','Sanjay','Ravi','Krishna','Pradeep','Sunil','Amit','Deepak','Ashok','Manoj','Rahul','Vijay','Suresh','Naveen','Anand','Nikhil','Aditya','Karan','Rajesh','Priya','Anjali','Neha','Kavya','Pooja','Divya','Meera','Lakshmi','Rani','Asha','Saanvi','Aanya','Aaradhya','Shreya','Riya','Tanvi','Ishita','Aarav','Ishaan','Shivani']);
|
||||
var NAMES_EAST_ASIAN_C=new Set(['Wei','Mei','Yi','Jin','Chen','Lin','Liu','Wang','Zhang','Yang','Wu','Zhao','Sun','Hiroshi','Yuki','Akira','Kenji','Sakura','Aiko','Haruto','Sora','Hyun','Eun','Yoon','Kai','Long','Hong','Xiu','Lan','Hua','Hao','Tao','Bao','Cheng','Feng','Jian','Dong','Bin','Min','Lei','Hui','Yu','Xin','Ying','Zhen','Yuan','Yan']);
|
||||
var NAMES_HISPANIC_C=new Set(['Carmen','Carlos','Maria','Diego','Hector','Jorge','Julio','Manuel','Miguel','Pedro','Raul','Ricardo','Roberto','Sergio','Antonio','Esperanza','Luz','Sofia','Lucia','Isabella','Camila','Valentina','Mariana','Elena','Rosa','Catalina','Esteban','Fernando','Eduardo','Javier','Alejandro','Andres','Mateo','Santiago','Sebastian','Emilio','Tomas','Cristina','Daniela','Gabriela','Ximena','Adriana','Beatriz','Pilar','Mercedes','Xavier','Marisol','Guadalupe','Lupita','Inez','Itzel','Yesenia','Joaquin','Ignacio','Rafael','Salvador','Cesar','Arturo','Armando','Hugo','Marco','Alejandra','Felipe','Gerardo','Jaime','Leonardo','Luis','Pablo','Ramon']);
// NB: entries must be Titlecase (first letter upper, rest lower) — the
// lookup normalizes input the same way, so a mixed-case entry like
// 'DeShawn' would never match.
var NAMES_BLACK_C=new Set(['Deshawn','Jamal','Aisha','Latoya','Tyrone','Malik','Imani','Keisha','Tariq','Lakisha','Kenya','Tamika','Andre','Marcus','Demetrius','Jermaine','Reggie','Tyrese','Darius','Trevon','Kareem','Damon','Jalen','Jaylen','Dwayne','Daquan','Aaliyah','Kiara','Janelle','Jasmine','Tanisha','Maurice','Tyrell','Kwame','Khalil','Terrell','Cedric','Nia','Zuri','Jada','Ebony','Dominique']);
var NAMES_MIDDLE_EASTERN_C=new Set(['Layla','Omar','Khalid','Fatima','Yasmin','Hassan','Hussein','Ahmed','Mohamed','Mohammed','Ali','Karim','Yusuf','Yara','Nadia','Zainab','Rania','Samira','Mariam','Salma','Ibrahim','Mahmoud','Saif','Anwar','Bilal','Faisal','Hamza','Imran','Sami','Wael','Zaid','Amira','Iman','Lina','Mona','Noor','Rana','Soha','Zara']);
// Surname → ethnicity. Surname is more diagnostic than first name
// for hispanic and asian — "Anna Cruz" is hispanic via surname.
var SURNAMES_HISPANIC_C=new Set(['Garcia','Rodriguez','Martinez','Hernandez','Lopez','Gonzalez','Perez','Sanchez','Ramirez','Torres','Flores','Rivera','Gomez','Diaz','Reyes','Cruz','Morales','Ortiz','Gutierrez','Chavez','Ramos','Ruiz','Alvarez','Mendoza','Vasquez','Castillo','Jimenez','Moreno','Romero','Herrera','Medina','Aguilar','Vargas','Castro','Fernandez','Guzman','Munoz','Salazar','Ortega','Delgado','Estrada','Ayala','Pena','Cabrera','Alvarado','Espinoza','Padilla','Cardenas','Cortes','Ibarra','Vega','Soto','Lara','Navarro','Campos','Acosta','Rios','Marquez','Sandoval','Maldonado','Solis','Rojas','Mejia','Beltran','Cervantes','Lozano','Carrillo','Trevino','Robles','Tapia','Lugo']);
var SURNAMES_SOUTH_ASIAN_C=new Set(['Patel','Singh','Kumar','Sharma','Gupta','Shah','Mehta','Desai','Joshi','Reddy','Nair','Iyer','Verma','Agarwal','Kapoor','Chopra','Malhotra','Banerjee','Chatterjee','Mukherjee','Das','Sen','Bose','Roy','Sinha','Trivedi','Pandey','Mishra','Tiwari','Yadav','Chauhan','Rana','Thakur','Pillai','Menon','Krishnan','Rao','Naidu','Pradhan','Acharya','Devi','Kaur']);
var SURNAMES_EAST_ASIAN_C=new Set(['Chen','Wang','Li','Liu','Yang','Huang','Zhao','Wu','Zhou','Xu','Zhu','Sun','Ma','Lin','Lee','Kim','Park','Choi','Jung','Kang','Cho','Yoon','Han','Lim','Oh','Nakamura','Tanaka','Suzuki','Yamamoto','Sato','Watanabe','Takahashi','Kobayashi','Yoshida','Saito','Nguyen','Tran','Le','Pham','Hoang','Phan','Vu','Vo','Dang','Bui','Do','Ngo','Truong','Mai','Cao','Wong','Tang','Tan','Cheng','Lau','Leung','Ng','Cheung','Yip','Hsu','Tsai','Hsieh']);
var SURNAMES_MIDDLE_EASTERN_C=new Set(['Khan','Ahmed','Hussein','Hassan','Ali','Mahmoud','Mohamed','Mohammed','Saleh','Aziz','Karim','Hamad','Najjar','Haddad','Khoury','Mansour','Rahman','Iqbal','Malik','Sheikh','Siddiqui','Qureshi','Saeed']);

function guessEthnicityFromName(first, last){
if(last){
var s=last.replace(/[^A-Za-z]/g,'');
if(s){
var sc=s[0].toUpperCase()+s.slice(1).toLowerCase();
if(SURNAMES_HISPANIC_C.has(sc)) return 'hispanic';
if(SURNAMES_MIDDLE_EASTERN_C.has(sc)) return 'middle_eastern';
if(SURNAMES_SOUTH_ASIAN_C.has(sc)) return 'south_asian';
if(SURNAMES_EAST_ASIAN_C.has(sc)) return 'east_asian';
}
}
if(first){
var clean=first.replace(/[^A-Za-z]/g,'');
if(clean){
var c=clean[0].toUpperCase()+clean.slice(1).toLowerCase();
if(NAMES_MIDDLE_EASTERN_C.has(c)) return 'middle_eastern';
if(NAMES_BLACK_C.has(c)) return 'black';
if(NAMES_HISPANIC_C.has(c)) return 'hispanic';
if(NAMES_SOUTH_ASIAN_C.has(c)) return 'south_asian';
if(NAMES_EAST_ASIAN_C.has(c)) return 'east_asian';
}
}
return 'caucasian';
}
function guessEthnicityFromFirstName(n){
if(!n) return 'caucasian';
var clean=n.replace(/[^A-Za-z]/g,''); if(!clean) return 'caucasian';
var c=clean[0].toUpperCase()+clean.slice(1).toLowerCase();
if(NAMES_MIDDLE_EASTERN_C.has(c)) return 'middle_eastern';
if(NAMES_BLACK_C.has(c)) return 'black';
if(NAMES_HISPANIC_C.has(c)) return 'hispanic';
if(NAMES_SOUTH_ASIAN_C.has(c)) return 'south_asian';
if(NAMES_EAST_ASIAN_C.has(c)) return 'east_asian';
return 'caucasian';
}

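Both lookup functions above share one normalization step — strip non-letters, then Titlecase — before probing the name sets. A minimal standalone sketch of that step (the function name `normalizeName` is illustrative; the originals inline the expression):

```javascript
// Shared normalization used by both guess* lookups above:
// strip non-letters, then Titlecase (first letter upper, rest lower).
// "normalizeName" is an illustrative name, not one from the source.
function normalizeName(raw) {
  var clean = (raw || '').replace(/[^A-Za-z]/g, '');
  if (!clean) return null;
  return clean[0].toUpperCase() + clean.slice(1).toLowerCase();
}
```

One consequence: a set entry only ever matches if it is itself stored Titlecase — `normalizeName("o'brien")` yields `'Obrien'`, so a mixed-case entry such as `'DeShawn'` could never equal a normalized lookup.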
function workerRow(name, role, detail, opts){
opts = opts || {};
var band = roleBand(role||'');
var w = el('div','worker');
if(band.band) w.dataset.roleBand = band.band;
var initials = (name||'?').split(' ').map(function(s){return (s[0]||'').toUpperCase()}).join('').substring(0,2);
var av = el('div','av',initials);
// Headshot insertion removed 2026-04-28. The .av element stays as
// a monogram-initials avatar.
w.appendChild(av);
var info = el('div','info');
var nm = el('div','nm', name||'?');
if(opts.endorsed){
nm.appendChild(el('span','boost-chip',opts.endorsed));
}
info.appendChild(nm);
var why = el('div','why');
if(band.label){
var pill = document.createElement('span'); pill.className='role-pill';
pill.dataset.rb = band.band;
pill.textContent = band.label;
why.appendChild(pill);
}
why.appendChild(document.createTextNode(detail||''));
info.appendChild(why);
w.appendChild(info);
if(opts.score){
w.appendChild(el('div','score', opts.score));
}
return w;
}

function el(tag, cls, text){
var e=document.createElement(tag);
if(cls) e.className=cls;

@@ -380,21 +523,13 @@ function loadChapter4(){

var list=document.createElement('div');list.style.marginTop='6px';
(prop.candidates||[]).slice(0,5).forEach(function(cand,i){
var w=el('div','worker');
var initials=(cand.name||'?').split(' ').map(function(s){return (s[0]||'').toUpperCase()}).join('').substring(0,2);
w.appendChild(el('div','av',initials));
var info=el('div','info');
var nm=el('div','nm',cand.name||cand.doc_id||'?');
if((cand.playbook_boost||0)>0){
var ncit=(cand.playbook_citations||[]).length;
nm.appendChild(el('span','boost-chip','Endorsed · '+ncit+' past fill'+(ncit!==1?'s':'')));
}
info.appendChild(nm);
var why=cand.doc_id+' · '+(cand.playbook_boost>0?'boosted +'+cand.playbook_boost.toFixed(3)+' by memory · ':'')+'semantic score '+(cand.score||0).toFixed(3);
info.appendChild(el('div','why',why));
w.appendChild(info);
w.appendChild(el('div','score','#'+(i+1)));
list.appendChild(w);
var detail = cand.doc_id+' · '+(cand.playbook_boost>0?'boosted +'+cand.playbook_boost.toFixed(3)+' by memory · ':'')+'semantic score '+(cand.score||0).toFixed(3);
var endorsed = (cand.playbook_boost||0) > 0
? 'Endorsed · '+((cand.playbook_citations||[]).length)+' past fill'+((cand.playbook_citations||[]).length!==1?'s':'')
: null;
list.appendChild(workerRow(cand.name||cand.doc_id||'?', prop.role||'', detail, {
endorsed: endorsed, score: '#'+(i+1)
}));
});
card.appendChild(list);

@@ -628,12 +763,8 @@ function loadChapter8(){
bfHdr.textContent='✓ '+d.backfills.length+' local '+(d.worker.role||'workers')+' available — sorted by responsiveness';
host.appendChild(bfHdr);
d.backfills.slice(0,5).forEach(function(c){
var row=el('div','row');
var left=document.createElement('div');left.style.flex='1';left.style.minWidth='0';
left.appendChild(el('div','title',c.name));
left.appendChild(el('div','meta',(c.role||'?')+' · '+(c.city||'')+', '+(c.state||'')+' · rel '+Math.round((c.rel||0)*100)+'% · resp '+Math.round((c.resp||0)*100)+'%'));
row.appendChild(left);
host.appendChild(row);
var detail=(c.role||'?')+' · '+(c.city||'')+', '+(c.state||'')+' · rel '+Math.round((c.rel||0)*100)+'% · resp '+Math.round((c.resp||0)*100)+'%';
host.appendChild(workerRow(c.name||'?', c.role||'', detail));
});
}
var narr=el('div','narr');
@@ -675,23 +806,16 @@ function runTry(){

var workers=d.sql_results||d.vector_results||d.results||[];
workers.slice(0,5).forEach(function(w,i){
var row=el('div','worker');
var nm=w.name||(w.text||'').split('—')[0].trim()||w.doc_id||'?';
var initials=nm.split(' ').map(function(s){return (s[0]||'').toUpperCase()}).join('').substring(0,2);
row.appendChild(el('div','av',initials));
var info=el('div','info');
var n=el('div','nm',nm);
if((w.playbook_boost||0)>0){
n.appendChild(el('span','boost-chip','Endorsed · '+((w.playbook_citations||[]).length||'?')+' past fill(s)'));
}
info.appendChild(n);
var bits=[];
if(w.role) bits.push(w.role);
if(w.city&&w.state) bits.push(w.city+', '+w.state);
if(w.rel!==undefined) bits.push('reliability '+Math.round(w.rel*100)+'%');
if(w.avail!==undefined) bits.push('availability '+Math.round(w.avail*100)+'%');
info.appendChild(el('div','why',bits.join(' · ')||'AI semantic match'));
row.appendChild(info);
var endorsed = (w.playbook_boost||0) > 0
? 'Endorsed · '+((w.playbook_citations||[]).length||'?')+' past fill(s)'
: null;
var row = workerRow(nm, w.role||'', bits.join(' · ')||'AI semantic match', { endorsed: endorsed });
row.appendChild(el('div','score','#'+(i+1)));
card.appendChild(row);
});

123 mcp-server/icon_recipes.ts (Normal file)
@@ -0,0 +1,123 @@
// Visual filler iconography rendered through ComfyUI. Distinct from
// role_scenes.ts (which renders portraits) — these are object/badge
// style renders that fill dead space on worker cards: cert pills,
// role-prop chips, hazard indicators, empty-state heroes.
//
// Layout on disk:
// data/icons_pool/{category}/{slug}.webp
//
// Cache invalidation:
// ICONS_VERSION mixes into the on-disk filename (slug includes
// version). Bump it after editing a recipe so prior renders are
// ignored on next view.

export type IconCategory = "cert" | "role_prop" | "status" | "hazard" | "empty";

export interface IconRecipe {
slug: string;
category: IconCategory;
// Text label that appears next to / under the icon. The front-end
// already renders this text in cert pills; the icon is supplementary.
display: string;
// Full diffusion prompt. Style guidance baked in. SDXL Turbo at 8
// steps reliably produces clean macro photography, so default to
// photographic prop shots over flat-vector illustrations (the model
// hallucinates noise into flat-vector geometry at low step counts).
prompt: string;
// Negative prompt — what NOT to render. Crucial for icons because
// SDXL likes to add hands/text/people unprompted.
negative?: string;
}

// Default negative prompt baked into every icon render unless the
// recipe overrides. Empirically, these terms are the top SDXL Turbo
// off-style failures.
export const DEFAULT_NEGATIVE =
"people, hands, faces, blurry, low quality, watermark, signature, "
+ "logos, copyright, distorted text, garbled letters, multiple objects";

// TODO J — review and tune the prompts here. Each one is what diffusion
// sees verbatim. The visual decision: photographic prop shots (macro
// photo of an actual badge / placard / sticker) vs flat-icon vector
// style. Default below is photographic — matches the worker headshot
// aesthetic. Flip a recipe to flat-vector by replacing "macro photograph"
// with "flat icon illustration on solid color background, minimal vector".
//
// Visual cues that work well in SDXL Turbo at 8 steps:
// - "macro photograph", "isolated on plain background", "studio lighting"
// - Concrete colors ("orange and black warning diamond") not adjectives
// - Avoid: small text in the prompt (model garbles it), specific brand
//   names (creates fake logos), detailed scene composition
const CERT_ICONS: IconRecipe[] = [
{ slug: "osha-10", category: "cert", display: "OSHA-10",
prompt: "macro photograph of a circular yellow safety badge with a black hard hat icon at center, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
{ slug: "osha-30", category: "cert", display: "OSHA-30",
prompt: "macro photograph of a circular orange safety badge with a black hard hat icon at center, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
{ slug: "first-aid-cpr", category: "cert", display: "First Aid/CPR",
prompt: "macro photograph of a small enamel pin badge featuring a bold red cross on a white circular background, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
{ slug: "hazmat", category: "cert", display: "Hazmat",
prompt: "macro photograph of a HAZMAT warning placard, bold orange and black diamond shape with a flame icon, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
{ slug: "forklift", category: "cert", display: "Forklift",
prompt: "macro photograph of a yellow industrial forklift safety badge with a forklift silhouette icon, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
{ slug: "reach-truck", category: "cert", display: "Reach Truck",
prompt: "macro photograph of a navy blue industrial certification badge with a warehouse reach-truck silhouette icon, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
{ slug: "order-picker", category: "cert", display: "Order Picker",
prompt: "macro photograph of a green industrial certification badge with a warehouse order-picker silhouette icon, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
{ slug: "lockout-tagout", category: "cert", display: "Lockout/Tagout",
prompt: "macro photograph of a bright red padlock tag with a danger warning, hanging on a metal industrial valve, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
{ slug: "msds", category: "cert", display: "MSDS",
prompt: "macro photograph of a folded chemical safety data sheet booklet with chemical hazard pictograms visible on cover, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
{ slug: "confined-space", category: "cert", display: "Confined Space",
prompt: "macro photograph of a yellow confined space warning sign featuring a manhole entry icon, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
{ slug: "servsafe", category: "cert", display: "ServSafe",
prompt: "macro photograph of a dark green food safety certification badge featuring a stylized chef hat icon, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
{ slug: "fire-safety", category: "cert", display: "Fire Safety",
prompt: "macro photograph of a red enamel pin badge featuring a flame icon and a fire extinguisher silhouette, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
{ slug: "iso-9001", category: "cert", display: "ISO 9001",
prompt: "macro photograph of a deep blue circular quality-management certification seal with embossed metallic ring, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
];

// Role-band visual chips — small icons that go in the role pill area.
// One per band, optional inline supplement to the existing colored pill.
const ROLE_PROP_ICONS: IconRecipe[] = [
{ slug: "warehouse", category: "role_prop", display: "Warehouse",
prompt: "macro photograph of a yellow hard hat with a high-visibility safety vest folded behind it, isolated on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
{ slug: "production", category: "role_prop", display: "Production",
prompt: "macro photograph of a navy blue work shirt and protective safety glasses on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
{ slug: "trades", category: "role_prop", display: "Trades",
prompt: "macro photograph of a leather work glove and a small adjustable wrench on a neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
{ slug: "driver", category: "role_prop", display: "Driver",
prompt: "macro photograph of a navy delivery driver baseball cap and a clipboard manifest on a neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
{ slug: "lead", category: "role_prop", display: "Lead",
prompt: "macro photograph of a tablet showing a bar chart and a high-vis vest folded beside it on neutral grey backdrop, photorealistic, sharp focus, studio lighting" },
];

export const ICONS: Record<string, IconRecipe> = Object.fromEntries(
[...CERT_ICONS, ...ROLE_PROP_ICONS].map((r) => [`${r.category}/${r.slug}`, r]),
);

// v2 — 256×256 canvas, intended to be displayed monochrome via CSS
// `filter: grayscale(1)`. Smaller canvas, tighter crops, crisper at
// 14px display size.
export const ICONS_VERSION = "v2";

// Map a free-form cert string from the data ("First Aid/CPR",
// "OSHA-10", "Lockout/Tagout") to the canonical slug used here.
// Returns null if no recipe matches.
export function certToSlug(cert: string): string | null {
const c = (cert || "").trim().toLowerCase().replace(/\s+/g, "-");
if (c === "osha-10") return "osha-10";
if (c === "osha-30") return "osha-30";
if (c.startsWith("first") || c.includes("cpr")) return "first-aid-cpr";
if (c === "hazmat" || c.startsWith("hazwoper")) return "hazmat";
if (c === "forklift" || c.startsWith("pit")) return "forklift";
if (c.startsWith("reach")) return "reach-truck";
if (c.startsWith("order")) return "order-picker";
if (c.startsWith("lockout") || c.includes("tagout")) return "lockout-tagout";
if (c === "msds" || c.startsWith("ghs")) return "msds";
if (c.startsWith("confined")) return "confined-space";
if (c === "servsafe") return "servsafe";
if (c.startsWith("fire")) return "fire-safety";
if (c.startsWith("iso")) return "iso-9001";
return null;
}
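As a sanity sketch of the resolver's normalize-then-branch shape, here is a trimmed JavaScript port of a few `certToSlug` branches (illustrative only; the TypeScript above is the authoritative version):

```javascript
// Trimmed JS port of certToSlug for illustration: lowercase, trim,
// collapse whitespace to hyphens, then exact/prefix/substring matching.
function certToSlugSketch(cert) {
  const c = (cert || '').trim().toLowerCase().replace(/\s+/g, '-');
  if (c === 'osha-10') return 'osha-10';
  if (c.startsWith('first') || c.includes('cpr')) return 'first-aid-cpr';
  if (c.startsWith('lockout') || c.includes('tagout')) return 'lockout-tagout';
  return null; // unrecognized certs get no icon (the route 404s)
}
```

Note that "First Aid/CPR" becomes `first-aid/cpr` after normalization — the slash survives, which is why the branches match on prefixes and substrings rather than on the whole string.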
@@ -19,6 +19,8 @@ import { SSEServerTransport } from "@modelcontextprotocol/sdk/server/sse.js";
import { z } from "zod";
import { startTrace, logSpan, logGeneration, scoreTrace, flush as flushTraces } from "./tracing.js";
import { buildPermitBrief } from "./entity.js";
import { roleBand, SCENES, SCENES_VERSION, FACE_RENDER_DIM, type RoleBand } from "./role_scenes.js";
import { ICONS, ICONS_VERSION, DEFAULT_NEGATIVE, certToSlug, type IconRecipe } from "./icon_recipes.js";

const BASE = process.env.LAKEHOUSE_URL || "http://localhost:3100";
const PORT = parseInt(process.env.MCP_PORT || "3700");

@@ -1225,6 +1227,358 @@ async function main() {
// OSHA national, Chicago history, ticker chart, parent link,
// federal contracts, debarment, unions, training. Click any
// contractor name in a permit Entity Brief to land here.

// ComfyUI-generated portrait — every call is unique by (key,
// gender, race, age, role) tuple. First hit takes ~1.5s on
// the A4000; subsequent hits read from disk. Use this for
// contractor / profile modal where one worker gets the
// spotlight. NB: declared BEFORE the pool route so the prefix
// match doesn't intercept it.

// Single source of truth for the pre-render script. Read
// role_scenes.ts SCENES + SCENES_VERSION so a Python pre-render
// job (scripts/staffing/render_role_pool.py) builds the role-
// aware pool with the exact prompts the server will use on the
// ComfyUI hot-path. No drift.
if (url.pathname === "/headshots/_scenes" && req.method === "GET") {
return Response.json({ version: SCENES_VERSION, scenes: SCENES });
}

// Single source of truth for icon_recipes.ts. Used by the
// pre-render script (scripts/staffing/render_icons.py) and any
// tooling that wants to enumerate available icons.
if (url.pathname === "/icons/_recipes" && req.method === "GET") {
return Response.json({
version: ICONS_VERSION,
default_negative: DEFAULT_NEGATIVE,
recipes: ICONS,
});
}

// Free-text cert resolver: front-end passes the raw cert string
// from the data ("First Aid/CPR", "OSHA-10", "Lockout/Tagout")
// and we resolve to a recipe slug + 302 to the cached/rendered
// icon. Returns 404 (not error) when no recipe matches — the
// front-end can hang an `onerror="this.remove()"` to silently
// drop the img tag for unrecognized certs.
if (url.pathname === "/icons/cert" && req.method === "GET") {
const text = url.searchParams.get("text") || "";
const slug = certToSlug(text);
if (!slug) return new Response(`no recipe for cert: ${text}`, { status: 404 });
return new Response(null, {
status: 302,
headers: { "Location": `/icons/render/cert/${slug}` },
});
}

// Cert / role-prop / status / hazard / empty icons. Lookup is
// category/slug; on cache miss the route renders via ComfyUI.
// Filename layout: data/icons_pool/{category}/{slug}_{version}.webp
// — the version suffix means editing a recipe yields a new file
// rather than overwriting in place, so a misfire is recoverable.
if (url.pathname.startsWith("/icons/render/") && req.method === "GET") {
const rest = url.pathname.slice("/icons/render/".length);
const recipe: IconRecipe | undefined = ICONS[rest];
if (!recipe) return new Response(`unknown icon: ${rest}`, { status: 404 });
const ICONS_DIR = "/home/profit/lakehouse/data/icons_pool";
await Bun.$`mkdir -p ${ICONS_DIR}/${recipe.category}`.quiet();
const cachePath = `${ICONS_DIR}/${recipe.category}/${recipe.slug}_${ICONS_VERSION}.webp`;
const cached = Bun.file(cachePath);
if (await cached.exists()) {
return new Response(cached, {
headers: {
"Content-Type": "image/webp",
"Cache-Control": "public, max-age=86400",
"X-Icon-Source": "cached",
"X-Icon-Recipe": recipe.slug,
},
});
}
// Deterministic seed per recipe — same recipe always renders
// the same icon. Mixing the version means SCENES_VERSION-
// style invalidation works for icons too.
const seedStr = `${recipe.category}|${recipe.slug}|${ICONS_VERSION}`;
let seed = 5381;
for (let i = 0; i < seedStr.length; i++) seed = ((seed << 5) + seed + seedStr.charCodeAt(i)) | 0;
seed = Math.abs(seed) % 2147483647;
try {
const genResp = await fetch("http://localhost:3600/generate", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
prompt: recipe.prompt,
negative_prompt: recipe.negative ?? DEFAULT_NEGATIVE,
// 256×256 — smaller canvas = cleaner icon. SDXL Turbo
// at 8 steps adds visible texture/noise into 512² that
// looks "AI" at small display sizes; tightening to 256
// both renders ~3× faster and produces crisper edges
// when the front-end downsamples to 14px.
width: 256,
height: 256,
steps: 8,
seed,
}),
signal: AbortSignal.timeout(30000),
});
if (!genResp.ok) return new Response(`gen failed: ${genResp.status}`, { status: 502 });
const data: any = await genResp.json();
if (!data.image) return new Response("no image returned", { status: 502 });
const bytes = Uint8Array.from(atob(data.image), (c) => c.charCodeAt(0));
await Bun.write(cachePath, bytes);
return new Response(bytes, {
headers: {
"Content-Type": "image/webp",
"Cache-Control": "public, max-age=86400",
"X-Icon-Source": "fresh",
"X-Icon-Recipe": recipe.slug,
"X-Icon-Gen-Ms": String(data.time_ms || 0),
},
});
} catch (e: any) {
return new Response(`gen error: ${e.message}`, { status: 502 });
}
}

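The deterministic-seed idea in the render route — a djb2-style hash over `category|slug|version`, clamped into a positive 31-bit range — can be sketched standalone (`iconSeed` is an illustrative name, not a function in the source):

```javascript
// djb2-style hash → deterministic diffusion seed. Same recipe and
// version always yield the same seed (hence the same cached render);
// bumping the version shifts every seed, like the route above.
function iconSeed(category, slug, version) {
  const s = `${category}|${slug}|${version}`;
  let seed = 5381;
  for (let i = 0; i < s.length; i++) {
    seed = ((seed << 5) + seed + s.charCodeAt(i)) | 0; // seed*33 + char, int32
  }
  return Math.abs(seed) % 2147483647;
}
```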
if (url.pathname.startsWith("/headshots/generate/") && req.method === "GET") {
const key = decodeURIComponent(url.pathname.slice("/headshots/generate/".length));
if (!key) return new Response("missing key", { status: 400 });
const g = (url.searchParams.get("g") || "person").toLowerCase();
const r = (url.searchParams.get("e") || "").toLowerCase();
const role = (url.searchParams.get("role") || "warehouse worker").toLowerCase();
const age = parseInt(url.searchParams.get("age") || "32", 10) || 32;
const band = roleBand(role);
// SCENES_VERSION mixes into the cache key so editing
// role_scenes.ts auto-invalidates prior renders — coordinator
// tweaks the warehouse prompt, every warehouse face refreshes
// on next view.
const cacheKey = await crypto.subtle.digest(
"SHA-256",
new TextEncoder().encode(`${key}|${g}|${r}|${role}|${age}|${SCENES_VERSION}`)
).then((b) => Array.from(new Uint8Array(b)).map((x) => x.toString(16).padStart(2, "0")).join("").slice(0, 24));
const GEN_DIR = "/home/profit/lakehouse/data/headshots_gen";
await Bun.$`mkdir -p ${GEN_DIR}`.quiet();
const cachePath = `${GEN_DIR}/${cacheKey}.webp`;
const cached = Bun.file(cachePath);
if (await cached.exists()) {
return new Response(cached, {
headers: {
"Content-Type": "image/webp",
"Cache-Control": "public, max-age=86400, immutable",
"X-Headshot-Source": "comfyui-cached",
},
});
}
const raceText = r === "hispanic" ? "Hispanic"
: r === "black" ? "Black"
: r === "south_asian" ? "South Asian"
: r === "east_asian" ? "East Asian"
: r === "middle_eastern" ? "Middle Eastern"
: "";
const genderText = g === "woman" ? "woman" : g === "man" ? "man" : "person";
const scene = SCENES[band].scene;
// Note: dropped "plain studio background" / "dslr" — those
// collapsed every render to interchangeable studio shots.
// The scene clause now carries clothing + backdrop so a
// forklift operator looks like a forklift operator.
const prompt = `professional headshot portrait of a ${age}-year-old ${raceText} ${genderText} ${role}, ${scene}, neutral confident expression, sharp focus, photorealistic`;
// Worker-derived seed — same input always picks the same
// pixel layout in StyleGAN2 latent space, so the face is
// deterministic per worker BUT distinct from any other
// worker that happens to share the same prompt. Without
// this, every (g, r, age, role) combo collapses to one face.
let seedHash = 0;
for (let i = 0; i < key.length; i++) seedHash = ((seedHash << 5) - seedHash + key.charCodeAt(i)) | 0;
const seed = Math.abs(seedHash) % 2147483647;
try {
const genResp = await fetch("http://localhost:3600/generate", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ prompt, width: FACE_RENDER_DIM, height: FACE_RENDER_DIM, steps: 8, seed }),
signal: AbortSignal.timeout(30000),
});
if (!genResp.ok) return new Response(`gen failed: ${genResp.status}`, { status: 502 });
const data: any = await genResp.json();
if (!data.image) return new Response("no image returned", { status: 502 });
const bytes = Uint8Array.from(atob(data.image), (c) => c.charCodeAt(0));
await Bun.write(cachePath, bytes);
return new Response(bytes, {
headers: {
"Content-Type": "image/webp",
"Cache-Control": "public, max-age=86400, immutable",
"X-Headshot-Source": "comfyui-fresh",
"X-Headshot-Gen-Ms": String(data.time_ms || 0),
},
});
} catch (e: any) {
return new Response(`gen error: ${e.message}`, { status: 502 });
}
}

// Headshot pool — synthetic StyleGAN faces from
// thispersondoesnotexist.com fetched offline by
// scripts/staffing/fetch_face_pool.py. Deterministic mapping:
// hash(worker key) → pool index → image bytes. Same key always
// gets the same face; different keys spread evenly.
//
// Optional gender hint: ?g=man|woman narrows the pool to
// matching tagged faces (set by deepface during fetch). Falls
// back to whole pool if no matches.
if (url.pathname.startsWith("/headshots/") && req.method === "GET") {
const key = decodeURIComponent(url.pathname.slice("/headshots/".length));
const wantGender = url.searchParams.get("g") || "";
if (!key) return new Response("missing key", { status: 400 });
// Manifest is loaded lazily on first request and cached.
// Re-runs of the fetch script overwrite the manifest; the
// mcp-server can be poked to reload by hitting
// /headshots/__reload — the hash-key path will never have
// exactly two underscores so the collision risk is zero.
const HEADSHOT_DIR = "/home/profit/lakehouse/data/headshots";
if (key === "__reload" || !(globalThis as any)._faces) {
try {
const raw = await Bun.file(`${HEADSHOT_DIR}/manifest.jsonl`).text();
const lines = raw.trim().split("\n").filter(Boolean);
const all = lines.map((l) => JSON.parse(l));
// Build (gender × race) buckets so a request that names
// both narrows to the intersection. Missing intersections
// fall back to gender-only, then race-only, then all.
const byGR: Record<string, any[]> = {};
const byG: Record<string, any[]> = { man: [], woman: [] };
const byR: Record<string, any[]> = {};
// Filter excluded faces (e.g. minors) from every bucket
// and from the all-pool. They never get served.
const adults = all.filter((r: any) => !r.excluded);
for (const r of adults) {
if (r.gender === "man" || r.gender === "woman") byG[r.gender].push(r);
if (r.race) {
byR[r.race] = byR[r.race] || [];
byR[r.race].push(r);
if (r.gender === "man" || r.gender === "woman") {
const k = r.gender + "/" + r.race;
byGR[k] = byGR[k] || [];
byGR[k].push(r);
}
}
}
(globalThis as any)._faces = {
all: adults,
byG, byR, byGR,
untagged: adults.filter((r: any) => !r.gender || (r.gender !== "man" && r.gender !== "woman")),
excluded_count: all.length - adults.length,
loaded_at: Date.now(),
};
if (key === "__reload") {
const byRSummary: Record<string, number> = {};
for (const k of Object.keys(byR)) byRSummary[k] = byR[k].length;
const byGRSummary: Record<string, number> = {};
for (const k of Object.keys(byGR)) byGRSummary[k] = byGR[k].length;
return Response.json({
reloaded: true,
total: all.length,
excluded: all.length - adults.length,
served_pool: adults.length,
by_gender: { man: byG.man.length, woman: byG.woman.length },
by_race: byRSummary,
by_gender_race: byGRSummary,
untagged: (globalThis as any)._faces.untagged.length,
});
}
} catch (e: any) {
return new Response(`face pool not available: ${e.message}. Run scripts/staffing/fetch_face_pool.py first.`, { status: 503 });
}
}
const F = (globalThis as any)._faces as {
all: any[];
byG: Record<string, any[]>;
byR: Record<string, any[]>;
byGR: Record<string, any[]>;
untagged: any[];
};
if (!F || !F.all.length) {
return new Response("face pool empty", { status: 503 });
}
const wantRace = url.searchParams.get("e") || "";

// NOTE: role-aware pool + ComfyUI sparse redirect were removed
// 2026-04-28 — diffusion output at 8 steps with the existing
// editorial_hero workflow produced burnt-looking faces ("looks
// like someone burnt the pictures"). Until serve_imagegen.py
// is fixed to honor a portrait-friendly negative prompt and
// run with proper steps/cfg, every face comes from the studio
// pool (StyleGAN photos from thispersondoesnotexist.com) and
// gets B&W via CSS. The role pool files at
// data/headshots_role_pool/{v1,v2}/ stay on disk for when
// we can re-enable them.

// Studio pool only. Try gender×race intersection first, then
// fall back to gender-only or race-only if the intersection
// is sparse. Repeat faces are acceptable — better than
// serving the over-contrasty diffusion output.
let pool = F.all;
let bucket = "all";
if (wantGender && wantRace) {
const gr = F.byGR[wantGender + "/" + wantRace] || [];
if (gr.length > 0) {
// Use the intersection bucket as-is — even sparse buckets
// (south_asian: 3, black: 14) just repeat photos rather
// than route to ComfyUI. Repetition is fine; burnt faces
// are not.
pool = gr;
bucket = `gr:${wantGender}/${wantRace}`;
} else if (F.byG[wantGender]?.length) {
pool = F.byG[wantGender];
bucket = `g:${wantGender}`;
}
} else if (wantGender && F.byG[wantGender]?.length) {
pool = F.byG[wantGender];
bucket = `g:${wantGender}`;
} else if (wantRace && F.byR[wantRace]?.length) {
pool = F.byR[wantRace];
bucket = `r:${wantRace}`;
}
// Hash key → pool index. djb2-ish, fits any string.
|
||||
let h = 5381;
|
||||
for (let i = 0; i < key.length; i++) h = ((h << 5) + h + key.charCodeAt(i)) | 0;
|
||||
const idx = Math.abs(h) % pool.length;
|
||||
const pick = pool[idx];
|
||||
// Prefer pre-resized webp thumb (~10KB) over native JPEG
|
||||
// (~580KB). 60× smaller — without this, a 40-card grid
|
||||
// overruns Chrome's parallel-connection budget and ~75% of
|
||||
// tiles never finish decoding.
|
||||
//
|
||||
// Cache-Control: 1h public + must-revalidate, NOT immutable.
|
||||
// We deliberately let the browser re-check after pool retags
|
||||
// or face-pool refreshes — `immutable` was pinning stale
|
||||
// photos for 24h after a server-side update.
|
||||
const thumbName = pick.file.replace(/\.jpg$/, ".webp");
|
||||
const thumb = Bun.file(`${HEADSHOT_DIR}/_thumbs/${thumbName}`);
|
||||
if (await thumb.exists()) {
|
||||
return new Response(thumb, {
|
||||
headers: {
|
||||
"Content-Type": "image/webp",
|
||||
"Cache-Control": "public, max-age=3600, must-revalidate",
|
||||
"X-Face-Pool-Idx": String(pick.id),
|
||||
"X-Face-Pool-Gender": pick.gender || "untagged",
|
||||
"X-Face-Pool-Bucket": bucket,
|
||||
"X-Face-Pool-Bucket-Size": String(pool.length),
|
||||
"X-Face-Pool-Variant": "thumb-384",
|
||||
},
|
||||
});
|
||||
}
|
||||
const file = Bun.file(`${HEADSHOT_DIR}/${pick.file}`);
|
||||
if (!(await file.exists())) {
|
||||
return new Response("face missing on disk", { status: 404 });
|
||||
}
|
||||
return new Response(file, {
|
||||
headers: {
|
||||
"Content-Type": "image/jpeg",
|
||||
"Cache-Control": "public, max-age=3600, must-revalidate",
|
||||
"X-Face-Pool-Idx": String(pick.id),
|
||||
"X-Face-Pool-Gender": pick.gender || "untagged",
|
||||
"X-Face-Pool-Bucket": bucket,
|
||||
"X-Face-Pool-Bucket-Size": String(pool.length),
|
||||
"X-Face-Pool-Variant": "native-1024",
|
||||
},
|
||||
});
|
||||
}
|
||||
|
||||
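For reference, the deterministic key-to-pool-index mapping the route relies on is easy to check outside the server. A minimal Python sketch of the same djb2 arithmetic (the 32-bit masking emulates JavaScript's `| 0` truncation; `pool_size` is an arbitrary stand-in, not a server value):

```python
def djb2_index(key: str, pool_size: int) -> int:
    """Mirror of the route's djb2-ish hash: h = ((h << 5) + h + code) | 0,
    then Math.abs(h) % pool.length. Same key always lands on the same face."""
    h = 5381
    for ch in key:
        # Keep 32 bits each step, matching JS's `| 0` truncation.
        h = ((h << 5) + h + ord(ch)) & 0xFFFFFFFF
    # Convert to a signed 32-bit value before taking the absolute value,
    # as Math.abs sees a signed int in JS.
    if h >= 0x80000000:
        h -= 0x100000000
    return abs(h) % pool_size
```

The modulo step means a pool retag (which shrinks a bucket) silently remaps some workers to different faces, which is exactly why the route reports `X-Face-Pool-Bucket-Size` for debugging.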
// Profiler index — directory page of everyone who's filed a
// Chicago permit (clickable directory of contractors).
if (url.pathname === "/profiler" || url.pathname === "/contractors") {
@ -1700,15 +2054,88 @@ async function main() {
    .reduce((s, c) => s + (c.implied_pay_rate - contractBillRate) * hoursPerWeek * weeksAssumed, 0);

  // Shift inference from permit work_type + description.
  // Construction defaults to 1st-shift (day). Heavy civil or
  // facility work sometimes runs 2nd or split-shift. 3rd
  // (overnight) is rare in commercial construction but real
  // for maintenance / emergency calls.
  // Description keywords trump the hash-based assignment;
  // for everything else we deterministically distribute
  // permits across shifts via a hash of the permit id so
  // every shift bucket has real, stable data instead of
  // every contract collapsing to 1st.
  const descLower = ((p.work_description || "") + " " + (p.work_type || "")).toLowerCase();
  function hashStr(s: string) {
    let h = 5381;
    for (let i = 0; i < s.length; i++) h = ((h << 5) + h + s.charCodeAt(i)) | 0;
    return Math.abs(h);
  }
  const permitKey = String(p.id || (p.street_number + p.street_name) || p.work_description || "").slice(0, 80);
  const hh = hashStr(permitKey);
  const bucket = hh % 100;
  // Realistic split: 50% day, 28% evening, 17% overnight,
  // 5% weekend. Construction skews heavily day-shift.
  let primary: string =
    bucket < 50 ? "1st"
    : bucket < 78 ? "2nd"
    : bucket < 95 ? "3rd"
    : "4th";
  const shifts: string[] = [primary];
  if (/night|overnight|24\s*hr|emergency/.test(descLower) && !shifts.includes("3rd")) shifts.push("3rd");
  if (/multi.?shift|round.?the.?clock|double.?shift/.test(descLower) && !shifts.includes("2nd")) shifts.push("2nd");
  if (/weekend|saturday|sunday/.test(descLower) && !shifts.includes("4th")) shifts.push("4th");

  // Internal calendar: build a 7-day schedule (today ±3
  // days) with a row per (date, shift). This is what the
  // front-end's shift-mix preview filters against — real
  // dates, real workers/bill, real status (past/active/
  // scheduled) tied to the current clock. As permits get
  // ingested with explicit start/end dates the backend
  // can replace this with the stored schedule.
  const SHIFT_HOURS: Record<string, [number, number]> = {
    "1st": [6, 14], "2nd": [14, 22], "3rd": [22, 30], "4th": [0, 24], // 4th = weekend
  };
  function shiftStatus(d: Date, shift: string, ref: Date): "past" | "active" | "scheduled" {
    const refDay = ref.toISOString().slice(0, 10);
    const dDay = d.toISOString().slice(0, 10);
    if (dDay < refDay) return "past";
    if (dDay > refDay) return "scheduled";
    // Same day — break by hour vs shift window.
    const hr = ref.getHours() + ref.getMinutes() / 60;
    const [s, e] = SHIFT_HOURS[shift] || [0, 24];
    if (shift === "4th") {
      // Weekend shift: active if today IS weekend, else scheduled.
      const isWknd = (ref.getDay() === 0 || ref.getDay() === 6);
      return isWknd ? "active" : "scheduled";
    }
    if (shift === "3rd") {
      // 3rd wraps midnight: active 22:00–06:00.
      if (hr >= 22 || hr < 6) return "active";
      return "scheduled";
    }
    if (hr < s) return "scheduled";
    if (hr >= e) return "past";
    return "active";
  }
  const refNow = new Date();
  const schedule: any[] = [];
  for (let off = -3; off <= 3; off++) {
    const d = new Date(refNow.getTime() + off * 86400e3);
    const isWknd = (d.getDay() === 0 || d.getDay() === 6);
    const dateStr = d.toISOString().slice(0, 10);
    for (const sh of shifts) {
      // Weekend permits use 4th shift only; weekday work
      // uses its primary shift(s) and skips 4th.
      if (isWknd && sh !== "4th") continue;
      if (!isWknd && sh === "4th") continue;
      // Workers per shift: full count on primary, half on
      // secondary so the bill demand differs visibly.
      const isPrimary = (sh === primary);
      const wForShift = isPrimary ? count : Math.max(1, Math.floor(count / 2));
      schedule.push({
        date: dateStr,
        shift: sh,
        workers_needed: wForShift,
        bill_rate: contractBillRate,
        status: shiftStatus(d, sh, refNow),
      });
    }
  }

  contracts.push({
    permit: {
@ -1763,6 +2190,7 @@ async function main() {
      over_bill_pool_margin_at_risk: Math.round(overBillPoolMargin),
    },
    shifts_needed: shifts,
    schedule,
  });
}
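As a sanity check on the 50/28/17/5 thresholds, the bucketing above can be sketched standalone (hash arithmetic and cutoffs copied from the diff; the permit keys below are invented):

```python
def hash_str(s: str) -> int:
    """djb2-ish hash with JS-style signed 32-bit truncation, then abs()."""
    h = 5381
    for ch in s:
        h = ((h << 5) + h + ord(ch)) & 0xFFFFFFFF
    if h >= 0x80000000:
        h -= 0x100000000
    return abs(h)

def primary_shift(permit_key: str) -> str:
    """Deterministic primary-shift pick: 50% 1st, 28% 2nd, 17% 3rd, 5% 4th."""
    bucket = hash_str(permit_key) % 100
    if bucket < 50:
        return "1st"
    if bucket < 78:
        return "2nd"
    if bucket < 95:
        return "3rd"
    return "4th"
```

Because the hash is over the permit id, the same permit always lands in the same shift across server restarts, which keeps the front-end's shift-mix preview stable between reloads.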
92  mcp-server/role_scenes.ts  Normal file
@ -0,0 +1,92 @@
// Server-side mirror of search.html's ROLE_BANDS regex table.
// Each band carries a *visual scene* — clothing + immediate backdrop —
// so ComfyUI produces role-coherent headshots instead of interchangeable
// studio portraits. The front-end sends the raw role string in the
// query (?role=Forklift%20Operator); the server resolves it to a band
// and looks up the scene here.

export type RoleBand =
  | "warehouse"
  | "production"
  | "trades"
  | "driver"
  | "lead";

export interface SceneDef {
  band: RoleBand;
  // Free-form clause inserted into the diffusion prompt AFTER
  // "[age]-year-old [race] [gender] [role], ". Should describe what
  // they're wearing and what is immediately behind them. Keep under
  // ~25 words — SDXL Turbo loses focus on longer prompts and starts
  // hallucinating cartoon hands.
  scene: string;
}

const RE_BANDS: { re: RegExp; band: RoleBand }[] = [
  { re: /forklift|warehouse|associate|material\s*handler|loader|loading|packag|shipping|logistics|inventory|sanitation|janit/i, band: "warehouse" },
  { re: /production|assembl|quality/i, band: "production" },
  { re: /welder|weld|electric|maint(enance)?\s*tech|cnc|machine\s*op|hvac|plumb|carpenter|mason|tool\s*&\s*die/i, band: "trades" },
  { re: /driver|truck|haul|cdl/i, band: "driver" },
  { re: /line\s*lead|supervisor|foreman|coordinator|lead\b/i, band: "lead" },
];

export function roleBand(role: string): RoleBand {
  const r = (role || "").trim();
  if (!r) return "warehouse";
  for (const b of RE_BANDS) if (b.re.test(r)) return b.band;
  return "warehouse";
}

// TODO J — refine these. Each `scene` string lands directly in the
// diffusion prompt. Tone target: a coordinator glances at the card
// and recognizes the role from the photo before reading the role pill.
//
// Things that work well in SDXL Turbo at 8 steps:
//   - One concrete clothing item ("high-visibility yellow vest")
//   - One concrete prop ("hard hat hanging from belt", "tablet in hand")
//   - One blurred background element ("warehouse pallet aisle behind",
//     "factory machinery softly out of focus")
//   - Avoid: text/logos (rendered as scribble), specific brands, hands
//     holding tools (often distorts), full-body language ("standing",
//     "leaning") — model is trained on portrait crops.
//
// Each scene now bakes "monochrome black and white photography" into
// the prompt so the model produces native B&W output rather than us
// applying CSS grayscale post-hoc. SDXL Turbo handles B&W natively
// with strong tonal range — better than desaturating a color render.
export const SCENES: Record<RoleBand, SceneDef> = {
  warehouse: {
    band: "warehouse",
    scene: "wearing a high-visibility safety vest over a t-shirt, hard hat visible, blurred warehouse pallet aisle behind, soft natural light, monochrome black and white photography, fine film grain, documentary portrait style",
  },
  production: {
    band: "production",
    scene: "wearing a work shirt with safety glasses on forehead, blurred factory machinery softly out of focus behind, fluorescent overhead lighting, monochrome black and white photography, fine film grain, documentary portrait style",
  },
  trades: {
    band: "trades",
    scene: "wearing a heavy-duty work shirt with rolled sleeves, blurred workshop tool wall behind, focused tungsten lighting, monochrome black and white photography, fine film grain, documentary portrait style",
  },
  driver: {
    band: "driver",
    scene: "wearing a polo shirt, lanyard with ID badge visible, blurred truck cab or loading dock behind, daylight, monochrome black and white photography, fine film grain, documentary portrait style",
  },
  lead: {
    band: "lead",
    scene: "wearing a button-down shirt, tablet held casually at chest level, blurred warehouse floor in soft focus behind, professional lighting, monochrome black and white photography, fine film grain, documentary portrait style",
  },
};

// v2 — baked B&W + 1024×1024 render canvas (4× pixels of v1). Larger
// source means downsampling to a 40px avatar packs more detail per
// displayed pixel, hiding the diffusion-y micro-textures that read as
// "AI generated" at small sizes. Server route reads pool from
// data/headshots_role_pool/{SCENES_VERSION}/... so v1 stays available
// for rollback / A-B comparison.
export const SCENES_VERSION = "v2";

// Default render dimensions used by both the on-demand /headshots/
// generate/:key route and the offline render_role_pool.py script. v1
// used 512²; v2 doubles to 1024² (linear 2× = 4× pixels = ~3× GPU
// time on SDXL Turbo).
export const FACE_RENDER_DIM = 1024;
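The first-match scan in `roleBand` is worth checking against a few real role strings, since a broad early pattern can shadow a later one. An equivalent Python sketch (patterns transcribed from `RE_BANDS`; `re.search` stands in for JS `RegExp.test`):

```python
import re

# Same order as RE_BANDS — first match wins, "warehouse" is the fallback.
BANDS = [
    (r"forklift|warehouse|associate|material\s*handler|loader|loading|packag|shipping|logistics|inventory|sanitation|janit", "warehouse"),
    (r"production|assembl|quality", "production"),
    (r"welder|weld|electric|maint(enance)?\s*tech|cnc|machine\s*op|hvac|plumb|carpenter|mason|tool\s*&\s*die", "trades"),
    (r"driver|truck|haul|cdl", "driver"),
    (r"line\s*lead|supervisor|foreman|coordinator|lead\b", "lead"),
]

def role_band(role: str) -> str:
    r = (role or "").strip()
    if not r:
        return "warehouse"
    for pat, band in BANDS:
        if re.search(pat, r, re.I):
            return band
    return "warehouse"
```

Note the ordering consequence: a role like "Warehouse Lead" resolves to "warehouse", not "lead", because the warehouse pattern is tried first.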
(File diff suppressed because it is too large.)

178  mcp-server/tif_polygons.ts  Normal file
@ -0,0 +1,178 @@
// TIF (Tax Increment Financing) district point-in-polygon lookup.
// Given a property's lat/long, returns which Chicago TIF district (if
// any) contains it. TIF districts are public-subsidy zones — a property
// inside one is receiving city tax-increment funding for its build.
// Strong "this project has financial backing" signal for the Project Index.
//
// Data: data/_entity_cache/tif_districts.geojson (Chicago Open Data
// dataset eejr-xtfb, 100 active districts, 3.2MB). Refresh by re-running
// `curl ... eejr-xtfb.geojson > tif_districts.geojson` — districts
// change rarely (only when city council approves new ones or repeals).
//
// Algorithm: classic ray-casting. For each MultiPolygon's outer ring,
// count edge crossings of an east-going horizontal ray from the point.
// Odd crossings = inside. Holes (inner rings) flip the parity. Library-
// free; correct for arbitrary polygons including the irregular Chicago
// shapes which often have many small detours.

import { readFile } from "node:fs/promises";
import { existsSync } from "node:fs";
import { join } from "node:path";

const TIF_GEOJSON = join("/home/profit/lakehouse/data/_entity_cache", "tif_districts.geojson");

type LngLat = [number, number]; // GeoJSON convention: [longitude, latitude]
type Ring = LngLat[];
type Polygon = Ring[]; // outer ring + optional inner rings (holes)
type MultiPolygon = Polygon[];

type TifFeature = {
  name: string;
  trim_name?: string;
  ref?: string;
  approval_date?: string;
  expiration?: string;
  type?: string; // T-1xx etc.
  comm_area?: string;
  wards?: string;
  // Bounding box for quick reject
  bbox: { minLon: number; minLat: number; maxLon: number; maxLat: number };
  geometry: MultiPolygon;
};

let tifIdx: TifFeature[] | null = null;

function bboxOfMultiPolygon(mp: MultiPolygon): TifFeature["bbox"] {
  let minLon = Infinity, minLat = Infinity, maxLon = -Infinity, maxLat = -Infinity;
  for (const poly of mp) {
    for (const ring of poly) {
      for (const [lon, lat] of ring) {
        if (lon < minLon) minLon = lon;
        if (lat < minLat) minLat = lat;
        if (lon > maxLon) maxLon = lon;
        if (lat > maxLat) maxLat = lat;
      }
    }
  }
  return { minLon, minLat, maxLon, maxLat };
}

async function ensureLoaded(): Promise<TifFeature[]> {
  if (tifIdx) return tifIdx;
  if (!existsSync(TIF_GEOJSON)) {
    tifIdx = [];
    return tifIdx;
  }
  try {
    const raw = JSON.parse(await readFile(TIF_GEOJSON, "utf-8"));
    const out: TifFeature[] = [];
    for (const f of raw.features || []) {
      const geom = f.geometry;
      if (!geom) continue;
      // Normalize Polygon → MultiPolygon for uniform iteration
      let mp: MultiPolygon;
      if (geom.type === "MultiPolygon") {
        mp = geom.coordinates;
      } else if (geom.type === "Polygon") {
        mp = [geom.coordinates];
      } else {
        continue;
      }
      const props = f.properties || {};
      out.push({
        name: props.name || "Unknown TIF",
        trim_name: props.name_trim,
        ref: props.ref,
        approval_date: props.approval_d,
        expiration: props.expiration,
        type: props.type,
        comm_area: props.comm_area,
        wards: props.wards,
        bbox: bboxOfMultiPolygon(mp),
        geometry: mp,
      });
    }
    tifIdx = out;
    return tifIdx;
  } catch (e) {
    console.warn("[tif] load failed:", (e as Error).message);
    tifIdx = [];
    return tifIdx;
  }
}

// Ray-casting point-in-polygon (single ring). Returns true if (lon, lat)
// is strictly inside the ring. Edge cases (point exactly on a vertex)
// resolve by half-open interval convention; for our use case (Chicago
// boundary precision is ~1m, sites are point queries) this is fine.
function pointInRing(lon: number, lat: number, ring: Ring): boolean {
  let inside = false;
  const n = ring.length;
  for (let i = 0, j = n - 1; i < n; j = i++) {
    const [xi, yi] = ring[i];
    const [xj, yj] = ring[j];
    const intersect =
      yi > lat !== yj > lat &&
      lon < ((xj - xi) * (lat - yi)) / (yj - yi) + xi;
    if (intersect) inside = !inside;
  }
  return inside;
}

// Polygon = outer ring + holes. Inside outer AND not inside any hole.
function pointInPolygon(lon: number, lat: number, polygon: Polygon): boolean {
  if (polygon.length === 0) return false;
  if (!pointInRing(lon, lat, polygon[0])) return false;
  for (let i = 1; i < polygon.length; i++) {
    if (pointInRing(lon, lat, polygon[i])) return false;
  }
  return true;
}

export type TifMatch = {
  name: string;
  ref?: string;
  approval_date?: string;
  expiration?: string;
  comm_area?: string;
  wards?: string;
};

export async function findTifDistrict(
  longitude: number | string | undefined,
  latitude: number | string | undefined,
): Promise<TifMatch | null> {
  const lon = typeof longitude === "string" ? parseFloat(longitude) : longitude;
  const lat = typeof latitude === "string" ? parseFloat(latitude) : latitude;
  if (!lon || !lat || isNaN(lon) || isNaN(lat)) return null;
  const idx = await ensureLoaded();
  if (idx.length === 0) return null;
  for (const f of idx) {
    // Bbox reject — cheap O(1) skip for the 99% of districts that
    // can't possibly contain the point.
    const b = f.bbox;
    if (lon < b.minLon || lon > b.maxLon || lat < b.minLat || lat > b.maxLat) continue;
    // Full point-in-polygon for any polygon in this MultiPolygon
    for (const poly of f.geometry) {
      if (pointInPolygon(lon, lat, poly)) {
        return {
          name: f.name,
          ref: f.ref,
          approval_date: f.approval_date,
          expiration: f.expiration,
          comm_area: f.comm_area,
          wards: f.wards,
        };
      }
    }
  }
  return null;
}

export async function getTifIndexStats(): Promise<{
  total: number;
  loaded: boolean;
}> {
  const idx = await ensureLoaded();
  return { total: idx.length, loaded: idx.length > 0 };
}
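The ray-casting routine is easy to sanity-check on a unit square. A minimal Python transcription of `pointInRing` (same edge-crossing test, same GeoJSON `[lon, lat]` ordering; the square coordinates are made up for the test, not Chicago data):

```python
def point_in_ring(lon: float, lat: float, ring: list[tuple[float, float]]) -> bool:
    """Classic ray-cast: shoot a ray east from (lon, lat) and count how
    many ring edges it crosses. Odd count means the point is inside."""
    inside = False
    n = len(ring)
    j = n - 1  # previous vertex index, wrapping around
    for i in range(n):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Edge straddles the horizontal line through `lat`, and the
        # crossing point is strictly east of `lon`.
        if (yi > lat) != (yj > lat) and lon < (xj - xi) * (lat - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside
```

The holes logic layers on top exactly as in `pointInPolygon`: inside the outer ring, and not inside any inner ring.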
@ -29,8 +29,14 @@ CACHE_DIR.mkdir(parents=True, exist_ok=True)
WORKFLOW_PATH = "/opt/ComfyUI/workflows/editorial_hero.json"


def _cache_key(prompt, width, height, steps, seed=None):
    # Include seed so callers can vary outputs deterministically without
    # the proxy collapsing to a single cached image. None == legacy
    # (omitted from the key for backward compatibility).
    bits = f"{prompt}|{width}|{height}|{steps}"
    if seed is not None:
        bits += f"|{seed}"
    return hashlib.sha256(bits.encode()).hexdigest()[:24]


def _cache_get(key):
    fp = CACHE_DIR / f"{key}.webp"

@ -40,8 +46,15 @@ def _cache_put(key, img_bytes):
    (CACHE_DIR / f"{key}.webp").write_bytes(img_bytes)


def _comfyui_generate(prompt, width=1024, height=512, steps=8, seed=None,
                      negative_prompt=None, cfg=None, sampler=None, scheduler=None):
    """Submit workflow to ComfyUI and wait for result.

    Optional overrides — when provided, replace the workflow's defaults.
    The workflow template at editorial_hero.json was tuned for product
    hero shots with a "no humans" negative prompt; portrait callers MUST
    pass `negative_prompt` to avoid the model fighting them on faces.
    """
    # Load workflow template
    with open(WORKFLOW_PATH) as f:
        workflow = json.load(f)

@ -51,9 +64,21 @@ def _comfyui_generate(prompt, width=1024, height=512, steps=8, seed=None):
        seed = random.randint(0, 2**32)
    workflow["3"]["inputs"]["seed"] = seed
    workflow["3"]["inputs"]["steps"] = steps
    if cfg is not None:
        workflow["3"]["inputs"]["cfg"] = cfg
    if sampler:
        workflow["3"]["inputs"]["sampler_name"] = sampler
    if scheduler:
        workflow["3"]["inputs"]["scheduler"] = scheduler
    workflow["5"]["inputs"]["width"] = width
    workflow["5"]["inputs"]["height"] = height
    workflow["6"]["inputs"]["text"] = prompt
    # Node 7 is the negative-prompt CLIPTextEncode. The default is tuned
    # for product hero shots and contains "human, person, face, hand,
    # fingers, realistic photo of people" — actively sabotaging any
    # portrait render. Always overwrite when negative_prompt is given.
    if negative_prompt is not None:
        workflow["7"]["inputs"]["text"] = negative_prompt

    # Submit to ComfyUI
    payload = json.dumps({"prompt": workflow}).encode()

@ -177,9 +202,20 @@ class ImageHandler(BaseHTTPRequestHandler):
            height = min(max(int(body.get("height", 720)), 256), 1080)
            steps = min(max(int(body.get("steps", 50)), 1), 80)
            seed = body.get("seed")
            # Portrait-friendly overrides — None means "use workflow default".
            # negative_prompt MUST be passed by portrait callers to avoid
            # the workflow's "no humans" baked-in negative.
            negative_prompt = body.get("negative_prompt")
            cfg = body.get("cfg")
            sampler = body.get("sampler")
            scheduler = body.get("scheduler")

            # Cache check — seed + negative + cfg are part of the key so per-
            # worker / per-config requests don't collapse to one cached image.
            key = _cache_key(
                f"{prompt}||neg={negative_prompt or ''}||cfg={cfg or ''}",
                width, height, steps, seed,
            )
            cached = _cache_get(key)
            if cached:
                self._json(200, {"image": cached, "format": "webp", "width": width, "height": height,

@ -192,7 +228,11 @@ class ImageHandler(BaseHTTPRequestHandler):
            try:
                comfy_check = urllib.request.urlopen(f"{COMFYUI_URL}/system_stats", timeout=3)
                if comfy_check.status == 200:
                    img_bytes, seed = _comfyui_generate(
                        prompt, width, height, steps, seed,
                        negative_prompt=negative_prompt, cfg=cfg,
                        sampler=sampler, scheduler=scheduler,
                    )
                    backend = "comfyui"
            except Exception:
                pass

@ -210,6 +250,11 @@ class ImageHandler(BaseHTTPRequestHandler):
            elapsed_ms = int((time.time() - t0) * 1000)
            img_b64 = base64.b64encode(img_bytes).decode()
            # Recompute key with the actual seed used (when caller passed
            # None, _comfyui_generate picks a random one and we want the
            # cache to reflect that so re-requests with the same returned
            # seed hit the disk). Use the same decorated prompt as the
            # cache check above, or the stored key never matches a lookup.
            key = _cache_key(
                f"{prompt}||neg={negative_prompt or ''}||cfg={cfg or ''}",
                width, height, steps, seed,
            )
            _cache_put(key, img_bytes)

            self._json(200, {
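The seed-aware key logic can be illustrated in isolation (same sha256 truncation as the diff's `_cache_key`; the prompt string is a placeholder):

```python
import hashlib

def cache_key(prompt: str, width: int, height: int, steps: int, seed=None) -> str:
    """Seed is appended only when present, so legacy callers (seed=None)
    keep hitting the cache entries written before the change."""
    bits = f"{prompt}|{width}|{height}|{steps}"
    if seed is not None:
        bits += f"|{seed}"
    return hashlib.sha256(bits.encode()).hexdigest()[:24]
```

Two requests that differ only in seed now get distinct cache files, while an unseeded request maps to the pre-change key shape.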
225  scripts/staffing/fetch_face_pool.py  Normal file
@ -0,0 +1,225 @@
#!/usr/bin/env python3
"""
fetch_face_pool.py — pull N synthetic headshots from
https://thispersondoesnotexist.com/, write to data/headshots/face_NNNN.jpg,
optionally tag each with gender via deepface, emit a JSONL manifest.

Each fetch is a fresh StyleGAN face — no real people. Deterministic
per-worker mapping happens at serve time (mcp-server hashes the worker
key into the pool); this script just builds the pool.

Usage:
    python3 scripts/staffing/fetch_face_pool.py --count 300 --concurrency 3
    python3 scripts/staffing/fetch_face_pool.py --count 50 --no-gender

Re-running is idempotent: existing face_NNNN.jpg files are skipped, and
the manifest is rewritten from disk state.
"""
from __future__ import annotations
import argparse
import hashlib
import json
import os
import sys
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

import urllib.request
import urllib.error

URL = "https://thispersondoesnotexist.com/"
UA = "Lakehouse/1.0 (face-pool fetch · synthetic-only · no real-person tracking)"


def fetch_one(idx: int, out_dir: str) -> tuple[int, str, bool, str | None]:
    """Returns (idx, basename, cached, error)."""
    fname = f"face_{idx:04d}.jpg"
    full = os.path.join(out_dir, fname)
    if os.path.exists(full) and os.path.getsize(full) > 1024:
        return idx, fname, True, None
    try:
        req = urllib.request.Request(URL, headers={"User-Agent": UA})
        with urllib.request.urlopen(req, timeout=20) as resp:
            blob = resp.read()
        if len(blob) < 1024:
            return idx, fname, False, f"response too small ({len(blob)} bytes)"
        with open(full, "wb") as f:
            f.write(blob)
        return idx, fname, False, None
    except urllib.error.URLError as e:
        return idx, fname, False, f"urlerror: {e}"
    except Exception as e:
        return idx, fname, False, f"{type(e).__name__}: {e}"


def maybe_tag_gender(records: list[dict], out_dir: str) -> dict[str, int]:
    """If deepface is installed, label records that don't already have a
    gender. Returns a count summary; mutates records in place.

    Preservation contract: never overwrites prior `gender` (or any other
    tag — race/age/excluded — set by tag_face_pool.py). On deepface
    import failure, leaves existing tags alone instead of resetting them
    to None. The previous behavior wiped 952 hand-classified rows when
    fetch_face_pool was re-run from a Python without deepface installed."""
    try:
        from deepface import DeepFace  # type: ignore
    except Exception as e:
        print(f"  (deepface unavailable: {e}) — leaving existing tags untouched")
        for r in records:
            r.setdefault("gender", None)
        already = sum(1 for r in records if r.get("gender") in ("man", "woman"))
        return {"preserved_tagged": already, "untagged": len(records) - already}

    todo = [r for r in records if r.get("gender") not in ("man", "woman")]
    if not todo:
        print("  every record already has gender — nothing to tag.")
        return {"preserved_tagged": len(records)}
    print(f"  tagging gender via deepface ({len(todo)} of {len(records)} records, CPU; ~0.5-1s per face)…")
    counts: dict[str, int] = {}
    for i, r in enumerate(todo):
        full = os.path.join(out_dir, r["file"])
        try:
            ana = DeepFace.analyze(
                img_path=full,
                actions=["gender"],
                enforce_detection=False,
                silent=True,
            )
            if isinstance(ana, list):
                ana = ana[0] if ana else {}
            g_raw = (ana.get("dominant_gender") or "").lower().strip()
            r["gender"] = (
                "man" if g_raw.startswith("man") else
                "woman" if g_raw.startswith("woman") else
                None
            )
        except Exception as e:
            r["gender"] = None
            r["gender_error"] = f"{type(e).__name__}: {e}"
        counts[r["gender"] or "unknown"] = counts.get(r["gender"] or "unknown", 0) + 1
        if (i + 1) % 25 == 0:
            print(f"    [{i+1}/{len(todo)}] {counts}")
    return counts


def main():
    p = argparse.ArgumentParser()
    p.add_argument("--count", type=int, default=300, help="how many faces to maintain in pool")
    p.add_argument(
        "--out",
        default=os.path.join(os.path.dirname(__file__), "..", "..", "data", "headshots"),
    )
    p.add_argument("--concurrency", type=int, default=3, help="parallel fetches (be polite)")
    p.add_argument("--no-gender", action="store_true", help="skip deepface gender tagging")
    p.add_argument("--shrink", action="store_true",
                   help="allow --count to drop manifest entries with id >= count. Default: preserve them.")
    args = p.parse_args()

    out = os.path.realpath(args.out)
    os.makedirs(out, exist_ok=True)

    # Load any existing manifest into a by-id dict so prior tags
    # (gender / race / age / excluded) survive the rewrite. Also
    # naturally dedupes — if the file accidentally has duplicate
    # lines for the same id (this is how we ended up with a 2497-
    # row manifest backing a 1000-face pool), the last one wins.
    manifest = os.path.join(out, "manifest.jsonl")
    existing: dict[int, dict] = {}
    if os.path.exists(manifest):
        dup_count = 0
        with open(manifest) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                try:
                    row = json.loads(line)
                except json.JSONDecodeError:
                    continue
                rid = row.get("id")
                if not isinstance(rid, int):
                    continue
                if rid in existing:
                    dup_count += 1
                existing[rid] = row
        print(f"Loaded existing manifest: {len(existing)} unique ids ({dup_count} duplicate lines collapsed)")
    max_existing = max(existing.keys()) if existing else -1
    if max_existing >= args.count and not args.shrink:
        print(
            f"\nERROR: --count={args.count} would drop {sum(1 for k in existing if k >= args.count)} "
            f"manifest entries (max existing id = {max_existing}). Pass --shrink to allow.\n",
            file=sys.stderr,
        )
        sys.exit(2)

    print(f"Fetching {args.count} faces → {out}")
    print(f"Source: {URL} (synthetic StyleGAN — no real people)")

    results: list[dict] = [None] * args.count  # type: ignore
    t0 = time.time()
    with ThreadPoolExecutor(max_workers=max(1, args.concurrency)) as ex:
        futs = {ex.submit(fetch_one, i, out): i for i in range(args.count)}
        for done, fut in enumerate(as_completed(futs), 1):
            idx, fname, cached, err = fut.result()
            # Start from prior manifest row (preserves gender/race/age/excluded)
            # and overlay only the fields fetch_one is responsible for.
            base = dict(existing.get(idx, {}))
            base.update({
                "id": idx,
                "file": fname,
                "cached": cached,
                "error": err,
            })
            results[idx] = base
            if done % 25 == 0 or done == args.count:
                ok = sum(1 for r in results if r and not r.get("error"))
                print(f"  [{done}/{args.count}] {ok} ok ({time.time()-t0:.1f}s)")

    # Drop slots that errored or are still None (shouldn't happen)
    records = [r for r in results if r and not r.get("error")]
    print(f"\nPool ready: {len(records)} faces, {sum(1 for r in records if r['cached'])} from cache")
    preserved_tags = sum(1 for r in records if r.get("gender") in ("man", "woman"))
    if preserved_tags:
        print(f"Preserved {preserved_tags} prior gender tags (and any race/age/excluded fields).")

    if not args.no_gender and records:
        print("\nGender-tagging pass:")
        summary = maybe_tag_gender(records, out)
        print(f"  distribution: {summary}")
    else:
        for r in records:
            r.setdefault("gender", None)

    # If --shrink was NOT used and somehow id >= count rows are still in
    # `existing` (which can only happen if the early gate was bypassed),
    # carry them forward so we don't quietly drop them.
    if not args.shrink:
        for rid, row in existing.items():
            if rid >= args.count and rid not in {r["id"] for r in records}:
                records.append(row)
        records.sort(key=lambda r: r.get("id", 0))

    # Strip transient flags before persisting
    for r in records:
        r.pop("cached", None)
        r.pop("error", None)

    # Atomic write — if a re-run is interrupted, manifest stays intact.
    tmp = manifest + ".tmp"
    with open(tmp, "w") as f:
        for r in records:
            f.write(json.dumps(r) + "\n")
    os.replace(tmp, manifest)
    print(f"\nManifest: {manifest} ({len(records)} entries)")
|
||||
|
||||
# Quick checksum manifest for downstream debugging
|
||||
h = hashlib.sha256()
|
||||
for r in records:
|
||||
h.update(r["file"].encode())
|
||||
h.update(b"|")
|
||||
h.update((r.get("gender") or "?").encode())
|
||||
print(f"Pool fingerprint (sha256): {h.hexdigest()[:16]}")
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
main()
|
||||
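The manifest load above leans entirely on last-one-wins dict insertion for dedupe. A minimal standalone sketch of the same pattern (the helper name and sample field values are illustrative, not the script's exact schema):

```python
import json

def load_by_id(jsonl_text: str) -> dict[int, dict]:
    """Collapse duplicate ids in a JSONL manifest: the last row for an
    id wins, so a later re-run's row overwrites an earlier partial one."""
    existing: dict[int, dict] = {}
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate a torn line from an interrupted write
        rid = row.get("id")
        if isinstance(rid, int):
            existing[rid] = row
    return existing

sample = "\n".join([
    '{"id": 0, "file": "face_0000.jpg"}',
    '{"id": 0, "file": "face_0000.jpg", "gender": "woman"}',  # duplicate id: this row wins
    '{"id": 1, "file": "face_0001.jpg"}',
])
pool = load_by_id(sample)
print(len(pool), pool[0].get("gender"))  # → 2 woman
```

This is also why a 2497-line manifest can legitimately back a 1000-face pool: duplicates collapse on load, and the subsequent atomic rewrite shrinks the file back to one line per id.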
### scripts/staffing/render_role_pool.py (new file, 230 lines)
```python
#!/usr/bin/env python3
"""
render_role_pool.py — pre-render a role-aware face pool by hitting
serve_imagegen.py (localhost:3600/generate) with prompts pulled from
the bun server's /headshots/_scenes endpoint (single source of truth
for SCENES + SCENES_VERSION).

Layout:

    data/headshots_role_pool/
      {band}/
        {gender}_{race}/
          face_00.webp
          face_01.webp
          ...
      manifest.jsonl

Each entry in manifest.jsonl:

    {"band": "warehouse", "gender": "man", "race": "caucasian",
     "file": "warehouse/man_caucasian/face_03.webp",
     "seed": 184729338, "scenes_version": "v1"}

Idempotent: a file at the target path is skipped. Re-run with --force
to regenerate. SCENES_VERSION is captured per render so the server's
pool route can refuse stale renders if the version drifts.
"""
from __future__ import annotations

import argparse
import base64
import json
import os
import sys
import time
import urllib.request
import urllib.error

DEFAULT_BANDS = ["warehouse", "production", "trades", "driver", "lead"]
DEFAULT_GENDERS = ["man", "woman"]
DEFAULT_RACES = ["caucasian", "east_asian", "south_asian", "middle_eastern", "black", "hispanic"]


def race_text(r: str) -> str:
    return {
        "caucasian": "",
        "east_asian": "East Asian",
        "south_asian": "South Asian",
        "middle_eastern": "Middle Eastern",
        "black": "Black",
        "hispanic": "Hispanic",
    }.get(r, "")


def fetch_scenes(mcp_url: str) -> tuple[str, dict]:
    """Pull canonical SCENES from the bun server. Single source of truth."""
    req = urllib.request.Request(f"{mcp_url}/headshots/_scenes")
    with urllib.request.urlopen(req, timeout=10) as resp:
        data = json.loads(resp.read())
    return data["version"], data["scenes"]


def render(comfy_url: str, prompt: str, seed: int, steps: int, timeout: int, dim: int) -> bytes | None:
    payload = json.dumps({
        "prompt": prompt,
        "width": dim,
        "height": dim,
        "steps": steps,
        "seed": seed,
    }).encode()
    req = urllib.request.Request(
        f"{comfy_url}/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            data = json.loads(resp.read())
    except urllib.error.HTTPError as e:
        print(f" HTTP {e.code} from comfy: {e.read()[:200]}", file=sys.stderr)
        return None
    except Exception as e:
        print(f" comfy error: {type(e).__name__}: {e}", file=sys.stderr)
        return None
    img_b64 = data.get("image")
    if not img_b64:
        print(f" comfy response missing 'image' field: {list(data.keys())}", file=sys.stderr)
        return None
    return base64.b64decode(img_b64)


def main():
    p = argparse.ArgumentParser()
    p.add_argument("--out", default=os.path.join(os.path.dirname(__file__), "..", "..", "data", "headshots_role_pool"))
    p.add_argument("--per-bucket", type=int, default=10, help="how many faces per (band × gender × race)")
    p.add_argument("--mcp", default="http://localhost:3700")
    p.add_argument("--comfy", default="http://localhost:3600")
    p.add_argument("--steps", type=int, default=8)
    p.add_argument("--bands", nargs="*", default=DEFAULT_BANDS)
    p.add_argument("--genders", nargs="*", default=DEFAULT_GENDERS)
    p.add_argument("--races", nargs="*", default=DEFAULT_RACES)
    p.add_argument("--force", action="store_true", help="regenerate existing files")
    p.add_argument("--age", type=int, default=32)
    p.add_argument("--timeout", type=int, default=120, help="per-render timeout (1024² takes ~5s on A4000)")
    p.add_argument("--dim", type=int, default=1024, help="square render dimension (v2 default 1024, v1 was 512)")
    args = p.parse_args()

    out_root = os.path.realpath(args.out)
    os.makedirs(out_root, exist_ok=True)

    print(f"Fetching canonical SCENES from {args.mcp}/headshots/_scenes…")
    try:
        version, scenes = fetch_scenes(args.mcp)
    except Exception as e:
        print(f"FATAL: could not fetch scenes ({e}). Is the mcp-server up?", file=sys.stderr)
        sys.exit(1)
    print(f" SCENES_VERSION={version}, {len(scenes)} bands available: {list(scenes.keys())}")

    # v2+ files live at {out}/{version}/{band}/{g}_{r}/face_NN.webp.
    # v1 lived at {out}/{band}/... — keep that layout intact for
    # rollback; the server route reads both and prefers current.
    out = out_root if version == "v1" else os.path.join(out_root, version)
    os.makedirs(out, exist_ok=True)
    print(f" writing to: {out}")
    print(f" render dim: {args.dim}×{args.dim}")

    # Reject any --bands not in the server's SCENES
    unknown = [b for b in args.bands if b not in scenes]
    if unknown:
        print(f"FATAL: unknown bands {unknown}. Server has: {list(scenes.keys())}", file=sys.stderr)
        sys.exit(1)

    manifest_rows = []
    todo = [
        (band, g, r, n)
        for band in args.bands
        for g in args.genders
        for r in args.races
        for n in range(args.per_bucket)
    ]
    print(f"\nPlanning: {len(todo)} renders ({len(args.bands)} bands × {len(args.genders)} genders × {len(args.races)} races × {args.per_bucket} faces).")
    print(f"Estimated GPU time at 1.5s/render = {len(todo) * 1.5 / 60:.1f} min.\n")

    t0 = time.time()
    rendered = 0
    skipped = 0
    failed = 0
    for i, (band, g, r, n) in enumerate(todo):
        bucket_dir = os.path.join(out, band, f"{g}_{r}")
        os.makedirs(bucket_dir, exist_ok=True)
        fname = f"face_{n:02d}.webp"
        full = os.path.join(bucket_dir, fname)
        rel = os.path.relpath(full, out)

        if os.path.exists(full) and os.path.getsize(full) > 1024 and not args.force:
            skipped += 1
            manifest_rows.append({
                "band": band, "gender": g, "race": r, "file": rel,
                "seed": None, "scenes_version": version, "cached": True,
            })
            continue

        scene_def = scenes[band]
        scene_clause = scene_def["scene"]
        race_clause = race_text(r)
        gender_clause = g  # "man" / "woman"
        # Match the bun server's prompt builder exactly. If you tweak
        # one, tweak the other (or factor out a /prompt-builder endpoint).
        # The {role} slot is intentionally a band-typical title here
        # — the pre-rendered face is shared across roles in the same
        # band, so we use the band's archetypal role. Specific roles
        # still hit the on-demand /headshots/generate/:key path with
        # their actual title.
        archetype_role = {
            "warehouse": "warehouse worker",
            "production": "production worker",
            "trades": "skilled tradesperson",
            "driver": "delivery driver",
            "lead": "shift supervisor",
        }.get(band, "warehouse worker")
        prompt = (
            f"professional headshot portrait of a {args.age}-year-old "
            f"{race_clause} {gender_clause} {archetype_role}, {scene_clause}, "
            f"neutral confident expression, sharp focus, photorealistic"
        )

        # Deterministic seed per slot — same (band, g, r, n) always
        # gets the same face. Mixing in scenes_version means a SCENES
        # tweak shifts every face slightly; that's the right behavior
        # (it's how cache invalidation propagates to the pool too).
        seed_str = f"{band}|{g}|{r}|{n}|{version}"
        seed_h = 5381
        for ch in seed_str:
            seed_h = ((seed_h << 5) + seed_h + ord(ch)) & 0x7fffffff
        seed = seed_h

        bytes_ = render(args.comfy, prompt, seed, args.steps, args.timeout, args.dim)
        if bytes_ is None:
            failed += 1
            continue
        with open(full, "wb") as f:
            f.write(bytes_)
        rendered += 1
        manifest_rows.append({
            "band": band, "gender": g, "race": r, "file": rel,
            "seed": seed, "scenes_version": version, "cached": False,
        })

        if (i + 1) % 10 == 0 or (i + 1) == len(todo):
            elapsed = time.time() - t0
            done = i + 1
            rate = done / elapsed if elapsed > 0 else 0
            eta = (len(todo) - done) / rate if rate > 0 else 0
            print(f" [{done}/{len(todo)}] rendered={rendered} skipped={skipped} failed={failed} "
                  f"rate={rate:.2f}/s eta={eta:.0f}s")

    # Atomic manifest write
    manifest_path = os.path.join(out, "manifest.jsonl")
    tmp = manifest_path + ".tmp"
    with open(tmp, "w") as f:
        for row in manifest_rows:
            f.write(json.dumps(row) + "\n")
    os.replace(tmp, manifest_path)

    print(f"\nDone. {rendered} new, {skipped} cached, {failed} failed in {time.time()-t0:.1f}s")
    print(f"Manifest: {manifest_path} ({len(manifest_rows)} entries)")
    print(f"\nNext: poke {args.mcp}/headshots/__reload to pick up the new pool.")


if __name__ == "__main__":
    main()
```
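The seed derivation in the render loop above is a 31-bit-masked djb2 hash over the slot key. A standalone sketch with the same constants (the helper name `slot_seed` is mine, not the script's):

```python
def slot_seed(band: str, gender: str, race: str, n: int, version: str) -> int:
    """djb2 over 'band|gender|race|n|version', masked to 31 bits: the
    same slot always yields the same seed (so re-renders are stable),
    while a SCENES version bump reshuffles every seed at once."""
    h = 5381
    for ch in f"{band}|{gender}|{race}|{n}|{version}":
        h = ((h << 5) + h + ord(ch)) & 0x7FFFFFFF  # h*33 + ch, mod 2^31
    return h

a = slot_seed("warehouse", "man", "caucasian", 3, "v1")
b = slot_seed("warehouse", "man", "caucasian", 3, "v1")  # identical slot
c = slot_seed("warehouse", "man", "caucasian", 3, "v2")  # version bump
print(a == b, a != c, 0 <= a < 2**31)  # → True True True
```

Folding `version` into the hashed string is what makes cache invalidation propagate: no separate bookkeeping is needed because stale renders simply stop being addressed.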
### scripts/staffing/tag_face_pool.py (new file, 169 lines)
```python
#!/usr/bin/env python3
"""
tag_face_pool.py — run deepface gender + race classification over the
synthetic face pool produced by fetch_face_pool.py and rewrite
manifest.jsonl with `gender` (man / woman) and mapped `race` tags
(caucasian / east_asian / south_asian / middle_eastern / black /
hispanic).

Run with the venv that has deepface installed:

    /home/profit/.local/share/deepface-venv/bin/python \
        scripts/staffing/tag_face_pool.py

Idempotent: rows that already have BOTH gender and race tagged are
skipped. Pass --force to re-tag everything.

Mapping deepface buckets → /headshots/ ?e= values:

    asian          → east_asian (deepface doesn't differentiate East /
                     South Asian; we lump as 'east_asian' since the
                     StyleGAN training set leans East Asian)
    indian         → south_asian
    middle eastern → middle_eastern
    black          → black
    hispanic       → hispanic
    white          → caucasian
"""
from __future__ import annotations

import argparse
import json
import os
import sys
import time

DEEPFACE_RACE_TO_HINT = {
    "asian": "east_asian",
    "indian": "south_asian",
    "middle eastern": "middle_eastern",
    "black": "black",
    "latino hispanic": "hispanic",
    "hispanic": "hispanic",
    "white": "caucasian",
}


def main():
    p = argparse.ArgumentParser()
    p.add_argument(
        "--out",
        default=os.path.join(os.path.dirname(__file__), "..", "..", "data", "headshots"),
    )
    p.add_argument("--force", action="store_true", help="re-tag rows that already have gender+race")
    p.add_argument("--limit", type=int, default=0, help="cap how many faces to process this run (0 = all)")
    p.add_argument("--min-age", type=int, default=22, help="exclude faces estimated below this age (kids/teens). Staffing context = legal-age workers only.")
    args = p.parse_args()

    out = os.path.realpath(args.out)
    manifest_path = os.path.join(out, "manifest.jsonl")
    if not os.path.exists(manifest_path):
        print(f"manifest not found: {manifest_path}", file=sys.stderr)
        sys.exit(1)

    print("loading deepface (cold start ~10-15s for first model build)…")
    from deepface import DeepFace  # type: ignore

    rows = []
    with open(manifest_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            rows.append(json.loads(line))
    print(f"manifest: {len(rows)} rows")

    todo = [
        r for r in rows
        if args.force or r.get("gender") is None or r.get("race") is None or r.get("age") is None
    ]
    if args.limit > 0:
        todo = todo[: args.limit]
    print(f"to tag: {len(todo)} faces")

    if not todo:
        print("nothing to do.")
        return

    counts_g = {}
    counts_r = {}
    failed = 0
    t0 = time.time()
    for i, r in enumerate(todo):
        full = os.path.join(out, r["file"])
        try:
            ana = DeepFace.analyze(
                img_path=full,
                actions=["gender", "race", "age"],
                enforce_detection=False,
                silent=True,
            )
            if isinstance(ana, list):
                ana = ana[0] if ana else {}
            g_raw = (ana.get("dominant_gender") or "").lower().strip()
            r["gender"] = (
                "man" if g_raw.startswith("man") else
                "woman" if g_raw.startswith("woman") else
                None
            )
            r_raw = (ana.get("dominant_race") or "").lower().strip()
            r["race"] = DEEPFACE_RACE_TO_HINT.get(r_raw, None)
            if r["race"] is None and r_raw:
                r["race_raw"] = r_raw
            # Age estimation — exclude minors / teens. Staffing context
            # uses adult workers only. Threshold is 22 by default
            # (legal + a buffer because age estimation is noisy).
            try:
                age = int(round(float(ana.get("age") or 0)))
            except Exception:
                age = 0
            r["age"] = age
            if age and age < args.min_age:
                r["excluded"] = "minor"
            else:
                r.pop("excluded", None)
            counts_g[r["gender"] or "unknown"] = counts_g.get(r["gender"] or "unknown", 0) + 1
            counts_r[r["race"] or r_raw or "unknown"] = counts_r.get(r["race"] or r_raw or "unknown", 0) + 1
        except Exception as e:
            r["tag_error"] = f"{type(e).__name__}: {e}"
            failed += 1
        if (i + 1) % 25 == 0 or (i + 1) == len(todo):
            elapsed = time.time() - t0
            rate = (i + 1) / elapsed if elapsed > 0 else 0
            eta = (len(todo) - i - 1) / rate if rate > 0 else 0
            print(f" [{i+1}/{len(todo)}] rate={rate:.1f}/s eta={eta:.0f}s failed={failed}")
            print(f"   gender: {counts_g}")
            print(f"   race  : {counts_r}")

    # Write updated manifest atomically
    tmp = manifest_path + ".tmp"
    with open(tmp, "w") as f:
        for r in rows:
            f.write(json.dumps(r) + "\n")
    os.replace(tmp, manifest_path)

    final_g = {}
    final_r = {}
    excluded = 0
    age_hist = {"<18": 0, "18-22": 0, "22-30": 0, "30-40": 0, "40-50": 0, "50-60": 0, "60+": 0, "unknown": 0}
    for r in rows:
        if r.get("excluded"):
            excluded += 1
            continue
        final_g[r.get("gender") or "untagged"] = final_g.get(r.get("gender") or "untagged", 0) + 1
        final_r[r.get("race") or "untagged"] = final_r.get(r.get("race") or "untagged", 0) + 1
        a = r.get("age") or 0
        if a == 0: age_hist["unknown"] += 1
        elif a < 18: age_hist["<18"] += 1
        elif a < 22: age_hist["18-22"] += 1
        elif a < 30: age_hist["22-30"] += 1
        elif a < 40: age_hist["30-40"] += 1
        elif a < 50: age_hist["40-50"] += 1
        elif a < 60: age_hist["50-60"] += 1
        else: age_hist["60+"] += 1
    print(f"\nDone. {len(rows)} rows, {excluded} excluded as <{args.min_age}, {failed} tag errors, {time.time()-t0:.1f}s")
    print(f" final gender: {final_g}")
    print(f" final race  : {final_r}")
    print(f" age dist    : {age_hist}")
    print(f"\nNext: poke /headshots/__reload to refresh the in-memory pool.")


if __name__ == "__main__":
    main()
```