diff --git a/.gitignore b/.gitignore
index 1a87915..2a9ebf2 100644
--- a/.gitignore
+++ b/.gitignore
@@ -4,3 +4,49 @@
 .env
 __pycache__/
 *.pyc
+
+# Headshot pool — binary face JPGs are fetched by scripts/staffing/fetch_face_pool.py
+# (synthetic StyleGAN, ~580MB for 1000 faces). Manifest + fetch script are tracked.
+data/headshots/face_*.jpg
+data/headshots/_thumbs/
+# ComfyUI on-demand generated portraits (per-worker unique). Cached on first
+# request; fully regeneratable via /headshots/generate/:key.
+data/headshots_gen/
+
+# Runtime data — all regeneratable from inputs or accumulated by daemons.
+# Anything under data/_*/ is internal state (auditor outputs, KB caches,
+# pathway memory snapshots, HNSW trial results, etc.). Anything under
+# data/datasets/ or data/vectors/ is generated by ingest/index pipelines.
+data/_*/
+data/lance/
+data/datasets/
+data/vectors/
+data/demo/
+data/evidence/
+data/face_test/
+data/headshots_role_pool/
+data/icons_pool/
+data/scored-runs/
+data/workspaces/
+data/catalog/
+data/**/*.bak-*
+data/**/*.pre-*-bak
+
+# Logs
+logs/
+
+# Build artifacts
+node_modules/
+exports/
+mcp-server/data/
+
+# Per-run distillation reports (timestamp-named); keep the parent dir tracked
+# via .gitkeep if needed but don't carry every batch's report set.
+reports/distillation/[0-9]*/
+reports/distillation/*-*-*-*-*/
+
+# Test scratch — scratchpads, traces, sessions are regenerated each run.
+# PRD/scenario fixtures stay tracked (they ARE the test).
+tests/agent_test/_* +tests/agent_test/sessions/ +tests/real-world/runs/ diff --git a/Cargo.lock b/Cargo.lock index 9baea2b..4b0285b 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -48,6 +48,7 @@ version = "0.1.0" dependencies = [ "async-trait", "axum", + "lru", "reqwest", "serde", "serde_json", diff --git a/STATE_OF_PLAY.md b/STATE_OF_PLAY.md new file mode 100644 index 0000000..31d1150 --- /dev/null +++ b/STATE_OF_PLAY.md @@ -0,0 +1,269 @@ +# STATE OF PLAY — Lakehouse + +**Last verified:** 2026-05-02 evening CDT +**Verified by:** live probe (smoke 9/9, parity 32/32, gateway restarted), not memory. + +> **Read this FIRST.** When the user says "we're working on lakehouse," they mean the working code captured below — NOT what `git log` framed as "the cutover" or what memory snapshots from 2 days ago suggest. If memory contradicts this file, this file wins. Update it when something is verified working — not when a phase finishes. + +--- + +## WHAT LANDED 2026-05-01 → 2026-05-02 (10 commits this wave) + +| Commit | What | Verified | +|---|---|---| +| `5d30b3d` | lance: auto-build doc_id btree in `lance_migrate` handler | doc-fetch ~5ms (was ~100ms full scan) on scale_test_10m | +| `044650a` | lance-bench: same scalar build post-IVF (matches gateway) | cargo check clean | +| `7594725` | lance: 4-pack — `sanitize_lance_err` + 7 unit tests + 9-probe smoke + 10M re-bench | smoke 9/9 PASS, tests 7/7 PASS | +| `98b6647` | gateway: `IterateResponse.trace_id` echoed; session_log_path enabled | parity probes see one unified JSONL | +| `57bde63` | gateway: trace-id propagation + coordinator session JSONL (Rust parity with Go wave) | session_log_parity 4/4 | +| `ba928b1` | aibridge: drop Python sidecar from hot path; AiClient → direct Ollama | aibridge tests 32/32 PASS, /ai/embed live 768d | +| `654797a` | gateway: pub `extract_json` + `parity_extract_json` bin | extract_json_parity 12/12 | +| `c5654d4` | docs: pointer to `golangLAKEHOUSE/docs/ARCHITECTURE_COMPARISON.md` | — | +| 
`150cc3b` | aibridge: LRU embed cache, 236× RPS warm (78ms → 129us p50) | load test | +| `9eed982` | mcp-server: /_go/* pass-through for G5 cutover slice | — | +| `6e34ef7` | gitignore: stop tracking 100+ runtime ephemera (data/_*, lance, logs, node_modules) | untracked dropped 100+ → 0 | +| `41b0a99` | chore: add 33 real items that were sitting untracked (scripts, scenarios, kimi reports, dev UIs) | clean working tree | + +**Cross-runtime parity (post-this-wave):** 32/32 across 5 probes — `validator(6/6) + extract_json(12/12) + session_log(4/4) + materializer(2/2) + embed(8/8)`. Run `cd /home/profit/golangLAKEHOUSE && for p in scripts/cutover/parity/*.sh; do bash "$p"; done` to re-verify. + +**Lance backend (was untested 5 days ago, now gauntlet-ready):** +- `cargo test -p vectord-lance --release` → 7/7 PASS +- `./scripts/lance_smoke.sh` → 9/9 PASS against live gateway +- `reports/lance_10m_rebench_2026-05-02.md` — search warm ~20ms / cold ~46ms median, doc-fetch ~5ms post-btree + +--- + +## VERIFIED WORKING RIGHT NOW + +### The client demo (Staffing Co-Pilot) + +**Public URL:** `https://devop.live/lakehouse/` — 200, "Staffing Co-Pilot" (159 KB SPA, leaflet maps, dark theme). +**Local URL:** `http://localhost:3700/` — same page, served by `mcp-server/index.ts` (PID 1271, started 09:48 CDT today). 
+ +**The staffers console** (the one the client was thoroughly impressed with): +- `https://devop.live/lakehouse/console` — 200, "Lakehouse — What Your Staffing System Would Do" (26 KB) +- Pulls project index via `/api/catalog/datasets` (36 datasets) + playbook memory via `/api/vectors/playbook_memory/stats` (4,701 entries with embeddings, real ops like *"fill: Maintenance Tech x2 in Milwaukee, WI"*) + +Client-visible flow that works end-to-end on the public URL: + +| Endpoint | Sample output | +|---|---| +| `GET /api/catalog/datasets` | 36 datasets indexed: timesheets 1M, call_log 800K, workers_500k 500K, email_log 500K, workers_100k 100K, candidates 100K, placements 50K, job_orders 15K, successful_playbooks_live 2,077 | +| `GET /api/vectors/playbook_memory/stats` | 4,701 fill operations with embeddings | +| `GET /system/summary` | 36 datasets, 2.98M rows, 60 indexes, 500K workers loaded, 1K candidates | +| `POST /intelligence/staffing_forecast` | 744 Production Workers needed in 30d, 11,281 bench (4,687 reliable), coverage 1,444%, risk=ok. Same for Electrician (need 32, bench 2,440) and Maintenance Tech (need 17, bench 5,004). | +| `POST /intelligence/permit_contracts` | permit `3442956` $500K → 3 Production Workers, 886-candidate pool, 95% fill, $36K gross. 5 more Chicago permits with 8 workers each, same pool, 95% fill, $96K each. | +| `POST /intelligence/market` | major Chicago permits ranked: $730M O'Hare, $615M 307 N Michigan, $580M casino, $445M Loop transit (real geo coords). | +| `POST /intelligence/permit_entities` | architects + contractors from permit contacts (e.g. "KACPRZYNSKI, ANDY", "SLS ELECTRICAL SERVICE"). 
| +| `POST /intelligence/activity` + `/intelligence/arch_signals` + `/intelligence/chat` | all 200 | + +The demo tells the story: *"upcoming Chicago contracts → workers needed → coverage from the bench → architects/contractors involved → revenue and margin."* That's the "live data + anticipating contracts + complete workflow" pitch — working as of right now. + +### Backend, verified live this session + +| Surface | State | +|---|---| +| Gateway `:3100` | up, 4 providers configured, `/v1/health` 200 with 500K workers loaded | +| MCP server `:3700` (Co-Pilot demo) | up, all `/intelligence/*` endpoints respond | +| VCP UI `:3950` | started this session, `/data/*` 200, real numbers | +| Observer `:3800` | ring full (2,000/2,000) — older events evicted, query Langfuse for 24h-ago state | +| Sidecar `:3200` | up | +| Langfuse `:3001` | recording, `gw:/log` + `v1.chat:openrouter` traces visible | +| LLM Team UI `:5000` | up, only `extract` mode registered | +| OpenCode fleet | **40 models reachable through one `sk-*` key** (verified live `GET https://opencode.ai/zen/v1/models`) | + +OpenCode catalog (live): +- Claude: opus-4-7, opus-4-6, opus-4-5, opus-4-1, sonnet-4-6, sonnet-4-5, sonnet-4, haiku-4-5 +- GPT-5: 5.5-pro, 5.5, 5.4-pro, 5.4, 5.4-mini, 5.4-nano, 5.3-codex-spark, 5.3-codex, 5.2, 5.2-codex, 5.1-codex-max, 5.1-codex, 5.1-codex-mini, 5.1, 5-codex, 5-nano, 5 +- Gemini: 3.1-pro, 3-flash +- GLM: 5.1, 5 +- Minimax: m2.7, m2.5 +- Kimi: k2.6, k2.5 +- Qwen: 3.6-plus, 3.5-plus +- Other: BIG-PKL (was a typo-prone name in the catalog, model id starts with "big-pkl-something") +- Free tier: minimax-m2.5-free, hy3-preview-free, ling-2.6-flash-free, trinity-large-preview-free + +### The substrate (frozen — do not re-architect) + +- Distillation v1.0.0 at tag `e7636f2` — **145/145 bun tests pass, 22/22 acceptance, 16/16 audit-full** +- Output: `data/_kb/distilled_{facts,procedures,config_hints}.jsonl` + 
`data/vectors/distilled_{factual,procedural,config_hint}_v20260423102847.parquet` +- Auditor cross-lineage: Kimi K2.6 ↔ Haiku 4.5 alternation, Opus auto-promote on diffs >100k chars, **per-PR cap=3 with auto-reset on new head SHA** +- Pathway memory: 88 traces, 11/11 successful replays (probation gate crossed) +- Mode runner: 5 native modes; `codereview_isolation` is default; composed-corpus auto-downgrade verified Apr 26 (composed lost 5/5 vs isolation, p=0.031) + +### Matrix indexer + +30+ live corpora including: +- 5 versions of `workers_500k_v1..v9` (50K embedded chunks each) +- 11 batched 2K-row shards `w500k_b3..b17` +- `chicago_permits_v1` (3,420), `resumes_100k_v2` (100K candidates), `ethereal_workers_v1` (10K) +- `lakehouse_arch_v1` (2,119), `lakehouse_symbols_v1` (2,470), `lakehouse_answers_v1` (1,269), `scrum_findings_v1` (1,260) +- `kb_team_runs_v1` (12,693) + `kb_team_runs_agent` (4,407) — LLM-team play history embedded +- `distilled_factual_v20260423102507` (8) — distillation output + +### Code health + +- `cargo check --workspace` → **0 warnings, 0 errors** +- `bun test auditor + tests/distillation` → **145/145 pass** +- `ui/server.ts` + `auditor.ts` bundle clean + +--- + +## DO NOT RELITIGATE + +- **PR #11 is merged into `origin/main` as `ed57eda`** — do not "still need to merge PR #11." +- **Distillation tag `distillation-v1.0.0` at `e7636f2` is FROZEN** — do not re-architect schemas, scorer rules, audit fixtures. +- **Kimi forensic HOLD verdict (2026-04-27) was 2/8 false + 6/8 latent** — do not re-debate, see `reports/kimi/audit-last-week-full.md`. +- **`candidates_safe` `vertical` column bug** — fixed at catalog metadata layer in commit `c3c9c21`. Do not "discover" it again. +- **Decisions A/B/C/D from `synthetic-data-gap-report.md`** — all four scripts shipped today (`d56f08e`, `940737d`, `c3c9c21`). Do not "ask J for approval." +- **`workers_500k.phone` type fixup** — already string. The fixup script is idempotent; running it is a no-op. 
+- **`client_workerskjkk` typo dataset** — was breaking every SQL query (catalog had it registered, file didn't exist). Removed via `DELETE /catalog/datasets/by-name/client_workerskjkk` this session. Do not re-add. Adding a startup gate that errors on unrecognized parquet names is the long-term fix per now.md Step 2C. +- **Python sidecar dropped from hot path 2026-05-02 (`ba928b1`)** — AiClient calls Ollama directly. Do not "wire python embedding back in." `lab_ui.py` + `pipeline_lab.py` keep running as dev-only UIs (not on the runtime path). +- **Lance backend gauntlet (2026-05-02)** — sanitizer over all 5 routes, 7 unit tests, 9-probe smoke, 10M re-bench. The `doc_id` btree auto-builds inside `lance_migrate` AND `lance-bench`. Do not "discover" the missing scalar index again or the leaked filesystem paths in error bodies. +- **Cross-runtime parity = 32/32** across 5 probes in `golangLAKEHOUSE/scripts/cutover/parity/`. Do not "build a parity probe for X" without checking — validator, extract_json, session_log, materializer, and embed are all already covered. +- **Decisions tracker is `golangLAKEHOUSE/docs/ARCHITECTURE_COMPARISON.md`** — single living source of truth for cross-runtime decisions. As of 2026-05-02 it has 0 `_open_` code work items; only 2 strategic items left (Lance vs Parquet+HNSW-with-spilling, Go-vs-Rust primary cutover). + +--- + +## FIXES MADE THIS SESSION (2026-04-27 evening) + +1. **`crates/gateway/src/v1/iterate.rs:93`** — `state` → `_state` (cleared the one cargo warning). +2. **`lakehouse-ui.service` (Dioxus)** — disabled. Was failing 7,242 times against a missing `target/dx/ui/debug/web/public` build dir. `systemctl stop && disable`. +3. **VCP UI on `:3950`** — started `bun run ui/server.ts` (PID 1162212, log `/tmp/lakehouse_ui.log`). `/data/*` endpoints now 200 with real data. +4. **`client_workerskjkk` catalog entry** — `DELETE /catalog/datasets/by-name/client_workerskjkk` removed the dead manifest. 
**This was the actual root cause** of `/system/summary` reporting `workers_500k_rows: 0` and the demo showing zero bench. Every SQL query was failing schema inference on the missing file before reaching its target table. Fixed → `workers_500k_rows: 500000`, `candidates_rows: 1000`, demo coverage flipped from "critical 0%" to actual percentages on devop.live/lakehouse. + +## FIXES MADE THIS SESSION (2026-04-28 early — face pool) + +5. **Synthetic StyleGAN face pool — 1000 faces, gender+race+age tagged.** `scripts/staffing/fetch_face_pool.py` fetches from thispersondoesnotexist.com; `scripts/staffing/tag_face_pool.py --min-age 22` runs deepface and excludes minors. `data/headshots/manifest.jsonl` now has gender (494 men / 458 women), race (caucasian 662 · east_asian 128 · hispanic 86 · middle_eastern 59 · black 14 · south_asian 3), age, and 48 minor exclusions. Server pool = 952 servable faces. +6. **`mcp-server/index.ts:1308` `/headshots/:key` route** — gender×race×age intersection bucketing with graceful fallback (gender-only → all). Same key always returns same face; different keys spread evenly. +7. **`/headshots/_thumbs/` pre-resized 384×384 webp** (60× smaller: 587KB → ~11KB). Without this, 40-card grids overran Chrome's parallel-connection budget and ~75% of tiles never finished decoding. Generated via parallel ffmpeg (`xargs -P 8`); `.gitignore`d. +8. **`mcp-server/search.html` + `console.html`** — dropped `img.loading='lazy'`. With 11KB thumbs, eager load is cheap (~500KB for 50 cards) and avoids the off-screen race that lazy decode produced. +9. **ComfyUI on-demand uniqueness — `serve_imagegen.py:32`** added `seed` to `_cache_key()` (was caching by prompt only — 3 different worker seeds collapsed to 1 cached image). Verified: seed=839185194/195/196 → 3 distinct md5s. +10. **`mcp-server/index.ts:1234` `/headshots/generate/:key`** — ComfyUI hot-path that derives a deterministic-per-worker seed via djb2-style hash; cold ~1.5s, cached ~1ms. 
Worker prompt format: `professional corporate headshot portrait of a {age}-year-old {race} {gender}, {role}, neutral expression, plain studio background, soft natural lighting, sharp focus, photorealistic, dslr`. Cache at `data/headshots_gen/` (gitignored, regeneratable). +11. **Confidence-default name resolution** in `search.html` — `genderFor()` and `guessEthnicityFromFirstName()` lookup tables (FEMALE_NAMES, MALE_NAMES, NAMES_HISPANIC, NAMES_BLACK, NAMES_SOUTH_ASIAN, NAMES_EAST_ASIAN, NAMES_MIDDLE_EASTERN). Xavier → man+hispanic, Aisha → woman+black, etc. Every worker resolves to a face-pool bucket. + +End-to-end verified: playwright run on `https://devop.live/lakehouse/?q=forklift+operators+IL` → 21/21 cards loaded, 0 broken, all 384×384 webp thumbs. + +--- + +## OPEN — but not blocking the demo + +| Item | What | When to act | +|---|---|---| +| `modes.toml` `staffing_inference.matrix_corpus` | still says `workers_500k_v8`. v9 in vector index is from Apr 17 (raw-sourced, not safe-view). The new `build_workers_v9.sh` rebuilds from `workers_safe`. | Run when you have 30+ min for the rebuild. | +| Open PRs #6, #7, #10 | sitting since Apr 22-24, auditor verdicts on disk at `data/_auditor/kimi_verdicts/{6,7,10}-*.json` | Read verdicts, decide reconcile/close. | +| `test/enrich-prd-pipeline` branch | 35 unmerged commits, includes more-evolved auditor/inference.ts (666 vs main's 580 lines), curation+fact-extractor wiring | Reconcile or formally archive — see `memory/project_unmerged_architecture_work.md`. | +| `federation-hnsw-trials` stash | Lance + S3/MinIO prototype, `aws-config` crate added, 708 insertions | Phase B from EXECUTION_PLAN.md — revisit when Parquet vector ceiling actually hurts. | +| `candidates` manifest drift | manifest 100K vs SQL 1K. Cosmetic. | Run a metadata resync if it matters. 
| + +--- + +## RUNTIME CHEATSHEET + +```bash +# Verify the demo (public + local both work) +curl -sS https://devop.live/lakehouse/ # Co-Pilot HTML +curl -sS https://devop.live/lakehouse/console # staffers console +curl -sS -X POST https://devop.live/lakehouse/intelligence/staffing_forecast \ + -d '{}' -H 'content-type: application/json' \ + | jq '.forecast[] | {role, demand_workers, bench_total, coverage_pct, risk}' + +# Restart sequence (after Rust changes) +sudo systemctl restart lakehouse.service # gateway :3100 +sudo systemctl restart lakehouse-auditor # auditor daemon +sudo systemctl restart lakehouse-observer # observer :3800 +# UI bun on :3950 is NOT systemd-managed (lakehouse-ui.service is disabled). +# Restart manually: kill ; nohup bun run ui/server.ts > /tmp/lakehouse_ui.log 2>&1 & + +# Health checks +curl -sS http://localhost:3100/v1/health | jq # workers_count, providers +curl -sS http://localhost:3100/vectors/pathway/stats | jq +curl -sS http://localhost:3100/v1/usage | jq # since-restart cost +curl -sS http://localhost:3700/system/summary | jq # dataset counts +``` + +--- + +## VISION — what we're actually building (not what's done) + +J's framing for the legacy staffing company: + +- Pull live data, anticipate contracts based on Chicago permits → real architect/contractor associations, headcount, time period, money, scope. +- Hybrid + memory index → search large corpora cheaply. +- Email comes in → verify against contract; SMS comes in → alert when index changes. +- Real-time. +- Invent metrics nobody else has using the hybrid index. +- Next stage: workers download an app → geolocation clock-in → automatic responsiveness measurement, no user effort, with incentives for using it. +- Find people getting certificates (passive cert tracking). +- Pull union data → bring contracts that work for **employees**, not just employers. +- All metrics visible, nothing hidden, value-aligned with what each side actually needs. 
+
+If a future session is shaving away from this vision toward "fix the cutover" or "land Phase X," the vision wins. Phases are scaffolding for the vision, not the goal.
+
+---
+
+## CURRENT PLAN — fix the demo for the legacy staffing client
+
+Built from playwright audit of the live demo (2026-04-27 evening). Each item ends in something the client can SEE, not internal cleanups.
+
+**Demo state is anchored by git tag `demo-2026-04-27`** (commit `ed57eda`, the merge of PR #11). To restore code state: `git checkout demo-2026-04-27`. To restore runtime state: `DELETE /catalog/datasets/by-name/client_workerskjkk` (catalog hot-fix is not in git).
+
+### P1 — Search box that actually filters (highest visible impact)
+
+**Problem:** typing in `#sq` and pressing Enter fires `POST /intelligence/chat` with body `{"message":""}`. The state (`#sst`) and role (`#srl`) selects are ignored — never sent in the body. So every search returns a generic chat completion, never a SQL+vector hybrid filter against `workers_500k`. That is the "cached/generic response" the client sees.
+
+**Fix:** in `mcp-server/search.html`, change the search-submit handler to call the real worker search endpoint with `{query, state, role, top_k}`. The MCP `search_workers` tool surface already exists; route the form there. Render returned worker rows in the existing card grid.
+
+**Done when:** typing "forklift" + state IL + role "Forklift Operator" returns ≤ top_k IL Forklift Operators, and changing state to WI returns different workers.
+
+### P2 — Contractor-name click → `/contractor` profile page
+
+**Problem:** clicking a contractor name in any rendered card stays on `/lakehouse/`. URL doesn't change.
+
+**Fix:** wrap contractor names in an `<a>` link to the `/contractor` page. The page `mcp-server/contractor.html` (14.8 KB, "Contractor Profile · Staffing Co-Pilot") already exists at `/contractor` and the data endpoint `/intelligence/contractor_profile` already returns rich data.
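The link-wrapping half of that fix can be sketched minimally. This assumes the card renderer builds HTML strings and that the `/contractor` page accepts the contractor name as a `name` query parameter; `contractorLink`, `escapeHtml`, and the parameter name are hypothetical and should be checked against `contractor.html` before shipping:

```typescript
// Hypothetical helpers for mcp-server/search.html. The `name` query
// parameter and the /lakehouse/contractor path shape are assumptions.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Wrap a contractor name in a real link so clicking it leaves /lakehouse/.
function contractorLink(name: string): string {
  const href = `/lakehouse/contractor?name=${encodeURIComponent(name)}`;
  return `<a href="${href}" class="contractor-link">${escapeHtml(name)}</a>`;
}
```

The card template then swaps its bare name interpolation for `contractorLink(name)`, and `contractor.html` reads the parameter back with `new URLSearchParams(location.search).get("name")` before calling `/intelligence/contractor_profile`.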
+
+**Then check that `contractor.html` actually shows:** full history of every record the database has on that contractor + heat map of locations underneath + relevant info (per J 2026-04-27). If the page is incomplete, finish it. Otherwise just wire the link.
+
+**Done when:** clicking "KACPRZYNSKI, ANDY" opens a profile with: every Chicago permit they're contact_1 or contact_2 on, a leaflet map with markers for each address, and any matched workers from prior placements at their sites.
+
+### P3 — Substrate signal at the bottom shows the right numbers
+
+**Problem:** J reports the bottom panel says "playbook memory empty, 80 traces 0 replies." Reality from the live endpoints: `/api/vectors/playbook_memory/stats` = 4,701 entries with embeddings; `/vectors/pathway/stats` = 88 traces, 11/11 replays.
+
+**Fix:** find the renderer in search.html that builds the substrate signal panel; verify it's hitting the right endpoints and reading the right keys; fix shape mismatches.
+
+**Done when:** bottom panel shows real numbers (4,701 playbooks, 88 traces, 11/11 replays) and references at least one specific recent operation from the playbook stats sample.
+
+### P4 — Top nav reflects today's architecture
+
+**Problem:** Walkthrough/Architecture/Spec/Onboard/Alerts/Workspaces tabs all return 200 but content is from old architecture. Doesn't mention: gateway scratchpad, memory indexer, ranker, mode runner, OpenCode 40-model fleet, distillation substrate, auditor cross-lineage.
+
+**Fix:** rewrite `mcp-server/proof.html` (or add a single new page "What's running" that replaces Architecture+Spec) to describe what's actually shipped as of `demo-2026-04-27`. Keep one architecture page, drop redundancy. Either complete or hide Onboard/Alerts/Workspaces — J's call which.
+
+**Done when:** the architecture page tells a non-technical reader, in 2 minutes, what each piece does in coordinator-relatable terms ("intern that read every email", not "3-stage adversarial inference pipeline").
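A minimal shape for the P3 renderer fix, written as a pure merge of the two stats payloads so it can be sanity-checked without a browser. The key names (`entries`, `traces`, `replays_ok`, `replays_total`, `sample`) are assumptions; read the real JSON from `/api/vectors/playbook_memory/stats` and `/vectors/pathway/stats` before wiring:

```typescript
// Hypothetical panel model for the substrate signal in search.html.
// Field names are assumptions to verify against the live endpoints.
type PlaybookStats = { entries: number; sample?: string[] };
type PathwayStats = { traces: number; replays_ok: number; replays_total: number };

function substrateSignal(pb: PlaybookStats, pw: PathwayStats): string {
  // Surface one concrete recent operation when the stats include a sample,
  // so the panel "looks alive" instead of showing bare counts.
  const recent = pb.sample?.[0] ? ` · latest: "${pb.sample[0]}"` : "";
  return (
    `${pb.entries.toLocaleString("en-US")} playbooks · ` +
    `${pw.traces} traces · ${pw.replays_ok}/${pw.replays_total} replays${recent}`
  );
}
```

With the numbers verified live in this file, `substrateSignal({ entries: 4701 }, { traces: 88, replays_ok: 11, replays_total: 11 })` renders "4,701 playbooks · 88 traces · 11/11 replays" instead of the stale "playbook memory empty" text.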
+
+### P5 — Caching for the project-index build_signal (J flagged unfinished)
+
+**Problem:** "we never finished our caching for project index build signal it's not pulling new information." Need to find what `build_signal` refers to. Likely a scrum/auditor signal that should rebuild the `lakehouse_arch_v1` corpus on commit but isn't wired up.
+
+**Fix:** identify the build-signal pipeline (likely in `auditor/` or `crates/vectord/`), wire its emit to a corpus rebuild, verify by making a test commit and watching the new chunk appear in `/vectors/indexes` for `lakehouse_arch_v1`.
+
+**Done when:** committing a new file to `crates/` causes `lakehouse_arch_v1` chunk_count to increase within N minutes.
+
+### P0 — Anchor the demo state (DONE)
+
+Tagged `ed57eda` as `demo-2026-04-27`. Future sessions: `git checkout demo-2026-04-27` to land in this exact code state.
+
+---
+
+## EXECUTION ORDER
+
+1. **P1 first** — biggest visible bug, ~30-60 min
+2. **P2 next** — contractor click is the second-biggest "doesn't work" the client sees, ~20 min if profile is mostly done
+3. **P3** — small fix, big "looks alive" win
+4. **P4** — biggest scope; might split across sessions
+5. **P5** — feature work, only after the visible bugs are fixed
+
+Each item commits independently with the format `demo: P<n> <summary>` so the commit log doubles as a progress journal. After each merge to main, re-tag `demo-latest` to point at the new HEAD.
+
+Stop here and let J pick which item to start with. Do not silently extend scope.
diff --git a/config/providers.toml b/config/providers.toml
index 81eea70..13bdbce 100644
--- a/config/providers.toml
+++ b/config/providers.toml
@@ -15,12 +15,12 @@
 [[provider]]
 name = "ollama"
-base_url = "http://localhost:3200"
+base_url = "http://localhost:11434"
 auth = "none"
 default_model = "qwen3.5:latest"
-# Hot-path local inference. No bearer needed — Python sidecar on
-# localhost handles the Ollama API. Model names are bare
-# (e.g.
"qwen3.5:latest", not "ollama/qwen3.5:latest"). +# Hot-path local inference. No bearer needed — direct to Ollama as of +# 2026-05-02 (Python sidecar's pass-through wrapper retired). Model +# names are bare (e.g. "qwen3.5:latest", not "ollama/qwen3.5:latest"). [[provider]] name = "ollama_cloud" @@ -35,7 +35,9 @@ default_model = "deepseek-v3.2" # includes deepseek-v3.2, deepseek-v4-{flash,pro}, gemini-3-flash- # preview, glm-{5,5.1}, kimi-k2.6, qwen3-coder-next. # 2026-04-28: default upgraded gpt-oss:120b → deepseek-v3.2 (newest -# DeepSeek revision; kimi-k2:1t still upstream-broken with HTTP 500). +# DeepSeek revision). NOTE: kimi-k2:1t is upstream-broken (HTTP 500 +# on Ollama Pro probe 2026-04-28) — do not route to it. Use kimi-k2.6 +# instead, which is what staffing_inference points at. [[provider]] name = "openrouter" @@ -79,8 +81,10 @@ auth_env = "KIMI_API_KEY" default_model = "kimi-for-coding" # Direct Kimi For Coding provider. `api.kimi.com` is a SEPARATE account # system from `api.moonshot.ai` and `api.moonshot.cn` — keys are NOT -# interchangeable. Used when Ollama Cloud's `kimi-k2:1t` is upstream- -# broken and OpenRouter's `moonshotai/kimi-k2.6` is rate-limited. +# interchangeable. Used as a fallback when Ollama Cloud's kimi-k2.6 is +# unavailable and OpenRouter's `moonshotai/kimi-k2.6` is rate-limited. +# (Was `kimi-k2:1t` here pre-2026-05-03 — that model is upstream-broken +# and removed from operator guidance.) # Model id: `kimi-for-coding` (kimi-k2.6 underneath). # Key file: /etc/lakehouse/kimi.env (loaded via systemd EnvironmentFile). # Model-prefix routing: "kimi/" auto-routes here, prefix stripped. 
diff --git a/crates/aibridge/Cargo.toml b/crates/aibridge/Cargo.toml index dc2c0fe..eeb767e 100644 --- a/crates/aibridge/Cargo.toml +++ b/crates/aibridge/Cargo.toml @@ -12,3 +12,4 @@ serde_json = { workspace = true } tracing = { workspace = true } reqwest = { version = "0.12", default-features = false, features = ["json", "rustls-tls"] } async-trait = "0.1" +lru = "0.12" diff --git a/crates/aibridge/src/client.rs b/crates/aibridge/src/client.rs index 83382fa..742ab30 100644 --- a/crates/aibridge/src/client.rs +++ b/crates/aibridge/src/client.rs @@ -1,28 +1,74 @@ +use lru::LruCache; use reqwest::Client; use serde::{Deserialize, Serialize}; +use std::num::NonZeroUsize; +use std::sync::Mutex; +use std::sync::atomic::{AtomicU64, Ordering}; +use std::sync::Arc; use std::time::Duration; -/// HTTP client for the Python AI sidecar. +/// HTTP client for Ollama (post-2026-05-02 — sidecar dropped). +/// +/// `base_url` was historically the Python sidecar at `:3200`, which +/// pass-through-proxied to Ollama at `:11434`. The sidecar added zero +/// logic on the hot path (embed.py + generate.py + rerank.py + +/// admin.py = ~120 LOC of pure Ollama wrappers), so this client now +/// talks to Ollama directly and the sidecar process can be retired. +/// +/// What stayed Python: `lab_ui.py` + `pipeline_lab.py` (~888 LOC of +/// dev-mode Streamlit-shape UIs) — those aren't on the runtime hot +/// path and continue running for prompt experimentation. /// /// `generate()` has two transport modes: -/// - When `gateway_url` is None (default), it posts to -/// `${base_url}/generate` (sidecar direct). -/// - When `gateway_url` is `Some(url)`, it posts to -/// `${url}/v1/chat` with `provider="ollama"` so the call appears -/// in `/v1/usage` and Langfuse traces. +/// - When `gateway_url` is None (default), posts directly to Ollama's +/// `${base_url}/api/generate`. 
+/// - When `gateway_url` is `Some(url)`, posts to `${url}/v1/chat`
+///   with `provider="ollama"` so the call appears in `/v1/usage` and
+///   Langfuse traces.
 ///
-/// `embed()`, `rerank()`, and admin methods always go direct to the
-/// sidecar — no `/v1` equivalent yet, no point round-tripping.
+/// `embed()`, `rerank()`, and admin methods always go direct to
+/// Ollama — no `/v1` equivalent for those surfaces yet.
 ///
 /// Phase 44 part 2 (2026-04-27): the gateway URL is wired in by
 /// callers that want observability (vectord modules); it's left
 /// unset by callers that ARE the gateway internals (avoids self-loops
 /// + redundant hops).
+
+/// Per-text embed cache key. We key on (model, text) so different
+/// model selections produce distinct cache lines — a query embedded
+/// under nomic-embed-text-v2-moe must NOT collide with the same
+/// query under nomic-embed-text v1.
+#[derive(Eq, PartialEq, Hash, Clone)]
+struct EmbedCacheKey {
+    model: String,
+    text: String,
+}
+
+/// Default LRU cache size — 4096 entries × ~6KB per 768-d f64
+/// vector ≈ 24MB. Sized for typical staffing-domain repetition
+/// (coordinator workflows have query repetition rates around 70-90%
+/// per session). Tunable via [aibridge].embed_cache_size in the
+/// config; 0 disables the cache entirely.
+const DEFAULT_EMBED_CACHE_SIZE: usize = 4096;
+
 #[derive(Clone)]
 pub struct AiClient {
     client: Client,
     base_url: String,
     gateway_url: Option<String>,
+    /// Closes the 63× perf gap with Go side. Mirrors the shape of
+    /// Go's internal/embed/cached.go::CachedProvider — same
+    /// (model, text) → vector caching, same nil-disable semantics.
+    /// None = caching disabled (cache_size=0); Some = bounded LRU.
+    embed_cache: Option<Arc<Mutex<LruCache<EmbedCacheKey, Vec<f64>>>>>,
+    /// Hit / miss counters for /admin observability + load-test
+    /// validation. Atomic so Clone'd AiClients share the same counts.
+    embed_cache_hits: Arc<AtomicU64>,
+    embed_cache_misses: Arc<AtomicU64>,
+    /// Pinned at construction time so the EmbedResponse can carry
+    /// dimension consistently even when every text was a cache hit
+    /// (no fresh upstream call to learn the dim from). Set on first
+    /// successful Ollama embed; checked on every cache hit.
+    cached_dim: Arc<AtomicU64>,
 }

 // -- Request/Response types --

@@ -95,17 +141,49 @@ pub struct RerankResponse {

 impl AiClient {
     pub fn new(base_url: &str) -> Self {
+        Self::with_embed_cache(base_url, DEFAULT_EMBED_CACHE_SIZE)
+    }
+
+    /// Constructs an AiClient with an explicit embed-cache size.
+    /// Pass 0 to disable the cache entirely (matches Go-side
+    /// CachedProvider's nil-cache semantics).
+    pub fn with_embed_cache(base_url: &str, cache_size: usize) -> Self {
         let client = Client::builder()
             .timeout(Duration::from_secs(120))
             .build()
             .expect("failed to build HTTP client");
+        let embed_cache = if cache_size > 0 {
+            // SAFETY: cache_size > 0 just verified, NonZeroUsize::new
+            // returns Some.
+            let cap = NonZeroUsize::new(cache_size).expect("cache_size > 0");
+            Some(Arc::new(Mutex::new(LruCache::new(cap))))
+        } else {
+            None
+        };
         Self {
             client,
             base_url: base_url.trim_end_matches('/').to_string(),
             gateway_url: None,
+            embed_cache,
+            embed_cache_hits: Arc::new(AtomicU64::new(0)),
+            embed_cache_misses: Arc::new(AtomicU64::new(0)),
+            cached_dim: Arc::new(AtomicU64::new(0)),
         }
     }

+    /// Cache hit/miss/size snapshot. Useful for /admin endpoints +
+    /// load-test validation ("did the cache fire as expected?").
+    pub fn embed_cache_stats(&self) -> (u64, u64, usize) {
+        let hits = self.embed_cache_hits.load(Ordering::Relaxed);
+        let misses = self.embed_cache_misses.load(Ordering::Relaxed);
+        let len = self
+            .embed_cache
+            .as_ref()
+            .map(|c| c.lock().map(|g| g.len()).unwrap_or(0))
+            .unwrap_or(0);
+        (hits, misses, len)
+    }
+
     /// Same as `new`, but every `generate()` is routed through
     /// `${gateway_url}/v1/chat` (provider=ollama) for observability.
     /// Use this for callers OUTSIDE the gateway. Inside the gateway
@@ -118,50 +196,222 @@ impl AiClient {
         c
     }

+    /// Reachability + version check. Hits Ollama's `/api/version`,
+    /// returns a sidecar-shaped envelope so callers reading
+    /// `.status` / `.ollama_url` don't break across the
+    /// pre-/post-2026-05-02 cutover.
     pub async fn health(&self) -> Result<serde_json::Value, String> {
         let resp = self.client
-            .get(format!("{}/health", self.base_url))
+            .get(format!("{}/api/version", self.base_url))
             .send()
             .await
-            .map_err(|e| format!("sidecar unreachable: {e}"))?;
-        resp.json().await.map_err(|e| format!("invalid response: {e}"))
+            .map_err(|e| format!("ollama unreachable: {e}"))?;
+        let body: serde_json::Value = resp.json().await
+            .map_err(|e| format!("invalid response: {e}"))?;
+        Ok(serde_json::json!({
+            "status": "ok",
+            "ollama_url": &self.base_url,
+            "ollama_version": body.get("version"),
+        }))
     }

+    /// Embed with per-text LRU caching. Mirrors Go-side
+    /// CachedProvider behavior: cache key is (model, text);
+    /// cache-hit texts skip Ollama entirely; cache-miss texts are
+    /// fetched in one uncached pass; results are interleaved in the
+    /// caller's input order.
+    ///
+    /// Closes ~95% of the load-test perf gap vs Go side (loadgen
+    /// 2026-05-01: Rust 128 RPS → with cache ≥ 7000 RPS expected
+    /// for warm-cache workloads). Cold-cache behavior unchanged
+    /// (every text is a miss → one uncached pass, identical to
+    /// pre-cache).
     pub async fn embed(&self, req: EmbedRequest) -> Result<EmbedResponse, String> {
-        let resp = self.client
-            .post(format!("{}/embed", self.base_url))
-            .json(&req)
-            .send()
-            .await
-            .map_err(|e| format!("embed request failed: {e}"))?;
+        let model_key = req.model.clone().unwrap_or_default();

-        if !resp.status().is_success() {
-            let text = resp.text().await.unwrap_or_default();
-            return Err(format!("embed error ({}): {text}", text.len()));
+        // Fast path: cache disabled → original behavior.
+ let Some(cache) = self.embed_cache.as_ref() else { + return self.embed_uncached(&req).await; + }; + if req.texts.is_empty() { + return self.embed_uncached(&req).await; } - resp.json().await.map_err(|e| format!("embed parse error: {e}")) + + // First pass: check cache for each text. Track which positions + // need a sidecar fetch. + let mut embeddings: Vec<Option<Vec<f64>>> = vec![None; req.texts.len()]; + let mut miss_indices: Vec<usize> = Vec::new(); + let mut miss_texts: Vec<String> = Vec::new(); + { + let mut guard = cache.lock().map_err(|e| format!("cache lock poisoned: {e}"))?; + for (i, text) in req.texts.iter().enumerate() { + let key = EmbedCacheKey { model: model_key.clone(), text: text.clone() }; + if let Some(vec) = guard.get(&key) { + embeddings[i] = Some(vec.clone()); + self.embed_cache_hits.fetch_add(1, Ordering::Relaxed); + } else { + miss_indices.push(i); + miss_texts.push(text.clone()); + self.embed_cache_misses.fetch_add(1, Ordering::Relaxed); + } + } + } + + // All hit? Return immediately. Use cached_dim to populate + // the response dimension (no sidecar to ask). + if miss_indices.is_empty() { + let dim = self.cached_dim.load(Ordering::Relaxed) as usize; + let dim = if dim == 0 { embeddings[0].as_ref().map(|v| v.len()).unwrap_or(0) } else { dim }; + return Ok(EmbedResponse { + embeddings: embeddings.into_iter().map(|opt| opt.expect("filled")).collect(), + model: req.model.unwrap_or_else(|| "nomic-embed-text".to_string()), + dimensions: dim, + }); + } + + // Second pass: fetch the misses in one sidecar call. + let miss_req = EmbedRequest { texts: miss_texts.clone(), model: req.model.clone() }; + let resp = self.embed_uncached(&miss_req).await?; + if resp.embeddings.len() != miss_texts.len() { + return Err(format!( + "embed cache: sidecar returned {} embeddings for {} texts", + resp.embeddings.len(), + miss_texts.len() + )); + } + + // Pin cached_dim on first successful response.
+ if resp.dimensions > 0 { + self.cached_dim.store(resp.dimensions as u64, Ordering::Relaxed); + } + + // Insert misses into cache + fill response slots. + { + let mut guard = cache.lock().map_err(|e| format!("cache lock poisoned: {e}"))?; + for (j, idx) in miss_indices.iter().enumerate() { + let key = EmbedCacheKey { + model: model_key.clone(), + text: miss_texts[j].clone(), + }; + let vec = resp.embeddings[j].clone(); + guard.put(key, vec.clone()); + embeddings[*idx] = Some(vec); + } + } + + Ok(EmbedResponse { + embeddings: embeddings.into_iter().map(|opt| opt.expect("filled")).collect(), + model: resp.model, + dimensions: resp.dimensions, + }) + } + + /// Direct Ollama call — used internally by embed() for cache-miss + /// batches and as the transparent fallback when the cache is + /// disabled. Loops per-text against `${base_url}/api/embed`, + /// matching the sidecar's pre-2026-05-02 behavior. Ollama 0.4+ + /// supports batch input but per-text keeps compatibility broader + /// + lets cache-miss-only batches share the loop with cold runs. + async fn embed_uncached(&self, req: &EmbedRequest) -> Result<EmbedResponse, String> { + let model = req.model.clone().unwrap_or_else(|| "nomic-embed-text".to_string()); + let mut embeddings: Vec<Vec<f64>> = Vec::with_capacity(req.texts.len()); + + for text in &req.texts { + let resp = self.client + .post(format!("{}/api/embed", self.base_url)) + .json(&serde_json::json!({ + "model": &model, + "input": text, + })) + .send() + .await + .map_err(|e| format!("embed request failed: {e}"))?; + + if !resp.status().is_success() { + let body = resp.text().await.unwrap_or_default(); + return Err(format!("ollama embed error: {body}")); + } + // Ollama returns {"embeddings": [[...]], "model": "...", ...}. + // The outer `embeddings` is always a list; for a scalar input + // we get a single inner vector.
+ let parsed: serde_json::Value = resp.json().await + .map_err(|e| format!("embed parse error: {e}"))?; + let arr = parsed.get("embeddings") + .and_then(|v| v.as_array()) + .ok_or_else(|| format!("ollama embed: missing 'embeddings' field in {parsed}"))?; + if arr.is_empty() { + return Err("ollama embed: empty embeddings array".to_string()); + } + let first = arr[0].as_array() + .ok_or_else(|| "ollama embed: embeddings[0] not an array".to_string())?; + let vec: Vec<f64> = first.iter() + .filter_map(|n| n.as_f64()) + .collect(); + if vec.is_empty() { + return Err("ollama embed: numeric coercion produced empty vector".to_string()); + } + embeddings.push(vec); + } + + let dimensions = embeddings.first().map(|v| v.len()).unwrap_or(0); + Ok(EmbedResponse { + embeddings, + model, + dimensions, + }) } pub async fn generate(&self, req: GenerateRequest) -> Result<GenerateResponse, String> { if let Some(gw) = self.gateway_url.as_deref() { return self.generate_via_gateway(gw, req).await; } - // Direct-sidecar legacy path. Used by gateway internals (so - // ollama_arm can call sidecar without a self-loop) and by - // any consumer that wants raw transport without /v1/usage - // accounting. + // Direct Ollama path. Used by gateway internals (so the ollama + // provider can call upstream without a self-loop through + // /v1/chat) and by any consumer that wants raw transport + // without /v1/usage accounting.
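The two-pass hit/miss interleave that `embed()` implements can be sketched in miniature: a std-only sketch with a plain `HashMap` standing in for the `Mutex<LruCache>` and a faked batch fetch. `lookup_embed` is a hypothetical helper, not part of this diff.

```rust
use std::collections::HashMap;

// Simplified stand-in for AiClient::embed's two-pass logic:
// pass 1 fills cache hits and records miss positions; pass 2
// "fetches" the misses (faked here as vec![2.0]) in one batch
// and lands results back in the caller's input order.
fn lookup_embed(texts: &[&str], cache: &mut HashMap<String, Vec<f64>>) -> Vec<Vec<f64>> {
    let mut out: Vec<Option<Vec<f64>>> = vec![None; texts.len()];
    let mut miss_indices = Vec::new();
    for (i, t) in texts.iter().enumerate() {
        match cache.get(*t) {
            Some(v) => out[i] = Some(v.clone()), // cache hit
            None => miss_indices.push(i),        // needs a fetch
        }
    }
    // One batched "sidecar call" covers every miss.
    for &i in &miss_indices {
        let v = vec![2.0];
        cache.insert(texts[i].to_string(), v.clone());
        out[i] = Some(v);
    }
    out.into_iter().map(|o| o.expect("filled")).collect()
}

fn main() {
    let mut cache = HashMap::new();
    cache.insert("a".to_string(), vec![1.0]);
    let got = lookup_embed(&["a", "b", "a"], &mut cache);
    assert_eq!(got, vec![vec![1.0], vec![2.0], vec![1.0]]);
    assert!(cache.contains_key("b")); // the miss was inserted
}
```

The real method additionally guards the map with a `Mutex`, keys on (model, text), and bumps the hit/miss counters.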
+ let model = req.model.clone().unwrap_or_else(|| "qwen3.5:latest".to_string()); + let mut body = serde_json::json!({ + "model": &model, + "prompt": &req.prompt, + "stream": false, + }); + let mut options = serde_json::Map::new(); + if let Some(t) = req.temperature { + options.insert("temperature".to_string(), serde_json::json!(t)); + } + if let Some(mt) = req.max_tokens { + options.insert("num_predict".to_string(), serde_json::json!(mt)); + } + if !options.is_empty() { + body["options"] = serde_json::Value::Object(options); + } + if let Some(sys) = &req.system { + body["system"] = serde_json::json!(sys); + } + if let Some(th) = req.think { + body["think"] = serde_json::json!(th); + } + let resp = self.client - .post(format!("{}/generate", self.base_url)) - .json(&req) + .post(format!("{}/api/generate", self.base_url)) + .json(&body) .send() .await .map_err(|e| format!("generate request failed: {e}"))?; if !resp.status().is_success() { let text = resp.text().await.unwrap_or_default(); - return Err(format!("generate error: {text}")); + return Err(format!("ollama generate error: {text}")); } - resp.json().await.map_err(|e| format!("generate parse error: {e}")) + let parsed: serde_json::Value = resp.json().await + .map_err(|e| format!("generate parse error: {e}"))?; + + Ok(GenerateResponse { + text: parsed.get("response").and_then(|v| v.as_str()).unwrap_or("").to_string(), + model, + tokens_evaluated: parsed.get("prompt_eval_count").and_then(|v| v.as_u64()), + tokens_generated: parsed.get("eval_count").and_then(|v| v.as_u64()), + }) } /// Phase 44 part 2: route generate() through the gateway's @@ -217,19 +467,60 @@ impl AiClient { }) } + /// Cross-encoder reranking via Ollama generate. Asks the model to + /// rate each document's relevance to the query 0-10, then sorts + /// descending. Mirrors the sidecar's pre-2026-05-02 algorithm + /// exactly so callers see the same scores. 
pub async fn rerank(&self, req: RerankRequest) -> Result<RerankResponse, String> { - let resp = self.client - .post(format!("{}/rerank", self.base_url)) - .json(&req) - .send() - .await - .map_err(|e| format!("rerank request failed: {e}"))?; + let model = req.model.clone().unwrap_or_else(|| "qwen3.5:latest".to_string()); + let mut scored: Vec<ScoredDocument> = Vec::with_capacity(req.documents.len()); - if !resp.status().is_success() { - let text = resp.text().await.unwrap_or_default(); - return Err(format!("rerank error: {text}")); + for (i, doc) in req.documents.iter().enumerate() { + let prompt = format!( + "Rate the relevance of the following document to the query on a scale of 0 to 10. \ + Respond with ONLY a number.\n\n\ + Query: {}\n\n\ + Document: {}\n\n\ + Score:", + req.query, doc, + ); + let resp = self.client + .post(format!("{}/api/generate", self.base_url)) + .json(&serde_json::json!({ + "model": &model, + "prompt": prompt, + "stream": false, + "options": {"temperature": 0.0, "num_predict": 8}, + })) + .send() + .await + .map_err(|e| format!("rerank request failed: {e}"))?; + + if !resp.status().is_success() { + let body = resp.text().await.unwrap_or_default(); + return Err(format!("ollama rerank error: {body}")); + } + let parsed: serde_json::Value = resp.json().await + .map_err(|e| format!("rerank parse error: {e}"))?; + let text = parsed.get("response").and_then(|v| v.as_str()).unwrap_or("").trim(); + // Parse the leading number; tolerate "7", "7.5", "7 — strong match".
+ let score = text.split_whitespace().next() + .and_then(|t| t.parse::<f64>().ok()) + .unwrap_or(0.0) + .clamp(0.0, 10.0); + + scored.push(ScoredDocument { + index: i, + text: doc.clone(), + score, + }); } - resp.json().await.map_err(|e| format!("rerank parse error: {e}")) + + scored.sort_by(|a, b| b.score.partial_cmp(&a.score).unwrap_or(std::cmp::Ordering::Equal)); + if let Some(k) = req.top_k { + scored.truncate(k); + } + Ok(RerankResponse { results: scored, model }) } /// Force Ollama to unload the named model from VRAM (keep_alive=0). @@ -238,40 +529,116 @@ impl AiClient { /// profile's model can linger in VRAM next to the new one. pub async fn unload_model(&self, model: &str) -> Result<serde_json::Value, String> { let resp = self.client - .post(format!("{}/admin/unload", self.base_url)) - .json(&serde_json::json!({ "model": model })) + .post(format!("{}/api/generate", self.base_url)) + .json(&serde_json::json!({ + "model": model, + "prompt": "", + "keep_alive": 0, + "stream": false, + })) + .send().await + .map_err(|e| format!("unload request failed: {e}"))?; if !resp.status().is_success() { let text = resp.text().await.unwrap_or_default(); - return Err(format!("unload error: {text}")); + return Err(format!("ollama unload error: {text}")); } - resp.json().await.map_err(|e| format!("unload parse error: {e}")) + // Ollama returns 200 with the empty-prompt response shape. + // Fold into the legacy {"unloaded": "<model>"} envelope so + // callers' parsing doesn't break. + Ok(serde_json::json!({ "unloaded": model })) } /// Ask Ollama to load the named model into VRAM proactively. Makes /// the first real request after profile activation fast (no cold-load - /// latency). + /// latency). Empty prompts confuse some models, so we send a single + /// space + cap num_predict=1 (matches the sidecar's prior behavior).
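The leading-number tolerance in the rerank score parse can be exercised on its own. A std-only sketch; `parse_score` is a hypothetical helper mirroring the inline `split_whitespace` chain above, not an API in this crate.

```rust
// Hypothetical mirror of rerank()'s score parse: take the first
// whitespace-separated token, parse as f64, default to 0.0 on
// junk, clamp into the advertised 0..=10 range.
fn parse_score(text: &str) -> f64 {
    text.split_whitespace()
        .next()
        .and_then(|t| t.parse::<f64>().ok())
        .unwrap_or(0.0)
        .clamp(0.0, 10.0)
}

fn main() {
    assert_eq!(parse_score("7"), 7.0);
    assert_eq!(parse_score("7.5 out of 10"), 7.5); // trailing prose tolerated
    assert_eq!(parse_score("Score: 7"), 0.0);      // non-numeric lead token -> 0
    assert_eq!(parse_score("15"), 10.0);           // clamped to the scale
}
```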
pub async fn preload_model(&self, model: &str) -> Result<serde_json::Value, String> { let resp = self.client - .post(format!("{}/admin/preload", self.base_url)) - .json(&serde_json::json!({ "model": model })) + .post(format!("{}/api/generate", self.base_url)) + .json(&serde_json::json!({ + "model": model, + "prompt": " ", + "keep_alive": "5m", + "stream": false, + "options": {"num_predict": 1}, + })) + .send().await + .map_err(|e| format!("preload request failed: {e}"))?; if !resp.status().is_success() { let text = resp.text().await.unwrap_or_default(); - return Err(format!("preload error: {text}")); + return Err(format!("ollama preload error: {text}")); } - resp.json().await.map_err(|e| format!("preload parse error: {e}")) + let parsed: serde_json::Value = resp.json().await + .map_err(|e| format!("preload parse error: {e}"))?; + Ok(serde_json::json!({ + "preloaded": model, + "load_duration_ns": parsed.get("load_duration"), + "total_duration_ns": parsed.get("total_duration"), + })) } - /// GPU + loaded-model snapshot from the sidecar. Combines nvidia-smi - /// output (if available) with Ollama's /api/ps. + /// GPU + loaded-model snapshot. Combines nvidia-smi output (when + /// available) with Ollama's /api/ps. Same shape as the prior + /// sidecar /admin/vram endpoint so callers don't need updating.
pub async fn vram_snapshot(&self) -> Result<serde_json::Value, String> { let resp = self.client - .get(format!("{}/admin/vram", self.base_url)) + .get(format!("{}/api/ps", self.base_url)) .send().await - .map_err(|e| format!("vram request failed: {e}"))?; - resp.json().await.map_err(|e| format!("vram parse error: {e}")) + .map_err(|e| format!("ollama ps request failed: {e}"))?; + let loaded: Vec<serde_json::Value> = if resp.status().is_success() { + let parsed: serde_json::Value = resp.json().await.unwrap_or(serde_json::Value::Null); + parsed.get("models") + .and_then(|v| v.as_array()) + .map(|arr| arr.iter().map(|m| serde_json::json!({ + "name": m.get("name"), + "size_vram_mib": m.get("size_vram").and_then(|v| v.as_u64()).map(|n| n / (1024 * 1024)), + "expires_at": m.get("expires_at"), + })).collect()) + .unwrap_or_default() + } else { + Vec::new() + }; + + let gpu = nvidia_smi_snapshot(); + + Ok(serde_json::json!({ + "gpu": gpu, + "ollama_loaded": loaded, + })) } } + +/// One-shot nvidia-smi poll. Returns Null if the tool isn't on PATH +/// or the call fails. Mirrors the sidecar's `_nvidia_smi_snapshot` +/// shape exactly so callers reading vram_snapshot don't break.
+fn nvidia_smi_snapshot() -> serde_json::Value { + use std::process::Command; + let out = Command::new("nvidia-smi") + .args([ + "--query-gpu=memory.used,memory.total,utilization.gpu,name", + "--format=csv,noheader,nounits", + ]) + .output(); + let stdout = match out { + Ok(o) if o.status.success() => o.stdout, + _ => return serde_json::Value::Null, + }; + let line = String::from_utf8_lossy(&stdout); + let line = line.trim(); + if line.is_empty() { + return serde_json::Value::Null; + } + let parts: Vec<&str> = line.split(',').map(|s| s.trim()).collect(); + if parts.len() < 4 { + return serde_json::Value::Null; + } + let used = parts[0].parse::<u64>().unwrap_or(0); + let total = parts[1].parse::<u64>().unwrap_or(0); + let util = parts[2].parse::<u64>().unwrap_or(0); + serde_json::json!({ + "name": parts[3], + "used_mib": used, + "total_mib": total, + "utilization_pct": util, + }) +} diff --git a/crates/gateway/src/bin/parity_extract_json.rs b/crates/gateway/src/bin/parity_extract_json.rs new file mode 100644 index 0000000..031c3a9 --- /dev/null +++ b/crates/gateway/src/bin/parity_extract_json.rs @@ -0,0 +1,37 @@ +//! Cross-runtime parity helper for `extract_json`. +//! +//! Reads a single model-output string from stdin, runs the Rust +//! extract_json, prints `{"matched": bool, "value": <json|null>}` +//! to stdout as JSON. Exit 0 on success, exit 1 on internal error. +//! +//! The Go counterpart lives at +//! `golangLAKEHOUSE/internal/validator/iterate.go::ExtractJSON`. The +//! parity probe at +//! `golangLAKEHOUSE/scripts/cutover/parity/extract_json_parity.sh` +//! feeds the same fixtures through both and diffs the outputs. +//! +//! Usage: +//! echo '<model output>' | parity_extract_json +//! parity_extract_json <<< '...'
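The CSV handling inside `nvidia_smi_snapshot` reduces to a small pure function. A std-only sketch; the sample line is illustrative rather than captured nvidia-smi output, and `parse_smi_line` is a hypothetical helper, not part of the diff.

```rust
// Hypothetical mirror of nvidia_smi_snapshot's row parse: split on
// commas, trim each field, require 4 columns, numeric-parse the
// first three (memory.used, memory.total, utilization.gpu).
fn parse_smi_line(line: &str) -> Option<(u64, u64, u64, String)> {
    let line = line.trim();
    if line.is_empty() {
        return None; // real fn returns serde_json::Value::Null here
    }
    let parts: Vec<&str> = line.split(',').map(|s| s.trim()).collect();
    if parts.len() < 4 {
        return None;
    }
    let used = parts[0].parse::<u64>().unwrap_or(0);
    let total = parts[1].parse::<u64>().unwrap_or(0);
    let util = parts[2].parse::<u64>().unwrap_or(0);
    Some((used, total, util, parts[3].to_string()))
}

fn main() {
    let got = parse_smi_line("1234, 24576, 37, NVIDIA GeForce RTX 4090");
    assert_eq!(got, Some((1234, 24576, 37, "NVIDIA GeForce RTX 4090".to_string())));
    assert_eq!(parse_smi_line("   "), None);  // empty output
    assert_eq!(parse_smi_line("1,2,3"), None); // short row rejected
}
```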
+ +use std::io::Read; + +fn main() { + let mut buf = String::new(); + if let Err(e) = std::io::stdin().read_to_string(&mut buf) { + eprintln!("read stdin: {e}"); + std::process::exit(1); + } + let result = gateway::v1::iterate::extract_json(&buf); + let body = serde_json::json!({ + "matched": result.is_some(), + "value": result.unwrap_or(serde_json::Value::Null), + }); + match serde_json::to_string(&body) { + Ok(s) => println!("{s}"), + Err(e) => { + eprintln!("serialize result: {e}"); + std::process::exit(1); + } + } +} diff --git a/crates/gateway/src/bin/parity_session_log.rs b/crates/gateway/src/bin/parity_session_log.rs new file mode 100644 index 0000000..eb4f61a --- /dev/null +++ b/crates/gateway/src/bin/parity_session_log.rs @@ -0,0 +1,71 @@ +//! Cross-runtime parity helper for `SessionRecord` JSON shape. +//! +//! Reads a fixture JSON on stdin, builds a `SessionRecord`, emits +//! one JSONL row on stdout. Used by +//! `golangLAKEHOUSE/scripts/cutover/parity/session_log_parity.sh` +//! to verify the Rust gateway's session log shape stays byte-equal +//! to the Go-side validatord's `validator.SessionRecord` (commit +//! 1a3a82a in golangLAKEHOUSE). 
+ +use gateway::v1::session_log::{SessionAttemptRecord, SessionRecord, SESSION_RECORD_SCHEMA}; +use serde::Deserialize; +use std::io::Read; + +#[derive(Deserialize)] +struct FixtureInput { + session_id: String, + kind: String, + model: String, + provider: String, + prompt: String, + iterations: u32, + max_iterations: u32, + final_verdict: String, + attempts: Vec<SessionAttemptRecord>, + #[serde(default)] + artifact: Option<serde_json::Value>, + #[serde(default)] + grounded_in_roster: Option<bool>, + duration_ms: u64, +} + +fn main() { + let mut buf = String::new(); + if let Err(e) = std::io::stdin().read_to_string(&mut buf) { + eprintln!("read stdin: {e}"); + std::process::exit(1); + } + let input: FixtureInput = match serde_json::from_str(&buf) { + Ok(v) => v, + Err(e) => { + eprintln!("parse stdin: {e}"); + std::process::exit(1); + } + }; + let rec = SessionRecord { + schema: SESSION_RECORD_SCHEMA.to_string(), + session_id: input.session_id, + // Pinned timestamp so both runtimes' rows compare byte-equal + // when the test wrapper normalizes on `daemon` only.
+ timestamp: "2026-01-01T00:00:00+00:00".to_string(), + daemon: "gateway".to_string(), + kind: input.kind, + model: input.model, + provider: input.provider, + prompt: input.prompt, + iterations: input.iterations, + max_iterations: input.max_iterations, + final_verdict: input.final_verdict, + attempts: input.attempts, + artifact: input.artifact, + grounded_in_roster: input.grounded_in_roster, + duration_ms: input.duration_ms, + }; + match serde_json::to_string(&rec) { + Ok(s) => println!("{s}"), + Err(e) => { + eprintln!("marshal: {e}"); + std::process::exit(1); + } + } +} diff --git a/crates/gateway/src/execution_loop/mod.rs b/crates/gateway/src/execution_loop/mod.rs index 57cb86f..4d5d1f3 100644 --- a/crates/gateway/src/execution_loop/mod.rs +++ b/crates/gateway/src/execution_loop/mod.rs @@ -438,6 +438,10 @@ impl ExecutionLoop { start_time: start_time.to_rfc3339(), end_time: end_time.to_rfc3339(), latency_ms: elapsed_ms, + // Internal execution-loop traffic is its own top-level + // trace per call. If a future caller threads a parent + // trace into self.state, lift this to Some(parent_id). + parent_trace_id: None, }); } @@ -654,6 +658,7 @@ impl ExecutionLoop { start_time: start_time.to_rfc3339(), end_time: end_time.to_rfc3339(), latency_ms, + parent_trace_id: None, }); } diff --git a/crates/gateway/src/lib.rs b/crates/gateway/src/lib.rs new file mode 100644 index 0000000..1205041 --- /dev/null +++ b/crates/gateway/src/lib.rs @@ -0,0 +1,19 @@ +//! Library facade for the gateway crate so sub-binaries (e.g. +//! `parity_extract_json`) can reuse the same modules the gateway +//! binary uses. +//! +//! Added 2026-05-02 to support the cross-runtime parity probe at +//! `golangLAKEHOUSE/scripts/cutover/parity/extract_json_parity.sh`. +//! `extract_json` is the load-bearing public surface for that probe. +//! +//! main.rs still uses local `mod foo;` declarations independently — +//! adding this file is purely additive (the binary's module tree is +//! unchanged). 
+ +pub mod access; +pub mod access_service; +pub mod auth; +pub mod execution_loop; +pub mod observability; +pub mod tools; +pub mod v1; diff --git a/crates/gateway/src/main.rs b/crates/gateway/src/main.rs index eae6adc..ff923d0 100644 --- a/crates/gateway/src/main.rs +++ b/crates/gateway/src/main.rs @@ -362,6 +362,22 @@ async fn main() { } c }, + // Coordinator session JSONL — one row per /v1/iterate + // session for offline DuckDB analysis. Cross-runtime + // parity with Go-side validatord (commit 1a3a82a). + session_log: { + let path = &config.gateway.session_log_path; + let s = v1::session_log::SessionLogger::from_path(path); + if s.is_some() { + tracing::info!( + "v1: session log enabled — coordinator sessions written to {}", + path + ); + } else { + tracing::info!("v1: session log disabled (set [gateway].session_log_path to enable)"); + } + s + }, })); // Auth middleware (if enabled) — P5-001 fix 2026-04-23: diff --git a/crates/gateway/src/v1/iterate.rs b/crates/gateway/src/v1/iterate.rs index 49a3ba6..06872d7 100644 --- a/crates/gateway/src/v1/iterate.rs +++ b/crates/gateway/src/v1/iterate.rs @@ -21,12 +21,19 @@ //! re-implementation. Staffing executors, agent loops, and future //! validators all reach the same code path. -use axum::{extract::State, http::StatusCode, response::IntoResponse, Json}; +use axum::{extract::State, http::{HeaderMap, StatusCode}, response::IntoResponse, Json}; use serde::{Deserialize, Serialize}; const DEFAULT_MAX_ITERATIONS: u32 = 3; const LOOPBACK_TIMEOUT_SECS: u64 = 240; +/// Header name used to propagate a Langfuse parent trace id across +/// daemon boundaries. Matches Go's `shared.TraceIDHeader` constant +/// byte-for-byte (commit d6d2fdf in golangLAKEHOUSE) — same wire +/// format means a Go caller can hit Rust's /v1/iterate (or vice +/// versa) and the resulting Langfuse trees nest correctly. 
+pub const TRACE_ID_HEADER: &str = "x-lakehouse-trace-id"; + #[derive(Deserialize)] pub struct IterateRequest { /// "fill" | "email" | "playbook" — picks which validator runs. @@ -80,6 +87,14 @@ pub struct IterateResponse { pub validation: serde_json::Value, pub iterations: u32, pub history: Vec<IterateAttempt>, + /// Echoes the resolved trace id (caller-forwarded header, body + /// field, langfuse-middleware mint, or local fallback). Operators + /// pivot from this id straight into Langfuse + the + /// coordinator_sessions.jsonl join key. Cross-runtime parity with + /// Go's `validator.IterateResponse` (commit 6847bbc in + /// golangLAKEHOUSE). + #[serde(skip_serializing_if = "Option::is_none")] + pub trace_id: Option<String>, } #[derive(Serialize)] @@ -87,29 +102,52 @@ pub struct IterateFailure { pub error: String, pub iterations: u32, pub history: Vec<IterateAttempt>, + #[serde(skip_serializing_if = "Option::is_none")] + pub trace_id: Option<String>, } pub async fn iterate( State(state): State<super::V1State>, + headers: HeaderMap, Json(req): Json<IterateRequest>, ) -> impl IntoResponse { let max_iter = req.max_iterations.unwrap_or(DEFAULT_MAX_ITERATIONS).max(1); let temperature = req.temperature.unwrap_or(0.2); let max_tokens = req.max_tokens.unwrap_or(4096); let mut history: Vec<IterateAttempt> = Vec::with_capacity(max_iter as usize); + let mut attempt_records: Vec<super::session_log::SessionAttemptRecord> = Vec::with_capacity(max_iter as usize); let mut current_prompt = req.prompt.clone(); + // Resolve the parent Langfuse trace id. Caller-forwarded header + // wins (cross-daemon tree linkage); otherwise mint a fresh id so + // the iterate session is its own tree. Same shape as the Go-side + // validatord trace propagation.
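The header-or-mint fallback described above is essentially one Option chain. A std-only sketch; `resolve_trace_id` and the fixed `mint` value are hypothetical stand-ins for the `HeaderMap` lookup plus `new_trace_id()`, not names from this diff.

```rust
// Hypothetical stand-in for iterate()'s trace-id resolution:
// a caller-forwarded, non-empty header value wins; otherwise a
// fresh id is minted so the session becomes its own trace tree.
fn resolve_trace_id(forwarded: Option<&str>, mint: fn() -> String) -> String {
    forwarded
        .filter(|s| !s.is_empty())
        .map(|s| s.to_string())
        .unwrap_or_else(mint)
}

fn mint() -> String {
    "local-0000000000000001".to_string() // fixed id, for the sketch only
}

fn main() {
    assert_eq!(resolve_trace_id(Some("abc-123"), mint), "abc-123");
    assert_eq!(resolve_trace_id(Some(""), mint), mint()); // empty header ignored
    assert_eq!(resolve_trace_id(None, mint), mint());
}
```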
+ let trace_id: String = headers + .get(TRACE_ID_HEADER) + .and_then(|v| v.to_str().ok()) + .filter(|s| !s.is_empty()) + .map(|s| s.to_string()) + .unwrap_or_else(new_trace_id); + let client = match reqwest::Client::builder() .timeout(std::time::Duration::from_secs(LOOPBACK_TIMEOUT_SECS)) .build() { Ok(c) => c, - Err(e) => return (StatusCode::INTERNAL_SERVER_ERROR, format!("client build: {e}")).into_response(), + Err(e) => { + // Even infrastructure failures get a session row so a + // missing /v1/iterate event never silently disappears + // from the longitudinal log. + write_infra_error(&state, &req, &trace_id, max_iter, 0, format!("client build: {e}")).await; + return (StatusCode::INTERNAL_SERVER_ERROR, format!("client build: {e}")).into_response(); + } }; // Self-loopback to the gateway port. Carries gateway internal // calls through /v1/chat + /v1/validate so /v1/usage tracks them. let gateway = "http://127.0.0.1:3100"; + let t0 = std::time::Instant::now(); for iteration in 0..max_iter { + let attempt_started = chrono::Utc::now(); // ── Generate ── let mut messages = Vec::with_capacity(2); if let Some(sys) = &req.system { @@ -123,20 +161,33 @@ pub async fn iterate( "temperature": temperature, "max_tokens": max_tokens, }); - let raw = match call_chat(&client, gateway, &chat_body).await { + let raw = match call_chat(&client, gateway, &chat_body, &trace_id).await { Ok(r) => r, - Err(e) => return (StatusCode::BAD_GATEWAY, format!("/v1/chat hop failed at iter {iteration}: {e}")).into_response(), + Err(e) => { + write_infra_error(&state, &req, &trace_id, max_iter, t0.elapsed().as_millis() as u64, format!("/v1/chat hop failed at iter {iteration}: {e}")).await; + return (StatusCode::BAD_GATEWAY, format!("/v1/chat hop failed at iter {iteration}: {e}")).into_response(); + } }; // ── Extract JSON ── let artifact = match extract_json(&raw) { Some(a) => a, None => { + let span_id = emit_attempt_span( + &state, &trace_id, iteration, &req, ¤t_prompt, &raw, "no_json", None, + 
attempt_started, chrono::Utc::now(), + ); history.push(IterateAttempt { iteration, raw: raw.chars().take(2000).collect(), status: AttemptStatus::NoJson, }); + attempt_records.push(super::session_log::SessionAttemptRecord { + iteration, + verdict_kind: "no_json".to_string(), + error: None, + span_id, + }); current_prompt = format!( "{}\n\nYour previous attempt did not contain a JSON object. Reply with ONLY a valid JSON object matching the requested artifact shape.", req.prompt, @@ -151,22 +202,41 @@ pub async fn iterate( "artifact": artifact, "context": req.context.clone().unwrap_or(serde_json::Value::Null), }); - match call_validate(&client, gateway, &validate_body).await { + match call_validate(&client, gateway, &validate_body, &trace_id).await { Ok(report) => { + let span_id = emit_attempt_span( + &state, &trace_id, iteration, &req, ¤t_prompt, &raw, "accepted", None, + attempt_started, chrono::Utc::now(), + ); history.push(IterateAttempt { iteration, raw: raw.chars().take(2000).collect(), status: AttemptStatus::Accepted, }); + attempt_records.push(super::session_log::SessionAttemptRecord { + iteration, + verdict_kind: "accepted".to_string(), + error: None, + span_id, + }); + let duration_ms = t0.elapsed().as_millis() as u64; + let grounded = grounded_in_roster(&state, &req.kind, &artifact); + write_session_accepted(&state, &req, &trace_id, iteration + 1, max_iter, attempt_records, &artifact, grounded, duration_ms).await; return (StatusCode::OK, Json(IterateResponse { artifact, validation: report, iterations: iteration + 1, history, + trace_id: Some(trace_id.clone()), })).into_response(); } Err(err) => { let err_summary = err.to_string(); + let span_id = emit_attempt_span( + &state, &trace_id, iteration, &req, ¤t_prompt, &raw, "validation_failed", + Some(err_summary.clone()), + attempt_started, chrono::Utc::now(), + ); history.push(IterateAttempt { iteration, raw: raw.chars().take(2000).collect(), @@ -174,6 +244,12 @@ pub async fn iterate( error: 
serde_json::to_value(&err_summary).unwrap_or(serde_json::Value::Null), }, }); + attempt_records.push(super::session_log::SessionAttemptRecord { + iteration, + verdict_kind: "validation_failed".to_string(), + error: Some(err_summary.clone()), + span_id, + }); // Append validation feedback to prompt for next iter. // The model sees concrete failure mode + retries with // corrective context. This is the "observer correction" @@ -188,19 +264,167 @@ } } + let duration_ms = t0.elapsed().as_millis() as u64; + write_session_failure(&state, &req, &trace_id, max_iter, max_iter, attempt_records, duration_ms).await; (StatusCode::UNPROCESSABLE_ENTITY, Json(IterateFailure { error: format!("max iterations reached ({max_iter}) without passing validation"), iterations: max_iter, history, + trace_id: Some(trace_id.clone()), })).into_response() } -async fn call_chat(client: &reqwest::Client, gateway: &str, body: &serde_json::Value) -> Result<String, String> { - let resp = client.post(format!("{gateway}/v1/chat")) - .json(body) - .send() - .await - .map_err(|e| format!("chat hop: {e}"))?; +// ─── Helpers — Langfuse spans + session log + roster check ───────── + +fn emit_attempt_span( + state: &super::V1State, + trace_id: &str, + iteration: u32, + req: &IterateRequest, + prompt: &str, + raw: &str, + verdict: &str, + error: Option<String>, + started: chrono::DateTime<chrono::Utc>, + ended: chrono::DateTime<chrono::Utc>, +) -> Option<String> { + let lf = state.langfuse.as_ref()?; + Some(lf.emit_attempt_span(super::langfuse_trace::AttemptSpan { + trace_id: trace_id.to_string(), + iteration, + model: req.model.clone(), + provider: req.provider.clone(), + prompt: prompt.to_string(), + raw: raw.to_string(), + verdict: verdict.to_string(), + error, + start_time: started.to_rfc3339(), + end_time: ended.to_rfc3339(), + })) +} + +/// Verify every fill artifact's candidate IDs exist in the roster. +/// Returns Some(true)/Some(false) on the fill kind, None otherwise +/// (other kinds don't have worker IDs to ground).
Same semantics as +/// Go's `handlers.rosterCheckFor("fill")`. +fn grounded_in_roster( + state: &super::V1State, + kind: &str, + artifact: &serde_json::Value, +) -> Option<bool> { + if kind != "fill" { + return None; + } + let fills = artifact.get("fills").and_then(|v| v.as_array())?; + for f in fills { + let id = match f.get("candidate_id").and_then(|v| v.as_str()) { + Some(s) if !s.is_empty() => s, + _ => return Some(false), + }; + if state.validate_workers.find(id).is_none() { + return Some(false); + } + } + Some(true) +} + +async fn write_session_accepted( + state: &super::V1State, + req: &IterateRequest, + trace_id: &str, + iterations: u32, + max_iter: u32, + attempts: Vec<super::session_log::SessionAttemptRecord>, + artifact: &serde_json::Value, + grounded: Option<bool>, + duration_ms: u64, +) { + let Some(logger) = state.session_log.as_ref() else { return }; + let rec = build_session_record(req, trace_id, "accepted", iterations, max_iter, attempts, Some(artifact.clone()), grounded, duration_ms); + logger.append(rec).await; +} + +async fn write_session_failure( + state: &super::V1State, + req: &IterateRequest, + trace_id: &str, + iterations: u32, + max_iter: u32, + attempts: Vec<super::session_log::SessionAttemptRecord>, + duration_ms: u64, +) { + let Some(logger) = state.session_log.as_ref() else { return }; + let rec = build_session_record(req, trace_id, "max_iter_exhausted", iterations, max_iter, attempts, None, None, duration_ms); + logger.append(rec).await; +} + +async fn write_infra_error( + state: &super::V1State, + req: &IterateRequest, + trace_id: &str, + max_iter: u32, + duration_ms: u64, + error: String, +) { + let Some(logger) = state.session_log.as_ref() else { return }; + let attempts = vec![super::session_log::SessionAttemptRecord { + iteration: 0, + verdict_kind: "infra_error".to_string(), + error: Some(error), + span_id: None, + }]; + let rec = build_session_record(req, trace_id, "infra_error", 0, max_iter, attempts, None, None, duration_ms); + logger.append(rec).await; +} + +fn build_session_record( + req: &IterateRequest, + trace_id:
&str, + final_verdict: &str, + iterations: u32, + max_iter: u32, + attempts: Vec<super::session_log::SessionAttemptRecord>, + artifact: Option<serde_json::Value>, + grounded: Option<bool>, + duration_ms: u64, +) -> super::session_log::SessionRecord { + super::session_log::SessionRecord { + schema: super::session_log::SESSION_RECORD_SCHEMA.to_string(), + session_id: trace_id.to_string(), + timestamp: chrono::Utc::now().to_rfc3339(), + daemon: "gateway".to_string(), + kind: req.kind.clone(), + model: req.model.clone(), + provider: req.provider.clone(), + prompt: super::session_log::truncate(&req.prompt, 4000), + iterations, + max_iterations: max_iter, + final_verdict: final_verdict.to_string(), + attempts, + artifact, + grounded_in_roster: grounded, + duration_ms, + } +} + +/// Generate a fresh trace id when no parent was forwarded. Same +/// time-ordered hex shape Langfuse already accepts elsewhere in this +/// crate (see `langfuse_trace::uuid_v7_like`). +fn new_trace_id() -> String { + let ts = chrono::Utc::now().timestamp_nanos_opt().unwrap_or(0); + let rand = std::time::SystemTime::now() + .duration_since(std::time::UNIX_EPOCH) + .map(|d| d.subsec_nanos()) + .unwrap_or(0); + format!("{:016x}-{:08x}", ts, rand) +} + +async fn call_chat(client: &reqwest::Client, gateway: &str, body: &serde_json::Value, trace_id: &str) -> Result<String, String> { + let mut req = client.post(format!("{gateway}/v1/chat")).json(body); + if !trace_id.is_empty() { + req = req.header(TRACE_ID_HEADER, trace_id); + } + let resp = req.send().await.map_err(|e| format!("chat hop: {e}"))?; let status = resp.status(); if !status.is_success() { let body = resp.text().await.unwrap_or_default(); @@ -213,12 +437,12 @@ async fn call_chat(client: &reqwest::Client, gateway: &str, body: &serde_json::V .to_string()) } -async fn call_validate(client: &reqwest::Client, gateway: &str, body: &serde_json::Value) -> Result<serde_json::Value, String> { - let resp = client.post(format!("{gateway}/v1/validate")) - .json(body) - .send() - .await - .map_err(|e| format!("validate hop: {e}"))?; +async fn
call_validate(client: &reqwest::Client, gateway: &str, body: &serde_json::Value, trace_id: &str) -> Result { + let mut req = client.post(format!("{gateway}/v1/validate")).json(body); + if !trace_id.is_empty() { + req = req.header(TRACE_ID_HEADER, trace_id); + } + let resp = req.send().await.map_err(|e| format!("validate hop: {e}"))?; let status = resp.status(); let parsed: serde_json::Value = resp.json().await.map_err(|e| format!("validate parse: {e}"))?; if status.is_success() { @@ -234,7 +458,13 @@ async fn call_validate(client: &reqwest::Client, gateway: &str, body: &serde_jso /// Extract the first JSON object from a model's output. Handles /// fenced code blocks (```json ... ```), bare braces, and stray /// prose around the JSON. Returns None on no extractable object. -fn extract_json(raw: &str) -> Option { +/// +/// Made `pub` 2026-05-02 to support the cross-runtime parity probe +/// at `golangLAKEHOUSE/scripts/cutover/parity/extract_json_parity.sh`. +/// The Go counterpart lives at `internal/validator/iterate.go::ExtractJSON`; +/// when either runtime's algorithm changes the parity probe surfaces +/// the divergence. +pub fn extract_json(raw: &str) -> Option { // Try fenced first. let candidates: Vec = { let mut out = vec![]; diff --git a/crates/gateway/src/v1/langfuse_trace.rs b/crates/gateway/src/v1/langfuse_trace.rs index fe976f4..98c5c1c 100644 --- a/crates/gateway/src/v1/langfuse_trace.rs +++ b/crates/gateway/src/v1/langfuse_trace.rs @@ -76,63 +76,54 @@ impl LangfuseClient { }); } - async fn emit_chat_inner(&self, ev: ChatTrace) -> Result<(), String> { - let trace_id = uuid_v7_like(); - let gen_id = uuid_v7_like(); - let trace_ts = ev.start_time.clone(); + /// Fire-and-forget per-iteration span emit. Returns the generated + /// span id synchronously so the caller can stamp it on + /// `IterateAttempt.span_id` before the network round-trip resolves. + /// Mirrors Go's `validator.Tracer` callback shape. 
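The emit shape described above — mint the span id synchronously, push the slow network hop onto a background task, and degrade failures to a warning — can be distilled into a runtime-agnostic sketch. This is an illustration, not the crate's code: std threads stand in for `tokio::spawn`, and `post_span` is a hypothetical stand-in for the Langfuse ingestion POST.

```rust
use std::thread;

// Hypothetical stand-in for the Langfuse ingestion round-trip; in the
// real module this is an async HTTP POST to the ingestion endpoint.
fn post_span(span_id: &str) -> Result<(), String> {
    if span_id.is_empty() {
        return Err("empty span id".to_string());
    }
    Ok(())
}

/// Return the generated span id immediately; the emit itself happens
/// on a background thread and a failure only warns — the caller can
/// stamp the id on its attempt record without waiting for the network.
fn emit_fire_and_forget(counter: &mut u64) -> String {
    *counter += 1;
    let n = *counter;
    let span_id = format!("span-{n:016x}");
    let id_for_worker = span_id.clone();
    thread::spawn(move || {
        if let Err(e) = post_span(&id_for_worker) {
            eprintln!("span drop: {e}"); // warn, never propagate
        }
    });
    span_id
}

fn main() {
    let mut counter = 0u64;
    let id = emit_fire_and_forget(&mut counter);
    assert_eq!(id, "span-0000000000000001");
    println!("{id}");
}
```

The load-bearing detail is that the id is minted before the spawn, so a dropped emit loses only the observability record, never the caller's bookkeeping.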
+ pub fn emit_attempt_span(&self, sp: AttemptSpan) -> String { + let span_id = uuid_v7_like(); + let span_id_for_caller = span_id.clone(); + let this = self.clone(); + tokio::spawn(async move { + if let Err(e) = this.emit_attempt_span_inner(span_id, sp).await { + tracing::warn!(target: "v1.langfuse", "iterate span drop: {e}"); + } + }); + span_id_for_caller + } + async fn emit_attempt_span_inner(&self, span_id: String, sp: AttemptSpan) -> Result<(), String> { + let level = if sp.verdict == "accepted" { "DEFAULT" } else { "WARNING" }; let batch = IngestionBatch { - batch: vec![ - IngestionEvent { - id: uuid_v7_like(), - timestamp: trace_ts.clone(), - kind: "trace-create", - body: serde_json::json!({ - "id": trace_id, - "name": format!("v1.chat:{}", ev.provider), - "input": serde_json::json!({ - "model": ev.model, - "messages": ev.input, - }), - "metadata": serde_json::json!({ - "provider": ev.provider, - "think": ev.think, - }), + batch: vec![IngestionEvent { + id: uuid_v7_like(), + timestamp: sp.end_time.clone(), + kind: "span-create", + body: serde_json::json!({ + "id": span_id, + "traceId": sp.trace_id, + "name": format!("iterate.attempt[{}]", sp.iteration), + "input": serde_json::json!({ + "iteration": sp.iteration, + "model": sp.model, + "provider": sp.provider, + "prompt": truncate(&sp.prompt, 4000), }), - }, - IngestionEvent { - id: uuid_v7_like(), - timestamp: ev.end_time.clone(), - kind: "generation-create", - body: serde_json::json!({ - "id": gen_id, - "traceId": trace_id, - "name": "chat", - "model": ev.model, - "modelParameters": serde_json::json!({ - "temperature": ev.temperature, - "max_tokens": ev.max_tokens, - "think": ev.think, - }), - "input": ev.input, - "output": ev.output, - "usage": serde_json::json!({ - "input": ev.prompt_tokens, - "output": ev.completion_tokens, - "total": ev.prompt_tokens + ev.completion_tokens, - "unit": "TOKENS", - }), - "startTime": ev.start_time, - "endTime": ev.end_time, - "metadata": serde_json::json!({ - "provider": 
ev.provider, - "latency_ms": ev.latency_ms, - }), + "output": serde_json::json!({ + "verdict": sp.verdict, + "error": sp.error, + "raw": truncate(&sp.raw, 4000), }), - }, - ], + "level": level, + "startTime": sp.start_time, + "endTime": sp.end_time, + }), + }], }; + self.post_batch(batch).await + } + async fn post_batch(&self, batch: IngestionBatch) -> Result<(), String> { let url = format!("{}{}", self.inner.base_url.trim_end_matches('/'), INGESTION_PATH); let resp = self.inner.http .post(url) @@ -146,6 +137,81 @@ impl LangfuseClient { } Ok(()) } + + async fn emit_chat_inner(&self, ev: ChatTrace) -> Result<(), String> { + // When the caller forwarded a parent trace id (via the + // X-Lakehouse-Trace-Id header → V1State plumbing), attach the + // generation as a child of that trace. Without a parent we + // mint a new top-level trace per call (Phase 40 default). + let trace_id = ev.parent_trace_id.clone().unwrap_or_else(uuid_v7_like); + let nested = ev.parent_trace_id.is_some(); + let gen_id = uuid_v7_like(); + let trace_ts = ev.start_time.clone(); + + let mut events = Vec::with_capacity(2); + if !nested { + // Only mint a fresh trace-create when we don't have a parent. + // Reusing a parent trace id without re-creating it is the + // contract that lets validatord's iterate-session show up + // as one tree in Langfuse. 
+ events.push(IngestionEvent { + id: uuid_v7_like(), + timestamp: trace_ts.clone(), + kind: "trace-create", + body: serde_json::json!({ + "id": trace_id, + "name": format!("v1.chat:{}", ev.provider), + "input": serde_json::json!({ + "model": ev.model, + "messages": ev.input, + }), + "metadata": serde_json::json!({ + "provider": ev.provider, + "think": ev.think, + }), + }), + }); + } + events.push(IngestionEvent { + id: uuid_v7_like(), + timestamp: ev.end_time.clone(), + kind: "generation-create", + body: serde_json::json!({ + "id": gen_id, + "traceId": trace_id, + "name": "chat", + "model": ev.model, + "modelParameters": serde_json::json!({ + "temperature": ev.temperature, + "max_tokens": ev.max_tokens, + "think": ev.think, + }), + "input": ev.input, + "output": ev.output, + "usage": serde_json::json!({ + "input": ev.prompt_tokens, + "output": ev.completion_tokens, + "total": ev.prompt_tokens + ev.completion_tokens, + "unit": "TOKENS", + }), + "startTime": ev.start_time, + "endTime": ev.end_time, + "metadata": serde_json::json!({ + "provider": ev.provider, + "latency_ms": ev.latency_ms, + }), + }), + }); + + self.post_batch(IngestionBatch { batch: events }).await + } +} + +/// Truncate a string to at most `n` chars (NOT bytes). Matches the Go +/// `trim` helper used in session log + attempt-span emission so an +/// operator reading two cross-runtime traces sees the same boundary. +fn truncate(s: &str, n: usize) -> String { + s.chars().take(n).collect() } /// Everything the v1.chat handler collects for one completed call. @@ -162,6 +228,32 @@ pub struct ChatTrace { pub start_time: String, pub end_time: String, pub latency_ms: u64, + /// When set, attach this chat trace as a child of the named + /// Langfuse trace instead of starting a new top-level trace. Used + /// by `/v1/iterate` to nest its inner /v1/chat hops under the + /// iterate-session trace so a multi-call session shows in + /// Langfuse as ONE trace tree, not N+1 disconnected traces. 
+ /// Matches the Go-side `X-Lakehouse-Trace-Id` propagation + /// (commit d6d2fdf in golangLAKEHOUSE). + pub parent_trace_id: Option, +} + +/// One iteration attempt inside `/v1/iterate`'s loop. Becomes one +/// span on the parent trace when emitted via `emit_attempt_span`. +/// Matches Go's `validator.AttemptSpan` shape so the cross-runtime +/// observability surface is consistent. +pub struct AttemptSpan { + pub trace_id: String, + pub iteration: u32, + pub model: String, + pub provider: String, + pub prompt: String, + pub raw: String, + /// Verdict kind: "no_json" | "validation_failed" | "accepted" + pub verdict: String, + pub error: Option, + pub start_time: String, + pub end_time: String, } #[derive(Serialize)] diff --git a/crates/gateway/src/v1/mod.rs b/crates/gateway/src/v1/mod.rs index 052e5cc..6ab9ef0 100644 --- a/crates/gateway/src/v1/mod.rs +++ b/crates/gateway/src/v1/mod.rs @@ -21,6 +21,7 @@ pub mod opencode; pub mod validate; pub mod iterate; pub mod langfuse_trace; +pub mod session_log; pub mod mode; pub mod respond; pub mod truth; @@ -83,6 +84,13 @@ pub struct V1State { /// disabled (keys missing or container unreachable). Traces are /// fire-and-forget: never block the response path. pub langfuse: Option, + /// Coordinator session JSONL writer (path from + /// `[gateway].session_log_path`). One row per `/v1/iterate` + /// session for offline DuckDB analysis. None = disabled. + /// Cross-runtime parity with the Go-side `validatord` + /// `[validatord].session_log_path` (commit 1a3a82a in + /// golangLAKEHOUSE). + pub session_log: Option, } #[derive(Default, Clone, Serialize)] @@ -361,6 +369,7 @@ mod resolve_provider_tests { async fn chat( State(state): State, + headers: axum::http::HeaderMap, Json(req): Json, ) -> Result, (StatusCode, String)> { if req.messages.is_empty() { @@ -490,6 +499,17 @@ async fn chat( let output = resp.choices.first() .map(|c| c.message.text()) .unwrap_or_default(); + // Cross-runtime trace linkage. 
When a caller (validatord on + // Go side, /v1/iterate on Rust side) forwards a parent trace + // id via X-Lakehouse-Trace-Id, attach this generation to that + // trace so the iterate session and its inner chat hops show + // up as ONE trace tree in Langfuse. Header name matches the + // Go-side `shared.TraceIDHeader` constant byte-for-byte. + let parent_trace_id = headers + .get(crate::v1::iterate::TRACE_ID_HEADER) + .and_then(|v| v.to_str().ok()) + .map(|s| s.to_string()) + .filter(|s| !s.is_empty()); lf.emit_chat(langfuse_trace::ChatTrace { provider: used_provider.clone(), model: resp.model.clone(), @@ -503,6 +523,7 @@ async fn chat( start_time: start_time.to_rfc3339(), end_time: end_time.to_rfc3339(), latency_ms, + parent_trace_id, }); } diff --git a/crates/gateway/src/v1/session_log.rs b/crates/gateway/src/v1/session_log.rs new file mode 100644 index 0000000..62282d2 --- /dev/null +++ b/crates/gateway/src/v1/session_log.rs @@ -0,0 +1,235 @@ +//! Coordinator session JSONL writer — Rust parity with the Go-side +//! `internal/validator/session_log.go` (commit 1a3a82a in +//! golangLAKEHOUSE). Same schema, same field names, same producer +//! semantics, so a unified longitudinal log can pull from either +//! runtime via DuckDB. +//! +//! Schema: `session.iterate.v1`. One row per `/v1/iterate` session. +//! Append-only. Best-effort posture: errors warn and the iterate +//! response always ships. +//! +//! See `golangLAKEHOUSE/docs/SESSION_LOG.md` for the full schema +//! reference + DuckDB query examples. This module produces rows +//! with `daemon: "gateway"`; the Go side produces `daemon: +//! "validatord"`. Operators who want a unified stream can point both +//! to the same path (the OS write-append is atomic for the row sizes +//! we produce) or query both files together via duckdb's `read_json` +//! glob support. 
+
+use serde::{Deserialize, Serialize};
+use std::sync::Arc;
+use tokio::sync::Mutex;
+
+pub const SESSION_RECORD_SCHEMA: &str = "session.iterate.v1";
+
+/// One row in coordinator_sessions.jsonl. Field names are the on-wire
+/// names — must stay byte-equal to the Go side's
+/// `validator.SessionRecord` (proven by the cross-runtime parity
+/// probe at golangLAKEHOUSE/scripts/cutover/parity/).
+// Deserialize is supported so the parity helper binary can round-trip
+// fixture inputs through serde without hand-rolling a parser. Production
+// emit path uses Serialize only; SessionRecord rows are written by the
+// gateway and consumed by DuckDB / external tooling, never re-read by us.
+#[derive(Serialize, Deserialize)]
+pub struct SessionRecord {
+    pub schema: String,
+    pub session_id: String,
+    pub timestamp: String,
+    pub daemon: String,
+    pub kind: String,
+    pub model: String,
+    pub provider: String,
+    pub prompt: String,
+    pub iterations: u32,
+    pub max_iterations: u32,
+    pub final_verdict: String, // "accepted" | "max_iter_exhausted" | "infra_error"
+    pub attempts: Vec<SessionAttemptRecord>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub artifact: Option<serde_json::Value>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub grounded_in_roster: Option<bool>,
+    pub duration_ms: u64,
+}
+
+#[derive(Serialize, Deserialize)]
+pub struct SessionAttemptRecord {
+    pub iteration: u32,
+    pub verdict_kind: String, // "no_json" | "validation_failed" | "accepted" | "infra_error"
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub error: Option<String>,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub span_id: Option<String>,
+}
+
+/// Append-only writer. Cloneable handle — internal state is Arc'd so
+/// V1State can keep its own clone and per-request clones are cheap.
+#[derive(Clone)]
+pub struct SessionLogger {
+    inner: Arc<Inner>,
+}
+
+struct Inner {
+    path: String,
+    /// tokio::Mutex (not std) because we hold it across the async
+    /// fs write. Contention is low (one row per /v1/iterate session).
+    mu: Mutex<()>,
+}
+
+impl SessionLogger {
+    /// Construct a logger writing to `path`. Empty path → None
+    /// (skip the wiring in the iterate handler entirely).
+    pub fn from_path(path: &str) -> Option<Self> {
+        if path.is_empty() {
+            return None;
+        }
+        Some(Self {
+            inner: Arc::new(Inner {
+                path: path.to_string(),
+                mu: Mutex::new(()),
+            }),
+        })
+    }
+
+    /// Append one record. Best-effort: every failure — including the
+    /// shouldn't-happen case of serde_json::to_string rejecting a
+    /// well-formed struct — lands in `tracing::warn!` and is swallowed.
+    /// The method returns `()`, so observability stays a witness,
+    /// never a gate on the iterate response.
+    pub async fn append(&self, rec: SessionRecord) {
+        let body = match serde_json::to_string(&rec) {
+            Ok(s) => s,
+            Err(e) => {
+                tracing::warn!(target: "v1.session_log", "marshal: {e}");
+                return;
+            }
+        };
+        let _guard = self.inner.mu.lock().await;
+        if let Err(e) = self.write(&body).await {
+            tracing::warn!(target: "v1.session_log", "write {}: {e}", self.inner.path);
+        }
+    }
+
+    async fn write(&self, body: &str) -> std::io::Result<()> {
+        use tokio::fs::OpenOptions;
+        use tokio::io::AsyncWriteExt;
+
+        // Lazy mkdir on first write so a not-yet-mounted volume at
+        // startup doesn't kill the daemon.
+        if let Some(parent) = std::path::Path::new(&self.inner.path).parent() {
+            if !parent.as_os_str().is_empty() {
+                tokio::fs::create_dir_all(parent).await?;
+            }
+        }
+        let mut f = OpenOptions::new()
+            .append(true)
+            .create(true)
+            .open(&self.inner.path)
+            .await?;
+        f.write_all(body.as_bytes()).await?;
+        f.write_all(b"\n").await?;
+        Ok(())
+    }
+}
+
+/// Best-effort UTF-8 char truncation. Matches Go's `trim` helper so
+/// rows produced by either runtime cap fields at the same boundaries.
+pub fn truncate(s: &str, n: usize) -> String { + s.chars().take(n).collect() +} + +#[cfg(test)] +mod tests { + use super::*; + use std::path::PathBuf; + use tokio::fs; + + fn fixture_record(session_id: &str) -> SessionRecord { + SessionRecord { + schema: SESSION_RECORD_SCHEMA.to_string(), + session_id: session_id.to_string(), + timestamp: "2026-05-02T08:00:00Z".to_string(), + daemon: "gateway".to_string(), + kind: "fill".to_string(), + model: "qwen3.5:latest".to_string(), + provider: "ollama".to_string(), + prompt: "produce a fill artifact".to_string(), + iterations: 1, + max_iterations: 3, + final_verdict: "accepted".to_string(), + attempts: vec![SessionAttemptRecord { + iteration: 0, + verdict_kind: "accepted".to_string(), + error: None, + span_id: Some("span-0".to_string()), + }], + artifact: Some(serde_json::json!({"fills":[{"candidate_id":"W-1"}]})), + grounded_in_roster: Some(true), + duration_ms: 50, + } + } + + #[tokio::test] + async fn from_path_empty_returns_none() { + assert!(SessionLogger::from_path("").is_none()); + } + + #[tokio::test] + async fn append_writes_jsonl_row_with_schema_field() { + let dir = tempdir(); + let path = dir.join("sessions.jsonl"); + let path_str = path.to_string_lossy().to_string(); + let logger = SessionLogger::from_path(&path_str).unwrap(); + logger.append(fixture_record("trace-a")).await; + + let body = fs::read_to_string(&path).await.unwrap(); + assert!(body.contains("\"schema\":\"session.iterate.v1\"")); + assert!(body.contains("\"session_id\":\"trace-a\"")); + assert!(body.contains("\"grounded_in_roster\":true")); + assert!(body.ends_with('\n')); + } + + #[tokio::test] + async fn append_concurrent_safe() { + let dir = tempdir(); + let path = dir.join("sessions.jsonl"); + let path_str = path.to_string_lossy().to_string(); + let logger = SessionLogger::from_path(&path_str).unwrap(); + + let n = 32; + let mut handles = Vec::with_capacity(n); + for i in 0..n { + let l = logger.clone(); + handles.push(tokio::spawn(async move 
{ + l.append(fixture_record(&format!("trace-{i}"))).await; + })); + } + for h in handles { + h.await.unwrap(); + } + + let body = fs::read_to_string(&path).await.unwrap(); + let lines: Vec<_> = body.lines().filter(|l| !l.is_empty()).collect(); + assert_eq!(lines.len(), n, "expected {n} rows, got {}", lines.len()); + // Every row must round-trip through serde — a torn write + // would surface as a parse error. + for line in lines { + let _: serde_json::Value = serde_json::from_str(line).expect("valid json per row"); + } + } + + fn tempdir() -> PathBuf { + // Per-test unique path so prior runs don't pollute the next. + // The static counter increments across the whole test binary, + // so back-to-back tests in the same module get distinct dirs. + static COUNTER: std::sync::atomic::AtomicU64 = std::sync::atomic::AtomicU64::new(0); + let n = COUNTER.fetch_add(1, std::sync::atomic::Ordering::Relaxed); + let p = std::env::temp_dir().join(format!( + "session_log_test_{}_{}_{}", + std::process::id(), + chrono::Utc::now().timestamp_nanos_opt().unwrap_or(0), + n, + )); + std::fs::create_dir_all(&p).unwrap(); + p + } +} diff --git a/crates/lance-bench/src/main.rs b/crates/lance-bench/src/main.rs index 7216d37..91af48d 100644 --- a/crates/lance-bench/src/main.rs +++ b/crates/lance-bench/src/main.rs @@ -456,6 +456,26 @@ async fn build_lance_vector_index(path: &str, _dims: usize) -> Result<()> { .await .context("create_index")?; + // Also build the scalar btree on doc_id. This bench's + // measure_random_access_lance uses take(row_position) which doesn't + // need the btree, but the dataset this bench writes is also queried + // downstream by /vectors/lance/doc// (the production + // lookup path) — without this index that path falls back to a full + // table scan. Cheap to build (~1.2s on 10M rows) and matches the + // gateway's lance_migrate handler behavior so bench-produced datasets + // are immediately production-shape. 
+ use lance_index::scalar::ScalarIndexParams; + dataset + .create_index( + &["doc_id"], + IndexType::Scalar, + Some("doc_id_btree".into()), + &ScalarIndexParams::default(), + true, + ) + .await + .context("create_index doc_id btree")?; + Ok(()) } diff --git a/crates/shared/src/config.rs b/crates/shared/src/config.rs index 51d5167..f12898e 100644 --- a/crates/shared/src/config.rs +++ b/crates/shared/src/config.rs @@ -62,6 +62,15 @@ pub struct GatewayConfig { pub host: String, #[serde(default = "default_gateway_port")] pub port: u16, + /// Coordinator session JSONL output path. One row per + /// `/v1/iterate` session, schema=`session.iterate.v1`. Empty = + /// disabled. Cross-runtime parity with the Go side's + /// `[validatord].session_log_path` (added 2026-05-02). Default + /// empty so existing deployments aren't perturbed; production + /// sets `/var/lib/lakehouse/gateway/sessions.jsonl`. See + /// `golangLAKEHOUSE/docs/SESSION_LOG.md` for query examples. + #[serde(default)] + pub session_log_path: String, } #[derive(Debug, Clone, Deserialize)] @@ -149,7 +158,13 @@ fn default_gateway_port() -> u16 { 3100 } fn default_storage_root() -> String { "./data".to_string() } fn default_profile_root() -> String { "./data/_profiles".to_string() } fn default_manifest_prefix() -> String { "_catalog/manifests".to_string() } -fn default_sidecar_url() -> String { "http://localhost:3200".to_string() } +// Post-2026-05-02: AiClient talks directly to Ollama; the Python +// sidecar's hot-path role was retired. The config field name +// `[sidecar].url` is preserved for migration compatibility (operators +// with existing TOMLs don't need to rename anything), but the value +// now points at Ollama. Lab UI / pipeline_lab Python remains as a +// dev-only tool; not on this URL. 
+fn default_sidecar_url() -> String { "http://localhost:11434".to_string() } fn default_embed_model() -> String { "nomic-embed-text".to_string() } fn default_gen_model() -> String { "qwen2.5".to_string() } fn default_rerank_model() -> String { "qwen2.5".to_string() } @@ -184,7 +199,11 @@ impl Config { impl Default for Config { fn default() -> Self { Self { - gateway: GatewayConfig { host: default_host(), port: default_gateway_port() }, + gateway: GatewayConfig { + host: default_host(), + port: default_gateway_port(), + session_log_path: String::new(), + }, storage: StorageConfig { root: default_storage_root(), profile_root: default_profile_root(), diff --git a/crates/vectord-lance/src/lib.rs b/crates/vectord-lance/src/lib.rs index d18a36e..cd9ab60 100644 --- a/crates/vectord-lance/src/lib.rs +++ b/crates/vectord-lance/src/lib.rs @@ -603,3 +603,210 @@ fn row_from_batch(batch: &RecordBatch, row: usize) -> Result { Ok(Row { doc_id, chunk_text, vector: v, source, chunk_idx }) } + +// =================== Tests =================== +// +// All tests run against a temp directory — never the production +// data/lance/ tree. Lance reads/writes are async + filesystem-bound, +// so we use #[tokio::test]. Each test uses a unique per-pid + per- +// nanosecond temp dir so concurrent runs don't collide and a re-run +// of a single test doesn't see prior state. +// +// Surfaced 2026-05-02 audit: vectord-lance had ZERO tests despite +// being on the live HTTP path. These are the load-bearing locks for +// the public API contract. +#[cfg(test)] +mod tests { + use super::*; + + fn temp_path(label: &str) -> String { + // Per-process atomic counter — guarantees uniqueness regardless + // of clock resolution or test scheduling. Combined with pid, the + // result is unique within and across processes for any practical + // test workload. 
Nanosecond timestamps were not enough on their
+        // own: opus WARN at lib.rs:622 from the 2026-05-02 scrum noted
+        // that under tokio scheduling, multiple tests in the same cargo
+        // process can hit the same nanos bucket.
+        use std::sync::atomic::{AtomicU64, Ordering};
+        static COUNTER: AtomicU64 = AtomicU64::new(0);
+        let seq = COUNTER.fetch_add(1, Ordering::Relaxed);
+        let pid = std::process::id();
+        let nanos = std::time::SystemTime::now()
+            .duration_since(std::time::UNIX_EPOCH)
+            .map(|d| d.subsec_nanos())
+            .unwrap_or(0);
+        std::env::temp_dir()
+            .join(format!("vlance_test_{label}_{pid}_{nanos}_{seq}"))
+            .to_string_lossy()
+            .to_string()
+    }
+
+    /// Build a minimal in-memory Parquet file matching vectord's
+    /// binary-blob schema. Used as input to migrate_from_parquet_bytes.
+    fn synth_parquet_bytes(n_rows: usize, dims: usize) -> Vec<u8> {
+        use parquet::arrow::ArrowWriter;
+        use std::io::Cursor;
+
+        let schema = Arc::new(Schema::new(vec![
+            Field::new("source", DataType::Utf8, true),
+            Field::new("doc_id", DataType::Utf8, false),
+            Field::new("chunk_idx", DataType::Int32, true),
+            Field::new("chunk_text", DataType::Utf8, true),
+            Field::new("vector", DataType::Binary, false),
+        ]));
+
+        let sources: Vec<Option<&str>> = (0..n_rows).map(|_| Some("test")).collect();
+        let doc_ids: Vec<String> = (0..n_rows).map(|i| format!("DOC-{i:04}")).collect();
+        let chunk_idxs: Vec<Option<i32>> = (0..n_rows).map(|i| Some(i as i32)).collect();
+        let chunk_texts: Vec<String> = (0..n_rows).map(|i| format!("synth chunk {i}")).collect();
+        let vectors: Vec<Vec<u8>> = (0..n_rows).map(|i| {
+            let v: Vec<f32> = (0..dims).map(|j| (i * dims + j) as f32 * 0.01).collect();
+            let mut bytes = Vec::with_capacity(dims * 4);
+            for f in v { bytes.extend_from_slice(&f.to_le_bytes()); }
+            bytes
+        }).collect();
+
+        let batch = RecordBatch::try_new(schema.clone(), vec![
+            Arc::new(StringArray::from(sources)),
+            Arc::new(StringArray::from(doc_ids)),
+            Arc::new(Int32Array::from(chunk_idxs)),
+            Arc::new(StringArray::from(chunk_texts)),
+            Arc::new(BinaryArray::from(vectors.iter().map(|v| v.as_slice()).collect::<Vec<_>>())),
+        ]).expect("synth parquet batch");
+
+        let mut buf = Cursor::new(Vec::new());
+        let mut writer = ArrowWriter::try_new(&mut buf, schema, None).expect("arrow writer");
+        writer.write(&batch).expect("write batch");
+        writer.close().expect("close writer");
+        buf.into_inner()
+    }
+
+    #[tokio::test]
+    async fn fresh_store_reports_no_state() {
+        let path = temp_path("fresh");
+        let store = LanceVectorStore::new(path.clone());
+        assert_eq!(store.path(), path);
+        assert_eq!(store.count().await.unwrap_or(0), 0);
+        assert!(!store.has_vector_index().await.unwrap_or(true));
+    }
+
+    #[tokio::test]
+    async fn migrate_then_count_and_fetch() {
+        let path = temp_path("migrate_fetch");
+        let store = LanceVectorStore::new(path.clone());
+        let bytes = synth_parquet_bytes(8, 4);
+
+        let stats = store.migrate_from_parquet_bytes(&bytes).await.expect("migrate");
+        assert_eq!(stats.rows_written, 8);
+        assert_eq!(stats.dimensions, 4);
+        assert!(stats.disk_bytes > 0, "lance dataset should occupy disk");
+
+        assert_eq!(store.count().await.unwrap(), 8);
+
+        let row = store.get_by_doc_id("DOC-0003").await
+            .expect("get_by_doc_id Ok").expect("DOC-0003 exists");
+        assert_eq!(row.doc_id, "DOC-0003");
+        assert_eq!(row.chunk_text, "synth chunk 3");
+        assert_eq!(row.vector.len(), 4);
+
+        let _ = std::fs::remove_dir_all(&path);
+    }
+
+    /// Load-bearing contract: get_by_doc_id distinguishes "dataset
+    /// missing" (Err) from "id missing" (Ok(None)) so the HTTP
+    /// handler can return 404 without inspecting error strings.
+ #[tokio::test] + async fn get_by_doc_id_missing_returns_none() { + let path = temp_path("missing_id"); + let store = LanceVectorStore::new(path.clone()); + store.migrate_from_parquet_bytes(&synth_parquet_bytes(4, 4)).await.expect("migrate"); + + let row = store.get_by_doc_id("DOC-NEVER-EXISTS").await.expect("Ok"); + assert!(row.is_none(), "missing id must return Ok(None), not Err"); + + let _ = std::fs::remove_dir_all(&path); + } + + /// Verifies the load-bearing structural-difference claim of + /// ADR-019: Lance appends without rewriting the whole file. Row + /// count grows; new rows are fetchable by their doc_ids. + #[tokio::test] + async fn append_grows_count_and_new_rows_fetchable() { + let path = temp_path("append"); + let store = LanceVectorStore::new(path.clone()); + store.migrate_from_parquet_bytes(&synth_parquet_bytes(4, 4)).await.expect("migrate"); + assert_eq!(store.count().await.unwrap(), 4); + + let stats = store.append( + Some("appended".into()), + vec!["NEW-A".into(), "NEW-B".into()], + vec![0, 0], + vec!["new chunk a".into(), "new chunk b".into()], + vec![vec![0.1, 0.2, 0.3, 0.4], vec![0.5, 0.6, 0.7, 0.8]], + ).await.expect("append"); + + assert_eq!(stats.rows_appended, 2); + assert_eq!(store.count().await.unwrap(), 6); + + let new_a = store.get_by_doc_id("NEW-A").await.unwrap().expect("NEW-A"); + assert_eq!(new_a.chunk_text, "new chunk a"); + assert_eq!(new_a.source.as_deref(), Some("appended")); + + let _ = std::fs::remove_dir_all(&path); + } + + /// Without this guard a dim-mismatch row would land on disk and + /// silently break search at query time. 
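The tri-state contract this test locks — Err = backend failure, Ok(None) = clean miss, Ok(Some) = hit — is what lets a handler pick status codes without string-sniffing. A minimal sketch of the handler-side mapping; `lookup` and the hard-coded ids are illustrative stand-ins for the store, not its API:

```rust
// Illustrative stand-in for the store's tri-state lookup result.
fn lookup(doc_id: &str) -> Result<Option<String>, String> {
    match doc_id {
        "DOC-0003" => Ok(Some("synth chunk 3".to_string())), // found
        "BROKEN" => Err("dataset open failed".to_string()),  // backend failure
        _ => Ok(None),                                       // clean miss
    }
}

/// Map the tri-state onto HTTP without inspecting error strings:
/// Ok(Some(_)) → 200, Ok(None) → 404, Err(_) → 500.
fn to_status(doc_id: &str) -> u16 {
    match lookup(doc_id) {
        Ok(Some(_)) => 200,
        Ok(None) => 404,
        Err(_) => 500,
    }
}

fn main() {
    assert_eq!(to_status("DOC-0003"), 200);
    assert_eq!(to_status("DOC-NEVER-EXISTS"), 404);
    assert_eq!(to_status("BROKEN"), 500);
    println!("ok");
}
```

Collapsing the miss into the error type would force the handler back into `msg.contains("not found")` heuristics — exactly what the sanitizer work below the tests is trying to retire.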
+    #[tokio::test]
+    async fn append_dim_mismatch_errors() {
+        let path = temp_path("dim_mismatch");
+        let store = LanceVectorStore::new(path.clone());
+        store.migrate_from_parquet_bytes(&synth_parquet_bytes(4, 4)).await.expect("migrate");
+
+        let err = store.append(
+            None, vec!["X".into(), "Y".into()], vec![0, 0],
+            vec!["a".into(), "b".into()],
+            vec![vec![1.0, 2.0, 3.0, 4.0], vec![1.0, 2.0]],
+        ).await;
+        assert!(err.is_err(), "dim mismatch must error");
+        let msg = err.unwrap_err();
+        assert!(msg.contains("dim") || msg.contains("expected"),
+            "error must mention the dimension problem; got: {msg}");
+
+        let _ = std::fs::remove_dir_all(&path);
+    }
+
+    /// Search round-trip: query the exact vector for one row, top-1
+    /// must be that row. Verifies the search path works on small
+    /// datasets where IVF training would normally be skipped.
+    #[tokio::test]
+    async fn search_returns_nearest() {
+        let path = temp_path("search");
+        let store = LanceVectorStore::new(path.clone());
+        store.migrate_from_parquet_bytes(&synth_parquet_bytes(8, 4)).await.expect("migrate");
+
+        let target: Vec<f32> = (0..4).map(|j| (5 * 4 + j) as f32 * 0.01).collect();
+        let hits = store.search(&target, 3, None, None).await.expect("search");
+        assert!(!hits.is_empty(), "search must return at least 1 hit");
+        assert_eq!(hits[0].doc_id, "DOC-0005",
+            "exact-vector match should be top-1; got {hits:?}");
+
+        let _ = std::fs::remove_dir_all(&path);
+    }
+
+    /// stats() summarizes the dataset state in one call. Locks the
+    /// field shape so downstream consumers don't break on a rename.
+ #[tokio::test] + async fn stats_reports_post_migrate_state() { + let path = temp_path("stats"); + let store = LanceVectorStore::new(path.clone()); + store.migrate_from_parquet_bytes(&synth_parquet_bytes(5, 4)).await.expect("migrate"); + + let s = store.stats().await.expect("stats"); + assert_eq!(s.rows, 5); + assert!(s.disk_bytes > 0); + assert!(!s.has_vector_index, "no vector index built yet"); + + let _ = std::fs::remove_dir_all(&path); + } +} diff --git a/crates/vectord/src/pathway_memory.rs b/crates/vectord/src/pathway_memory.rs index 603dfa4..52ed646 100644 --- a/crates/vectord/src/pathway_memory.rs +++ b/crates/vectord/src/pathway_memory.rs @@ -925,7 +925,7 @@ mod tests { reject_reason: None, }]; let mut trace = PathwayTrace { - pathway_id, + pathway_id: pathway_id.clone(), task_class: "scrum_review".into(), file_path: format!("crates/{id_tag}/src/x.rs"), signal_class: Some("CONVERGING".into()), @@ -954,6 +954,14 @@ mod tests { replay_count: replays, replays_succeeded: succ, retired: false, + // Versioning fields added by Mem0 wave (commit 6ac7f61) — defaults + // mirror "this trace is the live head with no parent/successor". + trace_uid: format!("test-{pathway_id}"), + version: 1, + parent_trace_uid: None, + superseded_at: None, + superseded_by_trace_uid: None, + retirement_reason: None, }; trace.pathway_vec = build_pathway_vec(&trace); trace diff --git a/crates/vectord/src/rag.rs b/crates/vectord/src/rag.rs index 007329a..286c00d 100644 --- a/crates/vectord/src/rag.rs +++ b/crates/vectord/src/rag.rs @@ -163,7 +163,11 @@ pub async fn query( // production caller of the Phase 21 primitives — see audit finding // "Phase 21 Rust primitives are wired but not CALLED by any // production surface" from 2026-04-21. - let mut cont_opts = ContinuableOpts::new("qwen2.5:latest"); + // 2026-04-30 model bump: qwen2.5:latest → qwen3.5:latest to match + // the small-model-pipeline local-tier default. Same JSON-clean + // property, more capacity. 
think=Some(false) preserved — RAG hot + // path doesn't need reasoning traces; direct answers only. + let mut cont_opts = ContinuableOpts::new("qwen3.5:latest"); cont_opts.max_tokens = Some(512); cont_opts.temperature = Some(0.2); cont_opts.shape = ResponseShape::Text; @@ -176,7 +180,7 @@ pub async fn query( // echoes whatever Ollama loaded). Use the configured tier model // for now; if RAG needs to report the actual resolved model, // the runner can add a post-call ps probe later. - model: "qwen2.5:latest".to_string(), + model: "qwen3.5:latest".to_string(), sources: results, tokens_generated: None, }) diff --git a/crates/vectord/src/service.rs b/crates/vectord/src/service.rs index 20fe7bd..3f422de 100644 --- a/crates/vectord/src/service.rs +++ b/crates/vectord/src/service.rs @@ -1855,10 +1855,10 @@ async fn lance_migrate( .map_err(|e| (StatusCode::NOT_FOUND, format!("read parquet: {e}")))?; let lance_store = state.lance.store_for_new(&index_name, &bucket).await - .map_err(|e| (StatusCode::BAD_REQUEST, e))?; + .map_err(|e| sanitize_lance_err(e, &index_name))?; let stats = lance_store.migrate_from_parquet_bytes(&bytes).await - .map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e))?; + .map_err(|e| sanitize_lance_err(e, &index_name))?; tracing::info!( "lance migrate '{}': {} rows, {}d, {} bytes on disk, {:.2}s", @@ -1866,11 +1866,40 @@ async fn lance_migrate( stats.disk_bytes, stats.duration_secs, ); + // Auto-build the doc_id btree. The scalar index is what makes + // get_doc_by_id O(log n) instead of a full table scan; ADR-019 + // calls this out as the load-bearing feature for hybrid lookup. + // Verified 2026-05-02: skipping this on a 10M-row dataset turns + // ~5ms doc-fetch into ~100ms (full scan over 35GB). Cheap to + // build (~1.2s on 10M, +269MB on disk) and only runs once per + // dataset since `has_scalar_index` short-circuits subsequent calls. 
+ let scalar_stats = if !lance_store.has_scalar_index("doc_id").await.unwrap_or(false) { + match lance_store.build_scalar_index("doc_id").await { + Ok(s) => { + tracing::info!( + "lance migrate '{}': doc_id btree built in {:.2}s (+{} bytes)", + index_name, s.build_time_secs, s.disk_bytes_added, + ); + Some(s) + } + Err(e) => { + // Don't fail the whole migrate over a missing btree — + // the dataset is still queryable, just slowly. Log it + // so it's debuggable. + tracing::warn!("lance migrate '{}': doc_id btree build failed (will fall back to scan): {e}", index_name); + None + } + } + } else { + None + }; + Ok::<_, (StatusCode, String)>(Json(serde_json::json!({ "index_name": index_name, "bucket": bucket, "lance_path": lance_store.path(), "stats": stats, + "scalar_index": scalar_stats, }))) } @@ -1888,6 +1917,300 @@ fn default_partitions() -> u32 { 316 } // ≈√100K — sane for the referenc fn default_bits() -> u32 { 8 } fn default_subvectors() -> u32 { 48 } // 768/48 = 16 dims per subvector +/// Sanitize a Lance backend error before returning it to the HTTP +/// caller. Two responsibilities: +/// +/// 1. Map "dataset not found" patterns to HTTP 404 instead of 500. +/// A missing index isn't an internal failure — it's a resource +/// lookup miss, and the response code should reflect that. +/// 2. Strip server-side filesystem paths and Rust crate registry +/// paths (`/root/.cargo/registry/src/index.crates.io-...`) from +/// the message body. An attacker probing the surface shouldn't +/// learn the server's directory layout or our exact dep versions. +/// +/// Surfaced 2026-05-02 by the Lance backend audit: missing-index +/// search returned 500 + leaked the lakehouse data path AND the +/// .cargo/registry path with crate versions. +fn sanitize_lance_err(err: String, index_name: &str) -> (StatusCode, String) { + // 404 detection — narrowed across two 2026-05-02→03 scrum waves. 
+ // First wave (opus WARN service.rs:1908): the original `lower.contains + // ("not found")` was too broad — caught "column not found" / + // "field not found in schema" which are real 500s. Second wave (opus + // WARN service.rs:1949): the looser `mentions_path_missing` branch I + // added would 404 on a registry-file error like "/root/.cargo/.../x.rs: + // no such file or directory" because it triggers without dataset + // context. Drop the standalone path-missing branch; require dataset + // context AND a missing-shape phrase. Lance's actual error format + // ("Dataset at path X was not found") satisfies this. + let lower = err.to_lowercase(); + let mentions_dataset = lower.contains("dataset"); + let lance_dataset_missing = mentions_dataset && ( + lower.contains("not found") || lower.contains("does not exist") + ); + // Excluded shapes — these contain "not found" but are real 500s. + let column_or_field = lower.contains("column not found") + || lower.contains("field not found") + || lower.contains("schema not found"); + let is_not_found = lance_dataset_missing && !column_or_field; + if is_not_found { + return (StatusCode::NOT_FOUND, format!("lance dataset not found: {index_name}")); + } + + // Path redaction — replace path-shaped substrings with [REDACTED] + // rather than truncating, per opus BLOCK at service.rs:1914 from the + // 2026-05-02 scrum. The previous `err.split("/home/").next()` returned + // Some("") when the error string STARTED with "/home/", erasing the + // entire message and falling back to a generic "lance backend error" + // that lost all real error context. Replacing keeps the structural + // error (the "what failed") while stripping the location. 
+ let cleaned = redact_paths(&err) + .trim_end_matches([',', ' ', '\n', '\t']) + .to_string(); + let msg = if cleaned.is_empty() { + format!("lance backend error on {index_name}") + } else { + cleaned + }; + (StatusCode::INTERNAL_SERVER_ERROR, msg) +} + +/// Replace absolute-path substrings (under known leak-prone roots) with +/// "[REDACTED]". Walks the input once, identifying path-shaped runs that +/// start with one of the configured prefixes and continue until a +/// path-terminating character (whitespace, quote, comma, paren, EOL). +/// +/// Linear time, no regex dep. Catches multi-occurrence cases that +/// `String::split(p).next()` lost. The path-redaction surface intentionally +/// includes /var, /tmp, /etc, /usr, /opt in addition to /home and +/// /root/.cargo because Lance/Arrow errors surface system paths in +/// addition to project paths. +fn redact_paths(s: &str) -> String { + // Two prefix sets: + // - ABSOLUTE: paths starting with '/' (always safe to redact) + // - RELATIVE: same path bodies but without leading '/' (Lance occasionally + // strips the leading slash when echoing dataset paths back, observed + // live 2026-05-02 — "Dataset at path home/profit/lakehouse/data/lance/x + // was not found"). Match these only when preceded by a non-alpha char + // (start of string, space, colon, etc.) so we don't redact innocent + // tokens like "homecoming" or "etcetera". + const ABSOLUTE: &[&str] = &[ + "/root/.cargo", "/home", "/var", "/tmp", "/etc", "/usr", "/opt", + ]; + const RELATIVE: &[&str] = &[ + "root/.cargo", "home/", "var/", "tmp/", "etc/", "usr/", "opt/", + ]; + fn is_path_term(b: u8) -> bool { + matches!(b, b' ' | b'\t' | b'\n' | b'\r' | b'"' | b'\'' | b',' | b')' | b']' | b'}') + } + fn is_word_boundary_before(bytes: &[u8], i: usize) -> bool { + // True if byte at i-1 is non-alphanumeric (so this position starts + // a fresh token). True at start-of-input. 
+ if i == 0 { return true; } + let b = bytes[i - 1]; + !(b.is_ascii_alphanumeric() || b == b'_' || b == b'.' || b == b'-') + } + // Walk by byte index but slice the original &str when emitting, never + // cast bytes to char (that would corrupt multi-byte UTF-8 — opus WARN + // at service.rs:2018 from the 2026-05-03 re-scrum). Path prefixes are + // pure ASCII so byte-level matching is sound; what matters is that + // we emit non-matched stretches as &str slices, not byte-by-byte. + let bytes = s.as_bytes(); + let mut out = String::with_capacity(s.len()); + let mut i = 0; + let mut copy_start = 0usize; // start of an in-progress unmatched run + while i < bytes.len() { + let mut matched_len: Option<usize> = None; + // Try absolute prefixes first (always allowed). + for p in ABSOLUTE { + let pb = p.as_bytes(); + if i + pb.len() <= bytes.len() && &bytes[i..i + pb.len()] == pb { + let after = i + pb.len(); + if after == bytes.len() || bytes[after] == b'/' || is_path_term(bytes[after]) { + matched_len = Some(pb.len()); + break; + } + } + } + // Then relative prefixes — only at word boundaries. + if matched_len.is_none() && is_word_boundary_before(bytes, i) { + for p in RELATIVE { + let pb = p.as_bytes(); + if i + pb.len() <= bytes.len() && &bytes[i..i + pb.len()] == pb { + matched_len = Some(pb.len()); + break; + } + } + } + if let Some(prefix_len) = matched_len { + // Flush any pending unmatched run as a UTF-8-safe slice. + if copy_start < i { + out.push_str(&s[copy_start..i]); + } + out.push_str("[REDACTED]"); + // Skip past the prefix and the path body (until terminator). + let mut j = i + prefix_len; + while j < bytes.len() && !is_path_term(bytes[j]) { + j += 1; + } + i = j; + copy_start = i; + } else { + // Advance one CHAR (not one byte) so multi-byte UTF-8 sequences + // stay intact in the eventual slice. Look up the next char + // boundary using the public API. 
+ i += utf8_char_len(bytes, i); + } + } + if copy_start < bytes.len() { + out.push_str(&s[copy_start..]); + } + out +} + +/// Length in bytes of the UTF-8 character starting at byte `i`. Bytes are +/// guaranteed to be a valid UTF-8 sequence start (callers ensure that). +fn utf8_char_len(bytes: &[u8], i: usize) -> usize { + let b = bytes[i]; + if b < 0x80 { 1 } + else if b < 0xC0 { 1 } // continuation byte — defensive, shouldn't start here + else if b < 0xE0 { 2 } + else if b < 0xF0 { 3 } + else { 4 } +} + +#[cfg(test)] +mod sanitize_tests { + use super::*; + + #[test] + fn redact_path_at_offset_zero() { + // Regression: opus BLOCK 2026-05-02. Old impl returned Some("") + // when err started with "/home/", erasing the whole message. + let out = redact_paths("/home/profit/lakehouse/data/lance not a directory"); + assert_eq!(out, "[REDACTED] not a directory"); + } + + #[test] + fn redact_keeps_pre_and_post_text() { + let out = redact_paths("failed to open /home/profit/lakehouse/data/x for read: ENOENT"); + assert_eq!(out, "failed to open [REDACTED] for read: ENOENT"); + } + + #[test] + fn redact_multiple_paths() { + let out = redact_paths("at /root/.cargo/registry/src/index.crates.io-foo/lance-table-4.0.0/src/io/commit.rs:364:26 from /home/profit/lakehouse"); + assert!(!out.contains("/root/.cargo")); + assert!(!out.contains("/home/")); + assert!(out.contains("[REDACTED]")); + } + + #[test] + fn redact_preserves_quote_terminator() { + let out = redact_paths("{\"path\":\"/home/profit/x\",\"err\":\"bad\"}"); + assert_eq!(out, "{\"path\":\"[REDACTED]\",\"err\":\"bad\"}"); + } + + #[test] + fn is_not_found_narrow_dataset_only() { + // Regression: opus WARN 2026-05-02. Old impl 404'd on any "not + // found" — including legitimate column/field-not-found 500s. 
+ let (status, _) = sanitize_lance_err( + "column not found: vector".into(), "test_idx", + ); + assert_eq!(status, StatusCode::INTERNAL_SERVER_ERROR); + + let (status, _) = sanitize_lance_err( + "dataset not found at /home/profit/lakehouse/data/lance/missing".into(), "test_idx", + ); + assert_eq!(status, StatusCode::NOT_FOUND); + } + + #[test] + fn redact_does_not_match_prefix_substring() { + // /etcetera should NOT trigger /etc redaction. + let out = redact_paths("etcetera and /etcd"); + assert_eq!(out, "etcetera and /etcd"); + } + + #[test] + fn redact_relative_paths_lance_emits() { + // 2026-05-02: live missing-index probe surfaced Lance error of the + // form "Dataset at path home/profit/lakehouse/data/lance/x was not + // found" — leading slash stripped. Need to redact the relative form + // when preceded by a word boundary. + let out = redact_paths("Dataset at path home/profit/lakehouse/data/lance/x was not found"); + assert!(!out.contains("home/profit"), "should redact: {out}"); + assert!(out.contains("Dataset at path")); + assert!(out.contains("was not found")); + } + + #[test] + fn redact_does_not_eat_innocent_prefix_words() { + // "homecoming" must NOT trigger "home/" redaction. "Etcetera" must + // NOT trigger "etc/" redaction. The word-boundary guard handles this. + let out = redact_paths("homecoming etcetera vary tmpfile"); + assert_eq!(out, "homecoming etcetera vary tmpfile"); + } + + #[test] + fn is_not_found_lance_actual_phrasing() { + // Lance's actual error format observed live: "Dataset at path X was + // not found: Not found: ...". Must 404, not 500. + let (status, _) = sanitize_lance_err( + "Dataset at path home/profit/lakehouse/data/lance/x was not found".into(), + "x", + ); + assert_eq!(status, StatusCode::NOT_FOUND); + } + + #[test] + fn is_not_found_excludes_column_field_schema() { + // Real 500s with the "not found" phrase that aren't dataset-missing. 
+ for err in [ + "column not found: vector", + "field not found in schema: doc_id", + "schema not found for dataset xyz", + ] { + let (status, _) = sanitize_lance_err(err.into(), "test_idx"); + assert_eq!(status, StatusCode::INTERNAL_SERVER_ERROR, "{err}"); + } + } + + #[test] + fn is_not_found_does_not_match_unrelated_path_missing() { + // Regression: opus WARN at service.rs:1949 from the 2026-05-03 + // re-scrum. A registry-file error from inside a Lance internal + // module should NOT be coerced to 404 just because it contains + // "no such file or directory" — it's a real 500. + let (status, _) = sanitize_lance_err( + "/root/.cargo/registry/src/index.crates.io-foo/lance-table-4.0.0/src/io/commit.rs: no such file or directory".into(), + "test_idx", + ); + assert_eq!(status, StatusCode::INTERNAL_SERVER_ERROR); + // (And the path is still redacted in the message.) + let (_, msg) = sanitize_lance_err( + "/root/.cargo/registry/src/lance-foo/x.rs: no such file or directory".into(), + "test_idx", + ); + assert!(!msg.contains("/root/.cargo"), "path leak: {msg}"); + } + + #[test] + fn redact_preserves_multibyte_utf8() { + // Regression: opus WARN at service.rs:2018 from the 2026-05-03 + // re-scrum. Old impl did `out.push(bytes[i] as char)` which + // corrupted multi-byte UTF-8 (e.g. a path containing user-supplied + // names with non-ASCII characters) into Latin-1 mojibake. + let input = "Failed to open /home/profit/工作/data — café not found"; + let out = redact_paths(input); + // The path is redacted... + assert!(!out.contains("/home/profit"), "path leak: {out}"); + // ...AND the multi-byte characters elsewhere are preserved verbatim. + assert!(out.contains("café"), "lost UTF-8: {out}"); + assert!(out.contains("not found"), "lost trailing context: {out}"); + } +} + /// Build the IVF_PQ index on the Lance dataset. 
async fn lance_build_index( State(state): State, Json(req): Json, ) -> impl IntoResponse { let lance_store = state.lance.store_for(&index_name).await - .map_err(|e| (StatusCode::BAD_REQUEST, e))?; + .map_err(|e| sanitize_lance_err(e, &index_name))?; match lance_store.build_index(req.num_partitions, req.num_bits, req.num_sub_vectors).await { Ok(stats) => Ok(Json(stats)), - Err(e) => Err((StatusCode::INTERNAL_SERVER_ERROR, e)), + Err(e) => Err(sanitize_lance_err(e, &index_name)), } } @@ -1947,13 +2270,13 @@ async fn lance_search( let qv: Vec<f32> = embed_resp.embeddings[0].iter().map(|&x| x as f32).collect(); let lance_store = state.lance.store_for(&index_name).await - .map_err(|e| (StatusCode::BAD_REQUEST, e))?; + .map_err(|e| sanitize_lance_err(e, &index_name))?; let t0 = std::time::Instant::now(); let nprobes = req.nprobes.or(Some(LANCE_DEFAULT_NPROBES)); let refine = req.refine_factor.or(Some(LANCE_DEFAULT_REFINE_FACTOR)); let hits = lance_store.search(&qv, req.top_k, nprobes, refine).await - .map_err(|e| (StatusCode::INTERNAL_SERVER_ERROR, e))?; + .map_err(|e| sanitize_lance_err(e, &index_name))?; Ok(Json(serde_json::json!({ "index_name": index_name, @@ -1971,7 +2294,7 @@ async fn lance_get_doc( Path((index_name, doc_id)): Path<(String, String)>, ) -> impl IntoResponse { let lance_store = state.lance.store_for(&index_name).await - .map_err(|e| (StatusCode::BAD_REQUEST, e))?; + .map_err(|e| sanitize_lance_err(e, &index_name))?; let t0 = std::time::Instant::now(); match lance_store.get_by_doc_id(&doc_id).await { Ok(Some(row)) => Ok(Json(serde_json::json!({ @@ -1981,7 +2304,7 @@ async fn lance_get_doc( "row": row, }))), Ok(None) => Err((StatusCode::NOT_FOUND, format!("doc_id not found: {doc_id}"))), - Err(e) => Err((StatusCode::INTERNAL_SERVER_ERROR, e)), + Err(e) => Err(sanitize_lance_err(e, &index_name)), } } @@ -2013,7 +2336,7 @@ async fn lance_append( return Err((StatusCode::BAD_REQUEST, "rows array is
empty".into())); } let lance_store = state.lance.store_for(&index_name).await - .map_err(|e| (StatusCode::BAD_REQUEST, e))?; + .map_err(|e| sanitize_lance_err(e, &index_name))?; let mut doc_ids = Vec::with_capacity(req.rows.len()); let mut chunk_idxs = Vec::with_capacity(req.rows.len()); diff --git a/data/_catalog/manifests/32ee74a0-59b4-4e5b-8edb-70c9347a4bf3.json b/data/_catalog/manifests/32ee74a0-59b4-4e5b-8edb-70c9347a4bf3.json index 7dcb03d..6dce197 100644 --- a/data/_catalog/manifests/32ee74a0-59b4-4e5b-8edb-70c9347a4bf3.json +++ b/data/_catalog/manifests/32ee74a0-59b4-4e5b-8edb-70c9347a4bf3.json @@ -11,15 +11,51 @@ } ], "created_at": "2026-04-20T11:07:57.308050648Z", - "updated_at": "2026-04-22T03:28:28.343843823Z", + "updated_at": "2026-04-28T01:28:31.280305207Z", "description": "", "owner": "", "sensitivity": null, - "columns": [], + "columns": [ + { + "name": "timestamp", + "data_type": "Utf8", + "sensitivity": null, + "description": "", + "is_pii": false + }, + { + "name": "operation", + "data_type": "Utf8", + "sensitivity": null, + "description": "", + "is_pii": false + }, + { + "name": "approach", + "data_type": "Utf8", + "sensitivity": null, + "description": "", + "is_pii": false + }, + { + "name": "result", + "data_type": "Utf8", + "sensitivity": null, + "description": "", + "is_pii": false + }, + { + "name": "context", + "data_type": "Utf8", + "sensitivity": null, + "description": "", + "is_pii": false + } + ], "lineage": null, "freshness": null, "tags": [], - "row_count": null, + "row_count": 2077, "last_embedded_at": null, "embedding_stale_since": null, "embedding_refresh_policy": null diff --git a/data/_catalog/manifests/564b00ae-cbf3-4efd-aa55-84cdb6d2b0b7.json b/data/_catalog/manifests/564b00ae-cbf3-4efd-aa55-84cdb6d2b0b7.json deleted file mode 100644 index fc35735..0000000 --- a/data/_catalog/manifests/564b00ae-cbf3-4efd-aa55-84cdb6d2b0b7.json +++ /dev/null @@ -1,117 +0,0 @@ -{ - "id": "564b00ae-cbf3-4efd-aa55-84cdb6d2b0b7", - "name": 
"client_workerskjkk", - "schema_fingerprint": "cdfe85348885ddf329e5e6e9bf0e2c75c92d1a86fdb0fd3875ed46e3f93c4d82", - "objects": [ - { - "bucket": "primary", - "key": "datasets/client_workerskjkk.parquet", - "size_bytes": 32201, - "created_at": "2026-04-21T00:49:04.623625149Z" - } - ], - "created_at": "2026-04-21T00:49:04.623626738Z", - "updated_at": "2026-04-21T00:49:04.623901788Z", - "description": "", - "owner": "", - "sensitivity": "pii", - "columns": [ - { - "name": "worker_id", - "data_type": "Utf8", - "sensitivity": null, - "description": "", - "is_pii": false - }, - { - "name": "name", - "data_type": "Utf8", - "sensitivity": "pii", - "description": "", - "is_pii": true - }, - { - "name": "role", - "data_type": "Utf8", - "sensitivity": null, - "description": "", - "is_pii": false - }, - { - "name": "city", - "data_type": "Utf8", - "sensitivity": null, - "description": "", - "is_pii": false - }, - { - "name": "state", - "data_type": "Utf8", - "sensitivity": null, - "description": "", - "is_pii": false - }, - { - "name": "email", - "data_type": "Utf8", - "sensitivity": "pii", - "description": "", - "is_pii": true - }, - { - "name": "phone", - "data_type": "Utf8", - "sensitivity": "pii", - "description": "", - "is_pii": true - }, - { - "name": "skills", - "data_type": "Utf8", - "sensitivity": null, - "description": "", - "is_pii": false - }, - { - "name": "certifications", - "data_type": "Utf8", - "sensitivity": null, - "description": "", - "is_pii": false - }, - { - "name": "availability", - "data_type": "Float64", - "sensitivity": null, - "description": "", - "is_pii": false - }, - { - "name": "reliability", - "data_type": "Float64", - "sensitivity": null, - "description": "", - "is_pii": false - }, - { - "name": "archetype", - "data_type": "Utf8", - "sensitivity": null, - "description": "", - "is_pii": false - } - ], - "lineage": { - "source_system": "csv", - "source_file": "staffing_roster_sample.csv", - "ingest_job": "ingest-1776732544623", - 
"ingest_timestamp": "2026-04-21T00:49:04.623625149Z", - "parent_datasets": [] - }, - "freshness": null, - "tags": [], - "row_count": 180, - "last_embedded_at": null, - "embedding_stale_since": null, - "embedding_refresh_policy": null -} \ No newline at end of file diff --git a/data/headshots/manifest.jsonl b/data/headshots/manifest.jsonl new file mode 100644 index 0000000..6105d10 --- /dev/null +++ b/data/headshots/manifest.jsonl @@ -0,0 +1,1000 @@ +{"id": 0, "file": "face_0000.jpg", "gender": "woman", "race": "east_asian", "age": 33} +{"id": 1, "file": "face_0001.jpg", "gender": "man", "race": "east_asian", "age": 24} +{"id": 2, "file": "face_0002.jpg", "gender": "woman", "race": "east_asian", "age": 33} +{"id": 3, "file": "face_0003.jpg", "gender": "man", "race": "east_asian", "age": 26} +{"id": 4, "file": "face_0004.jpg", "gender": "man", "race": "caucasian", "age": 44} +{"id": 5, "file": "face_0005.jpg", "gender": "man", "race": "caucasian", "age": 31} +{"id": 6, "file": "face_0006.jpg", "gender": "woman", "race": "caucasian", "age": 33} +{"id": 7, "file": "face_0007.jpg", "gender": "woman", "race": "caucasian", "age": 28} +{"id": 8, "file": "face_0008.jpg", "gender": "woman", "race": "caucasian", "age": 17, "excluded": "minor"} +{"id": 9, "file": "face_0009.jpg", "gender": "woman", "race": "caucasian", "age": 28} +{"id": 10, "file": "face_0010.jpg", "gender": "man", "race": "caucasian", "age": 24} +{"id": 11, "file": "face_0011.jpg", "gender": "man", "race": "caucasian", "age": 33} +{"id": 12, "file": "face_0012.jpg", "gender": "woman", "race": "caucasian", "age": 34} +{"id": 13, "file": "face_0013.jpg", "gender": "man", "race": "caucasian", "age": 33} +{"id": 14, "file": "face_0014.jpg", "gender": "woman", "race": "caucasian", "age": 35} +{"id": 15, "file": "face_0015.jpg", "gender": "woman", "race": "caucasian", "age": 19, "excluded": "minor"} +{"id": 16, "file": "face_0016.jpg", "gender": "woman", "race": "caucasian", "age": 19, "excluded": "minor"} +{"id": 
17, "file": "face_0017.jpg", "gender": "woman", "race": "hispanic", "age": 30} +{"id": 18, "file": "face_0018.jpg", "gender": "man", "race": "caucasian", "age": 23} +{"id": 19, "file": "face_0019.jpg", "gender": "man", "race": "caucasian", "age": 41} +{"id": 20, "file": "face_0020.jpg", "gender": "woman", "race": "hispanic", "age": 30} +{"id": 21, "file": "face_0021.jpg", "gender": "woman", "race": "caucasian", "age": 35} +{"id": 22, "file": "face_0022.jpg", "gender": "man", "race": "middle_eastern", "age": 26} +{"id": 23, "file": "face_0023.jpg", "gender": "woman", "race": "caucasian", "age": 33} +{"id": 24, "file": "face_0024.jpg", "gender": "woman", "race": "caucasian", "age": 36} +{"id": 25, "file": "face_0025.jpg", "gender": "woman", "race": "caucasian", "age": 35} +{"id": 26, "file": "face_0026.jpg", "gender": "woman", "race": "caucasian", "age": 35} +{"id": 27, "file": "face_0027.jpg", "gender": "man", "race": "caucasian", "age": 38} +{"id": 28, "file": "face_0028.jpg", "gender": "woman", "race": "caucasian", "age": 30} +{"id": 29, "file": "face_0029.jpg", "gender": "man", "race": "middle_eastern", "age": 28} +{"id": 30, "file": "face_0030.jpg", "gender": "man", "race": "caucasian", "age": 38} +{"id": 31, "file": "face_0031.jpg", "gender": "man", "race": "caucasian", "age": 22} +{"id": 32, "file": "face_0032.jpg", "gender": "man", "race": "caucasian", "age": 22} +{"id": 33, "file": "face_0033.jpg", "gender": "woman", "race": "south_asian", "age": 31} +{"id": 34, "file": "face_0034.jpg", "gender": "woman", "race": "east_asian", "age": 23} +{"id": 35, "file": "face_0035.jpg", "gender": "woman", "race": "caucasian", "age": 24} +{"id": 36, "file": "face_0036.jpg", "gender": "woman", "race": "east_asian", "age": 32} +{"id": 37, "file": "face_0037.jpg", "gender": "man", "race": "caucasian", "age": 27} +{"id": 38, "file": "face_0038.jpg", "gender": "man", "race": "east_asian", "age": 26} +{"id": 39, "file": "face_0039.jpg", "gender": "man", "race": "east_asian", 
"age": 26} +{"id": 40, "file": "face_0040.jpg", "gender": "man", "race": "middle_eastern", "age": 26} +{"id": 41, "file": "face_0041.jpg", "gender": "man", "race": "middle_eastern", "age": 26} +{"id": 42, "file": "face_0042.jpg", "gender": "woman", "race": "caucasian", "age": 33} +{"id": 43, "file": "face_0043.jpg", "gender": "woman", "race": "east_asian", "age": 31} +{"id": 44, "file": "face_0044.jpg", "gender": "man", "race": "caucasian", "age": 27} +{"id": 45, "file": "face_0045.jpg", "gender": "man", "race": "caucasian", "age": 27} +{"id": 46, "file": "face_0046.jpg", "gender": "woman", "race": "caucasian", "age": 22} +{"id": 47, "file": "face_0047.jpg", "gender": "woman", "race": "caucasian", "age": 22} +{"id": 48, "file": "face_0048.jpg", "gender": "woman", "race": "east_asian", "age": 32} +{"id": 49, "file": "face_0049.jpg", "gender": "woman", "race": "east_asian", "age": 32} +{"id": 50, "file": "face_0050.jpg", "gender": "man", "race": "caucasian", "age": 24} +{"id": 51, "file": "face_0051.jpg", "gender": "man", "race": "caucasian", "age": 24} +{"id": 52, "file": "face_0052.jpg", "gender": "man", "race": "caucasian", "age": 24} +{"id": 53, "file": "face_0053.jpg", "gender": "man", "race": "caucasian", "age": 28} +{"id": 54, "file": "face_0054.jpg", "gender": "man", "race": "caucasian", "age": 28} +{"id": 55, "file": "face_0055.jpg", "gender": "woman", "race": "caucasian", "age": 23} +{"id": 56, "file": "face_0056.jpg", "gender": "woman", "race": "caucasian", "age": 23} +{"id": 57, "file": "face_0057.jpg", "gender": "woman", "race": "caucasian", "age": 23} +{"id": 58, "file": "face_0058.jpg", "gender": "man", "race": "hispanic", "age": 43} +{"id": 59, "file": "face_0059.jpg", "gender": "man", "race": "hispanic", "age": 43} +{"id": 60, "file": "face_0060.jpg", "gender": "woman", "race": "caucasian", "age": 23} +{"id": 61, "file": "face_0061.jpg", "gender": "man", "race": "east_asian", "age": 35} +{"id": 62, "file": "face_0062.jpg", "gender": "man", "race": 
"east_asian", "age": 35} +{"id": 63, "file": "face_0063.jpg", "gender": "man", "race": "hispanic", "age": 39} +{"id": 64, "file": "face_0064.jpg", "gender": "man", "race": "hispanic", "age": 39} +{"id": 65, "file": "face_0065.jpg", "gender": "man", "race": "hispanic", "age": 31} +{"id": 66, "file": "face_0066.jpg", "gender": "man", "race": "hispanic", "age": 31} +{"id": 67, "file": "face_0067.jpg", "gender": "woman", "race": "caucasian", "age": 32} +{"id": 68, "file": "face_0068.jpg", "gender": "man", "race": "caucasian", "age": 23} +{"id": 69, "file": "face_0069.jpg", "gender": "man", "race": "caucasian", "age": 23} +{"id": 70, "file": "face_0070.jpg", "gender": "man", "race": "east_asian", "age": 14, "excluded": "minor"} +{"id": 71, "file": "face_0071.jpg", "gender": "man", "race": "middle_eastern", "age": 26} +{"id": 72, "file": "face_0072.jpg", "gender": "man", "race": "middle_eastern", "age": 26} +{"id": 73, "file": "face_0073.jpg", "gender": "man", "race": "middle_eastern", "age": 26} +{"id": 74, "file": "face_0074.jpg", "gender": "woman", "race": "east_asian", "age": 28} +{"id": 75, "file": "face_0075.jpg", "gender": "woman", "race": "east_asian", "age": 28} +{"id": 76, "file": "face_0076.jpg", "gender": "woman", "race": "east_asian", "age": 28} +{"id": 77, "file": "face_0077.jpg", "gender": "woman", "race": "east_asian", "age": 28} +{"id": 78, "file": "face_0078.jpg", "gender": "woman", "race": "east_asian", "age": 34} +{"id": 79, "file": "face_0079.jpg", "gender": "woman", "race": "east_asian", "age": 34} +{"id": 80, "file": "face_0080.jpg", "gender": "woman", "race": "east_asian", "age": 32} +{"id": 81, "file": "face_0081.jpg", "gender": "woman", "race": "east_asian", "age": 32} +{"id": 82, "file": "face_0082.jpg", "gender": "woman", "race": "caucasian", "age": 34} +{"id": 83, "file": "face_0083.jpg", "gender": "woman", "race": "caucasian", "age": 32} +{"id": 84, "file": "face_0084.jpg", "gender": "man", "race": "caucasian", "age": 27} +{"id": 85, "file": 
"face_0085.jpg", "gender": "man", "race": "caucasian", "age": 27} +{"id": 86, "file": "face_0086.jpg", "gender": "man", "race": "caucasian", "age": 31} +{"id": 87, "file": "face_0087.jpg", "gender": "man", "race": "caucasian", "age": 31} +{"id": 88, "file": "face_0088.jpg", "gender": "man", "race": "caucasian", "age": 45} +{"id": 89, "file": "face_0089.jpg", "gender": "man", "race": "caucasian", "age": 45} +{"id": 90, "file": "face_0090.jpg", "gender": "woman", "race": "caucasian", "age": 23} +{"id": 91, "file": "face_0091.jpg", "gender": "woman", "race": "caucasian", "age": 23} +{"id": 92, "file": "face_0092.jpg", "gender": "man", "race": "caucasian", "age": 30} +{"id": 93, "file": "face_0093.jpg", "gender": "man", "race": "caucasian", "age": 30} +{"id": 94, "file": "face_0094.jpg", "gender": "man", "race": "caucasian", "age": 30} +{"id": 95, "file": "face_0095.jpg", "gender": "man", "race": "caucasian", "age": 21, "excluded": "minor"} +{"id": 96, "file": "face_0096.jpg", "gender": "man", "race": "caucasian", "age": 21, "excluded": "minor"} +{"id": 97, "file": "face_0097.jpg", "gender": "woman", "race": "caucasian", "age": 33} +{"id": 98, "file": "face_0098.jpg", "gender": "woman", "race": "caucasian", "age": 33} +{"id": 99, "file": "face_0099.jpg", "gender": "woman", "race": "caucasian", "age": 33} +{"id": 100, "file": "face_0100.jpg", "gender": "woman", "race": "east_asian", "age": 31} +{"id": 101, "file": "face_0101.jpg", "gender": "woman", "race": "east_asian", "age": 31} +{"id": 102, "file": "face_0102.jpg", "gender": "man", "race": "caucasian", "age": 42} +{"id": 103, "file": "face_0103.jpg", "gender": "woman", "race": "hispanic", "age": 36} +{"id": 104, "file": "face_0104.jpg", "gender": "woman", "race": "hispanic", "age": 36} +{"id": 105, "file": "face_0105.jpg", "gender": "woman", "race": "caucasian", "age": 33} +{"id": 106, "file": "face_0106.jpg", "gender": "woman", "race": "caucasian", "age": 33} +{"id": 107, "file": "face_0107.jpg", "gender": "man", 
"race": "caucasian", "age": 25}
+{"id": 108, "file": "face_0108.jpg", "gender": "man", "race": "caucasian", "age": 25}
+{"id": 109, "file": "face_0109.jpg", "gender": "woman", "race": "caucasian", "age": 27}
+{"id": 110, "file": "face_0110.jpg", "gender": "woman", "race": "caucasian", "age": 27}
+{"id": 111, "file": "face_0111.jpg", "gender": "man", "race": "caucasian", "age": 40}
+{"id": 112, "file": "face_0112.jpg", "gender": "man", "race": "caucasian", "age": 40}
+{"id": 113, "file": "face_0113.jpg", "gender": "man", "race": "caucasian", "age": 29}
+{"id": 114, "file": "face_0114.jpg", "gender": "man", "race": "caucasian", "age": 29}
+{"id": 115, "file": "face_0115.jpg", "gender": "woman", "race": "caucasian", "age": 27}
+{"id": 116, "file": "face_0116.jpg", "gender": "woman", "race": "caucasian", "age": 27}
+{"id": 117, "file": "face_0117.jpg", "gender": "man", "race": "east_asian", "age": 28}
+{"id": 118, "file": "face_0118.jpg", "gender": "man", "race": "east_asian", "age": 28}
+{"id": 119, "file": "face_0119.jpg", "gender": "woman", "race": "caucasian", "age": 39}
+{"id": 120, "file": "face_0120.jpg", "gender": "man", "race": "caucasian", "age": 26}
+{"id": 121, "file": "face_0121.jpg", "gender": "man", "race": "caucasian", "age": 30}
+{"id": 122, "file": "face_0122.jpg", "gender": "man", "race": "middle_eastern", "age": 29}
+{"id": 123, "file": "face_0123.jpg", "gender": "man", "race": "caucasian", "age": 40}
+{"id": 124, "file": "face_0124.jpg", "gender": "man", "race": "caucasian", "age": 27}
+{"id": 125, "file": "face_0125.jpg", "gender": "man", "race": "caucasian", "age": 42}
+{"id": 126, "file": "face_0126.jpg", "gender": "man", "race": "caucasian", "age": 28}
+{"id": 127, "file": "face_0127.jpg", "gender": "woman", "race": "caucasian", "age": 41}
+{"id": 128, "file": "face_0128.jpg", "gender": "woman", "race": "hispanic", "age": 36}
+{"id": 129, "file": "face_0129.jpg", "gender": "woman", "race": "hispanic", "age": 36}
+{"id": 130, "file": "face_0130.jpg", "gender": "woman", "race": "east_asian", "age": 26}
+{"id": 131, "file": "face_0131.jpg", "gender": "woman", "race": "east_asian", "age": 26}
+{"id": 132, "file": "face_0132.jpg", "gender": "woman", "race": "hispanic", "age": 27}
+{"id": 133, "file": "face_0133.jpg", "gender": "woman", "race": "hispanic", "age": 27}
+{"id": 134, "file": "face_0134.jpg", "gender": "woman", "race": "middle_eastern", "age": 31}
+{"id": 135, "file": "face_0135.jpg", "gender": "woman", "race": "middle_eastern", "age": 31}
+{"id": 136, "file": "face_0136.jpg", "gender": "woman", "race": "caucasian", "age": 29}
+{"id": 137, "file": "face_0137.jpg", "gender": "woman", "race": "caucasian", "age": 29}
+{"id": 138, "file": "face_0138.jpg", "gender": "man", "race": "east_asian", "age": 23}
+{"id": 139, "file": "face_0139.jpg", "gender": "woman", "race": "caucasian", "age": 27}
+{"id": 140, "file": "face_0140.jpg", "gender": "woman", "race": "caucasian", "age": 27}
+{"id": 141, "file": "face_0141.jpg", "gender": "woman", "race": "east_asian", "age": 26}
+{"id": 142, "file": "face_0142.jpg", "gender": "woman", "race": "east_asian", "age": 26}
+{"id": 143, "file": "face_0143.jpg", "gender": "man", "race": "caucasian", "age": 33}
+{"id": 144, "file": "face_0144.jpg", "gender": "man", "race": "caucasian", "age": 33}
+{"id": 145, "file": "face_0145.jpg", "gender": "woman", "race": "east_asian", "age": 17, "excluded": "minor"}
+{"id": 146, "file": "face_0146.jpg", "gender": "woman", "race": "caucasian", "age": 29}
+{"id": 147, "file": "face_0147.jpg", "gender": "man", "race": "hispanic", "age": 18, "excluded": "minor"}
+{"id": 148, "file": "face_0148.jpg", "gender": "man", "race": "hispanic", "age": 18, "excluded": "minor"}
+{"id": 149, "file": "face_0149.jpg", "gender": "woman", "race": "caucasian", "age": 35}
+{"id": 150, "file": "face_0150.jpg", "gender": "woman", "race": "caucasian", "age": 35}
+{"id": 151, "file": "face_0151.jpg", "gender": "woman", "race": "caucasian", "age": 31}
+{"id": 152, "file": "face_0152.jpg", "gender": "woman", "race": "caucasian", "age": 31}
+{"id": 153, "file": "face_0153.jpg", "gender": "woman", "race": "caucasian", "age": 46}
+{"id": 154, "file": "face_0154.jpg", "gender": "woman", "race": "caucasian", "age": 46}
+{"id": 155, "file": "face_0155.jpg", "gender": "man", "race": "caucasian", "age": 40}
+{"id": 156, "file": "face_0156.jpg", "gender": "woman", "race": "caucasian", "age": 26}
+{"id": 157, "file": "face_0157.jpg", "gender": "woman", "race": "caucasian", "age": 26}
+{"id": 158, "file": "face_0158.jpg", "gender": "man", "race": "hispanic", "age": 31}
+{"id": 159, "file": "face_0159.jpg", "gender": "man", "race": "hispanic", "age": 31}
+{"id": 160, "file": "face_0160.jpg", "gender": "man", "race": "caucasian", "age": 27}
+{"id": 161, "file": "face_0161.jpg", "gender": "man", "race": "caucasian", "age": 27}
+{"id": 162, "file": "face_0162.jpg", "gender": "man", "race": "middle_eastern", "age": 48}
+{"id": 163, "file": "face_0163.jpg", "gender": "woman", "race": "caucasian", "age": 23}
+{"id": 164, "file": "face_0164.jpg", "gender": "woman", "race": "caucasian", "age": 23}
+{"id": 165, "file": "face_0165.jpg", "gender": "woman", "race": "caucasian", "age": 23}
+{"id": 166, "file": "face_0166.jpg", "gender": "woman", "race": "caucasian", "age": 23}
+{"id": 167, "file": "face_0167.jpg", "gender": "woman", "race": "caucasian", "age": 25}
+{"id": 168, "file": "face_0168.jpg", "gender": "woman", "race": "caucasian", "age": 25}
+{"id": 169, "file": "face_0169.jpg", "gender": "woman", "race": "caucasian", "age": 27}
+{"id": 170, "file": "face_0170.jpg", "gender": "man", "race": "caucasian", "age": 24}
+{"id": 171, "file": "face_0171.jpg", "gender": "man", "race": "caucasian", "age": 24}
+{"id": 172, "file": "face_0172.jpg", "gender": "man", "race": "caucasian", "age": 24}
+{"id": 173, "file": "face_0173.jpg", "gender": "woman", "race": "black", "age": 33}
+{"id": 174, "file": "face_0174.jpg", "gender": "woman", "race": "east_asian", "age": 26}
+{"id": 175, "file": "face_0175.jpg", "gender": "man", "race": "caucasian", "age": 30}
+{"id": 176, "file": "face_0176.jpg", "gender": "man", "race": "middle_eastern", "age": 37}
+{"id": 177, "file": "face_0177.jpg", "gender": "woman", "race": "caucasian", "age": 25}
+{"id": 178, "file": "face_0178.jpg", "gender": "woman", "race": "caucasian", "age": 28}
+{"id": 179, "file": "face_0179.jpg", "gender": "man", "race": "hispanic", "age": 28}
+{"id": 180, "file": "face_0180.jpg", "gender": "woman", "race": "caucasian", "age": 38}
+{"id": 181, "file": "face_0181.jpg", "gender": "man", "race": "caucasian", "age": 30}
+{"id": 182, "file": "face_0182.jpg", "gender": "woman", "race": "caucasian", "age": 30}
+{"id": 183, "file": "face_0183.jpg", "gender": "woman", "race": "caucasian", "age": 30}
+{"id": 184, "file": "face_0184.jpg", "gender": "woman", "race": "caucasian", "age": 31}
+{"id": 185, "file": "face_0185.jpg", "gender": "man", "race": "caucasian", "age": 31}
+{"id": 186, "file": "face_0186.jpg", "gender": "man", "race": "caucasian", "age": 31}
+{"id": 187, "file": "face_0187.jpg", "gender": "man", "race": "caucasian", "age": 26}
+{"id": 188, "file": "face_0188.jpg", "gender": "woman", "race": "east_asian", "age": 31}
+{"id": 189, "file": "face_0189.jpg", "gender": "woman", "race": "caucasian", "age": 29}
+{"id": 190, "file": "face_0190.jpg", "gender": "man", "race": "caucasian", "age": 26}
+{"id": 191, "file": "face_0191.jpg", "gender": "woman", "race": "caucasian", "age": 24}
+{"id": 192, "file": "face_0192.jpg", "gender": "man", "race": "caucasian", "age": 31}
+{"id": 193, "file": "face_0193.jpg", "gender": "man", "race": "middle_eastern", "age": 34}
+{"id": 194, "file": "face_0194.jpg", "gender": "man", "race": "caucasian", "age": 26}
+{"id": 195, "file": "face_0195.jpg", "gender": "man", "race": "hispanic", "age": 41}
+{"id": 196, "file": "face_0196.jpg", "gender": "woman", "race": "caucasian", "age": 23}
+{"id": 197, "file": "face_0197.jpg", "gender": "woman", "race": "caucasian", "age": 19, "excluded": "minor"}
+{"id": 198, "file": "face_0198.jpg", "gender": "man", "race": "east_asian", "age": 24}
+{"id": 199, "file": "face_0199.jpg", "gender": "man", "race": "caucasian", "age": 32}
+{"id": 200, "file": "face_0200.jpg", "gender": "man", "race": "caucasian", "age": 26}
+{"id": 201, "file": "face_0201.jpg", "gender": "man", "race": "caucasian", "age": 26}
+{"id": 202, "file": "face_0202.jpg", "gender": "woman", "race": "caucasian", "age": 36}
+{"id": 203, "file": "face_0203.jpg", "gender": "man", "race": "caucasian", "age": 32}
+{"id": 204, "file": "face_0204.jpg", "gender": "man", "race": "caucasian", "age": 32}
+{"id": 205, "file": "face_0205.jpg", "gender": "man", "race": "caucasian", "age": 32}
+{"id": 206, "file": "face_0206.jpg", "gender": "man", "race": "caucasian", "age": 32}
+{"id": 207, "file": "face_0207.jpg", "gender": "man", "race": "caucasian", "age": 32}
+{"id": 208, "file": "face_0208.jpg", "gender": "man", "race": "caucasian", "age": 32}
+{"id": 209, "file": "face_0209.jpg", "gender": "man", "race": "caucasian", "age": 32}
+{"id": 210, "file": "face_0210.jpg", "gender": "man", "race": "caucasian", "age": 32}
+{"id": 211, "file": "face_0211.jpg", "gender": "woman", "race": "caucasian", "age": 30}
+{"id": 212, "file": "face_0212.jpg", "gender": "woman", "race": "caucasian", "age": 30}
+{"id": 213, "file": "face_0213.jpg", "gender": "man", "race": "caucasian", "age": 23}
+{"id": 214, "file": "face_0214.jpg", "gender": "man", "race": "caucasian", "age": 23}
+{"id": 215, "file": "face_0215.jpg", "gender": "man", "race": "east_asian", "age": 34}
+{"id": 216, "file": "face_0216.jpg", "gender": "man", "race": "east_asian", "age": 34}
+{"id": 217, "file": "face_0217.jpg", "gender": "man", "race": "east_asian", "age": 34}
+{"id": 218, "file": "face_0218.jpg", "gender": "man", "race": "east_asian", "age": 34}
+{"id": 219, "file": "face_0219.jpg", "gender": "woman", "race": "hispanic", "age": 26}
+{"id": 220, "file": "face_0220.jpg", "gender": "woman", "race": "hispanic", "age": 26}
+{"id": 221, "file": "face_0221.jpg", "gender": "woman", "race": "hispanic", "age": 26}
+{"id": 222, "file": "face_0222.jpg", "gender": "woman", "race": "hispanic", "age": 26}
+{"id": 223, "file": "face_0223.jpg", "gender": "man", "race": "hispanic", "age": 30}
+{"id": 224, "file": "face_0224.jpg", "gender": "woman", "race": "caucasian", "age": 22}
+{"id": 225, "file": "face_0225.jpg", "gender": "woman", "race": "caucasian", "age": 22}
+{"id": 226, "file": "face_0226.jpg", "gender": "woman", "race": "caucasian", "age": 22}
+{"id": 227, "file": "face_0227.jpg", "gender": "woman", "race": "caucasian", "age": 25}
+{"id": 228, "file": "face_0228.jpg", "gender": "man", "race": "caucasian", "age": 28}
+{"id": 229, "file": "face_0229.jpg", "gender": "man", "race": "caucasian", "age": 28}
+{"id": 230, "file": "face_0230.jpg", "gender": "man", "race": "caucasian", "age": 28}
+{"id": 231, "file": "face_0231.jpg", "gender": "man", "race": "caucasian", "age": 30}
+{"id": 232, "file": "face_0232.jpg", "gender": "man", "race": "caucasian", "age": 30}
+{"id": 233, "file": "face_0233.jpg", "gender": "man", "race": "caucasian", "age": 30}
+{"id": 234, "file": "face_0234.jpg", "gender": "man", "race": "caucasian", "age": 30}
+{"id": 235, "file": "face_0235.jpg", "gender": "woman", "race": "caucasian", "age": 31}
+{"id": 236, "file": "face_0236.jpg", "gender": "woman", "race": "caucasian", "age": 31}
+{"id": 237, "file": "face_0237.jpg", "gender": "man", "race": "caucasian", "age": 27}
+{"id": 238, "file": "face_0238.jpg", "gender": "man", "race": "caucasian", "age": 21, "excluded": "minor"}
+{"id": 239, "file": "face_0239.jpg", "gender": "man", "race": "caucasian", "age": 21, "excluded": "minor"}
+{"id": 240, "file": "face_0240.jpg", "gender": "woman", "race": "caucasian", "age": 29}
+{"id": 241, "file": "face_0241.jpg", "gender": "woman", "race": "caucasian", "age": 31}
+{"id": 242, "file": "face_0242.jpg", "gender": "woman", "race": "caucasian", "age": 29}
+{"id": 243, "file": "face_0243.jpg", "gender": "man", "race": "middle_eastern", "age": 26}
+{"id": 244, "file": "face_0244.jpg", "gender": "woman", "race": "caucasian", "age": 29}
+{"id": 245, "file": "face_0245.jpg", "gender": "woman", "race": "caucasian", "age": 30}
+{"id": 246, "file": "face_0246.jpg", "gender": "man", "race": "caucasian", "age": 29}
+{"id": 247, "file": "face_0247.jpg", "gender": "man", "race": "east_asian", "age": 28}
+{"id": 248, "file": "face_0248.jpg", "gender": "man", "race": "caucasian", "age": 21, "excluded": "minor"}
+{"id": 249, "file": "face_0249.jpg", "gender": "woman", "race": "caucasian", "age": 50}
+{"id": 250, "file": "face_0250.jpg", "gender": "woman", "race": "caucasian", "age": 50}
+{"id": 251, "file": "face_0251.jpg", "gender": "man", "race": "caucasian", "age": 28}
+{"id": 252, "file": "face_0252.jpg", "gender": "man", "race": "caucasian", "age": 27}
+{"id": 253, "file": "face_0253.jpg", "gender": "man", "race": "caucasian", "age": 27}
+{"id": 254, "file": "face_0254.jpg", "gender": "man", "race": "caucasian", "age": 27}
+{"id": 255, "file": "face_0255.jpg", "gender": "man", "race": "caucasian", "age": 27}
+{"id": 256, "file": "face_0256.jpg", "gender": "woman", "race": "caucasian", "age": 30}
+{"id": 257, "file": "face_0257.jpg", "gender": "woman", "race": "caucasian", "age": 30}
+{"id": 258, "file": "face_0258.jpg", "gender": "woman", "race": "caucasian", "age": 30}
+{"id": 259, "file": "face_0259.jpg", "gender": "man", "race": "caucasian", "age": 26}
+{"id": 260, "file": "face_0260.jpg", "gender": "woman", "race": "caucasian", "age": 32}
+{"id": 261, "file": "face_0261.jpg", "gender": "man", "race": "caucasian", "age": 26}
+{"id": 262, "file": "face_0262.jpg", "gender": "woman", "race": "caucasian", "age": 32}
+{"id": 263, "file": "face_0263.jpg", "gender": "man", "race": "caucasian", "age": 26}
+{"id": 264, "file": "face_0264.jpg", "gender": "man", "race": "caucasian", "age": 42}
+{"id": 265, "file": "face_0265.jpg", "gender": "man", "race": "caucasian", "age": 28}
+{"id": 266, "file": "face_0266.jpg", "gender": "woman", "race": "hispanic", "age": 31}
+{"id": 267, "file": "face_0267.jpg", "gender": "man", "race": "caucasian", "age": 24}
+{"id": 268, "file": "face_0268.jpg", "gender": "woman", "race": "caucasian", "age": 35}
+{"id": 269, "file": "face_0269.jpg", "gender": "woman", "race": "caucasian", "age": 27}
+{"id": 270, "file": "face_0270.jpg", "gender": "man", "race": "caucasian", "age": 22}
+{"id": 271, "file": "face_0271.jpg", "gender": "man", "race": "middle_eastern", "age": 47}
+{"id": 272, "file": "face_0272.jpg", "gender": "woman", "race": "caucasian", "age": 27}
+{"id": 273, "file": "face_0273.jpg", "gender": "woman", "race": "caucasian", "age": 27}
+{"id": 274, "file": "face_0274.jpg", "gender": "man", "race": "black", "age": 30}
+{"id": 275, "file": "face_0275.jpg", "gender": "man", "race": "black", "age": 30}
+{"id": 276, "file": "face_0276.jpg", "gender": "man", "race": "black", "age": 30}
+{"id": 277, "file": "face_0277.jpg", "gender": "man", "race": "black", "age": 30}
+{"id": 278, "file": "face_0278.jpg", "gender": "man", "race": "caucasian", "age": 25}
+{"id": 279, "file": "face_0279.jpg", "gender": "man", "race": "caucasian", "age": 25}
+{"id": 280, "file": "face_0280.jpg", "gender": "man", "race": "caucasian", "age": 25}
+{"id": 281, "file": "face_0281.jpg", "gender": "man", "race": "caucasian", "age": 19, "excluded": "minor"}
+{"id": 282, "file": "face_0282.jpg", "gender": "man", "race": "caucasian", "age": 31}
+{"id": 283, "file": "face_0283.jpg", "gender": "man", "race": "caucasian", "age": 31}
+{"id": 284, "file": "face_0284.jpg", "gender": "man", "race": "caucasian", "age": 34}
+{"id": 285, "file": "face_0285.jpg", "gender": "man", "race": "caucasian", "age": 34}
+{"id": 286, "file": "face_0286.jpg", "gender": "man", "race": "middle_eastern", "age": 27}
+{"id": 287, "file": "face_0287.jpg", "gender": "woman", "race": "caucasian", "age": 25}
+{"id": 288, "file": "face_0288.jpg", "gender": "man", "race": "caucasian", "age": 26}
+{"id": 289, "file": "face_0289.jpg", "gender": "woman", "race": "caucasian", "age": 25}
+{"id": 290, "file": "face_0290.jpg", "gender": "man", "race": "caucasian", "age": 26}
+{"id": 291, "file": "face_0291.jpg", "gender": "woman", "race": "caucasian", "age": 25}
+{"id": 292, "file": "face_0292.jpg", "gender": "woman", "race": "caucasian", "age": 25}
+{"id": 293, "file": "face_0293.jpg", "gender": "man", "race": "caucasian", "age": 23}
+{"id": 294, "file": "face_0294.jpg", "gender": "man", "race": "caucasian", "age": 23}
+{"id": 295, "file": "face_0295.jpg", "gender": "woman", "race": "caucasian", "age": 30}
+{"id": 296, "file": "face_0296.jpg", "gender": "woman", "race": "caucasian", "age": 30}
+{"id": 297, "file": "face_0297.jpg", "gender": "woman", "race": "caucasian", "age": 34}
+{"id": 298, "file": "face_0298.jpg", "gender": "woman", "race": "caucasian", "age": 34}
+{"id": 299, "file": "face_0299.jpg", "gender": "woman", "race": "caucasian", "age": 34}
+{"id": 300, "file": "face_0300.jpg", "gender": "woman", "race": "caucasian", "age": 31}
+{"id": 301, "file": "face_0301.jpg", "gender": "woman", "race": "caucasian", "age": 31}
+{"id": 302, "file": "face_0302.jpg", "gender": "woman", "race": "caucasian", "age": 31}
+{"id": 303, "file": "face_0303.jpg", "gender": "woman", "race": "caucasian", "age": 25}
+{"id": 304, "file": "face_0304.jpg", "gender": "woman", "race": "caucasian", "age": 28}
+{"id": 305, "file": "face_0305.jpg", "gender": "woman", "race": "caucasian", "age": 28}
+{"id": 306, "file": "face_0306.jpg", "gender": "woman", "race": "caucasian", "age": 31}
+{"id": 307, "file": "face_0307.jpg", "gender": "woman", "race": "caucasian", "age": 31}
+{"id": 308, "file": "face_0308.jpg", "gender": "woman", "race": "caucasian", "age": 27}
+{"id": 309, "file": "face_0309.jpg", "gender": "woman", "race": "caucasian", "age": 27}
+{"id": 310, "file": "face_0310.jpg", "gender": "woman", "race": "caucasian", "age": 27}
+{"id": 311, "file": "face_0311.jpg", "gender": "woman", "race": "caucasian", "age": 31}
+{"id": 312, "file": "face_0312.jpg", "gender": "woman", "race": "caucasian", "age": 31}
+{"id": 313, "file": "face_0313.jpg", "gender": "woman", "race": "caucasian", "age": 31}
+{"id": 314, "file": "face_0314.jpg", "gender": "man", "race": "east_asian", "age": 26}
+{"id": 315, "file": "face_0315.jpg", "gender": "man", "race": "east_asian", "age": 26}
+{"id": 316, "file": "face_0316.jpg", "gender": "man", "race": "east_asian", "age": 26}
+{"id": 317, "file": "face_0317.jpg", "gender": "woman", "race": "caucasian", "age": 24}
+{"id": 318, "file": "face_0318.jpg", "gender": "woman", "race": "caucasian", "age": 24}
+{"id": 319, "file": "face_0319.jpg", "gender": "woman", "race": "caucasian", "age": 46}
+{"id": 320, "file": "face_0320.jpg", "gender": "man", "race": "caucasian", "age": 24}
+{"id": 321, "file": "face_0321.jpg", "gender": "man", "race": "caucasian", "age": 24}
+{"id": 322, "file": "face_0322.jpg", "gender": "man", "race": "caucasian", "age": 24}
+{"id": 323, "file": "face_0323.jpg", "gender": "woman", "race": "caucasian", "age": 39}
+{"id": 324, "file": "face_0324.jpg", "gender": "woman", "race": "caucasian", "age": 39}
+{"id": 325, "file": "face_0325.jpg", "gender": "woman", "race": "caucasian", "age": 39}
+{"id": 326, "file": "face_0326.jpg", "gender": "woman", "race": "caucasian", "age": 29}
+{"id": 327, "file": "face_0327.jpg", "gender": "woman", "race": "caucasian", "age": 29}
+{"id": 328, "file": "face_0328.jpg", "gender": "woman", "race": "caucasian", "age": 29}
+{"id": 329, "file": "face_0329.jpg", "gender": "woman", "race": "caucasian", "age": 24}
+{"id": 330, "file": "face_0330.jpg", "gender": "woman", "race": "caucasian", "age": 24}
+{"id": 331, "file": "face_0331.jpg", "gender": "man", "race": "caucasian", "age": 29}
+{"id": 332, "file": "face_0332.jpg", "gender": "woman", "race": "east_asian", "age": 24}
+{"id": 333, "file": "face_0333.jpg", "gender": "woman", "race": "east_asian", "age": 24}
+{"id": 334, "file": "face_0334.jpg", "gender": "woman", "race": "east_asian", "age": 24}
+{"id": 335, "file": "face_0335.jpg", "gender": "man", "race": "caucasian", "age": 24}
+{"id": 336, "file": "face_0336.jpg", "gender": "man", "race": "caucasian", "age": 24}
+{"id": 337, "file": "face_0337.jpg", "gender": "man", "race": "caucasian", "age": 24}
+{"id": 338, "file": "face_0338.jpg", "gender": "woman", "race": "caucasian", "age": 33}
+{"id": 339, "file": "face_0339.jpg", "gender": "woman", "race": "caucasian", "age": 25}
+{"id": 340, "file": "face_0340.jpg", "gender": "man", "race": "east_asian", "age": 32}
+{"id": 341, "file": "face_0341.jpg", "gender": "man", "race": "east_asian", "age": 32}
+{"id": 342, "file": "face_0342.jpg", "gender": "man", "race": "east_asian", "age": 32}
+{"id": 343, "file": "face_0343.jpg", "gender": "woman", "race": "caucasian", "age": 28}
+{"id": 344, "file": "face_0344.jpg", "gender": "woman", "race": "caucasian", "age": 28}
+{"id": 345, "file": "face_0345.jpg", "gender": "woman", "race": "caucasian", "age": 28}
+{"id": 346, "file": "face_0346.jpg", "gender": "man", "race": "caucasian", "age": 35}
+{"id": 347, "file": "face_0347.jpg", "gender": "man", "race": "caucasian", "age": 35}
+{"id": 348, "file": "face_0348.jpg", "gender": "man", "race": "caucasian", "age": 35}
+{"id": 349, "file": "face_0349.jpg", "gender": "man", "race": "caucasian", "age": 35}
+{"id": 350, "file": "face_0350.jpg", "gender": "man", "race": "caucasian", "age": 32}
+{"id": 351, "file": "face_0351.jpg", "gender": "man", "race": "caucasian", "age": 32}
+{"id": 352, "file": "face_0352.jpg", "gender": "man", "race": "caucasian", "age": 32}
+{"id": 353, "file": "face_0353.jpg", "gender": "man", "race": "caucasian", "age": 33}
+{"id": 354, "file": "face_0354.jpg", "gender": "man", "race": "caucasian", "age": 28}
+{"id": 355, "file": "face_0355.jpg", "gender": "man", "race": "caucasian", "age": 28}
+{"id": 356, "file": "face_0356.jpg", "gender": "woman", "race": "hispanic", "age": 34}
+{"id": 357, "file": "face_0357.jpg", "gender": "woman", "race": "hispanic", "age": 34}
+{"id": 358, "file": "face_0358.jpg", "gender": "woman", "race": "caucasian", "age": 32}
+{"id": 359, "file": "face_0359.jpg", "gender": "woman", "race": "caucasian", "age": 32}
+{"id": 360, "file": "face_0360.jpg", "gender": "woman", "race": "caucasian", "age": 36}
+{"id": 361, "file": "face_0361.jpg", "gender": "man", "race": "caucasian", "age": 25}
+{"id": 362, "file": "face_0362.jpg", "gender": "man", "race": "caucasian", "age": 30}
+{"id": 363, "file": "face_0363.jpg", "gender": "man", "race": "caucasian", "age": 30}
+{"id": 364, "file": "face_0364.jpg", "gender": "man", "race": "caucasian", "age": 29}
+{"id": 365, "file": "face_0365.jpg", "gender": "man", "race": "caucasian", "age": 29}
+{"id": 366, "file": "face_0366.jpg", "gender": "man", "race": "caucasian", "age": 29}
+{"id": 367, "file": "face_0367.jpg", "gender": "woman", "race": "hispanic", "age": 39}
+{"id": 368, "file": "face_0368.jpg", "gender": "woman", "race": "hispanic", "age": 39}
+{"id": 369, "file": "face_0369.jpg", "gender": "woman", "race": "hispanic", "age": 39}
+{"id": 370, "file": "face_0370.jpg", "gender": "woman", "race": "hispanic", "age": 25}
+{"id": 371, "file": "face_0371.jpg", "gender": "woman", "race": "hispanic", "age": 25}
+{"id": 372, "file": "face_0372.jpg", "gender": "woman", "race": "hispanic", "age": 25}
+{"id": 373, "file": "face_0373.jpg", "gender": "man", "race": "caucasian", "age": 24}
+{"id": 374, "file": "face_0374.jpg", "gender": "man", "race": "caucasian", "age": 24}
+{"id": 375, "file": "face_0375.jpg", "gender": "man", "race": "caucasian", "age": 24}
+{"id": 376, "file": "face_0376.jpg", "gender": "man", "race": "caucasian", "age": 24}
+{"id": 377, "file": "face_0377.jpg", "gender": "woman", "race": "caucasian", "age": 49}
+{"id": 378, "file": "face_0378.jpg", "gender": "woman", "race": "caucasian", "age": 49}
+{"id": 379, "file": "face_0379.jpg", "gender": "woman", "race": "caucasian", "age": 49}
+{"id": 380, "file": "face_0380.jpg", "gender": "woman", "race": "caucasian", "age": 49}
+{"id": 381, "file": "face_0381.jpg", "gender": "woman", "race": "south_asian", "age": 31}
+{"id": 382, "file": "face_0382.jpg", "gender": "man", "race": "caucasian", "age": 27}
+{"id": 383, "file": "face_0383.jpg", "gender": "woman", "race": "caucasian", "age": 34}
+{"id": 384, "file": "face_0384.jpg", "gender": "woman", "race": "caucasian", "age": 34}
+{"id": 385, "file": "face_0385.jpg", "gender": "woman", "race": "caucasian", "age": 34}
+{"id": 386, "file": "face_0386.jpg", "gender": "man", "race": "middle_eastern", "age": 30}
+{"id": 387, "file": "face_0387.jpg", "gender": "man", "race": "caucasian", "age": 33}
+{"id": 388, "file": "face_0388.jpg", "gender": "woman", "race": "caucasian", "age": 13, "excluded": "minor"}
+{"id": 389, "file": "face_0389.jpg", "gender": "woman", "race": "caucasian", "age": 13, "excluded": "minor"}
+{"id": 390, "file": "face_0390.jpg", "gender": "woman", "race": "caucasian", "age": 13, "excluded": "minor"}
+{"id": 391, "file": "face_0391.jpg", "gender": "man", "race": "caucasian", "age": 23}
+{"id": 392, "file": "face_0392.jpg", "gender": "man", "race": "caucasian", "age": 30}
+{"id": 393, "file": "face_0393.jpg", "gender": "man", "race": "caucasian", "age": 30}
+{"id": 394, "file": "face_0394.jpg", "gender": "woman", "race": "caucasian", "age": 31}
+{"id": 395, "file": "face_0395.jpg", "gender": "woman", "race": "caucasian", "age": 31}
+{"id": 396, "file": "face_0396.jpg", "gender": "woman", "race": "caucasian", "age": 31}
+{"id": 397, "file": "face_0397.jpg", "gender": "woman", "race": "caucasian", "age": 31}
+{"id": 398, "file": "face_0398.jpg", "gender": "woman", "race": "caucasian", "age": 31}
+{"id": 399, "file": "face_0399.jpg", "gender": "man", "race": "caucasian", "age": 27}
+{"id": 400, "file": "face_0400.jpg", "gender": "man", "race": "caucasian", "age": 27}
+{"id": 401, "file": "face_0401.jpg", "gender": "man", "race": "caucasian", "age": 27}
+{"id": 402, "file": "face_0402.jpg", "gender": "man", "race": "caucasian", "age": 23}
+{"id": 403, "file": "face_0403.jpg", "gender": "woman", "race": "caucasian", "age": 17, "excluded": "minor"}
+{"id": 404, "file": "face_0404.jpg", "gender": "man", "race": "caucasian", "age": 23}
+{"id": 405, "file": "face_0405.jpg", "gender": "woman", "race": "caucasian", "age": 17, "excluded": "minor"}
+{"id": 406, "file": "face_0406.jpg", "gender": "man", "race": "caucasian", "age": 34}
+{"id": 407, "file": "face_0407.jpg", "gender": "woman", "race": "caucasian", "age": 24}
+{"id": 408, "file": "face_0408.jpg", "gender": "man", "race": "hispanic", "age": 24}
+{"id": 409, "file": "face_0409.jpg", "gender": "woman", "race": "caucasian", "age": 28}
+{"id": 410, "file": "face_0410.jpg", "gender": "woman", "race": "caucasian", "age": 28}
+{"id": 411, "file": "face_0411.jpg", "gender": "man", "race": "caucasian", "age": 28}
+{"id": 412, "file": "face_0412.jpg", "gender": "man", "race": "caucasian", "age": 28}
+{"id": 413, "file": "face_0413.jpg", "gender": "man", "race": "caucasian", "age": 26}
+{"id": 414, "file": "face_0414.jpg", "gender": "man", "race": "caucasian", "age": 26}
+{"id": 415, "file": "face_0415.jpg", "gender": "man", "race": "middle_eastern", "age": 30}
+{"id": 416, "file": "face_0416.jpg", "gender": "man", "race": "middle_eastern", "age": 30}
+{"id": 417, "file": "face_0417.jpg", "gender": "woman", "race": "caucasian", "age": 21, "excluded": "minor"}
+{"id": 418, "file": "face_0418.jpg", "gender": "woman", "race": "caucasian", "age": 21, "excluded": "minor"}
+{"id": 419, "file": "face_0419.jpg", "gender": "man", "race": "caucasian", "age": 23}
+{"id": 420, "file": "face_0420.jpg", "gender": "man", "race": "caucasian", "age": 23}
+{"id": 421, "file": "face_0421.jpg", "gender": "man", "race": "caucasian", "age": 17, "excluded": "minor"}
+{"id": 422, "file": "face_0422.jpg", "gender": "man", "race": "caucasian", "age": 17, "excluded": "minor"}
+{"id": 423, "file": "face_0423.jpg", "gender": "man", "race": "caucasian", "age": 17, "excluded": "minor"}
+{"id": 424, "file": "face_0424.jpg", "gender": "man", "race": "caucasian", "age": 17, "excluded": "minor"}
+{"id": 425, "file": "face_0425.jpg", "gender": "woman", "race": "caucasian", "age": 37}
+{"id": 426, "file": "face_0426.jpg", "gender": "woman", "race": "caucasian", "age": 37}
+{"id": 427, "file": "face_0427.jpg", "gender": "woman", "race": "caucasian", "age": 34}
+{"id": 428, "file": "face_0428.jpg", "gender": "woman", "race": "caucasian", "age": 34}
+{"id": 429, "file": "face_0429.jpg", "gender": "woman", "race": "caucasian", "age": 26}
+{"id": 430, "file": "face_0430.jpg", "gender": "woman", "race": "east_asian", "age": 32}
+{"id": 431, "file": "face_0431.jpg", "gender": "man", "race": "caucasian", "age": 39}
+{"id": 432, "file": "face_0432.jpg", "gender": "man", "race": "caucasian", "age": 32}
+{"id": 433, "file": "face_0433.jpg", "gender": "woman", "race": "hispanic", "age": 31}
+{"id": 434, "file": "face_0434.jpg", "gender": "woman", "race": "caucasian", "age": 25}
+{"id": 435, "file": "face_0435.jpg", "gender": "woman", "race": "east_asian", "age": 37}
+{"id": 436, "file": "face_0436.jpg", "gender": "man", "race": "east_asian", "age": 23}
+{"id": 437, "file": "face_0437.jpg", "gender": "woman", "race": "east_asian", "age": 32}
+{"id": 438, "file": "face_0438.jpg", "gender": "man", "race": "middle_eastern", "age": 27}
+{"id": 439, "file": "face_0439.jpg", "gender": "man", "race": "caucasian", "age": 27}
+{"id": 440, "file": "face_0440.jpg", "gender": "man", "race": "caucasian", "age": 29}
+{"id": 441, "file": "face_0441.jpg", "gender": "man", "race": "caucasian", "age": 29}
+{"id": 442, "file": "face_0442.jpg", "gender": "man", "race": "caucasian", "age": 29}
+{"id": 443, "file": "face_0443.jpg", "gender": "man", "race": "caucasian", "age": 29}
+{"id": 444, "file": "face_0444.jpg", "gender": "woman", "race": "caucasian", "age": 32}
+{"id": 445, "file": "face_0445.jpg", "gender": "woman", "race": "caucasian", "age": 32}
+{"id": 446, "file": "face_0446.jpg", "gender": "woman", "race": "caucasian", "age": 32}
+{"id": 447, "file": "face_0447.jpg", "gender": "woman", "race": "caucasian", "age": 32}
+{"id": 448, "file": "face_0448.jpg", "gender": "woman", "race": "east_asian", "age": 29}
+{"id": 449, "file": "face_0449.jpg", "gender": "woman", "race": "caucasian", "age": 33}
+{"id": 450, "file": "face_0450.jpg", "gender": "woman", "race": "east_asian", "age": 28}
+{"id": 451, "file": "face_0451.jpg", "gender": "woman", "race": "east_asian", "age": 28}
+{"id": 452, "file": "face_0452.jpg", "gender": "man", "race": "middle_eastern", "age": 36}
+{"id": 453, "file": "face_0453.jpg", "gender": "man", "race": "middle_eastern", "age": 36}
+{"id": 454, "file": "face_0454.jpg", "gender": "woman", "race": "east_asian", "age": 29}
+{"id": 455, "file": "face_0455.jpg", "gender": "woman", "race": "caucasian", "age": 29}
+{"id": 456, "file": "face_0456.jpg", "gender": "woman", "race": "east_asian", "age": 28}
+{"id": 457, "file": "face_0457.jpg", "gender": "man", "race": "caucasian", "age": 24}
+{"id": 458, "file": "face_0458.jpg", "gender": "man", "race": "east_asian", "age": 12, "excluded": "minor"}
+{"id": 459, "file": "face_0459.jpg", "gender": "woman", "race": "east_asian", "age": 28}
+{"id": 460, "file": "face_0460.jpg", "gender": "man", "race": "caucasian", "age": 27}
+{"id": 461, "file": "face_0461.jpg", "gender": "man", "race": "caucasian", "age": 29}
+{"id": 462, "file": "face_0462.jpg", "gender": "man", "race": "caucasian", "age": 29}
+{"id": 463, "file": "face_0463.jpg", "gender": "man", "race": "middle_eastern", "age": 26}
+{"id": 464, "file": "face_0464.jpg", "gender": "man", "race": "middle_eastern", "age": 26}
+{"id": 465, "file": "face_0465.jpg", "gender": "man", "race": "middle_eastern", "age": 26}
+{"id": 466, "file": "face_0466.jpg", "gender": "man", "race": "middle_eastern", "age": 26}
+{"id": 467, "file": "face_0467.jpg", "gender": "woman", "race": "caucasian", "age": 39}
+{"id": 468, "file": "face_0468.jpg", "gender": "woman", "race": "caucasian", "age": 39}
+{"id": 469, "file": "face_0469.jpg", "gender": "woman", "race": "caucasian", "age": 39}
+{"id": 470, "file": "face_0470.jpg", "gender": "man", "race": "caucasian", "age": 28}
+{"id": 471, "file": "face_0471.jpg", "gender": "woman", "race": "caucasian", "age": 27}
+{"id": 472, "file": "face_0472.jpg", "gender": "man", "race": "caucasian", "age": 28}
+{"id": 473, "file": "face_0473.jpg", "gender": "man", "race": "caucasian", "age": 33}
+{"id": 474, "file": "face_0474.jpg", "gender": "man", "race": "caucasian", "age": 21, "excluded": "minor"}
+{"id": 475, "file": "face_0475.jpg", "gender": "woman", "race": "caucasian", "age": 49}
+{"id": 476, "file": "face_0476.jpg", "gender": "woman", "race": "caucasian", "age": 35}
+{"id": 477, "file": "face_0477.jpg", "gender": "man", "race": "middle_eastern", "age": 23}
+{"id": 478, "file": "face_0478.jpg", "gender": "woman", "race": "caucasian", "age": 29}
+{"id": 479, "file": "face_0479.jpg", "gender": "man", "race": "caucasian", "age": 29}
+{"id": 480, "file": "face_0480.jpg", "gender": "man", "race": "caucasian", "age": 25}
+{"id": 481, "file": "face_0481.jpg", "gender": "man", "race": "caucasian", "age": 33}
+{"id": 482, "file": "face_0482.jpg", "gender": "man", "race": "hispanic", "age": 33}
+{"id": 483, "file": "face_0483.jpg", "gender": "man", "race": "caucasian", "age": 25}
+{"id": 484, "file": "face_0484.jpg", "gender": "man", "race": "caucasian", "age": 25}
+{"id": 485, "file": "face_0485.jpg", "gender": "man", "race": "caucasian", "age": 25}
+{"id": 486, "file": "face_0486.jpg", "gender": "woman", "race": "caucasian", "age": 35}
+{"id": 487, "file": "face_0487.jpg", "gender": "woman", "race": "caucasian", "age": 35}
+{"id": 488, "file": "face_0488.jpg", "gender": "woman", "race": "caucasian", "age": 35}
+{"id": 489, "file": "face_0489.jpg", "gender": "woman", "race": "caucasian", "age": 35}
+{"id": 490, "file": "face_0490.jpg", "gender": "man", "race": "caucasian", "age": 34}
+{"id": 491, "file": "face_0491.jpg", "gender": "man", "race": "caucasian", "age": 34}
+{"id": 492, "file": "face_0492.jpg", "gender": "man", "race": "caucasian", "age": 21, "excluded": "minor"}
+{"id": 493, "file": "face_0493.jpg", "gender": "man", "race": "caucasian", "age": 21, "excluded": "minor"}
+{"id": 494, "file": "face_0494.jpg", "gender": "man", "race": "caucasian", "age": 44}
+{"id": 495, "file": "face_0495.jpg", "gender": "man", "race": "caucasian", "age": 44}
+{"id": 496, "file": "face_0496.jpg", "gender": "man", "race": "caucasian", "age": 37}
+{"id": 497, "file": "face_0497.jpg", "gender": "man", "race": "caucasian", "age": 37}
+{"id": 498, "file": "face_0498.jpg", "gender": "woman", "race": "caucasian", "age": 32}
+{"id": 499, "file": "face_0499.jpg", "gender": "man", "race": "east_asian", "age": 20, "excluded": "minor"}
+{"id": 500, "file": "face_0500.jpg", "gender": "man", "race": "east_asian", "age": 20, "excluded": "minor"}
+{"id": 501, "file": "face_0501.jpg", "gender": "woman", "race": "caucasian", "age": 24}
+{"id": 502, "file": "face_0502.jpg", "gender": "man", "race": "middle_eastern", "age": 26}
+{"id": 503, "file": "face_0503.jpg", "gender": "man", "race": "middle_eastern", "age": 26}
+{"id": 504, "file": "face_0504.jpg", "gender": "man", "race": "caucasian", "age": 26}
+{"id": 505, "file": "face_0505.jpg", "gender": "man", "race": "caucasian", "age": 22}
+{"id": 506, "file": "face_0506.jpg", "gender": "man", "race": "hispanic", "age": 38}
+{"id": 507, "file": "face_0507.jpg", "gender": "man", "race": "hispanic", "age": 38}
+{"id": 508, "file": "face_0508.jpg", "gender": "man", "race": "middle_eastern", "age": 24}
+{"id": 509, "file": "face_0509.jpg", "gender": "man", "race": "middle_eastern", "age": 24}
+{"id": 510, "file": "face_0510.jpg", "gender": "woman", "race": "caucasian", "age": 44}
+{"id": 511, "file": "face_0511.jpg", "gender": "woman", "race": "caucasian", "age": 44}
+{"id": 512, "file": "face_0512.jpg", "gender": "man", "race": "east_asian", "age": 24}
+{"id": 513, "file": "face_0513.jpg", "gender": "man", "race": "east_asian", "age": 24}
+{"id": 514, "file": "face_0514.jpg", "gender": "man", "race": "east_asian", "age": 24}
+{"id": 515, "file": "face_0515.jpg", "gender": "man", "race": "east_asian", "age": 24}
+{"id": 516, "file": "face_0516.jpg", "gender": "man", "race": "middle_eastern", "age": 32}
+{"id": 517, "file": "face_0517.jpg", "gender": "woman", "race": "east_asian", "age": 33}
+{"id": 518, "file": "face_0518.jpg", "gender": "man", "race": "caucasian", "age": 32}
+{"id": 519, "file": "face_0519.jpg", "gender": "man", "race": "caucasian", "age": 32}
+{"id": 520, "file": "face_0520.jpg", "gender": "man", "race": "caucasian", "age": 32}
+{"id": 521, "file": "face_0521.jpg", "gender": "man", "race": "caucasian", "age": 32}
+{"id": 522, "file": "face_0522.jpg", "gender": "man", "race": "middle_eastern", "age": 26}
+{"id": 523, "file": "face_0523.jpg", "gender": "man", "race": "middle_eastern", "age": 26}
+{"id": 524, "file": "face_0524.jpg", "gender": "man", "race": "middle_eastern", "age": 26}
+{"id": 525, "file": "face_0525.jpg", "gender": "man", "race": "middle_eastern", "age": 26}
+{"id": 526, "file": "face_0526.jpg", "gender": "woman", "race": "caucasian", "age": 31}
+{"id": 527, "file": "face_0527.jpg", "gender": "woman", "race": "caucasian", "age": 31}
+{"id": 528, "file": "face_0528.jpg", "gender": "woman", "race": "caucasian", "age": 35}
+{"id": 529, "file": "face_0529.jpg", "gender": "woman", "race": "caucasian", "age": 35}
+{"id": 530, "file": "face_0530.jpg", "gender": "man", "race": "caucasian", "age": 35}
+{"id": 531, "file": "face_0531.jpg", "gender": "man", "race": "caucasian", "age": 35}
+{"id": 532, "file": "face_0532.jpg", "gender": "man", "race": "hispanic", "age": 24}
+{"id": 533, "file": "face_0533.jpg", "gender": "man", "race": "hispanic", "age": 24}
+{"id": 534, "file": "face_0534.jpg", "gender": "woman", "race": "east_asian", "age": 31}
+{"id": 535, "file": "face_0535.jpg", "gender": "woman", "race": "east_asian", "age": 31}
+{"id": 536, "file": "face_0536.jpg", "gender": "woman", "race": "caucasian", "age": 26}
+{"id": 537, "file": "face_0537.jpg", "gender": "woman", "race": "caucasian", "age": 26}
+{"id": 538, "file": "face_0538.jpg", "gender": "woman", "race": "caucasian", "age": 33}
+{"id": 539, "file": "face_0539.jpg", "gender": "woman", "race": "caucasian", "age": 33}
+{"id": 540, "file": "face_0540.jpg", "gender": "woman", "race": "east_asian", "age": 23}
+{"id": 541, "file": "face_0541.jpg", "gender": "woman", "race": "east_asian", "age": 23}
+{"id": 542, "file": "face_0542.jpg", "gender": "man", "race": "caucasian", "age": 47}
+{"id": 543, "file": "face_0543.jpg", "gender": "man", "race": "caucasian", "age": 47}
+{"id": 544, "file": "face_0544.jpg", "gender": "man", "race": "middle_eastern", "age": 48}
+{"id": 545, "file": "face_0545.jpg", "gender": "man", "race": "middle_eastern", "age": 48}
+{"id": 546, "file": "face_0546.jpg", "gender": "man", "race": "hispanic", "age": 29}
+{"id": 547, "file": "face_0547.jpg", "gender": "man", "race": "hispanic", "age": 29}
+{"id": 548, "file": "face_0548.jpg", "gender": "woman", "race": "caucasian", "age": 26}
+{"id": 549, "file": "face_0549.jpg", "gender": "woman", "race": "caucasian", "age": 26}
+{"id": 550, "file": "face_0550.jpg", "gender": "woman", "race": "caucasian", "age": 28}
+{"id": 551, "file": "face_0551.jpg", "gender": "woman", "race": "caucasian", "age": 28}
+{"id": 552, "file": "face_0552.jpg", "gender": "woman", "race": "caucasian", "age": 34}
+{"id": 553, "file": "face_0553.jpg", "gender": "woman", "race": "caucasian", "age": 34}
+{"id": 554, "file": "face_0554.jpg", "gender": "man", "race": "middle_eastern", "age": 32}
+{"id": 555, "file": "face_0555.jpg", "gender": "man", "race": "middle_eastern", "age": 32}
+{"id": 556, "file": "face_0556.jpg", "gender": "man", "race": "east_asian", "age": 23}
+{"id": 557, "file": "face_0557.jpg", "gender": "man", "race": "east_asian", "age": 23}
+{"id": 558, "file": "face_0558.jpg", "gender": "man", "race": "east_asian", "age": 23}
+{"id": 559, "file": "face_0559.jpg", "gender": "woman", "race": "caucasian", "age": 35}
+{"id": 560, "file": "face_0560.jpg", "gender": "woman", "race": "caucasian", "age": 35}
+{"id": 561, "file": "face_0561.jpg", "gender": "woman", "race": "caucasian", "age": 29}
+{"id": 562, "file": "face_0562.jpg", "gender": "woman", "race": "caucasian", "age": 29}
+{"id": 563, "file": "face_0563.jpg", "gender": "woman", "race": "caucasian", "age": 29}
+{"id": 564, "file": "face_0564.jpg", "gender": "woman", "race": "caucasian", "age": 33}
+{"id": 565, "file": "face_0565.jpg", "gender": "man", "race": "caucasian", "age": 24}
+{"id": 566, "file": "face_0566.jpg", "gender": "woman", "race": "caucasian", "age": 32}
+{"id": 567, "file": "face_0567.jpg", "gender": "woman", "race": "caucasian", "age": 32}
+{"id": 568, "file": "face_0568.jpg", "gender": "woman", "race": "caucasian", "age": 27}
+{"id": 569, "file": "face_0569.jpg", "gender": "woman", "race": "caucasian", "age": 27}
+{"id": 570, "file": "face_0570.jpg", "gender": "man", "race": "caucasian", "age": 33}
+{"id": 571, "file": "face_0571.jpg", "gender": "man", "race": "caucasian", "age": 33}
+{"id": 572, "file": "face_0572.jpg", "gender": "man", "race": "caucasian", "age": 23}
+{"id": 573, "file": "face_0573.jpg", "gender": "man", "race": "middle_eastern", "age": 24}
+{"id": 574, "file": "face_0574.jpg", "gender": "man", "race": "caucasian", "age": 23}
+{"id": 575, "file": "face_0575.jpg", "gender": "man", "race": "caucasian", "age": 23}
+{"id": 576, "file": 
"face_0576.jpg", "gender": "man", "race": "caucasian", "age": 26} +{"id": 577, "file": "face_0577.jpg", "gender": "man", "race": "caucasian", "age": 26} +{"id": 578, "file": "face_0578.jpg", "gender": "woman", "race": "east_asian", "age": 25} +{"id": 579, "file": "face_0579.jpg", "gender": "woman", "race": "east_asian", "age": 25} +{"id": 580, "file": "face_0580.jpg", "gender": "woman", "race": "east_asian", "age": 31} +{"id": 581, "file": "face_0581.jpg", "gender": "woman", "race": "east_asian", "age": 31} +{"id": 582, "file": "face_0582.jpg", "gender": "man", "race": "caucasian", "age": 27} +{"id": 583, "file": "face_0583.jpg", "gender": "man", "race": "caucasian", "age": 27} +{"id": 584, "file": "face_0584.jpg", "gender": "man", "race": "caucasian", "age": 25} +{"id": 585, "file": "face_0585.jpg", "gender": "man", "race": "caucasian", "age": 25} +{"id": 586, "file": "face_0586.jpg", "gender": "woman", "race": "caucasian", "age": 30} +{"id": 587, "file": "face_0587.jpg", "gender": "man", "race": "caucasian", "age": 34} +{"id": 588, "file": "face_0588.jpg", "gender": "man", "race": "caucasian", "age": 21, "excluded": "minor"} +{"id": 589, "file": "face_0589.jpg", "gender": "man", "race": "caucasian", "age": 21, "excluded": "minor"} +{"id": 590, "file": "face_0590.jpg", "gender": "woman", "race": "hispanic", "age": 31} +{"id": 591, "file": "face_0591.jpg", "gender": "woman", "race": "hispanic", "age": 31} +{"id": 592, "file": "face_0592.jpg", "gender": "woman", "race": "caucasian", "age": 28} +{"id": 593, "file": "face_0593.jpg", "gender": "woman", "race": "caucasian", "age": 28} +{"id": 594, "file": "face_0594.jpg", "gender": "woman", "race": "caucasian", "age": 33} +{"id": 595, "file": "face_0595.jpg", "gender": "man", "race": "caucasian", "age": 24} +{"id": 596, "file": "face_0596.jpg", "gender": "man", "race": "caucasian", "age": 24} +{"id": 597, "file": "face_0597.jpg", "gender": "man", "race": "caucasian", "age": 24} +{"id": 598, "file": "face_0598.jpg", 
"gender": "man", "race": "caucasian", "age": 30} +{"id": 599, "file": "face_0599.jpg", "gender": "man", "race": "caucasian", "age": 30} +{"id": 600, "file": "face_0600.jpg", "gender": "man", "race": "caucasian", "age": 30} +{"id": 601, "file": "face_0601.jpg", "gender": "man", "race": "caucasian", "age": 30} +{"id": 602, "file": "face_0602.jpg", "gender": "woman", "race": "east_asian", "age": 28} +{"id": 603, "file": "face_0603.jpg", "gender": "woman", "race": "east_asian", "age": 28} +{"id": 604, "file": "face_0604.jpg", "gender": "woman", "race": "caucasian", "age": 29} +{"id": 605, "file": "face_0605.jpg", "gender": "woman", "race": "caucasian", "age": 29} +{"id": 606, "file": "face_0606.jpg", "gender": "woman", "race": "caucasian", "age": 29} +{"id": 607, "file": "face_0607.jpg", "gender": "woman", "race": "caucasian", "age": 29} +{"id": 608, "file": "face_0608.jpg", "gender": "man", "race": "caucasian", "age": 25} +{"id": 609, "file": "face_0609.jpg", "gender": "man", "race": "caucasian", "age": 25} +{"id": 610, "file": "face_0610.jpg", "gender": "man", "race": "caucasian", "age": 47} +{"id": 611, "file": "face_0611.jpg", "gender": "man", "race": "caucasian", "age": 47} +{"id": 612, "file": "face_0612.jpg", "gender": "woman", "race": "caucasian", "age": 30} +{"id": 613, "file": "face_0613.jpg", "gender": "woman", "race": "caucasian", "age": 32} +{"id": 614, "file": "face_0614.jpg", "gender": "woman", "race": "caucasian", "age": 32} +{"id": 615, "file": "face_0615.jpg", "gender": "man", "race": "caucasian", "age": 25} +{"id": 616, "file": "face_0616.jpg", "gender": "man", "race": "hispanic", "age": 43} +{"id": 617, "file": "face_0617.jpg", "gender": "man", "race": "hispanic", "age": 43} +{"id": 618, "file": "face_0618.jpg", "gender": "woman", "race": "caucasian", "age": 38} +{"id": 619, "file": "face_0619.jpg", "gender": "woman", "race": "caucasian", "age": 24} +{"id": 620, "file": "face_0620.jpg", "gender": "woman", "race": "caucasian", "age": 31} +{"id": 621, 
"file": "face_0621.jpg", "gender": "woman", "race": "caucasian", "age": 31} +{"id": 622, "file": "face_0622.jpg", "gender": "man", "race": "caucasian", "age": 33} +{"id": 623, "file": "face_0623.jpg", "gender": "man", "race": "caucasian", "age": 33} +{"id": 624, "file": "face_0624.jpg", "gender": "man", "race": "east_asian", "age": 25} +{"id": 625, "file": "face_0625.jpg", "gender": "man", "race": "east_asian", "age": 25} +{"id": 626, "file": "face_0626.jpg", "gender": "man", "race": "caucasian", "age": 24} +{"id": 627, "file": "face_0627.jpg", "gender": "man", "race": "caucasian", "age": 24} +{"id": 628, "file": "face_0628.jpg", "gender": "man", "race": "caucasian", "age": 29} +{"id": 629, "file": "face_0629.jpg", "gender": "man", "race": "middle_eastern", "age": 34} +{"id": 630, "file": "face_0630.jpg", "gender": "man", "race": "east_asian", "age": 26} +{"id": 631, "file": "face_0631.jpg", "gender": "man", "race": "east_asian", "age": 26} +{"id": 632, "file": "face_0632.jpg", "gender": "man", "race": "hispanic", "age": 14, "excluded": "minor"} +{"id": 633, "file": "face_0633.jpg", "gender": "man", "race": "hispanic", "age": 14, "excluded": "minor"} +{"id": 634, "file": "face_0634.jpg", "gender": "man", "race": "caucasian", "age": 47} +{"id": 635, "file": "face_0635.jpg", "gender": "man", "race": "caucasian", "age": 25} +{"id": 636, "file": "face_0636.jpg", "gender": "man", "race": "caucasian", "age": 25} +{"id": 637, "file": "face_0637.jpg", "gender": "man", "race": "caucasian", "age": 25} +{"id": 638, "file": "face_0638.jpg", "gender": "man", "race": "caucasian", "age": 23} +{"id": 639, "file": "face_0639.jpg", "gender": "man", "race": "caucasian", "age": 23} +{"id": 640, "file": "face_0640.jpg", "gender": "man", "race": "caucasian", "age": 23} +{"id": 641, "file": "face_0641.jpg", "gender": "man", "race": "caucasian", "age": 23} +{"id": 642, "file": "face_0642.jpg", "gender": "man", "race": "caucasian", "age": 23} +{"id": 643, "file": "face_0643.jpg", "gender": 
"man", "race": "caucasian", "age": 23} +{"id": 644, "file": "face_0644.jpg", "gender": "woman", "race": "caucasian", "age": 32} +{"id": 645, "file": "face_0645.jpg", "gender": "woman", "race": "caucasian", "age": 32} +{"id": 646, "file": "face_0646.jpg", "gender": "woman", "race": "caucasian", "age": 32} +{"id": 647, "file": "face_0647.jpg", "gender": "man", "race": "caucasian", "age": 34} +{"id": 648, "file": "face_0648.jpg", "gender": "man", "race": "caucasian", "age": 34} +{"id": 649, "file": "face_0649.jpg", "gender": "woman", "race": "east_asian", "age": 26} +{"id": 650, "file": "face_0650.jpg", "gender": "woman", "race": "east_asian", "age": 26} +{"id": 651, "file": "face_0651.jpg", "gender": "woman", "race": "east_asian", "age": 26} +{"id": 652, "file": "face_0652.jpg", "gender": "woman", "race": "east_asian", "age": 26} +{"id": 653, "file": "face_0653.jpg", "gender": "man", "race": "caucasian", "age": 30} +{"id": 654, "file": "face_0654.jpg", "gender": "woman", "race": "hispanic", "age": 32} +{"id": 655, "file": "face_0655.jpg", "gender": "woman", "race": "hispanic", "age": 32} +{"id": 656, "file": "face_0656.jpg", "gender": "man", "race": "east_asian", "age": 31} +{"id": 657, "file": "face_0657.jpg", "gender": "man", "race": "caucasian", "age": 23} +{"id": 658, "file": "face_0658.jpg", "gender": "man", "race": "hispanic", "age": 35} +{"id": 659, "file": "face_0659.jpg", "gender": "woman", "race": "caucasian", "age": 27} +{"id": 660, "file": "face_0660.jpg", "gender": "woman", "race": "caucasian", "age": 28} +{"id": 661, "file": "face_0661.jpg", "gender": "man", "race": "caucasian", "age": 27} +{"id": 662, "file": "face_0662.jpg", "gender": "woman", "race": "caucasian", "age": 22} +{"id": 663, "file": "face_0663.jpg", "gender": "woman", "race": "hispanic", "age": 20, "excluded": "minor"} +{"id": 664, "file": "face_0664.jpg", "gender": "man", "race": "east_asian", "age": 31} +{"id": 665, "file": "face_0665.jpg", "gender": "man", "race": "caucasian", "age": 
34} +{"id": 666, "file": "face_0666.jpg", "gender": "man", "race": "hispanic", "age": 25} +{"id": 667, "file": "face_0667.jpg", "gender": "man", "race": "caucasian", "age": 45} +{"id": 668, "file": "face_0668.jpg", "gender": "man", "race": "caucasian", "age": 45} +{"id": 669, "file": "face_0669.jpg", "gender": "man", "race": "caucasian", "age": 45} +{"id": 670, "file": "face_0670.jpg", "gender": "man", "race": "caucasian", "age": 45} +{"id": 671, "file": "face_0671.jpg", "gender": "woman", "race": "caucasian", "age": 31} +{"id": 672, "file": "face_0672.jpg", "gender": "woman", "race": "caucasian", "age": 31} +{"id": 673, "file": "face_0673.jpg", "gender": "woman", "race": "caucasian", "age": 31} +{"id": 674, "file": "face_0674.jpg", "gender": "man", "race": "east_asian", "age": 23} +{"id": 675, "file": "face_0675.jpg", "gender": "man", "race": "east_asian", "age": 23} +{"id": 676, "file": "face_0676.jpg", "gender": "man", "race": "east_asian", "age": 28} +{"id": 677, "file": "face_0677.jpg", "gender": "man", "race": "east_asian", "age": 28} +{"id": 678, "file": "face_0678.jpg", "gender": "man", "race": "caucasian", "age": 34} +{"id": 679, "file": "face_0679.jpg", "gender": "woman", "race": "caucasian", "age": 37} +{"id": 680, "file": "face_0680.jpg", "gender": "woman", "race": "caucasian", "age": 29} +{"id": 681, "file": "face_0681.jpg", "gender": "man", "race": "caucasian", "age": 25} +{"id": 682, "file": "face_0682.jpg", "gender": "man", "race": "caucasian", "age": 33} +{"id": 683, "file": "face_0683.jpg", "gender": "woman", "race": "hispanic", "age": 30} +{"id": 684, "file": "face_0684.jpg", "gender": "man", "race": "caucasian", "age": 28} +{"id": 685, "file": "face_0685.jpg", "gender": "man", "race": "caucasian", "age": 24} +{"id": 686, "file": "face_0686.jpg", "gender": "man", "race": "caucasian", "age": 26} +{"id": 687, "file": "face_0687.jpg", "gender": "woman", "race": "caucasian", "age": 30} +{"id": 688, "file": "face_0688.jpg", "gender": "woman", "race": 
"caucasian", "age": 30} +{"id": 689, "file": "face_0689.jpg", "gender": "woman", "race": "caucasian", "age": 30} +{"id": 690, "file": "face_0690.jpg", "gender": "man", "race": "caucasian", "age": 49} +{"id": 691, "file": "face_0691.jpg", "gender": "man", "race": "caucasian", "age": 49} +{"id": 692, "file": "face_0692.jpg", "gender": "man", "race": "caucasian", "age": 49} +{"id": 693, "file": "face_0693.jpg", "gender": "man", "race": "caucasian", "age": 49} +{"id": 694, "file": "face_0694.jpg", "gender": "man", "race": "east_asian", "age": 31} +{"id": 695, "file": "face_0695.jpg", "gender": "man", "race": "east_asian", "age": 31} +{"id": 696, "file": "face_0696.jpg", "gender": "man", "race": "east_asian", "age": 31} +{"id": 697, "file": "face_0697.jpg", "gender": "woman", "race": "caucasian", "age": 35} +{"id": 698, "file": "face_0698.jpg", "gender": "man", "race": "caucasian", "age": 25} +{"id": 699, "file": "face_0699.jpg", "gender": "woman", "race": "caucasian", "age": 35} +{"id": 700, "file": "face_0700.jpg", "gender": "woman", "race": "hispanic", "age": 46} +{"id": 701, "file": "face_0701.jpg", "gender": "woman", "race": "black", "age": 32} +{"id": 702, "file": "face_0702.jpg", "gender": "woman", "race": "caucasian", "age": 31} +{"id": 703, "file": "face_0703.jpg", "gender": "woman", "race": "south_asian", "age": 26} +{"id": 704, "file": "face_0704.jpg", "gender": "man", "race": "middle_eastern", "age": 34} +{"id": 705, "file": "face_0705.jpg", "gender": "man", "race": "middle_eastern", "age": 39} +{"id": 706, "file": "face_0706.jpg", "gender": "man", "race": "caucasian", "age": 29} +{"id": 707, "file": "face_0707.jpg", "gender": "woman", "race": "caucasian", "age": 30} +{"id": 708, "file": "face_0708.jpg", "gender": "woman", "race": "hispanic", "age": 46} +{"id": 709, "file": "face_0709.jpg", "gender": "man", "race": "caucasian", "age": 37} +{"id": 710, "file": "face_0710.jpg", "gender": "man", "race": "caucasian", "age": 37} +{"id": 711, "file": 
"face_0711.jpg", "gender": "woman", "race": "hispanic", "age": 30} +{"id": 712, "file": "face_0712.jpg", "gender": "woman", "race": "hispanic", "age": 30} +{"id": 713, "file": "face_0713.jpg", "gender": "woman", "race": "hispanic", "age": 28} +{"id": 714, "file": "face_0714.jpg", "gender": "woman", "race": "hispanic", "age": 28} +{"id": 715, "file": "face_0715.jpg", "gender": "woman", "race": "hispanic", "age": 28} +{"id": 716, "file": "face_0716.jpg", "gender": "man", "race": "middle_eastern", "age": 40} +{"id": 717, "file": "face_0717.jpg", "gender": "man", "race": "middle_eastern", "age": 40} +{"id": 718, "file": "face_0718.jpg", "gender": "man", "race": "caucasian", "age": 25} +{"id": 719, "file": "face_0719.jpg", "gender": "man", "race": "middle_eastern", "age": 40} +{"id": 720, "file": "face_0720.jpg", "gender": "woman", "race": "caucasian", "age": 22} +{"id": 721, "file": "face_0721.jpg", "gender": "man", "race": "caucasian", "age": 25} +{"id": 722, "file": "face_0722.jpg", "gender": "man", "race": "caucasian", "age": 34} +{"id": 723, "file": "face_0723.jpg", "gender": "man", "race": "caucasian", "age": 34} +{"id": 724, "file": "face_0724.jpg", "gender": "woman", "race": "caucasian", "age": 29} +{"id": 725, "file": "face_0725.jpg", "gender": "woman", "race": "caucasian", "age": 29} +{"id": 726, "file": "face_0726.jpg", "gender": "woman", "race": "caucasian", "age": 28} +{"id": 727, "file": "face_0727.jpg", "gender": "woman", "race": "caucasian", "age": 28} +{"id": 728, "file": "face_0728.jpg", "gender": "woman", "race": "caucasian", "age": 30} +{"id": 729, "file": "face_0729.jpg", "gender": "man", "race": "caucasian", "age": 24} +{"id": 730, "file": "face_0730.jpg", "gender": "woman", "race": "caucasian", "age": 30} +{"id": 731, "file": "face_0731.jpg", "gender": "woman", "race": "caucasian", "age": 30} +{"id": 732, "file": "face_0732.jpg", "gender": "woman", "race": "east_asian", "age": 27} +{"id": 733, "file": "face_0733.jpg", "gender": "woman", "race": 
"east_asian", "age": 27} +{"id": 734, "file": "face_0734.jpg", "gender": "man", "race": "caucasian", "age": 20, "excluded": "minor"} +{"id": 735, "file": "face_0735.jpg", "gender": "man", "race": "caucasian", "age": 20, "excluded": "minor"} +{"id": 736, "file": "face_0736.jpg", "gender": "man", "race": "caucasian", "age": 24} +{"id": 737, "file": "face_0737.jpg", "gender": "man", "race": "caucasian", "age": 24} +{"id": 738, "file": "face_0738.jpg", "gender": "woman", "race": "east_asian", "age": 31} +{"id": 739, "file": "face_0739.jpg", "gender": "man", "race": "caucasian", "age": 24} +{"id": 740, "file": "face_0740.jpg", "gender": "woman", "race": "east_asian", "age": 32} +{"id": 741, "file": "face_0741.jpg", "gender": "woman", "race": "east_asian", "age": 32} +{"id": 742, "file": "face_0742.jpg", "gender": "woman", "race": "caucasian", "age": 31} +{"id": 743, "file": "face_0743.jpg", "gender": "woman", "race": "caucasian", "age": 31} +{"id": 744, "file": "face_0744.jpg", "gender": "man", "race": "caucasian", "age": 25} +{"id": 745, "file": "face_0745.jpg", "gender": "woman", "race": "caucasian", "age": 34} +{"id": 746, "file": "face_0746.jpg", "gender": "woman", "race": "caucasian", "age": 34} +{"id": 747, "file": "face_0747.jpg", "gender": "man", "race": "caucasian", "age": 25} +{"id": 748, "file": "face_0748.jpg", "gender": "man", "race": "caucasian", "age": 25} +{"id": 749, "file": "face_0749.jpg", "gender": "man", "race": "caucasian", "age": 25} +{"id": 750, "file": "face_0750.jpg", "gender": "woman", "race": "caucasian", "age": 36} +{"id": 751, "file": "face_0751.jpg", "gender": "woman", "race": "caucasian", "age": 36} +{"id": 752, "file": "face_0752.jpg", "gender": "woman", "race": "caucasian", "age": 36} +{"id": 753, "file": "face_0753.jpg", "gender": "man", "race": "east_asian", "age": 23} +{"id": 754, "file": "face_0754.jpg", "gender": "man", "race": "east_asian", "age": 23} +{"id": 755, "file": "face_0755.jpg", "gender": "man", "race": "caucasian", 
"age": 27} +{"id": 756, "file": "face_0756.jpg", "gender": "man", "race": "caucasian", "age": 27} +{"id": 757, "file": "face_0757.jpg", "gender": "man", "race": "caucasian", "age": 23} +{"id": 758, "file": "face_0758.jpg", "gender": "man", "race": "caucasian", "age": 23} +{"id": 759, "file": "face_0759.jpg", "gender": "man", "race": "caucasian", "age": 23} +{"id": 760, "file": "face_0760.jpg", "gender": "man", "race": "caucasian", "age": 23} +{"id": 761, "file": "face_0761.jpg", "gender": "man", "race": "caucasian", "age": 29} +{"id": 762, "file": "face_0762.jpg", "gender": "man", "race": "caucasian", "age": 29} +{"id": 763, "file": "face_0763.jpg", "gender": "woman", "race": "caucasian", "age": 30} +{"id": 764, "file": "face_0764.jpg", "gender": "woman", "race": "caucasian", "age": 30} +{"id": 765, "file": "face_0765.jpg", "gender": "man", "race": "middle_eastern", "age": 39} +{"id": 766, "file": "face_0766.jpg", "gender": "man", "race": "middle_eastern", "age": 39} +{"id": 767, "file": "face_0767.jpg", "gender": "man", "race": "caucasian", "age": 31} +{"id": 768, "file": "face_0768.jpg", "gender": "man", "race": "caucasian", "age": 31} +{"id": 769, "file": "face_0769.jpg", "gender": "woman", "race": "hispanic", "age": 25} +{"id": 770, "file": "face_0770.jpg", "gender": "woman", "race": "hispanic", "age": 25} +{"id": 771, "file": "face_0771.jpg", "gender": "man", "race": "east_asian", "age": 25} +{"id": 772, "file": "face_0772.jpg", "gender": "man", "race": "east_asian", "age": 25} +{"id": 773, "file": "face_0773.jpg", "gender": "woman", "race": "caucasian", "age": 30} +{"id": 774, "file": "face_0774.jpg", "gender": "woman", "race": "caucasian", "age": 30} +{"id": 775, "file": "face_0775.jpg", "gender": "woman", "race": "caucasian", "age": 24} +{"id": 776, "file": "face_0776.jpg", "gender": "woman", "race": "caucasian", "age": 24} +{"id": 777, "file": "face_0777.jpg", "gender": "woman", "race": "caucasian", "age": 26} +{"id": 778, "file": "face_0778.jpg", 
"gender": "man", "race": "hispanic", "age": 25} +{"id": 779, "file": "face_0779.jpg", "gender": "man", "race": "caucasian", "age": 26} +{"id": 780, "file": "face_0780.jpg", "gender": "man", "race": "caucasian", "age": 26} +{"id": 781, "file": "face_0781.jpg", "gender": "man", "race": "caucasian", "age": 39} +{"id": 782, "file": "face_0782.jpg", "gender": "man", "race": "caucasian", "age": 39} +{"id": 783, "file": "face_0783.jpg", "gender": "woman", "race": "caucasian", "age": 34} +{"id": 784, "file": "face_0784.jpg", "gender": "woman", "race": "caucasian", "age": 34} +{"id": 785, "file": "face_0785.jpg", "gender": "man", "race": "caucasian", "age": 27} +{"id": 786, "file": "face_0786.jpg", "gender": "man", "race": "hispanic", "age": 27} +{"id": 787, "file": "face_0787.jpg", "gender": "man", "race": "hispanic", "age": 27} +{"id": 788, "file": "face_0788.jpg", "gender": "man", "race": "hispanic", "age": 27} +{"id": 789, "file": "face_0789.jpg", "gender": "woman", "race": "east_asian", "age": 26} +{"id": 790, "file": "face_0790.jpg", "gender": "woman", "race": "east_asian", "age": 26} +{"id": 791, "file": "face_0791.jpg", "gender": "man", "race": "caucasian", "age": 27} +{"id": 792, "file": "face_0792.jpg", "gender": "man", "race": "caucasian", "age": 27} +{"id": 793, "file": "face_0793.jpg", "gender": "man", "race": "caucasian", "age": 28} +{"id": 794, "file": "face_0794.jpg", "gender": "woman", "race": "caucasian", "age": 29} +{"id": 795, "file": "face_0795.jpg", "gender": "man", "race": "caucasian", "age": 24} +{"id": 796, "file": "face_0796.jpg", "gender": "woman", "race": "caucasian", "age": 32} +{"id": 797, "file": "face_0797.jpg", "gender": "man", "race": "caucasian", "age": 28} +{"id": 798, "file": "face_0798.jpg", "gender": "man", "race": "east_asian", "age": 25} +{"id": 799, "file": "face_0799.jpg", "gender": "man", "race": "caucasian", "age": 26} +{"id": 800, "file": "face_0800.jpg", "gender": "woman", "race": "black", "age": 35} +{"id": 801, "file": 
"face_0801.jpg", "gender": "man", "race": "caucasian", "age": 21, "excluded": "minor"} +{"id": 802, "file": "face_0802.jpg", "gender": "woman", "race": "caucasian", "age": 29} +{"id": 803, "file": "face_0803.jpg", "gender": "man", "race": "caucasian", "age": 21, "excluded": "minor"} +{"id": 804, "file": "face_0804.jpg", "gender": "man", "race": "caucasian", "age": 27} +{"id": 805, "file": "face_0805.jpg", "gender": "woman", "race": "caucasian", "age": 32} +{"id": 806, "file": "face_0806.jpg", "gender": "woman", "race": "caucasian", "age": 29} +{"id": 807, "file": "face_0807.jpg", "gender": "woman", "race": "caucasian", "age": 29} +{"id": 808, "file": "face_0808.jpg", "gender": "man", "race": "caucasian", "age": 26} +{"id": 809, "file": "face_0809.jpg", "gender": "man", "race": "caucasian", "age": 26} +{"id": 810, "file": "face_0810.jpg", "gender": "man", "race": "caucasian", "age": 26} +{"id": 811, "file": "face_0811.jpg", "gender": "man", "race": "hispanic", "age": 29} +{"id": 812, "file": "face_0812.jpg", "gender": "man", "race": "hispanic", "age": 29} +{"id": 813, "file": "face_0813.jpg", "gender": "man", "race": "caucasian", "age": 29} +{"id": 814, "file": "face_0814.jpg", "gender": "man", "race": "caucasian", "age": 29} +{"id": 815, "file": "face_0815.jpg", "gender": "man", "race": "caucasian", "age": 29} +{"id": 816, "file": "face_0816.jpg", "gender": "woman", "race": "caucasian", "age": 26} +{"id": 817, "file": "face_0817.jpg", "gender": "woman", "race": "caucasian", "age": 29} +{"id": 818, "file": "face_0818.jpg", "gender": "man", "race": "caucasian", "age": 22} +{"id": 819, "file": "face_0819.jpg", "gender": "woman", "race": "caucasian", "age": 29} +{"id": 820, "file": "face_0820.jpg", "gender": "man", "race": "caucasian", "age": 24} +{"id": 821, "file": "face_0821.jpg", "gender": "woman", "race": "hispanic", "age": 26} +{"id": 822, "file": "face_0822.jpg", "gender": "woman", "race": "hispanic", "age": 26} +{"id": 823, "file": "face_0823.jpg", "gender": 
"man", "race": "middle_eastern", "age": 29} +{"id": 824, "file": "face_0824.jpg", "gender": "man", "race": "middle_eastern", "age": 29} +{"id": 825, "file": "face_0825.jpg", "gender": "man", "race": "middle_eastern", "age": 26} +{"id": 826, "file": "face_0826.jpg", "gender": "man", "race": "middle_eastern", "age": 26} +{"id": 827, "file": "face_0827.jpg", "gender": "man", "race": "middle_eastern", "age": 27} +{"id": 828, "file": "face_0828.jpg", "gender": "man", "race": "middle_eastern", "age": 27} +{"id": 829, "file": "face_0829.jpg", "gender": "woman", "race": "caucasian", "age": 33} +{"id": 830, "file": "face_0830.jpg", "gender": "man", "race": "caucasian", "age": 25} +{"id": 831, "file": "face_0831.jpg", "gender": "man", "race": "caucasian", "age": 25} +{"id": 832, "file": "face_0832.jpg", "gender": "man", "race": "caucasian", "age": 21, "excluded": "minor"} +{"id": 833, "file": "face_0833.jpg", "gender": "man", "race": "caucasian", "age": 21, "excluded": "minor"} +{"id": 834, "file": "face_0834.jpg", "gender": "man", "race": "caucasian", "age": 29} +{"id": 835, "file": "face_0835.jpg", "gender": "man", "race": "caucasian", "age": 29} +{"id": 836, "file": "face_0836.jpg", "gender": "woman", "race": "caucasian", "age": 33} +{"id": 837, "file": "face_0837.jpg", "gender": "woman", "race": "caucasian", "age": 42} +{"id": 838, "file": "face_0838.jpg", "gender": "woman", "race": "east_asian", "age": 29} +{"id": 839, "file": "face_0839.jpg", "gender": "woman", "race": "east_asian", "age": 29} +{"id": 840, "file": "face_0840.jpg", "gender": "woman", "race": "caucasian", "age": 32} +{"id": 841, "file": "face_0841.jpg", "gender": "man", "race": "caucasian", "age": 28} +{"id": 842, "file": "face_0842.jpg", "gender": "man", "race": "caucasian", "age": 23} +{"id": 843, "file": "face_0843.jpg", "gender": "man", "race": "caucasian", "age": 23} +{"id": 844, "file": "face_0844.jpg", "gender": "woman", "race": "caucasian", "age": 37} +{"id": 845, "file": "face_0845.jpg", 
"gender": "woman", "race": "caucasian", "age": 37} +{"id": 846, "file": "face_0846.jpg", "gender": "woman", "race": "caucasian", "age": 26} +{"id": 847, "file": "face_0847.jpg", "gender": "man", "race": "caucasian", "age": 22} +{"id": 848, "file": "face_0848.jpg", "gender": "man", "race": "caucasian", "age": 22} +{"id": 849, "file": "face_0849.jpg", "gender": "man", "race": "caucasian", "age": 22} +{"id": 850, "file": "face_0850.jpg", "gender": "woman", "race": "hispanic", "age": 31} +{"id": 851, "file": "face_0851.jpg", "gender": "woman", "race": "hispanic", "age": 31} +{"id": 852, "file": "face_0852.jpg", "gender": "woman", "race": "caucasian", "age": 31} +{"id": 853, "file": "face_0853.jpg", "gender": "woman", "race": "east_asian", "age": 35} +{"id": 854, "file": "face_0854.jpg", "gender": "woman", "race": "east_asian", "age": 35} +{"id": 855, "file": "face_0855.jpg", "gender": "woman", "race": "caucasian", "age": 36} +{"id": 856, "file": "face_0856.jpg", "gender": "woman", "race": "caucasian", "age": 36} +{"id": 857, "file": "face_0857.jpg", "gender": "man", "race": "hispanic", "age": 46} +{"id": 858, "file": "face_0858.jpg", "gender": "man", "race": "hispanic", "age": 34} +{"id": 859, "file": "face_0859.jpg", "gender": "man", "race": "hispanic", "age": 34} +{"id": 860, "file": "face_0860.jpg", "gender": "man", "race": "hispanic", "age": 34} +{"id": 861, "file": "face_0861.jpg", "gender": "man", "race": "hispanic", "age": 34} +{"id": 862, "file": "face_0862.jpg", "gender": "woman", "race": "hispanic", "age": 39} +{"id": 863, "file": "face_0863.jpg", "gender": "woman", "race": "hispanic", "age": 39} +{"id": 864, "file": "face_0864.jpg", "gender": "woman", "race": "hispanic", "age": 39} +{"id": 865, "file": "face_0865.jpg", "gender": "woman", "race": "hispanic", "age": 39} +{"id": 866, "file": "face_0866.jpg", "gender": "man", "race": "caucasian", "age": 22} +{"id": 867, "file": "face_0867.jpg", "gender": "man", "race": "caucasian", "age": 22} +{"id": 868, 
"file": "face_0868.jpg", "gender": "woman", "race": "caucasian", "age": 28} +{"id": 869, "file": "face_0869.jpg", "gender": "woman", "race": "caucasian", "age": 28} +{"id": 870, "file": "face_0870.jpg", "gender": "man", "race": "caucasian", "age": 33} +{"id": 871, "file": "face_0871.jpg", "gender": "woman", "race": "hispanic", "age": 32} +{"id": 872, "file": "face_0872.jpg", "gender": "woman", "race": "hispanic", "age": 31} +{"id": 873, "file": "face_0873.jpg", "gender": "woman", "race": "caucasian", "age": 48} +{"id": 874, "file": "face_0874.jpg", "gender": "man", "race": "caucasian", "age": 29} +{"id": 875, "file": "face_0875.jpg", "gender": "woman", "race": "caucasian", "age": 29} +{"id": 876, "file": "face_0876.jpg", "gender": "woman", "race": "caucasian", "age": 30} +{"id": 877, "file": "face_0877.jpg", "gender": "man", "race": "hispanic", "age": 44} +{"id": 878, "file": "face_0878.jpg", "gender": "woman", "race": "caucasian", "age": 26} +{"id": 879, "file": "face_0879.jpg", "gender": "man", "race": "caucasian", "age": 23} +{"id": 880, "file": "face_0880.jpg", "gender": "woman", "race": "caucasian", "age": 28} +{"id": 881, "file": "face_0881.jpg", "gender": "woman", "race": "caucasian", "age": 28} +{"id": 882, "file": "face_0882.jpg", "gender": "woman", "race": "caucasian", "age": 33} +{"id": 883, "file": "face_0883.jpg", "gender": "woman", "race": "caucasian", "age": 33} +{"id": 884, "file": "face_0884.jpg", "gender": "woman", "race": "caucasian", "age": 33} +{"id": 885, "file": "face_0885.jpg", "gender": "woman", "race": "caucasian", "age": 33} +{"id": 886, "file": "face_0886.jpg", "gender": "man", "race": "east_asian", "age": 36} +{"id": 887, "file": "face_0887.jpg", "gender": "man", "race": "east_asian", "age": 36} +{"id": 888, "file": "face_0888.jpg", "gender": "man", "race": "caucasian", "age": 29} +{"id": 889, "file": "face_0889.jpg", "gender": "woman", "race": "east_asian", "age": 29} +{"id": 890, "file": "face_0890.jpg", "gender": "man", "race": 
"caucasian", "age": 29} +{"id": 891, "file": "face_0891.jpg", "gender": "woman", "race": "east_asian", "age": 29} +{"id": 892, "file": "face_0892.jpg", "gender": "woman", "race": "east_asian", "age": 28} +{"id": 893, "file": "face_0893.jpg", "gender": "man", "race": "caucasian", "age": 24} +{"id": 894, "file": "face_0894.jpg", "gender": "woman", "race": "east_asian", "age": 33} +{"id": 895, "file": "face_0895.jpg", "gender": "man", "race": "caucasian", "age": 27} +{"id": 896, "file": "face_0896.jpg", "gender": "woman", "race": "caucasian", "age": 26} +{"id": 897, "file": "face_0897.jpg", "gender": "man", "race": "caucasian", "age": 27} +{"id": 898, "file": "face_0898.jpg", "gender": "woman", "race": "caucasian", "age": 33} +{"id": 899, "file": "face_0899.jpg", "gender": "man", "race": "caucasian", "age": 28} +{"id": 900, "file": "face_0900.jpg", "gender": "man", "race": "caucasian", "age": 26} +{"id": 901, "file": "face_0901.jpg", "gender": "woman", "race": "east_asian", "age": 28} +{"id": 902, "file": "face_0902.jpg", "gender": "man", "race": "caucasian", "age": 25} +{"id": 903, "file": "face_0903.jpg", "gender": "man", "race": "caucasian", "age": 25} +{"id": 904, "file": "face_0904.jpg", "gender": "man", "race": "caucasian", "age": 25} +{"id": 905, "file": "face_0905.jpg", "gender": "man", "race": "caucasian", "age": 25} +{"id": 906, "file": "face_0906.jpg", "gender": "man", "race": "caucasian", "age": 25} +{"id": 907, "file": "face_0907.jpg", "gender": "woman", "race": "caucasian", "age": 33} +{"id": 908, "file": "face_0908.jpg", "gender": "woman", "race": "caucasian", "age": 33} +{"id": 909, "file": "face_0909.jpg", "gender": "woman", "race": "caucasian", "age": 27} +{"id": 910, "file": "face_0910.jpg", "gender": "woman", "race": "caucasian", "age": 27} +{"id": 911, "file": "face_0911.jpg", "gender": "woman", "race": "caucasian", "age": 27} +{"id": 912, "file": "face_0912.jpg", "gender": "man", "race": "caucasian", "age": 34} +{"id": 913, "file": 
"face_0913.jpg", "gender": "man", "race": "caucasian", "age": 34} +{"id": 914, "file": "face_0914.jpg", "gender": "man", "race": "caucasian", "age": 34} +{"id": 915, "file": "face_0915.jpg", "gender": "woman", "race": "east_asian", "age": 30} +{"id": 916, "file": "face_0916.jpg", "gender": "man", "race": "caucasian", "age": 30} +{"id": 917, "file": "face_0917.jpg", "gender": "woman", "race": "caucasian", "age": 28} +{"id": 918, "file": "face_0918.jpg", "gender": "man", "race": "caucasian", "age": 35} +{"id": 919, "file": "face_0919.jpg", "gender": "woman", "race": "caucasian", "age": 29} +{"id": 920, "file": "face_0920.jpg", "gender": "woman", "race": "caucasian", "age": 28} +{"id": 921, "file": "face_0921.jpg", "gender": "woman", "race": "caucasian", "age": 35} +{"id": 922, "file": "face_0922.jpg", "gender": "man", "race": "caucasian", "age": 34} +{"id": 923, "file": "face_0923.jpg", "gender": "woman", "race": "caucasian", "age": 29} +{"id": 924, "file": "face_0924.jpg", "gender": "woman", "race": "caucasian", "age": 42} +{"id": 925, "file": "face_0925.jpg", "gender": "woman", "race": "caucasian", "age": 42} +{"id": 926, "file": "face_0926.jpg", "gender": "woman", "race": "caucasian", "age": 42} +{"id": 927, "file": "face_0927.jpg", "gender": "woman", "race": "caucasian", "age": 42} +{"id": 928, "file": "face_0928.jpg", "gender": "man", "race": "hispanic", "age": 29} +{"id": 929, "file": "face_0929.jpg", "gender": "man", "race": "hispanic", "age": 29} +{"id": 930, "file": "face_0930.jpg", "gender": "man", "race": "hispanic", "age": 29} +{"id": 931, "file": "face_0931.jpg", "gender": "man", "race": "hispanic", "age": 29} +{"id": 932, "file": "face_0932.jpg", "gender": "man", "race": "hispanic", "age": 29} +{"id": 933, "file": "face_0933.jpg", "gender": "woman", "race": "caucasian", "age": 26} +{"id": 934, "file": "face_0934.jpg", "gender": "man", "race": "caucasian", "age": 29} +{"id": 935, "file": "face_0935.jpg", "gender": "woman", "race": "caucasian", "age": 26} 
+{"id": 936, "file": "face_0936.jpg", "gender": "man", "race": "caucasian", "age": 29} +{"id": 937, "file": "face_0937.jpg", "gender": "man", "race": "caucasian", "age": 26} +{"id": 938, "file": "face_0938.jpg", "gender": "man", "race": "caucasian", "age": 26} +{"id": 939, "file": "face_0939.jpg", "gender": "man", "race": "caucasian", "age": 26} +{"id": 940, "file": "face_0940.jpg", "gender": "woman", "race": "caucasian", "age": 29} +{"id": 941, "file": "face_0941.jpg", "gender": "woman", "race": "caucasian", "age": 31} +{"id": 942, "file": "face_0942.jpg", "gender": "woman", "race": "caucasian", "age": 31} +{"id": 943, "file": "face_0943.jpg", "gender": "woman", "race": "caucasian", "age": 33} +{"id": 944, "file": "face_0944.jpg", "gender": "woman", "race": "caucasian", "age": 33} +{"id": 945, "file": "face_0945.jpg", "gender": "man", "race": "caucasian", "age": 38} +{"id": 946, "file": "face_0946.jpg", "gender": "man", "race": "caucasian", "age": 38} +{"id": 947, "file": "face_0947.jpg", "gender": "man", "race": "caucasian", "age": 47} +{"id": 948, "file": "face_0948.jpg", "gender": "man", "race": "caucasian", "age": 47} +{"id": 949, "file": "face_0949.jpg", "gender": "woman", "race": "caucasian", "age": 32} +{"id": 950, "file": "face_0950.jpg", "gender": "woman", "race": "caucasian", "age": 32} +{"id": 951, "file": "face_0951.jpg", "gender": "woman", "race": "caucasian", "age": 25} +{"id": 952, "file": "face_0952.jpg", "gender": "woman", "race": "caucasian", "age": 25} +{"id": 953, "file": "face_0953.jpg", "gender": "woman", "race": "caucasian", "age": 31} +{"id": 954, "file": "face_0954.jpg", "gender": "woman", "race": "caucasian", "age": 31} +{"id": 955, "file": "face_0955.jpg", "gender": "woman", "race": "east_asian", "age": 29} +{"id": 956, "file": "face_0956.jpg", "gender": "woman", "race": "east_asian", "age": 29} +{"id": 957, "file": "face_0957.jpg", "gender": "woman", "race": "east_asian", "age": 26} +{"id": 958, "file": "face_0958.jpg", "gender": 
"woman", "race": "east_asian", "age": 26} +{"id": 959, "file": "face_0959.jpg", "gender": "man", "race": "caucasian", "age": 25} +{"id": 960, "file": "face_0960.jpg", "gender": "man", "race": "caucasian", "age": 25} +{"id": 961, "file": "face_0961.jpg", "gender": "woman", "race": "caucasian", "age": 31} +{"id": 962, "file": "face_0962.jpg", "gender": "woman", "race": "caucasian", "age": 31} +{"id": 963, "file": "face_0963.jpg", "gender": "woman", "race": "caucasian", "age": 26} +{"id": 964, "file": "face_0964.jpg", "gender": "woman", "race": "caucasian", "age": 26} +{"id": 965, "file": "face_0965.jpg", "gender": "woman", "race": "caucasian", "age": 21, "excluded": "minor"} +{"id": 966, "file": "face_0966.jpg", "gender": "woman", "race": "caucasian", "age": 21, "excluded": "minor"} +{"id": 967, "file": "face_0967.jpg", "gender": "woman", "race": "caucasian", "age": 21, "excluded": "minor"} +{"id": 968, "file": "face_0968.jpg", "gender": "woman", "race": "caucasian", "age": 21, "excluded": "minor"} +{"id": 969, "file": "face_0969.jpg", "gender": "woman", "race": "east_asian", "age": 33} +{"id": 970, "file": "face_0970.jpg", "gender": "woman", "race": "east_asian", "age": 33} +{"id": 971, "file": "face_0971.jpg", "gender": "man", "race": "hispanic", "age": 37} +{"id": 972, "file": "face_0972.jpg", "gender": "woman", "race": "east_asian", "age": 33} +{"id": 973, "file": "face_0973.jpg", "gender": "woman", "race": "east_asian", "age": 33} +{"id": 974, "file": "face_0974.jpg", "gender": "woman", "race": "east_asian", "age": 29} +{"id": 975, "file": "face_0975.jpg", "gender": "woman", "race": "east_asian", "age": 29} +{"id": 976, "file": "face_0976.jpg", "gender": "man", "race": "middle_eastern", "age": 33} +{"id": 977, "file": "face_0977.jpg", "gender": "woman", "race": "caucasian", "age": 32} +{"id": 978, "file": "face_0978.jpg", "gender": "woman", "race": "caucasian", "age": 32} +{"id": 979, "file": "face_0979.jpg", "gender": "woman", "race": "east_asian", "age": 37} 
+{"id": 980, "file": "face_0980.jpg", "gender": "woman", "race": "east_asian", "age": 37} +{"id": 981, "file": "face_0981.jpg", "gender": "man", "race": "middle_eastern", "age": 32} +{"id": 982, "file": "face_0982.jpg", "gender": "man", "race": "middle_eastern", "age": 32} +{"id": 983, "file": "face_0983.jpg", "gender": "man", "race": "caucasian", "age": 37} +{"id": 984, "file": "face_0984.jpg", "gender": "man", "race": "caucasian", "age": 37} +{"id": 985, "file": "face_0985.jpg", "gender": "man", "race": "caucasian", "age": 34} +{"id": 986, "file": "face_0986.jpg", "gender": "man", "race": "caucasian", "age": 34} +{"id": 987, "file": "face_0987.jpg", "gender": "man", "race": "east_asian", "age": 18, "excluded": "minor"} +{"id": 988, "file": "face_0988.jpg", "gender": "man", "race": "east_asian", "age": 18, "excluded": "minor"} +{"id": 989, "file": "face_0989.jpg", "gender": "man", "race": "caucasian", "age": 40} +{"id": 990, "file": "face_0990.jpg", "gender": "man", "race": "caucasian", "age": 40} +{"id": 991, "file": "face_0991.jpg", "gender": "woman", "race": "black", "age": 32} +{"id": 992, "file": "face_0992.jpg", "gender": "woman", "race": "black", "age": 32} +{"id": 993, "file": "face_0993.jpg", "gender": "woman", "race": "black", "age": 32} +{"id": 994, "file": "face_0994.jpg", "gender": "woman", "race": "black", "age": 32} +{"id": 995, "file": "face_0995.jpg", "gender": "woman", "race": "black", "age": 32} +{"id": 996, "file": "face_0996.jpg", "gender": "woman", "race": "black", "age": 32} +{"id": 997, "file": "face_0997.jpg", "gender": "woman", "race": "black", "age": 32} +{"id": 998, "file": "face_0998.jpg", "gender": "man", "race": "caucasian", "age": 29} +{"id": 999, "file": "face_0999.jpg", "gender": "man", "race": "caucasian", "age": 29} diff --git a/docs/ARCHITECTURE_COMPARISON.md b/docs/ARCHITECTURE_COMPARISON.md new file mode 100644 index 0000000..b13d6d4 --- /dev/null +++ b/docs/ARCHITECTURE_COMPARISON.md @@ -0,0 +1,46 @@ +# Lakehouse: Rust vs Go 
architecture comparison + +> **Source of truth lives in the golangLAKEHOUSE repo:** +> [`/home/profit/golangLAKEHOUSE/docs/ARCHITECTURE_COMPARISON.md`](file:///home/profit/golangLAKEHOUSE/docs/ARCHITECTURE_COMPARISON.md) +> +> J's living document — pulled from there into this repo's docs as +> a pointer so the comparison is reachable from either side. + +## Why the source lives in golangLAKEHOUSE + +The Go rewrite was the trigger for the comparison. The doc updates as +J ships fixes on either side, and most of the open backlog items +(materializer port, replay port, validators network surface) land in +the Go repo. Keeping the source there means PR auditing on Go +commits also catches doc drift. + +## When to update from this side + +If a fix lands in the Rust repo that changes a comparison value +(e.g. embed cache change, sidecar drop, new validator), update both: + +1. The source at `/home/profit/golangLAKEHOUSE/docs/ARCHITECTURE_COMPARISON.md` +2. The change log section at the bottom of the same file + +This file is a pointer — **do not put authoritative content here.** +Drift between two copies wastes the discipline. + +## Quick links + +- **Decisions tracker** — section near the top of the source file. + Lists actioned items + open backlog with LOC estimates. +- **Performance numbers** — Python dependency section. Updated each + time a load test is rerun. +- **Distillation porting status** — table of phase-by-phase port + state across runtimes. +- **Recommendation** — current working hypothesis on Go-primary vs + Rust-primary. Subject to change as fixes ship. + +## Last known state + +- **2026-05-01**: Rust embed cache shipped (`150cc3b`), 236× RPS gain. +- **2026-05-01**: Go validator port shipped (`b03521a`), production + safety net now on Go side. +- **Open**: Drop Rust Python sidecar (~200 LOC, universal-win). +- **Open**: Port Rust materializer to Go (~500-800 LOC, unblocks + Go-only end-to-end pipeline). 
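The headshot manifest added earlier in this diff is newline-delimited JSON where rows unfit for serving carry an `excluded` marker (e.g. `"excluded": "minor"`). A minimal consumer sketch, assuming Python; the field names come from the rows above, while the path and helper name are hypothetical:

```python
import json

def load_pool(path):
    """Yield usable face records from the headshot manifest JSONL,
    skipping blank lines and any row carrying an "excluded" marker."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            rec = json.loads(line)
            if "excluded" in rec:
                continue  # e.g. "minor": never serve these faces
            yield rec

# Hypothetical usage:
# usable = list(load_pool("data/headshots/manifest.jsonl"))
```

Keeping exclusion as a tag on the row (rather than deleting the row) preserves stable ids, so downstream caches keyed on `id` never see a renumbered pool.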
diff --git a/docs/PHASE_AUDIT_GUIDE.md b/docs/PHASE_AUDIT_GUIDE.md new file mode 100644 index 0000000..aca1899 --- /dev/null +++ b/docs/PHASE_AUDIT_GUIDE.md @@ -0,0 +1,107 @@ +# Phase Audit Guidance for Claude Code + +## Purpose +This document provides the proper workflow for auditing completed phases in the Lakehouse project. + +## ⚠️ Important: Do NOT Skip Steps +Each phase requires BOTH: +1. PRD spec verification (check code exists) +2. Full SCRUM execution (6 commands) + +## Proper Phase Audit Workflow + +### Step 1: Read PRD Specification +For each phase, read the PRD to understand what's supposed to ship: +```bash +# Read from docs/PRD.md or docs/PHASES.md +grep -A20 "Phase N:" docs/PHASES.md +``` + +### Step 2: Verify Code Exists +Check that each deliverable from the PRD spec has corresponding code: +```bash +# Example - check for specific implementations +grep -r "function_name" crates/*/src/ +ls crates/*/src/*.rs +``` + +### Step 3: Run Full SCRUM (6 Commands) +In order, execute ALL of these for the phase's crates: + +```bash +# 1. Build +cargo build -p <crate> + +# 2. Test +cargo test -p <crate> + +# 3. Clippy (if installed) +cargo clippy -p <crate> -- -D warnings + +# 4. Format check +cargo fmt -p <crate> -- --check + +# 5. Cargo check +cargo check -p <crate> + +# 6.
Doc check +cargo doc -p <crate> --no-deps +``` + +### Step 4: Fix Issues +If any SCRUM command fails: +- Fix the code +- Re-run the failing command +- Re-run ALL 6 commands to verify + +### Step 5: Update Phase Documentation +Only mark as ✅ after ALL 6 SCRUM commands pass: +```markdown +## Phase N: [Name] ✅ +- [x] spec item 1 +- [x] spec item 2 + - SCRUM: build ✅ test ✅ clippy ✅ fmt ✅ check ✅ doc ✅ +``` + +## Current Phase Status + +| Phase | Status | Notes | +|-------|--------|-------| +| 0 | ✅ | Bootstrap complete | +| 1 | ✅ | Storage + Catalog | +| 2 | ✅ | Query Engine | +| 3 | ✅ | AI Integration | +| 4 | ✅ | Frontend | +| 5 | ✅ | Hardening | +| 6-42 | ✅ | See docs/PHASES.md | + +## Notes from Previous Session + +- Clippy and rustfmt are NOT installed on this system +- Run `rustup component add clippy rustfmt` to install +- Some crates have 0 unit tests (expected for service crates) +- 28 warnings remain in unused code paths (ui/vectord) + +## Key Files + +- `docs/PHASES.md` - Phase tracker with checkboxes +- `docs/PRD.md` - Full product requirements +- `docs/CONTROL_PLANE_PRD.md` - Phases 38+ specifications +- `crates/*/` - All crate implementations + +## Quick Reference + +```bash +# Full workspace SCRUM +cargo build --workspace +cargo test --workspace +# (clippy if installed) +cargo fmt -- --check +cargo check --workspace +cargo doc --no-deps + +# Per-crate +cargo build -p <crate> +cargo test -p <crate> +cargo check -p <crate> +``` \ No newline at end of file diff --git a/lakehouse.toml b/lakehouse.toml index 19061a1..1c1582f 100644 --- a/lakehouse.toml +++ b/lakehouse.toml @@ -3,6 +3,15 @@ [gateway] host = "0.0.0.0" port = 3100 +# Coordinator session JSONL — one row per /v1/iterate session for +# offline DuckDB analysis. Cross-runtime parity with the Go-side +# [validatord].session_log_path.
Set to the SAME path Go validatord +# writes to so DuckDB queries see one unified longitudinal stream +# across both runtimes (rows are tagged daemon="gateway" vs +# daemon="validatord" so producers stay distinguishable). Append-write +# is atomic at the row sizes both runtimes produce — both daemons +# co-writing is safe. +session_log_path = "/tmp/lakehouse-validator/sessions.jsonl" [storage] root = "./data" @@ -44,12 +53,22 @@ manifest_prefix = "_catalog/manifests" # max_rows_per_query = 10000 [sidecar] -url = "http://localhost:3200" +# Post-2026-05-02: AiClient talks directly to Ollama; the Python +# sidecar's hot-path role (~120 LOC of pure Ollama wrappers) was +# retired. Field name kept for migration compat — value now points +# at Ollama on :11434. Lab UI + pipeline_lab Python remains as a +# dev-only tool, NOT on this URL. +url = "http://localhost:11434" [ai] embed_model = "nomic-embed-text" -gen_model = "qwen2.5" -rerank_model = "qwen2.5" +# Local-tier defaults bumped 2026-04-30: qwen3.5:latest is the +# stronger local rung in the 5-loop substrate (per +# project_small_model_pipeline_vision.md). Same JSON-clean property +# as qwen2.5, more capacity. Ollama still serves both — bump back +# in this file if a workload regressed. 
+gen_model = "qwen3.5:latest" +rerank_model = "qwen3.5:latest" [auth] enabled = false @@ -72,7 +91,9 @@ min_recall = 0.9 # never promote below this max_trials_per_hour = 20 # hard budget cap # Model roster — available for profile hot-swap +# qwen3.5:latest: stronger local rung — JSON-clean, 8K+ context, +# default for gen_model and rerank_model # qwen3: 8.2B, 40K context, thinking+tools, best for reasoning tasks -# qwen2.5: 7B, 8K context, fast, good for SQL generation -# mistral: 7B, 8K context, good for general generation +# qwen2.5: 7B, 8K context, fast — kept loaded for the 2026-04 era +# comparison runs; new defaults use qwen3.5:latest # nomic-embed-text: 137M, embedding-only, used by all profiles diff --git a/mcp-server/console.html b/mcp-server/console.html index eada43c..250fc0b 100644 --- a/mcp-server/console.html +++ b/mcp-server/console.html @@ -51,9 +51,28 @@ details .body{padding-top:10px;font-size:12px;color:#8b949e} .accent-b{border-left:3px solid #1f6feb} .accent-a{border-left:3px solid #bc8cff} .accent-w{border-left:3px solid #d29922} +.accent-g{border-left:3px solid #3fb950} +.accent-r{border-left:3px solid #f85149} -.worker{display:flex;align-items:center;gap:10px;padding:8px 10px;background:#161b22;border-radius:6px;margin-bottom:4px;font-size:12px} -.worker .av{width:28px;height:28px;border-radius:6px;background:#1a2744;display:flex;align-items:center;justify-content:center;font-weight:600;color:#e6edf3;font-size:10px;flex-shrink:0} +.worker{display:flex;align-items:center;gap:10px;padding:8px 10px;background:#161b22;border-radius:6px;margin-bottom:4px;font-size:12px;border-left:3px solid #30363d} +.worker .av{width:32px;height:32px;border-radius:50%;background:#0d1117;border:1px solid #21262d;display:flex;align-items:center;justify-content:center;font-weight:600;color:#c9d1d9;font-size:11px;flex-shrink:0;letter-spacing:0.5px;overflow:hidden;position:relative} +.worker .av 
img{position:absolute;inset:0;width:100%;height:100%;object-fit:cover;display:block; + /* Softening — mirror of search.html. Pulls saturation + contrast off + the SDXL Turbo over-render so faces feel less "AI-generated". + If you tweak one, tweak the other. */ + filter: saturate(0.86) contrast(0.93) brightness(1.02) blur(0.3px); +} +.worker[data-role-band="warehouse"]{border-left-color:#58a6ff} +.worker[data-role-band="production"]{border-left-color:#d29922} +.worker[data-role-band="trades"]{border-left-color:#bc8cff} +.worker[data-role-band="driver"]{border-left-color:#3fb950} +.worker[data-role-band="lead"]{border-left-color:#f0883e} +.role-pill{display:inline-block;font-size:9px;padding:1px 7px;border-radius:3px;background:#0d1117;color:#8b949e;margin-right:6px;font-weight:600;letter-spacing:0.4px;text-transform:uppercase;border-left:2px solid #30363d;vertical-align:1px} +.role-pill[data-rb="warehouse"]{border-left-color:#58a6ff;color:#79c0ff} +.role-pill[data-rb="production"]{border-left-color:#d29922;color:#e3b341} +.role-pill[data-rb="trades"]{border-left-color:#bc8cff;color:#d2a8ff} +.role-pill[data-rb="driver"]{border-left-color:#3fb950;color:#56d364} +.role-pill[data-rb="lead"]{border-left-color:#f0883e;color:#ffa657} .worker .info{flex:1;min-width:0} .worker .nm{color:#e6edf3;font-weight:500} .worker .why{color:#545d68;font-size:11px;margin-top:1px} @@ -95,6 +114,7 @@ details .body{padding-top:10px;font-size:12px;color:#8b949e}
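Circling back to the `session_log_path` added in `lakehouse.toml` above: with both daemons appending tagged rows to one JSONL, a quick sanity check is tallying rows per producer. A sketch under the assumptions in the config comment; the `daemon` field name comes from that comment, not from an actual schema dump:

```python
import json
from collections import Counter

def sessions_by_daemon(path):
    """Count session rows per producer in the shared JSONL.
    Rows are assumed to carry daemon="gateway" or daemon="validatord",
    as the lakehouse.toml comment describes; anything untagged is
    bucketed as "unknown" so a parity gap shows up loudly."""
    counts = Counter()
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                counts[json.loads(line).get("daemon", "unknown")] += 1
    return dict(counts)
```

The same tally in DuckDB is roughly `SELECT daemon, count(*) FROM read_json_auto('/tmp/lakehouse-validator/sessions.jsonl') GROUP BY daemon`; the Python version is just the dependency-free check.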