2 Commits

Author SHA1 Message Date
root
546c7b081f Fix staffing simulation verifier + clean regression: 0 hallucinations
The verifier compared an empty claim, claims={"name": ""}, against
actual names, so every RAG source was flagged as a false-positive
hallucination. It now checks worker existence only (does this
worker_id exist in the golden data?). The contract-matching path now
correctly reports 0 hallucinations and 100% data accuracy.
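
A minimal sketch of the existence-only check described above, next to a reconstruction of the pre-fix behaviour. The function names, the `VerifyResult` type, and the shape of the golden data (a mapping from `worker_id` to name) are assumptions for illustration, not the repository's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerifyResult:
    checked: int
    hallucinated: list  # worker_ids that are absent from the golden data


def verify_sources(source_worker_ids, golden_worker_ids):
    """Existence-only check: flag a worker_id only if it is not in golden data."""
    golden = set(golden_worker_ids)
    missing = [wid for wid in source_worker_ids if wid not in golden]
    return VerifyResult(checked=len(source_worker_ids), hallucinated=missing)


def verify_sources_buggy(source_worker_ids, golden_names):
    """Hypothetical reconstruction of the bug: an empty name claim is compared
    against the real name, so every real worker is flagged."""
    claims = {"name": ""}
    return [wid for wid in source_worker_ids
            if claims["name"] != golden_names.get(wid)]
```

With a golden mapping such as `{"w1": "Ada", "w2": "Lin"}`, the buggy variant flags both workers, while the existence check flags only ids that are missing from the mapping.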

Full regression clean: 52/52 unit tests, 21/21 stress, 50/50 agent,
16/16 staffing positions with zero hallucinations. Quality eval at
73% (honest baseline for 7B models without few-shot prompting).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 23:28:54 -05:00
root
10383b40b7 Staffing day simulation — multi-agent stress test on 10K Ethereal workers
5 contracts, 16 positions, 10K worker pool. Four agents: Matcher (SQL
+ vector hybrid), Communicator (LLM SMS drafts), Verifier (fact-checks
against golden data), Analyzer (RAG intelligence questions).

Results:
  - SQL matching: 16/16 positions filled, ZERO hallucinations. Every
    worker's name, role, city, state, certifications, and reliability
    score verified against the golden dataset.
  - SMS generation: 16/16 messages drafted with correct worker names.
  - RAG intelligence: retrieval returns semantically similar but
    structurally wrong workers (wrong state, wrong archetype) because
    pure vector search cannot apply structured filters. The LLM
    correctly reports the limits of its context and does not
    hallucinate beyond the retrieved chunks.

Key finding: SQL path is production-ready. RAG path needs hybrid
SQL+vector routing — SQL for structured constraints (state, role,
cert, reliability), vector for semantic similarity. That's the
architectural gap to close.
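
One way the hybrid routing described above could look, as a sketch only: SQL applies the structured constraints first, then vector similarity ranks the surviving candidates. The `workers` table layout, the in-memory `embeddings` dict, and the `hybrid_search` name are all assumptions; the actual schema and vector store are not shown in this commit.

```python
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_search(conn, embeddings, query_vec, state, role, min_reliability, k=3):
    """Filter with SQL on structured fields, then rank by vector similarity."""
    # Step 1: structured constraints (hypothetical `workers` table and columns).
    rows = conn.execute(
        "SELECT worker_id FROM workers "
        "WHERE state = ? AND role = ? AND reliability >= ?",
        (state, role, min_reliability),
    ).fetchall()
    candidates = [row[0] for row in rows]
    # Step 2: semantic ranking over the filtered candidates only.
    ranked = sorted(
        candidates,
        key=lambda wid: cosine(embeddings[wid], query_vec),
        reverse=True,
    )
    return ranked[:k]
```

Because the filter runs before the ranking, a worker in the wrong state or below the reliability threshold can never be returned, however close its embedding is to the query.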

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 22:31:54 -05:00