matrix-agent-validated/docs/ADR-021-sparse-data-trust.md
profit ac01fffd9a checkpoint: matrix-agent-validated (2026-04-25)
Architectural snapshot of the lakehouse codebase at the point where the
full matrix-driven agent loop with Mem0 versioning + deletion was
validated end-to-end.

WHAT THIS REPO IS
A clean single-commit snapshot of the lakehouse code. Heavy test data
(.parquet datasets, vector indexes) excluded — see REPLICATION.md for
regen path. Full lakehouse history at git.agentview.dev/profit/lakehouse.

WHAT WAS PROVEN
- Vector retrieval across a multi-corpus matrix (chicago_permits + entity
  briefs + sec_tickers + distilled procedural + llm_team runs)
- Observer hand-review (cloud + heuristic fallback) gating each candidate
- Local-model agent loop (qwen3.5:latest) with tool use + scratchpad
- Playbook seal on success → next-iter retrieval surfaces it as preamble
- Mem0 versioning + deletion in pathway_memory:
    * UPSERT: ADD on new workflow, UPDATE bumps replay_count on identical
    * REVISE: chains versions, parent.superseded_at + superseded_by stamped
    * RETIRE: marks specific trace retired with reason, excluded from retrieval
    * HISTORY: walks chain root→tip, cycle-safe

KEY DIRECTORIES
- crates/vectord/src/pathway_memory.rs — Mem0 ops live here
- crates/vectord/src/playbook_memory.rs — original Mem0 reference
- tests/agent_test/ — local-model agent harness + PRD + session archives
- scripts/dump_raw_corpus.sh — MinIO bucket dump (raw test corpus)
- scripts/vectorize_raw_corpus.ts — corpus → vector indexes
- scripts/analyze_chicago_contracts.ts — real inference pipeline
- scripts/seal_agent_playbook.ts — Mem0 upsert from agent traces

Replication: see REPLICATION.md for Debian 13 clean install + cloud-only
adaptation (no local Ollama).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-25 19:43:27 -05:00


# ADR-021: Sparse Data Trust Path — start with nothing, earn everything
**Status:** Proposed — 2026-04-17
**Triggered by:** Legacy staffing company pushback: "we don't have that data"
**Owner:** J
---
## The Problem
We demonstrated a system with rich worker profiles (18 fields,
behavioral scores, certifications, communication history). The
staffing company said: "We don't have any of that. We have a name,
a phone number, and a contract."
They're right. Our demo assumed data that doesn't exist in their
world. Showing AI that only works with perfect data is worse than
useless — it builds distrust.
## Their Reality
Day 1 data for a typical worker:
- Name
- Phone number
- Maybe: city, role, one or two skills mentioned on a call
Day 1 data for a contract:
- Client name
- Role needed
- Headcount
- Start date
- Maybe: required certs, location
That's it. No reliability scores. No availability metrics. No
communication history. No certifications in a database. The staffing
coordinator carries that knowledge in their HEAD.
## The Trust Path
### Phase 1: Work with what they have (Day 1)
- Accept sparse profiles: name + phone + role. That's enough.
- Match contracts to workers by role + location only. No scores.
- The system is useful immediately: "here are the 12 people you
have tagged as Forklift Operators in IL." That's already faster
than scrolling a spreadsheet.
- Don't show empty fields. Don't show 0% bars. Don't show what's
missing — show what's THERE.
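The Phase 1 behavior above can be sketched as a plain filter over sparse profiles. This is a minimal illustration, not the actual schema: the `SparseWorker` shape and field names (`role`, `city`) are assumptions for the sketch.

```typescript
// Hypothetical Day-1 profile: name + phone are the only required fields.
interface SparseWorker {
  name: string;
  phone: string;
  role?: string;
  city?: string;
}

// Phase 1 matching: role + optional location only. No scores, no metrics.
function matchPhase1(
  workers: SparseWorker[],
  role: string,
  city?: string
): SparseWorker[] {
  return workers.filter(
    (w) =>
      w.role?.toLowerCase() === role.toLowerCase() &&
      (city === undefined || w.city?.toLowerCase() === city.toLowerCase())
  );
}
```

Even this trivial filter beats scrolling a spreadsheet, which is the whole point of Phase 1.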
### Phase 2: Earn data through use (Week 1-4)
- Every placement generates a timesheet → reliability starts building
- Every call logged → communication history accumulates
- Every check-in → availability becomes real
- Every cert verified → certification database grows
- The staffer doesn't "enter data" — they just do their job, and
the system learns from it.
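The first bullet above, reliability accruing from timesheets, can be sketched as a fold over timesheet events. The event shape and the completed-over-scheduled ratio are illustrative assumptions, not the system's actual formula.

```typescript
// Hypothetical timesheet event emitted as the staffer does their job.
interface TimesheetEvent {
  workerId: string;
  scheduled: boolean; // a shift was scheduled for this worker
  completed: boolean; // the worker showed up and completed it
}

// Accumulate per-worker stats; reliability only exists once data exists.
function reliabilityFromTimesheets(
  events: TimesheetEvent[]
): Map<string, { placements: number; reliability: number }> {
  const acc = new Map<string, { scheduled: number; completed: number }>();
  for (const e of events) {
    const a = acc.get(e.workerId) ?? { scheduled: 0, completed: 0 };
    if (e.scheduled) a.scheduled += 1;
    if (e.completed) a.completed += 1;
    acc.set(e.workerId, a);
  }
  const out = new Map<string, { placements: number; reliability: number }>();
  for (const [id, a] of acc) {
    // Workers with no scheduled shifts simply have no entry: no 0% bars.
    if (a.scheduled > 0) {
      out.set(id, { placements: a.scheduled, reliability: a.completed / a.scheduled });
    }
  }
  return out;
}
```

Note that a worker with no timesheets is absent from the output entirely, which is how "don't show empty metrics" falls out of the data model rather than the UI.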
### Phase 3: AI starts helping (Month 2+)
- Enough data to actually score workers → show reliability
- Enough history to predict availability → surface it
- Enough placements to know client preferences → suggest matches
- The enrichment happened BECAUSE they used the system, not as a
prerequisite TO use it.
## What This Means for the UI
- Worker cards must gracefully degrade. Name only? Show name only.
Name + role? Show name + role. Full profile? Show everything.
- Never show empty metrics. No "Reliability: 0%" — that looks like
the worker is unreliable. Just don't show it until there's data.
- Lead with what the staffer KNOWS: "you placed this worker at
Company X last month" — that's information they trust because they
lived it.
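The degradation rule above can be sketched as a render function that emits only the fields a worker actually has. The field list, the wording, and the "based on N placements" phrasing are assumptions for illustration.

```typescript
// Hypothetical card model: everything beyond name is optional.
interface WorkerProfile {
  name: string;
  role?: string;
  city?: string;
  reliability?: number; // only present once real data exists
  placements?: number;  // evidence base for the number shown
}

// Render only what exists. Name only? One line. Full profile? Everything.
function renderWorkerCard(w: WorkerProfile): string[] {
  const lines: string[] = [w.name];
  if (w.role) lines.push(w.role);
  if (w.city) lines.push(w.city);
  // Never show an empty metric, and always qualify it with its source.
  if (w.reliability !== undefined && w.placements !== undefined && w.placements > 0) {
    lines.push(
      `${Math.round(w.reliability * 100)}% completion, based on ${w.placements} placements`
    );
  }
  return lines;
}
```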
## What This Means for the Architecture
- The vector index works on whatever text exists. A name + role is
200 characters. That's enough for an embedding. As more data
arrives, the embeddings get richer and search gets better.
- The hybrid search works with sparse SQL filters too. "role =
'Forklift Operator'" is a valid filter even without reliability.
- The playbook system captures the staffer's decisions: "you picked
this worker for this contract" → that IS the training data for
future AI recommendations.
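The hybrid-search point above can be sketched as a hard structured filter followed by a text-similarity rank over whatever profile text exists. Token overlap stands in for the real embedding similarity here, and every name in this sketch is illustrative rather than the actual vectord API.

```typescript
// Hypothetical candidate: profileText may be as short as "name + role".
interface Candidate {
  name: string;
  role?: string;
  profileText: string;
}

// Crude stand-in for vector similarity: fraction of query tokens present.
function tokenOverlap(query: string, doc: string): number {
  const q = new Set(query.toLowerCase().split(/\W+/).filter(Boolean));
  const d = new Set(doc.toLowerCase().split(/\W+/).filter(Boolean));
  let hits = 0;
  for (const t of q) if (d.has(t)) hits += 1;
  return q.size === 0 ? 0 : hits / q.size;
}

// Hybrid search: sparse SQL-style filter first, similarity rank second.
function hybridSearch(
  candidates: Candidate[],
  roleFilter: string,
  query: string
): Candidate[] {
  return candidates
    .filter((c) => c.role === roleFilter) // valid even with no scores at all
    .map((c) => ({ c, score: tokenOverlap(query, c.profileText) }))
    .sort((a, b) => b.score - a.score)
    .map((x) => x.c);
}
```

The structural point survives the crude similarity function: the filter needs no enrichment, and the ranking improves automatically as profile text gets richer.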
## Implementation
1. Sparse profile ingest: accept CSV with as few as 2 columns
(name, phone). Everything else optional.
2. Graceful UI degradation: worker cards only show fields that exist
3. Progressive enrichment: timesheet ingest → auto-calculate
reliability; check-in ingest → auto-calculate availability
4. Trust indicators: "based on 3 placements" not "Reliability: 87%"
— show WHERE the number comes from
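Step 1 above can be sketched as a permissive CSV parser: only `name` and `phone` are required, extra columns become optional fields, and empty cells are dropped rather than stored. The column names and error message are assumptions for the sketch.

```typescript
// Sparse profile ingest: a CSV with just name,phone is a valid upload.
function parseSparseCsv(csv: string): Record<string, string>[] {
  const [headerLine, ...rows] = csv.trim().split("\n");
  const headers = headerLine.split(",").map((h) => h.trim());
  if (!headers.includes("name") || !headers.includes("phone")) {
    throw new Error("minimum columns are name and phone");
  }
  return rows.map((row) => {
    const cells = row.split(",").map((c) => c.trim());
    const record: Record<string, string> = {};
    headers.forEach((h, i) => {
      if (cells[i]) record[h] = cells[i]; // skip empty cells: no empty fields stored
    });
    return record;
  });
}
```

Dropping empty cells at ingest time keeps the "don't show what's missing" rule enforceable downstream: a field the UI never receives is a field it can never render as a 0% bar.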