root 13b01fee9f ADR-021: Sparse data trust path — start with nothing, earn everything

The staffing company said: 'we don't have any of that data.'
They're right. We showed a demo with 18-field profiles and they
have a name and a phone number.

This ADR documents the trust path:
  Phase 1 (Day 1): Work with name + phone + role. That's enough.
  Phase 2 (Week 1-4): Timesheets → reliability. Calls → history.
  Phase 3 (Month 2+): AI starts helping with real earned data.

Key principles:
- Never show empty fields or 0% bars
- Show what's THERE, not what's missing
- Trust indicators: 'based on 3 placements' not just 'Reliability: 87%'
- The system earns data by being useful, not by demanding it upfront

Also created sparse_workers dataset (200 workers, 74% have role,
34% have notes, 5 have ONLY name+phone) for realistic testing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-17 15:32:06 -05:00


ADR-021: Sparse Data Trust Path — start with nothing, earn everything

Status: Proposed (2026-04-17)
Triggered by: Legacy staffing company pushback: "we don't have that data"
Owner: J

The Problem

We demonstrated a system with rich worker profiles (18 fields, behavioral scores, certifications, communication history). The staffing company said: "We don't have any of that. We have a name, a phone number, and a contract."

They're right. Our demo assumed data that doesn't exist in their world. Showing AI that only works with perfect data is worse than useless — it builds distrust.

Their Reality

Day 1 data for a typical worker:

Name
Phone number
Maybe: city, role, one or two skills mentioned on a call

Day 1 data for a contract:

Client name
Role needed
Headcount
Start date
Maybe: required certs, location

That's it. No reliability scores. No availability metrics. No communication history. No certifications in a database. The staffing coordinator carries that knowledge in their HEAD.

The Trust Path

Phase 1: Work with what they have (Day 1)

Accept sparse profiles: name + phone + role. That's enough.
Match contracts to workers by role + location only. No scores.
The system is useful immediately: "here are the 12 people you have tagged as Forklift Operators in IL." That's already faster than scrolling a spreadsheet.
Don't show empty fields. Don't show 0% bars. Don't show what's missing — show what's THERE.
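
A minimal sketch of Phase 1 matching, assuming a hypothetical `Worker` record where only name and phone are required and a hypothetical `match_workers` helper. Matching on a `state` field is an assumption based on the "Forklift Operators in IL" example; the ADR does not say how location is stored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Worker:
    # Only name and phone are required; everything else may be unknown.
    name: str
    phone: str
    role: Optional[str] = None
    state: Optional[str] = None


def match_workers(workers, role, state=None):
    """Return workers tagged with `role`, optionally narrowed by `state`.

    No scoring or ranking: input order is preserved.
    """
    matches = []
    for w in workers:
        if w.role is None or w.role.lower() != role.lower():
            continue
        # A worker with no known location is kept only when no location is requested.
        if state is not None and w.state != state:
            continue
        matches.append(w)
    return matches


workers = [
    Worker("Ana Ruiz", "555-0101", role="Forklift Operator", state="IL"),
    Worker("Ben Cole", "555-0102", role="Forklift Operator", state="IN"),
    Worker("Cy Park", "555-0103"),  # name + phone only
    Worker("Dee Lopez", "555-0104", role="forklift operator", state="IL"),
]

il_forklift = match_workers(workers, "Forklift Operator", state="IL")
print([w.name for w in il_forklift])
```

The name-and-phone-only worker is never an error case here. They just don't match a role filter until someone tags them.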

Phase 2: Earn data through use (Week 1-4)

Every placement generates a timesheet → reliability starts building
Every call logged → communication history accumulates
Every check-in → availability becomes real
Every cert verified → certification database grows
The staffer doesn't "enter data" — they just do their job, and the system learns from it.
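
One way the timesheet-to-reliability step could look. This ADR does not define the reliability formula, so the sketch assumes it is the share of scheduled shifts the worker showed up for. The `worker_id` and `showed_up` field names and the `reliability_by_worker` helper are hypothetical.

```python
from collections import defaultdict


def reliability_by_worker(timesheets):
    """Map worker_id -> (share of shifts worked, number of shifts seen).

    Workers with no timesheets are absent from the result, so nobody
    gets a 0% score just because they have no history yet.
    """
    worked = defaultdict(int)
    total = defaultdict(int)
    for row in timesheets:
        total[row["worker_id"]] += 1
        if row["showed_up"]:
            worked[row["worker_id"]] += 1
    return {wid: (worked[wid] / total[wid], total[wid]) for wid in total}


timesheets = [
    {"worker_id": "w1", "showed_up": True},
    {"worker_id": "w1", "showed_up": True},
    {"worker_id": "w1", "showed_up": False},
    {"worker_id": "w1", "showed_up": True},
    {"worker_id": "w2", "showed_up": True},
]

scores = reliability_by_worker(timesheets)
```

Returning the sample size next to the score keeps the "based on N" context attached to every number.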

Phase 3: AI starts helping (Month 2+)

Enough data to actually score workers → show reliability
Enough history to predict availability → surface it
Enough placements to know client preferences → suggest matches
The enrichment happened BECAUSE they used the system, not as a prerequisite TO use it.
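
A rough sketch of "enough placements to know client preferences", assuming placement decisions are logged as client/worker pairs. The `suggest_for_client` helper, the record shape, and the `min_placements=3` threshold are illustrative assumptions, not values from this ADR.

```python
from collections import Counter


def suggest_for_client(decisions, client, min_placements=3, top_n=3):
    """Suggest the workers most often placed with this client.

    Returns [] until the client has at least `min_placements` recorded
    decisions, so thin history yields no suggestions instead of weak ones.
    """
    picks = [d["worker"] for d in decisions if d["client"] == client]
    if len(picks) < min_placements:
        return []
    return [worker for worker, _ in Counter(picks).most_common(top_n)]


decisions = [
    {"client": "Acme", "worker": "Ana"},
    {"client": "Acme", "worker": "Ben"},
    {"client": "Acme", "worker": "Ana"},
    {"client": "Bolt", "worker": "Cy"},
]

acme_suggestions = suggest_for_client(decisions, "Acme")
bolt_suggestions = suggest_for_client(decisions, "Bolt")
```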

What This Means for the UI

Worker cards must gracefully degrade. Name only? Show name only. Name + role? Show name + role. Full profile? Show everything.
Never show empty metrics. No "Reliability: 0%" — that looks like the worker is unreliable. Just don't show it until there's data.
Lead with what the staffer KNOWS: "you placed this worker at Company X last month" — that's information they trust because they lived it.
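
The degradation rules above can be sketched as a function that emits a card line only when the field has data. The profile keys (`role`, `city`, `last_placement`, `reliability`, `placements`) and the `render_card` helper are hypothetical; a real card would be a UI component, not a list of strings.

```python
def render_card(profile):
    """Return the card's text lines, skipping any field with no data."""
    lines = [profile["name"]]  # the name is the only field assumed to always exist
    for key, label in (("role", "Role"), ("city", "City"), ("phone", "Phone")):
        value = profile.get(key)
        if value:
            lines.append(f"{label}: {value}")
    last = profile.get("last_placement")
    if last:
        lines.append(f"You placed this worker at {last}")
    # Reliability appears only when backed by at least one placement,
    # and always together with its sample size.
    n = profile.get("placements", 0)
    score = profile.get("reliability")
    if score is not None and n > 0:
        noun = "placement" if n == 1 else "placements"
        lines.append(f"Reliability: {score:.0%} (based on {n} {noun})")
    return lines


name_only = render_card({"name": "Cy Park"})
fuller = render_card(
    {"name": "Ana Ruiz", "role": "Forklift Operator", "reliability": 0.87, "placements": 3}
)
```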

What This Means for the Architecture

The vector index works on whatever text exists. A name + role is a short string (well under 200 characters), and that's enough for an embedding. As more data arrives, the embeddings get richer and search gets better.
The hybrid search works with sparse SQL filters too. "role = 'Forklift Operator'" is a valid filter even without reliability.
The playbook system captures the staffer's decisions: "you picked this worker for this contract" → that IS the training data for future AI recommendations.
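
To illustrate the sparse-filter point, here is a sketch using Python's built-in `sqlite3` with nullable columns. The `workers` table layout and the `embedding_text` helper are assumptions for illustration; the actual store and embedding pipeline are not specified in this ADR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE workers (
        name TEXT NOT NULL,
        phone TEXT NOT NULL,
        role TEXT,          -- nullable: unknown on day 1 for some workers
        reliability REAL    -- nullable: stays NULL until timesheets exist
    )
    """
)
conn.executemany(
    "INSERT INTO workers VALUES (?, ?, ?, ?)",
    [
        ("Ana Ruiz", "555-0101", "Forklift Operator", None),
        ("Ben Cole", "555-0102", "Forklift Operator", 0.9),
        ("Cy Park", "555-0103", None, None),
    ],
)

# The role filter works whether or not reliability has been earned yet.
rows = conn.execute(
    "SELECT name, reliability FROM workers WHERE role = ? ORDER BY name",
    ("Forklift Operator",),
).fetchall()


def embedding_text(name, role=None, notes=None):
    """Join whatever text exists into the string that would be embedded."""
    return " | ".join(part for part in (name, role, notes) if part)
```

A missing score stays `NULL` instead of becoming 0, which matches the rule against showing empty metrics.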

Implementation

Sparse profile ingest: accept CSV with as few as 2 columns (name, phone). Everything else optional.
Graceful UI degradation: worker cards only show fields that exist
Progressive enrichment: timesheet ingest → auto-calculate reliability; check-in ingest → auto-calculate availability
Trust indicators: "Reliability: 87%, based on 3 placements", not a bare "Reliability: 87%". Show WHERE the number comes from.
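
A sketch of the sparse ingest step using the standard `csv` module. It assumes the CSV has a header row with columns literally named `name` and `phone`; the `ingest_workers` helper and the reject-on-missing-name-or-phone behavior are assumptions for illustration.

```python
import csv
import io

REQUIRED = ("name", "phone")


def ingest_workers(csv_text):
    """Parse worker rows, keeping only non-empty fields.

    Rows missing a name or phone are returned separately, not guessed at.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    accepted, rejected = [], []
    for row in reader:
        # Drop blank cells so absent data is absent, not an empty string.
        clean = {k: v.strip() for k, v in row.items() if k and v and v.strip()}
        if all(c in clean for c in REQUIRED):
            accepted.append(clean)
        else:
            rejected.append(row)
    return accepted, rejected


two_col = "name,phone\nAna Ruiz,555-0101\nCy Park,555-0103\n"
richer = "name,phone,role,city\nAna Ruiz,555-0101,Forklift Operator,\n,555-0199,Welder,Joliet\n"

accepted_two, rejected_two = ingest_workers(two_col)
accepted_rich, rejected_rich = ingest_workers(richer)
```

Because blank cells are dropped at ingest, downstream code can test for a key's presence and never has to tell "empty" from "unknown".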
