Addresses the reality-test gap surfaced by the candidates and
multi-corpus e2e runs (0d1553c, a97881d): semantic-only retrieval
can't gate by status / state / availability. SearchRequest now
takes an optional MetadataFilter map; results whose metadata
doesn't match every key are dropped before top-K truncation.
Filter value semantics:
string|number|bool → exact equality (JSON-canonical, so 1 ≡ 1.0)
[]any → OR within key (any element matching wins)
AND across keys: every filter key must match.
Missing key in metadata = drop. Malformed metadata = drop. Filter
absent or empty = pass through (zero overhead).
The response now reports MetadataFilterDropped so callers can see
how aggressive the filter was without re-querying.
Caveat (also captured in code comment): this is POST-retrieval, not
PRE-filtering via SQL. Aggressive filters can shrink the result set
below K; caller should bump PerCorpusK to compensate. A queryd-
backed pre-filter is a future commit; this lands the user-visible
fix today.
Tests:
- 7 unit tests (internal/matrix/filter_test.go) covering: nil/
empty filter pass-through, missing-metadata always-fails,
single-value exact match (incl. numeric 5 ≡ 5.0), AND across
keys, OR within list, bool match, malformed JSON metadata
- matrix_smoke.sh: new assertion #7 — filter
label∈{"a near","b near"} drops the 4 mid/far entries from the
6-entry pool, keeping exactly 2 (one per corpus, both with the
matching label). Dropped count surfaces in the response.
15-smoke regression all green. vet clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>