
Embedding-Based Matching with Re-Rank

pgvector candidate generation, structured re-ranking with hard constraints, per-result rationale.

Outcome

Matching systems that combine semantic similarity with hard constraints and operator preferences, surfacing ranked results with structured rationales the user can inspect — and trust conditionally rather than blindly.

Pattern: Vector + structured re-rank
Trust model: Per-result rationale
Failure mode: Hard-constraint filter, not silent demotion
Technologies: Postgres, pgvector, OpenAI embeddings, structured filter layer, operator must-have rules, per-result rationale
Problem

Pure cosine similarity sounds like the answer. It isn't. It surfaces semantically similar candidates that fail real-world constraints — wrong location, wrong authorization, wrong must-have. Users lose trust in the rank the first time it happens, and the product never recovers.

How it's built
  • Embed candidates and queries with OpenAI embeddings into pgvector columns indexed for cosine similarity
  • Use vector search as the candidate generator; return top-N
  • Re-rank with hard constraints (filters that drop unqualified results) and soft constraints (bounded score adjustments)
  • Surface each result with a per-result rationale: "Embedding fit 0.82. +0.08 for location match. Failed: salary band too high."
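The candidate-generation step can be sketched as a pgvector query. This is illustrative only: the table and column names (`candidates`, `embedding`, `profile`) are assumptions, not from this write-up. pgvector's `<=>` operator is cosine distance, so similarity is `1 - distance`, and an HNSW index built with `vector_cosine_ops` keeps the `ORDER BY` from scanning every row.

```python
# Hypothetical schema: candidates(id, profile, embedding vector(1536)).
# The query embedding comes from the same OpenAI embedding model used
# to populate the column, passed in as a parameter.
CANDIDATE_SQL = """
SELECT id,
       profile,
       1 - (embedding <=> %(query_embedding)s::vector) AS similarity
FROM candidates
ORDER BY embedding <=> %(query_embedding)s::vector
LIMIT %(top_n)s;
"""

# Index matching the <=> (cosine distance) operator used above.
INDEX_SQL = """
CREATE INDEX IF NOT EXISTS candidates_embedding_idx
ON candidates USING hnsw (embedding vector_cosine_ops);
"""
```

Note that the vector search returns top-N candidates only; ranking quality beyond that point comes from the re-rank layer, not from tuning the similarity search.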

Embedding similarity is a candidate generator, not an answer. The product lives in the re-rank — the boring, structured rules that turn semantic similarity into a ranking your users can actually trust.

Hard constraints filter; soft constraints adjust. Hard: citizenship, location, work authorization, salary band. Soft: years of experience close to target, recency, location radius. Each adjustment is bounded so no single soft factor can dominate the embedding signal.
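A minimal re-rank sketch under assumed field names and bounds (the specific rules, the `0.08` location bonus, and the `SOFT_CAP` value are hypothetical). The shape is the point: hard rules drop a candidate outright, soft rules nudge the score within a fixed cap so no single factor can swamp the embedding signal.

```python
SOFT_CAP = 0.10  # no single soft factor may move the score more than this

def clamp(x, lo, hi):
    return max(lo, min(hi, x))

def rerank(candidates, query):
    """candidates: dicts with 'similarity' from the vector search plus
    structured fields; query: the operator's constraints and preferences."""
    ranked = []
    for c in candidates:
        # Hard constraints: filter, never silently demote.
        if c["location"] not in query["allowed_locations"]:
            continue
        if not c["work_authorized"]:
            continue
        if c["salary"] > query["salary_band_max"]:
            continue
        # Soft constraints: bounded adjustments on top of the embedding score.
        score = c["similarity"]
        exp_delta = 0.02 * (5 - abs(c["years_exp"] - query["target_years"]))
        score += clamp(exp_delta, -SOFT_CAP, SOFT_CAP)
        if c["location"] == query["preferred_location"]:
            score += 0.08  # already under SOFT_CAP
        ranked.append({**c, "score": score})
    return sorted(ranked, key=lambda r: r["score"], reverse=True)
```

Because the re-ranker is plain code, each rule is unit-testable in isolation, which is what makes its behavior controllable in a way the embedding model's is not.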

The breakdown is the trust. Every ranked result includes a structured rationale showing which inputs moved the score. Without the rationale, users either trust the rank entirely (and are wrong sometimes) or distrust it entirely (and fall back to keyword search). With the rationale, they trust the rank conditionally — exactly the right relationship to have with a probabilistic system.
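One way to render that breakdown, assuming score components are recorded as they are applied (the function name and signature are hypothetical; the output format matches the example above):

```python
def render_rationale(similarity, adjustments, failed=None):
    """similarity: embedding fit; adjustments: list of (delta, reason)
    soft-constraint pairs; failed: reason a hard constraint fired, if any."""
    parts = [f"Embedding fit {similarity:.2f}."]
    for delta, reason in adjustments:
        parts.append(f"{delta:+.2f} for {reason}.")
    if failed:
        parts.append(f"Failed: {failed}.")
    return " ".join(parts)
```

For example, `render_rationale(0.82, [(0.08, "location match")])` yields "Embedding fit 0.82. +0.08 for location match."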

What I'd tell someone about to build this
  • Embedding rank is not the product. The re-rank with hard constraints is.
  • Surface the rationale. Trust in probabilistic systems should be conditional, not blind.
  • Treat the re-ranker as software with tests, not as a model. You control its behavior.

Want this for your product?

Let's talk about what you're trying to ship.

Book a call →