Reranker
cross-encoder
In one line
A second-stage model that re-scores the top results from a fast retriever to push the most relevant ones to the top.
What it actually means
Bi-encoders (the embedding models you use for the first retrieval pass) are fast because they encode the query and the document independently and just compare vectors. Cross-encoder rerankers concatenate the query and each candidate document and score them together with a transformer, which is slower per pair but much more accurate. The standard pattern is “retrieve top 50 with the bi-encoder, rerank with a cross-encoder, keep top 5”. Cohere Rerank, BGE Reranker, and Jina Reranker are common choices.
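The two-stage pattern can be sketched in a few lines. This is a toy illustration: `bi_encoder_score` and `cross_encoder_score` are stand-in functions (vector dot product and term overlap) so the example stays self-contained; a real pipeline swaps them for an embedding model and a transformer cross-encoder such as BGE Reranker, but the control flow is identical.

```python
def bi_encoder_score(q_vec, d_vec):
    # Stage 1: fast, compares independently precomputed vectors.
    return sum(a * b for a, b in zip(q_vec, d_vec))

def cross_encoder_score(query, doc):
    # Stage 2: scores the (query, document) pair jointly. Toy term-overlap
    # standing in for a transformer forward pass over the concatenated pair.
    terms = query.lower().split()
    return sum(t in doc.lower() for t in terms) / len(terms)

query = "how do rerankers work"
q_vec = [0.9, 0.1, 0.3]  # pretend query embedding
corpus = [  # (text, pretend document embedding)
    ("rerankers re-score retrieved candidates", [0.8, 0.2, 0.1]),
    ("bi-encoders embed text into vectors",     [0.7, 0.3, 0.4]),
    ("unrelated cooking recipe",                [0.1, 0.9, 0.2]),
]

# Stage 1: cheap scoring over the whole corpus, keep the top 2.
candidates = sorted(corpus, key=lambda d: bi_encoder_score(q_vec, d[1]),
                    reverse=True)[:2]
# Stage 2: expensive joint scoring over only those candidates.
top = max(candidates, key=lambda d: cross_encoder_score(query, d[0]))
print(top[0])  # the cross-encoder promotes the truly relevant doc
```

Note that the document the bi-encoder ranked second wins after reranking: the point of the second stage is exactly this reordering within the candidate set.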
Why it matters
Adding a reranker is the single highest-leverage upgrade for most RAG systems that already work. Recall@50 is usually fine; the win is moving the right chunk from rank 30 into rank 1 so the LLM actually sees it. The cost is a few hundred milliseconds and one extra model call per query.
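To make the rank-30-to-rank-1 claim concrete, here is a toy reciprocal-rank calculation with hypothetical numbers (the document IDs and rankings are invented for illustration):

```python
def reciprocal_rank(ranking, relevant_id):
    # Ranks are 1-based: return 1/rank of the single relevant item.
    return 1.0 / (ranking.index(relevant_id) + 1)

before = list(range(50))                        # relevant chunk buried at rank 30
after = [29] + [i for i in before if i != 29]   # reranker moves it to rank 1

print(reciprocal_rank(before, 29))  # ≈ 0.033 (rank 30: invisible to the LLM)
print(reciprocal_rank(after, 29))   # 1.0 (rank 1: in the prompt)
```

Recall over the top 50 is identical in both cases; only the reranked ordering puts the chunk where the LLM will actually see it.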
Example
# Pseudocode: vector_store and reranker are placeholders for your retriever
# and reranking client.
candidates = vector_store.search(query, top_k=50)            # stage 1: bi-encoder retrieval
scored = reranker.rank(query, [c.text for c in candidates])  # stage 2: cross-encoder scores, sorted by relevance
top = [candidates[s.index] for s in scored[:5]]              # keep the 5 best for the LLM
You’ll hear it when
- Improving an existing RAG pipeline that’s “almost right”.
- Debating BM25 + reranker vs vector + reranker.
- Profiling latency budgets for retrieval.
- Picking a hosted vs self-hosted reranker.
- Discussing two-stage retrieval at any search talk.