Vector search is fast but rough. It casts a wide net and ranks only by embedding similarity. A reranker re-reads the top candidates against the query and reorders them by true relevance.
Use it when a wrong answer is expensive and precision matters more than speed. Skip it when latency is critical or results are already good.
How to use it
Retrieve wide, rerank down:
from ragrails import RagRails
rag = RagRails()
embedder = rag.embedder(provider="voyage", model="voyage-3", input_type="query")
reranker = rag.reranker(provider="voyage", model="rerank-2-lite")
result = rag.retrieve(
"How do I authenticate?",
embedder=embedder,
vector_db="qdrant", collection="docs", url="http://localhost:6333",
top_k=20, # retrieve a wide candidate set
use_rerank=True,
reranker=reranker,
rerank_top_k=5, # keep the best few
)
The reranker needs candidates to choose from, so set top_k higher than rerank_top_k (retrieve 20, keep 5).
Choosing a reranker
| Provider | Cost | Best for |
|---|
voyage | API call per query | Semantic relevance; catches matches with no shared keywords |
bm25 | Free, local, no network | Exact-term matching; offline or cost-sensitive setups |
reranker = rag.reranker(provider="bm25", model="bm25") # local, no API key
Reference: SDK retrieval · reranker() providers.