Vector search is fast but rough. It casts a wide net and ranks only by embedding similarity. A reranker re-reads the top candidates against the query and reorders them by true relevance.
Use it when a wrong answer is expensive and precision matters more than speed. Skip it when latency is critical or results are already good.
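
To make the reordering step concrete, here is a minimal sketch in plain Python. It is not how RagRails or any provider scores documents: `rerank` and `term_overlap` are hypothetical names, and the word-overlap scorer is a toy stand-in for a real reranking model.

```python
def term_overlap(query, text):
    """Toy relevance score: fraction of query words that appear in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def rerank(query, candidates, score_fn, keep):
    """Score every candidate against the query, sort best first, keep the top few."""
    scored = [(score_fn(query, text), text) for text in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:keep]]


# Candidates in the order a vector search might have returned them (made-up data).
candidates = [
    "Billing and invoices overview",
    "Authenticate requests with an API key",
    "How rate limits work",
    "Rotating an API key",
]
top = rerank("how do I authenticate requests", candidates, term_overlap, keep=2)
```

Here the second candidate moves to the front because it shares the most words with the query. A real reranker does the same job with a model that reads the query and each candidate together.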

How to use it

Retrieve wide, rerank down:
from ragrails import RagRails

rag = RagRails()
embedder = rag.embedder(provider="voyage", model="voyage-3", input_type="query")
reranker = rag.reranker(provider="voyage", model="rerank-2-lite")

result = rag.retrieve(
    "How do I authenticate?",
    embedder=embedder,
    vector_db="qdrant", collection="docs", url="http://localhost:6333",
    top_k=20,             # retrieve a wide candidate set
    use_rerank=True,
    reranker=reranker,
    rerank_top_k=5,       # keep the best few
)
The reranker needs candidates to choose from, so set top_k higher than rerank_top_k (retrieve 20, keep 5).
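
The effect of the window size can be sketched with made-up numbers. The `final_set` helper, the document ids, and the relevance scores below are all hypothetical; the point is only that a document ranked low by the vector search can reach the final set if the candidate window is wide enough to include it.

```python
def final_set(ranked_by_vector, relevance, top_k, rerank_top_k):
    """Take the first top_k vector hits, reorder by relevance, keep rerank_top_k."""
    window = ranked_by_vector[:top_k]
    reordered = sorted(window, key=lambda doc: relevance[doc], reverse=True)
    return reordered[:rerank_top_k]


# Order returned by the vector search, best first (illustrative).
ranked_by_vector = ["a", "b", "c", "d", "e", "f"]
# Scores a reranker might assign (illustrative).
relevance = {"a": 0.2, "b": 0.9, "c": 0.1, "d": 0.8, "e": 0.95, "f": 0.3}

narrow = final_set(ranked_by_vector, relevance, top_k=2, rerank_top_k=2)
wide = final_set(ranked_by_vector, relevance, top_k=6, rerank_top_k=2)
```

With `top_k` equal to `rerank_top_k`, the reranker can only reorder the same two documents. With the wider window it can promote "e", which the vector search had ranked fifth.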

Choosing a reranker

| Provider | Cost | Best for |
| --- | --- | --- |
| voyage | API call per query | Semantic relevance; catches matches with no shared keywords |
| bm25 | Free, local, no network | Exact-term matching; offline or cost-sensitive setups |
reranker = rag.reranker(provider="bm25", model="bm25")  # local, no API key
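
For a sense of what exact-term matching means, here is a minimal sketch of Okapi BM25 scoring in plain Python. It is not the RagRails implementation: the `bm25_scores` name, the whitespace tokenization, and the `k1` and `b` defaults are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per document for the given query."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # documents only earn credit for terms they contain
            # Rarer terms get a higher weight; the +1 keeps the weight positive.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Term frequency saturates with k1 and is normalized by document length via b.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "reset your password from the login page",
    "authenticate with an api token",
    "api rate limits and quotas",
]
scores = bm25_scores("api token", docs)
```

The first document shares no words with the query and scores zero, which is the trade-off the table describes: BM25 rewards shared terms and cannot see a match that uses different vocabulary.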
Reference: SDK retrieval · reranker() providers.