Retrieval embeds the query, searches the vector store, and returns the closest chunks.
embedder = rag.embedder(provider="voyage", model="voyage-3", input_type="query")
result = rag.retrieve(
    "How do I authenticate?",
    embedder=embedder,
    vector_db="qdrant",
    collection="docs",
    url="http://localhost:6333",
    top_k=10,
)
for chunk in result.items:
    print(chunk.score, chunk.text)
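To make "returns the closest chunks" concrete, here is a minimal sketch of top-k similarity search in plain Python. It assumes cosine similarity as the distance measure, which is a common choice but not something this page confirms for the configured collection. The helper names (`cosine`, `top_k_chunks`) and the 3-dimensional toy vectors are illustrative and not part of the SDK.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k_chunks(query_vec, chunks, top_k):
    """Return the top_k (score, text) pairs, highest score first."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

# Toy 3-dimensional "embeddings"; real ones come from the embedder.
chunks = [
    ("Use an API key in the Authorization header.", [0.9, 0.1, 0.0]),
    ("Billing is charged monthly.", [0.0, 0.2, 0.9]),
    ("Tokens expire after one hour.", [0.7, 0.6, 0.1]),
]
query_vec = [1.0, 0.2, 0.0]

hits = top_k_chunks(query_vec, chunks, top_k=2)
for score, text in hits:
    print(round(score, 3), text)
```

A real vector store typically replaces the exhaustive loop with an approximate index, so results arrive quickly even for large collections.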
Tune it by symptom
| Symptom | Fix |
|---|---|
| Too few or too many results | Adjust top_k |
| Right topic, wrong ranking | Turn on reranking |
| Conversational follow-ups miss | Turn on query rewriting |
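The table can also be read as a mapping from symptoms to `rag.retrieve` keyword arguments. The sketch below uses a hypothetical helper, `retrieve_options`, which is not part of the SDK. The parameter names and the values 10, 20, and 5 are taken from the examples on this page. Treating "too few results" as a jump from 10 to 20 is an assumption made for illustration.

```python
def retrieve_options(too_few=False, wrong_ranking=False, follow_ups_miss=False):
    """Build keyword arguments for rag.retrieve from observed symptoms.

    Hypothetical helper for illustration; not part of the SDK.
    """
    options = {"top_k": 10}
    if too_few:
        # Ask the vector store for more candidates.
        options["top_k"] = 20
    if wrong_ranking:
        # Retrieve wide, then let the reranker keep the best few.
        options["top_k"] = 20
        options["use_rerank"] = True
        options["rerank_top_k"] = 5
    if follow_ups_miss:
        # Expand conversational follow-ups before searching.
        options["use_query_rewrite"] = True
    return options

options = retrieve_options(wrong_ranking=True)
print(options)
```

The returned dictionary would still need the matching `reranker=` or `rewrite_llm=` object passed alongside it, as the sections on reranking and query rewriting show.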
Reranking
Why: vector search is fast but coarse. A reranker re-scores the top candidates against the query and reorders them by relevance. Use it when precision matters more than latency.
reranker = rag.reranker(provider="voyage", model="rerank-2-lite")
result = rag.retrieve(
    "How do I authenticate?",
    embedder=embedder,
    vector_db="qdrant", collection="docs", url="http://localhost:6333",
    top_k=20,  # retrieve wide
    use_rerank=True,
    reranker=reranker,
    rerank_top_k=5,  # keep the best few
)
Retrieve wide (top_k=20), rerank down (rerank_top_k=5). The reranker needs candidates to choose from.
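The "retrieve wide, rerank down" pattern can be sketched without any external service. In the example below, a toy word-overlap score stands in for a real reranking model. This is a deliberate simplification, not how the Voyage reranker scores text, and the function names (`overlap_score`, `rerank`) are illustrative.

```python
def overlap_score(query, text):
    """Toy relevance score: fraction of query words that appear in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words)

def rerank(query, candidates, rerank_top_k):
    """Re-score every candidate against the query and keep the best few."""
    rescored = sorted(
        candidates,
        key=lambda text: overlap_score(query, text),
        reverse=True,
    )
    return rescored[:rerank_top_k]

# Candidates in the order a vector search might return them (wide retrieval).
candidates = [
    "authentication overview and concepts",
    "how to rotate api keys",
    "how do i authenticate with an api key",
    "rate limits and quotas",
]

best = rerank("how do i authenticate", candidates, rerank_top_k=2)
print(best)
```

The third candidate moves to the top only because it was in the candidate set to begin with, which is why the first-stage `top_k` should be larger than `rerank_top_k`.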
Query rewriting
Why: follow-ups like “what about the second one?” mean nothing to a search engine on their own. Query rewriting uses the session context to expand them into standalone questions.
rewrite_llm = rag.llm(provider="openai", model="gpt-4o-mini")
result = rag.retrieve(
    "What about the second step?",
    embedder=embedder,
    vector_db="qdrant", collection="docs", url="http://localhost:6333",
    use_query_rewrite=True,
    rewrite_llm=rewrite_llm,
    session_context="User is asking about the onboarding flow.",
)
print(result.search_query)  # the rewritten query actually searched
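As a rough picture of what a rewriting step might do, the sketch below combines the follow-up and the session context into a prompt and hands it to a language model. The prompt wording, the function names, and the `fake_llm` stand-in are all assumptions made so the example runs offline. The SDK's actual prompt and model call are not documented on this page.

```python
def build_rewrite_prompt(question, session_context):
    """Assemble the instruction an LLM would receive to rewrite a follow-up."""
    return (
        "Rewrite the user's question as a standalone search query.\n"
        f"Context: {session_context}\n"
        f"Question: {question}\n"
        "Standalone query:"
    )

def rewrite_query(question, session_context, llm):
    """Return the rewritten query; llm is any callable mapping prompt to text."""
    return llm(build_rewrite_prompt(question, session_context)).strip()

def fake_llm(prompt):
    """Stand-in for a real model so the example runs offline."""
    return " What is the second step of the onboarding flow? "

search_query = rewrite_query(
    "What about the second step?",
    "User is asking about the onboarding flow.",
    fake_llm,
)
print(search_query)
```

The rewritten text, not the original follow-up, is what gets embedded and searched, which is why inspecting the rewritten query is useful when debugging missed follow-ups.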
Reference
Full parameters: SDK retrieval · CLI · REST.