Skip to main content
Retrieval embeds the query, searches the vector store, and returns the closest chunks.
embedder = rag.embedder(provider="voyage", model="voyage-3", input_type="query")

result = rag.retrieve(
    "How do I authenticate?",
    embedder=embedder,
    vector_db="qdrant",
    collection="docs",
    url="http://localhost:6333",
    top_k=10,
)
for chunk in result.items:
    print(chunk.score, chunk.text)

Tune it by symptom

SymptomFix
Not enough / too many resultsAdjust top_k
Right topic, wrong rankingTurn on reranking
Conversational follow-ups missTurn on query rewriting

Reranking

Why: vector search is fast but rough. A reranker re-reads the top candidates and reorders them by true relevance. Use it when precision matters more than speed.
reranker = rag.reranker(provider="voyage", model="rerank-2-lite")

result = rag.retrieve(
    "How do I authenticate?",
    embedder=embedder,
    vector_db="qdrant", collection="docs", url="http://localhost:6333",
    top_k=20,                 # retrieve wide
    use_rerank=True,
    reranker=reranker,
    rerank_top_k=5,           # keep the best few
)
Retrieve wide (top_k=20), rerank down (rerank_top_k=5). The reranker needs candidates to choose from.

Query rewriting

Why: follow-ups like “what about the second one?” mean nothing to a search engine alone. Query rewriting uses context to expand them into standalone questions.
rewrite_llm = rag.llm(provider="openai", model="gpt-4o-mini")

result = rag.retrieve(
    "What about the second step?",
    embedder=embedder,
    vector_db="qdrant", collection="docs", url="http://localhost:6333",
    use_query_rewrite=True,
    rewrite_llm=rewrite_llm,
    session_context="User is asking about the onboarding flow.",
)
result.search_query  # the rewritten query actually searched

Reference

Full parameters: SDK retrieval · CLI · REST.