In a conversation, people ask follow-up questions like “what about the second one?”, which mean nothing to a search engine on their own. Query rewriting uses the chat history and session context to expand such a follow-up into a standalone question, such as “what is the second step of authentication?”, before retrieval runs.
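To make the idea concrete, here is a minimal sketch of what a rewrite step can look like. It is not the library's internal implementation: `build_rewrite_prompt`, `rewrite_query`, the prompt wording, and the `llm` callable (any function that takes a prompt string and returns text) are all hypothetical names chosen for illustration.

```python
from typing import Callable, Sequence


def build_rewrite_prompt(
    history: Sequence[tuple[str, str]],
    question: str,
    session_context: str = "",
) -> str:
    """Assemble a prompt asking a model to make `question` standalone.

    `history` is a sequence of (role, text) pairs, oldest first.
    The prompt wording is an assumption, not the library's actual prompt.
    """
    lines = [
        "Rewrite the final user question so it can be understood "
        "without the conversation.",
        "Return only the rewritten question.",
    ]
    if session_context:
        lines.append(f"Context: {session_context}")
    lines.append("Conversation:")
    for role, text in history:
        lines.append(f"{role}: {text}")
    lines.append(f"Final user question: {question}")
    return "\n".join(lines)


def rewrite_query(
    history: Sequence[tuple[str, str]],
    question: str,
    llm: Callable[[str], str],
    session_context: str = "",
) -> str:
    """Return a standalone version of `question`, or `question` unchanged.

    `llm` is a hypothetical stand-in for whatever model call you use.
    """
    # With nothing to resolve against, rewriting cannot help: skip the call.
    if not history and not session_context:
        return question
    rewritten = llm(build_rewrite_prompt(history, question, session_context)).strip()
    # Fall back to the original question if the model returns nothing usable.
    return rewritten or question
```

The fallback to the original question means that a failed or empty rewrite degrades to ordinary retrieval instead of searching for an empty string.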
Use it when you’re building chat or multi-turn search. Skip it for one-shot, self-contained queries, since it adds an LLM call.
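If you want to avoid the extra call for queries that already stand alone, one option is a cheap check before rewriting. The sketch below is a rough heuristic under stated assumptions, not part of the SDK: `needs_rewrite`, the cue-word list, and the three-word threshold are all hypothetical and will produce false positives (for example, a self-contained query that happens to contain “first”).

```python
import re

# Words that often point back at earlier turns. This list is an assumption
# chosen for illustration; tune it for your own traffic.
_CONTEXT_CUES = re.compile(
    r"\b(it|its|they|them|that|this|those|these|one|ones|former|latter|"
    r"first|second|third|last|same|other|another)\b",
    re.IGNORECASE,
)


def needs_rewrite(query: str, has_history: bool) -> bool:
    """Guess whether `query` depends on earlier turns of the conversation."""
    # A one-shot query has no history to resolve against.
    if not has_history:
        return False
    # Very short queries ("and pricing?") usually lean on prior context.
    if len(query.split()) <= 3:
        return True
    return bool(_CONTEXT_CUES.search(query))
```

A gate like this trades some recall for cost: queries it misses are searched as typed, so it is worth checking against real conversation logs before relying on it.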
In retrieval
rewrite_llm = rag.llm(provider="openai", model="gpt-4o-mini")
result = rag.retrieve(
    "What about the second step?",
    embedder=embedder,
    vector_db="qdrant", collection="docs", url="http://localhost:6333",
    use_query_rewrite=True,
    rewrite_llm=rewrite_llm,
    session_context="User is asking about the onboarding flow.",
)
result.search_query # the rewritten query actually searched
In chat
from ragrails import QueryRewriteConfig
rag.chat(..., query_rewrite=QueryRewriteConfig(enabled=True, session_context="Onboarding flow"))
Rewriting runs an extra LLM call. Use a small, cheap model (gpt-4o-mini) via rewrite_llm, since it’s a short, mechanical task. See cost optimization.
Reference: SDK retrieval · SDK chat.