In a conversation, people ask follow-ups like “what about the second one?”, which mean nothing to a search engine on their own. Query rewriting uses the chat history and session context to expand such a follow-up into a standalone question like “what is the second step of authentication?” before retrieval runs.
Use it when you’re building chat or multi-turn search. Skip it for one-shot, self-contained queries, since it adds an LLM call.
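To make the idea concrete, here is a minimal sketch of what a rewrite step does, independent of the SDK. Everything in it is an assumption for illustration: the prompt template, the `build_rewrite_prompt` and `rewrite_query` helpers, and the `llm` callable (any function that takes a prompt string and returns text) are hypothetical and are not the library's internals.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical template; the prompt the library actually uses is not shown here.
REWRITE_TEMPLATE = (
    "Rewrite the final user message as a standalone search query.\n"
    "Resolve pronouns and references using the context and history.\n"
    "Return only the rewritten query.\n\n"
    "Context: {context}\n"
    "History:\n{history}\n"
    "Final user message: {query}\n"
)


def build_rewrite_prompt(
    query: str,
    history: Sequence[Tuple[str, str]],
    session_context: str = "",
) -> str:
    """Format the chat history and follow-up into a single rewrite prompt."""
    lines = "\n".join(f"{role}: {text}" for role, text in history)
    return REWRITE_TEMPLATE.format(
        context=session_context or "(none)",
        history=lines or "(none)",
        query=query,
    )


def rewrite_query(
    query: str,
    history: Sequence[Tuple[str, str]],
    llm: Callable[[str], str],
    session_context: str = "",
) -> str:
    """Return a standalone version of `query`, or `query` itself if no rewrite applies."""
    # With no history and no context there is nothing to resolve against,
    # so skip the extra LLM call entirely.
    if not history and not session_context:
        return query
    rewritten = llm(build_rewrite_prompt(query, history, session_context)).strip()
    # Fall back to the original query if the model returns an empty answer.
    return rewritten or query


# Usage with a stub in place of a real model.
def fake_llm(prompt: str) -> str:
    return "What is the second step of authentication?"


history = [
    ("user", "How does authentication work?"),
    ("assistant", "It has three steps."),
]
print(rewrite_query("What about the second one?", history, fake_llm))
```

The early return reflects the guidance above: a one-shot, self-contained query has nothing to resolve, so the sketch passes it through without paying for a model call.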

In retrieval

rewrite_llm = rag.llm(provider="openai", model="gpt-4o-mini")

result = rag.retrieve(
    "What about the second step?",
    embedder=embedder,
    vector_db="qdrant", collection="docs", url="http://localhost:6333",
    use_query_rewrite=True,
    rewrite_llm=rewrite_llm,
    session_context="User is asking about the onboarding flow.",
)
result.search_query   # the rewritten query actually searched

In chat

from ragrails import QueryRewriteConfig

rag.chat(..., query_rewrite=QueryRewriteConfig(enabled=True, session_context="Onboarding flow"))
Rewriting adds one LLM call per query. Because the task is short and mechanical, pass a small, cheap model (such as gpt-4o-mini) via rewrite_llm. See cost optimization.
Reference: SDK retrieval · SDK chat.