Chat runs retrieval and answer generation together, with conversation history.
llm = rag.llm(provider="openai", model="gpt-4o-mini")
embedder = rag.embedder(provider="voyage", model="voyage-3", input_type="query")

result = rag.chat(
    "How do I authenticate?",
    llm=llm,
    embedder=embedder,
    vector_db="qdrant", collection="docs", url="http://localhost:6333",
    history=[],
)
print(result.answer)
Chat is stateless: Ragrails never stores conversations. You pass history in and get the updated history back, so you decide where it lives (session, database, memory). Because no conversation state is kept between calls, this is safe for multi-user apps.

Tuning knobs

Each knob is a config object. Pick one by symptom:
| Symptom | Config | What it does |
| --- | --- | --- |
| Follow-ups miss context | QueryRewriteConfig | Expands follow-ups into standalone questions. |
| Long chats slow / costly | HistoryCompactionConfig | Summarizes old turns, keeps recent ones. |
| Small talk triggers search | IntentRoutingConfig | Skips retrieval for non-questions. |
| Model answers with weak context | ChatRetrievalQualityConfig | Sets confidence thresholds; answers with caution or refuses. |

Query rewriting

Why: “what about the second one?” is meaningless to search alone. Rewriting uses history to make it standalone.
from ragrails import QueryRewriteConfig
rag.chat(..., query_rewrite=QueryRewriteConfig(enabled=True))

History compaction

Why: long conversations overflow the context window and cost more every turn. Compaction summarizes old messages and keeps recent ones verbatim.
from ragrails import HistoryCompactionConfig
rag.chat(..., history_compaction=HistoryCompactionConfig(history_limit=15, keep_recent=5))
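
To make the idea concrete, here is a minimal sketch of compaction, not Ragrails' actual implementation. It assumes `history_limit` is the message count that triggers compaction and `keep_recent` is how many trailing messages stay verbatim. Messages are plain strings here, and `summarize` is a hypothetical stand-in for an LLM summarization call.

```python
from typing import Callable, List


def compact_history(
    history: List[str],
    summarize: Callable[[List[str]], str],
    history_limit: int = 15,
    keep_recent: int = 5,
) -> List[str]:
    """Collapse older messages into one summary once history exceeds the limit."""
    if len(history) <= history_limit:
        return history  # short enough: leave it untouched
    split = len(history) - keep_recent
    old, recent = history[:split], history[split:]
    # `summarize` is assumed to condense the old messages into a single message.
    return [summarize(old)] + recent
```

With the defaults above, a 20-message history becomes one summary message followed by the 5 most recent messages.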

Intent routing

Why: “Thanks!” doesn’t need a database lookup. Routing skips retrieval for small talk, which is faster and avoids irrelevant context.
from ragrails import IntentRoutingConfig
rag.chat(..., intent_routing=IntentRoutingConfig(enabled=True))
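
As a toy illustration of the routing decision only: the sketch below labels a message "direct" (skip retrieval) or "rag" (run retrieval) with a fixed phrase list. The phrase list and the `route_intent` name are assumptions made for this example; a real router would more likely use a classifier or an LLM, and nothing here reflects how Ragrails decides.

```python
# Hypothetical small-talk phrases that should not trigger a search.
SMALL_TALK = {"thanks", "thank you", "hi", "hello", "bye", "ok", "okay", "great"}


def route_intent(message: str) -> str:
    """Return "direct" for small talk and "rag" for everything else."""
    # Normalize case and trailing punctuation so "Thanks!" matches "thanks".
    normalized = message.strip().lower().rstrip("!.?")
    return "direct" if normalized in SMALL_TALK else "rag"
```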

Retrieval quality

Why: sometimes your index just doesn’t have the answer. Rather than hallucinate, Ragrails scores the retrieved context and can answer with caution, ask a clarifying question, or refuse.
from ragrails import ChatRetrievalQualityConfig
rag.chat(..., retrieval_quality=ChatRetrievalQualityConfig(
    min_retrieval_score=0.35,
    low_confidence_mode="answer_with_caution",  # or "ask_clarifying_question", "refuse_grounded_answer"
))
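
The threshold logic can be pictured roughly as follows. This is a sketch under assumptions, not the library's scoring code: it assumes higher scores mean better matches, that the best retrieved score is the one compared against `min_retrieval_score`, and it uses a hypothetical `"answer"` label for the normal path.

```python
from typing import List


def decide_answer_mode(
    scores: List[float],
    min_retrieval_score: float = 0.35,
    low_confidence_mode: str = "answer_with_caution",
) -> str:
    """Return "answer" when retrieval looks strong, else the configured fallback mode."""
    # No retrieved chunks at all is treated as the weakest possible retrieval.
    best = max(scores, default=0.0)
    if best >= min_retrieval_score:
        return "answer"
    return low_confidence_mode
```

Raising `min_retrieval_score` makes the fallback mode kick in more often; lowering it lets more answers through on thinner context.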

What you get back

| Field | Use it for |
| --- | --- |
| answer | The grounded reply |
| history | Pass to the next turn |
| sources | Show citations |
| intent | "rag" or "direct" |
| answer_confidence | Flag low-confidence answers in your UI |
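
A small sketch of how a UI layer might consume these fields. `render_reply` is a hypothetical helper, and it assumes `sources` is a list whose items have a readable string form; the actual shape of each source is not specified here. It leaves `answer_confidence` alone because its value format is not described on this page.

```python
from typing import Any, List


def render_reply(result: Any) -> str:
    """Format an answer, appending numbered citations for retrieval-backed replies."""
    lines: List[str] = [result.answer]
    # Show citations only when retrieval ran and returned at least one source.
    if result.intent == "rag" and result.sources:
        lines.append("Sources:")
        lines.extend(f"  [{i}] {src}" for i, src in enumerate(result.sources, start=1))
    return "\n".join(lines)
```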

Reference

Full parameters: SDK chat · CLI · REST.