chat() takes config objects that control how it handles conversation flow and low-quality retrieval. Each targets a specific failure mode.

Conversation management

History compaction

Long conversations overflow the LLM’s context window and cost more on every turn. Compaction summarizes older messages into a short recap and keeps recent turns verbatim.
from ragrails import HistoryCompactionConfig

rag.chat(..., history_compaction=HistoryCompactionConfig(
    enabled=True,
    history_limit=15,   # summarize once history passes this many turns
    keep_recent=5,      # keep the last 5 turns verbatim
))
Compaction is also a cost lever: fewer tokens per turn means lower spend on long chats. See cost optimization.
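The compaction rule can be sketched in plain Python. This is a minimal illustration, not Ragrails' implementation. It assumes history is a list of `(role, text)` turns and that `summarize` is a hypothetical callable standing in for the LLM call that writes the recap. `history_limit` and `keep_recent` mirror the config fields above, and the function name `compact_history` is invented for this example.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (role, text); assumed shape, not the Ragrails type


def compact_history(
    history: List[Turn],
    summarize: Callable[[List[Turn]], str],  # hypothetical LLM summarizer
    history_limit: int = 15,
    keep_recent: int = 5,
) -> List[Turn]:
    """Replace older turns with one recap once history exceeds history_limit."""
    if len(history) <= history_limit:
        return list(history)  # short enough: send every turn verbatim
    split = max(0, len(history) - keep_recent)
    older, recent = history[:split], history[split:]
    recap = summarize(older)  # one short recap of everything before `recent`
    return [("system", f"Summary of earlier conversation: {recap}")] + recent


def fake_summarize(turns: List[Turn]) -> str:
    return f"{len(turns)} earlier turns"


history = [("user", f"q{i}") for i in range(20)]
compacted = compact_history(history, fake_summarize)
print(len(compacted))  # 6: one recap plus the last 5 turns
```

With 20 turns and the limits above, the first 15 turns collapse into a single recap message, so the prompt stays roughly constant in size instead of growing with every turn.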

Intent routing

Not every message needs a database lookup. “Thanks!” or “who are you?” should get a direct reply. Intent routing detects these and skips retrieval, which is faster, cheaper, and avoids irrelevant context.
from ragrails import IntentRoutingConfig

result = rag.chat(..., intent_routing=IntentRoutingConfig(enabled=True))  # on by default
result.intent is "rag" (retrieval ran) or "direct" (answered without retrieval).
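To show the shape of the routing decision, here is a sketch that swaps in a simple keyword heuristic. Ragrails' actual classifier is not documented here and may work quite differently (for example, by asking an LLM). The `DIRECT_PATTERNS` list and `route_intent` function are hypothetical; only the two return values, `"rag"` and `"direct"`, come from the text above.

```python
import re
from typing import List

# Hypothetical small-talk patterns that need no retrieval.
DIRECT_PATTERNS: List[str] = [
    r"^(hi|hello|hey)\b",
    r"^(thanks|thank you|thx)\b",
    r"^who are you\b",
    r"^what can you do\b",
]


def route_intent(message: str) -> str:
    """Return "direct" for small talk, otherwise "rag" (run retrieval)."""
    text = message.strip().lower()
    if any(re.search(pattern, text) for pattern in DIRECT_PATTERNS):
        return "direct"
    return "rag"


print(route_intent("Thanks!"))                     # direct
print(route_intent("Who are you?"))                # direct
print(route_intent("What is our refund policy?"))  # rag
```

A fixed pattern list is cheap and predictable but brittle. Anything it does not recognize falls through to retrieval, which is the safer failure mode.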

Answer quality

Sometimes your index simply doesn't contain the answer. Rather than hallucinate, Ragrails scores the retrieved context and, when confidence is low, responds according to low_confidence_mode.
from ragrails import ChatRetrievalQualityConfig

rag.chat(..., retrieval_quality=ChatRetrievalQualityConfig(
    min_retrieval_score=0.35,
    min_rerank_score=0.50,
    low_confidence_mode="answer_with_caution",
    max_context_chunks=None,
))
| low_confidence_mode | Behaviour when context is weak |
| --- | --- |
| answer_with_caution | Answers, but flags the uncertainty (default) |
| ask_clarifying_question | Asks the user to refine the question instead of answering |
| refuse_grounded_answer | Declines to answer rather than risk a wrong one |
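The thresholds and modes can be pictured with the following sketch. It is an assumption-laden illustration, not how Ragrails scores context. It assumes each chunk carries a retrieval score and a rerank score, keeps only chunks that clear both minimums, treats `max_context_chunks=None` as "no cap", and treats "no chunk survived" as low confidence. The `Chunk` class, `select_context`, and `respond` are hypothetical names, and the response strings are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Chunk:
    text: str
    retrieval_score: float  # assumed: similarity score from vector search
    rerank_score: float     # assumed: relevance score from a reranker


def select_context(
    chunks: List[Chunk],
    min_retrieval_score: float = 0.35,
    min_rerank_score: float = 0.50,
    max_context_chunks: Optional[int] = None,
) -> List[Chunk]:
    """Keep chunks that clear both thresholds, highest rerank score first."""
    kept = [
        c for c in chunks
        if c.retrieval_score >= min_retrieval_score
        and c.rerank_score >= min_rerank_score
    ]
    kept.sort(key=lambda c: c.rerank_score, reverse=True)
    return kept if max_context_chunks is None else kept[:max_context_chunks]


def respond(context: List[Chunk], mode: str) -> str:
    """Apply a low_confidence_mode when no chunk survived the thresholds."""
    if context:
        return f"[answer grounded in {len(context)} chunk(s)]"
    if mode == "answer_with_caution":
        return "[answer, flagged as low confidence]"
    if mode == "ask_clarifying_question":
        return "Could you give more detail about what you're looking for?"
    if mode == "refuse_grounded_answer":
        return "I couldn't find this in the indexed documents."
    raise ValueError(f"unknown low_confidence_mode: {mode}")


chunks = [
    Chunk("a", 0.80, 0.70),  # passes both thresholds
    Chunk("b", 0.40, 0.30),  # fails the rerank threshold
    Chunk("c", 0.20, 0.90),  # fails the retrieval threshold
]
context = select_context(chunks)
print([c.text for c in context])              # ['a']
print(respond([], "refuse_grounded_answer"))  # I couldn't find this in the indexed documents.
```

Requiring both scores to pass is one plausible reading of the two minimums. A real pipeline might instead combine them into a single confidence value.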
Every response carries result.answer_confidence and result.retrieval_quality. Surface them in your UI to flag shaky answers. Reference: SDK chat.