chat() takes config objects that control how it handles conversation flow and low-quality retrieval. Each targets a specific failure mode.
Conversation management
History compaction
Long conversations overflow the LLM’s context window and cost more on every turn. Compaction summarizes older messages into a short recap and keeps recent turns verbatim.Compaction is also a cost lever: fewer tokens per turn means lower spend on long chats. See cost optimization.
Intent routing
Not every message needs a database lookup. “Thanks!” or “who are you?” should get a direct reply. Intent routing detects these and skips retrieval, which is faster, cheaper, and avoids irrelevant context.result.intent is "rag" (retrieval ran) or "direct" (answered without retrieval).
Answer quality
Sometimes your index simply doesn’t contain the answer. Rather than hallucinate, Ragrails scores the retrieved context and acts on low confidence.low_confidence_mode | Behaviour when context is weak |
|---|---|
answer_with_caution | Answers, but flags the uncertainty (default) |
ask_clarifying_question | Asks the user to refine the question instead of answering |
refuse_grounded_answer | Declines to answer rather than risk a wrong one |
result.answer_confidence and result.retrieval_quality. Surface them in your UI to flag shaky answers.
Reference: SDK chat.
