Chat is stateless. Pass `history` in on each turn and persist the returned `result.history` in your application.

```shell
pip install "ragrails[voyage,openai,qdrant]"
```

```python
from ragrails import RagRails

rag = RagRails()
llm = rag.llm(provider="openai", model="gpt-4o-mini")
embedder = rag.embedder(provider="voyage", model="voyage-3", input_type="query")

history = []

result = rag.chat(
    "How do I authenticate?",
    llm=llm,
    embedder=embedder,
    vector_db="qdrant",
    collection="docs",
    url="http://localhost:6333",
    history=history,
)

print(result.answer)
history = result.history  # pass to the next turn
```
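
Because chat is stateless, the application has to store `result.history` between turns. The following is a minimal sketch of one way to do that with a JSON file per session. The helper names `load_history` and `save_history` and the `sessions` directory are hypothetical and not part of ragrails, and the sketch assumes the history list is JSON-serialisable, which this page does not state.

```python
import json
from pathlib import Path

# Hypothetical storage location; not part of ragrails.
SESSION_DIR = Path("sessions")


def load_history(session_id: str, base_dir: Path = SESSION_DIR) -> list:
    """Return the stored history for a session, or an empty list for a new one."""
    path = base_dir / f"{session_id}.json"
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


def save_history(session_id: str, history: list, base_dir: Path = SESSION_DIR) -> None:
    """Write the history list returned by a chat turn back to disk."""
    # Assumption: every history entry can be encoded as JSON.
    base_dir.mkdir(parents=True, exist_ok=True)
    path = base_dir / f"{session_id}.json"
    path.write_text(json.dumps(history), encoding="utf-8")
```

In use, `load_history(session_id)` would supply the `history` argument before each `rag.chat(...)` call, and `save_history(session_id, result.history)` would run after it.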

Result fields

| Field | Description |
| --- | --- |
| `result.answer` | LLM answer string |
| `result.sources` | Source chunk dicts used for the answer |
| `result.history` | Updated history list; pass to the next turn |
| `result.intent` | `"rag"` (retrieved context used) or `"direct"` (LLM answered directly) |
| `result.answer_confidence` | Confidence assessment dict |
| `result.retrieval_quality` | Retrieval quality assessment dict |
| `result.compacted` | `True` if history was summarised this turn |
| `result.errors` | List of error dicts |
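
As an illustration of how these fields might be consumed, the sketch below builds a one-line log summary for a turn. The function `describe_turn` is hypothetical, and the `SimpleNamespace` object is only a stand-in with the documented attribute names; a real result comes from `rag.chat()`. The keys inside the source dicts are not documented here, so the `"text"` key is an assumption.

```python
from types import SimpleNamespace


def describe_turn(result) -> str:
    """Build a one-line log summary from the documented result fields."""
    parts = [f"intent={result.intent}", f"sources={len(result.sources)}"]
    if result.compacted:
        parts.append("history compacted")
    if result.errors:
        parts.append(f"errors={len(result.errors)}")
    return ", ".join(parts)


# Stand-in for a chat result; only the attribute names follow the table above.
fake_result = SimpleNamespace(
    answer="Use an API key.",
    sources=[{"text": "..."}],  # assumed dict shape
    history=[],
    intent="rag",
    compacted=False,
    errors=[],
)
print(describe_turn(fake_result))  # intent=rag, sources=1
```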

Config objects

QueryRewriteConfig

Rewrites follow-up questions into self-contained queries before retrieval.

```python
from ragrails import QueryRewriteConfig

result = rag.chat(
    "What about the second step?",
    ...,
    query_rewrite=QueryRewriteConfig(
        enabled=True,
        session_context="User is asking about the onboarding flow.",
    ),
)
```

| Field | Default | Description |
| --- | --- | --- |
| `enabled` | `False` | Enable query rewriting |
| `session_context` | `""` | Context hint for the rewriter |
| `llm` | `None` | Separate LLM for rewriting (falls back to the chat LLM) |

HistoryCompactionConfig

Automatically summarises old history turns to stay within context limits.

```python
from ragrails import HistoryCompactionConfig

result = rag.chat(
    ...,
    history_compaction=HistoryCompactionConfig(enabled=True, history_limit=15, keep_recent=5),
)
```

| Field | Default | Description |
| --- | --- | --- |
| `enabled` | `True` | Enable automatic history summarisation |
| `history_limit` | `15` | Summarise when history exceeds this many turns |
| `keep_recent` | `5` | Recent turns to keep verbatim after summarisation |
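
To make the interaction between `history_limit` and `keep_recent` concrete, here is a minimal sketch of the behaviour the table describes. It is not the library's implementation: `compact_history` and the `summarise` callback are hypothetical, and turns are represented as plain strings because the real history entry format is not documented on this page.

```python
from typing import Callable, List


def compact_history(
    history: List[str],
    summarise: Callable[[List[str]], str],
    history_limit: int = 15,
    keep_recent: int = 5,
) -> List[str]:
    """Replace older turns with one summary once history exceeds history_limit."""
    if len(history) <= history_limit:
        return history  # at or under the limit: nothing to do
    split = len(history) - keep_recent
    older, recent = history[:split], history[split:]
    # One summary entry stands in for every older turn.
    return [summarise(older)] + recent


turns = [f"turn {i}" for i in range(1, 17)]  # 16 turns, one over the limit
compacted = compact_history(turns, summarise=lambda old: f"summary of {len(old)} turns")
print(len(compacted))  # 6: one summary plus the 5 most recent turns
```

In the real library the summary would presumably be produced by an LLM call; the lambda here only keeps the sketch runnable offline.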

IntentRoutingConfig

Routes small-talk and direct questions to the LLM without retrieval.

```python
from ragrails import IntentRoutingConfig

result = rag.chat(..., intent_routing=IntentRoutingConfig(enabled=True))
```

| Field | Default | Description |
| --- | --- | --- |
| `enabled` | `True` | Route non-RAG queries directly to the LLM |

ChatRetrievalQualityConfig

Controls confidence thresholds for retrieved context.

```python
from ragrails import ChatRetrievalQualityConfig

result = rag.chat(
    ...,
    retrieval_quality=ChatRetrievalQualityConfig(
        min_retrieval_score=0.35,
        min_rerank_score=0.50,
        low_confidence_mode="answer_with_caution",
    ),
)
```

| Field | Default | Description |
| --- | --- | --- |
| `min_retrieval_score` | `0.35` | Below this, retrieval is considered low quality |
| `min_rerank_score` | `0.50` | Below this, reranked results are low quality |
| `low_confidence_mode` | `"answer_with_caution"` | `"answer_with_caution"`, `"ask_clarifying_question"`, or `"refuse_grounded_answer"` |
| `max_context_chunks` | `None` | Cap chunks sent to the LLM |
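
The two thresholds can be read as a simple gate, sketched below. This is an assumption about how the checks combine, not the library's actual logic: `is_low_confidence` is a hypothetical helper, and the page does not say which score (for example, that of the top-ranked chunk) is compared against each threshold.

```python
from typing import Optional


def is_low_confidence(
    retrieval_score: float,
    rerank_score: Optional[float] = None,
    min_retrieval_score: float = 0.35,
    min_rerank_score: float = 0.50,
) -> bool:
    """Return True when either score falls below its threshold."""
    if retrieval_score < min_retrieval_score:
        return True
    # The rerank check only applies when a reranker produced a score.
    if rerank_score is not None and rerank_score < min_rerank_score:
        return True
    return False


print(is_low_confidence(0.30))                     # True: retrieval below 0.35
print(is_low_confidence(0.60, rerank_score=0.45))  # True: rerank below 0.50
print(is_low_confidence(0.60, rerank_score=0.70))  # False: both thresholds met
```

When a turn is judged low confidence, `low_confidence_mode` would then select between answering with caution, asking a clarifying question, or refusing a grounded answer.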

llm() parameters

| Parameter | Default | Description |
| --- | --- | --- |
| `provider` | `""` | `"openai"`, `"anthropic"`, or `"google"` (inferred from the model when omitted) |
| `model` | required | Model name, e.g. `"gpt-4o-mini"`, `"claude-sonnet-4-6"`, `"gemini-2.5-flash"` |
| `max_tokens` | `1024` | Max output tokens |

chat() parameters

| Parameter | Default | Description |
| --- | --- | --- |
| `query` | required | User message |
| `llm` | required | LLM object from `rag.llm()` |
| `embedder` | required | Embedder object from `rag.embedder()` with `input_type="query"` |
| `vector_db` | `"qdrant"` | Vector DB provider |
| `collection` | `None` | Collection name |
| `url` | `None` | Vector DB URL |
| `reranker` | `None` | Reranker object from `rag.reranker()` |
| `history` | `None` | Previous turns list |
| `history_compaction` | `None` | `HistoryCompactionConfig` |
| `query_rewrite` | `None` | `QueryRewriteConfig` |
| `intent_routing` | `None` | `IntentRoutingConfig` |
| `retrieval_quality` | `None` | `ChatRetrievalQualityConfig` |
| `persona` | `""` | System persona injected into the prompt |
| `options` | `None` | Provider-specific options dict |