history in on each turn and persist the returned result.history in your application.
Result fields
| Field | Description |
|---|---|
result.answer | LLM answer string |
result.sources | Source chunk dicts used for the answer |
result.history | Updated history list; pass to the next turn |
result.intent | "rag" (retrieved context used) or "direct" (LLM answered directly) |
result.answer_confidence | Confidence assessment dict |
result.retrieval_quality | Retrieval quality assessment dict |
result.compacted | True if history was summarised this turn |
result.errors | List of error dicts |
Config objects
QueryRewriteConfig
Rewrites follow-up questions into self-contained queries before retrieval.
| Field | Default | Description |
|---|---|---|
enabled | False | Enable query rewriting |
session_context | "" | Context hint for the rewriter |
llm | None | Separate LLM for rewriting (falls back to the chat LLM) |
HistoryCompactionConfig
Automatically summarises old history turns to stay within context limits.
| Field | Default | Description |
|---|---|---|
enabled | True | Enable automatic history summarisation |
history_limit | 15 | Summarise when history exceeds this many turns |
keep_recent | 5 | Recent turns to keep verbatim after summarisation |
IntentRoutingConfig
Routes small-talk and direct questions to the LLM without retrieval.
| Field | Default | Description |
|---|---|---|
enabled | True | Route non-RAG queries directly to the LLM |
ChatRetrievalQualityConfig
Controls confidence thresholds for retrieved context.
| Field | Default | Description |
|---|---|---|
min_retrieval_score | 0.35 | Below this, retrieval is considered low quality |
min_rerank_score | 0.50 | Below this, reranked results are low quality |
low_confidence_mode | "answer_with_caution" | "answer_with_caution", "ask_clarifying_question", or "refuse_grounded_answer" |
max_context_chunks | None | Cap chunks sent to the LLM |
llm() parameters
| Parameter | Default | Description |
|---|---|---|
provider | "" | "openai", "anthropic", or "google" (inferred from the model when omitted) |
model | required | Model name, e.g. "gpt-4o-mini", "claude-sonnet-4-6", "gemini-2.5-flash" |
max_tokens | 1024 | Max output tokens |
chat() parameters
| Parameter | Default | Description |
|---|---|---|
query | required | User message |
llm | required | LLM object from rag.llm() |
embedder | required | Embedder object from rag.embedder() with input_type="query" |
vector_db | "qdrant" | Vector DB provider |
collection | None | Collection name |
url | None | Vector DB URL |
reranker | None | Reranker object from rag.reranker() |
history | None | Previous turns list |
history_compaction | None | HistoryCompactionConfig |
query_rewrite | None | QueryRewriteConfig |
intent_routing | None | IntentRoutingConfig |
retrieval_quality | None | ChatRetrievalQualityConfig |
persona | "" | System persona injected into the prompt |
options | None | Provider-specific options dict |

