One-shot chat turn

Run a single question against your vector index:
ragrails chat "How do I authenticate?" \
  --vector-db qdrant \
  --collection docs \
  --url http://localhost:6333 \
  --llm-provider openai \
  --llm-model gpt-4o-mini

Multi-turn with history file

Pass --history-file to persist history across turns. The file is created on the first turn and updated after each turn.
ragrails chat "How do I authenticate?" \
  --history-file files/chat/history.json \
  --vector-db qdrant --collection docs --url http://localhost:6333 \
  --llm-provider openai --llm-model gpt-4o-mini

ragrails chat "What about the second step?" \
  --history-file files/chat/history.json \
  --rewrite-query
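
The two turns above can also be scripted. The following is a minimal sketch, assuming a POSIX-compatible shell. The chat_turn helper and the DRY_RUN variable are hypothetical conveniences and not part of ragrails. It uses only flags shown on this page, and it repeats the vector DB and LLM flags on every turn because this page does not say whether those settings are stored in the history file. With DRY_RUN=1 (the default in this sketch) each command is printed instead of executed.

```shell
# Sketch: run several chat turns against one history file.
# chat_turn and DRY_RUN are hypothetical helpers, not ragrails features.
HISTORY_FILE="${HISTORY_FILE:-files/chat/history.json}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the command only, 0 = run it

chat_turn() {
  # $1 is the question; any remaining arguments are extra flags for this turn.
  question="$1"
  shift
  # Rebuild the argument list as the full ragrails command line.
  set -- ragrails chat "$question" \
    --history-file "$HISTORY_FILE" \
    --vector-db qdrant --collection docs --url http://localhost:6333 \
    --llm-provider openai --llm-model gpt-4o-mini \
    "$@"
  if [ "$DRY_RUN" = "1" ]; then
    # Preview only: arguments are joined with spaces, so quoting is not shown.
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

chat_turn "How do I authenticate?"
chat_turn "What about the second step?" --rewrite-query
```

Set DRY_RUN=0 to execute the turns once the collection name and URL match your setup.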

Interactive REPL

Run chat with no query argument to start a full interactive session: a REPL that keeps history, streams answers, and can call tools.
ragrails chat

Slash commands

Type these in the REPL to control the session live:
| Command | What it does |
| --- | --- |
| /model | List available models, or /model gpt-4o to switch mid-conversation |
| /stream on / /stream off | Toggle token-by-token streaming |
| /tools | List the tools the assistant can call |
| /debug | Show the retrieval trace for the last answer (scores, rewritten query, chunks used) |

/debug is the fastest way to see why an answer was weak. It shows what was retrieved, the rerank scores, and the query that was actually searched.

Agentic tools

When the answer isn’t in your vector index, the assistant can call tools to go get it:
| Tool | What it does |
| --- | --- |
| web_fetch | Fetches a web page as readable text, for live lookups beyond your indexed content |
| api_call | Makes a live HTTP request to perform an action the user asks for (e.g. trigger a transfer, fetch account details) |

api_call makes real requests with the values you provide. The REPL asks for confirmation before each tool call.

Options

| Option | Default | Description |
| --- | --- | --- |
| QUERY | none | Question (positional). Omit to start interactive REPL |
| --llm-provider | openai | LLM provider |
| --llm-model | gpt-4o-mini | LLM model name |
| --max-tokens | 1024 | LLM max output tokens |
| --embedder-provider | voyage | Query embedding provider |
| --embedder-model | voyage-3 | Query embedding model |
| --vector-db | qdrant | Vector DB provider |
| --collection | None | Collection name |
| --url | None | Vector DB URL |
| --persona | "" | System persona for the chat turn |
| --history-file | None | JSON file to load and save history |
| --rewrite-query | off | Rewrite the query before retrieval |
| --rewrite-session-context | "" | Context hint for query rewriting |
| --rerank | off | Rerank retrieved chunks |
| --reranker | voyage | Reranker provider |
| --reranker-model | rerank-2-lite | Reranker model |
| --rerank-top-k | 5 | Chunks to keep after reranking |
| --disable-intent-routing | off | Always run retrieval, even for small-talk queries |
| --disable-history-compaction | off | Return full history without summarisation |
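
The reranking options combine with the one-shot form shown at the top of this page. Below is a minimal sketch, assuming --rerank is a bare switch like --rewrite-query in the multi-turn example; that is an assumption, since the table only lists its default as off. The rerank_preview helper is hypothetical and only prints the command so it can be reviewed before running. The collection name, URL, and --max-tokens value are sample values.

```shell
# Sketch: preview a one-shot chat command with reranking enabled.
# rerank_preview is a hypothetical helper; every flag comes from the Options table.
rerank_preview() {
  # $1 = question, $2 = number of chunks to keep after reranking
  set -- ragrails chat "$1" \
    --vector-db qdrant --collection docs --url http://localhost:6333 \
    --rerank --reranker voyage --reranker-model rerank-2-lite \
    --rerank-top-k "$2" \
    --max-tokens 512
  # Preview only: arguments are joined with spaces, so quoting is not shown.
  printf '%s\n' "$*"
}

rerank_preview "How do I authenticate?" 3
```

Because the preview drops shell quoting, put the question back in quotes when copying the printed command into a terminal.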