One-shot chat turn
Run a single question against your vector index:
ragrails chat "How do I authenticate?" \
  --vector-db qdrant \
  --collection docs \
  --url http://localhost:6333 \
  --llm-provider openai \
  --llm-model gpt-4o-mini
Multi-turn with history file
Pass --history-file to persist history across turns. The file is created on the first turn and updated after each turn.
ragrails chat "How do I authenticate?" \
  --history-file files/chat/history.json \
  --vector-db qdrant --collection docs --url http://localhost:6333 \
  --llm-provider openai --llm-model gpt-4o-mini

ragrails chat "What about the second step?" \
  --history-file files/chat/history.json \
  --rewrite-query
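
For longer conversations it can be convenient to script the turns. The following is a minimal sketch, not part of ragrails itself: `ask_all` and the `HISTORY_FILE` variable are hypothetical names, and the sketch assumes that repeating the connection flags and `--rewrite-query` on every turn is acceptable. Only flags listed in the Options table are used.

```shell
# Hypothetical helper: send each argument as one chat turn, all sharing one history file.
HISTORY_FILE="${HISTORY_FILE:-files/chat/history.json}"

ask_all() {
  for question in "$@"; do
    # Every turn points at the same --history-file, so later turns can build on earlier ones.
    ragrails chat "$question" \
      --history-file "$HISTORY_FILE" \
      --vector-db qdrant --collection docs --url http://localhost:6333 \
      --llm-provider openai --llm-model gpt-4o-mini \
      --rewrite-query || return 1   # stop at the first failed turn
  done
}
```

After sourcing the snippet, `ask_all "How do I authenticate?" "What about the second step?"` would run the two turns in order.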
Interactive REPL
Run ragrails chat with no QUERY argument to start a full interactive session: a REPL that keeps history, streams answers, and can call tools.
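
As a sketch of what that looks like in practice, the wrapper below starts the REPL with the same connection flags as the one-shot example and simply leaves out the question. The function name `chat_repl` is hypothetical, and the sketch assumes the REPL accepts the same flags as a one-shot turn.

```shell
# Hypothetical wrapper: no positional QUERY is passed, so ragrails chat opens the REPL.
chat_repl() {
  ragrails chat \
    --vector-db qdrant --collection docs --url http://localhost:6333 \
    --llm-provider openai --llm-model gpt-4o-mini \
    "$@"   # extra options from the Options table can be appended here
}
```

Calling `chat_repl` with no arguments starts the session; extra options are passed through unchanged.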
Slash commands
Type these in the REPL to control the session live:

| Command | What it does |
|---|---|
| /model | List available models, or /model gpt-4o to switch mid-conversation |
| /stream on / /stream off | Toggle token-by-token streaming |
| /tools | List the tools the assistant can call |
| /debug | Show the retrieval trace for the last answer (scores, rewritten query, chunks used) |

/debug is the fastest way to see why an answer was weak. It shows what was retrieved, the rerank scores, and the query that was actually searched.
When the answer isn’t in your vector index, the assistant can call tools to go get it:

| Tool | What it does |
|---|---|
| web_fetch | Fetches a web page as readable text, for live lookups beyond your indexed content |
| api_call | Makes a live HTTP request to perform an action the user asks for (e.g. trigger a transfer, fetch account details) |

api_call makes real requests with the values you provide. The REPL asks for confirmation before each tool call.
Options

| Option | Default | Description |
|---|---|---|
| QUERY | none | Question (positional). Omit to start interactive REPL |
| --llm-provider | openai | LLM provider |
| --llm-model | gpt-4o-mini | LLM model name |
| --max-tokens | 1024 | LLM max output tokens |
| --embedder-provider | voyage | Query embedding provider |
| --embedder-model | voyage-3 | Query embedding model |
| --vector-db | qdrant | Vector DB provider |
| --collection | None | Collection name |
| --url | None | Vector DB URL |
| --persona | "" | System persona for the chat turn |
| --history-file | None | JSON file to load and save history |
| --rewrite-query | off | Rewrite the query before retrieval |
| --rewrite-session-context | "" | Context hint for query rewriting |
| --rerank | off | Rerank retrieved chunks |
| --reranker | voyage | Reranker provider |
| --reranker-model | rerank-2-lite | Reranker model |
| --rerank-top-k | 5 | Chunks to keep after reranking |
| --disable-intent-routing | off | Always run retrieval, even for small-talk queries |
| --disable-history-compaction | off | Return full history without summarisation |
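
To show how several of these options combine, here is a hedged sketch of a one-shot question with reranking, a persona, and a smaller output budget. The function name `ask_reranked`, the persona text, and the values `3` and `512` are illustrative. The sketch also assumes `--rerank` is a bare switch, in the same way `--rewrite-query` is used in the multi-turn example.

```shell
# Hypothetical wrapper: one-shot question with reranking enabled.
ask_reranked() {
  ragrails chat "$1" \
    --vector-db qdrant --collection docs --url http://localhost:6333 \
    --embedder-provider voyage --embedder-model voyage-3 \
    --rerank --reranker voyage --reranker-model rerank-2-lite --rerank-top-k 3 \
    --persona "You are a concise support engineer." \
    --max-tokens 512   # below the documented default of 1024
}
```

For example, `ask_reranked "How do I authenticate?"` keeps the three best chunks after reranking instead of the default five.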