CLI Chat - Ragrails

One-shot chat turn

Run a single question against your vector index:

ragrails chat "How do I authenticate?" \
  --vector-db qdrant \
  --collection docs \
  --url http://localhost:6333 \
  --llm-provider openai \
  --llm-model gpt-4o-mini

Multi-turn with history file

Pass --history-file to persist history across turns. The file is created on the first turn and updated after each turn.

ragrails chat "How do I authenticate?" \
  --history-file files/chat/history.json \
  --vector-db qdrant --collection docs --url http://localhost:6333 \
  --llm-provider openai --llm-model gpt-4o-mini

ragrails chat "What about the second step?" \
  --history-file files/chat/history.json \
  --rewrite-query
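
For scripted follow-ups, the second command above can be wrapped in a small helper. This is a minimal sketch, not part of Ragrails: the `ask` function and the `RAGRAILS_BIN` and `HISTORY_FILE` variables are hypothetical names, and it assumes, as in the second command above, that a follow-up turn is given only `--history-file` and `--rewrite-query`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: send one follow-up turn against a shared history file.
# RAGRAILS_BIN and HISTORY_FILE are names invented for this sketch.
HISTORY_FILE="${HISTORY_FILE:-files/chat/history.json}"

ask() {
  # $1: the question for this turn.
  # RAGRAILS_BIN defaults to the real CLI; set it to "echo" for a dry run
  # that prints the arguments instead of calling the model.
  "${RAGRAILS_BIN:-ragrails}" chat "$1" \
    --history-file "$HISTORY_FILE" \
    --rewrite-query
}
```

Once the first turn has created the history file, `ask "What about the second step?"` sends the follow-up. Running `RAGRAILS_BIN=echo ask "test"` prints the arguments without contacting any service, which is a convenient way to check quoting.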

Interactive REPL

Run ragrails chat with no query argument to start an interactive session: a REPL that keeps history, streams answers, and can call tools.

ragrails chat

Slash commands

Type these in the REPL to control the session live:

Command	What it does
`/model`	List available models; use `/model gpt-4o` to switch mid-conversation
`/stream on` / `/stream off`	Toggle token-by-token streaming
`/tools`	List the tools the assistant can call
`/debug`	Show the retrieval trace for the last answer (scores, rewritten query, chunks used)

/debug is the fastest way to see why an answer was weak. It shows what was retrieved, the rerank scores, and the query that was actually searched.

Agentic tools

When the answer isn’t in your vector index, the assistant can call tools to go get it:

Tool	What it does
`web_fetch`	Fetches a web page as readable text, for live lookups beyond your indexed content
`api_call`	Makes a live HTTP request to perform an action the user asks for (e.g. trigger a transfer, fetch account details)

api_call makes real requests with the values you provide. The REPL asks for confirmation before each tool call.

Options

Option	Default	Description
`QUERY`	none	Question (positional). Omit to start the interactive REPL
`--llm-provider`	`openai`	LLM provider
`--llm-model`	`gpt-4o-mini`	LLM model name
`--max-tokens`	`1024`	LLM max output tokens
`--embedder-provider`	`voyage`	Query embedding provider
`--embedder-model`	`voyage-3`	Query embedding model
`--vector-db`	`qdrant`	Vector DB provider
`--collection`	`None`	Collection name
`--url`	`None`	Vector DB URL
`--persona`	`""`	System persona for the chat turn
`--history-file`	`None`	JSON file to load and save history
`--rewrite-query`	off	Rewrite the query before retrieval
`--rewrite-session-context`	`""`	Context hint for query rewriting
`--rerank`	off	Rerank retrieved chunks
`--reranker`	`voyage`	Reranker provider
`--reranker-model`	`rerank-2-lite`	Reranker model
`--rerank-top-k`	`5`	Chunks to keep after reranking
`--disable-intent-routing`	off	Always run retrieval, even for small-talk queries
`--disable-history-compaction`	off	Return full history without summarisation
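
The reranking options are meant to be combined. The sketch below uses only flags listed in the table; the `chat_reranked` function and the `RAGRAILS_BIN` variable are hypothetical names, the connection values are copied from the one-shot example, and it assumes `--rerank` is a bare on/off flag in the same way `--rewrite-query` is used earlier on this page. The `--reranker` and `--reranker-model` values simply restate the documented defaults.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: one-shot question with reranking enabled.
chat_reranked() {
  # $1: question; $2: chunks to keep after reranking (table default is 5).
  # Assumption: --rerank takes no value, like --rewrite-query.
  # Set RAGRAILS_BIN=echo for a dry run that only prints the arguments.
  "${RAGRAILS_BIN:-ragrails}" chat "$1" \
    --vector-db qdrant --collection docs --url http://localhost:6333 \
    --rerank \
    --reranker voyage \
    --reranker-model rerank-2-lite \
    --rerank-top-k "${2:-5}"
}
```

For example, `chat_reranked "How do I authenticate?" 3` would keep the three highest-ranked chunks instead of the default five.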
