RAG (Retrieval-Augmented Generation) gives an LLM the right facts at answer time. Instead of relying on what the model memorized, you retrieve relevant passages from your own content and hand them to the model. Ragrails runs that as a pipeline:
content → chunks → embeddings → vector store → retrieval → answer

The pipeline, stage by stage

1. Ingest

Pull in your raw content (websites, documents, or API responses) and normalize it into clean text.
2. Chunk

Split each document into small, focused passages.
Why: search works best on focused passages, not whole files. Chunking lets a query match the one paragraph that answers it instead of a 50-page PDF.
3. Embed

Convert each chunk into a vector, a list of numbers that captures meaning.
Why: vectors let “how do I get my money back” match a chunk titled “Refund policy” even with no shared words.
4. Store

Save the vectors in a vector database so they can be searched fast.
5. Retrieve

Embed the user’s question with the same model used for the chunks, then find the chunks whose vectors are closest in meaning.
6. Answer

Hand those chunks to an LLM as context. The model answers grounded in your data.
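
The stages above can be sketched in plain Python. This is a toy illustration under stated assumptions, not Ragrails' API: the names `chunk`, `embed`, `InMemoryStore`, and `build_prompt` are hypothetical, the input is assumed to be already-ingested clean text, and `embed` is a bag-of-words stand-in that only matches shared words, whereas a real embedding model matches by meaning.

```python
import math
import re
from collections import Counter


def chunk(text: str, max_words: int = 40) -> list[str]:
    """Split text on blank lines, then cap each passage at max_words words."""
    chunks = []
    for para in re.split(r"\n\s*\n", text.strip()):
        words = para.split()
        for i in range(0, len(words), max_words):
            chunks.append(" ".join(words[i:i + max_words]))
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding (assumption): word counts instead of a learned vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryStore:
    """Hypothetical stand-in for a vector database: a list scanned linearly."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, passage: str) -> None:
        self.items.append((passage, embed(passage)))

    def search(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)  # embed the question the same way as the chunks
        ranked = sorted(self.items, key=lambda it: cosine(q, it[1]), reverse=True)
        return [passage for passage, _ in ranked[:k]]


def build_prompt(question: str, passages: list[str]) -> str:
    """Assemble the context the LLM is asked to answer from."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Usage: index a small document, then build the prompt for one question.
doc = (
    "Refund policy\n\n"
    "Refunds are issued within 14 days of purchase.\n\n"
    "Shipping\n\n"
    "Orders ship in 2 business days."
)
store = InMemoryStore()
for passage in chunk(doc):
    store.add(passage)

question = "When are refunds issued?"
prompt = build_prompt(question, store.search(question, k=1))
print(prompt)
```

The final step, sending `prompt` to an LLM, is left out because it depends on the model provider you use.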

Core terms

| Term | Plain meaning |
| --- | --- |
| Chunk | A small passage of a document, sized for search. |
| Embedding | Text turned into numbers that capture meaning. Similar meaning → similar numbers. |
| Vector store | A database built to search embeddings by similarity (Qdrant, Pinecone, Weaviate). |
| Retrieval | Finding the chunks most relevant to a query. |
| Reranking | A second, smarter pass that reorders retrieved chunks by true relevance. |
| Grounding | Forcing the LLM to answer from retrieved chunks, not its own memory. |
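
To make the difference between retrieval and reranking concrete, here is a minimal sketch of a second pass over already-retrieved candidates. The function names are hypothetical and this is not Ragrails' implementation. `bigram_overlap` is a toy scorer chosen so the example runs without a model; in practice the scoring function would typically be a dedicated reranking model.

```python
from typing import Callable


def bigram_overlap(query: str, passage: str) -> float:
    """Toy scorer (assumption): share of the query's word pairs found in the passage."""
    def bigrams(text: str) -> set[tuple[str, str]]:
        words = text.lower().split()
        return set(zip(words, words[1:]))

    q = bigrams(query)
    return len(q & bigrams(passage)) / len(q) if q else 0.0


def rerank(
    query: str,
    candidates: list[str],
    score: Callable[[str, str], float] = bigram_overlap,
    top_n: int = 3,
) -> list[str]:
    """Reorder first-pass candidates by a second relevance score, best first."""
    ranked = sorted(candidates, key=lambda c: score(query, c), reverse=True)
    return ranked[:top_n]


# First-pass results in their original order; all mention "policy" or "refund".
candidates = [
    "Policy on gift card refund requests",
    "Our refund policy covers all orders",
    "Shipping policy for orders",
]
best = rerank("refund policy", candidates, top_n=2)
print(best)
```

Only the passage containing the exact phrase moves to the top; the others keep their first-pass order because Python's sort is stable.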

When you need the advanced features

The basic pipeline works out of the box. Reach for these when you hit the matching problem:
| Symptom | Feature | What it does |
| --- | --- | --- |
| Top results are off-target | Reranking | Re-reads top candidates and reorders by relevance. |
| Chat follow-ups return nothing | Query rewriting | Expands “what about the second one?” into a standalone question. |
| Long chats get slow or expensive | History compaction | Summarizes old turns, keeps recent ones. |
| “Thanks!” triggers a pointless search | Intent routing | Skips retrieval for small talk. |
| Crawls partly fail | Dead-letter queue | Records failures so you retry only those. |
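
Three of these features are simple enough to sketch in a few lines each. The code below is an illustration of the ideas only, with hypothetical names (`needs_retrieval`, `compact_history`, `crawl`) and a hard-coded small-talk list as an assumption. It does not show how Ragrails implements or configures them.

```python
from typing import Callable

# Assumption: a tiny fixed list; a real router would likely use a classifier.
SMALL_TALK = {"thanks", "thank you", "hi", "hello", "ok", "bye"}


def needs_retrieval(message: str) -> bool:
    """Intent routing: skip the search step for greetings and acknowledgements."""
    normalized = message.lower().strip(" !.?")
    return normalized not in SMALL_TALK


def compact_history(
    turns: list[str],
    keep_recent: int,
    summarize: Callable[[list[str]], str],
) -> list[str]:
    """History compaction: one summary for old turns, recent turns kept verbatim."""
    if len(turns) <= keep_recent:
        return list(turns)
    split = len(turns) - keep_recent
    return [summarize(turns[:split])] + turns[split:]


def crawl(
    urls: list[str],
    fetch: Callable[[str], str],
) -> tuple[dict[str, str], list[tuple[str, str]]]:
    """Dead-letter queue: record failures instead of aborting the whole crawl."""
    pages: dict[str, str] = {}
    dead_letters: list[tuple[str, str]] = []
    for url in urls:
        try:
            pages[url] = fetch(url)
        except Exception as exc:  # broad on purpose: any failure is recorded
            dead_letters.append((url, str(exc)))
    return pages, dead_letters


def fake_fetch(url: str) -> str:
    """Stand-in fetcher for the example: fails for URLs containing 'bad'."""
    if "bad" in url:
        raise ValueError("timeout")
    return "ok"


pages, dead_letters = crawl(
    ["https://example.com/a", "https://example.com/bad"], fake_fetch
)
retry_urls = [url for url, _ in dead_letters]  # retry only what failed
```

Passing `summarize` and `fetch` in as functions keeps the sketch independent of any particular LLM or HTTP client.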

Next

Quickstart

Build it now.

Features

Tune each stage.