How RAG Works

The pipeline, stage by stage

Ingest

Pull in your raw content (websites, documents, or API responses) and normalize it into clean text.
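As a rough illustration of the normalizing step, here is a minimal sketch that strips markup from an HTML page using only the Python standard library. The function name `normalize_html` and the choice to drop `script` and `style` content are assumptions made for this example, not part of any particular ingestion tool.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and ignores script/style contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def normalize_html(html: str) -> str:
    """Hypothetical helper: return the page's text with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts)
    return re.sub(r"\s+", " ", text).strip()
```

Documents and API responses would need their own extractors; the point is only that every source ends up as plain text before chunking.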

Chunk

Split each document into small, focused passages.

Why: search works best on focused passages, not whole files. Chunking lets a query match the one paragraph that answers it instead of a 50-page PDF.
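One simple way to chunk is by word count. The sketch below assumes fixed-size windows with a small overlap so a sentence cut at a boundary still appears whole in one chunk. The function name, the default sizes, and the overlap itself are illustrative assumptions; real pipelines often split on headings or sentences instead.

```python
def chunk_text(text: str, max_words: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word windows of at most max_words, sharing `overlap` words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this window has reached the end of the document.
        if start + max_words >= len(words):
            break
    return chunks
```

For example, ten words chunked with `max_words=4` and `overlap=1` produce three chunks, each starting on the last word of the previous one.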

Embed

Convert each chunk into a vector, a list of numbers that captures meaning.

Why: vectors let “how do I get my money back” match a chunk titled “Refund policy” even with no shared words.
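To make "closeness by meaning" concrete, the sketch below compares vectors with cosine similarity, a common similarity measure for embeddings. The three-number vectors are invented for illustration; a real embedding model produces vectors with hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for real model output.
question = [0.9, 0.1, 0.2]        # "how do I get my money back"
refund_policy = [0.8, 0.2, 0.1]   # chunk titled "Refund policy"
shipping_times = [0.1, 0.9, 0.3]  # an unrelated chunk

refund_score = cosine_similarity(question, refund_policy)
shipping_score = cosine_similarity(question, shipping_times)
```

With these toy numbers the refund chunk scores far higher than the shipping chunk, even though the question and the chunk title share no words.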

Store

Save the vectors in a vector database so they can be searched quickly.

Retrieve

Embed the user’s question with the same embedding model, then find the chunks whose vectors are closest in meaning.
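The store and retrieve stages can be pictured with a tiny in-memory stand-in for a vector database. This is a sketch under stated assumptions: the class `InMemoryVectorStore`, its methods, the sample chunks, and the toy vectors are all hypothetical, and a production vector database uses an index rather than the brute-force scan shown here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


class InMemoryVectorStore:
    """Hypothetical store: keeps (chunk, vector) pairs and scans them all."""

    def __init__(self):
        self._rows = []

    def add(self, chunk, vector):
        self._rows.append((chunk, vector))

    def search(self, query_vector, k=3):
        """Return the k highest-scoring (score, chunk) pairs."""
        scored = [(cosine(query_vector, vec), chunk) for chunk, vec in self._rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


# Made-up chunks and vectors; a real pipeline would get vectors from a model.
store = InMemoryVectorStore()
store.add("Refund policy: refunds are issued within 30 days.", [0.8, 0.2, 0.1])
store.add("Shipping: orders arrive in five days.", [0.1, 0.9, 0.3])
store.add("Office hours: weekdays, nine to five.", [0.2, 0.1, 0.9])

# The question vector must come from the same model as the chunk vectors.
top = store.search([0.9, 0.1, 0.2], k=2)
```

Here `top[0]` is the refund chunk, because its vector points in nearly the same direction as the question vector.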

Answer

Hand those chunks to an LLM as context, so the model’s answer is grounded in your data.
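One way to hand chunks to a model is to paste them into the prompt ahead of the question. The sketch below only builds that prompt string; the function name and the instruction wording are assumptions, and the call to an actual LLM is left out because it depends on the provider you use.

```python
def build_grounded_prompt(question: str, chunks: list[str]) -> str:
    """Hypothetical helper: number the chunks and place them before the question."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Numbering the chunks makes it possible to ask the model to cite which passage it used.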

Core terms

Term	Plain meaning
Chunk	A small passage of a document, sized for search.
Embedding	Text turned into numbers that capture meaning. Similar meaning → similar numbers.
Vector store	A database built to search embeddings by similarity (Qdrant, Pinecone, Weaviate).
Retrieval	Finding the chunks most relevant to a query.
Reranking	A second, smarter pass that reorders retrieved chunks by true relevance.
Grounding	Forcing the LLM to answer from retrieved chunks, not its own memory.
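Reranking, as defined in the table above, can be pictured as a second scoring pass over the chunks that retrieval returned. The sketch below assumes the scorer is passed in as a function. The `word_overlap` scorer is a deliberately crude stand-in chosen so the example runs without dependencies; a real reranker would typically use a model that reads the query and chunk together.

```python
from typing import Callable


def rerank(
    query: str,
    candidates: list[str],
    score: Callable[[str, str], float],
    top_n: int = 3,
) -> list[str]:
    """Reorder retrieved chunks by the second-pass score, best first."""
    ranked = sorted(candidates, key=lambda chunk: score(query, chunk), reverse=True)
    return ranked[:top_n]


def word_overlap(query: str, chunk: str) -> float:
    """Stand-in scorer (assumption): fraction of query words found in the chunk."""
    query_words = set(query.lower().split())
    chunk_words = set(chunk.lower().split())
    return len(query_words & chunk_words) / len(query_words) if query_words else 0.0
```

Because the scorer is just a parameter, a stronger one can replace `word_overlap` without changing `rerank`.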

When you need the advanced features

The basic pipeline works out of the box. Reach for each of these features when you hit the matching symptom:

Symptom	Feature	What it does
Top results are off-target	Reranking	Re-reads top candidates and reorders by relevance.
Chat follow-ups return nothing	Query rewriting	Expands “what about the second one?” into a standalone question.
Long chats get slow or expensive	History compaction	Summarizes old turns, keeps recent ones.
“Thanks!” triggers a pointless search	Intent routing	Skips retrieval for small talk.
Crawls partly fail	Dead-letter queue	Records failures so you retry only those.
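As one example from the table, the dead-letter queue row can be sketched as a crawl loop that records each failed URL instead of aborting, then retries only the recorded ones. The function names, the record shape, and the `fetch` callable are assumptions for illustration; a production queue would usually persist the failures.

```python
from typing import Callable


def crawl(urls: list[str], fetch: Callable[[str], str]):
    """Fetch every URL; collect failures in a dead-letter list instead of raising."""
    pages = {}
    dead_letters = []
    for url in urls:
        try:
            pages[url] = fetch(url)
        except Exception as exc:  # broad on purpose: any failure is recorded
            dead_letters.append({"url": url, "error": str(exc)})
    return pages, dead_letters


def retry_dead_letters(dead_letters: list[dict], fetch: Callable[[str], str]):
    """Re-run the crawl for the failed URLs only."""
    return crawl([entry["url"] for entry in dead_letters], fetch)
```

URLs that succeeded the first time are never fetched again, which is the saving the table describes.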

Next

Quickstart

Features