The pipeline, stage by stage
Ingest
Pull in your raw content (websites, documents, or API responses) and normalize it into clean text.
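
As a rough illustration of the normalization step for one kind of input (HTML pages), the sketch below strips tags, drops script and style bodies, and collapses whitespace. It uses only the Python standard library. The function name `normalize_html` is a hypothetical helper for this example, not part of any product API, and a real ingest step would likely handle more formats and edge cases.

```python
import re
import unicodedata
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def normalize_html(raw_html: str) -> str:
    """Hypothetical helper: turn an HTML page into one line of clean text."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    # NFKC folds compatibility characters (for example, non-breaking spaces).
    text = unicodedata.normalize("NFKC", " ".join(parser.parts))
    # Collapse every run of whitespace into a single space.
    return re.sub(r"\s+", " ", text).strip()
```

Calling `normalize_html(page_source)` on a fetched page returns plain text that the Chunk stage can split.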
Chunk
Split each document into small, focused passages.
Why: search works best on focused passages, not whole files. Chunking lets a query match the one paragraph that answers it instead of a 50-page PDF.
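
One simple way to chunk, shown here as a sketch rather than the pipeline's actual strategy, is to pack whole paragraphs into passages up to a size limit. The function name `chunk_text` and the 500-character default are assumptions chosen for illustration.

```python
def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Hypothetical helper: group paragraphs into chunks of at most max_chars.

    Paragraphs are never split, so a single paragraph longer than
    max_chars becomes its own oversized chunk.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks = []
    current = ""
    for para in paragraphs:
        # +2 accounts for the blank line that would join the two pieces.
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks
```

Keeping paragraphs intact favors passages that read as complete thoughts. Fixed-size or sentence-based splitting are alternatives with different trade-offs.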
Embed
Convert each chunk into a vector, a list of numbers that captures meaning.
Why: vectors let “how do I get my money back” match a chunk titled “Refund policy” even with no shared words.
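
To make "similar meaning, similar numbers" concrete, the sketch below compares vectors with cosine similarity, a common measure for embeddings. The three-number vectors are hand-made toy values, not output from a real embedding model (real embeddings typically have hundreds of dimensions or more), and the "Shipping times" chunk is an invented contrast case.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy vectors, invented for illustration only.
query_vec = [0.9, 0.1, 0.0]     # "how do I get my money back"
refund_vec = [0.8, 0.2, 0.1]    # chunk titled "Refund policy"
shipping_vec = [0.1, 0.1, 0.9]  # chunk titled "Shipping times"

refund_score = cosine_similarity(query_vec, refund_vec)
shipping_score = cosine_similarity(query_vec, shipping_vec)
```

With these values the refund chunk scores far higher than the shipping chunk, even though the query and the refund title share no words.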
Core terms
| Term | Plain meaning |
|---|---|
| Chunk | A small passage of a document, sized for search. |
| Embedding | Text turned into numbers that capture meaning. Similar meaning → similar numbers. |
| Vector store | A database built to search embeddings by similarity (Qdrant, Pinecone, Weaviate). |
| Retrieval | Finding the chunks most relevant to a query. |
| Reranking | A second, more careful pass that re-reads the retrieved chunks and reorders them by relevance to the query. |
| Grounding | Forcing the LLM to answer from retrieved chunks, not its own memory. |
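
Grounding usually comes down to how the prompt is assembled. The sketch below shows one plausible shape: retrieved chunks are numbered and the instructions tell the model to stay within them. The function name `build_grounded_prompt` and the exact wording of the instructions are assumptions for illustration, not the pipeline's real prompt.

```python
def build_grounded_prompt(question: str, chunks: list[str]) -> str:
    """Hypothetical helper: wrap retrieved chunks and a question in one prompt."""
    numbered = "\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer using only the numbered passages below. "
        "If they do not contain the answer, say you don't know.\n\n"
        f"Passages:\n{numbered}\n\n"
        f"Question: {question}"
    )
```

Numbering the passages also gives the model a simple way to cite which chunk an answer came from.
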
When you need the advanced features
The basic pipeline works out of the box. Reach for these when you hit the matching problem:

| Symptom | Feature | What it does |
|---|---|---|
| Top results are off-target | Reranking | Re-reads top candidates and reorders by relevance. |
| Chat follow-ups return nothing | Query rewriting | Expands “what about the second one?” into a standalone question. |
| Long chats get slow or expensive | History compaction | Summarizes old turns, keeps recent ones. |
| “Thanks!” triggers a pointless search | Intent routing | Skips retrieval for small talk. |
| Crawls partly fail | Dead-letter queue | Records failures so you retry only those. |
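
As one example from the table, a dead-letter queue can be sketched in a few lines: failures are recorded instead of aborting the crawl, and a later pass retries only those URLs. The function `crawl` and its `fetch` parameter are hypothetical names for this illustration. A production version would probably persist the failures and limit retry attempts.

```python
from typing import Callable


def crawl(
    urls: list[str], fetch: Callable[[str], str]
) -> tuple[dict[str, str], dict[str, str]]:
    """Hypothetical helper: fetch each URL, recording failures instead of stopping.

    Returns (pages, dead_letters), where dead_letters maps each failed URL
    to a description of the error.
    """
    pages: dict[str, str] = {}
    dead_letters: dict[str, str] = {}
    for url in urls:
        try:
            pages[url] = fetch(url)
        except Exception as exc:  # broad on purpose: any failure is recorded
            dead_letters[url] = repr(exc)
    return pages, dead_letters
```

To retry, pass only the failed URLs back in: `crawl(list(dead_letters), fetch)`. Pages that already succeeded are not fetched again.
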
Next
Quickstart
Build it now.
Features
Tune each stage.

