Chunking splits documents into small passages before embedding.
Why: search works best on focused passages, not whole files. One embedding of a long file blends many topics together, so chunking lets a query match the one paragraph that answers it instead of a 50-page document.
```python
result = rag.chunk(
    markdown=docs.outputs,  # ingestion output, or plain strings
    chunk_size=2000,
    chunk_overlap=200,
    min_chunk_length=100,
)
result.items  # list of chunk dicts
```
Choosing chunk size
chunk_size is the main lever. It’s a tradeoff:
| Too small | Too large |
|---|---|
| Loses surrounding context | Retrieves noise around the answer |
| More chunks, higher cost | Weaker similarity matching |
Start at chunk_size=2000, chunk_overlap=200. Lower the size for dense, factual content (specs, FAQs); raise it for narrative content where context matters.
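To make the cost side of the tradeoff concrete, here is a rough estimate of how many chunks a plain fixed-size sliding window produces. This is a sketch under assumptions: `estimate_chunk_count` is a hypothetical helper (not part of the SDK), lengths are in whatever unit `chunk_size` uses, and the SDK's own splitter may follow document structure, so real counts can differ.

```python
import math


def estimate_chunk_count(doc_length: int, chunk_size: int, chunk_overlap: int) -> int:
    """Hypothetical helper: chunks produced by a fixed-size sliding window.

    Each window after the first starts (chunk_size - chunk_overlap) units later;
    the last window may be shorter than chunk_size.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    if doc_length <= 0:
        return 0
    if doc_length <= chunk_size:
        return 1  # the whole document fits in one chunk
    stride = chunk_size - chunk_overlap
    # The first window covers chunk_size units; each extra window advances by stride.
    return 1 + math.ceil((doc_length - chunk_size) / stride)


for size, overlap in [(2000, 200), (500, 50)]:
    print(size, overlap, estimate_chunk_count(50_000, size, overlap))
```

For a 50,000-unit document this gives 28 chunks at the suggested starting values and 111 at `chunk_size=500`, roughly four times as many embeddings to compute and store.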
Overlap repeats a little text between adjacent chunks so a sentence split across a boundary isn’t lost. Keep it ~10% of chunk_size.
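A minimal sketch of how overlap protects boundary text, assuming a simple character-based sliding window. `sliding_window` is a hypothetical helper for illustration; the SDK's splitter may cut on headings or paragraphs rather than raw character offsets.

```python
def sliding_window(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Hypothetical helper: fixed-size character windows sharing chunk_overlap chars."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    if not text:
        return []
    stride = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end
            break
        start += stride
    return chunks


text = "The warranty lasts two years. Returns need a receipt."
no_overlap = sliding_window(text, chunk_size=25, chunk_overlap=0)
with_overlap = sliding_window(text, chunk_size=25, chunk_overlap=10)

# Without overlap, "two years." is cut at the 25-character boundary.
print([c for c in no_overlap if "two years." in c])  # []
# With overlap, the second window repeats the boundary text and keeps it whole.
print([c for c in with_overlap if "two years." in c])
```

Each window after the first starts with the last `chunk_overlap` characters of the previous one, which is why the overlap costs a little extra text per chunk.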
min_chunk_length drops tiny fragments (stray headings, footers) that add noise.
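The filtering step can be sketched as a length check on each chunk. This assumes length is counted in characters after trimming surrounding whitespace; `drop_short_chunks` is a hypothetical helper, and the SDK may measure length differently.

```python
def drop_short_chunks(chunks: list[str], min_chunk_length: int) -> list[str]:
    """Hypothetical helper: keep chunks whose trimmed text is at least min_chunk_length chars."""
    return [c for c in chunks if len(c.strip()) >= min_chunk_length]


chunks = [
    "## Installation",  # stray heading left on its own
    "Run the installer, then sign in with your workspace credentials. " * 2,
    "Page 3 of 12",  # page footer
]
kept = drop_short_chunks(chunks, min_chunk_length=100)
print(len(kept))  # 1
```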
Reference
Full parameters: SDK chunking · CLI · REST.