chunk() splits markdown documents into stable, embedding-ready pieces.
from ragrails import RagRails

rag = RagRails()

# From ingestion output
result = rag.chunk(
    markdown=ingestion_result.outputs,  # list of dicts with a "text" key
    chunk_size=2000,
    chunk_overlap=200,
    min_chunk_length=100,
)

result.inputs  # documents passed in
result.chunks  # total chunks produced
result.items   # list of chunk dicts
result.failed  # documents that failed
result.errors  # list of error strings
Each chunk dict has:
| Field | Description |
| --- | --- |
| id | Stable unique chunk ID |
| text | Chunk text |
| source | Source URL or file path |
| metadata | Dict with title, chunk_index, and other fields |
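
To make the shape of a chunk dict concrete, here is a minimal sketch that builds one by hand. The helper `make_chunk` is hypothetical and not part of ragrails, and hashing source, index, and text with SHA-256 is only one plausible way to get a stable ID; the library's actual ID scheme is not documented here.

```python
import hashlib


def make_chunk(text: str, source: str, title: str, chunk_index: int) -> dict:
    """Build a dict with the four documented fields (hypothetical helper)."""
    # Assumption: derive the ID from source + index + text so that the same
    # input always produces the same ID across runs.
    key = f"{source}\x00{chunk_index}\x00{text}".encode("utf-8")
    chunk_id = hashlib.sha256(key).hexdigest()[:16]
    return {
        "id": chunk_id,
        "text": text,
        "source": source,
        "metadata": {"title": title, "chunk_index": chunk_index},
    }


chunk = make_chunk("Content here.", "docs/guide.md", "Guide", 0)
print(sorted(chunk))  # ['id', 'metadata', 'source', 'text']
```

A content-derived ID like this stays the same when a document is re-chunked unchanged, which is useful for upserting into a vector store without creating duplicates.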

Input formats

markdown accepts:
# Plain string
rag.chunk(markdown="# Guide\n\nContent here.")

# List of strings
rag.chunk(markdown=["# Doc 1\n\nContent.", "# Doc 2\n\nMore content."])

# List of dicts (ingestion output; uses the "text" key)
rag.chunk(markdown=ingestion_result.outputs)
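
The three accepted shapes can be thought of as being normalized into one list of documents before chunking. The sketch below illustrates that idea only; `normalize_markdown` is a hypothetical helper, and how ragrails handles this internally is an assumption.

```python
def normalize_markdown(markdown, title="", source=""):
    """Coerce a string, list of strings, or list of dicts into a list of dicts.

    Hypothetical helper; not part of the ragrails API.
    """
    if isinstance(markdown, str):
        # title and source apply only when a single plain string is passed.
        return [{"text": markdown, "title": title, "source": source}]

    docs = []
    for item in markdown:
        if isinstance(item, str):
            docs.append({"text": item})
        elif isinstance(item, dict) and "text" in item:
            docs.append(dict(item))  # copy so the caller's dict is not mutated
        else:
            raise TypeError("each item must be a string or a dict with a 'text' key")
    return docs


docs = normalize_markdown(["# Doc 1\n\nContent.", {"text": "# Doc 2\n\nMore content."}])
print(len(docs))  # 2
```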

Parameters

| Parameter | Default | Description |
| --- | --- | --- |
| markdown | required | Markdown string, list of strings, or list of dicts with a text key |
| title | "" | Document title (used when markdown is a plain string) |
| source | "" | Source URL or path (used when markdown is a plain string) |
| chunk_size | 2000 | Max characters per chunk |
| chunk_overlap | 200 | Character overlap between adjacent chunks |
| min_chunk_length | 100 | Chunks shorter than this are dropped |