`chunk()` splits Markdown documents into stable, embedding-ready pieces. Each chunk has the following fields:

| Field | Description |
|---|---|
| `id` | Stable unique chunk ID |
| `text` | Chunk text |
| `source` | Source URL or file path |
| `metadata` | Dict with `title`, `chunk_index`, and other fields |
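
To make the field table concrete, here is a minimal sketch of what one chunk record might look like. The plain-dict shape, the `stable_id` helper, and the hashing scheme are all assumptions for illustration; the source only says the ID is stable and unique, not how it is derived.

```python
import hashlib


def stable_id(source: str, chunk_index: int, text: str) -> str:
    """Hypothetical helper: derive a repeatable ID from a chunk's content."""
    # Assumption: hashing source, position, and text gives the same ID
    # for the same input on every run. The real scheme is not documented.
    payload = f"{source}\x00{chunk_index}\x00{text}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]


text = "## Install\n\nRun the installer."

# One record carrying the four documented fields.
example_chunk = {
    "id": stable_id("docs/guide.md", 0, text),
    "text": text,
    "source": "docs/guide.md",
    "metadata": {"title": "Guide", "chunk_index": 0},
}
```

A content-derived ID like this lets a re-run over unchanged input overwrite existing rows in a vector store instead of duplicating them, which is one plausible reading of "stable".
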

## Input formats

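`markdown` accepts:

- a single Markdown string
- a list of Markdown strings
- a list of dicts, each with a `text` key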
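
The three accepted input shapes (a string, a list of strings, or a list of dicts with a `text` key) could be reduced to one internal form before splitting. The sketch below shows one way to do that. `normalize_input` is a hypothetical helper, not part of the documented API, and the handling of `title` and `source` for list inputs is an assumption.

```python
def normalize_input(markdown, title="", source=""):
    """Hypothetical helper: coerce any accepted input shape into a list of dicts."""
    if isinstance(markdown, str):
        # Plain string: title and source come from the keyword arguments.
        return [{"text": markdown, "title": title, "source": source}]

    docs = []
    for item in markdown:
        if isinstance(item, str):
            # Assumption: strings inside a list carry no title or source.
            docs.append({"text": item, "title": "", "source": ""})
        elif isinstance(item, dict):
            if "text" not in item:
                raise ValueError("each dict needs a 'text' key")
            # Assumption: dicts may optionally carry their own title and source.
            docs.append({
                "text": item["text"],
                "title": item.get("title", ""),
                "source": item.get("source", ""),
            })
        else:
            raise TypeError(f"unsupported item type: {type(item).__name__}")
    return docs


single = normalize_input("# Guide\n\nHello.", title="Guide", source="guide.md")
many = normalize_input(["# A", {"text": "# B", "source": "b.md"}])
```
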

## Parameters

| Parameter | Default | Description |
|---|---|---|
| `markdown` | required | Markdown string, list of strings, or list of dicts with a `text` key |
| `title` | `""` | Document title (used when `markdown` is a plain string) |
| `source` | `""` | Source URL or path (used when `markdown` is a plain string) |
| `chunk_size` | `2000` | Maximum characters per chunk |
| `chunk_overlap` | `200` | Character overlap between adjacent chunks |
| `min_chunk_length` | `100` | Chunks shorter than this are dropped |
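
To illustrate how `chunk_size`, `chunk_overlap`, and `min_chunk_length` interact, here is a minimal sliding-window sketch. It is not the implementation of `chunk()`: `split_text` is a hypothetical function that cuts purely by character count and ignores Markdown structure, which the real function may well take into account.

```python
def split_text(text, chunk_size=2000, chunk_overlap=200, min_chunk_length=100):
    """Hypothetical character-window splitter using the documented defaults."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")

    # Each window starts this many characters after the previous one.
    step = chunk_size - chunk_overlap
    pieces = []
    start = 0
    while start < len(text):
        piece = text[start:start + chunk_size]
        # Pieces below the minimum length are dropped.
        if len(piece) >= min_chunk_length:
            pieces.append(piece)
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return pieces


document = "".join(str(i % 10) for i in range(4000))
pieces = split_text(document)
```

In this sketch, the defaults make consecutive windows start 1,800 characters apart, so the last 200 characters of one piece reappear at the start of the next. An input shorter than `min_chunk_length` yields no pieces at all.
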

