`chunk` reads the JSON files in `--input-dir` (the ingestion output) and writes a `chunks.json` file to `--output-dir`.
Options
| Option | Default | Description |
|---|---|---|
| `--input-dir` | required | Folder containing ingestion output JSON files |
| `--output-dir` | required | Folder to write `chunks.json` to |
| `--chunk-size` | 2000 | Maximum number of characters per chunk |
| `--chunk-overlap` | 200 | Number of characters shared between consecutive chunks |
| `--min-chunk-length` | 100 | Minimum chunk length, in characters, to keep |
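
To show how the three numeric options interact, here is a minimal sketch of fixed-size character chunking with overlap. It is an illustration only: the function name `chunk_text` is hypothetical, and the tool's real splitting strategy (for example, whether it respects sentence or paragraph boundaries) is not described here, so the sketch assumes a plain sliding window over characters.

```python
def chunk_text(
    text: str,
    chunk_size: int = 2000,       # mirrors --chunk-size
    chunk_overlap: int = 200,     # mirrors --chunk-overlap
    min_chunk_length: int = 100,  # mirrors --min-chunk-length
) -> list[str]:
    """Split text into overlapping character windows (hypothetical helper)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    # Each window starts this many characters after the previous one,
    # so consecutive windows share chunk_overlap characters.
    step = chunk_size - chunk_overlap

    chunks: list[str] = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        # Assumption: chunks shorter than the minimum are simply dropped.
        if len(piece) >= min_chunk_length:
            chunks.append(piece)
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With the defaults, a 5000-character document yields windows starting at offsets 0, 1800, and 3600, so each chunk repeats the last 200 characters of the one before it. Under this sketch, a document shorter than 100 characters produces no chunks at all.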

