| Method | Source | Needs |
|---|---|---|
| scrape() | Websites (one URL or a full crawl) | ragrails[url] + setup_url() |
| parse() | Files: local paths, file URLs, or raw bytes (PDF, DOCX, XLSX, HTML, MD, CSV…) | none |
| fetch() | REST API responses | none |
Every method returns records with id, text, source, and metadata.
parse() accepts more than file paths. Pass a local path, a file URL (downloaded automatically), or raw bytes. Raw bytes are ideal for web uploads and object storage, where the file never touches disk. See input forms.
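To make the three input forms concrete, here is a minimal sketch of how a caller-side helper could normalize a path, a URL, or raw bytes into bytes before parsing. It uses only the Python standard library; `read_input` is a hypothetical name for illustration and is not part of ragrails, whose internal handling may differ.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def read_input(source):
    """Return raw bytes for a local path, a URL, or a bytes object.

    Hypothetical helper for illustration only; not a ragrails function.
    """
    # Raw bytes (for example a web upload): use as-is, nothing touches disk.
    if isinstance(source, (bytes, bytearray)):
        return bytes(source)

    # A URL string: download (http/https) or read (file) the content.
    if isinstance(source, str) and urlparse(source).scheme in ("http", "https", "file"):
        with urlopen(source) as response:
            return response.read()

    # Anything else is treated as a local filesystem path.
    return Path(source).read_bytes()
```

With a helper like this, the parsing step only ever sees bytes, which is why an upload held in memory can be handled the same way as a file on disk.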
Crawling websites
- mode="each": scrape only the exact URLs you pass.
- mode="full": crawl the whole site from a starting URL.
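The difference between the two modes can be sketched without any network access. The example below uses a hypothetical in-memory link map (`SITE`) and a hypothetical `select_urls` function; neither is part of ragrails. It also assumes that a full crawl stays on the starting URL's host, which the text above does not state.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical in-memory "site": each URL maps to the links found on that page.
SITE = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/b", "https://other.org/x"],
    "https://example.com/b": ["https://example.com/"],
}


def select_urls(urls, mode="each", links=SITE):
    """Return the URLs that would be visited for the given mode."""
    if mode == "each":
        # Only the exact URLs passed in; no links are followed.
        return list(urls)
    if mode != "full":
        raise ValueError(f"unknown mode: {mode!r}")

    # Breadth-first crawl from the first URL.
    # Assumption: only links on the same host are followed.
    start = urls[0]
    host = urlparse(start).netloc
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for link in links.get(page, []):
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return order
```

Here `select_urls(["https://example.com/a"], mode="each")` visits one page, while `mode="full"` starting from `https://example.com/` reaches every same-host page and skips `other.org`.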
Dead-letter queue: in a large crawl, some pages will fail (timeouts, rate limits). Pass dlq=DLQ("file.json") to record the failures, then retry only those pages with scrape(dlq="file.json"). See DLQ.
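The dead-letter pattern itself is small enough to sketch in plain Python. The code below is an illustration of the idea, not the ragrails DLQ class: `scrape_all`, `retry_failed`, the `fetch_page` callback, and the JSON layout (a list of `{"url", "error"}` entries) are all assumptions made for this example.

```python
import json
from pathlib import Path


def scrape_all(urls, fetch_page, dlq_path):
    """Fetch every URL; write the ones that fail to a JSON dead-letter file."""
    results, failures = {}, []
    for url in urls:
        try:
            results[url] = fetch_page(url)
        except Exception as exc:  # e.g. a timeout or a rate-limit error
            failures.append({"url": url, "error": str(exc)})
    # Overwrite the file so it always reflects the most recent run.
    Path(dlq_path).write_text(json.dumps(failures, indent=2))
    return results


def retry_failed(fetch_page, dlq_path):
    """Re-run only the URLs recorded in the dead-letter file."""
    entries = json.loads(Path(dlq_path).read_text())
    failed_urls = [entry["url"] for entry in entries]
    return scrape_all(failed_urls, fetch_page, dlq_path)
```

Because the retry pass rewrites the same file, pages that fail again stay queued and pages that succeed drop out, so the file shrinks toward an empty list over repeated retries.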
