
URL ingestion

# Install with the URL extra, then run the URL setup command
pip install "ragrails[url]"
ragrails setup-url

# Scrape only the listed URLs (default --mode each)
ragrails scrape https://example.com/docs https://example.com/blog

# Full site crawl, save output as JSON files
ragrails scrape https://example.com --mode full --max-depth 2 --max-pages 50 --output-dir files/output/web/

# With frontmatter
ragrails scrape https://example.com/docs --frontmatter --output-dir files/output/web/

| Option | Default | Description |
| --- | --- | --- |
| URL | required | One or more URLs (positional, repeatable) |
| --mode | each | each or full |
| --max-depth | 3 | Crawl depth for mode=full |
| --max-pages | 200 | Max pages per URL |
| --frontmatter | off | Prepend YAML frontmatter |
| --output-dir | None | Save output as JSON files to this directory |
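
Because URL is positional and repeatable, a list of URLs kept in a file can be passed to a single scrape call. The following is a minimal sketch, not part of ragrails: the function name `scrape_from_file` and the input format (one URL per line, with blank lines and `#` comments skipped) are assumptions made for illustration. Only the documented `scrape` arguments are used.

```shell
# scrape_from_file URL_FILE OUT_DIR  (hypothetical helper, not a ragrails command)
# Reads one URL per line from URL_FILE and passes them all to one scrape call.
scrape_from_file() {
  url_file=$1
  out_dir=$2
  set --                          # reuse the positional parameters as the URL list
  while IFS= read -r url || [ -n "$url" ]; do
    case $url in
      ''|'#'*) continue ;;        # skip blank lines and comment lines
    esac
    set -- "$@" "$url"
  done < "$url_file"
  if [ "$#" -eq 0 ]; then
    echo "no URLs found in $url_file" >&2
    return 1
  fi
  ragrails scrape "$@" --output-dir "$out_dir"
}
```

A call might look like `scrape_from_file urls.txt files/output/web/`, where `urls.txt` is a file you maintain yourself.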

Document ingestion

# Parse a folder
ragrails parse --folder files/docs/ --output-dir files/output/docs/

# Parse specific files
ragrails parse --files files/guide.pdf --files files/pricing.csv --output-dir files/output/docs/

| Option | Description |
| --- | --- |
| --folder | Directory to parse |
| --files | File path (repeatable) |
| --frontmatter | Prepend YAML frontmatter |
| --output-dir | Save output as JSON files |
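
Since `--files` is repeatable, a subset of a folder can be parsed by building one `--files` argument per matching file. This sketch assumes you want only the PDFs in a directory; the function name `parse_pdfs` is hypothetical, and nothing beyond the documented `parse` options is used.

```shell
# parse_pdfs SRC_DIR OUT_DIR  (hypothetical helper, not a ragrails command)
# Passes every *.pdf directly inside SRC_DIR to one parse call.
parse_pdfs() {
  src_dir=$1
  out_dir=$2
  set --                          # start with an empty argument list
  for f in "$src_dir"/*.pdf; do
    [ -f "$f" ] || continue       # skip the unexpanded pattern when nothing matches
    set -- "$@" --files "$f"
  done
  if [ "$#" -eq 0 ]; then
    echo "no PDF files in $src_dir" >&2
    return 1
  fi
  ragrails parse "$@" --output-dir "$out_dir"
}
```

For example, `parse_pdfs files/docs files/output/docs/` would leave any CSV files in `files/docs` untouched.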

API ingestion

ragrails fetch https://api.example.com/posts \
  --title "Blog posts" \
  --header "Authorization:Bearer TOKEN" \
  --param limit:100 \
  --max-pages 10 \
  --output-dir files/output/api/

| Option | Default | Description |
| --- | --- | --- |
| URL | required | API endpoint (positional) |
| --title | "API Response" | Document title |
| --description | "" | Description metadata |
| --method | GET | HTTP method |
| --header | none | KEY:VALUE (repeatable) |
| --param | none | KEY:VALUE (repeatable) |
| --max-pages | 100 | Max paginated requests |
| --frontmatter | off | Prepend YAML frontmatter |
| --output-dir | None | Save output as JSON files |
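
A literal token in a `--header` value ends up in shell history and in scripts. One way to avoid that is to read it from the environment. The sketch below assumes an environment variable named `API_TOKEN` and a wrapper function `fetch_with_token`; both names are hypothetical, and the header follows the KEY:VALUE form listed in the table.

```shell
# fetch_with_token ENDPOINT OUT_DIR  (hypothetical helper, not a ragrails command)
# Builds the Authorization header from $API_TOKEN instead of a hard-coded token.
fetch_with_token() {
  endpoint=$1
  out_dir=$2
  if [ -z "${API_TOKEN:-}" ]; then
    echo "API_TOKEN is not set" >&2
    return 1
  fi
  ragrails fetch "$endpoint" \
    --header "Authorization:Bearer ${API_TOKEN}" \
    --output-dir "$out_dir"
}
```

With the variable exported in your session, a call could be `fetch_with_token https://api.example.com/posts files/output/api/`.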