Large ingestion jobs rarely succeed in full on the first try. The features on this page make them recoverable.

Retries with a dead-letter queue

Crawl hundreds of pages and a handful will fail from timeouts or rate limits. A dead-letter queue (DLQ) records just the failures so you retry those, not the whole crawl.
```python
from ragrails import RagRails, DLQ

rag = RagRails()

# Collect failures to a file
result = rag.scrape("https://example.com", mode="full", dlq=DLQ("files/dlq/web.json"))

# Retry just the failures
result = rag.scrape(dlq="files/dlq/web.json")

# Or filter before retrying
result.dlq.items = [i for i in result.dlq.items if "docs" in i["url"]]
result = rag.scrape(dlq=result.dlq)
```
| `dlq` value | Behaviour |
|---|---|
| `DLQ()` | Collect retryable failures in memory |
| `DLQ("path.json")` | Collect retryable failures and save them to a file |
| `result.dlq` | Retry from a previous result |
| `"path.json"` | Retry from a saved file |
Only retryable failures (timeouts and other transient errors) go to the DLQ. Permanent errors such as 404s are not recorded, because retrying them would fail again.
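To see why a DLQ makes a crawl cheap to resume, here is a minimal standalone sketch of the collect-then-retry cycle in plain Python. It does not call RagRails: `crawl`, `retry_from_file`, the `TransientError`/`PermanentError` classes, and the JSON layout (a list of `{"url": ..., "error": ...}` items) are hypothetical stand-ins chosen to mirror the behaviour described above.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-ins for illustration only; none of this is RagRails API.


class TransientError(Exception):
    """A failure worth retrying later (timeout, rate limit)."""


class PermanentError(Exception):
    """A failure that retrying will not fix (e.g. HTTP 404)."""


def crawl(urls, fetch, dlq_path=None):
    """Fetch each URL, queueing only transient failures for a later retry."""
    pages, dlq = {}, []
    for url in urls:
        try:
            pages[url] = fetch(url)
        except TransientError as exc:
            dlq.append({"url": url, "error": str(exc)})
        except PermanentError:
            pass  # permanent failures are not queued
    if dlq_path is not None:
        Path(dlq_path).write_text(json.dumps(dlq, indent=2))
    return pages, dlq


def retry_from_file(dlq_path, fetch):
    """Re-run only the URLs recorded in a saved dead-letter file."""
    items = json.loads(Path(dlq_path).read_text())
    return crawl([item["url"] for item in items], fetch, dlq_path)


# A fake fetcher: "/slow" times out on its first attempt only.
attempts = {}


def flaky_fetch(url):
    attempts[url] = attempts.get(url, 0) + 1
    if url.endswith("/missing"):
        raise PermanentError("404 Not Found")
    if url.endswith("/slow") and attempts[url] == 1:
        raise TransientError("timeout")
    return f"<html>{url}</html>"


path = Path(tempfile.gettempdir()) / "web_dlq.json"
urls = ["https://example.com/a", "https://example.com/slow", "https://example.com/missing"]

pages, dlq = crawl(urls, flaky_fetch, path)
print(len(pages), [item["url"] for item in dlq])  # 1 ['https://example.com/slow']

retried, remaining = retry_from_file(path, flaky_fetch)
print(list(retried), remaining)  # ['https://example.com/slow'] []
```

In this sketch the file is rewritten on every pass, so it always holds only what still needs work; after a clean retry it is an empty list, and the permanently missing page never reappears.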

API pagination

`fetch()` walks paginated APIs automatically. Pass the strategy your API uses as the `pagination` argument:
| Strategy | When | Example |
|---|---|---|
| `page` | Page numbers (`?page=2`) | `{"type": "page", "param": "page", "size_param": "per_page", "size": 100}` |
| `offset` | Row offsets (`?offset=100`) | `{"type": "offset", "param": "offset", "size_param": "limit", "size": 100}` |
| `cursor` | Next-page token | `{"type": "cursor", "param": "cursor", "cursor_path": "meta.next_cursor"}` |
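To make the three strategies concrete, here is a hypothetical helper showing the query parameters each configuration from the table could translate to on successive requests. It is not RagRails' implementation; the names `next_params` and `lookup`, the assumption that page numbers start at 1 and offsets at 0, and the reading of `cursor_path` as a dotted path into the previous JSON response are illustrative choices.

```python
from urllib.parse import urlencode


def lookup(data, dotted_path):
    """Follow a dotted path such as 'meta.next_cursor' into nested dicts."""
    for key in dotted_path.split("."):
        if not isinstance(data, dict) or key not in data:
            return None
        data = data[key]
    return data


def next_params(cfg, page_index, last_response=None):
    """Hypothetical: query parameters for request number `page_index` (0-based).

    Returns None when a cursor API reports no further pages.
    """
    kind = cfg["type"]
    params = {}
    if "size_param" in cfg:
        params[cfg["size_param"]] = cfg["size"]
    if kind == "page":
        params[cfg["param"]] = page_index + 1            # assumes pages start at 1
    elif kind == "offset":
        params[cfg["param"]] = page_index * cfg["size"]  # rows skipped so far
    elif kind == "cursor":
        if page_index == 0:
            return params                                # first request has no cursor
        cursor = lookup(last_response or {}, cfg["cursor_path"])
        if cursor is None:
            return None                                  # no token: last page reached
        params[cfg["param"]] = cursor
    else:
        raise ValueError(f"unknown pagination type: {kind}")
    return params


page_cfg = {"type": "page", "param": "page", "size_param": "per_page", "size": 100}
offset_cfg = {"type": "offset", "param": "offset", "size_param": "limit", "size": 100}
cursor_cfg = {"type": "cursor", "param": "cursor", "cursor_path": "meta.next_cursor"}

print(urlencode(next_params(page_cfg, 1)))    # per_page=100&page=2
print(urlencode(next_params(offset_cfg, 1)))  # limit=100&offset=100
print(urlencode(next_params(cursor_cfg, 1, {"meta": {"next_cursor": "abc"}})))  # cursor=abc
print(next_params(cursor_cfg, 1, {"meta": {"next_cursor": None}}))  # None
```

The second request (`page_index=1`) reproduces the `?page=2` and `?offset=100` examples from the table. Page and offset requests can be computed up front, while a cursor request depends on the previous response, which is why cursor APIs must be walked strictly in order.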
```python
result = rag.fetch(
    url="https://api.example.com/products",
    pagination={"type": "page", "param": "page", "size_param": "per_page", "size": 100},
    max_pages=20,   # always cap as a safety stop
)
```
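Why cap with `max_pages`? A pagination loop stops on its own only when the API signals the end, for example with an empty or short page. An API that never signals the end would otherwise be walked indefinitely. The sketch below illustrates this with page-number pagination against in-memory fake APIs; `fake_api`, `endless_api`, `walk_pages`, and the "empty or short page means done" rule are assumptions for illustration, not RagRails internals.

```python
# Hypothetical sketch of a capped pagination loop; not RagRails internals.


def fake_api(page, per_page, total=250):
    """A well-behaved fake API serving a 250-row table one page at a time."""
    start = (page - 1) * per_page
    return list(range(start, min(start + per_page, total)))


def endless_api(page, per_page):
    """A misbehaving fake API that always returns a full page."""
    return list(range(per_page))


def walk_pages(get_page, per_page, max_pages):
    """Collect items page by page, stopping at the end of data or at max_pages."""
    items, pages_fetched = [], 0
    for page in range(1, max_pages + 1):
        batch = get_page(page, per_page)
        pages_fetched += 1
        if not batch:
            break                  # empty page: no more data
        items.extend(batch)
        if len(batch) < per_page:
            break                  # a short page is the last one
    return items, pages_fetched


items, n = walk_pages(fake_api, per_page=100, max_pages=20)
print(len(items), n)  # 250 3

items, n = walk_pages(endless_api, per_page=100, max_pages=20)
print(len(items), n)  # 2000 20
```

With the well-behaved API, the loop ends after three pages and the cap never comes into play. With the endless one, only `max_pages` stops it.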
Reference: SDK ingestion.