Source content changes: docs get revised, pages get deleted, products get discontinued. If your vector index doesn’t keep up, your chatbot answers from stale data.
Two operations keep it current:
| Task | Method | What it does |
|---|---|---|
| Content changed | edit() | Re-embeds the updated chunk and replaces it by ID |
| Content removed | delete() | Removes chunks by ID |
Both methods target chunks by ID. Assign stable IDs at chunk time so you can update or remove chunks later: derive each ID from the source (e.g. file path + section) rather than generating it randomly, so the same content maps to the same ID on every run.
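One way to derive such IDs is sketched below. The helper `make_chunk_id` is hypothetical and not part of ragrails; it assumes the ID shape used in the examples on this page (file stem, `#`, then a slug of the section title).

```python
import re
from pathlib import PurePosixPath


def make_chunk_id(source_path: str, section: str) -> str:
    """Build a deterministic chunk ID from the file stem and section title.

    Hypothetical helper (not part of ragrails): the same source and section
    always produce the same ID, so a later edit() or delete() can target it.
    """
    stem = PurePosixPath(source_path).stem  # "files/guide.pdf" -> "guide"
    # Lowercase the title and collapse runs of non-alphanumerics into "-".
    slug = re.sub(r"[^a-z0-9]+", "-", section.lower()).strip("-")
    return f"{stem}#{slug}"


print(make_chunk_id("files/guide.pdf", "Refunds"))      # guide#refunds
print(make_chunk_id("files/guide.pdf", "Old Pricing"))  # guide#old-pricing
```

If two files can share a stem (for example `docs/guide.pdf` and `archive/guide.pdf`), include more of the path in the ID to avoid collisions.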
Update changed content
from ragrails import RagRails
rag = RagRails()
rag.edit(
chunks=[
{"id": "guide#refunds", "text": "Refunds now take 3-5 business days.", "source": "files/guide.pdf", "metadata": {"title": "Refunds"}},
],
embedder=rag.embedder(provider="voyage", model="voyage-3"),
vector_db="qdrant",
collection="docs",
url="http://localhost:6333",
)
ragrails edit \
--input-dir files/updated/ \
--vector-db qdrant --collection docs --url http://localhost:6333 \
--provider voyage --model voyage-3
curl -X POST http://127.0.0.1:8000/v1/edit \
-H "Content-Type: application/json" \
-d '{
"chunks": [{"id": "guide#refunds", "text": "Refunds now take 3-5 business days.", "source": "files/guide.pdf", "metadata": {}}],
"provider": "voyage",
"model": "voyage-3",
"vector_db": "qdrant",
"collection": "docs",
"url": "http://localhost:6333"
}'
Remove stale content
rag.delete(
ids=["guide#old-pricing", "blog#discontinued"],
vector_db="qdrant",
collection="docs",
url="http://localhost:6333",
)
ragrails delete \
--id guide#old-pricing \
--id blog#discontinued \
--vector-db qdrant --collection docs --url http://localhost:6333
curl -X POST http://127.0.0.1:8000/v1/delete \
-H "Content-Type: application/json" \
-d '{
"ids": ["guide#old-pricing", "blog#discontinued"],
"vector_db": "qdrant",
"collection": "docs",
"url": "http://localhost:6333"
}'
Re-index a source
To refresh a whole document, re-ingest it. New chunks with the same IDs overwrite the old ones, but chunks whose IDs no longer appear in the new version stay in the index until you delete them.
A simple refresh loop: re-ingest the source, then delete() any IDs that were in the old version but not the new one.
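The deletion half of that loop might look like the sketch below. It assumes you recorded the chunk IDs from the previous ingest yourself; `stale_ids` and `delete_stale` are hypothetical helpers, and only the `delete()` call with its `ids`, `vector_db`, `collection`, and `url` arguments comes from the examples on this page.

```python
def stale_ids(old_ids, new_ids):
    """Return IDs present in the old version but absent from the new one."""
    return sorted(set(old_ids) - set(new_ids))


def delete_stale(rag, old_ids, new_ids, **store):
    """Delete chunks that disappeared from a source after re-ingesting it.

    `rag` is a RagRails client; `store` carries vector_db, collection and
    url, passed through unchanged to delete(). Hypothetical helper.
    """
    removed = stale_ids(old_ids, new_ids)
    if removed:  # skip the call when nothing was dropped
        rag.delete(ids=removed, **store)
    return removed


# IDs recorded at the previous ingest vs. IDs produced by the new one.
old = ["guide#refunds", "guide#old-pricing", "guide#shipping"]
new = ["guide#refunds", "guide#shipping", "guide#returns"]
print(stale_ids(old, new))  # ['guide#old-pricing']
```

Run it after the re-ingest has succeeded, so a failed ingest never leaves the index with the old chunks deleted and no replacements written.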
Full parameters: SDK storing · CLI · REST.