RAG
Monitoring
RAG Pipelines Fail Silently
Salvatan
November 30, 2024
RAG pipelines fail in three ways: bad retrieval, bad generation, or both. The tricky part is that quality usually degrades gradually rather than breaking outright, so users stop trusting the system before you notice anything is wrong.
Why RAG Breaks
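- Index drift: new docs are added but embeddings are not updated, so the index lags behind the corpus
- Query rephrasing: users word the same question differently, and the new wording retrieves different chunks
- Model updates: a model update changes generation style
- Chunking mismatch: the chunking strategy does not match the way queries are actually asked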
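Index drift is the easiest of these to check mechanically. The sketch below is one possible approach, not a prescribed method: it assumes you record a content hash for each document at embedding time (the `indexed_hashes` mapping and both function names are hypothetical) and compares that record against the live corpus.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_index_drift(
    corpus: dict[str, str],
    indexed_hashes: dict[str, str],
) -> dict[str, list[str]]:
    """Compare the live corpus against what the index was built from.

    corpus:         doc_id -> current document text
    indexed_hashes: doc_id -> hash recorded when the doc was embedded
                    (assumed bookkeeping, not part of any specific library)
    """
    # Docs that exist in the corpus but were never embedded.
    missing = [d for d in corpus if d not in indexed_hashes]
    # Docs whose text changed after they were embedded.
    stale = [
        d for d in corpus
        if d in indexed_hashes and content_hash(corpus[d]) != indexed_hashes[d]
    ]
    return {"missing": sorted(missing), "stale": sorted(stale)}
```

Any non-empty `missing` or `stale` list means the index no longer reflects the corpus, which is a cheap signal to collect before retrieval quality visibly drops.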
Metrics That Matter
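- Retrieval precision (% of retrieved chunks actually used in the answer)
- Context relevance (does the retrieved context contain what is needed to answer the query?)
- Answer faithfulness (is the output supported by the context?)
- Token efficiency (is the context bloated with text the answer never uses?)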
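Two of these metrics reduce to simple ratios once you know which chunks an answer drew on. The following is a minimal sketch under two assumptions: the generation step reports the ids of the chunks it used (`used_ids` is hypothetical), and a whitespace split is an acceptable stand-in for a real tokenizer. Context relevance and answer faithfulness need a judgment about meaning, so they are left out here.

```python
def retrieval_precision(retrieved_ids: list[str], used_ids: set[str]) -> float:
    """Fraction of retrieved chunks that the answer actually used."""
    if not retrieved_ids:
        return 0.0
    used = sum(1 for cid in retrieved_ids if cid in used_ids)
    return used / len(retrieved_ids)


def token_efficiency(chunks: dict[str, str], used_ids: set[str]) -> float:
    """Share of context tokens that sit in chunks the answer used.

    chunks: chunk_id -> chunk text placed in the prompt.
    Tokens are approximated by whitespace-separated words (an assumption).
    """
    def count(text: str) -> int:
        return len(text.split())

    total = sum(count(text) for text in chunks.values())
    if total == 0:
        return 0.0
    used = sum(count(text) for cid, text in chunks.items() if cid in used_ids)
    return used / total
```

A low token efficiency alongside a healthy precision suggests a few long, unused chunks are padding the prompt.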
Testing Strategy
Build a test set of queries with known-good retrievals and run it weekly. If precision drops, re-index. If faithfulness drops, tune the generation prompt.
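The weekly check could look roughly like the sketch below. Everything named here is an illustrative assumption: `GoldenQuery`, the `retrieve` callable standing in for your retriever, the choice of precision@k measured against the known-good chunk ids, and the 0.05 tolerance. The faithfulness score is taken as an input because how you measure it depends on your setup.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable


@dataclass
class GoldenQuery:
    query: str
    expected_ids: set[str]  # chunk ids known to be good retrievals


def run_retrieval_suite(
    cases: list[GoldenQuery],
    retrieve: Callable[[str], list[str]],  # query -> ranked chunk ids
    k: int = 5,
) -> float:
    """Mean precision@k: share of the top-k chunks that are known-good."""
    scores = []
    for case in cases:
        top_k = retrieve(case.query)[:k]
        hits = sum(1 for cid in top_k if cid in case.expected_ids)
        scores.append(hits / len(top_k) if top_k else 0.0)
    return mean(scores) if scores else 0.0


def recommend_action(
    precision: float,
    faithfulness: float,
    baseline_precision: float,
    baseline_faithfulness: float,
    tolerance: float = 0.05,  # assumed threshold; tune for your data
) -> list[str]:
    """Map metric drops to the remediation steps described above."""
    actions = []
    if precision < baseline_precision - tolerance:
        actions.append("re-index")
    if faithfulness < baseline_faithfulness - tolerance:
        actions.append("tune generation prompt")
    return actions
```

Storing each week's scores next to the baseline makes a slow decline visible as a trend instead of a surprise.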