Salvatan

RAG pipelines have three failure modes: bad retrieval, bad generation, or both. The tricky part? They often degrade slowly, and users stop trusting the system before you notice.

Why RAG Breaks

Index drift (new docs added, embeddings not updated)
Query rephrasing (the same question, worded differently, retrieves different chunks)
Model updates change generation style
Chunking strategy does not match the queries users actually ask
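
Index drift is the easiest of these to check mechanically. Here is a minimal sketch, assuming you store a content hash for each document at embedding time. The names `content_hash`, `find_stale_docs`, and `indexed_hashes` are hypothetical and not tied to any particular vector store.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def find_stale_docs(docs: dict[str, str],
                    indexed_hashes: dict[str, str]) -> list[str]:
    """Return IDs of documents that are new or have changed since indexing.

    docs:           doc ID -> current text (assumed source of truth)
    indexed_hashes: doc ID -> hash recorded when the doc was last embedded
    """
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if indexed_hashes.get(doc_id) != content_hash(text)
    )
```

Anything this returns needs to be re-embedded. Documents deleted from the source but still present in the index would need a separate check in the other direction.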

Metrics That Matter

Retrieval precision (% of retrieved chunks actually used)
Context relevance (does the retrieved context contain the answer to the query?)
Answer faithfulness (is the output supported by the retrieved context?)
Token efficiency (is context bloated?)
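
Two of these metrics can be computed without a judge model. The sketch below assumes you can tell which chunks the answer used (for example from citations in the output), and it counts whitespace-separated words as a crude stand-in for tokens. Both function names are hypothetical. Context relevance and answer faithfulness usually need a human or model grader and are not covered here.

```python
def retrieval_precision(retrieved_ids: list[str], used_ids: set[str]) -> float:
    """Fraction of retrieved chunks that the answer actually used."""
    if not retrieved_ids:
        return 0.0
    used = sum(1 for chunk_id in retrieved_ids if chunk_id in used_ids)
    return used / len(retrieved_ids)


def token_efficiency(chunks: dict[str, str], used_ids: set[str]) -> float:
    """Share of context tokens that belong to chunks the answer used.

    Tokens are approximated by whitespace-separated words (an assumption;
    swap in your model's tokenizer for real measurements).
    """
    total = sum(len(text.split()) for text in chunks.values())
    if total == 0:
        return 0.0
    used = sum(
        len(text.split())
        for chunk_id, text in chunks.items()
        if chunk_id in used_ids
    )
    return used / total
```

A low token efficiency alongside a reasonable precision suggests a few long, unused chunks are bloating the context.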

Testing Strategy

Build a test set of queries with known good retrievals. Run it weekly. If precision drops, re-index. If faithfulness drops, tune the generation prompt.
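
The weekly run could look roughly like this. It is a sketch under stated assumptions: `retrieve` stands in for your own retrieval function, faithfulness scores come from whatever grader you already use, and the 0.05 tolerance is an arbitrary illustrative threshold, not a recommendation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RetrievalCase:
    query: str
    expected_ids: set[str]  # chunk IDs known to be good for this query


def mean_precision(cases: list[RetrievalCase],
                   retrieve: Callable[[str], list[str]]) -> float:
    """Average, over the test set, of the fraction of retrieved chunks
    that appear in the known-good set."""
    scores = []
    for case in cases:
        retrieved = retrieve(case.query)
        if not retrieved:
            scores.append(0.0)
            continue
        hits = sum(1 for cid in retrieved if cid in case.expected_ids)
        scores.append(hits / len(retrieved))
    return sum(scores) / len(scores) if scores else 0.0


def recommend(precision: float, faithfulness: float,
              baseline_precision: float, baseline_faithfulness: float,
              tolerance: float = 0.05) -> list[str]:
    """Map metric drops beyond the tolerance to the actions named above."""
    actions = []
    if precision < baseline_precision - tolerance:
        actions.append("re-index")
    if faithfulness < baseline_faithfulness - tolerance:
        actions.append("tune generation prompt")
    return actions
```

Storing each week's scores gives you the baseline for the next run, which is what makes slow degradation visible.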

RAG Pipelines Fail Silently
