Observability
DevOps
The LLM Observability Stack
Salvatan
November 8, 2024
7 min read
Traditional observability (logs, metrics, traces) is necessary but insufficient for LLM systems. You need new primitives.
What to Track
- Prompt version per request
- Token usage and cost
- Latency (LLM call + tool use)
- Output quality scores (sampled)
- User feedback (thumbs up/down)
- Refusal rate
- Tool use success rate
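All of this fits in one structured record per request. Here is a minimal sketch; every name (LLMRequestRecord, emit, the example version string) is illustrative rather than any particular library's API, and the quality score and user feedback fields start empty and get attached asynchronously:

```python
import json
import time
from dataclasses import dataclass, field, asdict
from typing import Optional

@dataclass
class LLMRequestRecord:
    request_id: str
    prompt_version: str            # e.g. "support-triage@v14"
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float
    llm_latency_ms: float          # time inside the model call
    tool_latency_ms: float         # time inside tool calls
    refused: bool                  # model declined to answer
    tool_calls: int
    tool_failures: int
    quality_score: Optional[float] = None  # filled in later for sampled requests
    user_feedback: Optional[int] = None    # +1 / -1, arrives asynchronously
    timestamp: float = field(default_factory=time.time)

def emit(record: LLMRequestRecord) -> None:
    # One JSON line per request; ship it through whatever log pipeline
    # you already run. Everything below is a query over these lines.
    print(json.dumps(asdict(record)))
```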
Challenges
- High cardinality (every prompt version adds a new dimension to every metric)
- Sampling required (scoring every output is too slow and too expensive)
- Delayed feedback (user reactions arrive long after the request is logged, so records must be joinable after the fact)
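The last two challenges pair naturally: sample deterministically by hashing the request ID instead of rolling a die, and the scored subset is stable across workers and reproducible after the fact, so late-arriving feedback and judge scores join back cleanly by ID. A sketch (the 1% rate is an assumption, matching the recommendation below):

```python
import hashlib

SAMPLE_RATE = 0.01  # assumed; score roughly 1% of outputs

def should_score(request_id: str, rate: float = SAMPLE_RATE) -> bool:
    # Hash the ID to a stable value in [0, 1). No RNG state to coordinate
    # across workers, and the same request is always in (or out of) the sample.
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < rate
```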
Stack Recommendation
- Trace: Link request -> prompt version -> model -> output
- Sample: Score 1% of outputs with LLM-as-judge
- Aggregate: Daily rollups of cost, latency, quality by prompt version
- Alert: When cost or latency crosses its threshold, or quality drops below its floor
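A sketch of the aggregate and alert steps, assuming records shaped like the LLMRequestRecord above; the thresholds are placeholders you would tune per prompt:

```python
from collections import defaultdict
from statistics import mean

# Placeholder thresholds; real values depend on the prompt and the model.
THRESHOLDS = {"cost_usd": 0.05, "llm_latency_ms": 3000, "quality_floor": 0.7}

def daily_rollup(records):
    # Group by prompt version, then average each tracked metric.
    groups = defaultdict(list)
    for r in records:
        groups[r.prompt_version].append(r)
    rollup = {}
    for version, rs in groups.items():
        scored = [r.quality_score for r in rs if r.quality_score is not None]
        rollup[version] = {
            "cost_usd": mean(r.cost_usd for r in rs),
            "llm_latency_ms": mean(r.llm_latency_ms for r in rs),
            "quality_score": mean(scored) if scored else None,
            "refusal_rate": sum(r.refused for r in rs) / len(rs),
        }
    return rollup

def check_alerts(rollup):
    # Cost and latency alert high; quality alerts low.
    for version, m in rollup.items():
        if m["cost_usd"] > THRESHOLDS["cost_usd"]:
            yield f"{version}: mean cost above threshold"
        if m["llm_latency_ms"] > THRESHOLDS["llm_latency_ms"]:
            yield f"{version}: mean latency above threshold"
        if m["quality_score"] is not None and m["quality_score"] < THRESHOLDS["quality_floor"]:
            yield f"{version}: quality below floor"
```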
PromptOps provides the trace-to-prompt-version linkage out of the box.