Observability
DevOps

The LLM Observability Stack

Salvatan
November 8, 2024
7 min read

Traditional observability (logs, metrics, traces) is necessary but insufficient for LLM systems. You need new primitives.

What to Track

  • Prompt version per request
  • Token usage and cost
  • Latency (LLM call + tool use)
  • Output quality scores (sampled)
  • User feedback (thumbs up/down)
  • Refusal rate (how often the model declines to answer)
  • Tool use success rate
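
Concretely, a per-request record might look like the sketch below. This is an assumption about shape, not a prescription: the field names (request_id, prompt_version, quality_score, and so on) are illustrative, and quality_score and user_feedback start empty because they arrive after the request completes.

```python
import json
import uuid
from dataclasses import dataclass, asdict

@dataclass
class LLMRequestRecord:
    """One structured log line per LLM request (field names are illustrative)."""
    request_id: str
    prompt_version: str        # e.g. "support-triage@v14"
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float
    llm_latency_ms: float      # time spent in the model call
    tool_latency_ms: float     # time spent in tool calls
    tool_success: bool         # did tool calls complete without error
    refused: bool              # model declined to answer
    quality_score: float | None = None  # filled in later for sampled requests
    user_feedback: int | None = None    # +1 / -1, arrives asynchronously

record = LLMRequestRecord(
    request_id=str(uuid.uuid4()),
    prompt_version="support-triage@v14",
    model="gpt-4o-mini",
    input_tokens=812,
    output_tokens=240,
    cost_usd=0.0009,
    llm_latency_ms=1430.0,
    tool_latency_ms=210.0,
    tool_success=True,
    refused=False,
)
print(json.dumps(asdict(record)))  # ship this to your log/trace backend
```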

Challenges

  • High cardinality (every prompt version is a new dimension)
  • Sampling required (scoring every output with a judge is too slow and expensive)
  • Delayed feedback (thumbs up/down often arrives minutes or hours after the trace is written)
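
Delayed feedback is usually handled by keying feedback events on the original request ID and joining them onto the stored record whenever they arrive. A minimal sketch, assuming records live in some keyed store (a plain dict stands in here):

```python
def attach_feedback(store: dict, request_id: str, feedback: int) -> None:
    """Join a late-arriving thumbs up/down (+1/-1) onto the stored request record.

    `store` stands in for whatever backend holds the records (a database,
    a log index); a plain dict keeps the sketch self-contained.
    """
    record = store.get(request_id)
    if record is None:
        # Feedback can outlive trace retention; drop it or dead-letter it.
        return
    record["user_feedback"] = feedback

store = {"req-123": {"prompt_version": "support-triage@v14", "user_feedback": None}}
attach_feedback(store, "req-123", +1)
print(store["req-123"]["user_feedback"])  # 1
```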

Stack Recommendation

  • Trace: Link request -> prompt version -> model -> output
  • Sample: Score 1% of outputs with LLM-as-judge
  • Aggregate: Daily rollups of cost, latency, quality by prompt version
  • Alert: Fire when cost, latency, or quality crosses a threshold for a given prompt version
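
The sampling step is worth pinning down: hashing the request ID makes the 1% decision deterministic, so a given request is either always in the sample or never, regardless of which process evaluates it. In the sketch below, judge_score is a hypothetical stand-in for your LLM-as-judge call, not a real API:

```python
import hashlib

def in_sample(request_id: str, rate: float = 0.01) -> bool:
    """Deterministically sample `rate` of requests by hashing the request ID."""
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < rate

def judge_score(prompt: str, output: str) -> float:
    """Hypothetical LLM-as-judge call: send a grading prompt to a strong
    model and parse out a 0-1 score. Stubbed here to keep the sketch runnable."""
    return 1.0

def maybe_score(record: dict) -> None:
    """Score only the sampled slice of traffic."""
    if in_sample(record["request_id"]):
        record["quality_score"] = judge_score(record["prompt"], record["output"])

maybe_score({"request_id": "req-123", "prompt": "...", "output": "...", "quality_score": None})
```

Because the hash is stable, re-running the judge later scores the same requests, which keeps daily rollups comparable across prompt versions.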

PromptOps provides the trace-to-prompt-version linkage out of the box.