# From Prototype to Production LLM
Demos are easy. Production is hard. Here is what changes when you scale an LLM feature.
## Prototype vs Production
**Prototype:**

- Hardcoded prompts
- Manual testing
- No version control
- One model (usually GPT-4)
- No cost tracking
- No fallbacks

**Production:**

- Versioned prompts with rollback
- Automated eval suite
- CI integration
- Multi-model support + failover
- Per-request cost + latency tracking
- Error handling and retries
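The production-side requirements of failover, retries, and error handling can be sketched as a small wrapper. This is a minimal illustration, not a specific library's API: the provider names and callables (`flaky_primary`, `cheap_fallback`) are hypothetical stand-ins for real client wrappers.

```python
import time

def call_with_failover(prompt, providers, max_retries=2, backoff_s=1.0):
    """Try each provider in order; retry transient failures with backoff.

    `providers` is a list of (name, callable) pairs, where each callable
    is a hypothetical client wrapper taking a prompt and returning text.
    """
    last_error = None
    for name, call in providers:
        for attempt in range(max_retries):
            try:
                return name, call(prompt)
            except Exception as exc:  # in practice, catch provider-specific errors
                last_error = exc
                time.sleep(backoff_s * (2 ** attempt))  # exponential backoff
    raise RuntimeError(f"all providers failed: {last_error}")

# Usage: the primary always times out, so the fallback answers.
def flaky_primary(prompt):
    raise TimeoutError("upstream timeout")

def cheap_fallback(prompt):
    return "fallback answer"

name, text = call_with_failover(
    "hi", [("primary", flaky_primary), ("fallback", cheap_fallback)], backoff_s=0.0
)
print(name, text)  # fallback fallback answer
```

A real version would also record which provider and prompt version served each request, which is what makes the cost and latency tracking above per-request rather than aggregate.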
## The Checklist
1. Prompt versioning system
2. Golden test set (100+ examples)
3. Eval harness with pass/fail thresholds
4. CI pipeline (block deploys on eval failures)
5. Observability (trace requests to prompt versions)
6. Cost alerting (daily spend limits)
7. Rate limiting and quotas
8. Fallback model or cached responses
9. Security review (injection defenses)
10. Rollback plan
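Items 2-4 fit together: a golden set feeds an eval harness, and CI blocks the deploy when the score drops below a threshold. A minimal sketch, assuming an exact-match grader (per-case graders like regex checks or LLM-as-judge slot into the same shape):

```python
def run_evals(model_fn, golden_set, threshold=0.9):
    """Score model_fn against a golden set; fail the run below threshold.

    golden_set: list of (input, expected) pairs. Returns (score, passed)
    so a CI step can exit nonzero when passed is False.
    """
    hits = sum(1 for inp, expected in golden_set if model_fn(inp) == expected)
    score = hits / len(golden_set)
    return score, score >= threshold

# Usage: a stub "model" that uppercases its input, with one failing case.
golden = [("hi", "HI"), ("ok", "OK"), ("no", "NO"), ("yes", "NOPE")]
score, ok = run_evals(str.upper, golden, threshold=0.9)
print(f"score={score:.2f} pass={ok}")  # score=0.75 pass=False — CI blocks the deploy
```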
Most teams skip steps 1-4 and regret it when they need to debug a production issue.
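Step 1 is the cheapest of the four to start. The shape of a prompt versioning system, sketched here as a hypothetical in-memory registry (a real one would persist to git or a database), is content-addressed versions plus an ordered deploy history that makes rollback trivial:

```python
import hashlib

class PromptRegistry:
    """Minimal prompt registry: content-addressed versions with rollback."""

    def __init__(self):
        self.versions = {}  # version id -> prompt text
        self.history = []   # ordered list of deployed version ids

    def register(self, prompt_text):
        # Hashing the text gives a stable id: same prompt, same version.
        vid = hashlib.sha256(prompt_text.encode()).hexdigest()[:12]
        self.versions[vid] = prompt_text
        return vid

    def deploy(self, vid):
        self.history.append(vid)

    def current(self):
        return self.history[-1]

    def rollback(self):
        # Drop the latest deploy and fall back to the previous one.
        self.history.pop()
        return self.history[-1]

# Usage: deploy v2, find a regression, roll back to v1.
reg = PromptRegistry()
v1 = reg.register("Summarize the ticket in one sentence.")
v2 = reg.register("Summarize the ticket in one sentence. Be terse.")
reg.deploy(v1)
reg.deploy(v2)
assert reg.rollback() == v1
```

Tagging every request trace with `current()` is what lets you trace a production issue back to the exact prompt that caused it (step 5).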
## Timeline
If you are starting from scratch:

- Weeks 1-2: Implement versioning + eval harness
- Week 3: CI integration + observability
- Week 4: Security hardening + load testing
PromptOps compresses this to days by providing the infrastructure.