Blog
Thoughts on LLM engineering, evaluation practices, and building reliable AI systems.
Why Prompt Versioning Matters
You version your code. Why not your prompts? A case for treating LLM workflows as production systems.
Building Eval Harnesses That Matter
Most LLM evals are vanity metrics. Here is how to build tests that actually prevent regressions.
RAG Pipelines Fail Silently
Retrieval quality degrades over time. How to detect and fix it before users complain.
Why Solana for ML Infrastructure
On-chain protocol fees, transparent treasuries, and fast settlement for usage-based pricing.
Defending Against Prompt Injection
Practical strategies for hardening LLM systems against adversarial inputs.
The LLM Observability Stack
What you need to monitor when LLMs are in production: beyond logs and traces.
Berlin AI Scene in 2024
Notes from the ground: what early-stage European AI builders are working on.
From Prototype to Production LLM
The missing checklist for taking your LLM feature from demo to scalable product.