Blog
Thoughts on LLM engineering, evaluation practices, and building reliable AI systems.
Why Prompt Versioning Matters
You version your code. Why not your prompts? A case for treating LLM workflows as production systems.
Building Eval Harnesses That Matter
Most LLM evals are vanity metrics. Here is how to build tests that actually prevent regressions.
RAG Pipelines Fail Silently
Retrieval quality degrades over time. How to detect and fix it before users complain.
Why Solana for ML Infrastructure
On-chain protocol fees, transparent treasuries, and fast settlement for usage-based pricing.
Defending Against Prompt Injection
Practical strategies for hardening LLM systems against adversarial inputs.
The LLM Observability Stack
What you need to monitor when LLMs are in production: beyond logs and traces.
Berlin AI Scene in 2024
Notes from the ground: what early-stage European AI builders are working on.
From Prototype to Production LLM
The missing checklist for taking your LLM feature from demo to scalable product.