Research
Original thinking from building AI that runs in production.
Grounding agents without hallucination drift
A practical retrieval architecture that keeps agents factual over long sessions.
The real cost of LLM eval at scale
What we learned from running 2M evaluations across production systems.
Human-in-the-loop that actually scales
Designing review workflows that don't become the bottleneck.