Anatomy of an Eval Harness
Last week I got the LM Evaluation Harness running end-to-end. This week I wanted to understand how it actually works. What happens between feeding in...
Read more →
Working on AI agents + evals
Building things that change the world
I'm passionate about exploring the frontiers of AI technology, with a focus on developing intelligent agents and evaluation systems. Currently working on OpenAlita and other projects that push the boundaries of what's possible.
Building an AI agent framework for intelligent systems and autonomous decision-making. Inspired by Alita - huge thanks to them.
Exploring the frontiers of AI agent capabilities, from reasoning to tool use and multi-modal interactions.
Following and exploring cutting-edge open-source projects in the AI agent space. Some projects I'm particularly excited about:
Built a platform to help researchers and developers discover evaluation metrics for their AI models and streamline the evaluation process. Project discontinued (was busy climbing mountains in Europe).
I’ve been working on LLMs and AI agents for a while, but I’d never actually run a proper LM eval from start to finish, so...
Read more →
“How Apple Fell Behind in the AI Race”. Critics have been quick to point out that Apple Intelligence isn’t up to standard. But what exactly...
Read more →