Building an AI Agent Framework from Scratch
I didn’t set out to build an AI agent framework. I set out to solve a problem: I had too many things to do and not enough of me to do them. What started as a simple wrapper around Claude’s API turned into three separate projects, a design pattern I now use for everything, and a fundamentally different way of thinking about how AI should work.
The Problem: One Brain, Too Many Tasks
By late 2024, I was using AI for everything — drafting emails, writing code, researching grant applications, generating content. But each task was a separate conversation, a separate context window, a separate thread of thought. I’d lose track of what I’d told one session versus another. I’d re-explain the same project context over and over. The AI was smart, but it had no memory and no awareness of what else I was working on.
So I built OpenClaw.
OpenClaw: The First Attempt
OpenClaw was a Node.js application that sat between me and Claude’s API. The idea was simple: instead of opening a new chat every time, I’d have a persistent agent that understood my projects, my preferences, and my current priorities. It would maintain context across sessions, remember what it had done, and pick up where it left off.
The first version was ugly. It was a terminal application with no UI to speak of, and it crashed constantly. But it worked well enough to prove the concept. I could tell it “continue working on the grant application” and it would actually know which grant application, what stage it was at, and what was left to do.
What I learned from OpenClaw was that the hard part of AI agents isn’t the AI. It’s the state management. Keeping track of what the agent knows, what it’s done, what it’s supposed to do next, and how to recover when something goes wrong — that’s where all the complexity lives.
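To make that concrete, here is a minimal sketch of the kind of session state an agent has to persist between runs. All the names and fields are illustrative, not OpenClaw's actual schema:

```typescript
// Illustrative task state: enough for "continue working on the grant
// application" to resolve to a specific task at a specific stage.
interface TaskState {
  id: string;
  goal: string;        // e.g. "Grant application for the arts program"
  stage: string;       // where the work currently stands
  remaining: string[]; // steps still left to do
  updatedAt: number;   // lets us prefer the most recently touched task
}

class SessionStore {
  private tasks = new Map<string, TaskState>();

  // Record progress so the next session resumes instead of re-asking.
  upsert(task: TaskState): void {
    this.tasks.set(task.id, { ...task, updatedAt: Date.now() });
  }

  // A vague phrase resolves to the matching task, most recent first.
  resume(phrase: string): TaskState | undefined {
    return [...this.tasks.values()]
      .filter(t => t.goal.toLowerCase().includes(phrase.toLowerCase()))
      .sort((a, b) => b.updatedAt - a.updatedAt)[0];
  }

  // Serialize for writing to disk between sessions.
  toJSON(): string {
    return JSON.stringify([...this.tasks.values()]);
  }
}
```

The interesting part is not the data structure but the discipline: every action the agent takes has to flow through something like `upsert`, or the next session starts from zero.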
Claude Coworker: The Electron Era
OpenClaw worked, but it was command-line only, which meant I was the only person who would ever use it. I wanted something I could show to other people — something with a real interface, real file management, and the ability to work on multiple projects simultaneously.
Claude Coworker was my answer. Built with Electron, it was a desktop application that gave the AI agent a proper workspace. You could create projects, attach files, set goals, and let the agent work through them. It had a chat interface for real-time interaction and a task board for tracking what the agent was working on.
The Electron choice was deliberate. I needed access to the local filesystem — the agent had to be able to read and write files, run scripts, and interact with the user’s actual development environment. A web app wouldn’t cut it. Electron gave me a real desktop app with full system access, wrapped in web technologies I already knew.
Claude Coworker taught me something important about AI agent design: the agent needs guardrails, and those guardrails need to be architectural, not just prompt-based. Telling an AI “don’t delete important files” is not the same as building a system where the AI literally cannot delete files outside its sandbox. I learned this the hard way when an early version of the agent helpfully “cleaned up” a directory it thought was temporary. It wasn’t.
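An architectural guardrail can be as simple as a path check that runs before any destructive operation. This is a sketch of the idea, not the app's real code; the function names are made up for illustration:

```typescript
import * as path from "path";

// Enforced in code, not in the prompt: a target is only touchable if it
// resolves to somewhere inside the sandbox root.
function isInsideSandbox(sandboxRoot: string, target: string): boolean {
  const root = path.resolve(sandboxRoot);
  const resolved = path.resolve(root, target);
  // path.relative produces a ".."-prefixed path when target escapes root
  const rel = path.relative(root, resolved);
  return !rel.startsWith("..") && !path.isAbsolute(rel);
}

function guardedDelete(sandboxRoot: string, target: string): void {
  if (!isInsideSandbox(sandboxRoot, target)) {
    throw new Error(`refusing to touch path outside sandbox: ${target}`);
  }
  // ...the actual fs.rm call would go here...
}
```

The point is that the check throws regardless of what the model was told or what it decided; the prompt can be wrong, but the filesystem boundary holds.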
The Hub-and-Spoke Pattern
The real breakthrough came when I stopped thinking about a single agent and started thinking about a system of agents. The hub-and-spoke pattern emerged from a practical need: some tasks require deep focus (writing a long document, analyzing a dataset), while others require broad coordination (managing a project timeline, triaging incoming requests). One agent can’t do both well.
In the hub-and-spoke model, a central “governor” agent manages the overall workflow. It receives requests, breaks them into subtasks, and dispatches them to specialized “spoke” agents. Each spoke agent has a narrow focus — one might be great at writing, another at code review, another at research. The governor decides who does what, monitors progress, and assembles the final output.
The governor pattern was inspired by how I actually manage projects in my day job. A good program manager doesn’t do all the work themselves. They understand the work well enough to assign it to the right people, check the output, and course-correct when things go sideways. That’s exactly what the governor agent does.
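The dispatch loop at the heart of the pattern can be sketched in a few lines. This is a toy version with illustrative names, not the framework's actual implementation:

```typescript
// A spoke advertises what it can do and exposes a single entry point.
type Spoke = { name: string; skills: string[]; run: (task: string) => string };

class Governor {
  constructor(private spokes: Spoke[]) {}

  // Pick the first spoke that claims the needed skill.
  private assign(skill: string): Spoke {
    const spoke = this.spokes.find(s => s.skills.includes(skill));
    if (!spoke) throw new Error(`no spoke handles "${skill}"`);
    return spoke;
  }

  // Break a request into (skill, task) pairs, dispatch each to its
  // specialist, then assemble the outputs into one result.
  handle(subtasks: { skill: string; task: string }[]): string {
    return subtasks
      .map(({ skill, task }) => this.assign(skill).run(task))
      .join("\n");
  }
}
```

In the real system each `run` is an API call to a separately prompted agent, but the shape is the same: the governor owns routing and assembly, never the work itself.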
I implemented this using the Model Context Protocol (MCP), which gave each spoke agent access to specific tools and resources without giving them access to everything. The research agent could search the web but couldn’t modify files. The coding agent could write code but couldn’t send emails. The governor could see everything but only acted through its spokes.
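The scoping idea is easy to show in miniature. This is a simplification in the spirit of MCP's per-server tool lists, not the protocol itself, and the tool names are stand-ins:

```typescript
type Tool = (input: string) => string;

// The full toolbox. Only the governor's configuration ever sees this.
const allTools: Record<string, Tool> = {
  webSearch: q => `results for ${q}`, // stand-in implementations
  writeFile: p => `wrote ${p}`,
  sendEmail: a => `emailed ${a}`,
};

// Grant a spoke a named subset; any tool outside the grant simply
// does not exist from that agent's point of view.
function scopeTools(granted: string[]): Record<string, Tool> {
  return Object.fromEntries(
    Object.entries(allTools).filter(([name]) => granted.includes(name))
  );
}

// The research agent can search but has no file or email tools at all.
const researchAgentTools = scopeTools(["webSearch"]);
```

Like the sandbox check, this works because it is structural: a spoke cannot be talked into calling a tool it was never handed.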
What I Got Wrong
Plenty. The first version of the governor was too controlling — it would micromanage the spoke agents, checking in after every step, which slowed everything down and burned through API tokens. I had to learn to trust the spokes and only intervene when something actually went wrong.
I also underestimated the importance of error handling. When a spoke agent fails (and they do fail), the governor needs to decide: retry? reassign? escalate to the human? My early versions would just crash. Later versions got smarter about graceful degradation — if the code agent couldn’t solve a bug, the governor would gather the relevant context and present it to me as a clear question rather than a stack trace.
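The retry-then-escalate policy looks roughly like this. It is a sketch under my own naming, not the framework's real error handler:

```typescript
// The governor either finishes the task or hands the human a clear
// question with gathered context, never a raw stack trace.
type Outcome =
  | { kind: "done"; result: string }
  | { kind: "escalate"; question: string };

function runWithRecovery(
  attempt: () => string, // one spoke-agent attempt; throws on failure
  maxRetries: number,
  context: string        // what the governor gathered for the human
): Outcome {
  let lastError = "";
  for (let i = 0; i <= maxRetries; i++) {
    try {
      return { kind: "done", result: attempt() };
    } catch (e) {
      lastError = e instanceof Error ? e.message : String(e);
    }
  }
  // Graceful degradation: escalate with context instead of crashing.
  return {
    kind: "escalate",
    question: `Stuck on: ${context} (last error: ${lastError}). How should I proceed?`,
  };
}
```

A real version would also distinguish retryable failures (rate limits, timeouts) from ones where retrying is pointless, but the shape of the decision is the same.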
The biggest mistake was trying to make the system too autonomous too fast. The most useful version of the framework isn’t one that works completely on its own — it’s one that works with me, handling the tedious parts while keeping me in the loop for the decisions that matter.
Where It Is Now
The framework has evolved into something I use every day. It manages my project files, drafts communications, reviews code, and keeps track of dozens of parallel workstreams. It’s not perfect — it still gets confused sometimes, still needs correction, still occasionally tries to be too clever. But it’s genuinely useful in a way that a simple chatbot isn’t.
The tech stack eventually settled: the latest version is built on Next.js, with MCP for tool integration and a SQLite database for state management. It runs locally, processes everything through Claude’s API, and keeps all data on my machine. No cloud dependencies beyond the API itself.
Building this taught me more about software architecture than any course or book. When your application is a system of autonomous agents that need to coordinate, communicate, and recover from failures, you learn very quickly what good design looks like — because bad design fails loudly and immediately.
More than anything, it taught me that AI isn’t magic. It’s a tool, and like any tool, its value depends entirely on the system you build around it.