Building an Agent Runtime From Scratch

Why existing orchestration frameworks fell short and what we built instead. A minimal runtime for maximum control.

February 3, 2026 · 11 min

We tried LangChain. We tried CrewAI. We tried AutoGen. Each solved some problems and created others. Eventually, we built our own runtime. Here's why and how.

What We Needed

  1. Execute a DAG of agent tasks with dependency resolution
  2. Provide isolation between tasks
  3. Support checkpointing and rollback at any node
  4. Stream intermediate results to an observability layer
  5. Handle timeouts, retries, and circuit breaking per node

No existing framework did all five well.
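
Requirement 5 is the easiest to make concrete. Below is a minimal sketch of per-node timeout and retry handling. `NodePolicy`, `withTimeout`, and `runWithPolicy` are hypothetical names chosen for illustration, not the runtime's actual API, and circuit breaking is left out to keep it short.

```typescript
// Hypothetical per-node policy; field names are illustrative only.
interface NodePolicy {
  timeoutMs: number;
  maxAttempts: number;
}

// Reject if `work` does not settle within `ms` milliseconds.
// Note: this stops waiting, it does not cancel the underlying work.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Run `attempt` up to policy.maxAttempts times, applying the timeout to each try.
async function runWithPolicy<T>(attempt: () => Promise<T>, policy: NodePolicy): Promise<T> {
  let lastError: unknown;
  for (let i = 1; i <= policy.maxAttempts; i++) {
    try {
      return await withTimeout(attempt(), policy.timeoutMs);
    } catch (error) {
      lastError = error; // remember the failure and try again
    }
  }
  throw lastError;
}
```

A real runtime would likely add backoff between attempts and pass a cancellation signal into the task, since a timed-out promise keeps running in the background here.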

Why Existing Frameworks Fall Short

LangChain

Good for prototyping. Too opinionated about LLM interaction, too flexible about everything else. Debugging a 10-step chain is painful.

CrewAI

Great concept: agents with roles. But the "crew" metaphor breaks at scale. 50 agents need an orchestrator, not a crew.

AutoGen

Powerful multi-agent conversations. But conversations aren't always the right abstraction. Sometimes you need directed execution, not dialogue.

Our Runtime: Core Concepts

Three primitives:

interface Task<TInput, TOutput> {
  id: string;
  spec: AgentSpec;
  execute: (input: TInput, context: RuntimeContext) => Promise<TOutput>;
  validate: (output: TOutput) => ValidationResult;
  recover: (error: TaskError) => RecoveryAction;
}

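interface Pipeline {
  // Task has two required type parameters, so a bare `Task[]` does not compile.
  tasks: Task<any, any>[];
  edges: [string, string][]; // [upstream task id, downstream task id]
  checkpoints: CheckpointStrategy;
}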

interface Runtime {
  execute: (pipeline: Pipeline) => AsyncIterable<PipelineEvent>;
  pause: (pipelineId: string) => void;
  resume: (pipelineId: string) => void;
  rollback: (pipelineId: string, toCheckpoint: string) => void;
}
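
To show how the `edges` list can drive execution order, here is a minimal sketch of dependency resolution using Kahn's algorithm. It assumes each edge is an `[upstream, downstream]` pair of unique task ids; `resolveWaves` is a hypothetical helper, not part of the `Runtime` interface above.

```typescript
// Kahn's algorithm over task ids and [from, to] edges.
// Returns "waves": every task in a wave has all of its dependencies in earlier waves,
// so tasks within one wave could run concurrently.
function resolveWaves(taskIds: string[], edges: [string, string][]): string[][] {
  const pending = new Map<string, number>();      // task id -> unmet dependency count
  const dependents = new Map<string, string[]>(); // task id -> tasks waiting on it
  for (const id of taskIds) {
    pending.set(id, 0);
    dependents.set(id, []);
  }
  for (const [from, to] of edges) {
    if (!pending.has(from) || !pending.has(to)) {
      throw new Error(`edge references unknown task: ${from} -> ${to}`);
    }
    pending.set(to, pending.get(to)! + 1);
    dependents.get(from)!.push(to);
  }

  const waves: string[][] = [];
  let ready = taskIds.filter((id) => pending.get(id) === 0);
  let resolved = 0;
  while (ready.length > 0) {
    waves.push(ready);
    resolved += ready.length;
    const next: string[] = [];
    for (const id of ready) {
      for (const child of dependents.get(id)!) {
        const remaining = pending.get(child)! - 1;
        pending.set(child, remaining);
        if (remaining === 0) next.push(child);
      }
    }
    ready = next;
  }
  // Any task never reached sits on a cycle, so the graph is not a DAG.
  if (resolved !== taskIds.length) throw new Error("pipeline contains a cycle");
  return waves;
}
```

Rejecting cycles up front means a malformed pipeline fails before any agent runs, which is cheaper than discovering a deadlock mid-execution.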

Events, Not Returns

Every task emits events rather than returning values. That buys us three things:

  • Observability sees everything in real-time
  • Downstream tasks can start before upstream finishes
  • Failures are just another event type, so there is no try/catch spaghetti
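
The points above can be sketched with an async generator. The article does not define the fields of `PipelineEvent`, so the event shape below is an assumption, and `runAsEvents` and `collectKinds` are hypothetical helpers.

```typescript
// Assumed event shape for illustration; the real PipelineEvent may differ.
type PipelineEvent =
  | { kind: "started"; taskId: string }
  | { kind: "partial"; taskId: string; chunk: string }
  | { kind: "completed"; taskId: string }
  | { kind: "failed"; taskId: string; error: string };

// Wrap a unit of work so partial results and failures are all reported as events.
async function* runAsEvents(
  taskId: string,
  work: () => AsyncIterable<string>,
): AsyncGenerator<PipelineEvent> {
  yield { kind: "started", taskId };
  try {
    for await (const chunk of work()) {
      // Consumers see each chunk as soon as it is produced.
      yield { kind: "partial", taskId, chunk };
    }
    yield { kind: "completed", taskId };
  } catch (error) {
    // The failure becomes one more event instead of an exception in the consumer.
    const message = error instanceof Error ? error.message : String(error);
    yield { kind: "failed", taskId, error: message };
  }
}

// Example consumer: an observability layer would log or forward each event instead.
async function collectKinds(events: AsyncIterable<PipelineEvent>): Promise<string[]> {
  const kinds: string[] = [];
  for await (const event of events) kinds.push(event.kind);
  return kinds;
}
```

Because the consumer only ever iterates events, a downstream task can react to a `partial` event without waiting for `completed`, and error handling is a branch on `kind` rather than a nested `try`/`catch`.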

Context Is Explicit

Each task receives exactly the context it needs. No global state.

// Bad: implicit shared context
agent.execute({ ...globalContext, task: "refactor" });

// Good: explicit scoped context
pipeline.addEdge("analyze", "refactor");
// refactor only sees analyze's output
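
One way such scoping could be enforced is to build each task's input from its direct upstream outputs only. This is a minimal sketch under that assumption; `scopedInput` and the `outputs` map are hypothetical and not part of the interfaces shown earlier.

```typescript
// Build a task's input from the outputs of its direct upstream tasks only.
// Anything produced by unrelated tasks is never copied in, so it cannot leak.
function scopedInput(
  taskId: string,
  edges: [string, string][],
  outputs: Map<string, unknown>,
): Readonly<Record<string, unknown>> {
  const input: Record<string, unknown> = {};
  for (const [from, to] of edges) {
    if (to !== taskId) continue;
    if (!outputs.has(from)) {
      throw new Error(`upstream task ${from} has not produced output`);
    }
    input[from] = outputs.get(from);
  }
  // Freeze the top-level object so a task cannot add or replace entries on its own input.
  return Object.freeze(input);
}
```

With edges `analyze -> refactor` and `lint -> refactor`, the `refactor` task would receive an object with exactly the keys `analyze` and `lint`, regardless of what else is stored in `outputs`.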

What We Learned

1. Isolation is non-negotiable. Shared context makes debugging impossible.

2. The runtime should be boring. Intelligence lives in specs and agents. The runtime is plumbing.

3. Streaming beats batch. When a pipeline takes 30 minutes, you need minute-5 visibility, not a minute-30 summary.

"The best infrastructure is the one that engineers forget exists."

Our runtime is 2,400 lines of TypeScript. No magic. No AI in the orchestration layer. Just dependency resolution, event streaming, and checkpoint management.

CatoCut
Agent-First Engineering