CatoCut

We tried LangChain. We tried CrewAI. We tried AutoGen. Each solved some problems and created others. Eventually, we built our own runtime. Here's why and how.

What We Needed

Execute a DAG of agent tasks with dependency resolution
Provide isolation between tasks
Support checkpointing and rollback at any node
Stream intermediate results to an observability layer
Handle timeouts, retries, and circuit breaking per-node

No existing framework did all five well.
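
To make the fifth requirement concrete, here is a minimal sketch of per-node timeouts and retries. The names `NodePolicy`, `withTimeout`, and `runWithPolicy` are hypothetical and are not taken from our runtime; circuit breaking is left out to keep the example short.

```typescript
// Hypothetical per-node policy; field names are assumptions for illustration.
interface NodePolicy {
  timeoutMs: number;   // maximum time a single attempt may take
  maxAttempts: number; // total attempts, including the first (must be >= 1)
}

// Reject if `work` does not settle within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Run `attempt` up to `policy.maxAttempts` times, each under its own timeout.
async function runWithPolicy<T>(
  attempt: () => Promise<T>,
  policy: NodePolicy,
): Promise<T> {
  let lastError: unknown;
  for (let i = 1; i <= policy.maxAttempts; i++) {
    try {
      return await withTimeout(attempt(), policy.timeoutMs);
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError;
}
```

Note that a timed-out attempt is abandoned rather than cancelled. Real cancellation would need something like an `AbortSignal` passed into the task.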

Why Existing Frameworks Fall Short

LangChain

Good for prototyping. Too opinionated about LLM interaction, too flexible about everything else. Debugging a 10-step chain is painful.

CrewAI

Great concept — agents with roles. The "crew" metaphor breaks at scale. 50 agents need an orchestrator, not a crew.

AutoGen

Powerful multi-agent conversations. But conversations aren't always the right abstraction. Sometimes you need directed execution, not dialogue.

Our Runtime: Core Concepts

Three primitives:

interface Task<TInput, TOutput> {
  id: string;
  spec: AgentSpec;
  execute: (input: TInput, context: RuntimeContext) => Promise<TOutput>;
  validate: (output: TOutput) => ValidationResult;
  recover: (error: TaskError) => RecoveryAction;
}

interface Pipeline {
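  tasks: Task<any, any>[]; // Task is generic, so type arguments are required here
  edges: [string, string][]; // [upstream task id, downstream task id]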
  checkpoints: CheckpointStrategy;
}

interface Runtime {
  execute: (pipeline: Pipeline) => AsyncIterable<PipelineEvent>;
  pause: (pipelineId: string) => void;
  resume: (pipelineId: string) => void;
  rollback: (pipelineId: string, toCheckpoint: string) => void;
}
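
Dependency resolution over `edges` can be sketched with Kahn's algorithm. This is an illustration, not our actual scheduler: `topologicalOrder` is a hypothetical helper, and it assumes each edge reads `[upstream, downstream]`.

```typescript
// Assumed edge direction: [upstreamTaskId, downstreamTaskId].
type Edge = [string, string];

// Kahn's algorithm: returns task ids ordered so that every task comes
// after all of its upstream dependencies. Throws if the graph has a cycle.
function topologicalOrder(taskIds: string[], edges: Edge[]): string[] {
  const indegree = new Map<string, number>();
  const downstream = new Map<string, string[]>();
  for (const id of taskIds) {
    indegree.set(id, 0);
    downstream.set(id, []);
  }
  for (const [from, to] of edges) {
    if (!indegree.has(from) || !indegree.has(to)) {
      throw new Error(`edge references unknown task: ${from} -> ${to}`);
    }
    downstream.get(from)!.push(to);
    indegree.set(to, indegree.get(to)! + 1);
  }

  // Tasks with no unmet dependencies are ready to run.
  const ready = taskIds.filter((id) => indegree.get(id) === 0);
  const order: string[] = [];
  while (ready.length > 0) {
    const id = ready.shift()!;
    order.push(id);
    for (const next of downstream.get(id)!) {
      const remaining = indegree.get(next)! - 1;
      indegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }
  if (order.length !== taskIds.length) {
    throw new Error("pipeline contains a cycle");
  }
  return order;
}

const exampleOrder = topologicalOrder(
  ["refactor", "analyze", "test"],
  [["analyze", "refactor"], ["refactor", "test"]],
);
console.log(exampleOrder); // ["analyze", "refactor", "test"]
```

A real scheduler would run every task in `ready` concurrently instead of draining the queue one at a time, but the bookkeeping is the same.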

Events, Not Returns

Every task emits events as it runs instead of handing back a single value at the end. That buys us three things:

Observability sees everything in real-time
Downstream tasks can start before upstream finishes
Failures are just another event type — no try/catch spaghetti
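
One way to picture this is an async generator that turns a task body into an event stream. The event shape below (`SketchEvent`) and the helper `runAsEvents` are assumptions made for illustration; the post does not define `PipelineEvent`.

```typescript
// Hypothetical event shape, not the runtime's real PipelineEvent.
type SketchEvent =
  | { type: "started"; taskId: string }
  | { type: "progress"; taskId: string; chunk: string }
  | { type: "completed"; taskId: string }
  | { type: "failed"; taskId: string; error: string };

// Wrap a task body so that a thrown error becomes a "failed" event
// instead of propagating to whoever consumes the stream.
async function* runAsEvents(
  taskId: string,
  body: () => AsyncIterable<string>,
): AsyncGenerator<SketchEvent> {
  yield { type: "started", taskId };
  try {
    for await (const chunk of body()) {
      yield { type: "progress", taskId, chunk };
    }
    yield { type: "completed", taskId };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    yield { type: "failed", taskId, error: message };
  }
}

// Example task body that produces one chunk and then fails.
async function* flakyBody(): AsyncGenerator<string> {
  yield "partial result";
  throw new Error("model unavailable");
}

// Consume the stream the way an observability layer might.
async function collectEvents(): Promise<SketchEvent[]> {
  const seen: SketchEvent[] = [];
  for await (const event of runAsEvents("analyze", flakyBody)) {
    seen.push(event);
  }
  return seen;
}
```

The consumer never needs its own try/catch around the task: the failure arrives as the last event in the stream, after the partial result that preceded it.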

Context Is Explicit

Each task receives exactly the context it needs. No global state.

// Bad: implicit shared context
agent.execute({ ...globalContext, task: "refactor" });

// Good: explicit scoped context
pipeline.addEdge("analyze", "refactor");
// refactor only sees analyze's output
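
A rough sketch of how edges can double as the scoping rule: a task's input is assembled only from the outputs of tasks with an edge into it. `scopedInput` and the sample outputs are hypothetical, and edges are assumed to read `[upstream, downstream]`.

```typescript
// Collect only the outputs of tasks that have an edge into `taskId`.
function scopedInput(
  taskId: string,
  edges: [string, string][],
  outputs: ReadonlyMap<string, unknown>,
): Record<string, unknown> {
  const input: Record<string, unknown> = {};
  for (const [from, to] of edges) {
    if (to !== taskId) continue; // edge feeds some other task
    if (!outputs.has(from)) {
      throw new Error(`upstream task "${from}" has not produced output yet`);
    }
    input[from] = outputs.get(from);
  }
  return input;
}

// Two tasks have finished, but only "analyze" has an edge into "refactor".
const finishedOutputs = new Map<string, unknown>([
  ["analyze", { hotspots: ["parser.ts"] }],
  ["lint", { warnings: 3 }],
]);
const refactorInput = scopedInput(
  "refactor",
  [["analyze", "refactor"]],
  finishedOutputs,
);
console.log(Object.keys(refactorInput)); // ["analyze"]
```

Because "lint" has no edge into "refactor", its output never reaches that task, even though it sits in the same map.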

What We Learned

1. Isolation is non-negotiable. Once every task can read and write the same context, you can no longer tell which task caused a failure.

2. The runtime should be boring. Intelligence lives in specs and agents. The runtime is plumbing.

3. Streaming beats batch. When a pipeline takes 30 minutes, you need minute-5 visibility, not a minute-30 summary.

"The best infrastructure is the one that engineers forget exists."

Our runtime is 2,400 lines of TypeScript. No magic. No AI in the orchestration layer. Just dependency resolution, event streaming, and checkpoint management.