We tried LangChain. We tried CrewAI. We tried AutoGen. Each solved some problems and created others. Eventually, we built our own runtime. Here's why and how.
What We Needed
- Execute a DAG of agent tasks with dependency resolution
- Provide isolation between tasks
- Support checkpointing and rollback at any node
- Stream intermediate results to an observability layer
- Handle timeouts, retries, and circuit breaking per node
No existing framework did all five well.
Why Existing Frameworks Fall Short
LangChain
Good for prototyping. Too opinionated about LLM interaction, too flexible about everything else. Debugging a 10-step chain is painful.
CrewAI
Great concept — agents with roles. But the "crew" metaphor breaks at scale: fifty agents need an orchestrator, not a crew.
AutoGen
Powerful multi-agent conversations. But conversations aren't always the right abstraction. Sometimes you need directed execution, not dialogue.
Our Runtime: Core Concepts
Three primitives:
```typescript
interface Task<TInput, TOutput> {
  id: string;
  spec: AgentSpec;
  execute: (input: TInput, context: RuntimeContext) => Promise<TOutput>;
  validate: (output: TOutput) => ValidationResult;
  recover: (error: TaskError) => RecoveryAction;
}

interface Pipeline {
  tasks: Task<unknown, unknown>[];
  edges: [string, string][]; // directed edges: [from, to]
  checkpoints: CheckpointStrategy;
}

interface Runtime {
  execute: (pipeline: Pipeline) => AsyncIterable<PipelineEvent>;
  pause: (pipelineId: string) => void;
  resume: (pipelineId: string) => void;
  rollback: (pipelineId: string, toCheckpoint: string) => void;
}
```
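To make the `Pipeline` shape concrete: dependency resolution over `edges` is essentially a topological sort. The sketch below uses Kahn's algorithm; `PipelineSketch`, `topologicalOrder`, and the task names are illustrative stand-ins, not the runtime's actual API.

```typescript
interface PipelineSketch {
  tasks: string[];           // task ids
  edges: [string, string][]; // [from, to]: "to" depends on "from"
}

// Kahn's algorithm: repeatedly schedule tasks whose dependencies are all done.
function topologicalOrder(p: PipelineSketch): string[] {
  const indegree = new Map(p.tasks.map((t): [string, number] => [t, 0]));
  const downstream = new Map(p.tasks.map((t): [string, string[]] => [t, []]));
  for (const [from, to] of p.edges) {
    downstream.get(from)!.push(to);
    indegree.set(to, indegree.get(to)! + 1);
  }
  const ready = p.tasks.filter((t) => indegree.get(t) === 0);
  const order: string[] = [];
  while (ready.length > 0) {
    const task = ready.shift()!;
    order.push(task);
    for (const next of downstream.get(task)!) {
      indegree.set(next, indegree.get(next)! - 1);
      if (indegree.get(next) === 0) ready.push(next);
    }
  }
  // If anything is left unscheduled, the "DAG" had a cycle.
  if (order.length !== p.tasks.length) throw new Error("cycle in pipeline");
  return order;
}

const example: PipelineSketch = {
  tasks: ["analyze", "refactor", "test"],
  edges: [["analyze", "refactor"], ["refactor", "test"]],
};
```

Cycle detection falls out for free: if the sort can't schedule every task, the pipeline wasn't a DAG, and the runtime can reject it before executing anything.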
Events, Not Returns
Every task emits events rather than returning values:
- Observability sees everything in real-time
- Downstream tasks can start before upstream finishes
- Failures are just another event type — no try/catch spaghetti
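A sketch of that event-emitting shape, with hypothetical event names (`task:start`, `task:output`, `task:error`, `task:done`). The real runtime streams `AsyncIterable<PipelineEvent>`; a synchronous generator is a simplification that shows the idea:

```typescript
type PipelineEvent =
  | { type: "task:start"; taskId: string }
  | { type: "task:output"; taskId: string; chunk: string }
  | { type: "task:error"; taskId: string; error: string }
  | { type: "task:done"; taskId: string };

// A task yields events as it works instead of returning one final value,
// so observers see progress and downstream tasks can consume chunks early.
function* summarize(input: string): Generator<PipelineEvent> {
  yield { type: "task:start", taskId: "summarize" };
  for (const word of input.split(" ")) {
    yield { type: "task:output", taskId: "summarize", chunk: word };
  }
  yield { type: "task:done", taskId: "summarize" };
}

// A failure is just another event type; nothing throws across task boundaries.
function* flaky(): Generator<PipelineEvent> {
  yield { type: "task:start", taskId: "flaky" };
  yield { type: "task:error", taskId: "flaky", error: "timeout" };
}
```

Consumers just iterate: a `for...of` (or `for await...of` in the async case) sees starts, chunks, and errors in order, with no try/catch wrapped around the task body.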
Context Is Explicit
Each task receives exactly the context it needs. No global state.
```typescript
// Bad: implicit shared context
agent.execute({ ...globalContext, task: "refactor" });

// Good: explicit scoped context
pipeline.addEdge("analyze", "refactor");
// refactor only sees analyze's output
```
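One way a runtime can derive that scoped context from the pipeline's edges. This is a sketch; `scopedContext` and the task names are hypothetical, not the real API:

```typescript
// Collect only the outputs of a task's direct upstream dependencies.
// Anything not connected by an edge is invisible to the task.
function scopedContext(
  taskId: string,
  edges: [string, string][],     // [from, to]
  outputs: Map<string, unknown>, // taskId -> completed output
): Record<string, unknown> {
  const context: Record<string, unknown> = {};
  for (const [from, to] of edges) {
    if (to === taskId && outputs.has(from)) {
      context[from] = outputs.get(from);
    }
  }
  return context;
}
```

Because the context is computed from the graph rather than accumulated globally, a misbehaving task can only have been influenced by its declared upstreams, which narrows every debugging session.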
What We Learned
1. Isolation is non-negotiable. Shared context makes debugging impossible.
2. The runtime should be boring. Intelligence lives in specs and agents. The runtime is plumbing.
3. Streaming beats batch. When a pipeline takes 30 minutes, you need minute-5 visibility, not a minute-30 summary.
"The best infrastructure is the one that engineers forget exists."
Our runtime is 2,400 lines of TypeScript. No magic. No AI in the orchestration layer. Just dependency resolution, event streaming, and checkpoint management.
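Checkpoint management in that spirit stays boring too. A minimal sketch, assuming checkpoints snapshot completed task outputs; `CheckpointStore` is illustrative, not the runtime's actual API:

```typescript
// A checkpoint snapshots the pipeline's completed-task outputs at a point in time.
interface Checkpoint {
  id: string;
  completed: Map<string, unknown>; // taskId -> output snapshot
}

class CheckpointStore {
  private checkpoints: Checkpoint[] = [];

  save(id: string, completed: Map<string, unknown>): void {
    // Copy so later mutation of live state can't corrupt the snapshot.
    this.checkpoints.push({ id, completed: new Map(completed) });
  }

  rollback(toCheckpoint: string): Map<string, unknown> {
    const idx = this.checkpoints.findIndex((c) => c.id === toCheckpoint);
    if (idx === -1) throw new Error(`unknown checkpoint: ${toCheckpoint}`);
    // Discard everything after the target so re-execution resumes from it.
    this.checkpoints = this.checkpoints.slice(0, idx + 1);
    return new Map(this.checkpoints[idx].completed);
  }
}
```

Rollback truncates history past the target and hands back the restored outputs, so resuming is just re-running the topological order from the first task whose output is missing.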
