Guide 47

Inner vs. Outer Harness — Why Modern Agent Stacks Split Concerns

Problem Framing

Pre-2024 agent frameworks (LangChain v0.0.x, AutoGPT-era stacks) mixed model-behavior concerns (prompt shape, tool schema, response parsing, retry logic) with infrastructure concerns (durable execution, checkpoint persistence, failure recovery, observability). As a result, swapping the model required rewriting the runtime, and swapping the runtime required rewriting the agent. The 2026 consensus is the Inner / Outer Harness Pattern: separate the two layers, let each evolve on its own cadence, and define a small step-boundary contract between them.

Relevant Nodes

Topics: Agent Orchestration
Architectures: Inner/Outer Harness Pattern, Durable Agent Runtime, FAME Architecture
Technologies: Kitaru, Amazon Bedrock AgentCore Runtime
Pain Points: Agent State Loss on Pod Eviction

Decision Path

Inventory which layer each concern actually belongs to:
- Inner harness (model behavior): prompt formatting, tool / function schemas, structured-output decoding, response parsing, model selection, per-call retry policy on transient model errors.
- Outer harness (infrastructure): durable execution, checkpoint persistence, failure recovery, pause / resume, observability across runs, deployment topology, multi-tenant isolation.
Define a thin step-boundary contract between them. The outer harness needs to know five things: when a step starts, what its inputs are, what its outputs are, when it succeeds, and when it fails. Everything else (which model, which tool, how the prompt was assembled) is opaque to it. Kitaru's @step decorator and Pydantic AI's call boundaries are concrete examples of the contract shape.
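The contract described above can be sketched in a few lines of plain Python. This is a minimal illustration, not Kitaru's or Pydantic AI's actual API: `OuterHarness`, `StepEvent`, and the `step` decorator are hypothetical names invented here, and the event log is an in-memory list standing in for durable storage.

```python
import functools
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class StepEvent:
    """One boundary event; the only thing the outer harness ever sees."""
    step: str
    kind: str  # "started", "succeeded", or "failed"
    payload: Any = None


@dataclass
class OuterHarness:
    """Hypothetical outer harness: records step boundaries, nothing else."""
    events: list = field(default_factory=list)

    def step(self, fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Step start plus inputs.
            self.events.append(
                StepEvent(fn.__name__, "started", {"args": args, "kwargs": kwargs})
            )
            try:
                result = fn(*args, **kwargs)
            except Exception as exc:
                # Step failure; the exception still propagates to the caller.
                self.events.append(StepEvent(fn.__name__, "failed", repr(exc)))
                raise
            # Step success plus outputs.
            self.events.append(StepEvent(fn.__name__, "succeeded", result))
            return result

        return wrapper


harness = OuterHarness()


@harness.step
def summarize(text: str) -> str:
    # Inner-harness work (prompt assembly, model call, parsing) would live
    # here; the outer harness never inspects it.
    return text[:10]
```

Calling `summarize(...)` appends a "started" event and then a "succeeded" event to `harness.events`. The model, tools, and prompt never appear in the log, which is the point of keeping the contract thin.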
Pick inner + outer harness independently.
- Inner harness options: Pydantic AI, LangGraph, LlamaIndex, AutoGen, crewAI, custom-built.
- Outer harness options: Kitaru, Temporal, Restate, Amazon Bedrock AgentCore Runtime, FAME-style serverless decomposition.
- The choice axes are orthogonal: Pydantic AI, LangGraph, and crewAI all run under Kitaru the same way.
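One way to picture the orthogonality: the outer harness depends only on a narrow interface, so any inner harness that satisfies it can be dropped in. The sketch below is illustrative only. `InnerAgent`, `EchoAgent`, `UpperAgent`, and `run_durably` are hypothetical stand-ins, not wrappers around any real SDK, and the journal is a plain list rather than a durable store.

```python
from typing import List, Protocol, Tuple


class InnerAgent(Protocol):
    """The only surface the outer harness relies on (hypothetical)."""

    def run_step(self, step_input: str) -> str: ...


class EchoAgent:
    """Stand-in for one agent SDK."""

    def run_step(self, step_input: str) -> str:
        return f"echo:{step_input}"


class UpperAgent:
    """Stand-in for a different agent SDK with the same step surface."""

    def run_step(self, step_input: str) -> str:
        return step_input.upper()


def run_durably(
    agent: InnerAgent, inputs: List[str], journal: List[Tuple[str, str]]
) -> List[str]:
    """Outer-harness loop: journals each step's input and output.

    It never looks inside the agent, so swapping agents needs no change here.
    """
    outputs = []
    for item in inputs:
        out = agent.run_step(item)
        journal.append((item, out))  # a real runtime would persist this
        outputs.append(out)
    return outputs
```

Both `run_durably(EchoAgent(), ...)` and `run_durably(UpperAgent(), ...)` go through the same runner code, so the journaling logic is written once and reused across inner harnesses.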
Recognize the anti-pattern. Any framework that asks you to write durable-execution logic inside your agent loop (as LangChain pre-0.1 and AutoGPT-style stacks did) violates the split. You will pay the rewrite tax on every model upgrade or runtime swap.
Apply the FAME variant if you're going serverless. FAME (Functions-as-a-Service for MCP-enabled agentic workflows) is the inner / outer split adapted for Lambda / Cloud Functions / Azure Functions. Inner-harness logic lives in stateless functions; outer-harness state routes to DynamoDB (hot conversational state) + S3 (heavy durable artifacts). Same pattern, different deployment topology.
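The hot / heavy state routing can be sketched as follows. This is an assumption-laden illustration, not the FAME implementation: the two dictionaries stand in for a DynamoDB table and an S3 bucket, `StateRouter` is a hypothetical name, and routing by serialized size with a 256 KB threshold is a choice made here (picked to sit under DynamoDB's 400 KB item limit), not something the source specifies.

```python
import json
from typing import Any

# Hypothetical cutoff; chosen to stay below DynamoDB's 400 KB item limit.
HOT_STATE_LIMIT_BYTES = 256 * 1024


class StateRouter:
    """Routes small state to a hot store and large artifacts to a blob store."""

    def __init__(self) -> None:
        self.hot = {}   # stand-in for a DynamoDB table
        self.cold = {}  # stand-in for an S3 bucket

    def put(self, session_id: str, key: str, value: Any) -> str:
        blob = json.dumps(value).encode("utf-8")
        if len(blob) <= HOT_STATE_LIMIT_BYTES:
            # Small enough: store the value inline in the hot store.
            self.hot[(session_id, key)] = {"inline": value}
            return "hot"
        # Too large: write the artifact to the blob store and keep only a
        # pointer in the hot store.
        object_key = f"{session_id}/{key}.json"
        self.cold[object_key] = blob
        self.hot[(session_id, key)] = {"ref": object_key}
        return "cold"

    def get(self, session_id: str, key: str) -> Any:
        envelope = self.hot[(session_id, key)]
        if "ref" in envelope:
            return json.loads(self.cold[envelope["ref"]])
        return envelope["inline"]
```

The stateless function only ever calls `put` and `get`. Whether a value ended up inline or behind a pointer is an outer-harness detail that the inner-harness code does not need to know.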
Trust the productivity claim. Reported results from FAME-style deployments on representative agent benchmarks: 13× latency reduction, 88% input-token reduction (from optimal caching), and 66% cost reduction. Most of the gain comes from not re-running already-completed steps, which is exactly what the outer harness's checkpoint primitive enables.
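The "do not re-run completed steps" mechanism can be illustrated with a toy checkpoint store. Everything here is a sketch under stated assumptions: `run_step`, `checkpoint_key`, and the two-step workflow are hypothetical, the store is a plain dict rather than durable storage, and step inputs are assumed to be JSON-serializable so they can be hashed into a key.

```python
import hashlib
import json
from typing import Any, Callable, Dict


def checkpoint_key(step_name: str, inputs: tuple) -> str:
    """Derive a stable key from the step name and its inputs."""
    payload = json.dumps(inputs, sort_keys=True).encode("utf-8")
    return f"{step_name}:{hashlib.sha256(payload).hexdigest()}"


def run_step(store: Dict[str, Any], step_name: str, fn: Callable, *inputs: Any) -> Any:
    """Return the checkpointed output if present; otherwise run and record it."""
    key = checkpoint_key(step_name, inputs)
    if key in store:
        return store[key]  # completed earlier: skip the (expensive) call
    result = fn(*inputs)
    store[key] = result  # only successful steps are checkpointed
    return result


# Toy steps with call counters so the skipping is observable.
calls = {"plan": 0, "draft": 0}
fail_once = {"armed": True}


def plan(topic: str) -> str:
    calls["plan"] += 1
    return f"outline for {topic}"


def draft(outline: str) -> str:
    calls["draft"] += 1
    if fail_once["armed"]:
        fail_once["armed"] = False
        raise RuntimeError("simulated pod eviction")
    return f"draft from {outline}"


def workflow(store: Dict[str, Any], topic: str) -> str:
    outline = run_step(store, "plan", plan, topic)
    return run_step(store, "draft", draft, outline)
```

Running `workflow` once fails inside `draft`. Running it again with the same store reuses the checkpointed `plan` output and re-executes only `draft`, so `plan` is called once in total. In a real agent, each skipped step is a model call whose latency and tokens are not paid a second time.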

What Changed Over Time

2023–2024: Mixed-concern monolithic agent frameworks dominated. Every model change cascaded into runtime changes.
2025: Pydantic AI, LangGraph, AutoGen all shipped explicit inner / outer separation.
2026: Kitaru released as a canonically outer-harness product (doesn't dictate which agent SDK you use). FAME paper (arXiv 2601.14735) formalized the split for serverless deployments.
Forward: Reference architectures from cloud vendors (AWS / GCP / Azure) standardizing on the inner / outer split for managed agent runtimes.

Sources