
A Production-Grade AI Agent in Two API Calls
On 17 June 2026, Amazon Web Services announced the general availability of the managed agent harness in Amazon Bedrock AgentCore at AWS Summit New York. The announcement delivers on a promise AWS made when AgentCore launched in preview: to reduce the infrastructure gap between a developer's first agent prototype and a production-ready system to something that can be crossed in minutes rather than weeks. The harness achieves this through two API calls — CreateHarness and InvokeHarness — that together configure and deploy a fully managed, isolated agent runtime with no underlying infrastructure to operate.
The Infrastructure Problem AgentCore Harness Solves
Building an AI agent that works reliably in production at scale and one that works in a development notebook are two very different engineering problems. The production version needs an isolated execution environment to prevent agent sessions from interfering with each other, persistent memory and filesystem access across sessions, a secure context for executing generated code, and monitoring sufficient to evaluate whether the agent is performing as intended. Before AgentCore Harness, those requirements meant assembling and maintaining a custom runtime — typically containers, orchestration logic, memory backends, and a separate evaluation framework.
AgentCore Harness externalises that entire runtime stack into a managed service. Development teams define what the agent does: which model it uses, which tools it calls, which skills from the AgentCore catalogue it accesses, and which instructions it follows. The harness assembles the runtime from those specifications and runs the agent loop in a fully managed environment. The team is responsible for the agent's behaviour; AWS is responsible for the infrastructure that executes it.
CreateHarness and InvokeHarness: The Mechanics
The two-call model is straightforward. CreateHarness accepts a configuration specifying the model, tools, and instructions. InvokeHarness executes the agent against a given task or query. Every InvokeHarness call runs in a secure, isolated microVM — a lightweight virtual machine provisioned per session and torn down when the session ends. Each microVM has its own filesystem and shell, so agents can write code, execute it, store intermediate outputs, and persist memory across sessions without those operations touching other sessions or the host environment.
Grok 4.3, Claude Opus 4.8, GPT-4o, Google Gemini models, and any LiteLLM-compatible provider can all serve as the harness model. Critically, an agent can switch the underlying model mid-session without losing context, which enables multi-model workflows — one model for planning, another for code generation, a third for evaluation — within a single harness configuration. This provider flexibility means teams are not locked to a single AI vendor for the full agent workload.
Evaluations and Optimisation Built In
AgentCore Harness GA ships with AgentCore Evaluations, a built-in LLM-as-a-judge framework that scores harness traces against configurable quality criteria. Evaluations can run online as requests happen, on-demand, in batch over historical traces, or in simulation against test sets. The companion AgentCore Optimisation service reads those scores and generates prompt and tool-description recommendations, then validates changes by routing live traffic between the existing and updated configurations — an A/B testing mechanism applied to prompt engineering at the harness level, removing the need to build this validation infrastructure manually.
Pricing: No Harness Tax
AWS charges no separate fee for the harness itself. Costs break down into runtime compute at $0.0895 per vCPU-hour and $0.00945 per GB-hour, billed on active consumption during agent execution, plus the standard model inference costs for whatever model the agent uses. For teams currently self-hosting agent runtimes on EC2 or ECS, the managed harness cost compares favourably once developer time spent on container management, memory backends, and runtime maintenance is factored in.
What Indian Development Teams Can Build Today
For Indian software product companies and IT services firms running agent workloads on AWS, the AgentCore Harness changes the cost structure of bringing an agent prototype into production. The primary cost of production agents before this announcement was not model inference but infrastructure engineering — building, securing, and maintaining the runtime. With that cost replaced by a managed service, smaller engineering teams can operate production agents with the same reliability as teams with dedicated platform engineering capacity.
Indian organisations building internal enterprise tools — HR automation agents, procurement processing agents, customer support agents — can deploy into production from a working prototype without a separate infrastructure build phase. The multi-model support is particularly relevant for teams that want to evaluate different model providers on real production traffic without maintaining parallel infrastructure stacks or rewriting orchestration code between evaluations.
The Bottom Line
Amazon Web Services made the AgentCore Harness generally available on 17 June 2026 at AWS Summit New York, reducing the path from a working agent prototype to a production deployment to two API calls: CreateHarness and InvokeHarness. Each session runs in an isolated microVM, supports any LiteLLM-compatible model including mid-session provider switching, and includes built-in evaluation and prompt optimisation with live traffic A/B testing. Runtime pricing is $0.0895 per vCPU-hour and $0.00945 per GB-hour with no separate harness fee. For Indian engineering teams using AWS, AgentCore Harness eliminates the infrastructure engineering phase that has historically separated agent prototypes from production-ready systems.
Frequently Asked Questions
What is the Amazon Bedrock AgentCore Harness and when did it become generally available?+
The Amazon Bedrock AgentCore Harness is a managed service that deploys production-grade AI agents from a declarative configuration — developers specify the model, tools, skills, and instructions, and the harness assembles and runs the agent loop in a fully managed environment. It became generally available on 17 June 2026, announced at AWS Summit New York. The harness requires two API calls: CreateHarness to define the agent configuration, and InvokeHarness to run the agent against a given task.
How does the AgentCore Harness isolation model work?+
Every InvokeHarness call runs in a secure, isolated microVM — a lightweight virtual machine provisioned per session and torn down when the session ends. Each microVM has its own filesystem and shell, allowing agents to write and execute code, store intermediate files, and maintain memory across sessions without interfering with other sessions. This isolation means agents can safely perform side-effecting operations such as code execution and file management without shared-resource contention or cross-session data leakage.
Which AI models are compatible with the AgentCore Harness?+
The AgentCore Harness supports any model available through Amazon Bedrock, OpenAI, Google Gemini, or any LiteLLM-compatible provider. Agents can switch the underlying model mid-session without losing context, enabling multi-model pipelines where one model plans, another generates code, and a third evaluates output — all within a single harness configuration. This provider flexibility means teams are not locked to a single AI vendor for the full agent workload.
What does AgentCore Harness cost and what does the pricing include?+
There is no separate charge for the harness itself. Runtime compute is priced at $0.0895 per vCPU-hour and $0.00945 per GB-hour, billed on active consumption during agent execution. Model inference costs are charged separately at standard Bedrock rates for the model in use. AgentCore Evaluations and AgentCore Optimisation are included GA features that score agent traces with LLM-as-a-judge evaluators and generate prompt improvement recommendations validated through live traffic A/B testing — no additional infrastructure required.
Written by
TechPillow Team
Sharing insights on technology, product development, and the Indian tech ecosystem.