A2A — building agent ecosystems that talk to each other (with OpenAPI + MCP)
If 2024 was “give the LLM a tool”, and 2025 was “give the LLM many tools through MCP”, then 2026 is “let the LLM call other LLMs that have their own tools”. That’s the lane A2A (Agent-to-Agent) is built for.
This post is the protocol-level tour I wish I had when I started wiring agents together: what A2A actually is, how it differs from OpenAPI and MCP, the message shapes on the wire, and a worked example. I’ll keep it implementation-oriented: diagrams over hand-waving.
TL;DR — OpenAPI describes HTTP endpoints. MCP standardises the tool surface an agent exposes to its own model. A2A standardises how one agent talks to another agent as a peer, not as a function.
Why another protocol?
Three problems pile up the moment you have more than one agent:
- Discovery. How does Agent A even know Agent B exists, what it can do, what it costs, and whether it’s online?
- Delegation. A wants B to do something — multi-turn, possibly long-running, possibly streaming — not just answer a single function call.
- Trust + identity. Who is calling whom, on whose behalf, with what scopes?
You can fake all three with REST + ad-hoc JSON, but you’ll reinvent the same five things in every codebase: agent cards, task IDs, streaming envelopes, artifact handoff, and auth. A2A standardises that handshake so agents from different orgs can interoperate the way two SaaS apps interoperate over HTTPS.
flowchart LR
subgraph Without["Without A2A"]
A1[Agent A]
B1[Agent B]
C1[Agent C]
A1 -- bespoke JSON --> B1
A1 -- different JSON --> C1
B1 -- yet another JSON --> C1
end
subgraph With["With A2A"]
A2[Agent A]
B2[Agent B]
C2[Agent C]
R[(A2A registry / well-known)]
A2 -- A2A --> R
B2 -- A2A --> R
C2 -- A2A --> R
A2 <-- A2A tasks --> B2
A2 <-- A2A tasks --> C2
B2 <-- A2A tasks --> C2
end
OpenAPI vs. MCP vs. A2A — the one-line version
| layer | who talks to whom | shape of the unit | lifecycle |
|---|---|---|---|
| OpenAPI | client ↔ HTTP server | request / response | per-call, stateless |
| MCP | LLM ↔ its own toolbox | tool call / resource read | per-call, scoped to one model session |
| A2A | agent ↔ agent | task (multi-turn, streamable) | long-lived, with status transitions |
flowchart TB
user[User] --> agentA[Agent A]
agentA -- MCP --> tools[(Local tools<br/>fs, db, search)]
agentA -- A2A --> agentB[Agent B<br/>e.g. Billing agent]
agentB -- MCP --> btools[(Billing tools)]
agentB -- OpenAPI --> stripe[(Stripe REST)]
agentA -- OpenAPI --> intern[(Internal REST)]
Read it as: OpenAPI is between code and a service. MCP is between a model and its tools. A2A is between two agents that each already have their own model + tools.
They compose. A2A doesn’t replace OpenAPI or MCP — it sits above them.
The Agent Card (discovery)
Every A2A-speaking agent publishes an Agent Card at a well-known URL —
typically /.well-known/agent.json. It’s the agent equivalent of an OpenAPI
document.
{
"name": "billing-agent",
"description": "Handles invoices, refunds, and dunning for Acme Corp.",
"version": "1.4.0",
"url": "https://billing.acme.example/a2a",
"capabilities": {
"streaming": true,
"pushNotifications": true,
"stateTransitionHistory": true
},
"authentication": {
"schemes": ["oauth2", "bearer"]
},
"defaultInputModes": ["text", "data"],
"defaultOutputModes": ["text", "data", "file"],
"skills": [
{
"id": "issue_refund",
"name": "Issue refund",
"description": "Refund a customer order partially or in full.",
"tags": ["payments"],
"examples": [
"Refund order #1042 for $19.99",
"Full refund for the last invoice on customer cus_42"
]
},
{
"id": "lookup_invoice",
"name": "Look up invoice",
"description": "Find invoices by id, customer, or date range."
}
]
}
The card is intentionally lightweight: name, what it can do (skills), how to
talk to it (url, defaultInputModes), how to authenticate, and which extras
it supports (streaming, pushNotifications).
A calling agent fetches the card, picks a skill, and opens a task.
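As a rough sketch of that fetch-and-pick step, here is how a caller might read a card once it has the JSON in hand. The helper names (`card_url`, `find_skill`, `supports`) are mine, not part of A2A, and the card is a trimmed copy of the one above; a real caller would fetch it over HTTPS instead of parsing a literal.

```python
import json
from typing import Optional
from urllib.parse import urljoin

WELL_KNOWN_PATH = "/.well-known/agent.json"  # the typical path named above


def card_url(base_url: str) -> str:
    """Build the Agent Card URL from any URL on the agent's host."""
    return urljoin(base_url, WELL_KNOWN_PATH)


def find_skill(card: dict, skill_id: str) -> Optional[dict]:
    """Return the skill entry with the given id, or None if the card lacks it."""
    for skill in card.get("skills", []):
        if skill.get("id") == skill_id:
            return skill
    return None


def supports(card: dict, capability: str) -> bool:
    """True if the card advertises an optional capability such as 'streaming'."""
    return bool(card.get("capabilities", {}).get(capability, False))


# Trimmed copy of the billing-agent card shown above.
card = json.loads("""
{
  "name": "billing-agent",
  "url": "https://billing.acme.example/a2a",
  "capabilities": {"streaming": true, "pushNotifications": true},
  "skills": [
    {"id": "issue_refund", "name": "Issue refund", "tags": ["payments"]},
    {"id": "lookup_invoice", "name": "Look up invoice"}
  ]
}
""")

skill = find_skill(card, "issue_refund")
# Prefer the streaming method only when the card says the agent supports it.
method = "tasks/sendSubscribe" if supports(card, "streaming") else "tasks/send"
```

Checking `capabilities` before choosing a method keeps the caller from opening an SSE stream against an agent that never advertised one.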
The unit of work: a Task
A2A’s central abstraction is the task, not the request. Tasks have IDs, state, history, and (optionally) artifacts.
stateDiagram-v2
[*] --> submitted
submitted --> working: agent picks it up
working --> input_required: needs a clarification
input_required --> working: user/agent replies
working --> completed: success + artifacts
working --> failed: unrecoverable error
working --> canceled: caller canceled
completed --> [*]
failed --> [*]
canceled --> [*]
Why a state machine instead of req/res? Because real agent work isn’t synchronous. A “summarise this 200-page PDF and email Janet” task can run for minutes, ask a clarifying question halfway through, stream partial output, and finally produce a file artifact. Modelling that as a single HTTP call is painful; modelling it as a task with status transitions is natural.
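To make the state machine concrete, here is a minimal sketch of how a worker's runtime might enforce it. The transition table is copied from the diagram; the `Task` class and its method names are my own illustration, not something A2A prescribes.

```python
# Allowed transitions, copied from the state diagram above.
TRANSITIONS = {
    "submitted": {"working"},
    "working": {"input_required", "completed", "failed", "canceled"},
    "input_required": {"working"},
    "completed": set(),
    "failed": set(),
    "canceled": set(),
}
# States with no outgoing edges are terminal.
TERMINAL = {state for state, nxt in TRANSITIONS.items() if not nxt}


class Task:
    """Hypothetical in-memory task record that rejects illegal transitions."""

    def __init__(self, task_id):
        self.id = task_id
        self.state = "submitted"
        self.history = ["submitted"]

    def transition(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
        self.history.append(new_state)

    @property
    def done(self):
        return self.state in TERMINAL


task = Task("task-7af3")
for state in ("working", "input_required", "working", "completed"):
    task.transition(state)
```

Once a task is terminal, any further `transition` call raises, which is a cheap way to catch a runtime that keeps emitting events after `completed`.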
The wire format
A2A is JSON-RPC 2.0 over HTTPS, with optional SSE streaming. A few core methods:
| method | purpose |
|---|---|
| tasks/send | submit a new task (synchronous reply) |
| tasks/sendSubscribe | submit a task and stream events (SSE) |
| tasks/get | poll a task’s current state |
| tasks/cancel | cancel a running task |
| tasks/pushNotification/set | register a webhook for state changes |
A minimal tasks/send:
{
"jsonrpc": "2.0",
"id": "req-1",
"method": "tasks/send",
"params": {
"id": "task-7af3",
"sessionId": "sess-91",
"message": {
"role": "user",
"parts": [
{ "type": "text", "text": "Refund order 1042 in full and email the customer." }
]
},
"acceptedOutputModes": ["text", "data"]
}
}
Note the message.parts — A2A messages are multi-part, like email. A
single message can carry text, structured data, and file references. The
common part types:
- TextPart: plain text or markdown
- DataPart: arbitrary JSON (often the structured args/results)
- FilePart: { name, mimeType, bytes | uri }
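A small builder makes the envelope harder to get wrong. This is a sketch that mirrors the field names in the example request; the helper names and the `orderId`/`mode` keys in the data part are hypothetical, chosen only to show a text part and a data part travelling in one message.

```python
import json
import uuid


def text_part(text):
    """A TextPart: plain text or markdown."""
    return {"type": "text", "text": text}


def data_part(data):
    """A DataPart: arbitrary JSON, e.g. structured arguments."""
    return {"type": "data", "data": data}


def build_tasks_send(task_id, session_id, parts, accepted_output_modes=("text", "data")):
    """Wrap a multi-part user message in a JSON-RPC 2.0 tasks/send request."""
    return {
        "jsonrpc": "2.0",
        "id": f"req-{uuid.uuid4().hex[:8]}",  # JSON-RPC request id, distinct from the task id
        "method": "tasks/send",
        "params": {
            "id": task_id,
            "sessionId": session_id,
            "message": {"role": "user", "parts": list(parts)},
            "acceptedOutputModes": list(accepted_output_modes),
        },
    }


request = build_tasks_send(
    "task-7af3",
    "sess-91",
    [
        text_part("Refund order 1042 in full and email the customer."),
        data_part({"orderId": 1042, "mode": "full"}),  # hypothetical structured args
    ],
)
body = json.dumps(request)  # what would go in the HTTPS POST body
```

Keeping the JSON-RPC `id` and the task `id` as separate values matters later: the same task id is reused across several requests.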
A reply on success:
{
"jsonrpc": "2.0",
"id": "req-1",
"result": {
"id": "task-7af3",
"status": { "state": "completed", "timestamp": "2026-05-15T10:14:02Z" },
"artifacts": [
{
"name": "refund_receipt.pdf",
"parts": [
{ "type": "file", "file": { "name": "receipt.pdf", "mimeType": "application/pdf", "uri": "https://billing.acme.example/files/r-7af3.pdf" } }
]
},
{
"name": "summary",
"parts": [
{ "type": "text", "text": "Refund of $19.99 issued. Email sent to alex@example.com." },
{ "type": "data", "data": { "refundId": "re_abc", "amount": 1999, "currency": "usd" } }
]
}
]
}
}
The shape that matters: status (where in the state machine), history
(optional list of messages exchanged), and artifacts (the outputs, each
itself multi-part). This is what lets the calling agent both show a
human-readable summary and programmatically use the structured refundId.
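On the caller side, splitting a result into "text to show" and "data to act on" is a short loop over artifacts and parts. A sketch, using a trimmed copy of the reply above; `collect_parts` is my own helper name.

```python
def collect_parts(result):
    """Split a task result's artifacts into display text, structured data, and file URIs."""
    texts, data, files = [], {}, []
    for artifact in result.get("artifacts", []):
        for part in artifact.get("parts", []):
            kind = part.get("type")
            if kind == "text":
                texts.append(part["text"])
            elif kind == "data":
                data.update(part["data"])
            elif kind == "file":
                files.append(part["file"].get("uri"))
    return texts, data, files


# Trimmed copy of the reply shown above.
result = {
    "id": "task-7af3",
    "status": {"state": "completed"},
    "artifacts": [
        {"name": "refund_receipt.pdf", "parts": [
            {"type": "file", "file": {"name": "receipt.pdf", "mimeType": "application/pdf",
                                      "uri": "https://billing.acme.example/files/r-7af3.pdf"}}]},
        {"name": "summary", "parts": [
            {"type": "text", "text": "Refund of $19.99 issued. Email sent to alex@example.com."},
            {"type": "data", "data": {"refundId": "re_abc", "amount": 1999, "currency": "usd"}}]},
    ],
}

texts, data, files = collect_parts(result)
```

The text goes to the user; `data["refundId"]` goes to whatever the caller does next.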
Streaming with tasks/sendSubscribe
For long-running or chatty tasks, the caller subscribes via SSE and receives two kinds of events:
- TaskStatusUpdateEvent: state transitioned (e.g. working → input_required)
- TaskArtifactUpdateEvent: a chunk of an artifact arrived (with index and append: true so the caller can reassemble streamed text)
sequenceDiagram
autonumber
participant A as Caller agent
participant B as Worker agent
A->>B: POST tasks/sendSubscribe (task-7af3)
B-->>A: SSE: status=submitted
B-->>A: SSE: status=working
B-->>A: SSE: artifact "summary" chunk 0 (text, append:true)
B-->>A: SSE: artifact "summary" chunk 1 (text, append:true)
B-->>A: SSE: status=input_required
A->>B: tasks/send (same task id, message=clarification)
B-->>A: SSE: status=working
B-->>A: SSE: artifact "summary" chunk 2 (text, append:true)
B-->>A: SSE: status=completed
Two design wins worth calling out:
- Same task id is reused for clarifications. The “conversation” lives on the task, not on top of it.
- Artifacts can be appended in chunks, so token-by-token streaming works without changing the artifact model.
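Reassembly on the caller side can be sketched in a few lines. The event shape here is a simplification I'm assuming for illustration (an `artifact` object carrying `name`, `append`, and `parts`); the sketch keys on the artifact name and ignores `index`, on the assumption that chunks arrive in order on a single SSE stream.

```python
def apply_artifact_event(artifacts, event):
    """Merge one streamed artifact chunk into `artifacts`, keyed by artifact name."""
    artifact = event["artifact"]
    name = artifact["name"]
    if artifact.get("append") and name in artifacts:
        # Later chunk of an artifact we already know: extend its parts.
        artifacts[name]["parts"].extend(artifact["parts"])
    else:
        # First chunk, or a non-append update that replaces the artifact.
        artifacts[name] = {"name": name, "parts": list(artifact["parts"])}
    return artifacts


def artifact_text(artifact):
    """Concatenate the text parts of a reassembled artifact."""
    return "".join(p["text"] for p in artifact["parts"] if p["type"] == "text")


# Two hypothetical chunks of the "summary" artifact, in arrival order.
events = [
    {"artifact": {"name": "summary", "append": True,
                  "parts": [{"type": "text", "text": "Refund of $19.99 "}]}},
    {"artifact": {"name": "summary", "append": True,
                  "parts": [{"type": "text", "text": "issued."}]}},
]

artifacts = {}
for event in events:
    apply_artifact_event(artifacts, event)

summary = artifact_text(artifacts["summary"])
```

If chunks can arrive out of order (for example via webhooks instead of one SSE stream), you would want to sort by a chunk index before joining.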
Push notifications (no long-lived connection)
Streaming via SSE is fine when the caller stays online. For agent → agent
calls where the caller is itself behind a load balancer or runs on a queue,
A2A defines tasks/pushNotification/set:
{
"jsonrpc": "2.0",
"id": "req-2",
"method": "tasks/pushNotification/set",
"params": {
"id": "task-7af3",
"pushNotificationConfig": {
"url": "https://caller.example/a2a/webhooks",
"token": "opaque-rotating-token",
"authentication": { "schemes": ["bearer"] }
}
}
}
Worker POSTs status/artifact events to the caller’s webhook with a signed token. Same event shapes as SSE — different transport.
sequenceDiagram
participant A as Caller
participant B as Worker
participant W as Caller webhook
A->>B: tasks/send + pushNotification/set
B-->>A: 200 (task accepted)
Note over B: long-running work...
B->>W: POST status=working
B->>W: POST artifact chunk
B->>W: POST status=completed
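On the receiving end, the webhook has to decide whether a POST really came from the worker before trusting the event. A minimal sketch: exactly how the worker presents the token isn't pinned down here, so I'm assuming it arrives as a Bearer value in the `Authorization` header, and `handle_push` is a hypothetical handler name.

```python
import hmac
import json


def handle_push(headers, raw_body, expected_token):
    """Return the parsed event if the presented bearer token matches, else None."""
    auth = headers.get("Authorization", "")
    scheme, _, presented = auth.partition(" ")
    if scheme.lower() != "bearer":
        return None
    # Constant-time comparison, so response timing doesn't leak the token.
    if not hmac.compare_digest(presented.encode(), expected_token.encode()):
        return None
    return json.loads(raw_body)


event = handle_push(
    {"Authorization": "Bearer opaque-rotating-token"},
    '{"id": "task-7af3", "status": {"state": "working"}}',
    "opaque-rotating-token",
)
rejected = handle_push({"Authorization": "Bearer wrong"}, "{}", "opaque-rotating-token")
```

Because the event shapes match SSE, whatever consumes `event` can be the same code path that handles the stream.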
Auth: who is calling, on whose behalf
A2A is deliberately auth-agnostic — the Agent Card declares which schemes it accepts (OAuth2, bearer JWT, mTLS, etc.). What matters is the convention:
- Agent identity — proves the worker is who it claims to be (TLS cert, signed agent card, registry attestation).
- Caller identity — proves the caller agent’s identity (mTLS or service JWT).
- End-user delegation — proves the caller is acting on behalf of a specific human (typically a downstream OAuth2 token, or signed user-claim JWT carried in the message).
A common pattern in production is OAuth2 token exchange (RFC 8693): the caller’s user token is exchanged for a downstream token scoped only to the worker’s API.
sequenceDiagram
participant U as User
participant A as Caller agent
participant TS as Token service
participant B as Worker agent
U->>A: ask question (with user JWT)
A->>TS: exchange user JWT → scoped JWT for billing-agent
TS-->>A: scoped JWT
A->>B: tasks/send (Bearer scoped JWT)
B->>B: verify JWT, scope=refund:write, sub=user-91
B-->>A: task started
This keeps the worker auditable: every action ties back to a real user, not to a god-mode service account.
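For the exchange step itself, the request is an ordinary form-encoded POST to the token service. The parameter names and URNs below come from RFC 8693; the audience value, the scope, and the placeholder JWT are illustrative, and the sketch only builds the form body without sending it.

```python
from urllib.parse import parse_qs, urlencode

# Identifiers defined by RFC 8693.
TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
JWT_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:jwt"


def token_exchange_form(user_jwt, audience, scope):
    """Build the form body that swaps a user JWT for a narrowly scoped downstream token."""
    return urlencode({
        "grant_type": TOKEN_EXCHANGE_GRANT,
        "subject_token": user_jwt,
        "subject_token_type": JWT_TOKEN_TYPE,
        "audience": audience,  # which worker the new token is for
        "scope": scope,        # the only permission the worker will see
    })


form = token_exchange_form("user-jwt-placeholder", "billing-agent", "refund:write")
parsed = parse_qs(form)  # round-trip to show what the token service receives
```

The caller then sends the returned token as the Bearer credential on `tasks/send`, so the worker only ever sees `refund:write` for that one user.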
Worked example: travel booking across three agents
Setup: a concierge agent talks to a flights agent and a hotels
agent. Each is independently deployed by a different team, each exposes a
single /.well-known/agent.json.
flowchart LR
user[User] --> concierge[Concierge agent]
concierge -. fetch card .-> flights[/.well-known/ flights/]
concierge -. fetch card .-> hotels[/.well-known/ hotels/]
concierge -- A2A: tasks/sendSubscribe<br/>find flights --> flightsA[Flights agent]
concierge -- A2A: tasks/sendSubscribe<br/>find hotels --> hotelsA[Hotels agent]
flightsA -- OpenAPI --> amadeus[(Amadeus API)]
hotelsA -- MCP --> hotelsTools[(internal hotel tools)]
flightsA -- streamed artifacts --> concierge
hotelsA -- streamed artifacts --> concierge
concierge --> user
Sequence under the hood:
sequenceDiagram
autonumber
participant U as User
participant C as Concierge
participant F as Flights agent
participant H as Hotels agent
U->>C: "Book me Bangalore → Tokyo, 3 nights, under 80k INR"
par flights
C->>F: tasks/sendSubscribe (search criteria)
F-->>C: status=working
F-->>C: artifact "options" chunk 0 (top 3)
F-->>C: status=completed
and hotels
C->>H: tasks/sendSubscribe (city, dates, budget)
H-->>C: status=working
H-->>C: artifact "options" chunk 0
H-->>C: status=input_required (smoking pref?)
C->>H: tasks/send (clarification: non-smoking)
H-->>C: artifact "options" chunk 1
H-->>C: status=completed
end
C-->>U: combined plan + structured booking refs
U->>C: "go ahead with option 2"
C->>F: tasks/send (book flight ref)
C->>H: tasks/send (book hotel ref)
F-->>C: status=completed (PNR)
H-->>C: status=completed (booking id)
C-->>U: confirmations + receipts
Three things to notice:
- The concierge fans out in parallel because each call is just an HTTP POST that returns a task id.
- The hotels agent legitimately needs a clarification mid-task; A2A handles that on the same task id, without dropping the stream or starting a new task.
- The final booking is a separate tasks/send against the same agents, reusing the option ids the agents returned earlier as DataParts.
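The fan-out itself is easy to sketch with asyncio. Everything here is a stand-in: `fake_agent` simulates an SSE stream as an async generator, and `answer` just records that a clarification was sent where real code would issue a tasks/send with the same task id.

```python
import asyncio


async def fake_agent(name, needs_clarification=False):
    """Stand-in for a remote agent: yields (kind, payload) pairs like an SSE stream."""
    yield ("status", "working")
    yield ("artifact", f"{name}-option-1")
    if needs_clarification:
        yield ("status", "input_required")
        yield ("status", "working")
        yield ("artifact", f"{name}-option-2")
    yield ("status", "completed")


async def run_task(name, stream, answer_clarification):
    """Consume one agent's stream, answering clarifications as they come up."""
    options, state = [], "submitted"
    async for kind, payload in stream:
        if kind == "artifact":
            options.append(payload)
        else:
            state = payload
            if state == "input_required":
                await answer_clarification(name)
    return {"agent": name, "state": state, "options": options}


async def main():
    answered = []

    async def answer(name):
        # Real code would send tasks/send with the same task id here.
        answered.append(name)

    # Both searches run concurrently; neither waits on the other.
    flights, hotels = await asyncio.gather(
        run_task("flights", fake_agent("flights"), answer),
        run_task("hotels", fake_agent("hotels", needs_clarification=True), answer),
    )
    return flights, hotels, answered


flights, hotels, answered = asyncio.run(main())
```

The hotels clarification happens inside its own coroutine, so the flights search is never blocked by it.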
How A2A composes with MCP under the hood
When the flights agent receives a task, it isn’t doing magic — internally it runs an LLM with MCP tools.
flowchart LR
C[Concierge<br/>via A2A] --> F[Flights agent HTTP server]
F --> Frunner[Agent runtime<br/>LLM + planner]
Frunner -- MCP --> tF1[search_flights tool]
Frunner -- MCP --> tF2[price_quote tool]
Frunner -- MCP --> tF3[book_pnr tool]
tF1 -- OpenAPI --> amadeus[(Amadeus REST)]
tF3 -- OpenAPI --> amadeus
Frunner -. artifacts/status .- F
F -- A2A SSE --> C
So the outside world sees A2A. The inside of each agent uses MCP to let its model use tools, and most of those tools wrap OpenAPI calls. All three layers cohabit cleanly.
Failure modes worth designing for
| failure | symptom | mitigation |
|---|---|---|
| Worker crashes mid-task | SSE drops, status frozen at working | Caller polls tasks/get; worker resumes from durable task store |
| Caller forgets task id | Orphaned task on worker | Tasks have a TTL; workers garbage-collect after N hours |
| Clarification storm | input_required ↔ working loops | Cap turns per task in the runtime; degrade to “best effort” artifact |
| Auth token expiry on long task | 401 on next interaction | Rotate via push notification config; refresh on input_required |
| Adversarial agent card | Skill description coaxes caller’s LLM into prompt injection | Pin allowed agents per registry; sanitize card text before showing it to the planner |
| Cost runaway | Worker chains to many other agents | Carry a budget hint in the task message; workers self-cap |
The last two are A2A-specific gotchas that don’t exist in pure REST: agent cards are prose that ends up in another agent’s prompt, and chained calls multiply cost in ways a single OpenAPI client never does.
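As one example of these mitigations, the clarification cap fits in a few lines of worker-side runtime code. This is a sketch under my own assumptions: the cap of 3 and the names `TurnBudget` and `next_state` are hypothetical, and "degrade" here simply means moving to `completed` with whatever best-effort artifact the worker has.

```python
MAX_CLARIFICATIONS = 3  # hypothetical default; tune per agent


class TurnBudget:
    """Counts how many times a task has gone back to the caller for input."""

    def __init__(self, max_turns=MAX_CLARIFICATIONS):
        self.max_turns = max_turns
        self.used = 0

    def allow_clarification(self):
        """Consume one turn and return True, or return False once the cap is hit."""
        if self.used >= self.max_turns:
            return False
        self.used += 1
        return True


def next_state(budget):
    """State to emit when the worker would like to ask another question."""
    if budget.allow_clarification():
        return "input_required"
    return "completed"  # out of turns: ship a best-effort artifact instead


budget = TurnBudget(max_turns=2)
states = [next_state(budget) for _ in range(3)]
```

The same counter shape works for the budget hint: decrement on every downstream call and stop chaining when it reaches zero.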
A2A vs. MCP — when to pick which
You’ll often face the design choice: should this thing be an MCP tool of my agent, or a separate A2A agent?
Rough rules I use:
- Make it an MCP tool when: it’s stateless, cheap, owned by your team, and the model just needs a function call. (get_weather, query_db, send_email.)
- Make it an A2A agent when: it has its own planning, its own tools, its own SLA, or it’s owned by another team / vendor. (billing-agent, flights-agent, legal-review-agent.)
If you find yourself wanting to give an MCP tool a system prompt, history, and the ability to call other tools — congratulations, it’s actually an A2A agent.
What I’d build next
- A small A2A registry that signs agent cards and exposes a search API (skills, tags, latency stats). Without this, discovery is just “we emailed each other URLs”.
- Schema-pinned skills. Today skills[].examples are prose. In practice you want a JSON schema for inputs/outputs per skill, so the caller’s planner doesn’t have to guess from natural language.
- Cost & latency telemetry baked into the task envelope. Every artifact knows what it cost; every status update knows how long it took.
A2A is young, but the shape feels right: small protocol, big composability, and it slots cleanly above the OpenAPI + MCP stack everyone is already building. The next post in this series goes deep on MCP itself — the tool side of the same picture.