A2A — building agent ecosystems that talk to each other (with OpenAPI + MCP)

If 2024 was “give the LLM a tool”, and 2025 was “give the LLM many tools through MCP”, then 2026 is “let the LLM call other LLMs that have their own tools”. That’s the lane A2A (Agent-to-Agent) is built for.

This post is the protocol-level tour I wish I had when I started wiring agents together: what A2A actually is, how it differs from OpenAPI and MCP, the message shapes on the wire, and a worked example. I’ll keep it implementation-oriented: diagrams over hand-waving.

TL;DR — OpenAPI describes HTTP endpoints. MCP standardises the tool surface an agent exposes to its own model. A2A standardises how one agent talks to another agent as a peer, not as a function.


Why another protocol?

Three problems pile up the moment you have more than one agent:

  1. Discovery. How does Agent A even know Agent B exists, what it can do, what it costs, and whether it’s online?
  2. Delegation. A wants B to do something — multi-turn, possibly long-running, possibly streaming — not just answer a single function call.
  3. Trust + identity. Who is calling whom, on whose behalf, with what scopes?

You can fake all three with REST + ad-hoc JSON, but you’ll reinvent the same five things in every codebase: agent cards, task IDs, streaming envelopes, artifact handoff, and auth. A2A standardises that handshake so agents from different orgs can interoperate the way two SaaS apps interoperate over HTTPS.

flowchart LR
  subgraph Without["Without A2A"]
    A1[Agent A]
    B1[Agent B]
    C1[Agent C]
    A1 -- bespoke JSON --> B1
    A1 -- different JSON --> C1
    B1 -- yet another JSON --> C1
  end
  subgraph With["With A2A"]
    A2[Agent A]
    B2[Agent B]
    C2[Agent C]
    R[(A2A registry / well-known)]
    A2 -- A2A --> R
    B2 -- A2A --> R
    C2 -- A2A --> R
    A2 <-- A2A tasks --> B2
    A2 <-- A2A tasks --> C2
    B2 <-- A2A tasks --> C2
  end

OpenAPI vs. MCP vs. A2A — the one-line version

| layer | who talks to whom | shape of the unit | lifecycle |
|---|---|---|---|
| OpenAPI | client ↔ HTTP server | request / response | per-call, stateless |
| MCP | LLM ↔ its own toolbox | tool call / resource read | per-call, scoped to one model session |
| A2A | agent ↔ agent | task (multi-turn, streamable) | long-lived, with status transitions |

flowchart TB
  user[User] --> agentA[Agent A]
  agentA -- MCP --> tools[(Local tools<br/>fs, db, search)]
  agentA -- A2A --> agentB[Agent B<br/>e.g. Billing agent]
  agentB -- MCP --> btools[(Billing tools)]
  agentB -- OpenAPI --> stripe[(Stripe REST)]
  agentA -- OpenAPI --> intern[(Internal REST)]

Read it as: OpenAPI is between code and a service. MCP is between a model and its tools. A2A is between two agents that each already have their own model + tools.

They compose. A2A doesn’t replace OpenAPI or MCP — it sits above them.


The Agent Card (discovery)

Every A2A-speaking agent publishes an Agent Card at a well-known URL — typically /.well-known/agent.json. It’s the agent equivalent of an OpenAPI document.

{
  "name": "billing-agent",
  "description": "Handles invoices, refunds, and dunning for Acme Corp.",
  "version": "1.4.0",
  "url": "https://billing.acme.example/a2a",
  "capabilities": {
    "streaming": true,
    "pushNotifications": true,
    "stateTransitionHistory": true
  },
  "authentication": {
    "schemes": ["oauth2", "bearer"]
  },
  "defaultInputModes":  ["text", "data"],
  "defaultOutputModes": ["text", "data", "file"],
  "skills": [
    {
      "id": "issue_refund",
      "name": "Issue refund",
      "description": "Refund a customer order partially or in full.",
      "tags": ["payments"],
      "examples": [
        "Refund order #1042 for $19.99",
        "Full refund for the last invoice on customer cus_42"
      ]
    },
    {
      "id": "lookup_invoice",
      "name": "Look up invoice",
      "description": "Find invoices by id, customer, or date range."
    }
  ]
}

The card is intentionally lightweight: name, what it can do (skills), how to talk to it (url, defaultInputModes), how to authenticate, and which extras it supports (streaming, pushNotifications).

A calling agent fetches the card, picks a skill, and opens a task.
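Here is a minimal discovery sketch in Python. It relies only on the card fields shown above (skills[].tags and capabilities), and the helper names are hypothetical. A production client would also cache the card and verify its origin or signature.

```python
import json
import urllib.request
from urllib.parse import urljoin

# Card location per the well-known convention described above.
WELL_KNOWN_PATH = "/.well-known/agent.json"


def card_url(agent_origin: str) -> str:
    """Resolve the Agent Card URL; an absolute path replaces any path on the origin."""
    return urljoin(agent_origin, WELL_KNOWN_PATH)


def fetch_card(agent_origin: str, timeout: float = 5.0) -> dict:
    """GET and parse the Agent Card (no caching, signature checks, or retries)."""
    with urllib.request.urlopen(card_url(agent_origin), timeout=timeout) as resp:
        return json.load(resp)


def pick_skill(card: dict, tag: str):
    """Return the first skill tagged `tag`, or None if the agent has none."""
    for skill in card.get("skills", []):
        if tag in skill.get("tags", []):
            return skill
    return None


def supports(card: dict, capability: str) -> bool:
    """Treat a missing capability flag as 'not supported'."""
    return bool(card.get("capabilities", {}).get(capability, False))
```

With the billing card above, pick_skill(card, "payments") returns the issue_refund skill. supports(card, "streaming") is true, so the caller can open the task with tasks/sendSubscribe rather than tasks/send.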


The unit of work: a Task

A2A’s central abstraction is the task, not the request. Tasks have IDs, state, history, and (optionally) artifacts.

stateDiagram-v2
  [*] --> submitted
  submitted --> working: agent picks it up
  working --> input_required: needs a clarification
  input_required --> working: user/agent replies
  working --> completed: success + artifacts
  working --> failed: unrecoverable error
  working --> canceled: caller canceled
  completed --> [*]
  failed --> [*]
  canceled --> [*]

Why a state machine instead of req/res? Because real agent work isn’t synchronous. A “summarise this 200-page PDF and email Janet” task can run for minutes, ask a clarifying question halfway through, stream partial output, and finally produce a file artifact. Modelling that as a single HTTP call is painful; modelling it as a task with status transitions is natural.
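The diagram translates directly into a transition table. This sketch mirrors only the edges drawn above. The spec also defines an unknown state, and newer revisions add more states.

The sketch uses the wire spelling input-required from the A2A JSON schema rather than the diagram's input_required. Task and TaskState are illustrative names, not an SDK API.

```python
from enum import Enum


class TaskState(str, Enum):
    # Wire values as spelled in the A2A JSON schema ("input-required", not "input_required").
    SUBMITTED = "submitted"
    WORKING = "working"
    INPUT_REQUIRED = "input-required"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELED = "canceled"


# Edges from the state diagram above; terminal states have no outgoing edges.
TRANSITIONS = {
    TaskState.SUBMITTED: {TaskState.WORKING},
    TaskState.WORKING: {
        TaskState.INPUT_REQUIRED,
        TaskState.COMPLETED,
        TaskState.FAILED,
        TaskState.CANCELED,
    },
    TaskState.INPUT_REQUIRED: {TaskState.WORKING},
    TaskState.COMPLETED: set(),
    TaskState.FAILED: set(),
    TaskState.CANCELED: set(),
}
TERMINAL = {state for state, targets in TRANSITIONS.items() if not targets}


class Task:
    """Worker-side task record that refuses transitions the diagram doesn't allow."""

    def __init__(self, task_id: str):
        self.id = task_id
        self.state = TaskState.SUBMITTED
        self.history = [self.state]

    def transition(self, new_state: TaskState) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.value} -> {new_state.value}")
        self.state = new_state
        self.history.append(new_state)

    @property
    def done(self) -> bool:
        return self.state in TERMINAL
```

Rejecting illegal transitions on the worker side is cheap insurance. A completed task should never drift back to working just because a late clarification arrived.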


The wire format

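A2A is JSON-RPC 2.0 over HTTPS, with optional Server-Sent Events (SSE) for streaming. The core methods below use the names from the original (v0.1) spec. Later revisions renamed tasks/send and tasks/sendSubscribe to message/send and message/stream, so check which version a peer speaks.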

| method | purpose |
|---|---|
| tasks/send | submit a new task (synchronous reply) |
| tasks/sendSubscribe | submit a task and stream events (SSE) |
| tasks/get | poll a task’s current state |
| tasks/cancel | cancel a running task |
| tasks/pushNotification/set | register a webhook for state changes |

A minimal tasks/send:

{
  "jsonrpc": "2.0",
  "id": "req-1",
  "method": "tasks/send",
  "params": {
    "id": "task-7af3",
    "sessionId": "sess-91",
    "message": {
      "role": "user",
      "parts": [
        { "type": "text", "text": "Refund order 1042 in full and email the customer." }
      ]
    },
    "acceptedOutputModes": ["text", "data"]
  }
}

Note the message.parts — A2A messages are multi-part, like email. A single message can carry text, structured data, and file references. The common part types:

  • TextPart: plain text or markdown
  • DataPart: arbitrary JSON (often the structured args/results)
  • FilePart: { name, mimeType, bytes | uri }, with the file either inlined as base64 bytes or referenced by URI
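Building that request by hand is mostly dictionary plumbing. Below is a sketch of part constructors plus a tasks/send envelope. It assumes the v0.1 field names used in this post (type, mimeType, sessionId), and the helper names are hypothetical.

```python
import uuid


def text_part(text: str) -> dict:
    return {"type": "text", "text": text}


def data_part(data: dict) -> dict:
    return {"type": "data", "data": data}


def file_part(name: str, mime_type: str, uri: str) -> dict:
    # A FilePart holds either base64 "bytes" or a "uri"; this helper only does the uri form.
    return {"type": "file", "file": {"name": name, "mimeType": mime_type, "uri": uri}}


def tasks_send(task_id: str, session_id: str, parts: list, accepted=("text", "data")) -> dict:
    """Wrap a user message in a JSON-RPC 2.0 tasks/send request."""
    return {
        "jsonrpc": "2.0",
        "id": f"req-{uuid.uuid4().hex[:8]}",  # JSON-RPC request id, distinct from the task id
        "method": "tasks/send",
        "params": {
            "id": task_id,
            "sessionId": session_id,
            "message": {"role": "user", "parts": parts},
            "acceptedOutputModes": list(accepted),
        },
    }
```

Consider tasks_send("task-7af3", "sess-91", [text_part("Refund order 1042 in full and email the customer."), data_part({"orderId": 1042})]). It produces a request shaped like the one above, with an extra structured DataPart. The orderId field is invented for illustration.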

A reply on success:

{
  "jsonrpc": "2.0",
  "id": "req-1",
  "result": {
    "id": "task-7af3",
    "status": { "state": "completed", "timestamp": "2026-05-15T10:14:02Z" },
    "artifacts": [
      {
        "name": "refund_receipt.pdf",
        "parts": [
          { "type": "file", "file": { "name": "receipt.pdf", "mimeType": "application/pdf", "uri": "https://billing.acme.example/files/r-7af3.pdf" } }
        ]
      },
      {
        "name": "summary",
        "parts": [
          { "type": "text", "text": "Refund of $19.99 issued. Email sent to alex@example.com." },
          { "type": "data", "data": { "refundId": "re_abc", "amount": 1999, "currency": "usd" } }
        ]
      }
    ]
  }
}

The shape that matters has three parts:

- status: where the task sits in the state machine.
- history: an optional list of the messages exchanged, omitted in the example above.
- artifacts: the outputs, each itself multi-part.

This is what lets the calling agent show a human-readable summary and also programmatically use the structured refundId.
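On the calling side, that split between prose and structure is easy to exploit. The sketch below pulls four things out of a tasks/send response: the final state, the text, the DataPart payloads, and any file URIs. It also surfaces a JSON-RPC error object as an exception. summarize_reply is a hypothetical helper.

```python
def parts_of_type(result: dict, part_type: str):
    """Yield every part of `part_type` across all artifacts of a task result."""
    for artifact in result.get("artifacts", []):
        for part in artifact.get("parts", []):
            if part.get("type") == part_type:
                yield part


def summarize_reply(response: dict):
    """Return (state, text, data payloads, file URIs) from a tasks/send response."""
    if "error" in response:
        err = response["error"]  # JSON-RPC error object: code + message
        raise RuntimeError(f"A2A error {err.get('code')}: {err.get('message')}")
    result = response["result"]
    state = result["status"]["state"]
    text = "\n".join(p["text"] for p in parts_of_type(result, "text"))
    data = [p["data"] for p in parts_of_type(result, "data")]
    files = [p["file"]["uri"] for p in parts_of_type(result, "file") if "uri" in p["file"]]
    return state, text, data, files
```

For the refund reply above, the planner can show the text to the user and keep data[0]["refundId"] for a follow-up call.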


Streaming with tasks/sendSubscribe

For long-running or chatty tasks, the caller subscribes via SSE and receives two kinds of events:

  • TaskStatusUpdateEvent — state transitioned (e.g. working → input_required)
  • TaskArtifactUpdateEvent — a chunk of an artifact arrived (with index and append: true so the caller can reassemble streamed text)

sequenceDiagram
  autonumber
  participant A as Caller agent
  participant B as Worker agent
  A->>B: POST tasks/sendSubscribe (task-7af3)
  B-->>A: SSE: status=submitted
  B-->>A: SSE: status=working
  B-->>A: SSE: artifact "summary" chunk 0 (text, append:true)
  B-->>A: SSE: artifact "summary" chunk 1 (text, append:true)
  B-->>A: SSE: status=input_required
  A->>B: tasks/send (same task id, message=clarification)
  B-->>A: SSE: status=working
  B-->>A: SSE: artifact "summary" chunk 2 (text, append:true)
  B-->>A: SSE: status=completed

Two design wins worth calling out:

  1. Same task id is reused for clarifications. The “conversation” lives on the task, not on top of it.
  2. Artifacts can be appended in chunks, so token-by-token streaming works without changing the artifact model.
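Reassembly on the caller is a small fold over the event stream. This sketch assumes each SSE data: line carries a JSON-RPC response whose result is one of two things:

- a status update, with status;
- an artifact update, with artifact.index and artifact.append.

It handles only text parts and ignores SSE event:/id: fields. The function and class names are hypothetical.

```python
import json


def iter_sse_data(lines):
    """Minimal SSE reader: yields one payload per event, joining its data lines with newlines.

    Ignores event:, id:, retry: and comment lines, which this sketch doesn't need.
    """
    buf = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            buf.append(value[1:] if value.startswith(" ") else value)  # drop one leading space
        elif line == "" and buf:
            yield "\n".join(buf)  # a blank line terminates the event
            buf = []
    if buf:
        yield "\n".join(buf)


class StreamAssembler:
    """Folds status and artifact update events into the task's current view."""

    def __init__(self):
        self.state = None
        self.artifacts = {}  # artifact index -> {"name": ..., "text": ...}

    def apply(self, event: dict) -> None:
        if "status" in event:
            self.state = event["status"]["state"]
        elif "artifact" in event:
            art = event["artifact"]
            text = "".join(p["text"] for p in art.get("parts", []) if p.get("type") == "text")
            slot = self.artifacts.setdefault(art.get("index", 0), {"name": art.get("name"), "text": ""})
            # append: true extends the artifact; otherwise the chunk replaces it.
            slot["text"] = slot["text"] + text if art.get("append") else text


def consume(lines) -> StreamAssembler:
    asm = StreamAssembler()
    for payload in iter_sse_data(lines):
        msg = json.loads(payload)  # each SSE event carries a JSON-RPC response
        asm.apply(msg["result"])
    return asm
```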

Push notifications (no long-lived connection)

Streaming via SSE is fine when the caller stays online. For agent → agent calls where the caller is itself behind a load balancer or runs on a queue, A2A defines tasks/pushNotification/set:

{
  "method": "tasks/pushNotification/set",
  "params": {
    "id": "task-7af3",
    "pushNotificationConfig": {
      "url": "https://caller.example/a2a/webhooks",
      "token": "opaque-rotating-token",
      "authentication": { "schemes": ["bearer"] }
    }
  }
}

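The worker then POSTs status and artifact events to the caller’s webhook. It presents the registered token (or a signed JWT, depending on the declared scheme) so the caller can verify the sender. Same event shapes as SSE, different transport.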

sequenceDiagram
  participant A as Caller
  participant B as Worker
  participant W as Caller webhook
  A->>B: tasks/send + pushNotification/set
  B-->>A: 200 (task accepted)
  Note over B: long-running work...
  B->>W: POST status=working
  B->>W: POST artifact chunk
  B->>W: POST status=completed
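On the caller’s side, the webhook should reject anything that doesn’t present the token it registered. Here is a framework-free sketch, with these assumptions:

- The header name is assumed; check the spec revision you target for how the token is echoed.
- The event is assumed to carry the task id under id.

hmac.compare_digest keeps the token comparison constant-time.

```python
import hmac
import json

# Hypothetical header carrying the token registered via tasks/pushNotification/set.
TOKEN_HEADER = "x-a2a-notification-token"


class WebhookReceiver:
    def __init__(self, registered_tokens: dict):
        self.registered_tokens = registered_tokens  # task id -> token we registered
        self.events = []

    def handle(self, headers: dict, body: bytes) -> int:
        """Process one POST from a worker and return the HTTP status to send back."""
        try:
            event = json.loads(body)
        except ValueError:  # covers JSONDecodeError and undecodable bytes
            return 400
        if not isinstance(event, dict) or not isinstance(event.get("id"), str):
            return 400
        expected = self.registered_tokens.get(event["id"])
        # HTTP header names are case-insensitive; normalise before lookup.
        presented = {k.lower(): v for k, v in headers.items()}.get(TOKEN_HEADER, "")
        if expected is None or not hmac.compare_digest(presented.encode(), expected.encode()):
            return 401  # unknown task or wrong token
        self.events.append(event)
        return 200
```

Verifying a signed JWT instead of an opaque token would replace the compare_digest check with a signature check against the worker’s published key.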

Auth: who is calling, on whose behalf

A2A is deliberately auth-agnostic — the Agent Card declares which schemes it accepts (OAuth2, bearer JWT, mTLS, etc.). What matters is the convention:

  • Agent identity — proves the worker is who it claims to be (TLS cert, signed agent card, registry attestation).
  • Caller identity — proves the caller agent’s identity (mTLS or service JWT).
  • End-user delegation — proves the caller is acting on behalf of a specific human (typically a downstream OAuth2 token, or signed user-claim JWT carried in the message).

A common pattern in production is OAuth2 token exchange (RFC 8693): the caller’s user token is exchanged for a downstream token scoped only to the worker’s API.

sequenceDiagram
  participant U as User
  participant A as Caller agent
  participant TS as Token service
  participant B as Worker agent
  U->>A: ask question (with user JWT)
  A->>TS: exchange user JWT → scoped JWT for billing-agent
  TS-->>A: scoped JWT
  A->>B: tasks/send (Bearer scoped JWT)
  B->>B: verify JWT, scope=refund:write, sub=user-91
  B-->>A: task started

This keeps the worker auditable: every action ties back to a real user, not to a god-mode service account.
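Both halves of that flow fit in a few lines. The sketch makes two assumptions:

- The token service accepts a standard RFC 8693 form POST.
- The worker has already verified the JWT signature before checking claims.

The audience and scope strings mirror the diagram, and the function names are hypothetical.

```python
from urllib.parse import urlencode

# Identifiers defined by RFC 8693 (OAuth 2.0 Token Exchange).
TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
JWT_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:jwt"


def token_exchange_body(user_jwt: str, audience: str, scope: str) -> str:
    """Form-encoded body the caller POSTs to the token service."""
    return urlencode({
        "grant_type": TOKEN_EXCHANGE_GRANT,
        "subject_token": user_jwt,
        "subject_token_type": JWT_TOKEN_TYPE,
        "audience": audience,
        "scope": scope,
    })


def authorize(claims: dict, required_scope: str, expected_audience: str) -> str:
    """Worker-side check on signature-verified claims; returns the end user's id."""
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if expected_audience not in audiences:
        raise PermissionError("token was not minted for this agent")
    if required_scope not in claims.get("scope", "").split():  # space-delimited scopes
        raise PermissionError(f"missing scope {required_scope}")
    return claims["sub"]
```

Returning sub from authorize gives the worker the user id to stamp on every audit record, which is the point of the exchange.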


Worked example: travel booking across three agents

Setup: a concierge agent talks to a flights agent and a hotels agent. Each is deployed independently by a different team and exposes its own /.well-known/agent.json.

flowchart LR
  user[User] --> concierge[Concierge agent]
  concierge -. fetch card .-> flights[/.well-known/ flights/]
  concierge -. fetch card .-> hotels[/.well-known/ hotels/]
  concierge -- A2A: tasks/sendSubscribe<br/>find flights --> flightsA[Flights agent]
  concierge -- A2A: tasks/sendSubscribe<br/>find hotels --> hotelsA[Hotels agent]
  flightsA -- OpenAPI --> amadeus[(Amadeus API)]
  hotelsA -- MCP --> hotelsTools[(internal hotel tools)]
  flightsA -- streamed artifacts --> concierge
  hotelsA -- streamed artifacts --> concierge
  concierge --> user

Sequence under the hood:

sequenceDiagram
  autonumber
  participant U as User
  participant C as Concierge
  participant F as Flights agent
  participant H as Hotels agent
  U->>C: "Book me Bangalore → Tokyo, 3 nights, under 80k INR"
  par flights
    C->>F: tasks/sendSubscribe (search criteria)
    F-->>C: status=working
    F-->>C: artifact "options" chunk 0 (top 3)
    F-->>C: status=completed
  and hotels
    C->>H: tasks/sendSubscribe (city, dates, budget)
    H-->>C: status=working
    H-->>C: artifact "options" chunk 0
    H-->>C: status=input_required (smoking pref?)
    C->>H: tasks/send (clarification: non-smoking)
    H-->>C: artifact "options" chunk 1
    H-->>C: status=completed
  end
  C-->>U: combined plan + structured booking refs
  U->>C: "go ahead with option 2"
  C->>F: tasks/send (book flight ref)
  C->>H: tasks/send (book hotel ref)
  F-->>C: status=completed (PNR)
  H-->>C: status=completed (booking id)
  C-->>U: confirmations + receipts

Three things to notice:

  • The concierge fans out in parallel because each call is an independent HTTP POST tied to its own task id.
  • The hotels agent legitimately needs a clarification mid-task; A2A handles that on the same task id, without abandoning the stream or starting a new task.
  • The final booking is a separate tasks/send against the same agents, reusing the option ids the agents returned earlier as DataParts.
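The fan-out and the mid-task clarification can be sketched with asyncio. The transport is abstracted as a send(agent, params) coroutine. fake_send is a stand-in that makes the hotels agent ask one question; a real client would POST tasks/send or tasks/sendSubscribe instead. All names here are hypothetical.

```python
import asyncio


def user_msg(text: str) -> dict:
    return {"role": "user", "parts": [{"type": "text", "text": text}]}


async def run_task(agent, task_id, message, send, answer, max_turns=3):
    """Send a task and keep answering input-required turns on the same task id."""
    task = await send(agent, {"id": task_id, "message": message})
    turns = 0
    while task["status"]["state"] == "input-required":
        turns += 1
        if turns > max_turns:
            raise RuntimeError(f"{agent}: gave up after {max_turns} clarifications")
        question = task["status"].get("message")
        task = await send(agent, {"id": task_id, "message": answer(agent, question)})
    return task


async def plan_trip(send, answer):
    # Both tasks run concurrently; gather returns results in argument order.
    return await asyncio.gather(
        run_task("flights", "task-f1", user_msg("Bangalore to Tokyo, under 80k INR total"), send, answer),
        run_task("hotels", "task-h1", user_msg("Tokyo, 3 nights, under 80k INR total"), send, answer),
    )


async def fake_send(agent: str, params: dict) -> dict:
    """Stand-in for POSTing tasks/send to `agent`; the hotels agent asks one question."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    text = " ".join(p.get("text", "") for p in params["message"]["parts"])
    if agent == "hotels" and "smoking" not in text:
        question = {"role": "agent", "parts": [{"type": "text", "text": "Smoking preference?"}]}
        return {"id": params["id"], "status": {"state": "input-required", "message": question}}
    return {
        "id": params["id"],
        "status": {"state": "completed"},
        "artifacts": [{"name": "options", "parts": [{"type": "data", "data": {"from": agent}}]}],
    }
```

Calling asyncio.run(plan_trip(fake_send, lambda agent, q: user_msg("non-smoking"))) returns two completed tasks. The hotels task needed one extra round trip, still under task-h1.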

How A2A composes with MCP under the hood

When the flights agent receives a task, it isn’t doing magic — internally it runs an LLM with MCP tools.

flowchart LR
  C[Concierge<br/>via A2A] --> F[Flights agent HTTP server]
  F --> Frunner[Agent runtime<br/>LLM + planner]
  Frunner -- MCP --> tF1[search_flights tool]
  Frunner -- MCP --> tF2[price_quote tool]
  Frunner -- MCP --> tF3[book_pnr tool]
  tF1 -- OpenAPI --> amadeus[(Amadeus REST)]
  tF3 -- OpenAPI --> amadeus
  Frunner -. artifacts/status .- F
  F -- A2A SSE --> C

So the outside world sees A2A. The inside of each agent uses MCP to let its model use tools, and most of those tools wrap OpenAPI calls. All three layers cohabit cleanly.


Failure modes worth designing for

| failure | symptom | mitigation |
|---|---|---|
| Worker crashes mid-task | SSE drops, status frozen at working | Caller polls tasks/get; worker resumes from durable task store |
| Caller forgets task id | Orphaned task on worker | Tasks have a TTL; workers garbage-collect after N hours |
| Clarification storm | input_required ↔ working loops | Cap turns per task in the runtime; degrade to “best effort” artifact |
| Auth token expiry on long task | 401 on next interaction | Rotate via push notification config; refresh on input_required |
| Adversarial agent card | Skill description smuggles a prompt injection into the caller’s LLM | Pin allowed agents per registry; sanitize card text before showing it to the planner |
| Cost runaway | Worker chains to many other agents | Carry a budget hint in the task message; workers self-cap |

The last two are A2A-specific gotchas that don’t exist in pure REST: agent cards are prose that ends up in another agent’s prompt, and chained calls multiply cost in ways a single OpenAPI client never does.
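Several mitigations in that table reduce to per-task counters that the worker runtime checks on every turn. Here is a sketch with illustrative defaults. It leaves open how the budget hint travels (say, as a field in a DataPart), and TaskGuard is a hypothetical name.

```python
import time
from dataclasses import dataclass, field


@dataclass
class TaskGuard:
    """Per-task limits a worker runtime could enforce; all thresholds are illustrative."""
    max_clarifications: int = 3
    budget_usd: float = 1.00        # e.g. taken from a budget hint in the task message
    ttl_seconds: float = 6 * 3600   # garbage-collect tasks nobody came back for
    clarifications: int = 0
    spent_usd: float = 0.0
    created_at: float = field(default_factory=time.monotonic)

    def allow_clarification(self) -> bool:
        """Count one input-required round; False means stop asking."""
        self.clarifications += 1
        return self.clarifications <= self.max_clarifications

    def charge(self, cost_usd: float) -> bool:
        """Record spend (model calls, downstream agents); False means over budget."""
        self.spent_usd += cost_usd
        return self.spent_usd <= self.budget_usd

    def expired(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        return now - self.created_at > self.ttl_seconds
```

When allow_clarification() or charge() returns False, the runtime stops planning and emits whatever artifact it has as a best-effort result. A periodic sweep deletes tasks for which expired() is true.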


A2A vs. MCP — when to pick which

You’ll often face the design choice: should this thing be an MCP tool of my agent, or a separate A2A agent?

Rough rules I use:

  • Make it an MCP tool when: it’s stateless, cheap, owned by your team, and the model just needs a function call. (get_weather, query_db, send_email.)
  • Make it an A2A agent when: it has its own planning, its own tools, its own SLA, or it’s owned by another team / vendor. (billing-agent, flights-agent, legal-review-agent.)

If you find yourself wanting to give an MCP tool a system prompt, history, and the ability to call other tools — congratulations, it’s actually an A2A agent.


What I’d build next

  • A small A2A registry that signs agent cards and exposes a search API (skills, tags, latency stats). Without this, discovery is just “we emailed each other URLs”.
  • Schema-pinned skills. Today skills[].examples are prose. In practice you want a JSON schema for inputs/outputs per skill, so the caller’s planner doesn’t have to guess from natural language.
  • Cost & latency telemetry baked into the task envelope. Every artifact knows what it cost; every status update knows how long it took.

A2A is young, but the shape feels right: small protocol, big composability, and it slots cleanly above the OpenAPI + MCP stack everyone is already building. The next post in this series goes deep on MCP itself — the tool side of the same picture.