MCP — the USB-C of LLM tools (servers, transports, and patterns that work)

If you’ve used a coding assistant in the last year, you’ve probably used MCP, even if you didn’t know it. It’s the protocol that lets the editor’s LLM see your filesystem, run your search index, hit your database, and talk to your issue tracker without each integration being bespoke glue.

This post is the protocol-level walkthrough I keep linking people to: what MCP actually is on the wire, what an MCP server has to implement, the three primitives (tools / resources / prompts), the two transports (stdio and Streamable HTTP), capability negotiation, and the patterns that survive contact with real users.

Companion post: A2A — building agent ecosystems. A2A is how agents talk to each other. MCP is how a single agent talks to its tools.

TL;DR — MCP is JSON-RPC 2.0 between an LLM host (the editor / agent) and an MCP server (your tool bundle). It standardises three things: callable tools, readable resources, and parameterised prompts — plus a small dance for capability negotiation and (optionally) letting the server ask the host’s model to do work.

The mental model

flowchart LR
  subgraph Host["Host (editor / agent runtime)"]
    LLM[LLM]
    Client[MCP client]
    LLM <--> Client
  end
  subgraph Servers["MCP servers (separate processes)"]
    S1[fs server<br/>tools: read, write, glob<br/>resources: file://...]
    S2[github server<br/>tools: list_prs, comment<br/>resources: gh://...]
    S3[postgres server<br/>tools: query<br/>resources: pg://table/...]
  end
  Client -- stdio --> S1
  Client -- stdio --> S2
  Client -- Streamable HTTP --> S3

Three roles, drawn explicitly because people conflate them constantly:

Host. The application the user sees (Claude Desktop, an editor, an agent runtime). It owns the LLM and the policy around what the LLM is allowed to do.
Client. The piece inside the host that speaks MCP — one client instance per connected server.
Server. A separate process exposing tools, resources, and/or prompts. It does not call the LLM directly (with one exception: sampling, below). It just answers JSON-RPC.

This separation is the whole point. Servers are tiny, single-purpose, and swappable. The host decides which to load and what scopes to grant.

Why MCP exists

Before MCP, every assistant invented its own “tools” interface:

bespoke JSON schemas per vendor,
bespoke auth model,
bespoke streaming format,
no way to discover what a third-party integration could do without reading its docs.

You could ship a stripe-tool for one assistant; making it work in another meant rewriting the adapter. MCP is the boring infrastructure fix: agree on JSON-RPC, agree on three primitive types, and let the ecosystem build sideways.

flowchart LR
  subgraph Before["Before MCP"]
    A[Assistant A] -- adapter A --> S1[Stripe]
    A -- adapter A --> S2[Postgres]
    B[Assistant B] -- adapter B --> S1
    B -- adapter B --> S2
    C[Assistant C] -- adapter C --> S1
    C -- adapter C --> S2
  end
  subgraph After["With MCP"]
    A2[Assistant A] -- MCP --> H[(Any MCP server)]
    B2[Assistant B] -- MCP --> H
    C2[Assistant C] -- MCP --> H
    H --> S3[Stripe]
    H --> S4[Postgres]
    H --> S5[Anything]
  end

The wire: JSON-RPC 2.0

Every MCP message is a JSON-RPC 2.0 envelope. Three shapes:

// Request (expects a response)
{ "jsonrpc": "2.0", "id": 1, "method": "tools/list" }

// Response
{ "jsonrpc": "2.0", "id": 1, "result": { "tools": [...] } }

// Notification (fire-and-forget, no id)
{ "jsonrpc": "2.0", "method": "notifications/tools/list_changed" }

Everything else — capability negotiation, tool listings, tool invocations, resource reads, sampling — is just specific method names on top.
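
To make the three shapes concrete, here is a minimal sketch that sorts a raw message into request, response, or notification using only the fields shown above. The function name `classify` is my own; nothing here comes from an MCP SDK.

```python
import json

def classify(raw: str) -> str:
    """Return 'request', 'response', or 'notification' for one JSON-RPC 2.0 message."""
    msg = json.loads(raw)
    if msg.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 message")
    if "method" in msg:
        # A method with an id expects a reply; without an id it is fire-and-forget.
        return "request" if "id" in msg else "notification"
    if "id" in msg and ("result" in msg) != ("error" in msg):
        # A response carries exactly one of result / error.
        return "response"
    raise ValueError("unrecognised message shape")

print(classify('{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}'))
```

The presence or absence of `id` is the only thing separating a request from a notification, which is why a peer must never reply to a message that has no `id`.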

The handshake

Before any real work, client and server negotiate capabilities. This is what lets the host know “this server has tools and resources but not prompts, and it speaks protocol version 2025-06-18”.

sequenceDiagram
  autonumber
  participant H as Host (client)
  participant S as MCP server
  H->>S: initialize { protocolVersion, capabilities, clientInfo }
  S-->>H: result { protocolVersion, capabilities, serverInfo, instructions? }
  H->>S: notifications/initialized
  Note over H,S: connection ready
  H->>S: tools/list
  S-->>H: { tools: [...] }
  H->>S: resources/list
  S-->>H: { resources: [...] }
  H->>S: prompts/list
  S-->>H: { prompts: [...] }

initialize must be the first request on a connection, and the client should hold other requests until the server has answered it. After notifications/initialized, either side may send notifications, including notifications/tools/list_changed when the server’s tool surface changes (e.g. a plugin loaded), so the host can re-list without polling.
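
The lifecycle is easier to see as a tiny state machine. The sketch below is a server-side illustration, not a real SDK: the `Session` class, the `demo-server` name, and the choice of `-32600` for "not initialized yet" are all assumptions of mine.

```python
SUPPORTED_VERSION = "2025-06-18"

def error(msg_id, code, message):
    return {"jsonrpc": "2.0", "id": msg_id, "error": {"code": code, "message": message}}

class Session:
    """Tracks one connection: new -> initializing -> ready."""

    def __init__(self):
        self.state = "new"

    def handle(self, msg: dict):
        method, msg_id = msg.get("method"), msg.get("id")
        is_request = "id" in msg
        if method == "initialize":
            self.state = "initializing"
            return {"jsonrpc": "2.0", "id": msg_id, "result": {
                "protocolVersion": SUPPORTED_VERSION,
                "capabilities": {"tools": {"listChanged": True}},
                "serverInfo": {"name": "demo-server", "version": "0.1.0"},
            }}
        if method == "notifications/initialized" and self.state == "initializing":
            self.state = "ready"
            return None
        if not is_request:
            return None  # notifications never get a reply
        if self.state != "ready":
            # Error code chosen for illustration only.
            return error(msg_id, -32600, "connection not initialized")
        if method == "tools/list":
            return {"jsonrpc": "2.0", "id": msg_id, "result": {"tools": []}}
        return error(msg_id, -32601, f"Method not found: {method}")
```

A request that arrives before the handshake completes is rejected rather than silently served, which keeps a misbehaving client from skipping negotiation.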

The three primitives

flowchart TB
  subgraph MCP["What an MCP server can expose"]
    T[Tools<br/>model-invoked actions<br/>side-effecting OK]
    R[Resources<br/>app-readable data<br/>files, rows, blobs]
    P[Prompts<br/>parameterised templates<br/>user-invoked workflows]
  end
  T -. examples .- te[create_issue, run_query, send_email]
  R -. examples .- re[file://README.md, gh://repo/owner/issues, pg://table/users]
  P -. examples .- pe["/summarise this PR/, /draft a release note/"]

1. Tools

Tools are functions the model decides to call. Each declares a JSON Schema for its input and (optionally) output:

{
  "name": "create_issue",
  "description": "Open a GitHub issue in the configured repo.",
  "inputSchema": {
    "type": "object",
    "required": ["title"],
    "properties": {
      "title": { "type": "string" },
      "body":  { "type": "string" },
      "labels": { "type": "array", "items": { "type": "string" } }
    }
  }
}

Calling one:

{
  "jsonrpc": "2.0", "id": 7, "method": "tools/call",
  "params": {
    "name": "create_issue",
    "arguments": { "title": "Flaky test in payments", "labels": ["bug"] }
  }
}

Result is content-blocks, like a chat message:

{
  "jsonrpc": "2.0", "id": 7,
  "result": {
    "content": [
      { "type": "text", "text": "Opened #482" },
      { "type": "resource_link", "uri": "gh://repo/acme/app/issues/482", "name": "Issue #482", "mimeType": "application/json" }
    ],
    "isError": false
  }
}

The model sees the text block; the host can also follow the resource link to fetch structured data.
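
On the server side, handling tools/call boils down to "look up, validate, run, wrap". Here is a hedged sketch of that flow: the registry layout and the `create_issue` stand-in are hypothetical, and the validation is a hand-rolled required-keys check rather than full JSON Schema.

```python
def create_issue(args: dict) -> str:
    # Hypothetical stand-in for a real issue-tracker call.
    return f"Opened issue: {args['title']}"

# Tool registry: required argument names plus the function to run.
TOOLS = {"create_issue": {"required": ["title"], "handler": create_issue}}

def text_result(text: str, is_error: bool = False) -> dict:
    return {"content": [{"type": "text", "text": text}], "isError": is_error}

def call_tool(params: dict) -> dict:
    """Build the `result` payload for a tools/call request."""
    name = params.get("name")
    if name not in TOOLS:
        # Unknown tools are a protocol-level problem; the dispatcher would
        # turn this into a JSON-RPC error rather than a tool result.
        raise LookupError(f"unknown tool: {name}")
    tool = TOOLS[name]
    args = params.get("arguments") or {}
    missing = [k for k in tool["required"] if k not in args]
    if missing:
        return text_result(f"Missing required argument(s): {', '.join(missing)}", is_error=True)
    try:
        return text_result(tool["handler"](args))
    except Exception as exc:
        # Execution failures go back to the model as content it can react to.
        return text_result(f"Tool failed: {exc}", is_error=True)
```

Reporting execution failures through `isError` rather than a JSON-RPC error means the model gets to read the message and try again with better arguments.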

2. Resources

Resources are things the host application reads — usually so it can pin them into the conversation as context. They have URIs and mime types; the server lists them and reads them on demand.

// resources/list (paginated)
{ "resources": [
  { "uri": "file:///repo/README.md", "name": "README.md", "mimeType": "text/markdown" },
  { "uri": "pg://table/users",       "name": "users table", "mimeType": "application/json" }
]}

// resources/read
{ "method": "resources/read", "params": { "uri": "file:///repo/README.md" } }

Servers can also publish resource templates with URI placeholders (pg://table/{name}) so the host knows the shape of dynamic URIs.
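
For a feel of how a server might route a concrete URI back to a template, here is a deliberately simplified matcher. It only handles plain `{name}` placeholders that span a single path segment; real URI templates are richer, and the helper name `match_template` is mine.

```python
import re

def match_template(template: str, uri: str):
    """Return placeholder values if `uri` fits `template`, else None."""
    pattern = ""
    for part in re.split(r"(\{\w+\})", template):
        if part.startswith("{") and part.endswith("}"):
            # Each placeholder matches one path segment (no slashes).
            pattern += f"(?P<{part[1:-1]}>[^/]+)"
        else:
            pattern += re.escape(part)
    m = re.fullmatch(pattern, uri)
    return m.groupdict() if m else None

print(match_template("pg://table/{name}", "pg://table/users"))
```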

A subtle but important rule: tools may have side effects, resources must not. A resource read should be idempotent and safe to call repeatedly. This is what lets hosts cache and prefetch them aggressively.

3. Prompts

Prompts are parameterised templates a user explicitly picks (think slash commands). The server lists them; the host shows them in a menu; selecting one returns a fully-formed message sequence the host then sends to its LLM.

// prompts/list
{ "prompts": [
  {
    "name": "summarise_pr",
    "description": "Summarise a pull request for the release notes.",
    "arguments": [{ "name": "pr_number", "required": true }]
  }
]}

// prompts/get
{ "method": "prompts/get",
  "params": { "name": "summarise_pr", "arguments": { "pr_number": "482" } } }

// response
{ "messages": [
  { "role": "user", "content": { "type": "text", "text": "Summarise PR #482..." } }
]}

Tools, resources, prompts — model-driven, app-driven, user-driven. Different trust boundaries, different UX, same underlying server.
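
A prompts/get handler is mostly template filling. The sketch below assumes a hypothetical in-memory `PROMPTS` table and uses `str.format` for substitution; a real server might pull in diffs or other context before building the messages.

```python
PROMPTS = {
    "summarise_pr": {
        "arguments": [{"name": "pr_number", "required": True}],
        # Illustrative template text, not taken from any real server.
        "template": "Summarise PR #{pr_number} for the release notes.",
    }
}

def get_prompt(name: str, arguments: dict) -> dict:
    """Build the `result` payload for a prompts/get request."""
    spec = PROMPTS[name]
    missing = [a["name"] for a in spec["arguments"]
               if a.get("required") and a["name"] not in arguments]
    if missing:
        raise ValueError(f"missing prompt argument(s): {', '.join(missing)}")
    text = spec["template"].format(**arguments)
    return {"messages": [{"role": "user", "content": {"type": "text", "text": text}}]}
```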

Transports

MCP defines two transports. Both carry the same JSON-RPC messages.

stdio (the default for local servers)

flowchart LR
  Host -- spawn --> Proc[Server process]
  Host -- stdin (newline-delimited JSON) --> Proc
  Proc -- stdout (newline-delimited JSON) --> Host
  Proc -- stderr (logs) --> Host

The host spawns the server as a child process and pipes JSON over stdin/stdout. That’s it. No ports, no auth — the OS process boundary is the security boundary. This is what every “local” MCP server uses (filesystem, git, sqlite, etc.).
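
The framing is simple enough to sketch in a few lines. This loop reads one JSON message per line, hands it to a `handle` callback (an assumption of this example, as is the `serve` name), and writes any reply back as a single line. Diagnostics go to the log stream so they never touch the protocol channel.

```python
import io
import json
import sys

def serve(inp, out, handle, log=sys.stderr):
    """Read newline-delimited JSON from `inp`, write replies to `out`."""
    for line in inp:
        line = line.strip()
        if not line:
            continue
        try:
            msg = json.loads(line)
        except json.JSONDecodeError as exc:
            print(f"bad frame: {exc}", file=log)  # logs go to stderr, not stdout
            continue
        reply = handle(msg)
        if reply is not None:
            # json.dumps escapes embedded newlines, so each reply stays on one line.
            out.write(json.dumps(reply) + "\n")
            out.flush()

# In a real server this would be: serve(sys.stdin, sys.stdout, handle)
demo_in = io.StringIO('{"jsonrpc": "2.0", "id": 1, "method": "ping"}\n')
serve(demo_in, sys.stdout, lambda m: {"jsonrpc": "2.0", "id": m["id"], "result": {}})
```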

Streamable HTTP (remote servers)

sequenceDiagram
  participant H as Host
  participant S as MCP server
  H->>S: POST /mcp (initialize)
  S-->>H: 200 { result: ... }
  H->>S: POST /mcp (tools/list)
  S-->>H: 200 { result: ... }
  H->>S: GET /mcp (Accept: text/event-stream)
  S-->>H: SSE: server-initiated notifications<br/>(tools/list_changed, log messages, sampling requests)

Each client → server message is a normal POST. The reverse direction — server-initiated messages like change notifications, log messages, or sampling requests — flows over a long-lived SSE stream the client opens with a GET. This is what hosted MCP servers use (think a SaaS exposing its tools to any MCP-aware editor).

Two practical knobs:

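Auth: the spec defines an optional OAuth 2.1-based authorization flow for HTTP transports; plain bearer tokens or mTLS are deployment choices layered on top.
Multiplexing: one HTTP server can serve many sessions; the server assigns a session id at initialization (the Mcp-Session-Id header) and the client echoes it on every later request, including the GET that opens the SSE stream.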
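
On the client side, the SSE half comes down to turning a text/event-stream body back into JSON-RPC messages. Here is a minimal parser sketch that handles `data:` lines, comment lines, and blank-line event boundaries; it ignores `event:`, `id:`, and `retry:` fields, and the `parse_sse` name is my own.

```python
import json

def parse_sse(lines):
    """Yield one decoded JSON message per SSE event from an iterable of text lines."""
    data = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line ends the event; multi-line data is joined with newlines.
            if data:
                yield json.loads("\n".join(data))
                data = []
        elif line.startswith(":"):
            continue  # comment / keep-alive
        elif line.startswith("data:"):
            value = line[5:]
            data.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.

stream = [
    'data: {"jsonrpc": "2.0", "method": "notifications/tools/list_changed"}\n',
    "\n",
]
for message in parse_sse(stream):
    print(message["method"])
```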

Capability negotiation

The handshake exchanges capability flags. These tell the other side what optional protocol features are supported, so neither side has to call methods that will only come back as JSON-RPC “Method not found” errors (-32601).

capability	meaning
`tools.listChanged`	server will emit `notifications/tools/list_changed`
`resources.subscribe`	client may subscribe to resource updates
`resources.listChanged`	server will emit list-change notifications
`prompts.listChanged`	same, for prompts
`sampling`	client is willing to fulfil server-initiated `sampling/createMessage`
`roots.listChanged`	client will tell the server when its workspace roots change
`logging`	server will emit structured log notifications

Capabilities are how the protocol stays small but grows safely. New features add a flag; old peers ignore them.
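
In practice this means checking the negotiated capabilities before relying on an optional feature. A small helper sketch, assuming the capabilities arrive as the nested dict from the initialize result and treating dotted paths like those in the table as lookups (the `supports` name is mine):

```python
def supports(capabilities: dict, path: str) -> bool:
    """True if a dotted capability path such as 'tools.listChanged' is present and not False."""
    node = capabilities
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return node is not False

server_caps = {"tools": {"listChanged": True}, "resources": {}}
if not supports(server_caps, "resources.subscribe"):
    print("server does not offer resource subscriptions; falling back to re-reading")
```

An empty object such as `"resources": {}` still counts as "supported", since a capability can be declared without any sub-flags.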

Sampling — the server asks the client’s LLM

Sampling is the one place the arrows reverse: a server can ask the host to run an LLM completion on its behalf.

sequenceDiagram
  autonumber
  participant H as Host (has the LLM)
  participant S as MCP server
  Note over S: tool needs LLM help mid-execution
  S->>H: sampling/createMessage { messages, modelPreferences, maxTokens }
  H->>H: human approves (UI prompt)
  H->>H: pick model, run completion
  H-->>S: result { role: "assistant", content: {...}, model, stopReason }
  S-->>S: continue tool execution

Why? It lets a server be agentic without bundling its own model and API key. A code_review tool can ask the host’s LLM to summarise a diff using the user’s existing model entitlements. The host stays in control: it shows the request to the user, picks the model, enforces rate limits, and bills the right account.

This is also the most important security surface in MCP. A malicious server could try to use sampling to extract secrets or pivot. Hosts MUST gate it behind explicit user approval per call (or per session, per server, with clear opt-in).
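
A host-side gate for this can be very small. The sketch below combines a per-server call budget with an approval callback; the `SamplingGate` class, the default budget of 5, and the callback signature are assumptions for illustration, and a real host would back `approve` with a UI prompt.

```python
class SamplingGate:
    """Decides whether a server's sampling/createMessage request may proceed."""

    def __init__(self, approve, max_calls_per_server: int = 5):
        self.approve = approve  # callable(server, request) -> bool
        self.max_calls = max_calls_per_server
        self.used = {}

    def allow(self, server: str, request: dict) -> bool:
        if self.used.get(server, 0) >= self.max_calls:
            return False  # budget exhausted: refuse without bothering the user
        if not self.approve(server, request):
            return False
        # Only approved calls count against the budget.
        self.used[server] = self.used.get(server, 0) + 1
        return True
```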

Roots — telling the server what’s in scope

A client may publish roots: the directories or URIs it considers “the current workspace”. The server asks for them with roots/list:

{ "method": "roots/list", "params": {} }
// →
{ "roots": [
  { "uri": "file:///home/y/repos/portfolio", "name": "portfolio" }
]}

A filesystem MCP server uses roots to know which paths it’s allowed to read or write. Without roots, the server has to guess (or refuse). With them, the host declares a clear boundary, and a well-behaved server stays inside it.
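
Here is roughly what honouring roots looks like inside a filesystem server. This is a sketch assuming POSIX-style paths and `file://` roots only; `within_roots` is a hypothetical helper, and it resolves `..` segments before comparing so traversal tricks don't slip through.

```python
from pathlib import Path
from urllib.parse import unquote, urlparse

def within_roots(path: str, roots: list) -> bool:
    """True if `path` is one of the roots or sits underneath one of them."""
    target = Path(path).resolve()
    for root in roots:
        parsed = urlparse(root["uri"])
        if parsed.scheme != "file":
            continue  # this sketch only understands file:// roots
        base = Path(unquote(parsed.path)).resolve()
        if target == base or base in target.parents:
            return True
    return False

roots = [{"uri": "file:///home/y/repos/portfolio", "name": "portfolio"}]
print(within_roots("/home/y/repos/portfolio/src/app.py", roots))
```

Comparing resolved path components rather than string prefixes matters: a sibling directory like `portfolio-evil` shares a string prefix with the root but is not inside it.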

A worked example: an “issues” MCP server

Imagine a server that exposes a project’s issue tracker. It would advertise:

flowchart TB
  subgraph Server["issues MCP server"]
    direction TB
    T1[tool: create_issue]
    T2[tool: comment_issue]
    T3[tool: search_issues]
    R1[resource: issue://acme/app/482]
    R2["resource template: issue://{owner}/{repo}/{number}"]
    P1[prompt: triage_inbox]
  end
  Server -. stdio .- Host

A typical conversation:

sequenceDiagram
  autonumber
  participant U as User
  participant H as Host (editor)
  participant L as LLM
  participant S as issues server
  U->>H: "Find recent payments bugs and triage them"
  H->>L: user msg + tool list (from S)
  L->>H: tool_call: search_issues({"q":"label:payments state:open"})
  H->>S: tools/call search_issues
  S-->>H: content [text, data]
  H->>L: tool_result
  L->>H: tool_call: comment_issue({"id":482,"body":"Looks like a duplicate of #401"})
  H->>U: ⚠️ confirm? "Comment on #482"
  U->>H: yes
  H->>S: tools/call comment_issue
  S-->>H: content [text "ok"]
  H->>L: tool_result
  L-->>H: final user-facing summary
  H-->>U: "Found 7 issues; commented duplicate hint on #482."

Two host-policy rules visible in the diagram:

Read-only tool calls (search_issues) auto-execute.
Side-effecting tool calls (comment_issue) require explicit user confirmation, with the exact arguments shown.

This is not in the protocol — it’s host policy. But the protocol gives the host enough metadata (tool name + structured args) to enforce it cleanly.
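
Those two rules translate into a very small policy function. In this sketch the host keeps its own allowlist of tools it is willing to auto-run; the `AUTO_APPROVED` set and the `decide` function are hypothetical host configuration, not part of MCP.

```python
import json

# Hypothetical host configuration: tools the user has agreed to auto-run.
AUTO_APPROVED = {"search_issues"}

def decide(tool_name: str, arguments: dict):
    """Return ('auto', None) or ('confirm', prompt_text) for a pending tool call."""
    if tool_name in AUTO_APPROVED:
        return ("auto", None)
    # Show the exact arguments so the user approves what will actually run.
    shown = json.dumps(arguments, sort_keys=True)
    return ("confirm", f"Run {tool_name} with {shown}?")

print(decide("comment_issue", {"id": 482, "body": "Looks like a duplicate of #401"}))
```

Defaulting to "confirm" for anything not on the allowlist means a newly added tool never runs unattended by accident.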

How MCP and OpenAPI relate

Common question. Short answer:

	OpenAPI	MCP
Transport	HTTP	JSON-RPC over stdio or Streamable HTTP
Caller	any HTTP client	an LLM (via the host)
Discovery	spec doc	`tools/list`, `resources/list` at runtime
Granularity	endpoints	tools / resources / prompts
Streaming	varies	first-class (SSE notifications, progress updates)
Auth	any	optional OAuth 2.1-based flow over HTTP; stdio uses process boundary

A good rule: wrap an OpenAPI service with an MCP server when an LLM should call it. The MCP server adds the model-friendly schema, the descriptions the model actually reads, and the host-side policy hooks. It doesn’t replace your REST API.

How MCP and A2A relate

Even shorter:

MCP is vertical: agent → its tools.
A2A is horizontal: agent → another agent.

flowchart LR
  user[User] --> A[Agent A]
  A -- MCP --> tA[(A's tools)]
  A -- A2A --> B[Agent B]
  B -- MCP --> tB[(B's tools)]

Most production stacks end up using both. See the A2A post for that side.

Patterns that hold up

A few things I’ve learned the hard way building/integrating MCP servers:

Keep tools narrow. read_file + write_file is better than one omnibus fs_op({op,...}). The model picks tools by name and description; cluttered surfaces lead to wrong picks.
Schema first, prose second. Tools with strict inputSchema and outputSchema survive across model upgrades. Tools that depend on verbose descriptions for “rules” rot fast.
Resources, not tool-returned blobs, for anything bigger than a paragraph. Tool results sit in the LLM’s context; resources are referenceable. A 50KB query result as a resource link is much cheaper than as a tool result.
Side effects must be explicit. Either name the tool with a verb (create_*, delete_*) or set annotations.destructiveHint: true so the host can require confirmation. Annotations are hints, so hosts should not trust them from unvetted servers.
Idempotency keys. If a tool can be retried (and it can), accept an idempotencyKey argument and de-duplicate server-side.
Logs over print. Use the logging capability + notifications/message, not stdout. On stdio transport, stray prints corrupt the JSON stream.
Roots are a contract. Honour them. A filesystem server that reads outside the declared roots is a security bug, not a feature.
Version your tools. When a tool’s schema changes incompatibly, ship a new name (search_v2) and keep the old one for a deprecation window. Hosts cache lists; flipping shapes mid-session breaks things.
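
The idempotency-key pattern from the list above is worth a quick sketch, since it is the one people most often skip. This wrapper remembers results by key in memory; the `IdempotentTool` class is illustrative, and a real server would want persistent storage and an expiry policy.

```python
class IdempotentTool:
    """Wraps a tool handler so retries with the same idempotencyKey run it only once."""

    def __init__(self, handler):
        self.handler = handler
        self.seen = {}  # idempotencyKey -> cached result

    def call(self, arguments: dict):
        key = arguments.get("idempotencyKey")
        if key is None:
            return self.handler(arguments)  # no key supplied: no de-duplication
        if key not in self.seen:
            self.seen[key] = self.handler(arguments)
        return self.seen[key]

created = []

def create_issue(args):
    # Hypothetical side effect: record the issue and return its position.
    created.append(args["title"])
    return f"Opened #{len(created)}"

tool = IdempotentTool(create_issue)
first = tool.call({"title": "Flaky test", "idempotencyKey": "abc"})
retry = tool.call({"title": "Flaky test", "idempotencyKey": "abc"})
```

Here `retry` returns the cached result and `created` still holds a single entry, so a host that re-sends the call after a timeout does not open a second issue.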

Failure modes

failure	symptom	mitigation
Server crashes	stdio pipe closes, host shows “disconnected”	Host auto-restarts; server keeps state in a file or db
Tool-call schema mismatch	LLM keeps calling with wrong args	Tighten `inputSchema`; add an example in the description; reject early with a structured error
Prompt injection via resource	Server returns content that hijacks the LLM	Treat resource text as data, not instructions; sandbox HTML / markdown rendering in the host
Sampling abuse	Server spams `sampling/createMessage`	Per-server budget + user approval gating
Streaming back-pressure (HTTP/SSE)	Client consumes slowly, server buffers grow	Cap stream buffer; drop oldest non-essential notifications first
Capability drift	Server quietly stops supporting a feature	Re-`initialize` on reconnect; never assume capabilities persist across sessions
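
The back-pressure mitigation in the table can be sketched as a bounded buffer that prefers to evict low-value messages. Which notifications count as "essential" is a judgment call; here I assume list-change notifications are and log messages are not, and the `NotificationBuffer` class is my own illustration.

```python
class NotificationBuffer:
    """Bounded queue that drops the oldest non-essential notification when full."""

    def __init__(self, capacity: int, essential=("notifications/tools/list_changed",)):
        self.capacity = capacity
        self.essential = set(essential)
        self.items = []

    def push(self, msg: dict) -> None:
        self.items.append(msg)
        while len(self.items) > self.capacity:
            # Oldest non-essential entry goes first; if everything is
            # essential, fall back to dropping the oldest entry.
            victim = next((i for i, m in enumerate(self.items)
                           if m.get("method") not in self.essential), 0)
            del self.items[victim]
```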

What’s next for MCP

The protocol is moving fast, but a few directions look stable:

Better authz primitives, especially for HTTP transport — standardised scopes per tool, signed tool descriptors so a registry can verify them.
First-class evals: a way for a host to declare “I expect this tool to satisfy these test cases” so server upgrades can be vetted automatically.
Cost/latency telemetry in tool results (already conventional in some servers, not yet in the spec).
Composable servers: an MCP server that itself acts as an MCP client to other servers, presenting a curated subset upward. The “façade” pattern works today but isn’t blessed.

MCP’s whole appeal is that it’s small. The interesting future is in the ecosystem — schema-pinned tool packs, signed registries, and the host-side policies that make this whole thing safe to hand to a real model with real credentials.

If you’re building, start with one server, one transport (stdio), and three tools. Get the host-side approval UX right before you scale the surface. The rest of the spec is there when you need it.