MCP — the USB-C of LLM tools (servers, transports, and patterns that work)
If you’ve used a coding assistant in the last year, you’ve used MCP — even if you didn’t know it. It’s the protocol that lets the editor’s LLM see your filesystem, run your search index, hit your database, and talk to your issue tracker without each integration being bespoke glue.
This post is the protocol-level walkthrough I keep linking people to: what MCP actually is on the wire, what an MCP server has to implement, the three primitives (tools / resources / prompts), the two transports (stdio and Streamable HTTP, which uses SSE for server-to-client streaming), capability negotiation, and the patterns that survive contact with real users.
Companion post: A2A — building agent ecosystems. A2A is how agents talk to each other. MCP is how a single agent talks to its tools.
TL;DR — MCP is JSON-RPC 2.0 between an LLM host (the editor / agent) and an MCP server (your tool bundle). It standardises three things: callable tools, readable resources, and parameterised prompts — plus a small dance for capability negotiation and (optionally) letting the server ask the host’s model to do work.
The mental model
flowchart LR
subgraph Host["Host (editor / agent runtime)"]
LLM[LLM]
Client[MCP client]
LLM <--> Client
end
subgraph Servers["MCP servers (separate processes)"]
S1[fs server<br/>tools: read, write, glob<br/>resources: file://...]
S2[github server<br/>tools: list_prs, comment<br/>resources: gh://...]
S3[postgres server<br/>tools: query<br/>resources: pg://table/...]
end
Client -- stdio --> S1
Client -- stdio --> S2
Client -- HTTP+SSE --> S3
Three roles, drawn explicitly because people conflate them constantly:
- Host. The application the user sees (Claude Desktop, an editor, an agent runtime). It owns the LLM and the policy around what the LLM is allowed to do.
- Client. The piece inside the host that speaks MCP — one client instance per connected server.
- Server. A separate process exposing tools, resources, and/or prompts. It does not call the LLM directly (with one exception: sampling, below). It just answers JSON-RPC.
This separation is the whole point. Servers are tiny, single-purpose, and swappable. The host decides which to load and what scopes to grant.
Why MCP exists
Before MCP, every assistant invented its own “tools” interface:
- bespoke JSON schemas per vendor,
- bespoke auth model,
- bespoke streaming format,
- no way to discover what a third-party integration could do without reading its docs.
You could ship a stripe-tool for one assistant; making it work in another
meant rewriting the adapter. MCP is the boring infrastructure fix: agree on
JSON-RPC, agree on three primitive types, and let the ecosystem build sideways.
flowchart LR
subgraph Before["Before MCP"]
A[Assistant A] -- adapter A --> S1[Stripe]
A -- adapter A --> S2[Postgres]
B[Assistant B] -- adapter B --> S1
B -- adapter B --> S2
C[Assistant C] -- adapter C --> S1
C -- adapter C --> S2
end
subgraph After["With MCP"]
A2[Assistant A] -- MCP --> H[(Any MCP server)]
B2[Assistant B] -- MCP --> H
C2[Assistant C] -- MCP --> H
H --> S3[Stripe]
H --> S4[Postgres]
H --> S5[Anything]
end
The wire: JSON-RPC 2.0
Every MCP message is a JSON-RPC 2.0 envelope. Three shapes:
// Request (expects a response)
{ "jsonrpc": "2.0", "id": 1, "method": "tools/list" }
// Response
{ "jsonrpc": "2.0", "id": 1, "result": { "tools": [...] } }
// Notification (fire-and-forget, no id)
{ "jsonrpc": "2.0", "method": "notifications/tools/list_changed" }
Everything else — capability negotiation, tool listings, tool invocations,
resource reads, sampling — is just specific method names on top.
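To make the three shapes concrete, here is a minimal Python sketch that tells them apart by looking only at the envelope fields. The function name classify is mine, not part of any MCP SDK.

```python
import json


def classify(raw: str) -> str:
    """Return 'request', 'response', or 'notification' for a JSON-RPC 2.0 message."""
    msg = json.loads(raw)
    if msg.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 message")
    if "method" in msg:
        # A method with an id expects a reply; without one it is fire-and-forget.
        return "request" if "id" in msg else "notification"
    if "id" in msg and ("result" in msg or "error" in msg):
        return "response"
    raise ValueError("unrecognised message shape")
```

A client or server typically runs a check like this before routing a message to a handler or matching it against a pending request id.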
The handshake
Before any real work, client and server negotiate capabilities. This is what lets the host know “this server has tools and resources but not prompts, and it speaks protocol version 2025-06-18”.
sequenceDiagram
autonumber
participant H as Host (client)
participant S as MCP server
H->>S: initialize { protocolVersion, clientCapabilities, clientInfo }
S-->>H: result { protocolVersion, serverCapabilities, serverInfo, instructions? }
H->>S: notifications/initialized
Note over H,S: connection ready
H->>S: tools/list
S-->>H: { tools: [...] }
H->>S: resources/list
S-->>H: { resources: [...] }
H->>S: prompts/list
S-->>H: { prompts: [...] }
initialize must be the first request on a connection, and the server answers
it before anything else. After notifications/initialized, either side may send
notifications, including notifications/tools/list_changed when the server’s
tool surface changes (e.g. a plugin loaded), so the host can re-list without polling.
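The ordering rule can be sketched as a tiny server-side state machine. This is an illustration under assumptions, not an SDK: the Session class, the demo-server name, and the choice of -32600 for "not initialized yet" are mine (-32601 is the standard JSON-RPC "Method not found" code).

```python
from typing import Optional

PROTOCOL_VERSION = "2025-06-18"


def _error(msg: dict, code: int, text: str) -> dict:
    return {"jsonrpc": "2.0", "id": msg["id"], "error": {"code": code, "message": text}}


class Session:
    """Tracks whether the handshake has completed for one connection."""

    def __init__(self) -> None:
        self.initialized = False

    def handle(self, msg: dict) -> Optional[dict]:
        method = msg.get("method")
        if method == "initialize":
            return {"jsonrpc": "2.0", "id": msg["id"], "result": {
                "protocolVersion": PROTOCOL_VERSION,
                "capabilities": {"tools": {"listChanged": True}},
                "serverInfo": {"name": "demo-server", "version": "0.1.0"},  # hypothetical
            }}
        if method == "notifications/initialized":
            self.initialized = True
            return None
        if "id" not in msg:
            return None  # other notifications never get a reply
        if not self.initialized:
            # Assumption: reject work that arrives before the handshake finishes.
            return _error(msg, -32600, "Server not initialized")
        if method == "tools/list":
            return {"jsonrpc": "2.0", "id": msg["id"], "result": {"tools": []}}
        return _error(msg, -32601, "Method not found")
```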
The three primitives
flowchart TB
subgraph MCP["What an MCP server can expose"]
T[Tools<br/>model-invoked actions<br/>side-effecting OK]
R[Resources<br/>app-readable data<br/>files, rows, blobs]
P[Prompts<br/>parameterised templates<br/>user-invoked workflows]
end
T -. examples .- te[create_issue, run_query, send_email]
R -. examples .- re[file://README.md, gh://repo/owner/issues, pg://table/users]
P -. examples .- pe[/summarise this PR/, /draft a release note/]
1. Tools
Tools are functions the model decides to call. Each declares a JSON Schema for its input and (optionally) output:
{
"name": "create_issue",
"description": "Open a GitHub issue in the configured repo.",
"inputSchema": {
"type": "object",
"required": ["title"],
"properties": {
"title": { "type": "string" },
"body": { "type": "string" },
"labels": { "type": "array", "items": { "type": "string" } }
}
}
}
Calling one:
{
"jsonrpc": "2.0", "id": 7, "method": "tools/call",
"params": {
"name": "create_issue",
"arguments": { "title": "Flaky test in payments", "labels": ["bug"] }
}
}
The result is a list of content blocks, like a chat message:
{
"jsonrpc": "2.0", "id": 7,
"result": {
"content": [
{ "type": "text", "text": "Opened #482" },
{ "type": "resource", "resource": { "uri": "gh://repo/acme/app/issues/482", "mimeType": "application/json" } }
],
"isError": false
}
}
The model sees the text block; the host can also follow the resource
link to fetch structured data.
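On the server side, handling tools/call comes down to a lookup, an argument check, and wrapping the outcome in content blocks. The sketch below is hypothetical throughout: the TOOLS registry and the stand-in create_issue handler are mine, and the split between a JSON-RPC error for an unknown tool and isError: true for bad input is one reasonable convention rather than the only one.

```python
def create_issue(args: dict) -> str:
    # Hypothetical stand-in: a real server would call the tracker's API here.
    return f"Opened issue titled {args['title']!r}"


# Hypothetical registry: required argument names plus the handler to run.
TOOLS = {"create_issue": {"required": ["title"], "handler": create_issue}}


def handle_tools_call(request: dict) -> dict:
    params = request.get("params", {})
    tool = TOOLS.get(params.get("name"))
    if tool is None:
        # Unknown tool: answer with a protocol-level JSON-RPC error.
        return {"jsonrpc": "2.0", "id": request["id"],
                "error": {"code": -32602, "message": "Unknown tool"}}
    args = params.get("arguments", {})
    missing = [key for key in tool["required"] if key not in args]
    if missing:
        # Bad input: report inside the result so the model can read it and retry.
        text, is_error = "Missing required argument(s): " + ", ".join(missing), True
    else:
        text, is_error = tool["handler"](args), False
    return {"jsonrpc": "2.0", "id": request["id"],
            "result": {"content": [{"type": "text", "text": text}], "isError": is_error}}
```

Returning input problems as a normal result with isError set keeps the message in front of the model, which is usually what lets it correct its own call.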
2. Resources
Resources are things the host application reads — usually so it can pin them into the conversation as context. They have URIs and mime types; the server lists them and reads them on demand.
// resources/list (paginated)
{ "resources": [
{ "uri": "file:///repo/README.md", "name": "README.md", "mimeType": "text/markdown" },
{ "uri": "pg://table/users", "name": "users table", "mimeType": "application/json" }
]}
// resources/read
{ "method": "resources/read", "params": { "uri": "file:///repo/README.md" } }
Servers can also publish resource templates with URI placeholders
(pg://table/{name}) so the host knows the shape of dynamic URIs.
A subtle but important rule: tools have side effects, resources don’t. A resource read should be idempotent and safe to call repeatedly. This is what lets hosts cache and prefetch them aggressively.
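For resource templates, a server needs some way to map an incoming URI back to the placeholder values. Here is a minimal sketch that handles only simple {name} placeholders matching a single path segment; full URI Template syntax (RFC 6570) is richer than this, and both helper names are mine.

```python
import re
from typing import Optional


def compile_template(template: str) -> "re.Pattern":
    # re.split with a capture group alternates: literal, name, literal, name, ...
    parts = re.split(r"\{(\w+)\}", template)
    pattern = "".join(
        re.escape(part) if index % 2 == 0 else f"(?P<{part}>[^/]+)"
        for index, part in enumerate(parts)
    )
    return re.compile(f"^{pattern}$")


def match_uri(template: str, uri: str) -> Optional[dict]:
    """Return the placeholder values if the URI fits the template, else None."""
    match = compile_template(template).match(uri)
    return match.groupdict() if match else None
```

With the template from above, match_uri("pg://table/{name}", "pg://table/users") yields the table name, while a URI with extra path segments is rejected.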
3. Prompts
Prompts are parameterised templates a user explicitly picks (think slash commands). The server lists them; the host shows them in a menu; selecting one returns a fully-formed message sequence the host then sends to its LLM.
// prompts/list
{ "prompts": [
{
"name": "summarise_pr",
"description": "Summarise a pull request for the release notes.",
"arguments": [{ "name": "pr_number", "required": true }]
}
]}
// prompts/get
{ "method": "prompts/get",
"params": { "name": "summarise_pr", "arguments": { "pr_number": "482" } } }
// response
{ "messages": [
{ "role": "user", "content": { "type": "text", "text": "Summarise PR #482..." } }
]}
Tools, resources, prompts — model-driven, app-driven, user-driven. Different trust boundaries, different UX, same underlying server.
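A prompts/get handler is mostly template filling. In this sketch the PROMPTS registry and the template text are hypothetical; only the shape of the returned messages follows the response shown above.

```python
# Hypothetical registry; the template wording is illustrative only.
PROMPTS = {
    "summarise_pr": {
        "description": "Summarise a pull request for the release notes.",
        "arguments": [{"name": "pr_number", "required": True}],
        "template": "Summarise PR #{pr_number} for the release notes.",
    }
}


def get_prompt(name: str, arguments: dict) -> dict:
    spec = PROMPTS[name]  # raises KeyError for an unknown prompt
    missing = [arg["name"] for arg in spec["arguments"]
               if arg.get("required") and arg["name"] not in arguments]
    if missing:
        raise ValueError("missing prompt argument(s): " + ", ".join(missing))
    text = spec["template"].format(**arguments)
    return {"messages": [{"role": "user", "content": {"type": "text", "text": text}}]}
```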
Transports
MCP defines two transports. Both carry the same JSON-RPC messages.
stdio (the default for local servers)
flowchart LR
Host -- spawn --> Proc[Server process]
Host -- stdin (newline-delimited JSON) --> Proc
Proc -- stdout (newline-delimited JSON) --> Host
Proc -- stderr (logs) --> Host
The host spawns the server as a child process and pipes JSON over stdin/stdout. That’s it. No ports, no auth — the OS process boundary is the security boundary. This is what every “local” MCP server uses (filesystem, git, sqlite, etc.).
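The server's half of the stdio transport fits in a short loop: read one JSON message per line, hand it to a handler, write one JSON reply per line, and keep everything else on stderr. A minimal sketch, where serve and its handle callback are my own names rather than an SDK API:

```python
import json
import sys
from typing import Callable, Optional, TextIO


def serve(handle: Callable[[dict], Optional[dict]],
          stdin: TextIO = sys.stdin,
          stdout: TextIO = sys.stdout,
          log: TextIO = sys.stderr) -> None:
    """Read newline-delimited JSON-RPC from stdin and write replies to stdout."""
    for line in stdin:
        line = line.strip()
        if not line:
            continue
        try:
            msg = json.loads(line)
        except json.JSONDecodeError as exc:
            # Diagnostics go to stderr; stdout carries protocol messages only.
            print(f"bad JSON: {exc}", file=log)
            continue
        reply = handle(msg)
        if reply is not None:  # notifications produce no reply
            stdout.write(json.dumps(reply) + "\n")
            stdout.flush()
```

Passing the streams in as parameters is a design choice for testability: the same loop runs against in-memory buffers in a unit test and against the real pipes in production.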
Streamable HTTP (remote servers)
sequenceDiagram
participant H as Host
participant S as MCP server
H->>S: POST /mcp (initialize)
S-->>H: 200 { result: ... }
H->>S: POST /mcp (tools/list)
S-->>H: 200 { result: ... }
H->>S: GET /mcp (Accept: text/event-stream)
S-->>H: SSE: server-initiated notifications<br/>(tools/list_changed, log messages, sampling requests)
Each client → server message is a normal POST. The reverse direction — server-initiated messages like change notifications, log messages, or sampling requests — flows over a long-lived SSE stream the client opens with a GET. This is what hosted MCP servers use (think a SaaS exposing its tools to any MCP-aware editor).
Two practical knobs:
- Auth: HTTP transport supports OAuth2 / bearer / mTLS — the spec doesn’t pick one, your deployment does.
- Multiplexing: one HTTP server can serve many sessions; each session has its own SSE stream tagged with a session id.
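On the SSE side, each server-initiated JSON-RPC message travels as the data field of one event. The sketch below shows only the framing, with no HTTP server involved; sse_event and parse_sse are hypothetical helper names, and the parser ignores every SSE field except data.

```python
import json
from typing import List, Optional


def sse_event(message: dict, event_id: Optional[str] = None) -> str:
    """Frame one JSON-RPC message as a Server-Sent Event."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    lines.append(f"data: {json.dumps(message)}")
    return "\n".join(lines) + "\n\n"  # a blank line terminates the event


def parse_sse(stream: str) -> List[dict]:
    """Extract JSON-RPC messages from a string of concatenated SSE events."""
    messages = []
    for block in stream.split("\n\n"):
        data_lines = []
        for line in block.split("\n"):
            if line.startswith("data:"):
                value = line[5:]
                # SSE strips a single leading space after the colon.
                data_lines.append(value[1:] if value.startswith(" ") else value)
        if data_lines:
            messages.append(json.loads("\n".join(data_lines)))
    return messages
```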
Capability negotiation
The handshake exchanges capability flags. These tell the other side what optional protocol features are supported, so neither side has to call methods the peer would reject with a JSON-RPC “Method not found” error.
| capability | meaning |
|---|---|
| tools.listChanged | server will emit notifications/tools/list_changed |
| resources.subscribe | client may subscribe to resource updates |
| resources.listChanged | server will emit list-change notifications |
| prompts.listChanged | same, for prompts |
| sampling | client is willing to fulfil server-initiated sampling/createMessage |
| roots.listChanged | client will tell the server when its workspace roots change |
| logging | server will emit structured log notifications |
Capabilities are how the protocol stays small but grows safely. New features add a flag; old peers ignore them.
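In practice a peer checks a flag before relying on the feature. A small sketch, assuming capabilities arrive as nested JSON objects in which an empty object (as with logging) still counts as "supported"; the supports helper and its dotted-path argument are my own convention.

```python
def supports(capabilities: dict, path: str) -> bool:
    """Check a dotted capability path such as 'tools.listChanged'."""
    node = capabilities
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    # An empty object means "present with no sub-options"; only an explicit
    # false turns the feature off.
    return node is not False
```

A host would call this once after initialize, for example to decide whether to wait for list-change notifications or fall back to re-listing on its own schedule.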
Sampling — the server asks the client’s LLM
Sampling is the one place the arrows reverse: a server can ask the host to run an LLM completion on its behalf.
sequenceDiagram
autonumber
participant H as Host (has the LLM)
participant S as MCP server
Note over S: tool needs LLM help mid-execution
S->>H: sampling/createMessage { messages, modelPreferences, maxTokens }
H->>H: human approves (UI prompt)
H->>H: pick model, run completion
H-->>S: result { role:"assistant", content:[...] }
S-->>S: continue tool execution
Why? It lets a server be agentic without bundling its own model and API
key. A code_review tool can ask the host’s LLM to summarise a diff using
the user’s existing model entitlements. The host stays in control: it shows
the request to the user, picks the model, enforces rate limits, and bills
the right account.
This is also the most important security surface in MCP. A malicious server could try to use sampling to extract secrets or pivot. Hosts MUST gate it behind explicit user approval per call (or per session, per server, with clear opt-in).
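A host-side gate for sampling can be as small as a counter plus an approval callback. This is a sketch of the idea rather than a complete policy: SamplingGate, the ask_user callback, and the default budget of five calls per server are all assumptions of mine.

```python
from typing import Callable, Dict


class SamplingGate:
    """Allow a sampling request only within budget and with user approval."""

    def __init__(self, ask_user: Callable[[str, dict], bool], budget_per_server: int = 5):
        self.ask_user = ask_user          # shows the request in the UI, returns the decision
        self.budget = budget_per_server
        self.used: Dict[str, int] = {}

    def allow(self, server: str, request: dict) -> bool:
        if self.used.get(server, 0) >= self.budget:
            return False                  # budget exhausted: do not even prompt the user
        if not self.ask_user(server, request):
            return False                  # denied requests do not consume budget
        self.used[server] = self.used.get(server, 0) + 1
        return True
```

Checking the budget before prompting matters: a server that spams requests should not be able to turn the approval dialog itself into the annoyance.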
Roots — telling the server what’s in scope
A client may publish roots — the directories or URIs it considers “the current workspace”:
{ "method": "roots/list", "params": {} }
// →
{ "roots": [
{ "uri": "file:///home/y/repos/portfolio", "name": "portfolio" }
]}
A filesystem MCP server uses roots to know which paths it’s allowed to read or write. Without roots, the server has to guess (or refuse). With them, the host enforces a clean sandbox.
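A filesystem server can enforce this with a containment check before every read or write. The sketch below assumes file:// roots on a POSIX-style filesystem; the function names are mine, and it normalises paths first so that ".." segments and look-alike sibling directories do not slip through.

```python
from pathlib import Path
from typing import List
from urllib.parse import unquote, urlparse


def root_paths(roots: List[dict]) -> List[Path]:
    """Convert file:// roots into resolved filesystem paths; skip other schemes."""
    paths = []
    for root in roots:
        parsed = urlparse(root["uri"])
        if parsed.scheme == "file":
            paths.append(Path(unquote(parsed.path)).resolve())
    return paths


def is_within_roots(candidate: str, roots: List[dict]) -> bool:
    """True if the candidate path is one of the roots or lives beneath one."""
    target = Path(candidate).resolve()  # collapses '..' and follows symlinks
    return any(target == root or root in target.parents for root in root_paths(roots))
```

Comparing against target.parents rather than doing a string prefix test is deliberate: a prefix test would accept /home/y/repos/portfolio-evil as being inside /home/y/repos/portfolio.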
A worked example: an “issues” MCP server
Imagine a server that exposes a project’s issue tracker. It would advertise:
flowchart TB
subgraph Server["issues MCP server"]
direction TB
T1[tool: create_issue]
T2[tool: comment_issue]
T3[tool: search_issues]
R1[resource: issue://acme/app/482]
R2[resource template: issue://{owner}/{repo}/{number}]
P1[prompt: triage_inbox]
end
Server -. stdio .- Host
A typical conversation:
sequenceDiagram
autonumber
participant U as User
participant H as Host (editor)
participant L as LLM
participant S as issues server
U->>H: "Find recent payments bugs and triage them"
H->>L: user msg + tool list (from S)
L->>H: tool_call: search_issues({"q":"label:payments state:open"})
H->>S: tools/call search_issues
S-->>H: content [text, data]
H->>L: tool_result
L->>H: tool_call: comment_issue({"id":482,"body":"Looks like a duplicate of #401"})
H->>U: ⚠️ confirm? "Comment on #482"
U->>H: yes
H->>S: tools/call comment_issue
S-->>H: content [text "ok"]
H->>L: tool_result
L-->>H: final user-facing summary
H-->>U: "Found 7 issues; commented duplicate hint on #482."
Two host-policy rules visible in the diagram:
- Read-only tool calls (search_issues) auto-execute.
- Side-effecting tool calls (comment_issue) require explicit user confirmation, with the exact arguments shown.
This is not in the protocol — it’s host policy. But the protocol gives the host enough metadata (tool name + structured args) to enforce it cleanly.
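The two rules translate into a very small decision function on the host. This sketch assumes the host keeps its own allowlist of read-only tools rather than trusting whatever the server claims; READ_ONLY_TOOLS and decide are hypothetical names.

```python
import json

# Assumption: the host curates this list itself.
READ_ONLY_TOOLS = {"search_issues"}


def decide(tool_name: str, arguments: dict) -> dict:
    """Return the host's action for a tool call proposed by the model."""
    if tool_name in READ_ONLY_TOOLS:
        return {"action": "auto"}
    # Show the exact arguments so the user approves what will actually run.
    rendered = json.dumps(arguments, sort_keys=True)
    return {"action": "confirm", "prompt": f"{tool_name}({rendered})"}
```

Defaulting to "confirm" for anything not on the allowlist means a newly added tool is gated until someone decides otherwise.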
How MCP and OpenAPI relate
Common question. Short answer:
| | OpenAPI | MCP |
|---|---|---|
| Transport | HTTP | JSON-RPC over stdio or Streamable HTTP (with SSE) |
| Caller | any HTTP client | an LLM (via the host) |
| Discovery | spec doc | tools/list, resources/list at runtime |
| Granularity | endpoints | tools / resources / prompts |
| Streaming | varies | first-class (SSE notifications, partial results) |
| Auth | any | server’s choice; stdio uses process boundary |
A good rule: wrap an OpenAPI service with an MCP server when an LLM should call it. The MCP server adds the model-friendly schema, the descriptions the model actually reads, and the host-side policy hooks. It doesn’t replace your REST API.
How MCP and A2A relate
Even shorter:
- MCP is vertical: agent → its tools.
- A2A is horizontal: agent → another agent.
flowchart LR
user[User] --> A[Agent A]
A -- MCP --> tA[(A's tools)]
A -- A2A --> B[Agent B]
B -- MCP --> tB[(B's tools)]
Most production stacks end up using both. See the A2A post for that side.
Patterns that hold up
A few things I’ve learned the hard way building/integrating MCP servers:
- Keep tools narrow. read_file + write_file is better than one omnibus fs_op({op,...}). The model picks tools by name and description; cluttered surfaces lead to wrong picks.
- Schema first, prose second. Tools with strict inputSchema and outputSchema survive across model upgrades. Tools that depend on verbose descriptions for “rules” rot fast.
- Resources, not tool-returned blobs, for anything bigger than a paragraph. Tool results sit in the LLM’s context; resources are referenceable. A 50KB query result as a resource link is much cheaper than as a tool result.
- Side effects must be explicit. Either name the tool with a verb (create_*, delete_*) or set annotations.destructiveHint: true so the host can require confirmation. Annotations are hints, so hosts should not trust them from servers they don’t trust.
- Idempotency keys. If a tool can be retried (and it can), accept an idempotencyKey argument and de-duplicate server-side.
- Logs over print. Use the logging capability + notifications/message, not stdout. On stdio transport, stray prints corrupt the JSON stream.
- Roots are a contract. Honour them. A filesystem server that reads outside the declared roots is a security bug, not a feature.
- Version your tools. When a tool’s schema changes incompatibly, ship a new name (search_v2) and keep the old one for a deprecation window. Hosts cache lists; flipping shapes mid-session breaks things.
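The idempotency-key pattern from the list above is easy to sketch. This version keeps results in memory, which is an assumption made for brevity; a real server would want a store that survives restarts and expires old keys. IdempotentTool is a hypothetical wrapper, not an SDK class.

```python
from typing import Any, Callable, Dict


class IdempotentTool:
    """Wrap a tool handler so retries with the same key return the first result."""

    def __init__(self, handler: Callable[[dict], Any]):
        self.handler = handler
        self.results: Dict[str, Any] = {}

    def call(self, arguments: dict) -> Any:
        key = arguments.get("idempotencyKey")
        if key is not None and key in self.results:
            return self.results[key]      # retry: replay the stored result
        # The key is transport metadata, so keep it out of the handler's input.
        payload = {k: v for k, v in arguments.items() if k != "idempotencyKey"}
        result = self.handler(payload)
        if key is not None:
            self.results[key] = result
        return result
```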
Failure modes
| failure | symptom | mitigation |
|---|---|---|
| Server crashes | stdio pipe closes, host shows “disconnected” | Host auto-restarts; server keeps state in a file or db |
| Tool-call schema mismatch | LLM keeps calling with wrong args | Tighten inputSchema; add an example in the description; reject early with a structured error |
| Prompt injection via resource | Server returns content that hijacks the LLM | Treat resource text as data, not instructions; sandbox HTML / markdown rendering in the host |
| Sampling abuse | Server spams sampling/createMessage | Per-server budget + user approval gating |
| Streaming back-pressure (HTTP/SSE) | Client consumes slowly, server buffers grow | Cap stream buffer; drop oldest non-essential notifications first |
| Capability drift | Server quietly stops supporting a feature | Re-initialize on reconnect; never assume capabilities persist across sessions |
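The back-pressure mitigation in the table can be sketched as a bounded queue that evicts the oldest non-essential notification when full. Which methods count as essential is a deployment decision; the class name and the refusal behaviour when everything queued is essential are assumptions of this sketch.

```python
from typing import Iterable, List


class NotificationBuffer:
    """Bounded outgoing queue that sheds the oldest non-essential message first."""

    def __init__(self, capacity: int, essential_methods: Iterable[str] = ()):
        self.capacity = capacity
        self.essential = set(essential_methods)
        self.items: List[dict] = []

    def push(self, message: dict) -> bool:
        """Queue a message; return False if it could not be accepted."""
        if len(self.items) >= self.capacity:
            for index, queued in enumerate(self.items):
                if queued.get("method") not in self.essential:
                    del self.items[index]  # evict the oldest droppable entry
                    break
            else:
                # Everything queued is essential: refuse, so the caller can
                # slow down or close the stream instead of growing unbounded.
                return False
        self.items.append(message)
        return True
```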
What’s next for MCP
The protocol is moving fast, but a few directions look stable:
- Better authz primitives, especially for HTTP transport — standardised scopes per tool, signed tool descriptors so a registry can verify them.
- First-class evals: a way for a host to declare “I expect this tool to satisfy these test cases” so server upgrades can be vetted automatically.
- Cost/latency telemetry in tool results (already conventional in some servers, not yet in the spec).
- Composable servers: an MCP server that itself acts as an MCP client to other servers, presenting a curated subset upward. The “façade” pattern works today but isn’t blessed.
MCP’s whole appeal is that it’s small. The interesting future is in the ecosystem — schema-pinned tool packs, signed registries, and the host-side policies that make this whole thing safe to hand to a real model with real credentials.
If you’re building, start with one server, one transport (stdio), and three tools. Get the host-side approval UX right before you scale the surface. The rest of the spec is there when you need it.