14.03 — Essentials of the Protocol with Tool Calling¶

Overview¶

This lesson walks through the complete MCP interaction flow — from application startup to final answer delivery. Understanding this flow is critical because it reveals exactly how the MCP protocol orchestrates communication between the user, the application, the LLM, the MCP client, and the MCP server.

By the end of this lesson, you'll understand every step of what happens when an MCP-connected AI application processes a user's tool-requiring query.

The MCP Component Map¶

Before we trace the flow, let's identify every component involved:

flowchart LR
    User["👤 User"]

    subgraph Host["🖥️ AI Application (Host)"]
        direction TB
        AppLogic["Application Logic"]
        LLM["🤖 LLM"]
        Client["MCP Client"]
    end

    Server["⚙️ MCP Server"]

    User <--> AppLogic
    AppLogic <--> LLM
    AppLogic <--> Client
    Client <-->|"MCP Protocol"| Server

    style Host fill:#1e3a5f,color:#fff
    style Server fill:#10b981,color:#fff

Component	Role	Example
User	The person making queries	You, typing in Cursor or Claude Desktop
Host / Application	The AI application that hosts everything	Cursor, Windsurf, Claude Desktop, or your custom agent
LLM	The language model that processes queries and generates responses/tool calls	GPT-4, Claude 3.5, Gemini Pro
MCP Client	A component inside the host that speaks the MCP protocol to communicate with servers	Built into the host application — one client per server connection
MCP Server	An external service that exposes tools, resources, and prompts via MCP	Weather server, Slack server, database server

[!IMPORTANT] The MCP Client lives inside the host application. It's not a separate application — it's a component of the AI application itself. The host may contain multiple clients, each connected to a different MCP server. However, each client connects to exactly one server (1:1 relationship).

Phase 1: Initialization (Application Startup)¶

The MCP lifecycle begins before any user interaction — it happens when the AI application starts up.

sequenceDiagram
    participant App as 🖥️ Application (Host)
    participant C1 as MCP Client 1
    participant C2 as MCP Client 2
    participant S1 as ⚙️ Weather Server
    participant S2 as ⚙️ Slack Server

    Note over App: Application starts up

    App->>C1: Initialize client for Weather server
    C1->>S1: Connect (MCP handshake)
    S1-->>C1: Acknowledge + send available capabilities

    Note over C1,S1: Server reports:<br/>Tools: [get_forecast, get_alerts]<br/>Resources: [weather_data]

    App->>C2: Initialize client for Slack server
    C2->>S2: Connect (MCP handshake)
    S2-->>C2: Acknowledge + send available capabilities

    Note over C2,S2: Server reports:<br/>Tools: [send_message, list_channels]

    Note over App: Application now knows all<br/>available tools from all servers

What Happens During Initialization¶

The application reads its MCP configuration — which MCP servers to connect to (configured by the user, usually in a JSON config file)
For each configured server, the application creates an MCP Client — one client per server
Each client connects to its server using the MCP protocol — this can be via:
stdio (standard input/output) — for locally running servers
SSE (Server-Sent Events) — for remote servers running over HTTP
The MCP handshake happens — the client and server negotiate protocol version, capabilities, etc.
The server reports its available capabilities — tools, resources, and prompts

After initialization, the application knows exactly what tools are available across all connected MCP servers. This tool catalog is stored in memory and will be injected into LLM prompts when users make queries.

What the Server Reports¶

During initialization, each MCP server sends its capability manifest to the client. For tools, this includes:

{
  "tools": [
    {
      "name": "get_forecast",
      "description": "Get weather forecast for a specific location",
      "inputSchema": {
        "type": "object",
        "properties": {
          "city": {"type": "string", "description": "City name"},
          "days": {"type": "integer", "description": "Number of forecast days"}
        },
        "required": ["city"]
      }
    },
    {
      "name": "get_alerts",
      "description": "Get active weather alerts for a region",
      "inputSchema": {
        "type": "object",
        "properties": {
          "state": {"type": "string", "description": "US state code (e.g., CA)"}
        },
        "required": ["state"]
      }
    }
  ]
}

Notice that each tool definition includes a name, a description (which will be shown to the LLM so it can decide when to use the tool), and an input schema (which tells the LLM what arguments to provide). This is the information that bridges the gap between the MCP server's capabilities and the LLM's understanding.

Phase 2: User Query Processing¶

Once initialization is complete, the application is ready to handle user queries. Here's the complete flow for a query that requires tool use:

sequenceDiagram
    participant U as 👤 User
    participant App as 🖥️ Application
    participant LLM as 🤖 LLM
    participant Client as MCP Client
    participant Server as ⚙️ MCP Server

    U->>App: "What's the weather forecast for California?"

    Note over App: Augment query with<br/>available tools from all<br/>connected MCP servers
    App->>LLM: User query + tool descriptions<br/>(from MCP initialization)

    Note over LLM: Decides: I need the<br/>get_forecast tool
    LLM-->>App: Tool call: get_forecast(city="California")

    Note over App: Detects tool call in<br/>LLM response
    App->>Client: Forward tool call to appropriate client
    Client->>Server: Execute: get_forecast(city="California")

    Note over Server: Server runs the actual<br/>tool code (API call, etc.)
    Server-->>Client: Result: {"forecast": "Sunny, 75°F..."}
    Client-->>App: Tool result

    Note over App: Send tool result back<br/>to LLM for final answer
    App->>LLM: Original query + tool result
    LLM-->>App: "The forecast for California is sunny with temperatures around 75°F..."

    App-->>U: Display final answer

Step-by-Step Breakdown¶

Step 1 — User Query: The user types a question into the AI application.

Step 2 — Prompt Augmentation: The application takes the user's query and augments it with the tool descriptions that were collected during initialization. This is the same tool-calling mechanism we discussed in the previous lesson, but now the tool descriptions come from MCP servers rather than being hardcoded.

Step 3 — LLM Decision: The LLM receives the augmented prompt (user query + available tools) and makes a decision: - If the query can be answered from its training data → generate a direct answer - If the query requires external information → generate a tool call specifying which tool to invoke and with what arguments

Step 4 — Tool Execution via MCP: This is the key difference between MCP and traditional tool calling:

Traditional Tool Calling	MCP Tool Calling
Application executes the tool locally	Application sends the tool call to the MCP server
Tool function runs in the application process	Tool function runs in the server process
Tool code is part of the application	Tool code is decoupled from the application

The application routes the tool call through the MCP Client, which sends it to the appropriate MCP Server over the MCP protocol. The server executes the actual tool code (makes the API call, queries the database, etc.) and returns the result.

Step 5 — Result Return: The MCP Server sends the tool result back through the MCP Client to the application.

Step 6 — Final LLM Call: The application makes a second LLM call with the original query plus the tool result. The LLM generates a natural language answer grounded in the real data.

Step 7 — User Receives Answer: The application displays the final answer to the user.

The Key Difference: Where Tools Execute¶

The most important architectural difference between traditional tool calling and MCP tool calling is where the tool code runs:

flowchart TD
    subgraph Traditional["Traditional: Tools in Application"]
        A1["🖥️ Application"] --> A2["🔧 Tool: get_weather()"]
        A2 --> A3["🌐 Weather API"]
        A1 --> A4["🔧 Tool: send_email()"]
        A4 --> A5["📧 Gmail API"]
    end

    subgraph MCP_Flow["MCP: Tools in Servers"]
        B1["🖥️ Application"] --> B2["MCP Client"]
        B2 -->|"MCP Protocol"| B3["⚙️ Weather Server"]
        B3 --> B4["🌐 Weather API"]
        B2 -->|"MCP Protocol"| B5["⚙️ Email Server"]
        B5 --> B6["📧 Gmail API"]
    end

    style Traditional fill:#ef4444,color:#fff
    style MCP_Flow fill:#10b981,color:#fff

Why does this matter? Because decoupling tool execution from the application provides several major advantages:

Advantage	Explanation
Independent scaling	MCP servers can be deployed, scaled, and monitored independently from the AI application. Run them on Kubernetes, serverless, Docker — whatever makes sense.
Independent updates	Update, fix, or add new tools to an MCP server without redeploying the AI application. The client re-initializes and discovers the new capabilities automatically.
Separation of concerns	The AI application handles orchestration (when to call tools). The MCP server handles execution (how to call tools). Clean architecture.
Debugging & logging	Monitor tool execution separately from application logic. Track which tools are being called, how often, with what arguments, and what they return.
Dynamic tool discovery	The client can periodically re-initialize, discovering new tools that have been added to the server since the last check. The agent gains new capabilities without code changes.
Security isolation	Tools run in a separate process (or even a separate machine), limiting the blast radius if something goes wrong.

Dynamic Tool Discovery¶

One of the most powerful features enabled by MCP's architecture is dynamic tool discovery. Because the client discovers tools by querying the server at initialization time, you can:

Add a new tool to an MCP server (e.g., add get_humidity alongside get_forecast)
Restart the MCP server (or wait for the client to re-initialize)
The AI application automatically discovers the new tool — no code changes, no redeployment

This means your AI agents can evolve their capabilities over time just by updating the MCP servers they connect to. You don't need to redeploy the agent itself.

[!TIP] In a production setup, you can configure the MCP client to re-initialize periodically (e.g., every hour), so new tools are discovered automatically. This gives you the behavior of dynamic tool calling — the agent's capabilities grow without any downtime or redeployment.

Summary¶

The MCP interaction flow has two phases:

Phase 1: Initialization (at startup)¶

Application creates MCP Clients (one per configured server)
Each client connects to its server via the MCP protocol
Servers report their available tools, resources, and prompts
Application stores the complete tool catalog in memory

Phase 2: Query Processing (per user query)¶

User sends a query
Application augments the query with available tool descriptions (from initialization)
LLM generates a tool call (or a direct answer)
Application routes the tool call through MCP Client → MCP Server
Server executes the tool and returns the result
Application sends the result back to the LLM for final answer generation
Final answer is displayed to the user

The key architectural insight: tool execution is decoupled from the application. The LLM decides what to call, the MCP server executes how to call it, and the MCP protocol connects them.