What is MCP? An overview of the Model Context Protocol
The AI ecosystem has evolved to the point where today’s LLMs are no longer just completion engines. Coupled with the right interfaces, they can take actions, query systems, and interpret structured content.
In modern architectures, LLMs often operate alongside databases, APIs, or custom functions. To support this, developers frequently inject external data directly into their prompts. The simplest way to do this is to pull data from a service and pass it as plain text to the model:
import requests
from openai import OpenAI

client = OpenAI()

# Step 1: Get real-world data
response = requests.get("https://api.weatherapi.com/v1/current.json?q=Paris&key=demo")
weather = response.json()

# Step 2: Inject it into a prompt manually
prompt = f"""You are a weather assistant.
Here's the current weather in Paris:
Temperature: {weather['current']['temp_c']}°C
Condition: {weather['current']['condition']['text']}
Write a friendly weather summary."""

# Step 3: Call the model
completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)
This works, but only on a small scale. The model has to interpret the meaning of the data from plain text, which bloats token usage and loses structure.
To reduce token use and preserve structure, native toolchains emerged. With toolchains such as OpenAI function calling, Anthropic tools, and Mistral functions, you can define structured input and output schemas, register tools, and let the model call them in a controlled way.
from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "query_user_count",
            "description": "Query the database to get the total number of users.",
            "parameters": {
                "type": "object",
                "properties": {
                    "table": {
                        "type": "string",
                        "description": "Name of the table to query, e.g. 'users'",
                    }
                },
                "required": ["table"],
                "additionalProperties": False,
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "How many users are currently in the system?"}
    ],
    tools=tools,
    tool_choice="auto",
)

# Output the tool call generated by the model
print(response.choices[0].message.tool_calls)
However, using these toolchains comes with some issues:
- You are limited to a single model provider.
- Prompts remain brittle, undocumented, and tied to proprietary formats.
This is where the Model Context Protocol (MCP) comes in.
What is MCP?
MCP is an open protocol, built on JSON-RPC, originally proposed by Anthropic. It standardizes communication between LLM applications and real-world environments.
At its core, MCP defines two roles: the server and the client.
The MCP server is the backend that exposes three primitives:
- Tools: Functions that perform actions (such as writing a file or placing an order)
- Resources: Read-only, URI-addressable context (for example, logs, config, and APIs)
- Prompts: Reusable message templates that guide interactions with the LLM
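To make those primitives concrete, here is a minimal sketch of how a server could declare one of each using the official Python SDK’s FastMCP helper (the names and URI are illustrative; a fuller notes-server example appears later in this article):

from mcp.server.fastmcp import FastMCP

server = FastMCP("example-server")

@server.tool()
def place_order(item: str, quantity: int) -> str:
    """Tool: performs an action on behalf of the model."""
    return f"Ordered {quantity} x {item}."

@server.resource("config://app")
def app_config() -> str:
    """Resource: read-only, URI-addressable context."""
    return "theme=dark\nregion=eu-west-1"

@server.prompt()
def summarize_topic(topic: str) -> str:
    """Prompt: a reusable message template."""
    return f"Summarize the latest findings about {topic}."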
The MCP client is the component that understands the protocol and communicates with the MCP server by sending the following requests and interpreting the server’s structured responses:
- tools/call
- resources/read
- prompts/get
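Under the hood, each of these is a plain JSON-RPC 2.0 request. For illustration, the three calls might look like this on the wire (the tool, resource, and prompt names here are hypothetical):

{"jsonrpc": "2.0", "id": 1, "method": "tools/call",
 "params": {"name": "query_user_count", "arguments": {"table": "users"}}}

{"jsonrpc": "2.0", "id": 2, "method": "resources/read",
 "params": {"uri": "note://welcome"}}

{"jsonrpc": "2.0", "id": 3, "method": "prompts/get",
 "params": {"name": "suggest_note_prompt", "arguments": {"topic": "focus"}}}

The server replies with a structured result for each id, which the client hands back to whatever is orchestrating the LLM.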
Here’s where confusion often arises: Who actually talks to the MCP server? Is it the LLM or the agent?
The answer is that neither the LLM nor the agent talks directly to the MCP server.
The MCP client acts as the middle layer between the server and the model. It may be embedded in:
- A desktop application (such as the Claude desktop app)
- An agent runtime
- A custom LLM client
These systems use the MCP client to interact with the MCP server, but they are still responsible for integrating results into LLM prompts or workflows.
For example, in an agentic architecture, the flow looks like this:
- The user asks, “How many users signed up this month?”
- The agent parses the request and calls a tool.
- The agent’s embedded MCP client sends a tools/call request to the MCP server.
- The MCP server executes the tool function, which runs a database query, and returns the result in a structured response to the MCP client.
- The agent takes that response and incorporates it into the next LLM prompt or uses it to make a decision.
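In code, the embedded MCP client side of that flow might look roughly like this. The sketch below uses the official Python SDK; the server command, file name, and tool name are assumptions for illustration:

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def count_users() -> None:
    # Launch the MCP server as a subprocess and talk to it over stdio
    server = StdioServerParameters(command="python", args=["analytics_server.py"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()  # protocol handshake
            # The tools/call request from the flow above
            result = await session.call_tool("query_user_count", {"table": "users"})
            print(result.content)  # structured result the agent folds into the next prompt

asyncio.run(count_users())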
How does MCP differ from APIs and tool calling?
Unlike many client-server protocols where clients only make requests and servers only respond, MCP supports full bidirectional communication.
- Client-to-Server: The client can request resources, call tools, and get prompts.
- Server-to-Client: The server can request LLM sampling, ask for root listings, and send notifications.
- Notifications: Both sides can send one-way messages without expecting responses.
This bidirectional design enables workflows in which servers can actively participate in decision-making by requesting LLM assistance through the client.
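As a sketch of the server-to-client direction, a tool running on the server can pause and request a completion from the client’s LLM (sampling). The example below assumes the Python SDK’s Context and create_message helpers; treat the exact signatures as illustrative rather than definitive:

from mcp.server.fastmcp import FastMCP, Context
from mcp.types import SamplingMessage, TextContent

server = FastMCP("log-review-server")

@server.tool()
async def summarize_log(log_text: str, ctx: Context) -> str:
    """Ask the client's LLM to summarize a log via a sampling request."""
    result = await ctx.session.create_message(
        messages=[
            SamplingMessage(
                role="user",
                content=TextContent(type="text", text=f"Summarize this log:\n{log_text}"),
            )
        ],
        max_tokens=200,
    )
    # The client routes this request to its model and returns the completion
    return result.content.text if result.content.type == "text" else str(result.content)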
MCP offers SDKs in multiple languages (including Python and JavaScript) to simplify building and exposing MCP-compliant servers.
Here is an example of an MCP server in Python that manages notes in a local directory, exposing a tool, a resource, and a prompt. First, a few key terms used in the code:
- tools: A tool is a function exposed by the server that an AI agent can call using JSON-RPC. Each tool includes its name, input parameters, output schema, and a description.
- resources: Resources are named, read-only data references addressable via URIs. They are used to load contextual data into the MCP session.
- prompts: Prompts are predefined message templates that servers expose to clients. When invoked with arguments, they return a list of messages (messages[]) that the client sends to the LLM to perform a specific task.
- roots: Roots are a set of scoped access boundaries provided by the client at handshake. Roots allow the server to limit the visibility of tools, resources, or data based on a given namespace.
- transport: A transport defines the communication method the server uses to handle client connections. MCP supports multiple transports, including stdio for local execution and http for web-based deployment.
from pathlib import Path

from mcp.server.fastmcp import FastMCP

NOTES_DIR = Path("notes")
NOTES_DIR.mkdir(exist_ok=True)

app = FastMCP("MCP Notes Server")


#### TOOLS ####
@app.tool()
def write_note(slug: str, content: str) -> str:
    """Write a note to the local notes directory."""
    note_path = NOTES_DIR / f"{slug}.txt"
    note_path.write_text(content.strip(), encoding="utf-8")
    return f"Note '{slug}' saved."


#### RESOURCES ####
@app.resource("note://{slug}")
def read_note(slug: str) -> str:
    """Expose a saved note as read-only, URI-addressable context."""
    note_path = NOTES_DIR / f"{slug}.txt"
    if not note_path.exists():
        return "Note not found."
    return note_path.read_text(encoding="utf-8")


#### PROMPTS ####
@app.prompt()
def suggest_note_prompt(topic: str) -> str:
    """Reusable template that asks the LLM to draft a note."""
    return f"Write a short, thoughtful note for someone who needs advice about: {topic}"


#### ROOTS ####
@app.tool()
def list_notes(root: str | None = None) -> list[str]:
    """List note URIs, optionally restricted to a root namespace."""
    all_slugs = [f"note://{f.stem}" for f in NOTES_DIR.glob("*.txt")]
    if root:
        return [slug for slug in all_slugs if slug.startswith(root)]
    return all_slugs


# -----------------------------
# ENTRYPOINT (stdio transport)
# -----------------------------
if __name__ == "__main__":
    # FastMCP handles the initialization handshake and serves over stdio.
    app.run(transport="stdio")
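To try the server end to end, a client can launch it over stdio and exercise each primitive. The following sketch uses the official Python SDK and assumes the code above is saved as notes_server.py:

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def demo() -> None:
    # Spawn the notes server as a subprocess and connect over stdio
    params = StdioServerParameters(command="python", args=["notes_server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # tools/call
            await session.call_tool("write_note", {"slug": "hello", "content": "Ship it."})
            # resources/read
            note = await session.read_resource("note://hello")
            # prompts/get
            prompt = await session.get_prompt("suggest_note_prompt", {"topic": "deep work"})
            print(note, prompt)

asyncio.run(demo())

The same server could just as easily be registered with an MCP-aware host such as the Claude desktop app instead of being driven from a hand-rolled script.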