mcpbridge-ai
An MCP bridge by Aaditya17032002
A production-grade Python package that connects any LLM provider to any MCP server.
One config dict, one await bridge.run(), and the model can call MCP tools
autonomously until it has an answer.
mcpbridge handles the entire lifecycle: transport negotiation, MCP protocol handshake, tool discovery, schema conversion, multi-turn tool calling, argument validation, history management, and graceful error recovery -- so your application code does not have to.
Install:
pip install mcpbridge-ai
Import:
from mcpbridge import MCPBridge
Table of Contents
- Why mcpbridge
- Installation
- Quickstart
- How It Works
- Configuration Reference
- Supported LLM Providers
- MCP Transport Types
- MCP Server Authentication
- Multi-Server Setup
- System Prompt and Context Injection
- Session Manager (Multi-User / FastAPI)
- Callback Hooks
- Discovery Helpers
- Custom LLM Adapter
- Error Handling
- Best Practices
- LoopResult Reference
- Model Aliases
- Limitations and Roadmap
- Running Tests
- Related Links
- License
Why mcpbridge
The Model Context Protocol (MCP) defines how tools, resources, and prompts are exposed over JSON-RPC. Every LLM provider has a different format for tool definitions, tool call extraction, and tool result insertion. Writing the glue between one provider and one MCP server is tedious. Writing it for eleven providers and four transport types is a maintenance problem.
mcpbridge solves this by sitting in the middle:
Your app --> MCPBridge --> LLM provider (tool definitions, chat calls)
                 |
                 +-------> MCP server (tool discovery, tool execution)
You describe the LLM and the MCP server in a config dict, and mcpbridge does the rest: discovers tools, converts schemas, runs the agentic loop, validates arguments, retries on failure, trims history, and returns a clean result.
Installation
Install the base package (no LLM provider SDKs):
pip install mcpbridge-ai
Install with a specific provider:
pip install "mcpbridge-ai[openai]"
pip install "mcpbridge-ai[anthropic]"
pip install "mcpbridge-ai[gemini]"
pip install "mcpbridge-ai[groq]"
pip install "mcpbridge-ai[mistral]"
pip install "mcpbridge-ai[cohere]"
pip install "mcpbridge-ai[ollama]"
pip install "mcpbridge-ai[together]"
pip install "mcpbridge-ai[bedrock]"
pip install "mcpbridge-ai[azure]"
Install everything:
pip install "mcpbridge-ai[all]"
Requires Python 3.10 or later.
Quickstart
import asyncio
from mcpbridge import MCPBridge
async def main():
    bridge = await MCPBridge(
        {
            "llm": {
                "provider": "openai",
                "model": "gpt-4o",
                "api_key": "sk-...",
            },
            "mcp": {
                "url": "http://localhost:3000",
                "transport": "streamable_http",
            },
            "prompt": {
                "system": "You are a helpful assistant.",
            },
            "loop": {
                "max_iterations": 10,
            },
        }
    ).connect()
    result = await bridge.run("What is the weather in Tokyo?")
    print(result.text)
    print(result.finish_reason)    # "done" or "max_iterations"
    print(result.tool_calls_made)  # list of tool calls with results
    await bridge.close()

if __name__ == "__main__":
    asyncio.run(main())
MCPBridge also works as an async context manager:
async with MCPBridge(config) as bridge:
    result = await bridge.run("Hello")
How It Works
1. Connect -- MCPBridge.connect() opens the MCP transport, performs the JSON-RPC initialize/initialized handshake, and discovers all tools, resources, and prompts exposed by the server.
2. Build prompt -- The PromptBuilder assembles a system prompt from internal instructions, the user-defined system prompt (with {var} interpolation), and auto-generated tool descriptions.
3. Call the LLM -- The adapter converts discovered MCP tools into the provider-specific schema format and sends them alongside the message history.
4. Extract tool calls -- If the LLM response contains tool calls, the adapter normalizes them into ToolCall(id, name, arguments) objects.
5. Execute tools -- Each tool call is validated against its JSON schema, dispatched to the correct MCP server transport, and timed out if it takes too long. Failed tool calls can be retried once.
6. Append results -- Tool results are inserted back into the message history in the provider-specific format, and the loop returns to step 3.
7. Return -- When the LLM produces a final answer (no tool calls), or when max_iterations is reached, the loop returns a LoopResult with the assistant text, the full tool call log, token estimates, and a finish_reason.
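As a rough illustration, steps 3-7 can be sketched as the loop below. This is a simplified, synchronous stand-in, not mcpbridge's actual implementation (the real loop is async and does much more); FakeAdapter and the execute_tool callable are hypothetical stand-ins.

```python
# Simplified sketch of the agentic loop (steps 3-7). FakeAdapter and
# execute_tool are hypothetical stand-ins, not mcpbridge internals.

def run_loop(adapter, execute_tool, messages, tools, max_iterations=10):
    for iteration in range(1, max_iterations + 1):
        response = adapter.chat(messages, tools)            # step 3: call the LLM
        tool_calls = adapter.extract_tool_calls(response)   # step 4: extract calls
        if not tool_calls:                                  # step 7: final answer
            return {"text": adapter.extract_text(response),
                    "iterations": iteration,
                    "finish_reason": "done"}
        results = [execute_tool(c["name"], c["arguments"])  # step 5: run the tools
                   for c in tool_calls]
        adapter.append_tool_results(messages, response, results)  # step 6
    return {"text": "", "iterations": max_iterations,
            "finish_reason": "max_iterations"}

class FakeAdapter:
    """Toy adapter: requests one tool call, then produces a final answer."""
    def __init__(self):
        self.turns = 0
    def chat(self, messages, tools):
        self.turns += 1
        if self.turns == 1:
            return {"tool_calls": [{"name": "get_weather",
                                    "arguments": {"city": "Tokyo"}}]}
        return {"text": "It is sunny in Tokyo."}
    def extract_tool_calls(self, response):
        return response.get("tool_calls", [])
    def extract_text(self, response):
        return response.get("text", "")
    def append_tool_results(self, messages, response, results):
        messages.append({"role": "tool", "content": results})

result = run_loop(FakeAdapter(), lambda name, args: f"{name} ok", [], [])
print(result["finish_reason"])  # "done" after two LLM turns
```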
Configuration Reference
The config is a plain Python dict with five top-level keys.
llm
| Field | Type | Default | Description |
|---|---|---|---|
| provider | str | (required) | anthropic, openai, gemini, mistral, cohere, groq, ollama, together, bedrock, azure_openai, openai_compatible |
| model | str | "" | Model name or short alias (see Model Aliases) |
| api_key | str or null | null | Falls back to provider env var if not set |
| base_url | str or null | null | Override the provider API endpoint |
| temperature | float | 0.7 | Sampling temperature |
| max_tokens | int | 4096 | Maximum output tokens |
| top_p | float or null | null | Nucleus sampling |
| top_k | int or null | null | Top-k sampling (providers that support it) |
| stop_sequences | list | [] | Stop sequences |
| stream | bool | false | Accepted for forward-compatibility; falls back to non-streaming in this version |
| timeout | int | 60 | HTTP timeout in seconds for the LLM call |
| extra_params | dict | {} | Provider-specific parameters passed through verbatim |
| thinking | object | {"enabled": false, "budget_tokens": 1024} | Anthropic extended thinking |
| azure_deployment | str | "" | Azure OpenAI deployment name |
| azure_api_version | str | "2024-02-01" | Azure API version |
| aws_region | str | "us-east-1" | Bedrock region |
| aws_profile | str | "" | Bedrock named profile |
mcp
| Field | Type | Default | Description |
|---|---|---|---|
| url | str or null | null | Remote MCP server URL |
| transport | str | "auto" | auto, http, sse, streamable_http, ws, stdio |
| headers | dict | {} | HTTP headers for authentication and other purposes |
| timeout | int | 30 | MCP transport timeout in seconds |
| command | str or null | null | Subprocess command for stdio transport |
| args | list | [] | Subprocess arguments for stdio transport |
| env | dict | {} | Environment variables passed to stdio subprocess |
| servers | list or null | null | Multi-server config; overrides top-level url/command |
| namespace_strategy | str | "prefix" | prefix, error, last_wins |
prompt
| Field | Type | Default | Description |
|---|---|---|---|
| system | str | "" | System prompt text |
| interpolate | bool | true | Enable {variable} interpolation in the system prompt |
| context_vars | dict | {} | Initial interpolation variables |
| inject_tool_descriptions | bool | true | Append tool descriptions to the system prompt |
| inject_internal_instructions | bool | true | Prepend internal tool-use rules to the system prompt |
| user_prefix | str | "" | Prepended to every user query |
| user_suffix | str | "" | Appended to every user query |
loop
| Field | Type | Default | Description |
|---|---|---|---|
| max_iterations | int | 10 | Maximum tool-calling iterations before returning |
| max_tokens_total | int | 32000 | Best-effort total token budget for the loop |
| tool_timeout | int | 30 | Timeout in seconds for each MCP tool call |
| parallel_tool_calls | bool | true | Execute multiple tool calls concurrently |
| on_tool_call | callable or null | null | async def on_tool_call(name, args) |
| on_tool_result | callable or null | null | async def on_tool_result(name, tool_result) |
| on_iteration | callable or null | null | async def on_iteration(iteration, messages) |
| retry_on_tool_error | bool | true | Retry a failed tool call once |
| error_strategy | str | "return_error_to_llm" | raise, return_error_to_llm, skip |
session
| Field | Type | Default | Description |
|---|---|---|---|
| persist_history | bool | true | Keep conversation history across runs |
| max_history_tokens | int | 32000 | Trim history when it exceeds this token count |
| history_trim_strategy | str | "oldest_first" | oldest_first or summarize |
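For reference, a config that sets all five top-level keys at once. The values mirror the documented defaults; the API key is a placeholder.

```python
# All five top-level config keys in one dict. Values mirror the documented
# defaults; "sk-..." is a placeholder, not a real key.
config = {
    "llm": {
        "provider": "openai",
        "model": "gpt-4o",
        "api_key": "sk-...",
        "temperature": 0.7,
        "max_tokens": 4096,
    },
    "mcp": {
        "url": "http://localhost:3000",
        "transport": "auto",
        "timeout": 30,
    },
    "prompt": {
        "system": "You are a helpful assistant.",
        "interpolate": True,
    },
    "loop": {
        "max_iterations": 10,
        "tool_timeout": 30,
        "error_strategy": "return_error_to_llm",
    },
    "session": {
        "persist_history": True,
        "max_history_tokens": 32000,
    },
}
```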
Supported LLM Providers
| Provider | Pip extra | Env var | Tool format |
|---|---|---|---|
| Anthropic Claude | mcpbridge-ai[anthropic] | ANTHROPIC_API_KEY | input_schema content blocks |
| OpenAI | mcpbridge-ai[openai] | OPENAI_API_KEY | OpenAI tool_calls |
| Google Gemini | mcpbridge-ai[gemini] | GOOGLE_API_KEY | function_declarations |
| Mistral | mcpbridge-ai[mistral] | MISTRAL_API_KEY | OpenAI-compatible |
| Cohere | mcpbridge-ai[cohere] | COHERE_API_KEY | Flat parameter_definitions |
| Groq | mcpbridge-ai[groq] | GROQ_API_KEY | OpenAI-compatible |
| Ollama | mcpbridge-ai[ollama] | (none) | OpenAI-compatible, local |
| Together AI | mcpbridge-ai[together] | TOGETHER_API_KEY | OpenAI-compatible |
| AWS Bedrock | mcpbridge-ai[bedrock] | AWS credential chain | Converse API toolSpec |
| Azure OpenAI | mcpbridge-ai[azure] | AZURE_OPENAI_API_KEY | OpenAI-compatible |
| OpenAI-compatible | (none) | OPENAI_API_KEY (optional) | Any /chat/completions endpoint |
If a provider SDK is not installed, mcpbridge raises an ImportError at adapter
init time with the exact pip install command needed.
MCP Transport Types
| Transport | When to use | Config fields |
|---|---|---|
| stdio | Local MCP server as a subprocess | command, args, env |
| streamable_http | Preferred for remote MCP servers | url, headers |
| http_sse | Legacy HTTP + SSE servers | url, headers |
| ws / websocket | WebSocket MCP servers | url, headers |
Auto-detection rules (when transport is "auto"):
- command is set -- stdio
- URL starts with ws:// or wss:// -- websocket
- URL ends with /sse -- http_sse
- Otherwise -- streamable_http
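The four rules map directly onto a small function. This is a literal transcription of the documented rules, not mcpbridge's actual code:

```python
def detect_transport(command=None, url=""):
    """Apply the documented auto-detection rules in order (illustrative)."""
    if command:                               # rule 1: subprocess -> stdio
        return "stdio"
    if url.startswith(("ws://", "wss://")):   # rule 2: websocket schemes
        return "websocket"
    if url.endswith("/sse"):                  # rule 3: legacy HTTP + SSE
        return "http_sse"
    return "streamable_http"                  # rule 4: default
```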
MCP Server Authentication
MCP servers may require authentication. mcpbridge handles this at the transport layer, not inside JSON-RPC payloads.
HTTP / SSE / Streamable HTTP / WebSocket -- pass credentials via mcp.headers:
"mcp": {
"url": "https://secure-mcp-server.example.com",
"transport": "streamable_http",
"headers": {
"Authorization": "Bearer YOUR_TOKEN",
},
}
stdio -- pass credentials via environment variables to the subprocess:
"mcp": {
"command": "npx",
"args": ["@some/mcp-server"],
"env": {
"MCP_API_KEY": "YOUR_TOKEN",
},
}
For multi-server setups, each server entry supports its own headers and env.
Top-level mcp.headers are merged into every server's headers automatically.
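The merge behaves like a plain dict update. This sketch assumes per-server headers take precedence over top-level ones on key conflicts; that precedence is an assumption, not documented behavior.

```python
def merged_headers(top_level, server_headers):
    """Combine top-level mcp.headers with a server's own headers.
    Per-server values win on conflicts (assumed precedence)."""
    merged = dict(top_level)
    merged.update(server_headers)
    return merged

headers = merged_headers(
    {"Authorization": "Bearer SHARED", "X-Trace": "on"},
    {"Authorization": "Bearer DB_TOKEN"},
)
```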
Multi-Server Setup
Connect to multiple MCP servers and let the LLM choose tools from any of them. Tool names are automatically namespaced to prevent collisions.
config = {
    "llm": {"provider": "groq", "model": "llama-3.3-70b-versatile"},
    "mcp": {
        "servers": [
            {
                "name": "files",
                "command": "npx",
                "args": ["@modelcontextprotocol/server-filesystem"],
                "transport": "stdio",
            },
            {
                "name": "db",
                "url": "http://localhost:8080",
                "transport": "streamable_http",
                "headers": {"Authorization": "Bearer DB_TOKEN"},
            },
        ],
        "namespace_strategy": "prefix",
    },
    "prompt": {"system": "Use tools when needed."},
    "loop": {"max_iterations": 10},
}
Namespace strategies:
- prefix (default) -- tool names become servername__toolname
- error -- raise ConfigValidationError on name collision
- last_wins -- later servers overwrite earlier tools with the same name
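The three strategies behave like the registry merge below — an illustrative stand-in for the internal tool registry (here a collision raises ValueError rather than ConfigValidationError):

```python
def merge_tool_names(servers, strategy="prefix"):
    """servers: list of (server_name, [tool_name, ...]) pairs.
    Returns {exposed_name: (server_name, tool_name)} (illustrative)."""
    registry = {}
    for server_name, tool_names in servers:
        for tool in tool_names:
            key = f"{server_name}__{tool}" if strategy == "prefix" else tool
            if strategy == "error" and key in registry:
                raise ValueError(f"tool name collision: {tool!r}")
            registry[key] = (server_name, tool)  # last_wins overwrites silently
    return registry

servers = [("files", ["search"]), ("db", ["search"])]
print(sorted(merge_tool_names(servers)))  # ['db__search', 'files__search']
```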
You can also add and remove servers at runtime:
await bridge.add_server("analytics", {"url": "http://localhost:9090"})
await bridge.remove_server("analytics")
System Prompt and Context Injection
The system prompt supports {variable} interpolation. Variables can be set at
config time or at runtime.
config = {
    "llm": {"provider": "openai", "model": "gpt-4o", "api_key": "sk-..."},
    "mcp": {"url": "http://localhost:3000", "transport": "streamable_http"},
    "prompt": {
        "system": "You are a translator. Translate to {language}.",
        "context_vars": {"language": "Hindi"},
    },
    "loop": {"max_iterations": 10},
}
At runtime:
bridge.set_context(language="Japanese")
result = await bridge.run("Translate: Good morning")
Session Manager
For multi-user applications (web servers, APIs), SessionManager pools one
MCPBridge instance per session and handles idle cleanup.
from fastapi import FastAPI
from mcpbridge import SessionManager
app = FastAPI()
base_config = {
    "llm": {"provider": "groq", "model": "llama-3.3-70b-versatile"},
    "mcp": {"url": "http://localhost:3000", "transport": "streamable_http"},
    "prompt": {"system": "You are a helpful assistant."},
    "loop": {"max_iterations": 10},
    "session": {"persist_history": True, "max_history_tokens": 32000},
}
manager = SessionManager(base_config, max_sessions=1000)
@app.on_event("startup")
async def startup():
await manager.start_cleanup_task(ttl_seconds=3600)
@app.post("/chat")
async def chat(session_id: str, message: str):
bridge = await manager.get_or_create(session_id)
result = await bridge.run(message)
return {"text": result.text, "finish_reason": result.finish_reason}
Key methods:
- get_or_create(session_id) -- returns an existing or new connected bridge
- destroy(session_id) -- closes and removes a session
- destroy_all() -- shuts down all sessions
- active_count() -- number of active sessions
- start_cleanup_task(ttl_seconds, interval_seconds) -- background cleanup
Callback Hooks
Monitor tool calls and loop iterations in real time.
async def on_tool_call(name, args):
    print(f"Calling tool: {name} with {args}")

async def on_tool_result(name, result):
    print(f"Tool result: {name} -> {result.content} (error={result.is_error})")

async def on_iteration(iteration, messages):
    print(f"Loop iteration {iteration}")
config = {
    "llm": {"provider": "openai", "model": "gpt-4o", "api_key": "sk-..."},
    "mcp": {"url": "http://localhost:3000", "transport": "streamable_http"},
    "prompt": {"system": "Use tools when needed."},
    "loop": {
        "max_iterations": 10,
        "on_tool_call": on_tool_call,
        "on_tool_result": on_tool_result,
        "on_iteration": on_iteration,
    },
}
Callbacks can be synchronous or asynchronous. mcpbridge will await coroutines automatically.
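A common way to support both forms is to await only when the callback returns an awaitable. A minimal sketch of that pattern (not mcpbridge's actual code):

```python
import asyncio
import inspect

async def invoke(callback, *args):
    """Call a sync or async callback; await the result only if awaitable."""
    result = callback(*args)
    if inspect.isawaitable(result):
        result = await result
    return result

events = []

def sync_hook(name, args):
    events.append(("sync", name))

async def async_hook(name, args):
    events.append(("async", name))

async def demo():
    # Both callback styles go through the same call site.
    await invoke(sync_hook, "search", {})
    await invoke(async_hook, "search", {})

asyncio.run(demo())
```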
Discovery Helpers
After connecting, you can inspect MCP server capabilities directly.
async with MCPBridge(config) as bridge:
    tools = bridge.list_tools()
    for t in tools:
        print(t.namespaced_name, t.description)

    resources = await bridge.list_resources()
    content = await bridge.read_resource("file:///path/to/resource")

    prompts = await bridge.list_prompts()
    rendered = await bridge.get_prompt("prompt_name", {"arg": "value"})
Custom LLM Adapter
To add a provider that mcpbridge does not support out of the box, subclass
BaseLLMAdapter and implement the required methods.
from mcpbridge.adapters.base import BaseLLMAdapter, ToolCall, ToolResult
class MyAdapter(BaseLLMAdapter):
    @property
    def provider_name(self) -> str:
        return "my_provider"

    @property
    def supports_parallel_tool_calls(self) -> bool:
        return True

    @property
    def supports_system_prompt(self) -> bool:
        return True

    async def chat(self, messages, tools, system=None, **kwargs):
        # Call your provider and return the raw response.
        ...

    def extract_tool_calls(self, response) -> list[ToolCall]:
        # Parse tool calls from the raw response.
        ...

    def extract_text(self, response) -> str:
        # Extract the final assistant text.
        ...

    def is_done(self, response) -> bool:
        # True when there are no pending tool calls.
        ...

    def append_tool_results(self, messages, response, tool_results):
        # Insert tool results into the message history.
        ...

    def count_tokens(self, messages, system) -> int:
        # Best-effort token estimate for history trimming.
        ...
The adapter contract is simple: append_tool_results() must produce a message
history that chat() can accept on the next call. Everything else follows.
Error Handling
mcpbridge defines a structured exception hierarchy. Every exception inherits
from MCPBridgeError, so you can catch broadly or narrowly.
Configuration
| Exception | When |
|---|---|
| ConfigValidationError | Invalid config dict |
| ProviderNotFoundError | Unknown llm.provider value |
| APIKeyMissingError | No API key found (config or env var) |
Transport
| Exception | When |
|---|---|
| TransportConnectionError | Cannot connect to MCP server |
| TransportDisconnectedError | Connection dropped mid-session |
| TransportTimeoutError | MCP request timed out |
MCP Protocol
| Exception | When |
|---|---|
| MCPProtocolError | Malformed JSON-RPC or unexpected server response |
| MCPToolNotFoundError | Tool name not in discovered registry |
| MCPToolCallError | MCP server returned an error for tools/call |
| MCPSchemaValidationError | Tool arguments failed JSON Schema validation |
LLM
| Exception | When |
|---|---|
| LLMRateLimitError | Provider returned 429 |
| LLMAuthError | Provider authentication failed |
| LLMContextLengthError | Context window exceeded |
| ToolsNotSupportedError | Model does not support tool calling |
Loop
| Exception | When |
|---|---|
| LoopTimeoutError | Total token budget exceeded |
| BridgeInUseError | Concurrent run() on the same bridge instance |
| SessionLimitError | SessionManager exceeded max_sessions |
Note: max_iterations no longer raises an exception. When the loop reaches the
configured limit, it returns a LoopResult with finish_reason="max_iterations"
and best-effort text (extracted from the last LLM response or the most recent
tool result). This prevents service crashes when a model gets stuck in a
tool-calling loop.
Best Practices
Keep max_iterations reasonable. A value between 5 and 15 covers most
real-world tool-calling workflows. If the model consistently hits the limit,
the system prompt likely needs to be more explicit about when to stop calling
tools.
Use error_strategy: "return_error_to_llm" in production. This is the
default. When a tool call fails (schema mismatch, timeout, server error), the
error message is returned to the LLM as a tool result so it can decide how to
recover. The "raise" strategy is better for development and debugging.
Set persist_history: false for stateless endpoints. If each request is
independent, disable history to avoid unbounded memory growth. For
conversational use cases, set max_history_tokens to a value that fits within
your model's context window.
Use namespace_strategy: "prefix" for multi-server setups. This is the
default and prevents tool name collisions between servers. The LLM sees tool
names like serverA__search and serverB__search and can distinguish them.
Authenticate MCP servers via mcp.headers or mcp.env. Do not embed
credentials in URLs. For HTTP transports, use the Authorization header. For
stdio transports, pass tokens through environment variables.
Use the on_tool_call and on_tool_result callbacks for logging and
observability. They fire synchronously during the loop and give you full
visibility into what the model is doing without modifying the loop behavior.
Check result.finish_reason after every run() call. A value of "done"
means the model produced a final answer. A value of "max_iterations" means
the loop was capped before the model finished. Your application should handle
both cases.
Prefer short model aliases for readability. Instead of writing
"claude-opus-4-20250514", write "opus". mcpbridge resolves aliases
automatically (see Model Aliases).
Do not share a single MCPBridge instance across concurrent requests.
Each run() call acquires an internal lock. For concurrent users, use
SessionManager which creates one bridge per session.
LoopResult Reference
bridge.run() returns a LoopResult dataclass:
| Field | Type | Description |
|---|---|---|
| text | str | Final assistant text (or best-effort fallback) |
| tool_calls_made | list[dict] | Each entry: {id, name, arguments, result, is_error} |
| iterations | int | Number of tool-calling iterations completed |
| total_input_tokens | int | Best-effort input token estimate |
| total_output_tokens | int | Best-effort output token estimate |
| finish_reason | str | "done" or "max_iterations" |
Model Aliases
Short aliases can be used in place of full model identifiers.
| Provider | Alias | Resolves to |
|---|---|---|
| anthropic | opus | claude-opus-4-20250514 |
| anthropic | sonnet | claude-sonnet-4-20250514 |
| anthropic | haiku | claude-haiku-4-5-20251001 |
| openai | gpt4o | gpt-4o |
| openai | gpt4o-mini | gpt-4o-mini |
| openai | o3 | o3 |
| openai | o3-mini | o3-mini |
| openai | o4-mini | o4-mini |
| gemini | flash | gemini-1.5-flash |
| gemini | flash2 | gemini-2.0-flash |
| gemini | pro | gemini-1.5-pro |
| gemini | pro2 | gemini-2.0-pro |
| mistral | large | mistral-large-latest |
| mistral | small | mistral-small-latest |
| groq | llama3 | llama-3.3-70b-versatile |
| groq | llama3-small | llama-3.1-8b-instant |
| cohere | command | command-r-plus |
| together | llama3 | meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo |
If the model string is not recognized as an alias, it is passed through as-is.
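Resolution is effectively a dict lookup with pass-through for unknown strings. An illustrative sketch using a subset of the table above (the lookup mechanics are an assumption, not mcpbridge's actual code):

```python
# Illustrative subset of the alias table; unknown model strings pass through.
ALIASES = {
    ("anthropic", "opus"): "claude-opus-4-20250514",
    ("openai", "gpt4o"): "gpt-4o",
    ("groq", "llama3"): "llama-3.3-70b-versatile",
}

def resolve_model(provider, model):
    """Return the full model id for a known (provider, alias) pair,
    otherwise pass the string through unchanged."""
    return ALIASES.get((provider, model), model)
```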
Limitations and Roadmap
- Token streaming is not implemented. The stream parameter is accepted and will not raise an error, but the loop runs in non-streaming mode. A future version will add incremental token delivery via async generators.
- LLM-level retry/backoff is not built in. If the LLM provider returns 429 or 5xx, the adapter raises immediately. Rate-limit retry logic should be handled at the application level or in a custom adapter.
- History trimming is best-effort. Token estimates use tiktoken when available (OpenAI family) and fall back to a rough chars / 4 heuristic for other providers.
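The fallback heuristic can be sketched as follows. The tiktoken branch only runs when the package is installed; the cl100k_base encoding name is an assumption for illustration.

```python
def estimate_tokens(text, openai_family=False):
    """Best-effort token estimate: tiktoken when available for the OpenAI
    family, otherwise the documented chars/4 heuristic (illustrative)."""
    if openai_family:
        try:
            import tiktoken  # optional dependency; cl100k_base is assumed
            return len(tiktoken.get_encoding("cl100k_base").encode(text))
        except ImportError:
            pass  # fall through to the rough heuristic
    return max(1, len(text) // 4)

print(estimate_tokens("x" * 40))  # 10 via the chars/4 fallback
```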
Running Tests
pip install "mcpbridge-ai[dev]"
pytest -q
The test suite uses fakes and mocks for all external dependencies (LLM providers, MCP servers). No API keys or running servers are needed.
Related Links
MCP Specification
- Architecture: https://modelcontextprotocol.io/docs/concepts/architecture
- Transports: https://modelcontextprotocol.io/docs/concepts/transports
- Tools: https://modelcontextprotocol.io/docs/concepts/tools
- Resources: https://modelcontextprotocol.io/docs/concepts/resources
- Prompts: https://modelcontextprotocol.io/docs/concepts/prompts
- Full spec: https://spec.modelcontextprotocol.io/specification/2024-11-05/
LLM Provider Documentation
- Anthropic Messages API: https://docs.anthropic.com/en/api/messages
- Anthropic Tool Use: https://docs.anthropic.com/en/docs/build-with-claude/tool-use
- OpenAI Chat Completions: https://platform.openai.com/docs/api-reference/chat/create
- OpenAI Function Calling: https://platform.openai.com/docs/guides/function-calling
- Google Gemini: https://ai.google.dev/api/generate-content
- Gemini Function Calling: https://ai.google.dev/gemini-api/docs/function-calling
- Mistral Chat: https://docs.mistral.ai/api/#tag/chat
- Mistral Function Calling: https://docs.mistral.ai/capabilities/function_calling/
- Cohere Chat: https://docs.cohere.com/reference/chat
- Cohere Tool Use: https://docs.cohere.com/docs/tool-use
- Groq: https://console.groq.com/docs/openai
- Groq Tool Use: https://console.groq.com/docs/tool-use
- Ollama API: https://github.com/ollama/ollama/blob/main/docs/api.md
- Together AI: https://docs.together.ai/docs/chat-overview
- AWS Bedrock Converse: https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Converse.html
- Azure OpenAI: https://learn.microsoft.com/en-us/azure/ai-services/openai/reference
License
MIT