mcpbridge-ai
An MCP bridge by Aaditya17032002
A production-grade Python package that connects any LLM provider to any MCP server.
One config dict, one await bridge.run(), and the model can call MCP tools
autonomously until it has an answer.
mcpbridge handles the entire lifecycle: transport negotiation, MCP protocol handshake, tool discovery, schema conversion, multi-turn tool calling, argument validation, history management, and graceful error recovery -- so your application code does not have to.
Install:
pip install mcpbridge-ai
Import:
from mcpbridge import MCPBridge
Table of Contents
- Why mcpbridge
- Installation
- Quickstart
- How It Works
- Configuration Reference
- Supported LLM Providers
- MCP Transport Types
- MCP Server Authentication
- Multi-Server Setup
- System Prompt and Context Injection
- Session Manager (Multi-User / FastAPI)
- Callback Hooks
- Discovery Helpers
- Custom LLM Adapter
- Error Handling
- Best Practices
- LoopResult Reference
- Model Aliases
- Limitations and Roadmap
- Running Tests
- Related Links
- License
Why mcpbridge
The Model Context Protocol (MCP) defines how tools, resources, and prompts are exposed over JSON-RPC. Every LLM provider has a different format for tool definitions, tool call extraction, and tool result insertion. Writing the glue between one provider and one MCP server is tedious. Writing it for eleven providers and four transport types is a maintenance problem.
mcpbridge solves this by sitting in the middle:
Your app --> MCPBridge --> LLM provider (tool definitions, chat calls)
                 |
                 +-------> MCP server (tool discovery, tool execution)
You describe the LLM and the MCP server in a config dict, and mcpbridge does the rest: discovers tools, converts schemas, runs the agentic loop, validates arguments, retries on failure, trims history, and returns a clean result.
Installation
Install the base package (no LLM provider SDKs):
pip install mcpbridge-ai
Install with a specific provider:
pip install "mcpbridge-ai[openai]"
pip install "mcpbridge-ai[anthropic]"
pip install "mcpbridge-ai[gemini]"
pip install "mcpbridge-ai[groq]"
pip install "mcpbridge-ai[mistral]"
pip install "mcpbridge-ai[cohere]"
pip install "mcpbridge-ai[ollama]"
pip install "mcpbridge-ai[together]"
pip install "mcpbridge-ai[bedrock]"
pip install "mcpbridge-ai[azure]"
Install everything:
pip install "mcpbridge-ai[all]"
Requires Python 3.10 or later.
Quickstart
import asyncio
from mcpbridge import MCPBridge
async def main():
    bridge = await MCPBridge(
        {
            "llm": {
                "provider": "openai",
                "model": "gpt-4o",
                "api_key": "sk-...",
            },
            "mcp": {
                "url": "http://localhost:3000",
                "transport": "streamable_http",
            },
            "prompt": {
                "system": "You are a helpful assistant.",
            },
            "loop": {
                "max_iterations": 10,
            },
        }
    ).connect()
    result = await bridge.run("What is the weather in Tokyo?")
    print(result.text)
    print(result.finish_reason)    # "done" or "max_iterations"
    print(result.tool_calls_made)  # list of tool calls with results
    await bridge.close()

if __name__ == "__main__":
    asyncio.run(main())
MCPBridge also works as an async context manager:
async with MCPBridge(config) as bridge:
    result = await bridge.run("Hello")
How It Works
1. Connect -- MCPBridge.connect() opens the MCP transport, performs the JSON-RPC initialize/initialized handshake, and discovers all tools, resources, and prompts exposed by the server.
2. Build prompt -- The PromptBuilder assembles a system prompt from internal instructions, the user-defined system prompt (with {var} interpolation), and auto-generated tool descriptions.
3. Call the LLM -- The adapter converts discovered MCP tools into the provider-specific schema format and sends them alongside the message history.
4. Extract tool calls -- If the LLM response contains tool calls, the adapter normalizes them into ToolCall(id, name, arguments) objects.
5. Execute tools -- Each tool call is validated against its JSON schema, dispatched to the correct MCP server transport, and timed out if it takes too long. Failed tool calls can be retried once.
6. Append results -- Tool results are inserted back into the message history in the provider-specific format, and the loop returns to step 3.
7. Return -- When the LLM produces a final answer (no tool calls), or when max_iterations is reached, the loop returns a LoopResult with the assistant text, the full tool call log, token estimates, and a finish_reason.
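As a rough illustration, steps 3-7 can be sketched as the loop below. This is a simplified, synchronous stand-in, not mcpbridge's actual implementation (the real loop is async and does much more); FakeAdapter and the execute_tool callable are hypothetical stand-ins.

```python
# Simplified sketch of the agentic loop (steps 3-7). FakeAdapter and
# execute_tool are hypothetical stand-ins, not mcpbridge internals.

def run_loop(adapter, execute_tool, messages, tools, max_iterations=10):
    for iteration in range(1, max_iterations + 1):
        response = adapter.chat(messages, tools)            # step 3: call the LLM
        tool_calls = adapter.extract_tool_calls(response)   # step 4: extract calls
        if not tool_calls:                                  # step 7: final answer
            return {"text": adapter.extract_text(response),
                    "iterations": iteration,
                    "finish_reason": "done"}
        results = [execute_tool(c["name"], c["arguments"])  # step 5: run the tools
                   for c in tool_calls]
        adapter.append_tool_results(messages, response, results)  # step 6
    return {"text": "", "iterations": max_iterations,
            "finish_reason": "max_iterations"}

class FakeAdapter:
    """Toy adapter: requests one tool call, then produces a final answer."""
    def __init__(self):
        self.turns = 0
    def chat(self, messages, tools):
        self.turns += 1
        if self.turns == 1:
            return {"tool_calls": [{"name": "get_weather",
                                    "arguments": {"city": "Tokyo"}}]}
        return {"text": "It is sunny in Tokyo."}
    def extract_tool_calls(self, response):
        return response.get("tool_calls", [])
    def extract_text(self, response):
        return response.get("text", "")
    def append_tool_results(self, messages, response, results):
        messages.append({"role": "tool", "content": results})

result = run_loop(FakeAdapter(), lambda name, args: f"{name} ok", [], [])
print(result["finish_reason"])  # "done" after two LLM turns
```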
Configuration Reference
The config is a plain Python dict with five top-level keys.
llm
| Field | Type | Default | Description |
|---|---|---|---|
| provider | str | (required) | anthropic, openai, gemini, mistral, cohere, groq, ollama, together, bedrock, azure_openai, openai_compatible |
| model | str | "" | Model name or short alias (see Model Aliases) |
| api_key | str or null | null | Falls back to provider env var if not set |
| base_url | str or null | null | Override the provider API endpoint |
| temperature | float | 0.7 | Sampling temperature |
| max_tokens | int | 4096 | Maximum output tokens |
| top_p | float or null | null | Nucleus sampling |
| top_k | int or null | null | Top-k sampling (providers that support it) |
| stop_sequences | list | [] | Stop sequences |
| stream | bool | false | Accepted for forward-compatibility; falls back to non-streaming in this version |
| timeout | int | 60 | HTTP timeout in seconds for the LLM call |
| extra_params | dict | {} | Provider-specific parameters passed through verbatim |
| thinking | object | {"enabled": false, "budget_tokens": 1024} | Anthropic extended thinking |
| azure_deployment | str | "" | Azure OpenAI deployment name |
| azure_api_version | str | "2024-02-01" | Azure API version |
| aws_region | str | "us-east-1" | Bedrock region |
| aws_profile | str | "" | Bedrock named profile |
mcp
| Field | Type | Default | Description |
|---|---|---|---|
| url | str or null | null | Remote MCP server URL |
| transport | str | "auto" | auto, http, sse, streamable_http, ws, stdio |
| headers | dict | {} | HTTP headers for authentication and other purposes |
| timeout | int | 30 | MCP transport timeout in seconds |
| command | str or null | null | Subprocess command for stdio transport |
| args | list | [] | Subprocess arguments for stdio transport |
| env | dict | {} | Environment variables passed to stdio subprocess |
| servers | list or null | null | Multi-server config; overrides top-level url/command |
| namespace_strategy | str | "prefix" | prefix, error, last_wins |
prompt
| Field | Type | Default | Description |
|---|---|---|---|
| system | str | "" | System prompt text |
| interpolate | bool | true | Enable {variable} interpolation in the system prompt |
| context_vars | dict | {} | Initial interpolation variables |
| inject_tool_descriptions | bool | true | Append tool descriptions to the system prompt |
| inject_internal_instructions | bool | true | Prepend internal tool-use rules to the system prompt |
| user_prefix | str | "" | Prepended to every user query |
| user_suffix | str | "" | Appended to every user query |
loop
| Field | Type | Default | Description |
|---|---|---|---|
| max_iterations | int | 10 | Maximum tool-calling iterations before returning |
| max_tokens_total | int | 32000 | Best-effort total token budget for the loop |
| tool_timeout | int | 30 | Timeout in seconds for each MCP tool call |
| parallel_tool_calls | bool | true | Execute multiple tool calls concurrently |
| on_tool_call | callable or null | null | async def on_tool_call(name, args) |
| on_tool_result | callable or null | null | async def on_tool_result(name, tool_result) |
| on_iteration | callable or null | null | async def on_iteration(iteration, messages) |
| retry_on_tool_error | bool | true | Retry a failed tool call once |
| error_strategy | str | "return_error_to_llm" | raise, return_error_to_llm, skip |
session
| Field | Type | Default | Description |
|---|---|---|---|
| persist_history | bool | true | Keep conversation history across runs |
| max_history_tokens | int | 32000 | Trim history when it exceeds this token count |
| history_trim_strategy | str | "oldest_first" | oldest_first or summarize |
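For reference, a config that sets all five top-level keys at once. The values mirror the documented defaults; the API key is a placeholder.

```python
# All five top-level config keys in one dict. Values mirror the documented
# defaults; "sk-..." is a placeholder, not a real key.
config = {
    "llm": {
        "provider": "openai",
        "model": "gpt-4o",
        "api_key": "sk-...",
        "temperature": 0.7,
        "max_tokens": 4096,
    },
    "mcp": {
        "url": "http://localhost:3000",
        "transport": "auto",
        "timeout": 30,
    },
    "prompt": {
        "system": "You are a helpful assistant.",
        "interpolate": True,
    },
    "loop": {
        "max_iterations": 10,
        "tool_timeout": 30,
        "error_strategy": "return_error_to_llm",
    },
    "session": {
        "persist_history": True,
        "max_history_tokens": 32000,
    },
}
```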
Supported LLM Providers
| Provider | Pip extra | Env var | Tool format |
|---|---|---|---|
| Anthropic Claude | mcpbridge-ai[anthropic] | ANTHROPIC_API_KEY | input_schema content blocks |
| OpenAI | mcpbridge-ai[openai] | OPENAI_API_KEY | OpenAI tool_calls |
| Google Gemini | mcpbridge-ai[gemini] | GOOGLE_API_KEY | function_declarations |
| Mistral | mcpbridge-ai[mistral] | MISTRAL_API_KEY | OpenAI-compatible |
| Cohere | mcpbridge-ai[cohere] | COHERE_API_KEY | Flat parameter_definitions |
| Groq | mcpbridge-ai[groq] | GROQ_API_KEY | OpenAI-compatible |
| Ollama | mcpbridge-ai[ollama] | (none) | OpenAI-compatible, local |
| Together AI | mcpbridge-ai[together] | TOGETHER_API_KEY | OpenAI-compatible |
| AWS Bedrock | mcpbridge-ai[bedrock] | AWS credential chain | Converse API toolSpec |
| Azure OpenAI | mcpbridge-ai[azure] | AZURE_OPENAI_API_KEY | OpenAI-compatible |
| OpenAI-compatible | (none) | OPENAI_API_KEY (optional) | Any /chat/completions endpoint |
If a provider SDK is not installed, mcpbridge raises an ImportError at adapter
init time with the exact pip install command needed.
MCP Transport Types
| Transport | When to use | Config fields |
|---|---|---|
| stdio | Local MCP server as a subprocess | command, args, env |
| streamable_http | Preferred for remote MCP servers | url, headers |
| http_sse | Legacy HTTP + SSE servers | url, headers |
| ws / websocket | WebSocket MCP servers | url, headers |
Auto-detection rules (when transport is "auto"):
- command is set -- stdio
- URL starts with ws:// or wss:// -- websocket
- URL ends with /sse -- http_sse
- Otherwise -- streamable_http
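The four rules map directly onto a small function. This is a literal transcription of the documented rules, not mcpbridge's actual code:

```python
def detect_transport(command=None, url=""):
    """Apply the documented auto-detection rules in order (illustrative)."""
    if command:                               # rule 1: subprocess -> stdio
        return "stdio"
    if url.startswith(("ws://", "wss://")):   # rule 2: websocket schemes
        return "websocket"
    if url.endswith("/sse"):                  # rule 3: legacy HTTP + SSE
        return "http_sse"
    return "streamable_http"                  # rule 4: default
```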
MCP Server Authentication
MCP servers may require authentication. mcpbridge handles this at the transport layer, not inside JSON-RPC payloads.
HTTP / SSE / Streamable HTTP / WebSocket -- pass credentials via mcp.headers:
"mcp": {
"url": "https://secure-mcp-server.example.com",
"transport": "streamable_http",
"headers": {
"Authorization": "Bearer YOUR_TOKEN",
},
}
stdio -- pass credentials via environment variables to the subprocess:
"mcp": {
"command": "npx",
"args": ["@some/mcp-server"],
"env": {
"MCP_API_KEY": "YOUR_TOKEN",
},
}
For multi-server setups, each server entry supports its own headers and env.
Top-level mcp.headers are merged into every server's headers automatically.
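The merge behaves like a plain dict update. This sketch assumes per-server headers take precedence over top-level ones on key conflicts; that precedence is an assumption, not documented behavior.

```python
def merged_headers(top_level, server_headers):
    """Combine top-level mcp.headers with a server's own headers.
    Per-server values win on conflicts (assumed precedence)."""
    merged = dict(top_level)
    merged.update(server_headers)
    return merged

headers = merged_headers(
    {"Authorization": "Bearer SHARED", "X-Trace": "on"},
    {"Authorization": "Bearer DB_TOKEN"},
)
```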
Multi-Server Setup
Connect to multiple MCP servers and let the LLM choose tools from any of them. Tool names are automatically namespaced to prevent collisions.
config = {
    "llm": {"provider": "groq", "model": "llama-3.3-70b-versatile"},
    "mcp": {
        "servers": [
            {
                "name": "files",
                "command": "npx",
                "args": ["@modelcontextprotocol/server-filesystem"],
                "transport": "stdio",
            },
            {
                "name": "db",
                "url": "http://localhost:8080",
                "transport": "streamable_http",
                "headers": {"Authorization": "Bearer DB_TOKEN"},
            },
        ],
        "namespace_strategy": "prefix",
    },
    "prompt": {"system": "Use tools when needed."},
    "loop": {"max_iterations": 10},
}
Namespace strategies:
- prefix (default) -- tool names become servername__toolname
- error -- raise ConfigValidationError on name collision
- last_wins -- later servers overwrite earlier tools with the same name
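The three strategies behave like the registry merge below — an illustrative stand-in for the internal tool registry (here a collision raises ValueError rather than ConfigValidationError):

```python
def merge_tool_names(servers, strategy="prefix"):
    """servers: list of (server_name, [tool_name, ...]) pairs.
    Returns {exposed_name: (server_name, tool_name)} (illustrative)."""
    registry = {}
    for server_name, tool_names in servers:
        for tool in tool_names:
            key = f"{server_name}__{tool}" if strategy == "prefix" else tool
            if strategy == "error" and key in registry:
                raise ValueError(f"tool name collision: {tool!r}")
            registry[key] = (server_name, tool)  # last_wins overwrites silently
    return registry

servers = [("files", ["search"]), ("db", ["search"])]
print(sorted(merge_tool_names(servers)))  # ['db__search', 'files__search']
```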
You can also add and remove servers at runtime:
await bridge.add_server("analytics", {"url": "http://localhost:9090"})
await bridge.remove_server("analytics")
System Prompt and Context Injection
The system prompt supports {variable} interpolation. Variables can be set at
config time or at runtime.
config = {
    "llm": {"provider": "openai", "model": "gpt-4o", "api_key": "sk-..."},
    "mcp": {"url": "http://localhost:3000", "transport": "streamable_http"},
    "prompt": {
        "system": "You are a translator. Translate to {language}.",
        "context_vars": {"language": "Hindi"},
    },
    "loop": {"max_iterations": 10},
}
At runtime:
bridge.set_context(language="Japanese")
result = await bridge.run("Translate: Good morning")
Session Manager
For multi-user applications (web servers, APIs), SessionManager pools one
MCPBridge instance per session and handles idle cleanup.
from fastapi import FastAPI
from mcpbridge import SessionManager
app = FastAPI()
base_config = {
    "llm": {"provider": "groq", "model": "llama-3.3-70b-versatile"},
    "mcp": {"url": "http://localhost:3000", "transport": "streamable_http"},
    "prompt": {"system": "You are a helpful assistant."},
    "loop": {"max_iterations": 10},
    "session": {"persist_history": True, "max_history_tokens": 32000},
}
manager = SessionManager(base_config, max_sessions=1000)
@app.on_event("startup")
async def startup():
await manager.start_cleanup_task(ttl_seconds=3600)
@app.post("/chat")
async def chat(session_id: str, message: str):
bridge = await manager.get_or_create(session_id)
result = await bridge.run(message)
return {"text": result.text, "finish_reason": result.finish_reason}
Key methods:
- get_or_create(session_id) -- returns an existing or new connected bridge
- destroy(session_id) -- closes and removes a session
- destroy_all() -- shuts down all sessions
- active_count() -- number of active sessions
- start_cleanup_task(ttl_seconds, interval_seconds) -- background cleanup
Callback Hooks
Monitor tool calls and loop iterations in real time.
async def on_tool_call(name, args):
    print(f"Calling tool: {name} with {args}")

async def on_tool_result(name, result):
    print(f"Tool result: {name} -> {result.content} (error={result.is_error})")

async def on_iteration(iteration, messages):
    print(f"Loop iteration {iteration}")
config = {
    "llm": {"provider": "openai", "model": "gpt-4o", "api_key": "sk-..."},
    "mcp": {"url": "http://localhost:3000", "transport": "streamable_http"},
    "prompt": {"system": "Use tools when needed."},
    "loop": {
        "max_iterations": 10,
        "on_tool_call": on_tool_call,
        "on_tool_result": on_tool_result,
        "on_iteration": on_iteration,
    },
}
Callbacks can be synchronous or asynchronous. mcpbridge will await coroutines automatically.
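A common way to support both forms is to await only when the callback returns an awaitable. A minimal sketch of that pattern (not mcpbridge's actual code):

```python
import asyncio
import inspect

async def invoke(callback, *args):
    """Call a sync or async callback; await the result only if awaitable."""
    result = callback(*args)
    if inspect.isawaitable(result):
        result = await result
    return result

events = []

def sync_hook(name, args):
    events.append(("sync", name))

async def async_hook(name, args):
    events.append(("async", name))

async def demo():
    # Both callback styles go through the same call site.
    await invoke(sync_hook, "search", {})
    await invoke(async_hook, "search", {})

asyncio.run(demo())
```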
Discovery Helpers
After connecting, you can inspect MCP server capabilities directly.
async with MCPBridge(config) as bridge:
    tools = bridge.list_tools()
    for t in tools:
        print(t.namespaced_name, t.description)

    resources = await bridge.list_resources()
    content = await bridge.read_resource("file:///path/to/resource")

    prompts = await bridge.list_prompts()
    rendered = await bridge.get_prompt("prompt_name", {"arg": "value"})
Custom LLM Adapter
To add a provider that mcpbridge does not support out of the box, subclass
BaseLLMAdapter and implement the required methods.
from mcpbridge.adapters.base import BaseLLMAdapter, ToolCall, ToolResult
class MyAdapter(BaseLLMAdapter):
    @property
    def provider_name(self) -> str:
        return "my_provider"

    @property
    def supports_parallel_tool_calls(self) -> bool:
        return True

    @property
    def supports_system_prompt(self) -> bool:
        return True

    async def chat(self, messages, tools, system=None, **kwargs):
        # Call your provider and return the raw response.
        ...

    def extract_tool_calls(self, response) -> list[ToolCall]:
        # Parse tool calls from the raw response.
        ...

    def extract_text(self, response) -> str:
        # Extract the final assistant text.
        ...

    def is_done(self, response) -> bool:
        # True when there are no pending tool calls.
        ...

    def append_tool_results(self, messages, response, tool_results):
        # Insert tool results into the message history.
        ...

    def count_tokens(self, messages, system) -> int:
        # Best-effort token estimate for history trimming.
        ...
The adapter contract is simple: append_tool_results() must produce a message
history that chat() can accept on the next call. Everything else follows.
Error Handling
mcpbridge defines a structured exception hierarchy. Every exception inherits
from MCPBridgeError, so you can catch broadly or narrowly.
Configuration
| Exception | When |
|---|---|
| ConfigValidationError | Invalid config dict |
| ProviderNotFoundError | Unknown llm.provider value |
| APIKeyMissingError | No API key found (config or env var) |
Transport
| Exception | When |
|---|---|
| TransportConnectionError | Cannot connect to MCP server |
| TransportDisconnectedError | Connection dropped mid-session |
| TransportTimeoutError | MCP request timed out |
MCP Protocol
| Exception | When |
|---|---|
| MCPProtocolError | Malformed JSON-RPC or unexpected server response |
| MCPToolNotFoundError | Tool name not in discovered registry |
| MCPToolCallError | MCP server returned an error for tools/call |
| MCPSchemaValidationError | Tool arguments failed JSON Schema validation |
LLM
| Exception | When |
|---|---|
| LLMRateLimitError | Provider returned 429 |
| LLMAuthError | Provider authentication failed |
| LLMContextLengthError | Context window exceeded |
| ToolsNotSupportedError | Model does not support tool calling |
Loop
| Exception | When |
|---|---|
| LoopTimeoutError | Total token budget exceeded |
| BridgeInUseError | Concurrent run() on the same bridge instance |
| SessionLimitError | SessionManager exceeded max_sessions |
Note: max_iterations no longer raises an exception. When the loop reaches the
configured limit, it returns a LoopResult with finish_reason="max_iterations"
and best-effort text (extracted from the last LLM response or the most recent
tool result). This prevents service crashes when a model gets stuck in a
tool-calling loop.
Best Practices
Keep max_iterations reasonable. A value between 5 and 15 covers most
real-world tool-calling workflows. If the model consistently hits the limit,
the system prompt likely needs to be more explicit about when to stop calling
tools.
Use error_strategy: "return_error_to_llm" in production. This is the
default. When a tool call fails (schema mismatch, timeout, server error), the
error message is returned to the LLM as a tool result so it can decide how to
recover. The "raise" strategy is better for development and debugging.
Set persist_history: false for stateless endpoints. If each request is
independent, disable history to avoid unbounded memory growth. For
conversational use cases, set max_history_tokens to a value that fits within
your model's context window.
Use namespace_strategy: "prefix" for multi-server setups. This is the
default and prevents tool name collisions between servers. The LLM sees tool
names like serverA__search and serverB__search and can distinguish them.
Authenticate MCP servers via mcp.headers or mcp.env. Do not embed
credentials in URLs. For HTTP transports, use the Authorization header. For
stdio transports, pass tokens through environment variables.
Use the on_tool_call and on_tool_result callbacks for logging and
observability. They fire synchronously during the loop and give you full
visibility into what the model is doing without modifying the loop behavior.
Check result.finish_reason after every run() call. A value of "done"
means the model produced a final answer. A value of "max_iterations" means
the loop was capped before the model finished. Your application should handle
both cases.
Prefer short model aliases for readability. Instead of writing
"claude-opus-4-20250514", write "opus". mcpbridge resolves aliases
automatically (see Model Aliases).
Do not share a single MCPBridge instance across concurrent requests.
Each run() call acquires an internal lock. For concurrent users, use
SessionManager which creates one bridge per session.
LoopResult Reference
bridge.run() returns a LoopResult dataclass:
| Field | Type | Description |
|---|---|---|
| text | str | Final assistant text (or best-effort fallback) |
| tool_calls_made | list[dict] | Each entry: {id, name, arguments, result, is_error} |
| iterations | int | Number of tool-calling iterations completed |
| total_input_tokens | int | Best-effort input token estimate |
| total_output_tokens | int | Best-effort output token estimate |
| finish_reason | str | "done" or "max_iterations" |
Model Aliases
Short aliases can be used in place of full model identifiers.
| Provider | Alias | Resolves to |
|---|---|---|
| anthropic | opus | claude-opus-4-20250514 |
| anthropic | sonnet | claude-sonnet-4-20250514 |
| anthropic | haiku | claude-haiku-4-5-20251001 |
| openai | gpt4o | gpt-4o |
| openai | gpt4o-mini | gpt-4o-mini |
| openai | o3 | o3 |
| openai | o3-mini | o3-mini |
| openai | o4-mini | o4-mini |
| gemini | flash | gemini-1.5-flash |
| gemini | flash2 | gemini-2.0-flash |
| gemini | pro | gemini-1.5-pro |
| gemini | pro2 | gemini-2.0-pro |
| mistral | large | mistral-large-latest |
| mistral | small | mistral-small-latest |
| groq | llama3 | llama-3.3-70b-versatile |
| groq | llama3-small | llama-3.1-8b-instant |
| cohere | command | command-r-plus |
| together | llama3 | meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo |
If the model string is not recognized as an alias, it is passed through as-is.
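Resolution is effectively a dict lookup with pass-through for unknown strings. An illustrative sketch using a subset of the table above (the lookup mechanics are an assumption, not mcpbridge's actual code):

```python
# Illustrative subset of the alias table; unknown model strings pass through.
ALIASES = {
    ("anthropic", "opus"): "claude-opus-4-20250514",
    ("openai", "gpt4o"): "gpt-4o",
    ("groq", "llama3"): "llama-3.3-70b-versatile",
}

def resolve_model(provider, model):
    """Return the full model id for a known (provider, alias) pair,
    otherwise pass the string through unchanged."""
    return ALIASES.get((provider, model), model)
```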
Limitations and Roadmap
- Token streaming is not implemented. The stream parameter is accepted and will not raise an error, but the loop runs in non-streaming mode. A future version will add incremental token delivery via async generators.
- LLM-level retry/backoff is not built in. If the LLM provider returns 429 or 5xx, the adapter raises immediately. Rate-limit retry logic should be handled at the application level or in a custom adapter.
- History trimming is best-effort. Token estimates use tiktoken when available (OpenAI family) and fall back to a rough chars / 4 heuristic for other providers.
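The fallback heuristic can be sketched as follows. The tiktoken branch only runs when the package is installed; the cl100k_base encoding name is an assumption for illustration.

```python
def estimate_tokens(text, openai_family=False):
    """Best-effort token estimate: tiktoken when available for the OpenAI
    family, otherwise the documented chars/4 heuristic (illustrative)."""
    if openai_family:
        try:
            import tiktoken  # optional dependency; cl100k_base is assumed
            return len(tiktoken.get_encoding("cl100k_base").encode(text))
        except ImportError:
            pass  # fall through to the rough heuristic
    return max(1, len(text) // 4)

print(estimate_tokens("x" * 40))  # 10 via the chars/4 fallback
```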
Running Tests
pip install "mcpbridge-ai[dev]"
pytest -q
The test suite uses fakes and mocks for all external dependencies (LLM providers, MCP servers). No API keys or running servers are needed.
Related Links
MCP Specification
- Architecture: https://modelcontextprotocol.io/docs/concepts/architecture
- Transports: https://modelcontextprotocol.io/docs/concepts/transports
- Tools: https://modelcontextprotocol.io/docs/concepts/tools
- Resources: https://modelcontextprotocol.io/docs/concepts/resources
- Prompts: https://modelcontextprotocol.io/docs/concepts/prompts
- Full spec: https://spec.modelcontextprotocol.io/specification/2024-11-05/
LLM Provider Documentation
- Anthropic Messages API: https://docs.anthropic.com/en/api/messages
- Anthropic Tool Use: https://docs.anthropic.com/en/docs/build-with-claude/tool-use
- OpenAI Chat Completions: https://platform.openai.com/docs/api-reference/chat/create
- OpenAI Function Calling: https://platform.openai.com/docs/guides/function-calling
- Google Gemini: https://ai.google.dev/api/generate-content
- Gemini Function Calling: https://ai.google.dev/gemini-api/docs/function-calling
- Mistral Chat: https://docs.mistral.ai/api/#tag/chat
- Mistral Function Calling: https://docs.mistral.ai/capabilities/function_calling/
- Cohere Chat: https://docs.cohere.com/reference/chat
- Cohere Tool Use: https://docs.cohere.com/docs/tool-use
- Groq: https://console.groq.com/docs/openai
- Groq Tool Use: https://console.groq.com/docs/tool-use
- Ollama API: https://github.com/ollama/ollama/blob/main/docs/api.md
- Together AI: https://docs.together.ai/docs/chat-overview
- AWS Bedrock Converse: https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Converse.html
- Azure OpenAI: https://learn.microsoft.com/en-us/azure/ai-services/openai/reference
License
MIT