Consume MCP servers SKILL-style by registering them under this MCP proxy server.
Progressive MCP Server Loading Proxy
Theoretical reference / 理论参考: https://www.anthropic.com/engineering/code-execution-with-mcp
Further write-ups and updates are published on WeChat Official Account VibeStudy — follow for more discussions. 更多介绍与后续更新同步发布在微信公众号 VibeStudy,欢迎关注交流。
English
A stdio MCP server that acts as a proxy in front of other MCP servers, implementing progressive disclosure of tool definitions via a local filesystem cache.
Instead of loading all tool definitions into the model's context upfront, agents discover and load tools on demand — reducing token usage by up to 98%.
Motivation
MCP servers inject their full tool schemas into the model context at session start. Unused tools still consume tokens. When multiple servers are registered, this overhead compounds linearly.
This proxy adds a context-management layer: the agent registers only one server (mcp-proxy), while all real MCP servers are managed under the proxy's config directory. Tool schemas stay out of context until explicitly requested.
How it works
Your prompt
└── Claude (discovers and calls tools via 5 meta-tools)
└── mcp-proxy (stdio MCP server)
└── Real MCP servers (tavily, markitdown, ...)
On startup, the proxy fetches all tool definitions from real MCP servers and writes them to ~/.mcp-proxy/cache/<server>/<tool>.json. At runtime, the agent reads from cache on demand via meta-tools; calls are proxied to the real server for execution.
The cache persists across restarts. Real servers are only contacted on first run or when refresh_cache is called.
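The cache layout described above can be sketched in a few lines. This is a minimal illustration, not the project's actual code; `write_tool_cache` and the split between per-tool files and `index.json` are assumptions based on the cache structure documented below.

```python
import json
from pathlib import Path

def write_tool_cache(cache_dir: Path, server: str, tools: list[dict]) -> None:
    """Write one JSON file per tool plus a cheap index of names/descriptions."""
    server_dir = cache_dir / server
    server_dir.mkdir(parents=True, exist_ok=True)
    index = []
    for tool in tools:
        # Full schema goes to <server>/<tool>.json (what read_tool would serve).
        (server_dir / f"{tool['name']}.json").write_text(json.dumps(tool, indent=2))
        # The index keeps only the low-token fields (what list_tools would serve).
        index.append({"name": tool["name"], "description": tool.get("description", "")})
    (server_dir / "index.json").write_text(json.dumps(index, indent=2))
```

On first run the proxy would call something like `write_tool_cache(Path.home() / ".mcp-proxy" / "cache", "tavily", fetched_tools)` for each configured server.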
Direct mount vs. mcp-proxy:
- Direct mount: [session start] → schema permanently occupies context → tokens consumed whether the tool is used or not
- mcp-proxy: [session start] → only 5 meta-tools → schema fetched on demand → done
Installation
This project uses flat Python modules (no package directory), so you need to tell hatchling which files to ship. The pyproject.toml already includes this configuration.
With uv (recommended)
cd path/to/this/repo
uv venv .venv
uv pip install -e .
uv pip install -e . reads pyproject.toml, registers the current directory as a Python package, and generates an mcp-proxy entry script at .venv/bin/mcp-proxy pointing to the main_sync function in proxy.py. The -e (editable) flag means source files are referenced directly — no reinstall needed after code changes.
The mcp-proxy executable will be at .venv/bin/mcp-proxy.
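For reference, the relevant pyproject.toml pieces look roughly like this. Treat it as a sketch: the module list, version, and exact hatchling keys here are illustrative, so check the actual file in the repo.

```toml
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "mcp-proxy"
version = "0.1.0"

[project.scripts]
# Generates .venv/bin/mcp-proxy -> main_sync() in proxy.py
mcp-proxy = "proxy:main_sync"

[tool.hatch.build.targets.wheel]
# Flat layout (no package directory): tell hatchling which modules to ship.
include = ["proxy.py", "connector.py"]
```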
With pip
pip install -e .
Note: On macOS with Homebrew Python, use `pip install -e . --user` or use uv instead to avoid PEP 668 errors.
Configuration
1. Create the proxy config
mkdir -p ~/.mcp-proxy
Create ~/.mcp-proxy/config.json and fill in your servers:
{
"servers": {
"my-server": {
"command": "uvx",
"args": ["my-mcp-server"],
"env": {
"API_KEY": "your-key-here"
}
}
}
}
The config format mirrors the MCP server definition format used in Claude Code's settings.json and ~/.claude.json, so you can copy entries directly from there.
Example — proxying Tavily MCP:
{
"servers": {
"tavily": {
"command": "npx",
"args": ["-y", "tavily-mcp@0.1.2"],
"env": {
"TAVILY_API_KEY": "your-tavily-api-key"
}
}
}
}
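Reading this file back is straightforward. Here is a minimal sketch of what a config loader might look like; `load_config` and its validation rules are illustrative, not necessarily the project's exact function.

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".mcp-proxy" / "config.json"

def load_config(path: Path = CONFIG_PATH) -> dict:
    """Return the {name: {command, args, env}} mapping, with light validation."""
    config = json.loads(path.read_text())
    servers = config.get("servers", {})
    for name, spec in servers.items():
        if "command" not in spec:
            raise ValueError(f"server {name!r} is missing required key 'command'")
        # args and env are optional in the on-disk format.
        spec.setdefault("args", [])
        spec.setdefault("env", {})
    return servers
```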
2. Verify the cache initializes
Run the proxy once to confirm it can reach your configured servers and populate the cache:
.venv/bin/mcp-proxy &
sleep 10
kill %1
You should see output like:
[proxy] Caching tavily...
[proxy] tavily: 2 tools cached
The cache is written to ~/.mcp-proxy/cache/.
3. Register with Claude Code
Add to ~/.claude.json under mcpServers, using the absolute path to the venv executable:
{
"mcpServers": {
"mcp-proxy": {
"command": "/absolute/path/to/repo/.venv/bin/mcp-proxy",
"args": [],
"env": {}
}
}
}
Note: Use `~/.claude.json`, not `~/.claude/settings.json`. The latter's `mcpServers` field is not used for MCP server registration.
Once mcp-proxy is registered in ~/.claude.json, all future MCP servers are added exclusively in ~/.mcp-proxy/config.json — no changes to Claude Code config required.
Configuration files reference
| File | Purpose |
|------|---------|
| ~/.mcp-proxy/config.json | Backend server list (command, args, env). Only file you need to edit when adding new servers. |
| ~/.mcp-proxy/cache/ | Cached tool definitions, reused across restarts |
| ~/.claude.json | Registers mcp-proxy itself with Claude Code (one-time setup) |
Meta-tools exposed to the agent
| Tool | Trigger example | Description |
|------|----------------|-------------|
| list_servers | "What MCP servers do you have?" | List all proxied servers |
| list_tools(server) | "What can tavily do?" | List tool names + short descriptions (low token cost, read from local cache) |
| read_tool(server, tool) | "Show me the full schema for tavily-search" | Read full input schema for a tool |
| call_tool(server, tool, args) | "Search for X using tavily" | Execute a tool |
| refresh_cache(server?) | "Refresh tavily's cache" | Force re-fetch definitions from a server or all servers |
Claude Code only has these 5 tool definitions in context. The real schemas for tavily, markitdown, etc. are never in context until you explicitly call read_tool.
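The cache-backed read path for `list_tools` and `read_tool` can be sketched as follows. The helper names are hypothetical (the real handlers live in proxy.py), but the split mirrors the cache layout: the index for cheap browsing, per-tool files for full schemas.

```python
import json
from pathlib import Path

CACHE_DIR = Path.home() / ".mcp-proxy" / "cache"

def list_tools(server: str, cache_dir: Path = CACHE_DIR) -> list[dict]:
    # Cheap: names + short descriptions only, no input schemas.
    return json.loads((cache_dir / server / "index.json").read_text())

def read_tool(server: str, tool: str, cache_dir: Path = CACHE_DIR) -> dict:
    # Full schema, loaded into context only when explicitly requested.
    return json.loads((cache_dir / server / f"{tool}.json").read_text())
```

Neither call contacts the real server; only `call_tool` and `refresh_cache` do.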
Recommended agent workflow
list_servers
→ list_tools("target-server")
→ read_tool("target-server", "specific-tool") ← required on first use
→ call_tool("target-server", "specific-tool", { ... })
This way only the definitions the agent actually needs enter the context window.
Telling the agent about the workflow
The proxy only exposes 5 meta-tools — the agent has no direct knowledge of the underlying servers or their tools. Add the following to your project's CLAUDE.md or system prompt:
## MCP tool usage
All external tools are accessed through a proxy. Do not assume any tool exists directly.
Follow this workflow whenever you need to use an external tool:
1. Call list_servers to see available servers.
2. Call list_tools(<server>) to browse tool names and short descriptions.
3. Call read_tool(<server>, <tool>) to get the full input schema before calling.
4. Call call_tool(<server>, <tool>, <args>) to execute.
Never skip step 3 — always read the schema before calling a tool you haven't used yet in this session.
Full call trace example — "Search for X using tavily"
Your prompt
│
▼
Claude Code (context only has 5 meta-tools)
│ decides search is needed → selects call_tool
▼
MCP protocol → mcp-proxy process
call_tool({ server: "tavily", tool: "tavily-search", args: { query: "X" } })
│
▼ proxy.py: handle_call_tool()
1. load_config() ← reads ~/.mcp-proxy/config.json, gets tavily command/args/env
2. connector.call_tool(server_config, "tavily-search", args)
│
▼ connector.py: _open_session()
3. starts subprocess via StdioServerParameters:
npx -y tavily-mcp@0.1.2 (env injects TAVILY_API_KEY)
4. ClientSession.initialize() ← MCP handshake
5. session.call_tool("tavily-search", { query: "X" })
│
▼ real tavily-mcp process
6. calls Tavily API (HTTPS)
7. returns result
│
▼ returns via same path
connector → proxy.py → MCP protocol → Claude Code context
Note: Each call to a real MCP server opens an `async with` session and closes it when done.
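At the protocol level, each such session exchanges newline-delimited JSON-RPC 2.0 messages over the child process's stdio. The shapes below follow the MCP specification (`initialize`, then `tools/call` with `name`/`arguments`), but treat the exact field values as a sketch rather than a transcript of this proxy's traffic.

```python
import json

def request(rid: int, method: str, params: dict) -> str:
    """One JSON-RPC 2.0 request line, as written to the child's stdin."""
    return json.dumps({"jsonrpc": "2.0", "id": rid, "method": method, "params": params}) + "\n"

# Step 4 above: the MCP handshake.
init = request(1, "initialize", {
    "protocolVersion": "2024-11-05",
    "capabilities": {},
    "clientInfo": {"name": "mcp-proxy", "version": "0.1.0"},
})

# Step 5 above: the proxied tool call.
call = request(2, "tools/call", {
    "name": "tavily-search",
    "arguments": {"query": "X"},
})
```

The MCP Python SDK's `ClientSession` builds and parses these frames, so proxy code never constructs them by hand.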
Cache structure
~/.mcp-proxy/
config.json ← backend server config (only file you need to edit)
cache/
servers.json ← server index + hash
tavily/
index.json ← source for list_tools
tavily-search.json ← source for read_tool
tavily-extract.json
io.github.microsoft/markitdown/
index.json
...
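The `hash` in servers.json suggests the cache is keyed to the config, so stale entries can be detected when a server's definition changes. One plausible mechanism (an assumption about the implementation, not confirmed from the code) is a stable digest of each server entry:

```python
import hashlib
import json

def config_hash(server_config: dict) -> str:
    """Stable digest of a server's command/args/env; if it changes, re-fetch."""
    # sort_keys makes the digest independent of dict insertion order.
    canonical = json.dumps(server_config, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

def cache_is_stale(cached_hash: str, server_config: dict) -> bool:
    return cached_hash != config_hash(server_config)
```

With this scheme, editing a server's `args` or `env` in config.json would trigger a re-fetch on the next run even without calling `refresh_cache`.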
Currently configured servers (example)
| Server | Tool | Purpose |
|--------|------|---------|
| tavily | tavily-search | Web search |
| tavily | tavily-extract | Extract content from a URL |
| io.github.microsoft/markitdown | (query with list_tools) | Convert file/URL to Markdown |
Key rules
- Before calling a tool for the first time, always run `read_tool` to fetch its schema, then `call_tool`.
- To add a new MCP server, only edit `~/.mcp-proxy/config.json` and run `refresh_cache` — no changes to Claude Code config.
- Cache is reused across sessions — no need to re-fetch definitions every time.
To-do / Known limitations
- Real MCP servers are connected via `async with` (open on call, close when done). Per-call connection overhead has not been benchmarked.
- HTTP/SSE transport for MCP servers has not been tested yet. The current implementation assumes stdio.
- If rewriting as an HTTP server, FastAPI + uvicorn would be a natural fit for replacing the stdio transport.
中文
一个以 stdio MCP 服务器形式运行的代理,位于其他 MCP 服务器的前面,通过本地文件系统缓存实现工具定义的按需加载。
不再将所有工具定义预先加载进模型 context,而是让 agent 按需发现并加载工具 —— 最多可减少 98% 的 token 消耗。
动机
MCP 服务器会在会话开始时将完整的工具 schema 注入模型 context。即使工具未被使用,也会持续占用 token。当注册了多个服务器时,这种开销会随数量线性叠加。
该代理增加了一个 context 管理中间层:agent 只注册一个服务器(mcp-proxy),所有真实的 MCP 服务器都纳入代理的配置目录统一管理。工具 schema 完全不进入 context,只有被显式请求时才注入。
工作原理
你的提示词
└── Claude(通过 5 个元工具发现和调用工具)
└── mcp-proxy(stdio MCP 服务器)
└── 真实 MCP 服务器(tavily、markitdown……)
启动时,代理从真实 MCP 服务器拉取所有工具定义并写入 ~/.mcp-proxy/cache/<server>/<tool>.json。运行时,agent 通过元工具按需从缓存读取;实际调用被代理转发给真实服务器执行。
缓存跨会话持久化。真实服务器仅在首次运行或调用 refresh_cache 时才会被联系。
直接挂载 vs. mcp-proxy:
- 直接挂载:[会话启动]→ schema 永久占位 → 无论是否使用都消耗 token
- mcp-proxy:[会话启动]→ 仅 5 个元工具 → 按需拉取 schema → 完成
安装
本项目使用扁平 Python 模块(无包目录),需要告知 hatchling 打包哪些文件。pyproject.toml 已包含相关配置。
使用 uv(推荐)
cd path/to/this/repo
uv venv .venv
uv pip install -e .
uv pip install -e . 读取 pyproject.toml,将当前目录注册为 Python 包,并在 .venv/bin/mcp-proxy 生成一个指向 proxy.py 中 main_sync 函数的可执行入口脚本。-e(editable)模式意味着源文件被直接引用,修改代码后无需重新 install。
mcp-proxy 可执行文件位于 .venv/bin/mcp-proxy。
使用 pip
pip install -e .
注意: 在使用 Homebrew Python 的 macOS 上,请使用 `pip install -e . --user`,或改用 uv,以避免 PEP 668 错误。
配置
1. 创建代理配置
mkdir -p ~/.mcp-proxy
创建 ~/.mcp-proxy/config.json,填入你的服务器:
{
"servers": {
"my-server": {
"command": "uvx",
"args": ["my-mcp-server"],
"env": {
"API_KEY": "your-key-here"
}
}
}
}
配置格式与 Claude Code 的 settings.json 和 ~/.claude.json 中的 MCP 服务器定义格式一致,可直接复制粘贴。
示例 —— 代理 Tavily MCP:
{
"servers": {
"tavily": {
"command": "npx",
"args": ["-y", "tavily-mcp@0.1.2"],
"env": {
"TAVILY_API_KEY": "your-tavily-api-key"
}
}
}
}
2. 验证缓存初始化
运行一次代理,确认它能连接到配置的服务器并填充缓存:
.venv/bin/mcp-proxy &
sleep 10
kill %1
你应该看到类似输出:
[proxy] Caching tavily...
[proxy] tavily: 2 tools cached
缓存写入 ~/.mcp-proxy/cache/。
3. 注册到 Claude Code
在 ~/.claude.json 的 mcpServers 下添加,使用 venv 可执行文件的绝对路径:
{
"mcpServers": {
"mcp-proxy": {
"command": "/absolute/path/to/repo/.venv/bin/mcp-proxy",
"args": [],
"env": {}
}
}
}
这里的 command 就是 install 后生成的入口脚本,Claude Code 启动时直接执行,项目就以 stdio MCP server 的形式跑起来了。
注意: 使用 `~/.claude.json`,而非 `~/.claude/settings.json`。后者的 `mcpServers` 字段不用于 MCP 服务器注册。
mcp-proxy 在 ~/.claude.json 中注册一次后,所有后续 MCP 服务器只需在 ~/.mcp-proxy/config.json 中添加 —— 无需再修改 Claude Code 配置。
配置文件说明
| 文件 | 用途 |
|------|------|
| ~/.mcp-proxy/config.json | 后端服务器列表(command、args、env)。添加新服务器时唯一需要编辑的文件。 |
| ~/.mcp-proxy/cache/ | 缓存的工具定义,跨会话复用 |
| ~/.claude.json | 将 mcp-proxy 本身注册到 Claude Code(一次性配置) |
暴露给 Agent 的元工具
| 工具 | 触发示例 | 说明 |
|------|---------|------|
| list_servers | "你现在有哪些 MCP 服务器?" | 列出所有被代理的服务器 |
| list_tools(server) | "tavily 能做什么?" | 列出工具名称 + 简短描述(低 token 消耗,从本地缓存读取) |
| read_tool(server, tool) | "告诉我 tavily-search 的完整参数格式" | 读取工具完整 JSON Schema |
| call_tool(server, tool, args) | "用 tavily 搜索 XXX" | 执行工具 |
| refresh_cache(server?) | "刷新 tavily 的缓存" | 强制从服务器重新拉取定义 |
Claude Code 的 context 里只有这 5 个工具的定义。tavily、markitdown 等真实工具的 schema 完全不在 context 里,直到显式调用 read_tool 才会注入。
这正是这个代理节省 token 的核心机制:直接注册 tavily 会一次性把所有工具 schema 注入 context,而 mcp-proxy 方案把真实 schema 挡在 context 门外,按需拉取。
推荐的 Agent 工作流
list_servers
→ list_tools("目标服务器")
→ read_tool("目标服务器", "具体工具") ← 首次使用必做
→ call_tool("目标服务器", "具体工具", { 参数 })
这样只有 agent 实际需要的定义才会进入 context 窗口。
告知 Agent 工作流
代理只暴露 5 个元工具 —— agent 对底层服务器及其工具一无所知。请将以下内容添加到项目的 CLAUDE.md 或系统提示中:
## MCP 工具使用方式
所有外部工具通过代理访问,不要假设任何工具直接存在。
每次需要使用外部工具时,按以下流程操作:
1. 调用 list_servers 查看可用服务器。
2. 调用 list_tools(<server>) 浏览工具名称和简短描述。
3. 调用 read_tool(<server>, <tool>) 获取完整 input schema。
4. 调用 call_tool(<server>, <tool>, <args>) 执行。
不要跳过第 3 步 —— 在本次会话中首次调用某工具前,必须先读取其 schema。
完整调用链路示例 —— "用 tavily 搜索 XXX"
你的提示词
│
▼
Claude Code(context 里只有 5 个元工具)
│ 判断需要搜索 → 选择 call_tool
▼
MCP 协议调用 → mcp-proxy 进程
call_tool({ server: "tavily", tool: "tavily-search", args: { query: "XXX" } })
│
▼ proxy.py: handle_call_tool()
1. load_config() ← 读 ~/.mcp-proxy/config.json,取出 tavily 的 command/args/env
2. connector.call_tool(server_config, "tavily-search", args)
│
▼ connector.py: _open_session()
3. 用 StdioServerParameters 启动子进程:
npx -y tavily-mcp@0.1.2(env 注入 TAVILY_API_KEY)
4. ClientSession.initialize() ← MCP 握手
5. session.call_tool("tavily-search", { query: "XXX" })
│
▼ 真实 tavily-mcp 进程
6. 调用 Tavily API(HTTPS)
7. 返回结果
│
▼ 原路返回
connector → proxy.py → MCP 协议 → Claude Code context
注意: 每次调用真实 MCP 服务器都会以 `async with` 方式开启会话,用完即关。
缓存文件结构
~/.mcp-proxy/
config.json ← 后端服务器配置(唯一需要编辑的文件)
cache/
servers.json ← 服务器索引 + hash
tavily/
index.json ← list_tools 读取源
tavily-search.json ← read_tool 读取源
tavily-extract.json
io.github.microsoft/markitdown/
index.json
...
当前配置的服务器(示例)
| 服务器 | 工具 | 用途 |
|--------|------|------|
| tavily | tavily-search | 网络搜索 |
| tavily | tavily-extract | 提取指定 URL 的网页内容 |
| io.github.microsoft/markitdown | (需 list_tools 查询) | 文件/URL 转 Markdown |
关键规则
- 首次调用新工具前,必须先 `read_tool` 获取 schema,再 `call_tool`。
- 添加新 MCP 服务器,只需编辑 `~/.mcp-proxy/config.json`,然后 `refresh_cache`,无需改 Claude Code 配置。
- 缓存跨会话复用,不必每次重新拉取定义。
待办 / 已知限制
- 真实 MCP 服务器通过 `async with` 连接(调用时开启,完成后关闭)。单次调用的连接开销尚未做基准测试。
- HTTP/SSE 传输方式的 MCP 服务器尚未测试,当前实现假设使用 stdio。
- 若改写为 HTTP 服务器,可考虑用 FastAPI + uvicorn 替代 stdio 传输层。