Consume MCP servers SKILL-style by registering them under this MCP proxy server.
Progressive MCP Server Loading Proxy
Theoretical reference / 理论参考: https://www.anthropic.com/engineering/code-execution-with-mcp
Further write-ups and updates are published on WeChat Official Account VibeStudy — follow for more discussions. 更多介绍与后续更新同步发布在微信公众号 VibeStudy,欢迎关注交流。
English
A stdio MCP server that acts as a proxy in front of other MCP servers, implementing progressive disclosure of tool definitions via a local filesystem cache.
Instead of loading all tool definitions into the model's context upfront, agents discover and load tools on demand — reducing token usage by up to 98%.
Motivation
MCP servers inject their full tool schemas into the model context at session start. Unused tools still consume tokens. When multiple servers are registered, this overhead compounds linearly.
This proxy adds a context-management layer: the agent registers only one server (mcp-proxy), while all real MCP servers are managed under the proxy's config directory. Tool schemas stay out of context until explicitly requested.
How it works
Your prompt
└── Claude (discovers and calls tools via 5 meta-tools)
└── mcp-proxy (stdio MCP server)
└── Real MCP servers (tavily, markitdown, ...)
On startup, the proxy fetches all tool definitions from real MCP servers and writes them to ~/.mcp-proxy/cache/<server>/<tool>.json. At runtime, the agent reads from cache on demand via meta-tools; calls are proxied to the real server for execution.
The cache persists across restarts. Real servers are only contacted on first run or when refresh_cache is called.
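The cache layout described above can be sketched in a few lines. This is a minimal illustration, not the project's actual code; `write_tool_cache` and the split between per-tool files and `index.json` are assumptions based on the cache structure documented below.

```python
import json
from pathlib import Path

def write_tool_cache(cache_dir: Path, server: str, tools: list[dict]) -> None:
    """Write one JSON file per tool plus a cheap index of names/descriptions."""
    server_dir = cache_dir / server
    server_dir.mkdir(parents=True, exist_ok=True)
    index = []
    for tool in tools:
        # Full schema goes to <server>/<tool>.json (what read_tool would serve).
        (server_dir / f"{tool['name']}.json").write_text(json.dumps(tool, indent=2))
        # The index keeps only the low-token fields (what list_tools would serve).
        index.append({"name": tool["name"], "description": tool.get("description", "")})
    (server_dir / "index.json").write_text(json.dumps(index, indent=2))
```

On first run the proxy would call something like `write_tool_cache(Path.home() / ".mcp-proxy" / "cache", "tavily", fetched_tools)` for each configured server.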
Direct mount vs. mcp-proxy:
- Direct mount: [session start] → schema permanently occupies context → tokens consumed whether the tool is used or not
- mcp-proxy: [session start] → only 5 meta-tools → schema fetched on demand → done
Installation
This project uses flat Python modules (no package directory), so you need to tell hatchling which files to ship. The pyproject.toml already includes this configuration.
With uv (recommended)
cd path/to/this/repo
uv venv .venv
uv pip install -e .
uv pip install -e . reads pyproject.toml, registers the current directory as a Python package, and generates an mcp-proxy entry script at .venv/bin/mcp-proxy pointing to the main_sync function in proxy.py. The -e (editable) flag means source files are referenced directly — no reinstall needed after code changes.
The mcp-proxy executable will be at .venv/bin/mcp-proxy.
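For reference, the relevant pyproject.toml pieces look roughly like this. Treat it as a sketch: the module list, version, and exact hatchling keys here are illustrative, so check the actual file in the repo.

```toml
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "mcp-proxy"
version = "0.1.0"

[project.scripts]
# Generates .venv/bin/mcp-proxy -> main_sync() in proxy.py
mcp-proxy = "proxy:main_sync"

[tool.hatch.build.targets.wheel]
# Flat layout (no package directory): tell hatchling which modules to ship.
include = ["proxy.py", "connector.py"]
```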
With pip
pip install -e .
Note: On macOS with Homebrew Python, use `pip install -e . --user` or use uv instead to avoid PEP 668 errors.
Configuration
1. Create the proxy config
mkdir -p ~/.mcp-proxy
Create ~/.mcp-proxy/config.json and fill in your servers:
{
"servers": {
"my-server": {
"command": "uvx",
"args": ["my-mcp-server"],
"env": {
"API_KEY": "your-key-here"
}
}
}
}
The config format mirrors the MCP server definition format used in Claude Code's settings.json and ~/.claude.json, so you can copy entries directly from there.
Example — proxying Tavily MCP:
{
"servers": {
"tavily": {
"command": "npx",
"args": ["-y", "tavily-mcp@0.1.2"],
"env": {
"TAVILY_API_KEY": "your-tavily-api-key"
}
}
}
}
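Reading this file back is straightforward. Here is a minimal sketch of what a config loader might look like; `load_config` and its validation rules are illustrative, not necessarily the project's exact function.

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".mcp-proxy" / "config.json"

def load_config(path: Path = CONFIG_PATH) -> dict:
    """Return the {name: {command, args, env}} mapping, with light validation."""
    config = json.loads(path.read_text())
    servers = config.get("servers", {})
    for name, spec in servers.items():
        if "command" not in spec:
            raise ValueError(f"server {name!r} is missing required key 'command'")
        # args and env are optional in the on-disk format.
        spec.setdefault("args", [])
        spec.setdefault("env", {})
    return servers
```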
2. Verify the cache initializes
Run the proxy once to confirm it can reach your configured servers and populate the cache:
.venv/bin/mcp-proxy &
sleep 10
kill %1
You should see output like:
[proxy] Caching tavily...
[proxy] tavily: 2 tools cached
The cache is written to ~/.mcp-proxy/cache/.
3. Register with Claude Code
Add to ~/.claude.json under mcpServers, using the absolute path to the venv executable:
{
"mcpServers": {
"mcp-proxy": {
"command": "/absolute/path/to/repo/.venv/bin/mcp-proxy",
"args": [],
"env": {}
}
}
}
Note: Use `~/.claude.json`, not `~/.claude/settings.json`. The latter's `mcpServers` field is not used for MCP server registration.
Once mcp-proxy is registered in ~/.claude.json, all future MCP servers are added exclusively in ~/.mcp-proxy/config.json — no changes to Claude Code config required.
Configuration files reference
| File | Purpose |
|------|---------|
| ~/.mcp-proxy/config.json | Backend server list (command, args, env). Only file you need to edit when adding new servers. |
| ~/.mcp-proxy/cache/ | Cached tool definitions, reused across restarts |
| ~/.claude.json | Registers mcp-proxy itself with Claude Code (one-time setup) |
Meta-tools exposed to the agent
| Tool | Trigger example | Description |
|------|----------------|-------------|
| list_servers | "What MCP servers do you have?" | List all proxied servers |
| list_tools(server) | "What can tavily do?" | List tool names + short descriptions (low token cost, read from local cache) |
| read_tool(server, tool) | "Show me the full schema for tavily-search" | Read full input schema for a tool |
| call_tool(server, tool, args) | "Search for X using tavily" | Execute a tool |
| refresh_cache(server?) | "Refresh tavily's cache" | Force re-fetch definitions from a server or all servers |
Claude Code only has these 5 tool definitions in context. The real schemas for tavily, markitdown, etc. are never in context until you explicitly call read_tool.
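The cache-backed read path for `list_tools` and `read_tool` can be sketched as follows. The helper names are hypothetical (the real handlers live in proxy.py), but the split mirrors the cache layout: the index for cheap browsing, per-tool files for full schemas.

```python
import json
from pathlib import Path

CACHE_DIR = Path.home() / ".mcp-proxy" / "cache"

def list_tools(server: str, cache_dir: Path = CACHE_DIR) -> list[dict]:
    # Cheap: names + short descriptions only, no input schemas.
    return json.loads((cache_dir / server / "index.json").read_text())

def read_tool(server: str, tool: str, cache_dir: Path = CACHE_DIR) -> dict:
    # Full schema, loaded into context only when explicitly requested.
    return json.loads((cache_dir / server / f"{tool}.json").read_text())
```

Neither call contacts the real server; only `call_tool` and `refresh_cache` do.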
Recommended agent workflow
list_servers
→ list_tools("target-server")
→ read_tool("target-server", "specific-tool") ← required on first use
→ call_tool("target-server", "specific-tool", { ... })
This way only the definitions the agent actually needs enter the context window.
Telling the agent about the workflow
The proxy only exposes 5 meta-tools — the agent has no direct knowledge of the underlying servers or their tools. Add the following to your project's CLAUDE.md or system prompt:
## MCP tool usage
All external tools are accessed through a proxy. Do not assume any tool exists directly.
Follow this workflow whenever you need to use an external tool:
1. Call list_servers to see available servers.
2. Call list_tools(<server>) to browse tool names and short descriptions.
3. Call read_tool(<server>, <tool>) to get the full input schema before calling.
4. Call call_tool(<server>, <tool>, <args>) to execute.
Never skip step 3 — always read the schema before calling a tool you haven't used yet in this session.
Full call trace example — "Search for X using tavily"
Your prompt
│
▼
Claude Code (context only has 5 meta-tools)
│ decides search is needed → selects call_tool
▼
MCP protocol → mcp-proxy process
call_tool({ server: "tavily", tool: "tavily-search", args: { query: "X" } })
│
▼ proxy.py: handle_call_tool()
1. load_config() ← reads ~/.mcp-proxy/config.json, gets tavily command/args/env
2. connector.call_tool(server_config, "tavily-search", args)
│
▼ connector.py: _open_session()
3. starts subprocess via StdioServerParameters:
npx -y tavily-mcp@0.1.2 (env injects TAVILY_API_KEY)
4. ClientSession.initialize() ← MCP handshake
5. session.call_tool("tavily-search", { query: "X" })
│
▼ real tavily-mcp process
6. calls Tavily API (HTTPS)
7. returns result
│
▼ returns via same path
connector → proxy.py → MCP protocol → Claude Code context
Note: Each call to a real MCP server opens an `async with` session and closes it when done.
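At the protocol level, each such session exchanges newline-delimited JSON-RPC 2.0 messages over the child process's stdio. The shapes below follow the MCP specification (`initialize`, then `tools/call` with `name`/`arguments`), but treat the exact field values as a sketch rather than a transcript of this proxy's traffic.

```python
import json

def request(rid: int, method: str, params: dict) -> str:
    """One JSON-RPC 2.0 request line, as written to the child's stdin."""
    return json.dumps({"jsonrpc": "2.0", "id": rid, "method": method, "params": params}) + "\n"

# Step 4 above: the MCP handshake.
init = request(1, "initialize", {
    "protocolVersion": "2024-11-05",
    "capabilities": {},
    "clientInfo": {"name": "mcp-proxy", "version": "0.1.0"},
})

# Step 5 above: the proxied tool call.
call = request(2, "tools/call", {
    "name": "tavily-search",
    "arguments": {"query": "X"},
})
```

The MCP Python SDK's `ClientSession` builds and parses these frames, so proxy code never constructs them by hand.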
Cache structure
~/.mcp-proxy/
config.json ← backend server config (only file you need to edit)
cache/
servers.json ← server index + hash
tavily/
index.json ← source for list_tools
tavily-search.json ← source for read_tool
tavily-extract.json
io.github.microsoft/markitdown/
index.json
...
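The `hash` in servers.json suggests the cache is keyed to the config, so stale entries can be detected when a server's definition changes. One plausible mechanism (an assumption about the implementation, not confirmed from the code) is a stable digest of each server entry:

```python
import hashlib
import json

def config_hash(server_config: dict) -> str:
    """Stable digest of a server's command/args/env; if it changes, re-fetch."""
    # sort_keys makes the digest independent of dict insertion order.
    canonical = json.dumps(server_config, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

def cache_is_stale(cached_hash: str, server_config: dict) -> bool:
    return cached_hash != config_hash(server_config)
```

With this scheme, editing a server's `args` or `env` in config.json would trigger a re-fetch on the next run even without calling `refresh_cache`.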
Currently configured servers (example)
| Server | Tool | Purpose |
|--------|------|---------|
| tavily | tavily-search | Web search |
| tavily | tavily-extract | Extract content from a URL |
| io.github.microsoft/markitdown | (query with list_tools) | Convert file/URL to Markdown |
Key rules
- Before calling a tool for the first time, always run `read_tool` to fetch its schema, then `call_tool`.
- To add a new MCP server, only edit `~/.mcp-proxy/config.json` and run `refresh_cache` — no changes to Claude Code config.
- Cache is reused across sessions — no need to re-fetch definitions every time.
To-do / Known limitations
- Real MCP servers are connected via `async with` (open on call, close when done). Per-call connection overhead has not been benchmarked.
- HTTP/SSE transport for MCP servers has not been tested yet. The current implementation assumes stdio.
- If rewriting as an HTTP server, FastAPI + uvicorn would be a natural fit for replacing the stdio transport.
中文
一个以 stdio MCP 服务器形式运行的代理,位于其他 MCP 服务器的前面,通过本地文件系统缓存实现工具定义的按需加载。
不再将所有工具定义预先加载进模型 context,而是让 agent 按需发现并加载工具 —— 最多可减少 98% 的 token 消耗。
动机
MCP 服务器会在会话开始时将完整的工具 schema 注入模型 context。即使工具未被使用,也会持续占用 token。当注册了多个服务器时,这种开销会随数量线性叠加。
该代理增加了一个 context 管理中间层:agent 只注册一个服务器(mcp-proxy),所有真实的 MCP 服务器都纳入代理的配置目录统一管理。工具 schema 完全不进入 context,只有被显式请求时才注入。
工作原理
你的提示词
└── Claude(通过 5 个元工具发现和调用工具)
└── mcp-proxy(stdio MCP 服务器)
└── 真实 MCP 服务器(tavily、markitdown……)
启动时,代理从真实 MCP 服务器拉取所有工具定义并写入 ~/.mcp-proxy/cache/<server>/<tool>.json。运行时,agent 通过元工具按需从缓存读取;实际调用被代理转发给真实服务器执行。
缓存跨会话持久化。真实服务器仅在首次运行或调用 refresh_cache 时才会被联系。
直接挂载 vs. mcp-proxy:
- 直接挂载:[会话启动]→ schema 永久占位 → 无论是否使用都消耗 token
- mcp-proxy:[会话启动]→ 仅 5 个元工具 → 按需拉取 schema → 完成
安装
本项目使用扁平 Python 模块(无包目录),需要告知 hatchling 打包哪些文件。pyproject.toml 已包含相关配置。
使用 uv(推荐)
cd path/to/this/repo
uv venv .venv
uv pip install -e .
uv pip install -e . 读取 pyproject.toml,将当前目录注册为 Python 包,并在 .venv/bin/mcp-proxy 生成一个指向 proxy.py 中 main_sync 函数的可执行入口脚本。-e(editable)模式意味着源文件被直接引用,修改代码后无需重新 install。
mcp-proxy 可执行文件位于 .venv/bin/mcp-proxy。
使用 pip
pip install -e .
注意: 在使用 Homebrew Python 的 macOS 上,请使用 `pip install -e . --user`,或改用 uv,以避免 PEP 668 错误。
配置
1. 创建代理配置
mkdir -p ~/.mcp-proxy
创建 ~/.mcp-proxy/config.json,填入你的服务器:
{
"servers": {
"my-server": {
"command": "uvx",
"args": ["my-mcp-server"],
"env": {
"API_KEY": "your-key-here"
}
}
}
}
配置格式与 Claude Code 的 settings.json 和 ~/.claude.json 中的 MCP 服务器定义格式一致,可直接复制粘贴。
示例 —— 代理 Tavily MCP:
{
"servers": {
"tavily": {
"command": "npx",
"args": ["-y", "tavily-mcp@0.1.2"],
"env": {
"TAVILY_API_KEY": "your-tavily-api-key"
}
}
}
}
2. 验证缓存初始化
运行一次代理,确认它能连接到配置的服务器并填充缓存:
.venv/bin/mcp-proxy &
sleep 10
kill %1
你应该看到类似输出:
[proxy] Caching tavily...
[proxy] tavily: 2 tools cached
缓存写入 ~/.mcp-proxy/cache/。
3. 注册到 Claude Code
在 ~/.claude.json 的 mcpServers 下添加,使用 venv 可执行文件的绝对路径:
{
"mcpServers": {
"mcp-proxy": {
"command": "/absolute/path/to/repo/.venv/bin/mcp-proxy",
"args": [],
"env": {}
}
}
}
这里的 command 就是 install 后生成的入口脚本,Claude Code 启动时直接执行,项目就以 stdio MCP server 的形式跑起来了。
注意: 使用 `~/.claude.json`,而非 `~/.claude/settings.json`。后者的 `mcpServers` 字段不用于 MCP 服务器注册。
mcp-proxy 在 ~/.claude.json 中注册一次后,所有后续 MCP 服务器只需在 ~/.mcp-proxy/config.json 中添加 —— 无需再修改 Claude Code 配置。
配置文件说明
| 文件 | 用途 |
|------|------|
| ~/.mcp-proxy/config.json | 后端服务器列表(command、args、env)。添加新服务器时唯一需要编辑的文件。 |
| ~/.mcp-proxy/cache/ | 缓存的工具定义,跨会话复用 |
| ~/.claude.json | 将 mcp-proxy 本身注册到 Claude Code(一次性配置) |
暴露给 Agent 的元工具
| 工具 | 触发示例 | 说明 |
|------|---------|------|
| list_servers | "你现在有哪些 MCP 服务器?" | 列出所有被代理的服务器 |
| list_tools(server) | "tavily 能做什么?" | 列出工具名称 + 简短描述(低 token 消耗,从本地缓存读取) |
| read_tool(server, tool) | "告诉我 tavily-search 的完整参数格式" | 读取工具完整 JSON Schema |
| call_tool(server, tool, args) | "用 tavily 搜索 XXX" | 执行工具 |
| refresh_cache(server?) | "刷新 tavily 的缓存" | 强制从服务器重新拉取定义 |
Claude Code 的 context 里只有这 5 个工具的定义。tavily、markitdown 等真实工具的 schema 完全不在 context 里,直到显式调用 read_tool 才会注入。
这正是这个代理节省 token 的核心机制:直接注册 tavily 会一次性把所有工具 schema 注入 context,而 mcp-proxy 方案把真实 schema 挡在 context 门外,按需拉取。
推荐的 Agent 工作流
list_servers
→ list_tools("目标服务器")
→ read_tool("目标服务器", "具体工具") ← 首次使用必做
→ call_tool("目标服务器", "具体工具", { 参数 })
这样只有 agent 实际需要的定义才会进入 context 窗口。
告知 Agent 工作流
代理只暴露 5 个元工具 —— agent 对底层服务器及其工具一无所知。请将以下内容添加到项目的 CLAUDE.md 或系统提示中:
## MCP 工具使用方式
所有外部工具通过代理访问,不要假设任何工具直接存在。
每次需要使用外部工具时,按以下流程操作:
1. 调用 list_servers 查看可用服务器。
2. 调用 list_tools(<server>) 浏览工具名称和简短描述。
3. 调用 read_tool(<server>, <tool>) 获取完整 input schema。
4. 调用 call_tool(<server>, <tool>, <args>) 执行。
不要跳过第 3 步 —— 在本次会话中首次调用某工具前,必须先读取其 schema。
完整调用链路示例 —— "用 tavily 搜索 XXX"
你的提示词
│
▼
Claude Code(context 里只有 5 个元工具)
│ 判断需要搜索 → 选择 call_tool
▼
MCP 协议调用 → mcp-proxy 进程
call_tool({ server: "tavily", tool: "tavily-search", args: { query: "XXX" } })
│
▼ proxy.py: handle_call_tool()
1. load_config() ← 读 ~/.mcp-proxy/config.json,取出 tavily 的 command/args/env
2. connector.call_tool(server_config, "tavily-search", args)
│
▼ connector.py: _open_session()
3. 用 StdioServerParameters 启动子进程:
npx -y tavily-mcp@0.1.2(env 注入 TAVILY_API_KEY)
4. ClientSession.initialize() ← MCP 握手
5. session.call_tool("tavily-search", { query: "XXX" })
│
▼ 真实 tavily-mcp 进程
6. 调用 Tavily API(HTTPS)
7. 返回结果
│
▼ 原路返回
connector → proxy.py → MCP 协议 → Claude Code context
注意: 每次调用真实 MCP 服务器都会以 `async with` 方式开启会话,用完即关。
缓存文件结构
~/.mcp-proxy/
config.json ← 后端服务器配置(唯一需要编辑的文件)
cache/
servers.json ← 服务器索引 + hash
tavily/
index.json ← list_tools 读取源
tavily-search.json ← read_tool 读取源
tavily-extract.json
io.github.microsoft/markitdown/
index.json
...
当前配置的服务器(示例)
| 服务器 | 工具 | 用途 |
|--------|------|------|
| tavily | tavily-search | 网络搜索 |
| tavily | tavily-extract | 提取指定 URL 的网页内容 |
| io.github.microsoft/markitdown | (需 list_tools 查询) | 文件/URL 转 Markdown |
关键规则
- 首次调用新工具前,必须先 `read_tool` 获取 schema,再 `call_tool`。
- 添加新 MCP 服务器,只需编辑 `~/.mcp-proxy/config.json`,然后 `refresh_cache`,无需改 Claude Code 配置。
- 缓存跨会话复用,不必每次重新拉取定义。
待办 / 已知限制
- 真实 MCP 服务器通过 `async with` 连接(调用时开启,完成后关闭)。单次调用的连接开销尚未做基准测试。
- HTTP/SSE 传输方式的 MCP 服务器尚未测试,当前实现假设使用 stdio。
- 若改写为 HTTP 服务器,可考虑用 FastAPI + uvicorn 替代 stdio 传输层。