MCP proxy in front of the GitHub MCP that returns JSON schemas instead of raw responses. Built on json-schema-sketch.
github-mcp-sketch
A local MCP server that proxies the GitHub MCP and returns the JSON's inferred schema instead of the raw response. Adds a query_response tool the agent uses to pull specific fields it actually needs.
The point: most GitHub API responses are 80%+ metadata the agent doesn't read. list_pull_requests for 30 PRs is ~80KB; the agent usually needs 4–5 fields per PR. Returning a schema first lets the model see the shape and decide what to fetch back.
Architecture
flowchart TD
Agent["Agent<br/>(Claude Code, Codex, custom SDK app)"]
Proxy["github-mcp-sketch<br/>• caches raw upstream JSON<br/>• returns schema + cache_id<br/>• serves query_response"]
Upstream["GitHub MCP<br/>api.githubcopilot.com/mcp"]
Agent -- stdio MCP --> Proxy
Proxy -- Streamable HTTPS --> Upstream
The proxy sits between any MCP-speaking agent and GitHub's upstream MCP. Every upstream tool gets forwarded transparently; the proxy intercepts the response, caches the raw JSON in memory, infers a compact schema via json-schema-sketch, and returns cache_id + schema to the agent.
The query loop
The agent doesn't re-fetch the upstream response to extract additional fields. After the first tool call returns the schema, the agent runs a query loop — calling query_response(cache_id, path) as many times as it needs against the locally cached JSON. No upstream round-trip per query.
sequenceDiagram
participant Agent
participant Proxy as github-mcp-sketch
participant Upstream as GitHub MCP (upstream)
Agent->>Proxy: list_issues(...)
Proxy->>Upstream: forwards
Upstream-->>Proxy: raw JSON (~80KB)
Note over Proxy: caches JSON locally
Proxy-->>Agent: schema + cache_id (~600B)
rect rgba(120, 200, 255, 0.15)
Note over Agent,Proxy: query loop — no upstream calls
Agent->>Proxy: query_response(cache_id, "[*].title")
Proxy-->>Agent: extracted titles
Agent->>Proxy: query_response(cache_id, "[*].state")
Proxy-->>Agent: extracted states
Agent->>Proxy: query_response(cache_id, [batch paths])
Proxy-->>Agent: { results: { path: values, ... } }
end
This is what makes schema-first MCPs efficient. The expensive part — pulling JSON over the network — happens once per upstream tool call. The agent's iterative drill-down ("now I need titles", "now I need the closed ones", "now I need the author logins") all runs locally against the cache, and each drill-down only puts the fields the agent asked for into the context.
What the numbers look like
Benchmarked across 13 agentic tasks on facebook/react and kubernetes/kubernetes, Anthropic SDK with prompt caching enabled (matching how Claude Code uses the API). N=3 runs per case per side. Sum of medians across all 13 cases:
| Metric | Baseline (github) | Sketch (github-sketch) | Δ |
|---|---|---|---|
| Final context size (tokens) | 844,136 | 348,292 | −58.7% |
| API cost (Claude Opus 4.7) | $5.59 | $2.73 | −51.2% |
| Total tokens billed | 1.84M | 1.27M | −30.8% |
Where the win lives — the 13 cases grouped by the shape of response they exercise:
| Response shape | Cases | Baseline ctx | Δ context | Δ cost |
|---|---|---|---|---|
| Single-object fetches (one issue, one PR) | 2 | 34,740 | +3% | +52% |
| List endpoints with rich metadata (30+ items, lots of URL/reaction noise) | 3 | 222,827 | −74% | −71% |
| Comment-heavy threads (KEP review, long React discussion) | 3 | 387,913 | −69% | −57% |
| Multi-step agentic workflows (triage, investigation, drill-down) | 3 | 158,329 | −40% | −27% |
| File contents and commit history | 2 | 40,327 | −2% | 0% |
The first and last rows are where the proxy doesn't help, but they are also the rows where baseline context is already small (~35–40K tokens). A percentage loss on a small payload moves little in absolute terms, and that includes the +52% cost on single-object fetches. The large absolute numbers are in rows 2–4 (158K–388K baseline), which is where the proxy's percentage savings translate into 100K+ tokens of headroom in real sessions. Both unfavorable rows are included deliberately so the data isn't cherry-picked.
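The tier figures can be sanity-checked with a few lines of arithmetic. The sketch below copies the baseline context numbers from the table, confirms they add up to the 844,136-token total from the summary table, and computes how much of the baseline sits in the three middle tiers. The helper names (`sharePct`, `deltaPct`) are illustrative and not part of the bench harness.

```typescript
// Baseline context per response-shape tier, copied from the table (tokens).
const tiers = [
  { shape: "single-object fetches", baselineCtx: 34_740 },
  { shape: "list endpoints", baselineCtx: 222_827 },
  { shape: "comment-heavy threads", baselineCtx: 387_913 },
  { shape: "multi-step workflows", baselineCtx: 158_329 },
  { shape: "file contents and commit history", baselineCtx: 40_327 },
];

const totalBaseline = tiers.reduce((sum, t) => sum + t.baselineCtx, 0);

// Share of the total baseline context, in percent with one decimal.
function sharePct(ctx: number): number {
  return Math.round((ctx / totalBaseline) * 1000) / 10;
}

// Relative change between two measurements, in percent (negative = reduction).
function deltaPct(baseline: number, sketch: number): number {
  return Math.round(((sketch - baseline) / baseline) * 1000) / 10;
}

// Tiers 2-4: list endpoints, comment-heavy threads, multi-step workflows.
const middleTiers = tiers.slice(1, 4).reduce((sum, t) => sum + t.baselineCtx, 0);

console.log(totalBaseline);               // 844136
console.log(sharePct(middleTiers));       // 91.1
console.log(deltaPct(844_136, 348_292));  // -58.7
```

Roughly nine tenths of the baseline context comes from the tiers where the proxy saves the most, which is why the two flat rows barely affect the overall result.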
Full report and reproducible bench: json-schema-sketch-bench.
How it works
Every upstream tool call (list_issues, pull_request_read, etc.) goes through this pipeline:
- The proxy forwards the call to https://api.githubcopilot.com/mcp with the agent-supplied PAT.
- The raw JSON response is cached in memory under a generated cache_id.
- The response shape is inferred via json-schema-sketch into a compact text-form schema.
- The proxy returns cache_id: gh-1-abc (pass to query_response) followed by the schema. That's the entire tool result the agent sees.
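As a rough illustration of the caching and wrapping steps, here is a minimal sketch. It assumes a toy `sketchShape` function as a stand-in for schema inference; this is not the json-schema-sketch API, and its output format is invented for the example. The `wrapResponse` name and the simplified `gh-<n>` id are also hypothetical.

```typescript
// Toy stand-in for schema inference. NOT the json-schema-sketch API.
function sketchShape(value: unknown): string {
  if (Array.isArray(value)) {
    // Describe only the first element as a representative item.
    return value.length > 0 ? `[${sketchShape(value[0])}]` : "[]";
  }
  if (value !== null && typeof value === "object") {
    const fields = Object.entries(value as Record<string, unknown>).map(
      ([key, v]) => `${key}: ${sketchShape(v)}`,
    );
    return `{ ${fields.join(", ")} }`;
  }
  return value === null ? "null" : typeof value;
}

const cache = new Map<string, unknown>();
let counter = 0;

// Cache the parsed JSON, infer its shape, and return only the id plus schema.
function wrapResponse(rawJson: string): string {
  const parsed: unknown = JSON.parse(rawJson);
  counter += 1;
  // Simplified id: the README's example is "gh-1-abc"; the suffix scheme is
  // not documented, so this sketch leaves it out.
  const cacheId = `gh-${counter}`;
  cache.set(cacheId, parsed);
  return `cache_id: ${cacheId} (pass to query_response)\n${sketchShape(parsed)}`;
}

console.log(wrapResponse('[{"title":"Fix bug","user":{"login":"octocat"}}]'));
// cache_id: gh-1 (pass to query_response)
// [{ title: string, user: { login: string } }]
```

The raw payload never reaches the agent here; only the two-line result does, and the full value stays in `cache` for later lookups.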
When the agent wants actual values, it calls query_response:
query_response(cache_id="gh-1-abc", path="items[*].title")
Path syntax supports dot/bracket navigation and [*] wildcards. The agent can pass an array of paths to batch multiple field extractions into one tool call:
query_response(cache_id="gh-1-abc", path=["items[*].title", "items[*].state", "items[*].user.login"])
Cache entries live until the proxy process exits, or until LRU eviction removes them once more than CACHE_SIZE responses are cached (default 50). There's no TTL; the design assumes one agent session per proxy process.
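A size-bounded cache with no TTL can be sketched with a plain `Map`, which iterates keys in insertion order. The class below is a minimal illustration, not the proxy's actual code: the `ResponseCache` name is hypothetical, and refreshing recency on reads is an assumption about how the LRU policy behaves.

```typescript
// Minimal LRU cache built on Map's insertion order (oldest key comes first).
class ResponseCache {
  private entries = new Map<string, unknown>();
  private maxEntries: number;

  constructor(maxEntries: number) {
    this.maxEntries = maxEntries;
  }

  set(cacheId: string, value: unknown): void {
    // Deleting first makes a re-inserted key the newest entry.
    this.entries.delete(cacheId);
    this.entries.set(cacheId, value);
    if (this.entries.size > this.maxEntries) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
  }

  get(cacheId: string): unknown {
    if (!this.entries.has(cacheId)) return undefined;
    const value = this.entries.get(cacheId);
    // Assumption: reads refresh recency, so entries the agent keeps
    // querying survive eviction.
    this.entries.delete(cacheId);
    this.entries.set(cacheId, value);
    return value;
  }

  has(cacheId: string): boolean {
    return this.entries.has(cacheId);
  }
}

const demo = new ResponseCache(2);
demo.set("gh-1", ["first"]);
demo.set("gh-2", ["second"]);
demo.get("gh-1");             // gh-1 becomes the most recently used entry
demo.set("gh-3", ["third"]);  // evicts gh-2, the least recently used
console.log(demo.has("gh-1"), demo.has("gh-2"), demo.has("gh-3")); // true false true
```

With this policy an evicted cache_id simply stops resolving, so an agent working through a long session may need to repeat the upstream call for old results.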
Install
Requires Node 20+.
npm install -g github-mcp-sketch
You'll need a GitHub Personal Access Token (fine-grained, read-only is enough). Create one with read access to Contents and Metadata on the repositories you'll use it against.
Configuration
All configuration is via environment variables.
| Variable | Required | Default | Purpose |
|---|---|---|---|
| GITHUB_PAT | yes | — | Bearer token forwarded as the Authorization header on every upstream request. |
| UPSTREAM_MCP_URL | no | https://api.githubcopilot.com/mcp | Upstream MCP endpoint. Override for testing or self-hosted GitHub MCPs. |
| CACHE_SIZE | no | 50 | Max number of cached responses kept in memory before LRU eviction. Each cached entry is one upstream tool result. |
| METRICS_CSV | no | (off) | If set to a file path, the proxy appends per-tool-call metrics (raw vs sketched response size, tokens, timing) to that CSV. Off by default — set it explicitly to opt in. |
How the variables get set depends on how you launch the proxy:
- From an MCP host (Claude Code, Claude Desktop, Cline, etc.): the host passes env vars to the spawned process. In Claude Code that's the -e KEY=VALUE flag of claude mcp add (shown below). Values are stored in the host's MCP config (~/.claude.json for Claude Code).
- Direct shell invocation: GITHUB_PAT=ghp_... github-mcp-sketch, or export the variable in your shell rc.
- Running from a clone of the source: copy .env.example → .env, fill it in. Dotenv loads it on startup. (.env is gitignored.)
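However the variables arrive, the table of variables reads as a small loader with one required value and three optional ones. The following is a sketch of that reading, not the proxy's actual startup code: `loadConfig`, `ProxyConfig`, and the validation rules (rejecting non-positive CACHE_SIZE, treating an empty METRICS_CSV as off) are assumptions.

```typescript
interface ProxyConfig {
  githubPat: string;
  upstreamUrl: string;
  cacheSize: number;
  metricsCsv: string | undefined; // undefined means metrics are off
}

// Build a config from an environment-like object, applying the documented defaults.
function loadConfig(env: Record<string, string | undefined>): ProxyConfig {
  const githubPat = env.GITHUB_PAT;
  if (!githubPat) {
    throw new Error("GITHUB_PAT is required");
  }
  const cacheSize = Number.parseInt(env.CACHE_SIZE ?? "50", 10);
  if (!Number.isInteger(cacheSize) || cacheSize < 1) {
    // Assumed validation; the README does not say how bad values are handled.
    throw new Error("CACHE_SIZE must be a positive integer");
  }
  return {
    githubPat,
    upstreamUrl: env.UPSTREAM_MCP_URL ?? "https://api.githubcopilot.com/mcp",
    cacheSize,
    metricsCsv: env.METRICS_CSV || undefined,
  };
}

const config = loadConfig({ GITHUB_PAT: "<your_pat>", CACHE_SIZE: "200" });
console.log(config.cacheSize, config.upstreamUrl);
// 200 https://api.githubcopilot.com/mcp
```

In a real process the argument would be `process.env`; passing a plain object keeps the sketch easy to try without touching the shell environment.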
Wire into Claude Code
Minimal — just the required PAT:
claude mcp add github-sketch \
-e GITHUB_PAT=<your_pat> \
-- npx github-mcp-sketch
With optional variables — stack additional -e KEY=VALUE flags before the --:
claude mcp add github-sketch \
-e GITHUB_PAT=<your_pat> \
-e CACHE_SIZE=200 \
-e METRICS_CSV=/tmp/github-sketch-metrics.csv \
-- npx github-mcp-sketch
All -e values are persisted in ~/.claude.json and passed to the proxy on every spawn.
Verify the server is registered:
claude mcp list | grep github-sketch
# github-sketch: npx github-mcp-sketch - ✓ Connected
Restart Claude Code so the new server is registered.
Changing variables after wire-up
Claude Code doesn't expose an in-place edit for MCP env vars. To change one:
claude mcp remove github-sketch
claude mcp add github-sketch -e GITHUB_PAT=<new_pat> -- npx github-mcp-sketch
Or edit ~/.claude.json directly under the mcpServers.github-sketch.env block.
Wire into Codex
Add the server to your Codex config at ~/.codex/config.toml.
Minimal — just the required PAT:
[mcp_servers.github-sketch]
command = "npx"
args = ["github-mcp-sketch"]
[mcp_servers.github-sketch.env]
GITHUB_PAT = "<your_pat>"
With optional variables:
[mcp_servers.github-sketch]
command = "npx"
args = ["github-mcp-sketch"]
[mcp_servers.github-sketch.env]
GITHUB_PAT = "<your_pat>"
CACHE_SIZE = "200"
METRICS_CSV = "/tmp/github-sketch-metrics.csv"
If you installed the package globally, you can launch the binary directly instead:
[mcp_servers.github-sketch]
command = "github-mcp-sketch"
args = []
[mcp_servers.github-sketch.env]
GITHUB_PAT = "<your_pat>"
Restart Codex after editing ~/.codex/config.toml so the new MCP server is registered. The server should then appear as github-sketch and expose the upstream GitHub tools plus query_response.
Wire into other MCP hosts
The proxy is a standard stdio MCP — anything that speaks MCP can host it. The general pattern:
- Spawn npx github-mcp-sketch as a stdio process
- Pass env vars (at minimum GITHUB_PAT) on that spawn
For example, in a custom Anthropic SDK app using @modelcontextprotocol/sdk:
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";
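// Drop undefined values: the transport's env option expects string values only.
const inheritedEnv = Object.fromEntries(
  Object.entries(process.env).filter(([, value]) => value !== undefined),
) as Record<string, string>;
const transport = new StdioClientTransport({
  command: "npx",
  args: ["github-mcp-sketch"],
  env: {
    ...inheritedEnv,
    GITHUB_PAT: process.env.GITHUB_PAT!,
    CACHE_SIZE: "100", // optional
  },
});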
const client = new Client({ name: "my-app", version: "0.1.0" });
await client.connect(transport);
If the host is Claude Code and the proxy was registered from a local build, verify it the same way:
claude mcp list | grep github-sketch
# github-sketch: node /.../dist/server.js - ✓ Connected
Whichever host you use, the proxy exposes the same tool names as the upstream GitHub MCP (list_issues, pull_request_read, get_file_contents, etc.) plus the added query_response.
Tools
The proxy passes through every tool from the upstream GitHub MCP with identical names, descriptions, and input schemas. The agent calls them exactly as it would call the upstream MCP.
The one added tool:
query_response(cache_id, path, max_items?)
Extract values from a response cached by a previous tool call.
Path syntax:
| Path | Returns |
|---|---|
| "" (empty string) | The whole cached value |
| "user.login" | A nested scalar |
| "[0].title" | First array item's field |
| "[*].title" | A field from every array item |
| "items[*].user.login" | Nested field across an array |
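To make the path table concrete, here is a minimal resolver for this syntax. It is a sketch written from the table alone, not the proxy's implementation: `parsePath`, `resolveSteps`, and `queryPath` are hypothetical names, and the choice to return nested arrays for nested wildcards is an assumption.

```typescript
type Step =
  | { kind: "key"; key: string }
  | { kind: "index"; index: number }
  | { kind: "wildcard" };

// Split "items[*].user.login" into key, index, and wildcard steps.
function parsePath(path: string): Step[] {
  const steps: Step[] = [];
  const token = /([^.\[\]]+)|\[(\d+|\*)\]/g;
  let match: RegExpExecArray | null;
  while ((match = token.exec(path)) !== null) {
    if (match[1] !== undefined) steps.push({ kind: "key", key: match[1] });
    else if (match[2] === "*") steps.push({ kind: "wildcard" });
    else steps.push({ kind: "index", index: Number(match[2]) });
  }
  return steps;
}

// Walk the value one step at a time; a wildcard maps the rest over every item.
function resolveSteps(value: unknown, steps: Step[]): unknown {
  if (steps.length === 0) return value;
  const [step, ...rest] = steps;
  if (step.kind === "wildcard") {
    if (!Array.isArray(value)) throw new Error("[*] applied to a non-array");
    return value.map((item) => resolveSteps(item, rest));
  }
  if (step.kind === "index") {
    if (!Array.isArray(value)) throw new Error(`[${step.index}] applied to a non-array`);
    return resolveSteps(value[step.index], rest);
  }
  if (value === null || typeof value !== "object") {
    throw new Error(`cannot read "${step.key}" of a non-object`);
  }
  return resolveSteps((value as Record<string, unknown>)[step.key], rest);
}

function queryPath(value: unknown, path: string): unknown {
  return resolveSteps(value, parsePath(path));
}

const cached = {
  items: [
    { title: "Fix bug", user: { login: "alice" } },
    { title: "Add docs", user: { login: "bob" } },
  ],
};
console.log(queryPath(cached, "items[*].user.login")); // [ 'alice', 'bob' ]
console.log(queryPath(cached, "items[0].title"));      // Fix bug
```

An empty path produces no steps, so `queryPath(cached, "")` returns the whole cached value, matching the first row of the table.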
Batch mode — pass an array of paths to get multiple fields in one call:
path=["items[*].number", "items[*].title", "items[*].state"]
Returns { results: { "items[*].number": [...], "items[*].title": [...], ... } }. Per-path errors are reported in that path's entry; other paths still succeed.
max_items — optional cap on items returned per wildcard expansion. Defaults to unlimited; the agent only sets it when it explicitly wants a sample.
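Batch mode and max_items can be sketched as a thin wrapper around any single-path resolver. The version below is an illustration under stated assumptions: `queryBatch` is a hypothetical name, the `{ error: ... }` entry shape is a guess at how per-path errors are reported, and the cap is applied only to a top-level array result, which simplifies the per-wildcard behavior described above.

```typescript
type PathResolver = (path: string) => unknown;

// Resolve each path independently so one failure does not affect the others.
function queryBatch(
  paths: string[],
  resolvePath: PathResolver,
  maxItems?: number,
): { results: Record<string, unknown> } {
  const results: Record<string, unknown> = {};
  for (const path of paths) {
    try {
      const value = resolvePath(path);
      // Simplification: cap only when the result itself is an array.
      results[path] =
        Array.isArray(value) && maxItems !== undefined
          ? value.slice(0, maxItems)
          : value;
    } catch (err) {
      // Assumed error shape: the failure is reported in that path's entry.
      results[path] = { error: err instanceof Error ? err.message : String(err) };
    }
  }
  return { results };
}

// A fake resolver over fixed data, standing in for the cache lookup.
const fixed: Record<string, unknown> = {
  "items[*].number": [101, 102, 103],
  "items[*].title": ["Fix bug", "Add docs", "Bump deps"],
};
const fakeResolver: PathResolver = (path) => {
  if (!(path in fixed)) throw new Error(`no such path: ${path}`);
  return fixed[path];
};

console.log(JSON.stringify(queryBatch(["items[*].number", "items[*].oops"], fakeResolver, 2)));
// {"results":{"items[*].number":[101,102],"items[*].oops":{"error":"no such path: items[*].oops"}}}
```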
When it doesn't help
Looking at the per-tier table above:
- Single-object fetches of small objects (one issue, one user) — the agent queries back most of the fields anyway, so the schema + query overhead exceeds the per-call wire savings.
- File contents — raw file text isn't structured, so there's no schema noise to skip. The proxy passes the file through with minor envelope savings only.
- Tasks where the agent really does need every field — almost never happens for list and search endpoints, but is the failure mode if it does.
Known variance: on t3-06 (a 67-comment review thread on a Kubernetes KEP), one of three sketch runs occasionally consumes ~60K context (vs ~39K median) because the agent issues a doubly-nested wildcard query (review_threads[*].comments[*].body) that materializes most of the original payload. The median is still ~56% better than baseline, but this is a real pattern the agent can fall into. The proxy doesn't currently guard against it.
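The proxy has no guard for this pattern, but it is easy to detect from the outside. As a purely hypothetical sketch (neither function exists in the proxy), one could count wildcards in a path and measure what fraction of the cached payload a query result would put back into context:

```typescript
// Number of [*] wildcards in a path; 2 or more means nested expansion.
function wildcardDepth(path: string): number {
  return path.split("[*]").length - 1;
}

// Serialized size of the extracted value relative to the full payload.
// Both arguments must be JSON-serializable values (not undefined).
function materializedFraction(fullPayload: unknown, extracted: unknown): number {
  return JSON.stringify(extracted).length / JSON.stringify(fullPayload).length;
}

console.log(wildcardDepth("review_threads[*].comments[*].body")); // 2
console.log(wildcardDepth("[*].title"));                          // 1
```

A caller could use a high fraction as a signal to suggest max_items or a narrower path; where to set that threshold is a judgment call this sketch does not attempt to make.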
Reproduce the bench
The benchmark is a separate repo: json-schema-sketch-bench.
git clone https://github.com/markadelnawar/json-schema-sketch-bench
cd json-schema-sketch-bench
npm install
cp .env.example .env # add ANTHROPIC_API_KEY and GITHUB_PAT
npm run bench # 13 cases × 2 sides × 3 runs, ~45 min, ~$5-10
npm run report # generates summary.md
Cases, raw CSVs, methodology details, and the final summary.md from a recent run are all checked in.
Architecture
The proxy is a thin stdio MCP server:
- Connects to https://api.githubcopilot.com/mcp over Streamable HTTP using the agent-provided GITHUB_PAT.
- On startup, calls upstream listTools() and registers each tool with its original name and description. Adds query_response to the manifest.
- On tools/call: if the name is query_response, resolves the path against the cache; otherwise forwards upstream, caches the parsed JSON, infers the schema, and returns the wrapped response.
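The tools/call routing can be sketched as one function with its collaborators injected. This is an illustration only: `handleToolCall` and the `ProxyDeps` interface are hypothetical names, the result is simplified to a string, and the real server wires these pieces through the MCP SDK instead of plain functions.

```typescript
type ToolArgs = Record<string, unknown>;

interface ProxyDeps {
  // Sends the call to the upstream MCP and returns the raw JSON text.
  forwardUpstream: (name: string, args: ToolArgs) => Promise<string>;
  // Caches the raw JSON and returns the cache_id line plus the schema.
  wrapResponse: (rawJson: string) => string;
  // Resolves one path or a batch of paths against a cached value.
  queryCache: (cacheId: string, path: string | string[]) => unknown;
}

async function handleToolCall(
  name: string,
  args: ToolArgs,
  deps: ProxyDeps,
): Promise<string> {
  if (name === "query_response") {
    // Served locally: no upstream round-trip.
    const value = deps.queryCache(
      String(args.cache_id),
      args.path as string | string[],
    );
    return JSON.stringify(value);
  }
  // Every other tool is forwarded unchanged, then wrapped.
  const raw = await deps.forwardUpstream(name, args);
  return deps.wrapResponse(raw);
}
```

Because there is no try/catch around the forward, a rejected upstream call propagates to the caller unchanged in this sketch.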
There's no rate limiting, no retry, no auth refresh. Upstream errors are returned verbatim.
Caveats
- Stateful in memory. Cache entries live until the proxy process exits or LRU eviction removes them (see CACHE_SIZE). Restarting Claude Code empties the cache.
- One PAT per process. The proxy reads GITHUB_PAT once at startup. To rotate, restart.
- Schemas drop list-element variance. If items in an array have different shapes, the schema picks a representative one. Rarely a problem for GitHub responses but can be surprising.
- No write-tool optimization. Tools that perform writes (create_issue, merge_pull_request) work, but the response sketching is wasted on small confirmation payloads.
License
MIT. See LICENSE.
Related
- json-schema-sketch: the underlying library that infers compact text-form schemas from JSON values.
- json-schema-sketch-bench: the benchmark harness used to produce the numbers above.
- GitHub MCP server: the upstream this proxies.
- Model Context Protocol: the protocol both speak.