prompt-cache-utils

What Is This?

An MCP (Model Context Protocol) server that helps you save money on LLM API calls by raising prompt cache hit rates. Its diagnostic tools work with any LLM provider that supports prompt caching, such as DeepSeek, Anthropic Claude, and OpenAI. The proxy's automatic request rewrites currently apply only to DeepSeek models.

LLM providers cache the prefix of your prompt (system prompt + tools + few-shot examples). When that prefix is byte-identical between calls, its tokens are served from cache at a fraction of the normal price (e.g., DeepSeek charges 1/10 for cached tokens). This tool helps keep your prefix stable and, when the cache misses, diagnoses what changed.

┌─ CACHED (cheap) ──────────┐  ┌─ UNCACHED (full price) ─┐
│ system + tools + few-shots │  │ conversation history...  │
└────────────────────────────┘  └──────────────────────────┘
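To see what the cache discount is worth, the sketch below computes the input cost relative to a fully uncached request. It assumes cached tokens are billed at a fixed fraction of the normal input price (0.1, as in the DeepSeek example). The function name relativeInputCost is hypothetical, output tokens are ignored, and real price ratios vary by provider and model.

```typescript
// Input cost relative to sending every token uncached.
// Assumption: cached tokens cost `cachedPriceRatio` times the normal price
// (0.1 = one tenth, as in the DeepSeek example). For non-empty input the result
// lies between cachedPriceRatio (all hits) and 1 (all misses).
function relativeInputCost(
  hitTokens: number,
  missTokens: number,
  cachedPriceRatio = 0.1,
): number {
  const total = hitTokens + missTokens;
  if (total === 0) return 0;
  return (hitTokens * cachedPriceRatio + missTokens) / total;
}

// 12450 cached + 1200 uncached tokens:
// (12450 * 0.1 + 1200) / 13650 = 2445 / 13650, about 0.179
console.log(relativeInputCost(12450, 1200).toFixed(3)); // 0.179
```

With the sample round from the Stop hook output further down (12,450 cached, 1,200 uncached), the input side would cost roughly 18% of the uncached price under these assumptions.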

Key Features

Prefix Fingerprinting — SHA-256 fingerprints to track prefix stability across calls
Cache Miss Diagnosis — Pinpoint exactly what changed (system prompt, tool list, few-shots)
Token Estimation — Estimate token counts with context window fullness analysis
Auto-Optimization Proxy — A built-in local HTTP proxy that automatically normalizes requests before forwarding to the upstream API (tool deduplication/sorting, dynamic timestamp sanitization, system prompt section reordering)
Usage Tracking — Capture real API usage data via the proxy, show hit-rate trends and cost savings
8 MCP Tools — compute_fingerprint, diagnose_cache_miss, estimate_tokens, optimize_for_cache, cache_health_checklist, context_status, get_model_info, parse_api_usage
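The fingerprint-and-diagnose idea can be sketched in a few lines with Node's built-in node:crypto module. The Prefix shape, the JSON serialization of tools and few-shots, and the names fingerprint and diffFingerprints are illustrative assumptions, not this package's API. Hashing each component separately is what lets a miss be attributed to the system prompt, the tool list, or the few-shot examples rather than just "the prefix changed".

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of a cacheable prefix; the real tool's input may differ.
interface Prefix {
  system: string;
  tools: unknown[];
  fewShots: unknown[];
}

type Fingerprint = Record<keyof Prefix, string>;

const sha256 = (text: string): string =>
  createHash("sha256").update(text, "utf8").digest("hex");

// Hash each component separately so a change can be attributed to one part.
function fingerprint(p: Prefix): Fingerprint {
  return {
    system: sha256(p.system),
    tools: sha256(JSON.stringify(p.tools)),
    fewShots: sha256(JSON.stringify(p.fewShots)),
  };
}

// List the components whose hash differs between two calls.
function diffFingerprints(before: Fingerprint, after: Fingerprint): (keyof Prefix)[] {
  return (Object.keys(before) as (keyof Prefix)[]).filter((k) => before[k] !== after[k]);
}

const a = fingerprint({ system: "You are helpful.", tools: [{ name: "read" }], fewShots: [] });
const b = fingerprint({ system: "You are helpful. ", tools: [{ name: "read" }], fewShots: [] });
console.log(diffFingerprints(a, b)); // [ 'system' ]
```

JSON.stringify is sensitive to key and array order, which is deliberate here: a reordered tool list changes the serialized prefix, so it should show up as a change. Keeping one Fingerprint per sessionId in a Map would give the cross-call comparison that compute_fingerprint describes.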

Quick Start

# Try without installing
npx prompt-cache-utils

# Or install globally
npm i -g prompt-cache-utils

One-Click Install (recommended)

npx prompt-cache-utils install

This interactive installer will:

Detect Claude Code on your machine
Register the MCP server in ~/.claude.json
Configure environment variables and a Stop hook in ~/.claude/settings.json
Let you choose the display language (Chinese or English)

After installation, restart Claude Code to activate.

Manual Configuration

Add to your MCP config:

{
  "mcpServers": {
    "prompt-cache-utils": {
      "type": "stdio",
      "command": "npx",
      "args": ["prompt-cache-utils"]
    }
  }
}

Config file locations:

Claude Code: ~/.claude.json or ~/.claude/settings.json
VS Code: .vscode/mcp.json
Cursor / Copilot CLI / other MCP clients: check their respective docs

How It Works

The project has three core components working together:

1. MCP Server (8 Tools)

The MCP server exposes tools that LLM coding assistants such as Claude Code can call on their own:

compute_fingerprint — Computes SHA-256 hashes of the system prompt, tool specs, and few-shot examples. When sessionId is provided, fingerprints are stored in memory and compared across calls to detect changes.
diagnose_cache_miss — Compares two fingerprints and tells you exactly what changed: "system prompt gained 42 chars", "tool X was hot-added", "few-shot order changed".
estimate_tokens — Estimates token count using character-based heuristics (4 chars/token for English, 1.5 for CJK, 3.5 for code). Reports context window fullness with threshold guidance (safe/warning/critical/overflow).
optimize_for_cache — Scans messages and tools for cache-breaking anti-patterns: system messages out of order, unsorted tools, dynamic timestamps, volatile paths.
parse_api_usage — Parses real usage data from the proxy's log, shows hit rate, cost savings, and historical trends.
context_status / cache_health_checklist / get_model_info — All-in-one health checks and reference data.
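The estimate_tokens heuristic can be approximated as below. This sketch applies the quoted ratios per character (1.5 characters per token for CJK, 4 for other text, or 3.5 when the caller marks the text as code). The 70% and 90% fullness cutoffs are hypothetical placeholders, since the tool's actual thresholds are not listed here, and the names estimateTokens and fullness are illustrative.

```typescript
// Characters-per-token ratios quoted in the estimate_tokens description above.
const CHARS_PER_TOKEN = { english: 4, cjk: 1.5, code: 3.5 } as const;

// Rough CJK test: Hiragana/Katakana, CJK Ext. A, CJK Unified Ideographs, Hangul syllables.
const CJK = /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]/u;

// `isCode` is supplied by the caller here; the real tool may classify text itself.
function estimateTokens(text: string, isCode = false): number {
  let cjk = 0;
  let other = 0;
  for (const ch of text) {
    if (CJK.test(ch)) cjk++;
    else other++;
  }
  const otherRatio = isCode ? CHARS_PER_TOKEN.code : CHARS_PER_TOKEN.english;
  return Math.ceil(cjk / CHARS_PER_TOKEN.cjk + other / otherRatio);
}

type Fullness = "safe" | "warning" | "critical" | "overflow";

// Hypothetical cutoffs at 70% and 90% of the window; the tool's real thresholds may differ.
function fullness(tokens: number, contextWindow: number): Fullness {
  const ratio = tokens / contextWindow;
  if (ratio > 1) return "overflow";
  if (ratio >= 0.9) return "critical";
  if (ratio >= 0.7) return "warning";
  return "safe";
}

console.log(estimateTokens("hello world!"), estimateTokens("你好世界")); // 3 3
console.log(fullness(150000, 200000)); // warning
```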

2. Local Proxy Server

When the MCP server starts, it also launches a local HTTP proxy on port 10101. The proxy:

Intercepts all API requests from Claude Code (via ANTHROPIC_BASE_URL=http://127.0.0.1:10101)
Auto-optimizes the request payload before forwarding to the upstream (DeepSeek, Anthropic, etc.):
- Deduplicates and alphabetically sorts tools
- Normalizes whitespace in system prompts
- Replaces dynamic timestamps with stable placeholders
- Reorders system sections into a canonical order
Captures usage data from the response (supports both streaming SSE and non-streaming)
Stores hit/miss token counts in a ring buffer (~/.prompt-cache-utils/usage-log.json, max 50 entries)
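The usage-capture step in the list above can be sketched as follows, assuming the upstream returns Anthropic Messages API usage fields (input_tokens, cache_creation_input_tokens, cache_read_input_tokens), the format Claude Code speaks. How those fields map to "hit" and "miss", and the function names, are assumptions rather than this package's code.

```typescript
// Usage fields as named in the Anthropic Messages API (assumed to be what the
// upstream returns through the proxy).
interface Usage {
  input_tokens?: number;
  cache_creation_input_tokens?: number;
  cache_read_input_tokens?: number;
}

interface HitMiss {
  hit: number;
  miss: number;
}

// Assumption: "hit" = tokens read from cache, "miss" = all other input tokens.
function toHitMiss(u: Usage): HitMiss {
  return {
    hit: u.cache_read_input_tokens ?? 0,
    miss: (u.input_tokens ?? 0) + (u.cache_creation_input_tokens ?? 0),
  };
}

// Non-streaming response: `usage` sits at the top level of the JSON body.
function usageFromJson(body: string): HitMiss | null {
  const parsed = JSON.parse(body);
  return parsed.usage ? toHitMiss(parsed.usage) : null;
}

// Streaming response (SSE): input-side usage arrives in the `message_start` event.
function usageFromSse(stream: string): HitMiss | null {
  for (const line of stream.split("\n")) {
    if (!line.startsWith("data:")) continue;
    const payload = line.slice("data:".length).trim();
    if (!payload.startsWith("{")) continue; // skip non-JSON payloads such as "[DONE]"
    const event = JSON.parse(payload);
    if (event.type === "message_start" && event.message?.usage) {
      return toHitMiss(event.message.usage);
    }
  }
  return null;
}

const sse = [
  "event: message_start",
  'data: {"type":"message_start","message":{"usage":{"input_tokens":200,' +
    '"cache_creation_input_tokens":1000,"cache_read_input_tokens":12450}}}',
  "",
].join("\n");
console.log(usageFromSse(sse)); // { hit: 12450, miss: 1200 }
```

If an upstream instead reports DeepSeek's OpenAI-compatible fields, prompt_cache_hit_tokens and prompt_cache_miss_tokens map directly to hit and miss.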

The proxy applies these rewrites only to DeepSeek models, where cache savings have the most impact. Requests for other models are passed through unchanged.
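Here is a minimal sketch of the request normalization described above, assuming an Anthropic-style body with a string system field and a tools array keyed by name. The DeepSeek detection by model name, the timestamp pattern, the <TIMESTAMP> placeholder, and the normalize_whitespace label are assumptions; sort_tools and sanitize_dynamic_system_lines are borrowed from the rule names in the Stop hook output. System-section reordering is left out because the canonical order is not documented here.

```typescript
// Minimal request shape; real Anthropic-style bodies carry many more fields,
// and `system` may also be an array of content blocks (not handled here).
interface Tool {
  name: string;
  [key: string]: unknown;
}

interface ChatRequest {
  model: string;
  system?: string;
  tools?: Tool[];
}

// Hypothetical pattern: ISO-like dates with an optional time and zone part.
const TIMESTAMP =
  /\b\d{4}-\d{2}-\d{2}(?:[T ]\d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?(?:Z|[+-]\d{2}:?\d{2})?)?\b/g;

function normalizeRequest(body: ChatRequest): { body: ChatRequest; rules: string[] } {
  // Assumption: DeepSeek models are recognised by name; others pass through untouched.
  if (!body.model.toLowerCase().includes("deepseek")) return { body, rules: [] };

  const rules: string[] = [];
  const out: ChatRequest = { ...body };

  if (body.tools) {
    const original = body.tools;
    // Keep the first definition of each tool name, then sort by name (code-unit order).
    const seen = new Map<string, Tool>();
    for (const t of original) if (!seen.has(t.name)) seen.set(t.name, t);
    const sorted = [...seen.values()].sort((x, y) =>
      x.name < y.name ? -1 : x.name > y.name ? 1 : 0,
    );
    if (sorted.length !== original.length || sorted.some((t, i) => t !== original[i])) {
      rules.push("sort_tools");
    }
    out.tools = sorted;
  }

  if (body.system !== undefined) {
    // Unify line endings and strip trailing spaces/tabs on every line.
    const tidy = body.system.replace(/\r\n/g, "\n").replace(/[ \t]+$/gm, "");
    if (tidy !== body.system) rules.push("normalize_whitespace"); // hypothetical label
    const sanitized = tidy.replace(TIMESTAMP, "<TIMESTAMP>");
    if (sanitized !== tidy) rules.push("sanitize_dynamic_system_lines");
    out.system = sanitized;
  }

  return { body: out, rules };
}

const { body: norm, rules } = normalizeRequest({
  model: "deepseek-chat",
  system: "Today is 2025-01-31T09:00:00Z.  \nBe concise.",
  tools: [{ name: "write" }, { name: "read" }, { name: "write" }],
});
console.log(rules.join(", ")); // sort_tools, normalize_whitespace, sanitize_dynamic_system_lines
console.log(JSON.stringify(norm.system)); // "Today is <TIMESTAMP>.\nBe concise."
```

Replacing a date with a placeholder keeps the prefix stable but also hides that date from the model; that is the trade-off such a rule accepts.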

3. Stop Hook

A Stop hook registered in ~/.claude/settings.json runs show-stats.js each time Claude Code finishes a response, i.e. at the end of every round. It prints a summary of that round and of the session so far:

[Auto-Optimize]
Applied: yes
Rules: sort_tools, sanitize_dynamic_system_lines
Prefix FP: a1b2c3d4 -> e5f6g7h8

[Cache Usage]
This round: hit=12,450 miss=1,200 hitRate=91.2%
Last 3 rounds: 89.5% ↑ (+1.7pp)
Session avg: 87.3%   rounds=14
Session saved tokens: 89,235

[Attribution]
Likely helpful: rewrite applied and hit-rate improved (observed correlation, not causal proof).
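The numbers in this summary can be derived from the stored hit/miss counts. This sketch assumes a token-weighted hit rate (total hit tokens over total input tokens) and measures the trend against the three rounds preceding the latest three; both are guesses at how show-stats.js computes them, and the function names are hypothetical. It keeps the 50-entry cap in memory rather than in usage-log.json. For the sample round above (hit=12,450, miss=1,200) the weighted rate is 91.2%, matching the displayed value.

```typescript
interface Round {
  hit: number;
  miss: number;
}

const MAX_ENTRIES = 50; // the documented ring-buffer size

// Append a round, keeping only the newest MAX_ENTRIES entries.
function pushRound(log: Round[], round: Round): Round[] {
  const next = [...log, round];
  return next.length > MAX_ENTRIES ? next.slice(next.length - MAX_ENTRIES) : next;
}

// Assumption: token-weighted hit rate, in percent.
function hitRate(rounds: Round[]): number {
  let hit = 0;
  let total = 0;
  for (const r of rounds) {
    hit += r.hit;
    total += r.hit + r.miss;
  }
  return total === 0 ? 0 : (100 * hit) / total;
}

// Assumption: trend = last `n` rounds versus the `n` rounds before them, in percentage points.
function trend(rounds: Round[], n = 3): { recent: number; deltaPp: number } {
  const recent = hitRate(rounds.slice(-n));
  const previous = rounds.slice(-2 * n, -n);
  return { recent, deltaPp: previous.length > 0 ? recent - hitRate(previous) : 0 };
}

let log: Round[] = [];
log = pushRound(log, { hit: 9000, miss: 1000 });
log = pushRound(log, { hit: 12450, miss: 1200 });
console.log(hitRate(log.slice(-1)).toFixed(1)); // 91.2
```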

Uninstall

One-Click Uninstall

npx prompt-cache-utils uninstall

This removes:

MCP server registration from ~/.claude.json
Environment variables and Stop hook from ~/.claude/settings.json

Manual Uninstall

Remove the prompt-cache-utils entry from:

~/.claude.json → mcpServers.prompt-cache-utils
~/.claude/settings.json → env.ANTHROPIC_BASE_URL, env.PROMPT_CACHE_UTILS_UPSTREAM, and the Stop hook referencing show-stats.js
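The manual steps above amount to deleting a few keys from two JSON objects. A minimal sketch, assuming both files have already been read and parsed, and assuming the Stop hook follows Claude Code's hooks.Stop[].hooks[] layout with type/command entries. The function names are hypothetical, file I/O and backups are omitted, and /path/to/show-stats.js and https://example.invalid are placeholders.

```typescript
// Minimal shapes for the two files; real files contain many more keys.
interface HookCommand {
  type: string;
  command: string;
}
interface HookGroup {
  matcher?: string;
  hooks: HookCommand[];
}
interface Settings {
  env?: Record<string, string>;
  hooks?: Record<string, HookGroup[]>;
  [key: string]: unknown;
}
interface ClaudeJson {
  mcpServers?: Record<string, unknown>;
  [key: string]: unknown;
}

// ~/.claude.json: drop mcpServers["prompt-cache-utils"].
function removeServer(cfg: ClaudeJson): ClaudeJson {
  if (!cfg.mcpServers) return cfg;
  const servers = { ...cfg.mcpServers };
  delete servers["prompt-cache-utils"];
  return { ...cfg, mcpServers: servers };
}

// ~/.claude/settings.json: drop the two env vars and any Stop hook command that
// references show-stats.js, removing hook groups (and the Stop key) left empty.
function removeSettings(s: Settings): Settings {
  const out: Settings = { ...s };
  if (s.env) {
    const env = { ...s.env };
    delete env.ANTHROPIC_BASE_URL;
    delete env.PROMPT_CACHE_UTILS_UPSTREAM;
    out.env = env;
  }
  if (s.hooks?.Stop) {
    const kept = s.hooks.Stop
      .map((g) => ({ ...g, hooks: g.hooks.filter((h) => !h.command.includes("show-stats.js")) }))
      .filter((g) => g.hooks.length > 0);
    const hooks = { ...s.hooks };
    if (kept.length > 0) hooks.Stop = kept;
    else delete hooks.Stop;
    out.hooks = hooks;
  }
  return out;
}

const settings: Settings = {
  env: {
    ANTHROPIC_BASE_URL: "http://127.0.0.1:10101",
    PROMPT_CACHE_UTILS_UPSTREAM: "https://example.invalid", // placeholder value
    OTHER: "1",
  },
  hooks: { Stop: [{ hooks: [{ type: "command", command: "node /path/to/show-stats.js" }] }] },
};
console.log(JSON.stringify(removeSettings(settings))); // {"env":{"OTHER":"1"},"hooks":{}}
```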

Remove Data

rm -rf ~/.prompt-cache-utils

Uninstall npm Package

npm uninstall -g prompt-cache-utils

Build from Source

git clone https://github.com/user/prompt-cache-utils.git
cd prompt-cache-utils
npm install
npm run build
node dist/index.js

Switching Display Language

During npx prompt-cache-utils install, you can choose Chinese or English. To change it later, edit ~/.prompt-cache-utils/config.json:

{ "locale": "en" }

Or set the environment variable PROMPT_CACHE_UTILS_LOCALE=en. Use zh instead of en to switch to Chinese.

Requirements

Node.js >= 18
An MCP-compatible client (Claude Code, Cursor, Copilot CLI, etc.)

License

MIT
