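Image text recognition using a local MCP server and Apple's built-in OCR. A workaround for setups where DeepSeek is connected to Claude and image files can no longer be processed.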
Apple OCR MCP
MCP (Model Context Protocol) server that extracts text from images using Apple's native Vision framework. Works entirely offline on macOS — no API keys, no internet, no third-party services.
How It Works
MCP Client (Claude Code, etc.) → JSON-RPC → server.py → ocr binary → Apple Vision → text
- server.py: Python MCP server. Receives tool calls via JSON-RPC, validates inputs, and delegates to the OCR binary.
- ocr: Swift CLI compiled with swiftc. Wraps Apple's VNRecognizeTextRequest API.
No cloud services, no API keys. Just macOS system frameworks.
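The hand-off from server.py to the ocr binary can be pictured roughly as follows. This is a minimal sketch, not the project's actual code: the function name `run_ocr`, the argument order (image path, then language), and the assumption that the binary prints recognized text to stdout are all illustrative guesses.

```python
import subprocess
from pathlib import Path


def run_ocr(binary, image_path, lang="zh-Hans", timeout=30):
    """Run an OCR command-line tool and return the text it prints.

    Assumed (hypothetical) CLI contract: `<binary...> <image_path> <lang>`,
    recognized text on stdout, non-zero exit status on failure.
    `binary` is a list of argv tokens, e.g. ["/abs/path/to/ocr"].
    """
    path = Path(image_path)
    if not path.is_absolute():
        raise ValueError(f"path must be absolute: {image_path}")
    if not path.is_file():
        raise FileNotFoundError(image_path)

    # Pass arguments as a list (no shell) so odd file names are safe.
    result = subprocess.run(
        [*binary, str(path), lang],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip() or "ocr failed")
    return result.stdout.strip()
```

Running the binary as a separate process keeps the Python side free of native dependencies; the timeout guards against a hung child process.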
Compatibility
| Component | Requirement |
|-----------|-------------|
| macOS | 10.15 (Catalina) or later |
| Arch | Apple Silicon (arm64) native. Intel (x86_64): recompile, or use the pre-built binary from Releases |
| Python | 3.8+ (stdlib only, no pip deps) |
| Swift | 5.0+ (build only, not runtime) |
Quick Start
1. Clone & Install
git clone https://github.com/kains2866/apple-ocr-mcp.git
cd apple-ocr-mcp
./install.sh --download
This downloads the pre-built ocr binary for your Mac's architecture. No Xcode or compiler needed.
Prefer to build from source? Just run ./install.sh (requires Xcode Command Line Tools for swiftc).
2. Use It
The project includes a .mcp.json file. Claude Code auto-discovers MCP servers from the project root — just open this directory and start asking:
"Read the text from /path/to/image.jpg"
No manual config needed for project-local use.
3. (Optional) Global Install
To make apple-ocr available in all projects, not just this one:
./install.sh --global
Then add to ~/.claude/settings.local.json:
{
  "enabledMcpjsonServers": ["apple-ocr"]
}
Tool Schema
| Tool | read_image_text |
|------|-------------------|
| path (required) | Absolute path to image (.jpg, .png, .gif, .webp, .bmp, .tiff) |
| lang (optional) | Language code: zh-Hans (default), en, ja, ko, etc. |
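For reference, an MCP client invokes this tool with a `tools/call` JSON-RPC request. The sketch below builds such a request and applies the kind of input checks the schema implies. The helper names and the exact checks are assumptions for illustration; server.py may validate its inputs differently.

```python
import json
import os

# Extensions listed in the tool schema above.
SUPPORTED_EXTENSIONS = {".jpg", ".png", ".gif", ".webp", ".bmp", ".tiff"}


def validate_arguments(arguments):
    """Return (path, lang) or raise ValueError. Hypothetical helper."""
    path = arguments.get("path")
    if not isinstance(path, str) or not os.path.isabs(path):
        raise ValueError("path is required and must be absolute")
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported image type: {ext or '(none)'}")
    lang = arguments.get("lang", "zh-Hans")  # documented default
    return path, lang


def build_tool_call(request_id, path, lang=None):
    """Build an MCP tools/call request for read_image_text."""
    arguments = {"path": path}
    if lang is not None:
        arguments["lang"] = lang
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "read_image_text", "arguments": arguments},
    }


if __name__ == "__main__":
    # /tmp/receipt.png is a made-up example path.
    request = build_tool_call(1, "/tmp/receipt.png", lang="en")
    print(json.dumps(request))
    print(validate_arguments(request["params"]["arguments"]))
```

Rejecting relative paths and unknown extensions before spawning the binary gives the client a clear error instead of an empty OCR result.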
Configuration for Other MCP Clients
Claude Desktop
Add to ~/Library/Application Support/Claude/claude_desktop_config.json:
{
  "mcpServers": {
    "apple-ocr": {
      "command": "python3",
      "args": ["/path/to/apple-ocr-mcp/server.py"]
    }
  }
}
Continue (VS Code / JetBrains)
Add to ~/.continue/config.json:
{
  "experimental": {
    "mcpServers": {
      "apple-ocr": {
        "command": "python3",
        "args": ["/path/to/apple-ocr-mcp/server.py"]
      }
    }
  }
}
Cursor
Add to .cursor/mcp.json in your project:
{
  "mcpServers": {
    "apple-ocr": {
      "command": "python3",
      "args": ["/path/to/apple-ocr-mcp/server.py"]
    }
  }
}
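Each of these clients needs the real absolute path in place of /path/to/apple-ocr-mcp. A small script can print the Claude Desktop and Cursor style entry with the path filled in. This is a convenience sketch only: `mcp_entry` and the script name `print_config.py` are hypothetical and not part of the repo.

```python
import json
import sys
from pathlib import Path


def mcp_entry(repo_dir, name="apple-ocr"):
    """Return an mcpServers config dict pointing at repo_dir/server.py."""
    server = Path(repo_dir).expanduser().resolve() / "server.py"
    return {
        "mcpServers": {
            name: {"command": "python3", "args": [str(server)]}
        }
    }


if __name__ == "__main__":
    # Usage: python3 print_config.py /path/to/apple-ocr-mcp
    repo = sys.argv[1] if len(sys.argv) > 1 else "."
    print(json.dumps(mcp_entry(repo), indent=2))
```

For Continue, the same `mcpServers` object would go under the `experimental` key, as in the Continue example above.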
Project Structure
apple-ocr-mcp/
├── README.md
├── README_zh.md
├── LICENSE
├── .mcp.json # Claude Code auto-discovery config
├── install.sh # One-command setup (--download or compile)
├── build.sh # Build release binaries for both archs
├── server.py # MCP server (Python, stdlib only)
└── ocr.swift # OCR source (compile with swiftc)
Runtime (after ./install.sh):
├── ocr # Compiled binary (gitignored)
├── server.py
└── .mcp.json
Manual Build (without install.sh)
swiftc -o ocr ocr.swift
chmod +x ocr
Then use the .mcp.json already in the repo, or deploy globally:
mkdir -p ~/.claude/mcp-servers/apple-ocr
cp server.py ocr .mcp.json ~/.claude/mcp-servers/apple-ocr/
Creating a Release
For maintainers — build both architectures and publish:
./build.sh
This produces release/ocr-arm64 and release/ocr-x86_64. Create a GitHub Release, tag it (e.g. v1.0.0), and upload both binaries. Users can then run ./install.sh --download to get the right one automatically.
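Picking the right binary comes down to mapping the machine architecture to a release asset name. The sketch below shows one way to express that mapping in Python. install.sh may implement it differently, and `asset_name_for` is a hypothetical helper; the asset names are taken from the build output described above.

```python
import platform

# Asset names produced by build.sh, keyed by the platform.machine() value.
ASSETS = {"arm64": "ocr-arm64", "x86_64": "ocr-x86_64"}


def asset_name_for(machine=None):
    """Map a machine string (default: this host) to a release asset name."""
    machine = machine or platform.machine()
    try:
        return ASSETS[machine]
    except KeyError:
        # Fail loudly rather than downloading a binary that cannot run.
        raise ValueError(f"no pre-built binary for {machine!r}") from None
```

Calling `asset_name_for("arm64")` selects the Apple Silicon asset, and an unrecognized architecture raises `ValueError`, at which point building from source is the fallback.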
FAQ
Does this require an API key?
No. This project uses Apple Vision — a macOS system framework. It works entirely offline with zero API dependencies.
Will this work on Intel Macs?
Yes. Use ./install.sh --download for the x86_64 binary, or compile from source with swiftc.
What if OCR returns nothing?
Apple Vision needs reasonably clear text. Handwriting, stylized fonts, or low-contrast text may not be recognized. Try increasing image resolution or contrast.
Can I use this with other MCP clients?
Yes. This is a standard MCP server — any client supporting MCP tools can use it, provided the app has permission to execute the ocr binary. See configuration examples above.
Why Swift instead of a Python OCR library?
Apple Vision is typically faster and more accurate on CJK text than common Python OCR libraries, and it requires no pip dependencies. The Swift binary is ~70KB and calls the OS framework directly.
License
MIT — see LICENSE