macpdf-ocr-mcp

macpdf-ocr-mcp is a macOS-native, PDFKit-first stdio MCP server with Apple Vision OCR fallback for scanned or image-heavy PDF pages.

It is designed for local LLM workflows: extract useful text first, add compact images only when they help, and keep large documents batched.

The project is tested with Codex, but it is not Codex-specific. It should work with MCP clients that support local stdio servers, including Claude Desktop, Claude Code, and Gemini CLI. Client-specific setup is intentionally left to each client or AI assistant.

📊 Light Benchmark

Light local A/B test using Codex CLI with gpt-5.5: across three PDF types, MCP preprocessing cut output and reasoning tokens by roughly half or more in every group and reduced runtime by about one-third to just over one-half. Input tokens fell by over half on the image-heavy PDF and by about a quarter on the text-heavy PDF; mixed-PDF input was approximately unchanged.

| PDF type | Input tokens | Output tokens | Reasoning tokens | Runtime |
|---|---:|---:|---:|---:|
| Image-heavy | -55% | -73% | -92% | -34% |
| Mixed | ~ | -58% | -87% | -53% |
| Text-heavy | -24% | -47% | -63% | -56% |

~ means approximately unchanged. Results depend on PDF structure, prompt shape, and whether image previews are included in tool output.

✅ Requirements

- macOS 13 or newer
- Xcode or Apple Command Line Tools with Swift 6.3 or newer
- An MCP client that supports local stdio servers

The implementation uses Apple-native frameworks only: PDFKit, Vision, CoreGraphics, and AppKit.

🛠️ Build

From the project root:

swift build -c release

The release executable is created at:

.build/release/macpdf-ocr-mcp

🚀 Installation

Codex

Codex is the currently tested client.

codex mcp add macpdf-ocr-mcp -- "$(pwd)/.build/release/macpdf-ocr-mcp" mcp-stdio

Restart Codex or open a new Codex session after registration so the MCP server list is reloaded.

Others

Other MCP clients can use the same executable with stdio transport. Point the client at:

<project-root>/.build/release/macpdf-ocr-mcp mcp-stdio

This should work with clients such as Claude Desktop, Claude Code, and Gemini CLI when they are configured for local stdio MCP servers.

⚙️ Runtime Behavior

- PDFKit is preferred when a PDF has a usable text layer.
- Vision OCR is used for scanned pages or when OCR is explicitly requested.
- hybrid mode uses OCR only when it materially improves missing text extraction.
- Large reads should use batching instead of returning a full document in one response.
- Region boxes use normalized top-left coordinates: [left, top, width, height].
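
The normalized region convention can be illustrated with a small Python sketch. It is not taken from the server's source: the helper names `bbox_norm_to_pixels` and `to_bottom_left_origin` are hypothetical, and the flip into a bottom-left-origin space (the native convention of PDF page coordinates) is shown only as an assumption about what a consumer of these boxes may need to do.

```python
from typing import Sequence, Tuple


def bbox_norm_to_pixels(
    bbox_norm: Sequence[float], image_width: int, image_height: int
) -> Tuple[int, int, int, int]:
    """Convert [left, top, width, height] in 0..1 to a pixel rect (x, y, w, h)."""
    left, top, width, height = bbox_norm
    if min(left, top) < 0 or width <= 0 or height <= 0:
        raise ValueError("bbox_norm needs a non-negative origin and a positive size")
    if left + width > 1 + 1e-9 or top + height > 1 + 1e-9:
        raise ValueError("bbox_norm must stay inside the unit square")
    # Top-left origin: y grows downward, as in most image APIs.
    return (
        round(left * image_width),
        round(top * image_height),
        round(width * image_width),
        round(height * image_height),
    )


def to_bottom_left_origin(bbox_norm: Sequence[float]) -> Tuple[float, float, float, float]:
    """Flip the box into a bottom-left-origin space (hypothetical helper)."""
    left, top, width, height = bbox_norm
    # Only the vertical offset changes; width and height are unaffected.
    return (left, 1.0 - top - height, width, height)


# Example: the box used throughout this README on a 1000x800 preview image.
print(bbox_norm_to_pixels([0.10, 0.20, 0.70, 0.25], 1000, 800))
```

Keeping boxes normalized means the same `bbox_norm` can be reused across previews rendered at different sizes.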

🔧 MCP Tools

pdf_read
- First-pass PDF reading. It returns page-grouped text, optional preview image paths, coarse regions, and continuation metadata for batched reads.
- Arguments and MCP examples
  Required:
  - file: PDF path
  - pages: all, a single page such as 12, or a range such as 12-20
  Optional:
  - mode
    - balanced (default): PDFKit/OCR page text + compressed page image (scale=5, 1200px)
    - text_only: PDFKit/OCR page text + text regions
    - image_only: compressed page image only (scale=5, 1200px)
    - text_focus: PDFKit/OCR page text + smaller page image (scale=5, 1080px)
    - image_focus: PDFKit/OCR page text + larger page image (scale=5, 1380px)
  - engine
    - auto (default): PDFKit if text exists, otherwise Vision OCR
    - pdfkit: PDFKit text extraction only
    - ocr: Vision OCR only
    - hybrid: like auto; reserved for stricter PDFKit+OCR merging
  - batch_size: 4 by default
  - image_scale: 5 by default; 1-2=512px, 3-4=768px, 5-6=1200px, 7-8=1600px, 9-10=2200px
```
{
  "file": "/path/to/document.pdf",
  "pages": "1-3"
}

{
  "file": "/path/to/document.pdf",
  "pages": "1-10",
  "mode": "balanced",
  "engine": "auto",
  "batch_size": 5,
  "image_scale": 6
}
```
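
To make the batching idea concrete, here is a hedged Python sketch of how a client could split a `pages` value into requests of at most `batch_size` pages. It is a client-side approximation only: the server returns its own continuation metadata, whose format is not described here, and `parse_pages`, `plan_batches`, and the `page_count` parameter are hypothetical names introduced for illustration.

```python
from typing import List


def parse_pages(pages: str, page_count: int) -> List[int]:
    """Expand 'all', '12', or '12-20' into a list of 1-based page numbers."""
    if pages == "all":
        first, last = 1, page_count
    elif "-" in pages:
        start, end = pages.split("-", 1)
        first, last = int(start), int(end)
    else:
        first = last = int(pages)
    if not 1 <= first <= last <= page_count:
        raise ValueError(f"pages {pages!r} is outside 1-{page_count}")
    return list(range(first, last + 1))


def plan_batches(pages: str, page_count: int, batch_size: int = 4) -> List[str]:
    """Return one `pages` value per batch, each covering at most batch_size pages."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    numbers = parse_pages(pages, page_count)
    specs = []
    for i in range(0, len(numbers), batch_size):
        chunk = numbers[i:i + batch_size]
        # A single leftover page is written as '9', not '9-9'.
        specs.append(str(chunk[0]) if len(chunk) == 1 else f"{chunk[0]}-{chunk[-1]}")
    return specs


# Example: ten pages with the default batch size of 4.
print(plan_batches("1-10", page_count=50))
```

Each returned string can be used as the `pages` argument of a separate `pdf_read` call.
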
pdf_focus
- Second-pass detail reading for one normalized page region. It returns local text, an optional cropped image path, and local region hints.
- Arguments and MCP examples
  Required:
  - file: PDF path
  - page: 1-based page number
  - bbox_norm: normalized top-left-origin box [left, top, width, height]
  Optional:
  - mode
    - balanced (default): PDFKit/OCR region text + compressed region image (scale=7, 1600px)
    - text_only: PDFKit/OCR region text only
    - image_only: compressed region image only (scale=7, 1600px)
    - text_focus: PDFKit/OCR region text + compressed region image (scale=7, 1600px)
    - image_focus: PDFKit/OCR region text + larger region image (scale=7, 2000px)
  - engine
    - auto (default): PDFKit if region text exists, otherwise Vision OCR
    - pdfkit: PDFKit text extraction only
    - ocr: Vision OCR only
    - hybrid: like auto; reserved for stricter PDFKit+OCR merging
  - image_scale: 7 by default; 1-6=1200px, 7-8=1600px, 9-10=2200px
```
{
  "file": "/path/to/document.pdf",
  "page": 4,
  "bbox_norm": [0.10, 0.20, 0.70, 0.25]
}

{
  "file": "/path/to/document.pdf",
  "page": 4,
  "bbox_norm": [0.10, 0.20, 0.70, 0.25],
  "mode": "balanced",
  "engine": "auto",
  "image_scale": 7
}
```
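
The `image_scale` tiers listed for `pdf_read` and `pdf_focus` differ at the low end, which a small lookup makes easier to see. The Python sketch below only restates the documented tiers; the names `PDF_READ_SCALE_PX`, `PDF_FOCUS_SCALE_PX`, and `scale_to_px` are hypothetical, and it does not model the per-mode sizes (for example 1080px for `text_focus`) or which image dimension the pixel value applies to, since neither is specified here.

```python
from typing import List, Tuple

# (highest image_scale in the tier, documented pixel size)
PDF_READ_SCALE_PX: List[Tuple[int, int]] = [
    (2, 512), (4, 768), (6, 1200), (8, 1600), (10, 2200),
]
PDF_FOCUS_SCALE_PX: List[Tuple[int, int]] = [
    (6, 1200), (8, 1600), (10, 2200),
]


def scale_to_px(image_scale: int, tiers: List[Tuple[int, int]]) -> int:
    """Look up the documented pixel size for an image_scale between 1 and 10."""
    if not 1 <= image_scale <= 10:
        raise ValueError("image_scale must be between 1 and 10")
    for upper, px in tiers:
        if image_scale <= upper:
            return px
    raise AssertionError("unreachable: the last tier covers image_scale 10")


# Example: the same low scale maps to different sizes for the two tools.
print(scale_to_px(3, PDF_READ_SCALE_PX), scale_to_px(3, PDF_FOCUS_SCALE_PX))
```
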
save_region
- Saves a selected region from a PDF page or image to a local file.
- Arguments and MCP examples
  Required:
  - source_type: pdf or image
  - source_path: source PDF or image path
  - output_path: destination image path
  - bbox_norm: normalized top-left-origin box [left, top, width, height]
  Additional PDF arguments:
  - page: required when source_type=pdf
  - short_side_px: 1600px by default
```
{
  "source_type": "pdf",
  "source_path": "/path/to/document.pdf",
  "page": 4,
  "short_side_px": 1600,
  "output_path": "/tmp/region.png",
  "bbox_norm": [0.10, 0.20, 0.70, 0.25]
}

{
  "source_type": "image",
  "source_path": "/path/to/image.png",
  "output_path": "/tmp/region.png",
  "bbox_norm": [0.10, 0.20, 0.70, 0.25]
}
```
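
Because `page` is required only when `source_type` is `pdf`, a client may want to validate arguments before calling `save_region`. The following Python sketch shows one way to do that; `build_save_region_args` is a hypothetical helper, and rejecting `page` or `short_side_px` for image sources is an assumption of this sketch (the argument list above only says they are PDF arguments), not documented server behavior.

```python
from typing import Any, Dict, Optional, Sequence


def build_save_region_args(
    source_type: str,
    source_path: str,
    output_path: str,
    bbox_norm: Sequence[float],
    page: Optional[int] = None,
    short_side_px: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble a save_region argument object, checking the documented rules."""
    if source_type not in ("pdf", "image"):
        raise ValueError("source_type must be 'pdf' or 'image'")
    if len(bbox_norm) != 4:
        raise ValueError("bbox_norm must be [left, top, width, height]")
    args: Dict[str, Any] = {
        "source_type": source_type,
        "source_path": source_path,
        "output_path": output_path,
        "bbox_norm": list(bbox_norm),
    }
    if source_type == "pdf":
        if page is None or page < 1:
            raise ValueError("page (1-based) is required when source_type is 'pdf'")
        args["page"] = page
        if short_side_px is not None:
            args["short_side_px"] = short_side_px  # server default is 1600
    elif page is not None or short_side_px is not None:
        # Assumption of this sketch: PDF-only arguments are rejected for images.
        raise ValueError("page and short_side_px apply only to PDF sources")
    return args


# Example: the PDF call from the JSON example above.
print(build_save_region_args(
    "pdf", "/path/to/document.pdf", "/tmp/region.png",
    [0.10, 0.20, 0.70, 0.25], page=4, short_side_px=1600,
))
```
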
ocr_detect_regions
- Runs Vision OCR on an image and returns OCR lines, normalized boxes, and grouped candidate regions.
- Arguments and MCP examples
  Required:
  - image: image path
  Optional:
  - bbox_norm: OCR only this normalized image region
```
{
  "image": "/path/to/image.png"
}

{
  "image": "/path/to/image.png",
  "bbox_norm": [0.10, 0.20, 0.70, 0.25]
}
```

Generated preview and focus images are written under .tmp/runtime/. That directory is a local runtime artifact and should not be committed.

Local CLI debugging

The executable can also be called directly for local checks. MCP clients normally call tools through the protocol, not through these shell commands.

.build/release/macpdf-ocr-mcp pdf-read /path/to/document.pdf 1-3
.build/release/macpdf-ocr-mcp pdf-focus /path/to/document.pdf 4 0.10 0.20 0.70 0.25 balanced auto 7
.build/release/macpdf-ocr-mcp save-region pdf /path/to/document.pdf 4 1600 /tmp/region.png 0.10 0.20 0.70 0.25
.build/release/macpdf-ocr-mcp ocr-detect-regions /path/to/image.png
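
The shell commands above bypass the protocol. For comparison, the Python sketch below builds the kind of JSON-RPC 2.0 `tools/call` message an MCP client would send over stdio for `pdf_read`. It only constructs the message and does not launch the server; a real session also needs the MCP `initialize` handshake first, which is omitted. The helper name `tools_call_request` is hypothetical.

```python
import json
from typing import Any, Dict


def tools_call_request(request_id: int, tool_name: str, arguments: Dict[str, Any]) -> str:
    """Serialize one MCP tools/call request as a single line of JSON."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # The stdio transport is newline-delimited, so the payload must not
    # contain embedded newlines; compact separators keep it on one line.
    return json.dumps(message, separators=(",", ":"))


# Example: the minimal pdf_read call from the MCP Tools section.
line = tools_call_request(1, "pdf_read", {"file": "/path/to/document.pdf", "pages": "1-3"})
print(line)
```

A client would write this line followed by a newline to the server's stdin and read the response from its stdout.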

📦 Distribution

The simplest distribution path is source-first:

git clone <repo-url>
cd macpdf-ocr-mcp
swift build -c release
codex mcp add macpdf-ocr-mcp -- "$(pwd)/.build/release/macpdf-ocr-mcp" mcp-stdio

Prebuilt GitHub Release binaries can be added later once the MCP interface is stable.

🔗 Acknowledgements & Resources

This project is built with Apple-native frameworks and integrates with MCP-compatible clients.
