image-tiler-mcp-server

MCP server that tiles large images for LLM vision analysis.

Preview of image tiling grid with advised vision models size and token estimates

The server generates an interactive HTML preview for every image, showing per-model tile grids and token estimates

Quick Start

Claude Code

claude mcp add image-tiler -- npx -y image-tiler-mcp-server

image-tiler is a local alias. You can name it anything you like. image-tiler-mcp-server is the npm package that gets downloaded and run.

See Claude Code MCP docs for more info.

Codex CLI

codex mcp add image-tiler -- npx -y image-tiler-mcp-server

Or add to ~/.codex/config.toml:

[mcp_servers.image-tiler]
command = "npx"
args = ["-y", "image-tiler-mcp-server"]

VS Code (Cline / Continue)

Add to your VS Code MCP settings:

{
  "image-tiler": {
    "command": "npx",
    "args": ["-y", "image-tiler-mcp-server"]
  }
}

Cursor

Add to ~/.cursor/mcp.json:

{
  "mcpServers": {
    "image-tiler": {
      "command": "npx",
      "args": ["-y", "image-tiler-mcp-server"]
    }
  }
}

Claude Desktop

Add to your Claude Desktop config file:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "image-tiler": {
      "command": "npx",
      "args": ["-y", "image-tiler-mcp-server"]
    }
  }
}

Restart Claude Desktop after editing.

Global Install (faster startup)

npm install -g image-tiler-mcp-server

Then use the simpler config in any client:

{
  "command": "image-tiler-mcp-server"
}

From Source

git clone https://github.com/keiver/image-tiler-mcp-server.git
cd image-tiler-mcp-server
npm install
npm run build

Then point your MCP config to the built file:

{
  "command": "node",
  "args": ["/absolute/path/to/image-tiler-mcp-server/dist/index.js"]
}

Usage

Tile an image

lets tile ~/Desktop/source.jpg

The server shows you a comparison of supported vision models with tile counts and token estimates. Pick the model that matches your use case, and the server tiles the image and returns them in batches for analysis.

Capture a web page

capture screenshot of https://example.com and analyze the content

The server launches Chrome, captures a full-page screenshot (scroll-stitching pages over 16,384px), then presents the same model comparison. Choose a model and the server tiles the capture for analysis.

To get only the screenshot without tiling, just ask for a screenshot and stop after the comparison step.

Customize tiling

| What | Example prompt | |------|---------------| | Target a specific model | "Tile hero.png for OpenAI" | | Keep full resolution | "Tile banner.png at full resolution, no downscaling" | | PNG output | "Tile diagram.png as lossless PNG" | | Tile from URL | "Download and tile https://example.com/chart.png" | | Tile from base64 | "Tile this base64 image: iVBORw0KGgo..." |

Supported Models

| Model | Default tile | Tokens/tile | Max tile | ID | |-------|-------------|-------------|----------|-----| | Claude | 1092px | 1590 | 1568px | claude | | OpenAI (GPT-4o/o-series) | 768px | 765 | 2048px | openai | | Gemini | 768px | 258 | 768px | gemini | | Gemini 3 | 1536px | 1120 | 3072px | gemini3 |

OpenAI note: The openai config targets the GPT-4o / o-series vision pipeline (512px tile patches). GPT-4.1 uses a fundamentally different pipeline (32x32 pixel patches) and is not currently supported. It would require a separate model config with a different calculation approach.

Gemini 3 note: Gemini 3 uses a fixed token budget per image (1,120 tokens at default resolution, regardless of dimensions). Tiling increases total token cost but preserves fine detail. For cases where detail isn't critical, consider sending a single image instead.

Why tile? What LLMs do to large images

The Problem

You screenshot a full page, paste it into Claude, and Claude rejects it. Your 20,000px full-page screenshot? Claude won't even look at it. Anything over 8,000px on either dimension gets refused outright.

GPT-4o is more forgiving but still destructive: it first scales your image to fit within 2,048px, then scales the shortest side down to 768px, then tiles internally. An 8,192px-wide NASA panorama becomes ~1,456 x 768 before GPT-4o's own tiling even begins.

Gemini 1.5/2.0 handles large images natively at 768px tiles without downscaling. Gemini 3, however, caps each image at a fixed token budget (1,120 tokens at default resolution) regardless of size. Tiling gives each piece its own budget.

Each tile stays within the model's sweet spot, so the LLM processes it at full resolution.

What Happens Without Tiling

Using assets/portrait.png (3,600 x 20,220, a full-page National Geographic capture) as an example:

| Model | What happens | Impact | |-------|-------------|--------| | Claude | Rejected, exceeds 8,000px dimension limit | Cannot analyze the image at all | | GPT-4o | Downscaled to ~365 x 2,048, then internally tiled | ~1% of original pixels survive the downscale | | Gemini 3 | Capped at 1,120 tokens per image (default) | Fixed token budget regardless of image size |

Gemini 1.5/2.0 tiles large images natively at 768px without downscaling. For Gemini 3, tiling multiplies the total token budget by sending each tile as a separate image.

With Tiling

| Model | Tiles | Result | |-------|-------|--------| | Claude | 76 tiles at 1,092px | Every tile under 8,000px and 1,568px limits, full analysis | | GPT-4o | 135 tiles at 768px | Every tile under 2,048px, no pre-downscale needed | | Gemini 3 | 135 tiles at 768px | Each tile gets its own token budget |

Using assets/landscape.png (8,192 x 4,320, NASA image gallery):

| Model | Without tiling | With tiling | |-------|----------------|-------------| | Claude | Rejected (8,192 > 8,000px limit) | 32 tiles at 1,092px, full analysis | | GPT-4o | Downscaled to ~1,456 x 768 (~3% of pixels survive) | 66 tiles at 768px, full resolution | | Gemini 3 | Capped at 1,120 tokens | 18 tiles at 1,536px, 18x token budget |

Based on published model vision documentation as of Feb 2026: Claude vision limits · OpenAI vision guide · Gemini image understanding · Gemini media resolution

How It Works

This MCP server:

Reads the image dimensions and the target model's vision config
Calculates an optimal grid that keeps every tile within the model's sweet spot
Extracts tiles as individual images (WebP default, PNG optional) and saves them to disk
Returns a metadata summary (grid layout, file paths, token cost, per-tile content hints)
Serves tiles on demand — call with tilesDir + start/end to retrieve batches of up to 5 tiles

Auto-downscaling: Images over 10,000px on their longest side are automatically downscaled before tiling (configurable via maxDimension). This keeps tile counts reasonable and improves LLM comprehension by increasing content density per tile. Set maxDimension=0 to disable, or pass a custom value (e.g., maxDimension=5000) for more aggressive downscaling.

Tool Reference

`tiler`

One unified tool that handles all image tiling operations. The mode is auto-detected from the parameters you provide:

tilesDir present → Tile retrieval mode (read-only pagination)
url or screenshotPath present → URL capture mode (screenshot + tile)
filePath, sourceUrl, dataUrl, or imageBase64 present → Tile-image mode

Mode priority: When multiple mode params are present, the tool resolves by priority: tilesDir > url/screenshotPath > filePath/sourceUrl/dataUrl/imageBase64. Avoid passing params from different modes in the same call.

Workflow:

The tool uses a two-step process to let you choose the right model before tiling:

Compare - Call with only the image source. Returns a comparison table showing tile counts and token estimates for each supported model, plus an interactive HTML preview.
Tile - Call again with the chosen preset + outputDir from step 1, plus:
- Image sources: re-include your original source param (filePath, sourceUrl, etc.)
- Captures: use screenshotPath from step 1 (not the original url)

Skip the comparison step: Provide preset and outputDir on the first call to tile immediately.

Interactive model picker: Clients that support MCP elicitation get a dropdown picker instead of the comparison table.

Parameters - Image Source (tile-image mode)

| Parameter | Type | Required | Default | Description | |---|---|---|---|---| | filePath | string | no* | - | Absolute or relative path to the image file | | sourceUrl | string | no* | - | HTTPS URL to download the image from (max 50MB, 30s timeout) | | dataUrl | string | no* | - | Data URL with base64-encoded image | | imageBase64 | string | no* | - | Raw base64-encoded image data |

*At least one image source is required for tile-image mode.

Parameters - URL Capture (capture mode)

| Parameter | Type | Required | Default | Description | |---|---|---|---|---| | url | string | no | - | URL of the web page to capture. Requires Chrome/Chromium installed (or CHROME_PATH env var). | | screenshotPath | string | no | - | Path to a previously captured screenshot. Skips URL capture when provided. | | viewportWidth | number | no | 1280 | Browser viewport width in pixels (320-3840) | | waitUntil | string | no | "load" | When to consider the page loaded: "load", "networkidle", or "domcontentloaded" | | delay | number | no | 0 | Additional delay in ms after page load (max 30000) |

Supports scroll-stitching for pages taller than 16,384px. Automatically triggers lazy-loaded images (loading="lazy") before capture by scrolling through the page. Pages without lazy images are unaffected.

Parameters - Tile Retrieval (pagination mode)

| Parameter | Type | Required | Default | Description | |---|---|---|---|---| | tilesDir | string | no | - | Path to tiles directory (returned by a previous tiling call as outputDir) | | start | number | no | 0 | Start tile index (0-based, inclusive) | | end | number | no | start + 4 | End tile index (0-based, inclusive). Max 5 tiles per batch. | | skipBlankTiles | boolean | no | true | Skip blank tiles and return a text annotation instead of an image. Set to false to include all tiles. |

Parameters - Tiling Config (shared across modes)

| Parameter | Type | Required | Default | Description | |---|---|---|---|---| | preset | string | no | Auto (cheapest) | Target vision preset: "claude", "openai", "gemini", "gemini3". Auto-selects the most token-efficient preset when omitted. | | tileSize | number | no | Model default | Tile size in pixels. Clamped to model's supported range with a warning if out of bounds. | | maxDimension | number | no | 10000 | Max dimension in px (0 to disable, or 256-65536). Values 1-255 are silently clamped to 256. Pre-downscales the image so its longest side fits within this value before tiling. | | outputDir | string | no | See below | Directory to save tiles. Defaults: for filePath sources, tiles/{name}_vN/ next to source (auto-incrementing: _v1, _v2, ..., _vN); for sourceUrl/dataUrl/imageBase64, {base}/tiles/tiled_{timestamp}_{hex}/; for captures, {base}/tiles/capture_{timestamp}_{hex}/. {base} is ~/Desktop, ~/Downloads, or ~ (first available). | | page | number | no | 0 | Tile page to return (0 = first 5, 1 = next 5, etc.) | | format | string | no | "webp" | Output format: "webp" (smaller, default) or "png" (lossless) | | includeMetadata | boolean | no | true | Analyze each tile using Shannon entropy and return content classification (blank, low-detail, mixed, high-detail) plus entropy and sharpness values per tile | | model | string | no | - | Deprecated. Use preset instead. Still accepted; emits a deprecation warning in the response. |

Behaviors

Source conflict: Multiple image source params → highest-precedence source is used with a warning (filePath > sourceUrl > dataUrl > imageBase64).
Re-entry: If outputDir already has a preview from the comparison step, the server skips straight to tiling.
Elicitation cancellation: Cancelling the model picker returns "Tiling cancelled by user." without tiling.
Versioned output: Repeated tiling of the same source creates _v1, _v2, ..., _vN directories to avoid overwriting.
Tile naming: tile_ROW_COL.{format} with zero-padded 3-digit indices (e.g., tile_000_003.webp), row-by-row, left-to-right.

Supported Formats

PNG, JPEG, WebP, TIFF, GIF

Troubleshooting

"Command not found" - Make sure Node.js 20+ is installed: node --version

"File not found" - Use absolute paths. Relative paths resolve from the MCP server's working directory.

"MCP tools not available" - Restart your MCP client after config changes. In Claude Code, run /mcp to check server status.

"Chrome not found" - Install Google Chrome or set the CHROME_PATH environment variable to the Chrome executable (must be an absolute path).

Running as root / in Docker - Set CHROME_NO_SANDBOX=1 to launch Chrome without sandbox (also enabled automatically when running as root).

Security

Local stdio server - runs with the same filesystem permissions as the MCP client that spawns it. No path sandboxing, no SSRF protection on URL downloads.

If deploying remotely: Add path validation, SSRF protection (block private/internal IP ranges), and authentication. This server is not designed for multi-tenant or network-exposed use.

Requirements

Node.js 20+
Compatible MCP client (Claude Code, Codex CLI, VS Code, Cursor, Claude Desktop)

Contributing

See CONTRIBUTING.md for how to report bugs, suggest changes, and submit PRs.

Acknowledgments

Built with the help of Claude Code as an AI assistant for code drafting, testing, and documentation.

License

MIT

MCP Servers

image-tiler-mcp-server

Quick Start

Claude Code

Usage

Tile an image

Capture a web page

Customize tiling

Supported Models

The Problem

What Happens Without Tiling

With Tiling

How It Works

`tiler`

Parameters - Image Source (tile-image mode)

Parameters - URL Capture (capture mode)

Parameters - Tile Retrieval (pagination mode)

Parameters - Tiling Config (shared across modes)

Behaviors

Supported Formats

Troubleshooting

Security

Requirements

Contributing

Acknowledgments

License

Links

安装包（如果需要）

Cursor 配置 (mcp.json)

image-tiler-mcp-server

Quick Start

Claude Code

Usage

Tile an image

Capture a web page

Customize tiling

Supported Models

The Problem

What Happens Without Tiling

With Tiling

How It Works

tiler

Parameters - Image Source (tile-image mode)

Parameters - URL Capture (capture mode)

Parameters - Tile Retrieval (pagination mode)

Parameters - Tiling Config (shared across modes)

Behaviors

Supported Formats

Troubleshooting

Security

Requirements

Contributing

Acknowledgments

License

Links

安装包 （如果需要）

Cursor 配置 (mcp.json)

`tiler`

安装包（如果需要）