Z.AI Image & Video Generation MCP Server

A Model Context Protocol (MCP) server that provides access to Z.AI's image and video generation models for LLM applications.

Features

Image Generation: GLM-Image and CogView-4 models for high-quality image generation
Video Generation: CogVideoX-3, Vidu Q1, and Vidu 2 models for AI video creation
Multiple Input Modes: Text-to-image/video, image-to-video, start-end frame animation
Asynchronous Processing: Submit long-running tasks and poll for results
Automatic Downloads: Generate and download in a single operation
Automatic Retries: Built-in retry logic with exponential backoff
Comprehensive Validation: Input validation with clear error messages
Type-Safe: Full TypeScript support with detailed type definitions

Installation

npm install GeorgH93/z_ai_image_gen_mcp

Configuration

Set your Z.AI API key as an environment variable:

export ZAI_API_KEY=your_api_key_here

Get your API key from the Z.AI API Keys page or sign up for the GLM Coding Plan.

Optional Configuration

| Environment Variable | Description | Default |
|---------------------|-------------|---------|
| ZAI_API_BASE_URL | API base URL | https://api.z.ai/api |
| ZAI_DEFAULT_MODEL | Default model | glm-image |
| ZAI_DEFAULT_SIZE | Default image size | 1280x1280 |
| ZAI_REQUEST_TIMEOUT | Request timeout (ms) | 60000 |
| ZAI_MAX_RETRIES | Max retry attempts | 3 |
| ZAI_RETRY_DELAY | Initial retry delay (ms) | 1000 |
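
As a rough illustration of how these variables and their defaults combine, the sketch below resolves a configuration object from an environment map. The `resolveConfig` function, the `ZaiConfig` type, and its field names are hypothetical and exist only for this example; the package's own `loadConfig` may parse or validate values differently.

```typescript
// Hypothetical config shape for this sketch, not the package's actual type.
interface ZaiConfig {
  apiKey: string;
  baseUrl: string;
  defaultModel: string;
  defaultSize: string;
  requestTimeoutMs: number;
  maxRetries: number;
  retryDelayMs: number;
}

type Env = Record<string, string | undefined>;

// Parse a non-negative integer variable, falling back to the documented default.
function intFromEnv(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < 0) {
    throw new Error(`${name} must be a non-negative integer, got "${raw}"`);
  }
  return value;
}

function resolveConfig(env: Env): ZaiConfig {
  const apiKey = env["ZAI_API_KEY"];
  if (!apiKey) throw new Error("ZAI_API_KEY is required");
  return {
    apiKey,
    baseUrl: env["ZAI_API_BASE_URL"] ?? "https://api.z.ai/api",
    defaultModel: env["ZAI_DEFAULT_MODEL"] ?? "glm-image",
    defaultSize: env["ZAI_DEFAULT_SIZE"] ?? "1280x1280",
    requestTimeoutMs: intFromEnv(env, "ZAI_REQUEST_TIMEOUT", 60000),
    maxRetries: intFromEnv(env, "ZAI_MAX_RETRIES", 3),
    retryDelayMs: intFromEnv(env, "ZAI_RETRY_DELAY", 1000),
  };
}

// In Node.js you would pass process.env instead of a literal object.
const config = resolveConfig({ ZAI_API_KEY: "your_api_key_here", ZAI_MAX_RETRIES: "5" });
console.log(config.maxRetries, config.requestTimeoutMs);
```

Taking the environment as a parameter keeps the function easy to test without touching the real process environment.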

Usage

With Claude Desktop

Add to your Claude Desktop configuration (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):

{
  "mcpServers": {
    "z-ai-image": {
      "command": "npx",
      "args": ["z-ai-image-mcp"],
      "env": {
        "ZAI_API_KEY": "your_api_key_here"
      }
    }
  }
}

With Other MCP Clients

Run the server directly:

npx z-ai-image-mcp

Or programmatically:

import { createServer, loadConfig } from 'z-ai-image-mcp';

const config = loadConfig();
const server = createServer(config);
// Connect to your transport...

With OpenCode

Add to your OpenCode configuration (opencode.json or opencode.jsonc in your project root):

{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "z-ai-image": {
      "type": "local",
      "command": ["npx", "z-ai-image-mcp"],
      "enabled": true,
      "environment": {
        "ZAI_API_KEY": "your_api_key_here"
      }
    }
  }
}

Or using an environment variable reference:

{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "z-ai-image": {
      "type": "local",
      "command": ["npx", "z-ai-image-mcp"],
      "enabled": true,
      "environment": {
        "ZAI_API_KEY": "{env:ZAI_API_KEY}"
      }
    }
  }
}

Using with OpenCode prompts:

Generate a professional logo for a tech startup. Use z-ai-image.

Or add to your AGENTS.md:

When generating images, use the `z-ai-image` MCP server tools.

Per-agent configuration (optional):

To enable the MCP server only for specific agents:

{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "z-ai-image": {
      "type": "local",
      "command": ["npx", "z-ai-image-mcp"],
      "enabled": true,
      "environment": {
        "ZAI_API_KEY": "{env:ZAI_API_KEY}"
      }
    }
  },
  "tools": {
    "z-ai-image*": false
  },
  "agent": {
    "design-agent": {
      "tools": {
        "z-ai-image*": true
      }
    }
  }
}

Available Tools

1. `list_models`

List all available image generation models and their capabilities.

Use this tool to discover available models, their features, and recommended settings.

2. `generate_image`

Generate an image synchronously from a text prompt.

Parameters:

prompt (required): Text description of the image (max 4000 characters)
model (optional): glm-image or cogview-4-250304 (default: glm-image)
size (optional): Image dimensions, e.g., 1280x1280 (default: 1280x1280)
quality (optional): hd or standard (default: hd for GLM-Image)
user_id (optional): End user ID for abuse prevention (6-128 characters)

Example:

Generate an image of a cute kitten sitting on a windowsill with a sunset background.
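
For clients that speak MCP directly rather than through a chat prompt, a tool invocation is a JSON-RPC `tools/call` request. The sketch below builds such a request for `generate_image` and checks the documented `prompt` and `user_id` length limits first. The helper name `buildGenerateImageCall` and the type names are illustrative assumptions, and the server performs its own validation regardless.

```typescript
interface GenerateImageArgs {
  prompt: string;
  model?: "glm-image" | "cogview-4-250304";
  size?: string;
  quality?: "hd" | "standard";
  user_id?: string;
}

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: GenerateImageArgs };
}

// Hypothetical helper: validates the documented limits, then wraps the
// arguments in a standard MCP tools/call envelope.
function buildGenerateImageCall(id: number, args: GenerateImageArgs): ToolCallRequest {
  // Assumption: "characters" is approximated by JavaScript string length.
  if (args.prompt.length === 0) throw new Error("prompt is required");
  if (args.prompt.length > 4000) throw new Error("prompt exceeds 4000 characters");
  if (args.user_id !== undefined) {
    const len = args.user_id.length;
    if (len < 6 || len > 128) throw new Error("user_id must be 6-128 characters");
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "generate_image", arguments: args },
  };
}

const request = buildGenerateImageCall(1, {
  prompt: "A cute kitten sitting on a windowsill with a sunset background",
  size: "1280x1280",
});
console.log(JSON.stringify(request, null, 2));
```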

3. `generate_image_async`

Start an asynchronous image generation task. Returns a task ID for polling.

Parameters:

prompt (required): Text description of the image
model (optional): Only glm-image supports async (default: glm-image)
size (optional): Image dimensions (default: 1280x1280)
quality (optional): Only hd supported for async (default: hd)
user_id (optional): End user ID for abuse prevention

Example:

Start async generation of a complex poster design.

4. `get_async_result`

Retrieve the result of an asynchronous image generation task.

Parameters:

task_id (required): The task ID from generate_image_async

Example:

Check the status of task ID "task-12345".

5. `download_image`

Download an image from a URL and return it as base64 or save to a file.

Parameters:

url (required): The URL of the image to download (e.g., from generate_image or get_async_result)
output (optional): base64 or file_output (default: base64)
file_output (optional): Absolute path to save the image file (required if output is file_output). Example: /path/to/image.png

Output Modes:

base64: Returns the image data directly as base64 (automatically switches to file output if the image is larger than 1MB)
file_output: Saves the image to disk at the specified path

Example:

Download the generated image and save it to /home/user/images/logo.png

Note: Z.AI image URLs expire after 30 days. Use this tool to download and store images permanently.
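
The automatic switch from base64 to file output can be pictured as a simple size check. The sketch below is an assumption about how such a check might look, not the server's code: it treats 1MB as 1024 × 1024 raw bytes, and `resolveOutputMode` and `base64Length` are hypothetical helper names. The second helper shows why large images are awkward inline: base64 text is about a third longer than the raw data.

```typescript
type OutputMode = "base64" | "file_output";

// Assumption: the 1MB threshold applies to raw bytes and means 1024 * 1024.
const BASE64_LIMIT_BYTES = 1024 * 1024;

// Length of the padded base64 text that encodes `byteLength` raw bytes.
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

// Fall back to file output when an inline base64 response would be too large.
function resolveOutputMode(requested: OutputMode, byteLength: number): OutputMode {
  if (requested === "base64" && byteLength > BASE64_LIMIT_BYTES) return "file_output";
  return requested;
}

console.log(resolveOutputMode("base64", 300000));  // stays "base64"
console.log(resolveOutputMode("base64", 2500000)); // becomes "file_output"
console.log(base64Length(1024 * 1024));            // 1398104 characters
```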

6. `generate_and_download_image` ⭐ Recommended

Generate an image and automatically download it in a single operation. This is the most convenient tool when you want the image data immediately.

Parameters:

prompt (required): Text description of the image (max 4000 characters)
model (optional): glm-image or cogview-4-250304 (default: glm-image)
size (optional): Image dimensions, e.g., 1280x1280 (default: 1280x1280)
quality (optional): hd or standard (default: hd for GLM-Image)
user_id (optional): End user ID for abuse prevention (6-128 characters)
output (optional): base64 or file_output (default: base64)
file_output (optional): Absolute path to save the image file (required if output is file_output)
poll_interval (optional): Seconds to wait between polling for async results (default: 3)
max_wait (optional): Maximum seconds to wait for generation (default: 120)

Output Modes:

base64: Returns the image data directly as base64 (automatically switches to file output if the image is larger than 1MB)
file_output: Saves the image to disk at the specified path

Examples:

# Generate and get as base64
Generate a logo for my company and show me the image.

# Generate and save to file
Generate a logo and save it to /home/user/images/logo.png

Behavior:

For GLM-Image: Uses async API with automatic polling until complete
For CogView-4: Uses synchronous API
Automatically downloads the result once generation completes
Returns image as base64 or saves to specified path
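
The GLM-Image path amounts to a submit-then-poll loop bounded by `poll_interval` and `max_wait`. The following is a minimal sketch of such a loop, not the server's actual implementation: the `TaskResult` type, its status values, and the `fetchResult` callback are hypothetical stand-ins for a call to `get_async_result`.

```typescript
// Hypothetical result shape; the real API's status values may differ.
type TaskResult =
  | { status: "processing" }
  | { status: "success"; url: string }
  | { status: "failed"; message: string };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll until the task succeeds, fails, or the time budget would be exceeded.
async function pollUntilDone(
  fetchResult: () => Promise<TaskResult>,
  pollIntervalSec = 3,
  maxWaitSec = 120,
): Promise<string> {
  const deadline = Date.now() + maxWaitSec * 1000;
  while (true) {
    const result = await fetchResult();
    if (result.status === "success") return result.url;
    if (result.status === "failed") {
      throw new Error(`Generation failed: ${result.message}`);
    }
    // Give up rather than sleep past the deadline.
    if (Date.now() + pollIntervalSec * 1000 > deadline) {
      throw new Error(`Timed out after ${maxWaitSec}s waiting for the task`);
    }
    await sleep(pollIntervalSec * 1000);
  }
}

// Usage with a fake task that finishes on the third poll.
let calls = 0;
const fakeFetch = async (): Promise<TaskResult> =>
  ++calls < 3
    ? { status: "processing" }
    : { status: "success", url: "https://example.com/image.png" };

pollUntilDone(fakeFetch, 0.01, 1).then((url) => console.log(url));
```

Checking the deadline before each sleep keeps the total wait within `max_wait` even when `poll_interval` is large.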

Video Generation Tools

7. `list_video_models`

List all available video generation models and their capabilities.

Use this tool to discover available video models, their features, and supported parameters.

8. `generate_video`

Generate a video asynchronously from text or images. Returns a task ID for polling.

Parameters:

model (required): Video generation model
- cogvideox-3: Z.AI flagship model (up to 4K, 5-10s, audio support)
- viduq1-text: Text-to-video, 1080P, 5s
- viduq1-image: Image-to-video, 1080P, 5s
- viduq1-start-end: Start-end frame, 1080P, 5s
- vidu2-image: Image-to-video, 720P, 4s (faster, cheaper)
- vidu2-start-end: Start-end frame, 720P, 4s
- vidu2-reference: Reference-based, 720P, 4s
prompt (optional): Text description (max 512 characters)
image_url (optional): Image URL(s) for image-to-video generation
quality (CogVideoX-3 only): quality or speed
size (optional): Video resolution
duration (optional): Video duration in seconds
fps (CogVideoX-3 only): 30 or 60
with_audio (optional): Generate AI sound effects
style (viduq1-text only): general or anime
aspect_ratio (Vidu Q1 and Vidu 2): 16:9, 9:16, or 1:1
movement_amplitude (Vidu models): auto, small, medium, or large
user_id (optional): End user ID for abuse prevention

Examples:

# Text-to-video
Generate a video of a cat playing with a ball.

# Image-to-video
Animate this image: [image_url]

# Start-end frame
Create a smooth transition from [first_frame] to [last_frame].

9. `get_video_result`

Retrieve the result of an asynchronous video generation task.

Parameters:

task_id (required): The task ID from generate_video

Note: Video generation typically takes 30 seconds to several minutes depending on duration and quality.

10. `generate_and_download_video` ⭐ Recommended

Generate a video and automatically download it. Polls for completion and saves the video file.

Parameters:

All parameters from generate_video plus:
file_output (optional): Absolute path to save the video file
poll_interval (optional): Seconds to wait between polling (default: 10)
max_wait (optional): Maximum seconds to wait (default: 300)

Example:

Generate a video of a sunset over the ocean and save it to /home/user/videos/sunset.mp4

Note: Videos are always saved to file (too large for base64). Video URLs expire after 1 day.

Models

GLM-Image

Z.AI's flagship image generation model with a hybrid autoregressive + diffusion architecture.

Best for: Complex compositions, text rendering, detailed illustrations, commercial posters
Quality options: hd (detailed, ~20s), standard (faster, ~5-10s)
Size range: 1024-2048px per dimension (divisible by 32)
Recommended sizes: 1280×1280, 1568×1056, 1056×1568, 1472×1088, 1088×1472, 1728×960, 960×1728
Async support: Yes

CogView-4-250304

General-purpose image generation with fast text understanding.

Best for: General image generation, quick iterations
Quality options: hd, standard
Size range: 512-2048px per dimension (divisible by 16)
Recommended sizes: 1024×1024, 768×1344, 864×1152, 1344×768, 1152×864, 1440×720, 720×1440
Async support: No
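
The size rules for both image models can be checked before a request is sent. The sketch below is one possible way to do that, with a hypothetical `validateSize` helper. Note that two of the recommended GLM-Image sizes (1728×960 and 960×1728) fall below the stated 1024px minimum, so the sketch accepts any recommended size first and applies the range and divisibility rules only to other sizes. Whether the server resolves that discrepancy the same way is an assumption.

```typescript
interface SizeRule {
  min: number;
  max: number;
  multiple: number;
  recommended: string[];
}

// Limits and recommended sizes copied from the model descriptions above.
const SIZE_RULES: Record<string, SizeRule> = {
  "glm-image": {
    min: 1024,
    max: 2048,
    multiple: 32,
    recommended: ["1280x1280", "1568x1056", "1056x1568", "1472x1088", "1088x1472", "1728x960", "960x1728"],
  },
  "cogview-4-250304": {
    min: 512,
    max: 2048,
    multiple: 16,
    recommended: ["1024x1024", "768x1344", "864x1152", "1344x768", "1152x864", "1440x720", "720x1440"],
  },
};

// Returns null when the size is acceptable, otherwise a human-readable reason.
function validateSize(model: string, size: string): string | null {
  const rule = SIZE_RULES[model];
  if (rule === undefined) return `Unknown model "${model}"`;
  if (rule.recommended.includes(size)) return null;
  const match = /^(\d+)x(\d+)$/.exec(size);
  if (match === null) return `Size must be WIDTHxHEIGHT, got "${size}"`;
  for (const px of [Number(match[1]), Number(match[2])]) {
    if (px < rule.min || px > rule.max) return `${px}px is outside ${rule.min}-${rule.max}px`;
    if (px % rule.multiple !== 0) return `${px}px is not divisible by ${rule.multiple}`;
  }
  return null;
}

console.log(validateSize("glm-image", "1280x1280"));        // null
console.log(validateSize("glm-image", "1000x1000"));        // 1000px is outside 1024-2048px
console.log(validateSize("cogview-4-250304", "1000x1000")); // 1000px is not divisible by 16
```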

Video Models

CogVideoX-3

Z.AI's flagship video generation model with improved frame stability and clarity.

Best for: Text-to-video, image-to-video, start-end frame animation
Resolution: Up to 4K (3840x2160)
Duration: 5 or 10 seconds
Features: Audio generation, 30/60 FPS, quality/speed modes
Price: $0.20/video

Vidu Q1

High-quality video generation with 1080P output.

| Model | Capability | Duration | Price |
|-------|------------|----------|-------|
| viduq1-text | Text-to-video | 5s | $0.40 |
| viduq1-image | Image-to-video | 5s | $0.40 |
| viduq1-start-end | Start-end frame | 5s | $0.40 |

Features: General/anime styles, motion amplitude control

Vidu 2

Fast and cost-effective video generation with 720P output.

| Model | Capability | Duration | Price |
|-------|------------|----------|-------|
| vidu2-image | Image-to-video | 4s | $0.20 |
| vidu2-start-end | Start-end frame | 4s | $0.20 |
| vidu2-reference | Reference-based | 4s | $0.40 |

Features: Audio generation, motion amplitude control, multi-image reference
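
Because every video model is priced per video, the cost of a batch is a simple sum. The sketch below uses the prices listed in this document (which may change) and a hypothetical `estimateCostUsd` helper; amounts are kept in cents to avoid floating-point rounding.

```typescript
// Per-video prices in US cents, copied from the model descriptions above.
const PRICE_CENTS: Record<string, number> = {
  "cogvideox-3": 20,
  "viduq1-text": 40,
  "viduq1-image": 40,
  "viduq1-start-end": 40,
  "vidu2-image": 20,
  "vidu2-start-end": 20,
  "vidu2-reference": 40,
};

interface VideoJob {
  model: string;
  count: number;
}

// Sum the listed per-video price over all jobs and format as dollars.
function estimateCostUsd(jobs: VideoJob[]): string {
  let cents = 0;
  for (const { model, count } of jobs) {
    const price = PRICE_CENTS[model];
    if (price === undefined) throw new Error(`No price listed for "${model}"`);
    cents += price * count;
  }
  return (cents / 100).toFixed(2);
}

// Three CogVideoX-3 videos plus two Vidu Q1 text-to-video clips:
// 3 * $0.20 + 2 * $0.40 = $1.40
console.log(estimateCostUsd([
  { model: "cogvideox-3", count: 3 },
  { model: "viduq1-text", count: 2 },
]));
```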

Error Handling

The server handles various error scenarios:

| Error Type | Description |
|-----------|-------------|
| AUTH_ERROR | Invalid or missing API key |
| RATE_LIMIT | Too many requests - will auto-retry |
| VALIDATION_ERROR | Invalid parameters |
| SERVER_ERROR | Z.AI server issues - will auto-retry |
| NETWORK_ERROR | Connection issues - will auto-retry |
| TIMEOUT_ERROR | Request timeout - will auto-retry |
| CONTENT_FILTER | Prompt blocked by content policy |
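
To make the retry behavior concrete, here is a minimal sketch of exponential backoff applied only to the error types marked as auto-retried. It is not the server's code: the `ZaiError` class and both helper names are hypothetical, and the doubling factor is an assumption. The defaults mirror `ZAI_MAX_RETRIES` (3) and `ZAI_RETRY_DELAY` (1000 ms).

```typescript
type ErrorType =
  | "AUTH_ERROR"
  | "RATE_LIMIT"
  | "VALIDATION_ERROR"
  | "SERVER_ERROR"
  | "NETWORK_ERROR"
  | "TIMEOUT_ERROR"
  | "CONTENT_FILTER";

// Hypothetical error class for this sketch.
class ZaiError extends Error {
  readonly type: ErrorType;
  constructor(type: ErrorType, message: string) {
    super(message);
    this.type = type;
  }
}

// Error types the table marks as "will auto-retry".
const RETRYABLE = new Set<ErrorType>([
  "RATE_LIMIT",
  "SERVER_ERROR",
  "NETWORK_ERROR",
  "TIMEOUT_ERROR",
]);

// Delay before retry number `attempt` (0-based), doubling each time.
function backoffDelayMs(attempt: number, initialDelayMs = 1000): number {
  return initialDelayMs * 2 ** attempt;
}

// Run `operation`, retrying retryable failures up to `maxRetries` times.
async function withRetries<T>(
  operation: () => Promise<T>,
  maxRetries = 3,
  initialDelayMs = 1000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      const retryable = error instanceof ZaiError && RETRYABLE.has(error.type);
      if (!retryable || attempt >= maxRetries) throw error;
      const delay = backoffDelayMs(attempt, initialDelayMs);
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
}

console.log([0, 1, 2].map((n) => backoffDelayMs(n))); // [ 1000, 2000, 4000 ]
```

Errors such as AUTH_ERROR, VALIDATION_ERROR, and CONTENT_FILTER are rethrown immediately in this sketch, since repeating the same request would not change the outcome.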

Development

Setup

git clone <repo-url>
cd z-ai-image-mcp
npm install
cp .env.example .env
# Edit .env with your API key

Scripts

npm run build        # Build TypeScript
npm run dev          # Run in development mode
npm test             # Run all tests
npm run test:unit    # Run unit tests only
npm run test:integration  # Run integration tests
npm run test:e2e     # Run E2E tests
npm run test:coverage    # Run tests with coverage
npm run typecheck    # Type check without emit

License

MIT
