Open Google Image Generator MCP

This project is a Model Context Protocol (MCP) server that exposes Google Cloud Vertex AI and Google GenAI SDK capabilities (Imagen, Gemini, Veo, Lyria, and Chirp models) to MCP-compatible clients. It is built with the FastMCP framework.

Current version: 3.0.0 — Full GenAI SDK integration (embed, speech, video analysis, live generation), WebP/AVIF format support, multi-tier model selection, parallel batch generation, sequential pipeline engine, and comprehensive video tools.

Features & Tools

Image Tools

| Tool | Description | Backend |
|---|---|---|
| tool_generate_image | Text-to-image generation. Supports aspect ratio, negative prompt, seed, watermark, GCS output, and WebP/AVIF output | Imagen 4 (imagen-4.0-fast-generate-001) |
| tool_edit_image | Mask-based inpaint/outpaint, background swap, product image, and prompt-driven edit. See Edit modes below | Imagen 3 Capability (imagen-3.0-capability-001) |
| tool_transform_image | Free-form image + text → image transformation: style transfer, scene rewriting, multi-reference composition | Gemini multimodal (gemini-2.5-flash-image) |
| tool_analyze_image | Multimodal image understanding and Q&A. Supports thinking_level (MINIMAL/LOW/MEDIUM/HIGH) and media_resolution (LOW/MEDIUM/HIGH/ULTRA_HIGH) | Gemini Vision (gemini-2.5-flash) |
| tool_upscale_image | Upscale low-resolution images | Imagen |
| tool_remove_background | Remove background via EDIT_MODE_BGSWAP | Imagen |
| tool_batch_generate | Parallel batch text-to-image generation (up to 10 prompts, max 4 concurrent). balanced tier not supported for batch | Imagen |
| tool_run_pipeline | Sequential multi-step image processing pipeline (generate → edit → transform → …) | Mixed |
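
The limits stated for tool_batch_generate (up to 10 prompts, at most 4 in flight) can be pictured as a semaphore-bounded fan-out. This is a minimal sketch under that assumption, not the server's actual code: `fake_generate`, `run_batch`, and both constants are hypothetical names, and the short sleep stands in for a real Imagen request.

```python
import asyncio

MAX_PROMPTS = 10     # documented batch size limit
MAX_CONCURRENT = 4   # documented concurrency limit


async def fake_generate(prompt: str) -> str:
    """Hypothetical stand-in for one text-to-image request."""
    await asyncio.sleep(0.01)
    return f"image for: {prompt}"


async def run_batch(prompts, generate=fake_generate):
    """Run every prompt while keeping at most MAX_CONCURRENT requests in flight."""
    if not 1 <= len(prompts) <= MAX_PROMPTS:
        raise ValueError(f"expected 1-{MAX_PROMPTS} prompts, got {len(prompts)}")
    gate = asyncio.Semaphore(MAX_CONCURRENT)

    async def guarded(prompt):
        async with gate:  # blocks while MAX_CONCURRENT tasks hold the gate
            return await generate(prompt)

    # gather() returns results in the same order as the input prompts.
    return await asyncio.gather(*(guarded(p) for p in prompts))


if __name__ == "__main__":
    results = asyncio.run(run_batch([f"product shot {i}" for i in range(8)]))
    print(len(results), "images generated")
```

A semaphore keeps the code short and preserves result order; a worker pool would bound concurrency equally well.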

Video Tools

| Tool | Description | Backend |
|---|---|---|
| tool_generate_video | Text-to-video generation. Supports audio_enabled for Veo 3+ | Veo (veo-3.1-fast-generate-001) |
| tool_image_to_video | Animate a still image into video. Supports optional last_frame_path for first+last frame mode | Veo |
| tool_extend_video | Extend an existing video clip by 4, 6, or 8 seconds | Veo |
| tool_video_object_edit | Insert or remove an object in a video via operation (insert/remove) and prompt | Veo |
| tool_analyze_video | Video understanding and Q&A (max 20MB; mp4, mov, avi, webm, mkv) | Gemini GenAI SDK |
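
The input limits in the table (20MB and five container formats for tool_analyze_video; 4, 6, or 8 seconds for tool_extend_video) lend themselves to a client-side pre-check. The sketch below is illustrative only: the function names are hypothetical, and reading "20MB" as 20 MiB is an assumption.

```python
from pathlib import Path

MAX_ANALYZE_BYTES = 20 * 1024 * 1024  # assumption: "20MB" means 20 MiB
ANALYZE_EXTENSIONS = {".mp4", ".mov", ".avi", ".webm", ".mkv"}
EXTEND_SECONDS = {4, 6, 8}


def check_analyze_input(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the file falls outside the documented analysis limits."""
    ext = Path(filename).suffix.lower()
    if ext not in ANALYZE_EXTENSIONS:
        raise ValueError(f"unsupported container {ext!r}")
    if size_bytes > MAX_ANALYZE_BYTES:
        raise ValueError(f"{size_bytes} bytes exceeds the 20MB limit")


def check_extend_seconds(seconds: int) -> None:
    """Raise ValueError unless the extension length is 4, 6, or 8 seconds."""
    if seconds not in EXTEND_SECONDS:
        raise ValueError(f"extension must be one of {sorted(EXTEND_SECONDS)}")
```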

Audio Tools

| Tool | Description | Backend |
|---|---|---|
| tool_generate_speech | Text-to-speech with voice selection (Aoede, Charon, Fenrir, Kore, Puck). Supports model_tier (fast/quality). Outputs WAV | Gemini TTS (gemini-2.5-flash-preview-tts / gemini-2.5-pro-preview-tts) |
| tool_generate_music | Music generation from a text prompt | Lyria 2 / Lyria 3 (GenAI SDK) |

GenAI SDK Tools

| Tool | Description | Backend |
|---|---|---|
| tool_embed | Text embeddings as float vectors | Gemini Embedding (text-embedding-004 on Vertex AI, gemini-embedding-2 on Gemini API) |
| tool_live_generate | Streaming text generation; the response is accumulated and returned in full | Gemini Live (gemini-3.5-flash / gemini-2.5-pro) |

Utility Tools

| Tool | Description |
|---|---|
| tool_list_available_models | Live-probes every candidate model in the configured project/location and returns only those that respond (200/400 = reachable, 404 = excluded). Cached for the server process lifetime; pass force_refresh=true to rescan. Also reports available update versions. |
| tool_upload_file | Register a local file for use as a reference image in subsequent tool calls (e.g. tool_transform_image). Returns a file_uri. |
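
The probe-and-cache behaviour described for tool_list_available_models can be sketched as follows. This is an illustration, not the server's implementation: `ModelProbeCache` and the `probe` callable are hypothetical, and treating every status other than 200 and 400 as unreachable is an assumption (the table only names 404).

```python
REACHABLE = {200, 400}  # documented: both statuses mean the model endpoint exists


class ModelProbeCache:
    """Probe candidates once, then serve the cached list until a forced refresh."""

    def __init__(self, probe):
        self._probe = probe      # callable: model name -> HTTP status code
        self._available = None   # filled on first use, kept for the process lifetime

    def list_available(self, candidates, force_refresh=False):
        if self._available is None or force_refresh:
            self._available = [m for m in candidates if self._probe(m) in REACHABLE]
        return list(self._available)  # copy so callers cannot mutate the cache


if __name__ == "__main__":
    statuses = {"gemini-2.5-flash": 200, "gemini-9.9-nonexistent": 404}
    cache = ModelProbeCache(statuses.get)
    print(cache.list_available(list(statuses)))
```

A 400 counts as reachable because the endpoint answered and only rejected the probe's parameters.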

Edit modes (`tool_edit_image`)

| edit_mode | What it does | Mask required? |
|---|---|---|
| EDIT_MODE_DEFAULT (default) | Prompt-driven full-image edit, no mask | No |
| EDIT_MODE_INPAINT_INSERTION | Add an object into the masked region | Yes |
| EDIT_MODE_INPAINT_REMOVAL | Remove content in the masked region | Yes |
| EDIT_MODE_OUTPAINT | Extend the image beyond its original bounds | Yes |
| EDIT_MODE_BGSWAP | Swap the background | No |
| EDIT_MODE_PRODUCT_IMAGE | Product reference styling | No |

Use imagen-3.0-capability-001 (default) for all of the above. The legacy imagen-3.0-generate-002 only supports EDIT_MODE_DEFAULT and does not accept a mask.
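
The mask rules above can be expressed as a small pre-flight check. This is a sketch built only from the edit-mode table and the note on the legacy model; `validate_edit_request` and its parameter names are hypothetical, not the tool's actual signature.

```python
MASK_REQUIRED = {
    "EDIT_MODE_DEFAULT": False,
    "EDIT_MODE_INPAINT_INSERTION": True,
    "EDIT_MODE_INPAINT_REMOVAL": True,
    "EDIT_MODE_OUTPAINT": True,
    "EDIT_MODE_BGSWAP": False,
    "EDIT_MODE_PRODUCT_IMAGE": False,
}
LEGACY_MODEL = "imagen-3.0-generate-002"


def validate_edit_request(edit_mode="EDIT_MODE_DEFAULT", mask_path=None,
                          model="imagen-3.0-capability-001"):
    """Raise ValueError when the mode, mask, and model combination is not documented as valid."""
    if edit_mode not in MASK_REQUIRED:
        raise ValueError(f"unknown edit_mode {edit_mode!r}")
    if model == LEGACY_MODEL and (edit_mode != "EDIT_MODE_DEFAULT" or mask_path):
        raise ValueError(f"{LEGACY_MODEL} only supports EDIT_MODE_DEFAULT without a mask")
    if MASK_REQUIRED[edit_mode] and not mask_path:
        raise ValueError(f"{edit_mode} requires a mask")
```

Running such a check before any request corresponds to the VALIDATION row of the error table, where no HTTP call is made.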

When to use which "image + text → image" tool

| Need | Use |
|---|---|
| Mask-based inpaint/outpaint/BG-swap with pixel precision | tool_edit_image (Imagen Capability) |
| "Make it look like X" / style transfer / scene rewriting / multi-reference compositions | tool_transform_image (Gemini multimodal) |

Model tiers

Most tools accept a model_tier parameter:

| Tier | Description |
|---|---|
| fast (default) | Lowest latency, lowest cost |
| balanced | Quality / speed trade-off; routes to Gemini for image generation. Not supported for tool_batch_generate |
| quality | Higher quality, moderate latency |
| ultra | Maximum quality (Imagen 4 Ultra / Veo quality models) |

Model resolution by tier and tool

| Tier | tool_generate_image | tool_transform_image | tool_generate_video |
|---|---|---|---|
| fast | imagen-4.0-fast-generate-001 | gemini-2.5-flash-image ¹ | veo-3.1-fast-generate-001 |
| balanced | gemini-2.5-flash-image ¹ | gemini-2.5-flash-image ¹ | veo-3.1-fast-generate-001 |
| quality | imagen-4.0-generate-001 | gemini-2.5-pro-image | veo-3.1-generate-001 |
| ultra | imagen-4.0-ultra-generate-001 | gemini-2.5-pro-image | veo-3.1-generate-001 |

GenAI SDK tools (tool_live_generate, tool_analyze_video, tool_generate_speech, tool_embed):

| Tier | tool_live_generate / tool_analyze_video | tool_generate_speech | tool_embed |
|---|---|---|---|
| fast | gemini-3.5-flash ¹ | gemini-2.5-flash-preview-tts | gemini-embedding-2 / text-embedding-004 ² |
| quality | gemini-2.5-pro | gemini-2.5-pro-preview-tts | n/a |

¹ Generally Available as of May 2026 per Google Models docs.
² gemini-embedding-2 is used when GOOGLE_CLOUD_API_KEY is set; text-embedding-004 is used with Vertex AI ADC.

Preview models available but not set as defaults: gemini-3.1-pro, gemini-3.1-flash-image, gemini-3-flash, gemini-3-pro-image. Pass them via the model / model_name override parameter. See full model list for the latest.
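
The image and video tier table above amounts to a lookup with an optional override. The sketch below copies the model IDs from that table; `resolve_model` is a hypothetical helper, and letting an explicit `model` value win over the tier is an assumption about how the override parameter behaves.

```python
TIER_MODELS = {
    "tool_generate_image": {
        "fast": "imagen-4.0-fast-generate-001",
        "balanced": "gemini-2.5-flash-image",
        "quality": "imagen-4.0-generate-001",
        "ultra": "imagen-4.0-ultra-generate-001",
    },
    "tool_transform_image": {
        "fast": "gemini-2.5-flash-image",
        "balanced": "gemini-2.5-flash-image",
        "quality": "gemini-2.5-pro-image",
        "ultra": "gemini-2.5-pro-image",
    },
    "tool_generate_video": {
        "fast": "veo-3.1-fast-generate-001",
        "balanced": "veo-3.1-fast-generate-001",
        "quality": "veo-3.1-generate-001",
        "ultra": "veo-3.1-generate-001",
    },
}


def resolve_model(tool: str, model_tier: str = "fast", model: str = None) -> str:
    """Return the model ID for a tool and tier; an explicit model wins (assumption)."""
    if model:
        return model
    try:
        return TIER_MODELS[tool][model_tier]
    except KeyError:
        raise ValueError(f"no model for tool={tool!r}, tier={model_tier!r}") from None


if __name__ == "__main__":
    print(resolve_model("tool_generate_image", "ultra"))
    print(resolve_model("tool_transform_image", model="gemini-3.1-flash-image"))
```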

Output formats

tool_generate_image, tool_edit_image, tool_transform_image, and tool_upscale_image accept a save_format / output_format parameter:

| Format | Notes |
|---|---|
| PNG (default) | Lossless |
| JPEG | Smaller files, lossy. compression_quality (0-100, default 85) applies only to JPEG |
| WEBP | Modern lossless/lossy, wide browser support |
| AVIF | Best compression, requires Pillow>=10 |
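
Because compression_quality applies only to JPEG, a caller-side helper might drop it for every other format. The following is a hedged sketch rather than the server's code: `build_save_options` is a hypothetical name, and the returned mapping simply mirrors the `format` and `quality` keyword arguments that Pillow's `Image.save` accepts.

```python
SUPPORTED_FORMATS = ("PNG", "JPEG", "WEBP", "AVIF")


def build_save_options(output_format: str = "PNG", compression_quality: int = 85) -> dict:
    """Return save options, attaching quality only when the format is JPEG."""
    fmt = output_format.upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format {output_format!r}")
    options = {"format": fmt}
    if fmt == "JPEG":
        if not 0 <= compression_quality <= 100:
            raise ValueError("compression_quality must be between 0 and 100")
        options["quality"] = compression_quality
    return options


if __name__ == "__main__":
    print(build_save_options("jpeg", 70))
    print(build_save_options("webp", 70))  # quality is ignored for WEBP
```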

Error handling

All tools return a uniform error shape:

{
  "success": false,
  "error": {
    "code": 404,
    "model": "gemini-9.9-nonexistent",
    "endpoint": ":generateContent",
    "message": "Publisher Model `...` is not found.",
    "hint": "Model '...' not found in project '...' / location '...'. Try: gemini-2.5-flash-image.",
    "docs_url": "https://docs.cloud.google.com/...",
    "log_path": ".../logs/vertex_ai_mcp.log",
    "duration_s": 0.42
  }
}

| HTTP code | What you'll see in error.hint |
|---|---|
| 400 | Vertex's parameter-validation message verbatim |
| 401 | "Run gcloud auth application-default login and retry." |
| 403 | IAM role hint (roles/aiplatform.user) + Vertex AI API enablement check |
| 404 | Live alternatives from the probe cache (tool_list_available_models) |
| 429 | Retry after N (from Retry-After header) + quota-increase pointer |
| 500/502/503/504 | "Safe to retry once" |
| TIMEOUT | Raised after 90s; suggests a -fast- variant |
| VALIDATION | Client-side validation failure (mask missing, file not found, etc.); no HTTP call is made |

Full request/response logs are written to logs/vertex_ai_mcp.log.
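
A client can branch on the uniform error shape without parsing free text. The sketch below follows the hint table above (retry a 5xx once, back off on 429, otherwise stop); `next_action` and `format_error` are hypothetical helpers, and the assumption is that successful results carry "success": true.

```python
RETRY_ONCE = {500, 502, 503, 504}  # documented as "Safe to retry once"


def next_action(result: dict, attempt: int = 0) -> str:
    """Return 'ok', 'retry', 'wait_then_retry', or 'give_up' for a tool result."""
    if result.get("success"):
        return "ok"
    code = result.get("error", {}).get("code")
    if code in RETRY_ONCE:
        return "retry" if attempt == 0 else "give_up"
    if code == 429:
        return "wait_then_retry"  # the hint carries the Retry-After delay
    return "give_up"              # 400/401/403/404 and the rest need a fix first


def format_error(result: dict) -> str:
    """Build a one-line summary from the documented error fields."""
    error = result["error"]
    return f"[{error['code']}] {error['message']} (hint: {error['hint']})"


if __name__ == "__main__":
    sample = {"success": False, "error": {
        "code": 503, "message": "Service unavailable", "hint": "Safe to retry once"}}
    print(next_action(sample), "-", format_error(sample))
```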

Resources & Prompts

Local Resources (local://outputs/{filename}): Generated and processed media files are exposed as MCP resources for seamless display in MCP clients (Claude Desktop, Cursor, etc.).
Pre-built Prompts: Includes specialized prompt templates for character_design, logo_concept, and UI_UX_mockup.

Prerequisites & Resources

Python 3.9 or newer
Google Cloud Account with an active project
Vertex AI API enabled in your project
Google Cloud CLI (gcloud) installed and configured

For GenAI SDK tools (tool_embed, tool_analyze_video, tool_generate_speech, tool_live_generate, tool_generate_music), you additionally need one of:

A Cloud API Key (GOOGLE_CLOUD_API_KEY) — created in GCP Console, uses your existing GCP billing. Enables newer models like gemini-3.5-flash. (Recommended)
A Gemini API key (GOOGLE_GENAI_API_KEY) — from AI Studio, separate billing.
Vertex AI ADC — set GOOGLE_GENAI_BACKEND=vertexai, uses gcloud auth application-default login.

Cloud API Key setup: Run gcloud services enable generativelanguage.googleapis.com apikeys.googleapis.com then gcloud services api-keys create --display-name="GeminiKey". The key string appears in the command output — copy it directly to .env.

Installation & Setup

Option A: Install from PyPI

pip install open-google-image-generator-mcp

Option B: Clone the Repository

git clone https://github.com/miracorhan/OpenGoogleImageGeneratorMCP.git
cd OpenGoogleImageGeneratorMCP
pip install -r requirements.txt

Authentication (Critical Step)

The server uses Google Cloud Application Default Credentials (ADC):

gcloud auth application-default login

This opens a browser for login. Use an account with access to your Google Cloud project.

Environment Configuration

Create a .env file in the project root:

# Required
GOOGLE_CLOUD_PROJECT=your-google-cloud-project-id
GOOGLE_CLOUD_LOCATION=us-central1

# Output directory for generated media
DEFAULT_OUTPUT_DIR=./outputs

# --- GenAI SDK (for embed, speech, live, music, video-analysis tools) ---
# Option A: Cloud API Key (recommended — uses GCP billing, enables gemini-3.5-flash)
# Create: gcloud services enable generativelanguage.googleapis.com apikeys.googleapis.com
#         gcloud services api-keys create --display-name="GeminiKey"
GOOGLE_CLOUD_API_KEY=AIza...

# Option B: Gemini API key (AI Studio, separate billing, free tier available)
# GOOGLE_GENAI_BACKEND=gemini_api
# GOOGLE_GENAI_API_KEY=AIza...

# Option C: Vertex AI ADC (no key needed, uses gcloud auth application-default login)
# GOOGLE_GENAI_BACKEND=vertexai

# --- Advanced Vertex AI Authentication (Optional) ---
# Direct OAuth 2.0 Access Token
# GOOGLE_ACCESS_TOKEN=ya29.a0AfB_by...

# Service Account Impersonation
# IMPERSONATE_SERVICE_ACCOUNT=your-service-account@your-project.iam.gserviceaccount.com
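
The three GenAI SDK options can be summarised as a small decision function. This is a sketch under stated assumptions, not the server's loader: `describe_genai_config` is a hypothetical name, the precedence (an explicit vertexai backend first, then the Cloud API key, then the Gemini API key, then ADC as fallback) is inferred from the troubleshooting notes, and the embedding model follows the tool_embed description.

```python
import os

REQUIRED = ("GOOGLE_CLOUD_PROJECT", "GOOGLE_CLOUD_LOCATION")


def describe_genai_config(env=None) -> dict:
    """Report the auth mode and embedding model implied by the environment."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError(f"missing required variables: {', '.join(missing)}")

    # Assumed precedence: an explicit vertexai backend overrides any key.
    if env.get("GOOGLE_GENAI_BACKEND") == "vertexai":
        mode = "vertex_adc"
    elif env.get("GOOGLE_CLOUD_API_KEY"):
        mode = "cloud_api_key"
    elif env.get("GOOGLE_GENAI_API_KEY"):
        mode = "gemini_api_key"
    else:
        mode = "vertex_adc"  # assumption: fall back to ADC when no key is set

    embed_model = "text-embedding-004" if mode == "vertex_adc" else "gemini-embedding-2"
    return {"mode": mode, "embed_model": embed_model}


if __name__ == "__main__":
    demo = {"GOOGLE_CLOUD_PROJECT": "demo", "GOOGLE_CLOUD_LOCATION": "us-central1",
            "GOOGLE_CLOUD_API_KEY": "AIza..."}
    print(describe_genai_config(demo))
```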

Usage

Running as a Standalone Script

python mcp_server.py

Integrating with MCP Clients

For Claude Desktop (claude_desktop_config.json):

{
  "mcpServers": {
    "OpenGoogleImageGenerator": {
      "command": "python",
      "args": ["/absolute/path/to/OpenGoogleImageGeneratorMCP/mcp_server.py"],
      "env": {
        "GOOGLE_CLOUD_PROJECT": "your-google-cloud-project-id",
        "GOOGLE_CLOUD_LOCATION": "us-central1",
        "GOOGLE_GENAI_API_KEY": "AIza..."
      }
    }
  }
}

Replace /absolute/path/to/OpenGoogleImageGeneratorMCP/mcp_server.py with the actual path on your machine. If you use a virtual environment, set command to that environment's Python executable instead of python.
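
One way to avoid path and interpreter mistakes is to generate the entry from inside the environment that will run the server. This is an illustrative sketch: `build_claude_config` is a hypothetical helper, and it relies on `sys.executable` pointing at the active virtual environment's interpreter when the script runs there.

```python
import json
import sys
from pathlib import Path


def build_claude_config(server_script, project, location="us-central1",
                        python=sys.executable):
    """Return a claude_desktop_config.json structure with absolute paths."""
    return {
        "mcpServers": {
            "OpenGoogleImageGenerator": {
                "command": python,  # interpreter of the current environment
                "args": [str(Path(server_script).resolve())],
                "env": {
                    "GOOGLE_CLOUD_PROJECT": project,
                    "GOOGLE_CLOUD_LOCATION": location,
                },
            }
        }
    }


if __name__ == "__main__":
    config = build_claude_config("mcp_server.py", "your-google-cloud-project-id")
    print(json.dumps(config, indent=2))
```

Run it from the repository root and merge the printed block into the existing config file; API keys are left out on purpose so they are not echoed to the terminal.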

Example prompts

"Generate an image of a futuristic city at sunset."
"Edit this banner — add a glowing cyan halo around the logo." (tool_edit_image, EDIT_MODE_DEFAULT)
"Transform this photo into a hand-drawn pencil sketch." (tool_transform_image)
"Remove the background from the image I just generated."
"Analyze this image and tell me what objects are present."
"Generate 8 product shots in parallel with different backgrounds." (tool_batch_generate)
"Run a pipeline: generate → remove background → upscale." (tool_run_pipeline)
"Convert this text to speech using the Kore voice." (tool_generate_speech)
"Generate a 30-second ambient music track." (tool_generate_music)
"Embed this sentence for semantic search." (tool_embed)
"Animate this product photo into a 5-second video." (tool_image_to_video)
"Generate a video of a sunset with audio." (tool_generate_video, audio_enabled=true)

Troubleshooting

| Symptom | Cause | Fix |
|---------|-------|-----|
| RefreshError / "Reauthentication needed" | ADC token expired | gcloud auth application-default login |
| available: {} from tool_list_available_models | Same ADC expiry, probe skipped | Re-authenticate (above) |
| 401 UNAUTHENTICATED with GOOGLE_CLOUD_API_KEY | Key used with Vertex AI endpoint (rejects keys) | Ensure GOOGLE_GENAI_BACKEND is not set to vertexai when using a Cloud API key |
| 403 PERMISSION_DENIED: "Gemini API has not been used" | generativelanguage.googleapis.com not enabled | gcloud services enable generativelanguage.googleapis.com |
| 403 PERMISSION_DENIED: "API Keys API disabled" | apikeys.googleapis.com blocked by org policy | gcloud services enable apikeys.googleapis.com |
| 404 NOT_FOUND for text-embedding-004 | Vertex AI embedding model not available on Gemini API endpoint | Set GOOGLE_CLOUD_API_KEY (auto-selects gemini-embedding-2) or use ADC |
| 404 NOT_FOUND for a model | Model not available in your project/region | Run tool_list_available_models and pick from available list |
| gcloud api-keys command not found | Wrong command prefix | Use gcloud services api-keys list/create/delete |

Security: The Cloud API Key string appears in plaintext in gcloud services api-keys create output. Copy it directly to .env — never paste it into a chat, commit, or log. Revoke a leaked key immediately with gcloud services api-keys delete KEY_ID.

Author & License

Developer: Mirac Orhan (mirac.orhan@gmail.com)
License: MIT License (Open Source — Free for everyone to use, modify, and distribute)
