MCP server by rayss868
🎨 Vision Generator MCP
Local-first MCP server for image & video generation through OpenAI-compatible providers
Discover models automatically · generate media locally · save outputs explicitly · avoid context bloat
🚀 Why this exists
Most image/video APIs force clients to understand:
- different endpoints
- inconsistent payloads
- mixed sync/async behavior
- provider-specific output handling
- confusing model lists with text-only models mixed in
Vision Generator MCP gives you one local MCP layer that:
- auto-discovers models from the provider
- keeps only image/video-capable models and hides the rest
- normalizes generation flows
- writes outputs to a folder you choose
- keeps chat context clean by not returning huge base64 images
✨ Highlights
| Feature | What you get |
|---|---|
| Auto model discovery | Uses GET /models at runtime |
| Vision-only filtering | Hides irrelevant text-only models via buildVisionModelRegistry() |
| Required output folder | Every generated asset has a clear final_path |
| Async video flow | Submit + poll video jobs cleanly |
| Local-first workflow | Great for desktop / VS Code / Claude workflows |
| Configurable timeouts | Provider and download timeouts are configurable from MCP settings |
| Modular architecture | Clean separation across src/providers/, src/core/, src/tools/, src/validation/ |
🧭 How it works
┌───────────────────────────────────────────────┐
│ MCP Client / Agent │
│ Claude / VS Code / Desktop / Local workflow │
└───────────────────────┬───────────────────────┘
│
│ MCP tools
▼
┌───────────────────────────────────────────────┐
│ Vision Generator MCP │
│-----------------------------------------------│
│ Tool handlers │
│ Validation │
│ Vision service │
│ Model discovery │
│ Capability filtering │
│ Output publishing │
└───────────────────────┬───────────────────────┘
│
│ Adapter abstraction
▼
┌───────────────────────────────────────────────┐
│ OpenAI-compatible provider adapter │
└───────────────────────┬───────────────────────┘
│
│ HTTP
▼
┌───────────────────────────────────────────────┐
│ Provider API │
│ /models │
│ /images/generations │
│ /images/edits │
│ /videos/generations │
└───────────────────────────────────────────────┘
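The adapter abstraction in the diagram can be pictured as a small interface that the vision service calls without knowing which provider sits behind it. The sketch below is an illustration only: `ProviderAdapter`, `ImageRequest`, `ImageResult`, `FakeAdapter`, and `describeFirstModel` are hypothetical names, not the contents of src/providers/base-provider.ts.

```typescript
// Hypothetical shapes for illustration; the project's own contracts may differ.
interface ImageRequest {
  model: string;
  prompt: string;
}

interface ImageResult {
  mimeType: string;
  bytes: Uint8Array;
}

// Assumed adapter contract, reduced to two operations for brevity.
interface ProviderAdapter {
  listModelIds(): Promise<string[]>;
  generateImage(request: ImageRequest): Promise<ImageResult>;
}

// An in-memory stand-in, useful for testing code that depends on the contract.
class FakeAdapter implements ProviderAdapter {
  async listModelIds(): Promise<string[]> {
    return ["gpt-image-2"];
  }

  async generateImage(request: ImageRequest): Promise<ImageResult> {
    if (request.prompt.trim() === "") {
      throw new Error("prompt must not be empty");
    }
    // Returns the first four bytes of a PNG signature instead of calling a provider.
    return { mimeType: "image/png", bytes: new Uint8Array([0x89, 0x50, 0x4e, 0x47]) };
  }
}

// Callers depend on the interface only, so adapters can be swapped freely.
async function describeFirstModel(adapter: ProviderAdapter): Promise<string> {
  const ids = await adapter.listModelIds();
  return ids.length > 0 ? `first model: ${ids[0]}` : "no models";
}
```

Because the service layer sees only the interface, a second provider would need a new class implementing it rather than changes to the tool handlers.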
🧱 Project structure
.
├─ README.md
├─ package.json
├─ tsconfig.json
├─ plans/
│ └─ mcp-image-video-architecture-plan.md
├─ src/
│ ├─ index.ts
│ ├─ config/
│ │ └─ providers.ts
│ ├─ core/
│ │ ├─ errors.ts
│ │ ├─ file-output-publisher.ts
│ │ ├─ model-discovery.ts
│ │ └─ vision-service.ts
│ ├─ providers/
│ │ ├─ base-provider.ts
│ │ ├─ openai-compatible.adapter.ts
│ │ └─ provider-factory.ts
│ ├─ tools/
│ │ ├─ animate-image.ts
│ │ ├─ edit-image.ts
│ │ ├─ generate-image.ts
│ │ ├─ generate-video.ts
│ │ ├─ get-job-status.ts
│ │ ├─ get-model-capabilities.ts
│ │ └─ list-models.ts
│ ├─ types/
│ │ └─ contracts.ts
│ ├─ utils/
│ │ ├─ mime.ts
│ │ └─ path.ts
│ └─ validation/
│ ├─ common.ts
│ ├─ image.ts
│ ├─ job.ts
│ ├─ output.ts
│ ├─ schemas.ts
│ └─ video.ts
└─ outputs/
✅ Current supported workflow
Image
- text-to-image via generateImage()
- image editing via editImage()
Video
- text-to-video via generateVideo()
- image-to-video via animateImage()
- async polling via getJobStatus()
Discovery
- runtime model discovery via listModels()
- vision filtering via buildVisionModelRegistry()
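As a rough picture of what vision filtering involves, the sketch below guesses capabilities from model ids and drops anything that supports no vision operation. This README does not say how buildVisionModelRegistry() decides, so the id-substring heuristic, `guessOperations`, and `filterVisionModels` are assumptions made for illustration; the operation flags are copied from the list_models example output.

```typescript
// Operation flags mirror the list_models example output in this README.
interface Operations {
  image_generation: boolean;
  image_editing: boolean;
  image_variation: boolean;
  text_to_video: boolean;
  image_to_video: boolean;
}

interface VisionModel {
  id: string;
  operations: Operations;
}

// Hypothetical heuristic: infer capabilities from the model id alone.
// It assumes every image model can edit and every video model can animate,
// which will be wrong for some providers.
function guessOperations(id: string): Operations {
  const isVideo = /video/i.test(id);
  const isImage = !isVideo && /image/i.test(id);
  return {
    image_generation: isImage,
    image_editing: isImage,
    image_variation: false,
    text_to_video: isVideo,
    image_to_video: isVideo,
  };
}

// Keep only models with at least one vision operation enabled.
function filterVisionModels(ids: string[]): VisionModel[] {
  return ids
    .map((id) => ({ id, operations: guessOperations(id) }))
    .filter((m) => m.operations.image_generation || m.operations.text_to_video);
}
```

For example, `filterVisionModels(["gpt-4o", "gpt-image-2"])` keeps only the `gpt-image-2` entry under this heuristic. A production registry would more likely combine provider metadata with an override table than rely on names.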
📦 Installation
Requirements
- Node.js 18+
- npm
- an OpenAI-compatible provider endpoint
Install from npm
npm install -g vision-generator-mcp
Run installed binary
vision-generator-mcp
Local development install
npm install
Type-check
npm run check
Build
npm run build
Publishable package notes
- CLI entry is exposed via bin
- installable package files are limited via files
- build runs automatically before publish/install from source via prepare
⚙️ MCP settings
This server reads configuration from MCP settings using:
- PROVIDER_BASE_URL
- PROVIDER_API_KEY
- PROVIDER_TIMEOUT_MS
- DOWNLOAD_TIMEOUT_MS
Example configuration:
{
  "mcpServers": {
    "vision-generator": {
      "command": "node",
      "args": [
        "d:/All_project/own/AI_Coder/Native Tools/vision-generator/build/index.js"
      ],
      "disabled": false,
      "timeout": 600,
      "alwaysAllow": [],
      "disabledTools": [],
      "env": {
        "PROVIDER_BASE_URL": "https://ai.rayzs.qzz.io/v1",
        "PROVIDER_API_KEY": "your-api-key-1",
        "PROVIDER_TIMEOUT_MS": "300000",
        "DOWNLOAD_TIMEOUT_MS": "300000"
      }
    }
  }
}
Timeout layers
| Timeout | Scope |
|---|---|
| timeout in MCP settings | How long the MCP host waits for the server tool call |
| PROVIDER_TIMEOUT_MS | Timeout for provider API requests |
| DOWNLOAD_TIMEOUT_MS | Timeout for binary asset download |
The provider timeout configuration is loaded in loadProviderConfig() and applied in OpenAICompatibleAdapter.
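To make the two environment-driven timeouts concrete, here is a minimal sketch of parsing them from an environment map. It is not the body of loadProviderConfig(): `readTimeouts`, `parseTimeout`, `TimeoutConfig`, and the 300000 ms fallback are assumptions for illustration (the fallback simply reuses the value from the example configuration).

```typescript
interface TimeoutConfig {
  providerTimeoutMs: number;
  downloadTimeoutMs: number;
}

// Assumed fallback for this sketch; the README does not state the real defaults.
const FALLBACK_TIMEOUT_MS = 300000;

// Parse one timeout value, rejecting anything that is not a positive integer.
function parseTimeout(raw: string | undefined, name: string): number {
  if (raw === undefined || raw.trim() === "") {
    return FALLBACK_TIMEOUT_MS;
  }
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${name} must be a positive integer in milliseconds, got "${raw}"`);
  }
  return value;
}

// Takes the environment as a parameter so it can be tested without a real process.
function readTimeouts(env: Record<string, string | undefined>): TimeoutConfig {
  return {
    providerTimeoutMs: parseTimeout(env["PROVIDER_TIMEOUT_MS"], "PROVIDER_TIMEOUT_MS"),
    downloadTimeoutMs: parseTimeout(env["DOWNLOAD_TIMEOUT_MS"], "DOWNLOAD_TIMEOUT_MS"),
  };
}
```

In a Node.js process this would typically be called as `readTimeouts(process.env)`. Failing fast on a malformed value is a design choice here; silently falling back would hide configuration typos.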
📁 Output strategy
output.directory is required for image and video tools.
Recommended folders:
- outputs/
- d:/All_project/own/AI_Coder/Native Tools/vision-generator/outputs
Why this is better:
- every result has a clear location
- no hidden temp output behavior
- no base64 image spam in chat context
- outputs are ordinary files in a known folder, which suits local-first workflows and is easy to commit or ignore in a repository
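The file names in the tool examples follow a `<prefix>_<timestamp>.<extension>` pattern inside the chosen directory. The sketch below shows one way to produce such a path; `buildOutputPath` is a hypothetical helper, not the code in src/core/file-output-publisher.ts, and it joins with a plain "/" to stay dependency-free where real code would more likely use Node's path module.

```typescript
// Hypothetical helper: builds "<directory>/<prefix>_<timestamp>.<extension>".
function buildOutputPath(
  directory: string,
  filenamePrefix: string,
  extension: string,
  now: Date,
): string {
  // Replace ":" (not allowed in Windows file names) and "." so the timestamp
  // matches the style shown in the generate_image example output.
  const stamp = now.toISOString().replace(/[:.]/g, "-");
  // Drop trailing slashes or backslashes so the join never yields "//".
  const dir = directory.replace(/[\\/]+$/, "");
  return `${dir}/${filenamePrefix}_${stamp}.${extension}`;
}

const examplePath = buildOutputPath(
  "outputs/",
  "jakarta-future-city",
  "png",
  new Date("2026-05-19T01:00:00.000Z"),
);
// examplePath is "outputs/jakarta-future-city_2026-05-19T01-00-00-000Z.png"
```

Passing the clock in as a parameter keeps the function deterministic, which makes the naming scheme easy to test.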
🛠️ Tool reference
list_models
Discover image/video-capable models.
Example output
{
  "provider": "https://ai.rayzs.qzz.io/v1",
  "models": [
    {
      "id": "gpt-image-2",
      "operations": {
        "image_generation": true,
        "image_editing": true,
        "image_variation": false,
        "text_to_video": false,
        "image_to_video": false
      }
    }
  ]
}
get_model_capabilities
Inspect a discovered model.
Example input
{
  "model": "gpt-image-2"
}
generate_image
Generate an image and write it to your chosen folder.
Example input
{
  "model": "gpt-image-2",
  "prompt": "A futuristic Jakarta skyline at sunset, cinematic lighting",
  "aspect_ratio": "16:9",
  "resolution": "1536x1024",
  "output": {
    "directory": "d:/All_project/own/AI_Coder/Native Tools/vision-generator/outputs",
    "filename_prefix": "jakarta-future-city",
    "create_directory": true
  }
}
Example output
{
  "status": "succeeded",
  "provider": "https://ai.rayzs.qzz.io/v1",
  "model": "gpt-image-2",
  "operation": "image_generation",
  "outputs": [
    {
      "type": "image",
      "mime_type": "image/png",
      "final_path": "d:/All_project/own/AI_Coder/Native Tools/vision-generator/outputs/jakarta-future-city_2026-05-19T01-00-00-000Z.png",
      "width": 1536,
      "height": 1024
    }
  ]
}
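Since output.directory is mandatory, a request can be checked before anything is sent to the provider. The following sketch validates a generate_image input by hand; it does not reproduce the schemas in src/validation/, and the choice of which fields are required (model, prompt, output.directory) and the `WIDTHxHEIGHT` resolution format are assumptions read off the example input.

```typescript
function isNonEmptyString(value: unknown): value is string {
  return typeof value === "string" && value.trim() !== "";
}

// Returns a list of problems; an empty list means the input looks usable.
function validateGenerateImageInput(input: unknown): string[] {
  if (typeof input !== "object" || input === null) {
    return ["input must be an object"];
  }
  const record = input as Record<string, unknown>;
  const problems: string[] = [];

  if (!isNonEmptyString(record["model"])) {
    problems.push("model is required");
  }
  if (!isNonEmptyString(record["prompt"])) {
    problems.push("prompt is required");
  }

  // Assumed format: "<width>x<height>", as in "1536x1024".
  const resolution = record["resolution"];
  if (resolution !== undefined) {
    if (typeof resolution !== "string" || !/^\d+x\d+$/.test(resolution)) {
      problems.push("resolution must look like 1536x1024");
    }
  }

  const output = record["output"];
  const directory =
    typeof output === "object" && output !== null
      ? (output as Record<string, unknown>)["directory"]
      : undefined;
  if (!isNonEmptyString(directory)) {
    problems.push("output.directory is required");
  }

  return problems;
}
```

Collecting every problem instead of throwing on the first one lets a tool handler report all issues to the client in a single response.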
edit_image
Edit a local image and save the output.
Example input
{
  "model": "gpt-image-2",
  "prompt": "Replace the background with a neon cyberpunk street",
  "image_path": "d:/assets/input.png",
  "output": {
    "directory": "d:/All_project/own/AI_Coder/Native Tools/vision-generator/outputs",
    "filename_prefix": "edited-scene",
    "create_directory": true
  }
}
generate_video
Submit an async text-to-video job.
Example input
{
  "model": "your-video-model",
  "prompt": "A cinematic aerial shot flying over a futuristic city",
  "duration_seconds": 5,
  "fps": 24,
  "output": {
    "directory": "d:/All_project/own/AI_Coder/Native Tools/vision-generator/outputs",
    "filename_prefix": "future-city-video",
    "create_directory": true
  }
}
Example submit result
{
  "status": "submitted",
  "provider": "https://ai.rayzs.qzz.io/v1",
  "model": "your-video-model",
  "operation": "text_to_video",
  "job_id": "video_...",
  "provider_job_id": "provider_...",
  "outputs": []
}
animate_image
Submit an async image-to-video job.
get_job_status
Poll video job status until the final file is downloaded and written to your chosen folder.
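The submit-then-poll flow can be sketched as a loop that keeps checking a job until it reaches a terminal state. This is an illustration from the caller's side, not the server's implementation: `pollUntilDone`, `JobSnapshot`, and `PollOptions` are hypothetical, and the "running" and "failed" statuses are assumed (only "submitted" and "succeeded" appear in this README).

```typescript
// "submitted" and "succeeded" come from the README examples; the others are assumed.
type JobStatus = "submitted" | "running" | "succeeded" | "failed";

interface JobSnapshot {
  status: JobStatus;
  final_path?: string;
}

interface PollOptions {
  intervalMs: number;
  maxAttempts: number;
  // Injected so tests can skip real waiting.
  sleep?: (ms: number) => Promise<void>;
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Calls checkStatus repeatedly until the job succeeds, fails, or attempts run out.
async function pollUntilDone(
  checkStatus: () => Promise<JobSnapshot>,
  options: PollOptions,
): Promise<JobSnapshot> {
  const sleep = options.sleep ?? defaultSleep;
  for (let attempt = 1; attempt <= options.maxAttempts; attempt++) {
    const snapshot = await checkStatus();
    if (snapshot.status === "succeeded" || snapshot.status === "failed") {
      return snapshot;
    }
    // Wait between checks, but not after the final one.
    if (attempt < options.maxAttempts) {
      await sleep(options.intervalMs);
    }
  }
  throw new Error(`job still pending after ${options.maxAttempts} checks`);
}
```

Here `checkStatus` would wrap a get_job_status call for a given job_id. Bounding the loop with `maxAttempts` keeps a stuck job from blocking the caller indefinitely.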
🧩 Implementation map
| Concern | Entry point |
|---|---|
| Composition root | src/index.ts |
| Main orchestration | src/core/vision-service.ts |
| Provider contract | src/providers/base-provider.ts |
| OpenAI-compatible provider | src/providers/openai-compatible.adapter.ts |
| Adapter selection | src/providers/provider-factory.ts |
| Model discovery | src/core/model-discovery.ts |
| File output | src/core/file-output-publisher.ts |
| Validation layer | src/validation/ |
| Tool handlers | src/tools/ |
| Utilities | src/utils/ |
🔮 Future provider list
Easiest next additions
- more OpenAI-compatible gateways
- provider-specific quirks layer
Best next adapters
Later / higher-effort adapters
These are roadmap targets, not currently implemented files.
🧪 Development workflow
npm install
npm run check
npm run build
After changing MCP settings or rebuilding:
- reload the MCP runtime / extension
- start a fresh session if needed
✅ Project status
- local MCP server implemented
- OpenAI-compatible provider adapter implemented
- modular structure aligned with the plan
- explicit output directory required
- configurable provider/download timeout support added
- no image base64 context bloat
- build verified
- ready for runtime MCP usage after MCP reload