Petamind MCP
A Claude Code MCP server for a multi-candidate agentic coding loop: reasoner plan → generate patches → deterministic gates → mandatory vision scoring → select the winner.
Poetiq-style refinement loop (descriptive, not affiliated): “Poetiq-style” is used here descriptively for iterative refinement loops (generate → critique → refine → verify); this project is not affiliated with Poetiq.
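At a high level, the loop can be pictured as the sketch below. This is a minimal illustration only, not the server's actual code: `generate`, `run_gates`, and `score_vision` are hypothetical stand-ins for the real patcher, deterministic gates, and vision judge.

```python
# Minimal sketch of the multi-candidate loop (illustrative only; the callables
# below are hypothetical, not the petamind-mcp API).
from dataclasses import dataclass

@dataclass
class Candidate:
    patch: str
    gates_passed: bool = False
    vision_score: float = 0.0

def run_loop(plan: str, generate, run_gates, score_vision, n_candidates: int = 4):
    """Generate N patches, filter by deterministic gates, score with vision, pick the best."""
    candidates = [Candidate(patch=generate(plan)) for _ in range(n_candidates)]
    for c in candidates:
        c.gates_passed = run_gates(c.patch)          # e.g. build + tests must pass
        if c.gates_passed:
            c.vision_score = score_vision(c.patch)   # mandatory vision scoring
    survivors = [c for c in candidates if c.gates_passed]
    if not survivors:
        return None                                  # no candidate cleared the gates
    return max(survivors, key=lambda c: c.vision_score)
```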
Setup guide: docs/MCP_PETAMIND_MCP.md.
Vertex setup: docs/VERTEX_SETUP.md.
Troubleshooting: docs/TROUBLESHOOTING.md.
MCP Quick Start (Claude Code)
Option A (recommended): install from PyPI via pipx
pipx install petamind-mcp
petamind-setup
Then add the MCP server to Claude Code (user scope):
claude mcp add-json --scope user petamind-mcp '{"command":"petamind-mcp","args":[]}'
Notes:
- `petamind-setup` installs Playwright Chromium (required for the mandatory vision loop).
- You do not need Google Cloud credentials to use `petamind_eval_patch` with `vision_provider=client` (the default).
Option B: install from a git clone (contributors / hacking)
From this repo root:
./scripts/setup.sh
Then follow docs/MCP_PETAMIND_MCP.md to add the server to Claude Code via .mcp.json or claude mcp add-json.
Minimal Claude Code config (user scope)
claude mcp add-json --scope user petamind-mcp '{
"command": "'"$(pwd)"'/.venv/bin/python",
"args": ["-m", "petamind_mcp.mcp_server"]
}'
Included: Synthetic UI Dataset Factory
This repo also includes a production-grade synthetic dataset generator for UI/UX design tasks (landing pages, directories, dashboards) using Next.js App Router + TypeScript + Tailwind.
Features
- Multi-model pipeline: Uses Vertex AI (DeepSeek, Kimi, MiniMax) and OpenRouter (Devstral, vision models)
- Quality gating: Only winners pass through to training data (build success + vision score threshold)
- Resumable: SQLite caching for model responses, task state persistence
- Two output tracks: `public/` (publishable models only) and `private/` (all models)
- No contamination: chain-of-thought/thinking is never stored; only structured specs + code
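Resumability hinges on caching model responses. The sketch below shows one way such a cache could be keyed; the table and column names are hypothetical, not the actual `cache.db` schema.

```python
# Illustrative response cache keyed by (model, prompt hash).
import hashlib
import sqlite3

class ResponseCache:
    def __init__(self, path: str = "cache.db"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS responses "
            "(model TEXT, prompt_sha TEXT, response TEXT, PRIMARY KEY (model, prompt_sha))"
        )

    def _key(self, prompt: str) -> str:
        return hashlib.sha256(prompt.encode()).hexdigest()

    def get(self, model: str, prompt: str) -> str | None:
        row = self.db.execute(
            "SELECT response FROM responses WHERE model = ? AND prompt_sha = ?",
            (model, self._key(prompt)),
        ).fetchone()
        return row[0] if row else None

    def put(self, model: str, prompt: str, response: str) -> None:
        self.db.execute(
            "INSERT OR REPLACE INTO responses VALUES (?, ?, ?)",
            (model, self._key(prompt), response),
        )
        self.db.commit()
```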
Claude Code MCP (agentic coding)
This repo also ships an MCP server (petamind-mcp) that exposes a multi-candidate
patch/test/vision loop to Claude Code. Setup guide: docs/MCP_PETAMIND_MCP.md.
Quick Start
1. Environment Setup
# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install -e .
# Or with uv (faster)
uv pip install -e .
# Install Playwright browsers
playwright install chromium
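To confirm the browser install worked, a quick check like the following (Playwright's sync API) launches Chromium and exits:

```python
# Quick check that the Chromium browser installed by Playwright actually launches.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("about:blank")
    print("Chromium OK:", browser.version)
    browser.close()
```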
2. Configure Environment Variables
cp .env.example .env
# Edit .env with your credentials
Required:
- `GOOGLE_CLOUD_PROJECT`: your GCP project ID
- `GOOGLE_CLOUD_REGION`: region for Vertex AI (e.g., `us-central1`)
- `OPENROUTER_API_KEY`: your OpenRouter API key
Optional:
- `GCS_BUCKET`: for cloud backup of outputs
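A small sanity check can catch a missing variable before a run starts. This is illustrative only, assuming `python-dotenv` is available for loading `.env`:

```python
# Fail fast if required environment variables are missing.
import os
from dotenv import load_dotenv  # assumption: python-dotenv is installed

load_dotenv()
REQUIRED = ["GOOGLE_CLOUD_PROJECT", "GOOGLE_CLOUD_REGION", "OPENROUTER_API_KEY"]
missing = [name for name in REQUIRED if not os.environ.get(name)]
if missing:
    raise SystemExit(f"Missing required environment variables: {', '.join(missing)}")
```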
3. Authenticate with Google Cloud
gcloud auth application-default login
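To verify that Application Default Credentials resolve to the expected project, a short check like this works (assuming the `google-auth` package, which the Vertex AI client depends on):

```python
# Verify Application Default Credentials and print the resolved project.
import google.auth

credentials, project = google.auth.default()
print("ADC project:", project)
```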
4. Run
# Smoke test (3 tasks end-to-end)
make smoke
# Full run (public models only)
make run_public
# Full run (all models including private)
make run_private
# Resume a previous run
titan-factory run --resume <run_id>
# Export training data
make export RUN_ID=<run_id>
Configuration
Edit config/config.yaml to customize:
models:
planner:
provider: vertex
model: deepseek-ai/deepseek-v3.2-maas
publishable: true
ui_generators:
- provider: vertex
model: moonshotai/kimi-k2-thinking-maas
publishable: true
variants: 2
- provider: vertex
model: minimaxai/minimax-m2-maas
publishable: true
variants: 2
patcher:
provider: openrouter
model: mistralai/devstral-2512:free
publishable: true
vision_judge:
provider: openrouter
model: null # Falls back to heuristic scorer
publishable: false
pipeline:
vision_score_threshold: 8.0
max_fix_rounds: 2
polish_loop_enabled: true
tasks_per_niche: 7
budget:
concurrency_vertex: 5
concurrency_openrouter: 10
requests_per_min_vertex: 60
requests_per_min_openrouter: 100
max_total_tasks: null # Run all
stop_after_usd: null # No limit
export:
holdout_niches: 12
validation_split: 0.08
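The same keys can be read programmatically. A minimal sketch, assuming PyYAML and the key names shown in the example config above:

```python
# Illustrative: load config/config.yaml and read a couple of pipeline settings.
import yaml

with open("config/config.yaml") as f:
    config = yaml.safe_load(f)

threshold = config["pipeline"]["vision_score_threshold"]   # e.g. 8.0
max_fix_rounds = config["pipeline"]["max_fix_rounds"]       # e.g. 2
print(f"Vision threshold {threshold}, up to {max_fix_rounds} fix rounds")
```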
Pipeline Stages
- Niche/Task Generation: Creates 100 niches × 7 tasks = 700 tasks
- Planning: DeepSeek generates UI_SPEC JSON for each task
- UI Generation: Kimi + MiniMax generate code candidates (2 variants each)
- Validation: Next.js build with Devstral-powered fix loops
- Rendering: Playwright captures screenshots at 3 viewport sizes
- Scoring: Vision judge (or heuristic fallback) scores candidates
- Selection: Best candidate per task selected for training
- Export: Winners exported to train.jsonl / valid.jsonl
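The selection stage reduces to a simple rule per task: of the candidates that built successfully, keep the one with the highest vision score, and drop the task if none clears the threshold. A hedged sketch (the field names are hypothetical, not the factory's internal data model):

```python
# Illustrative selection logic: best-scoring candidate per task, gated by
# build success and the vision score threshold.
def select_winner(candidates, vision_score_threshold: float = 8.0):
    eligible = [
        c for c in candidates
        if c["build_ok"] and c["vision_score"] >= vision_score_threshold
    ]
    if not eligible:
        return None  # the task produces no training example
    return max(eligible, key=lambda c: c["vision_score"])
```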
Output Structure
out/<run_id>/
├── cache.db # SQLite response cache
├── manifest.db # Task state tracking
├── prompts/
│ ├── niches.json
│ └── tasks.jsonl
├── renders/
│ └── <task_id>/
│ └── <candidate_id>/
│ ├── 375x812.png
│ ├── 768x1024.png
│ └── 1440x900.png
├── rich_records.jsonl # All candidates (for audit)
├── selected_records.jsonl # Winners only
├── public/
│ ├── train.jsonl
│ └── valid.jsonl
└── private/
├── train.jsonl
└── valid.jsonl
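The rendering step that fills `renders/` captures one screenshot per viewport. The sketch below shows how that could be done with Playwright's sync API; it assumes the candidate is already being served locally, and the helper name is illustrative.

```python
# Illustrative rendering step: one screenshot per viewport, matching the
# renders/<task_id>/<candidate_id>/ layout above.
from pathlib import Path
from playwright.sync_api import sync_playwright

VIEWPORTS = [(375, 812), (768, 1024), (1440, 900)]

def render_candidate(url: str, out_dir: Path) -> None:
    out_dir.mkdir(parents=True, exist_ok=True)
    with sync_playwright() as p:
        browser = p.chromium.launch()
        for width, height in VIEWPORTS:
            page = browser.new_page(viewport={"width": width, "height": height})
            page.goto(url, wait_until="networkidle")
            page.screenshot(path=out_dir / f"{width}x{height}.png")
            page.close()
        browser.close()
```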
Training Data Format
Each line in train.jsonl:
{
"messages": [
{"role": "system", "content": "You are Titan 4 Design..."},
{"role": "user", "content": "<task prompt>"},
{"role": "assistant", "content": "{\"ui_spec\": ..., \"files\": [...]}"}
]
}
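A minimal reader sketch for this format: each line is a chat-style record, and the assistant message is itself a JSON string carrying the `ui_spec` and `files`. The path below is a placeholder.

```python
# Illustrative reader for train.jsonl records in the format shown above.
import json

with open("out/<run_id>/public/train.jsonl") as f:   # placeholder run id
    for line in f:
        record = json.loads(line)
        roles = [m["role"] for m in record["messages"]]
        assert roles == ["system", "user", "assistant"]  # expected shape from the example
        payload = json.loads(record["messages"][2]["content"])
        print(len(payload.get("files", [])), "files in assistant payload")
```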
Page Types Covered
- `landing`: Marketing landing pages
- `directory_home`: Directory homepage with search
- `city_index`: City-specific listing pages
- `category_index`: Category-specific listing pages
- `listing_profile`: Individual listing detail pages
- `admin_dashboard`: Admin/analytics dashboards
- `edit`: Refactor/edit tasks (20% of dataset)
Development
# Run tests
pytest tests/
# Type checking
mypy src/
# Format
ruff format src/ tests/
ruff check src/ tests/
Architecture Notes
- Provider abstraction: Clean interface for Vertex AI and OpenRouter
- Deterministic IDs: Tasks have stable IDs from hash(niche_id + page_type + seed)
- JSON strictness: Safe extraction with fallback parsing
- Async throughout: Uses asyncio for concurrent model calls
- No thinking storage: Only structured UI_SPEC and final code stored
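The deterministic ID note above maps to something like the sketch below; the exact hashing and truncation used internally may differ.

```python
# Illustrative deterministic task ID following hash(niche_id + page_type + seed).
import hashlib

def task_id(niche_id: str, page_type: str, seed: int) -> str:
    raw = f"{niche_id}:{page_type}:{seed}".encode()
    return hashlib.sha256(raw).hexdigest()[:16]

print(task_id("plumbers-austin", "landing", 0))  # stable across runs
```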
License
MIT