Multi-platform UI/mobile automation MCP server + 4 shipped skills for AI coding agents (Claude Code, Cursor, Codex, Gemini). Web via Playwright; mobile (iOS/Android) scaffolded via Appium.
rolepod-uiproof
rolepod-uiproof gives Claude Code, Cursor, Codex CLI, and Gemini CLI a real browser/mobile driver — so the AI can actually click through your UI, audit accessibility, diff screenshots, and scaffold e2e tests instead of guessing.
One MCP server, one tool surface, four skills you invoke from chat. Web is production-ready via Playwright; iOS and Android use Appium (same client as alumnium — needs a local Appium daemon + simulator/emulator, or a real device). No internal LLM — your Lead agent drives every action.
What it helps with
- Verify a UI change in seconds. /verify-ui opens a real browser, runs your steps, checks your assertions, and saves a screenshot + replay bundle.
- Catch a11y regressions before merge. /audit-a11y runs axe-core against WCAG-A / AA / AAA and returns issues grouped by severity, with WCAG references and fix links.
- Lock down the visual contract. /visual-diff captures a screenshot and compares it against a named baseline under ./.rolepod-uiproof/baselines/. The first call seeds; subsequent calls diff.
- Turn an interactive verify run into a real test file. /scaffold-e2e transcribes a replay bundle into Playwright Test, Vitest+Playwright, or pytest+selenium.
- Reproduce + minimize a bug deterministically. /verify-ui with mode: "reproduce" runs ddmin step-elimination to find the shortest still-reproducing sequence.
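To make the step-elimination idea in the last bullet concrete, here is a minimal sketch of a delta-debugging loop. It is a simplified variant of ddmin that only tests complements (the sequence with one chunk removed), not the server's actual implementation. The Step shape and the reproduces callback are hypothetical names introduced for illustration; the real replay-bundle schema is not shown in this README.

```typescript
// Hypothetical step shape; the real replay-bundle schema may differ.
type Step = { action: string; target?: string };

// Returns true when the given step sequence still reproduces the bug.
type Reproduces = (steps: Step[]) => boolean;

// Simplified ddmin: repeatedly try dropping one chunk of steps and keep
// any shorter sequence that still reproduces.
function ddmin(steps: Step[], reproduces: Reproduces): Step[] {
  let current = steps;
  let chunks = 2; // granularity: how many chunks to split the sequence into

  while (current.length >= 2) {
    const size = Math.ceil(current.length / chunks);
    let reduced = false;

    for (let start = 0; start < current.length; start += size) {
      // Candidate = everything except the chunk starting at `start`.
      const candidate = [
        ...current.slice(0, start),
        ...current.slice(start + size),
      ];
      if (candidate.length > 0 && reproduces(candidate)) {
        current = candidate;
        chunks = Math.max(chunks - 1, 2);
        reduced = true;
        break;
      }
    }

    if (!reduced) {
      if (chunks >= current.length) break; // already at single-step granularity
      chunks = Math.min(chunks * 2, current.length);
    }
  }
  return current;
}
```

The result is minimal in the sense that removing any single remaining step stops the bug from reproducing. Each call to reproduces would correspond to a full replay of the candidate sequence, so the number of replays grows with the length of the original run.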
The four skills
| Skill | Wraps | What it does |
|---|---|---|
| /verify-ui | rolepod_verify_ui_flow | Drive a session through steps, evaluate assertions, save evidence + replay bundle. mode: assert (default) or reproduce with optional ddmin minimization. |
| /audit-a11y | rolepod_audit_a11y | axe-core audit at WCAG-A / AA / AAA. scope: "page" or scope: { ref }. Markdown or JSON report. |
| /visual-diff | rolepod_visual_diff | Pixel diff against a named baseline. Auto-seeds on first call. Configurable threshold + pixelmatch sensitivity. |
| /scaffold-e2e | rolepod_scaffold_e2e | Generate a runnable test file from a scenario + optional replay bundle. Three target frameworks. |
Every skill is single-backend (D-024) — it calls the rolepod-uiproof server and only the rolepod-uiproof server. If the server is unavailable, the skill fails with a clear diagnostic. Multi-backend routing belongs in the parent rolepod plugin's phase skills, not here.
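The seed-then-diff behavior listed for /visual-diff in the table above can be sketched as follows. This is an illustration under stated assumptions, not the tool's code: the in-memory Map stands in for the baselines directory, the per-channel comparison stands in for pixelmatch, and the names visualDiff, threshold, and channelTolerance (with their default values) are hypothetical.

```typescript
// Raw RGBA image; a stand-in for a decoded screenshot.
type Image = { width: number; height: number; data: Uint8Array };

type DiffResult =
  | { status: "seeded" }
  | { status: "compared"; mismatchRatio: number; passed: boolean };

// Stand-in for the on-disk baseline store.
const baselines = new Map<string, Image>();

// Count pixels where any RGBA channel differs by more than the tolerance.
function countMismatched(a: Image, b: Image, channelTolerance: number): number {
  let mismatched = 0;
  for (let i = 0; i < a.data.length; i += 4) {
    for (let c = 0; c < 4; c++) {
      if (Math.abs(a.data[i + c] - b.data[i + c]) > channelTolerance) {
        mismatched++;
        break;
      }
    }
  }
  return mismatched;
}

function visualDiff(
  name: string,
  shot: Image,
  threshold = 0.01, // assumed: max allowed fraction of differing pixels
  channelTolerance = 16, // assumed: per-channel sensitivity
): DiffResult {
  const baseline = baselines.get(name);
  if (!baseline) {
    baselines.set(name, shot); // first call seeds the baseline
    return { status: "seeded" };
  }
  if (baseline.width !== shot.width || baseline.height !== shot.height) {
    return { status: "compared", mismatchRatio: 1, passed: false };
  }
  const mismatchRatio =
    countMismatched(baseline, shot, channelTolerance) / (shot.width * shot.height);
  return { status: "compared", mismatchRatio, passed: mismatchRatio <= threshold };
}
```

Treating a size mismatch as a full failure is one possible design choice; the real tool may handle resized viewports differently.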
Install
Pick your CLI. All install paths share the same MCP server (@rolepod/uiproof on npm) and the same skill set.
Claude Code (recommended)
# Install
claude plugin marketplace add nuttaruj/rolepod-uiproof
claude plugin install rolepod-uiproof@rolepod-uiproof
# Update
claude plugin marketplace update rolepod-uiproof
claude plugin install rolepod-uiproof@rolepod-uiproof
# Uninstall
claude plugin uninstall rolepod-uiproof@rolepod-uiproof
claude plugin marketplace remove rolepod-uiproof
The plugin auto-registers the four skills (/verify-ui, /audit-a11y, /visual-diff, /scaffold-e2e) and spawns the MCP server (npx -y @rolepod/uiproof) on session start.
Cursor IDE
Cursor's plugin marketplace is enterprise-only (Free / Pro plans cannot install marketplace plugins). For everyone else, drop the workspace MCP config:
# Per project — copy from this repo, or run:
mkdir -p .cursor
curl -fsSL https://raw.githubusercontent.com/nuttaruj/rolepod-uiproof/main/.cursor/mcp.json -o .cursor/mcp.json
# Or global (across every project)
mkdir -p ~/.cursor
curl -fsSL https://raw.githubusercontent.com/nuttaruj/rolepod-uiproof/main/.cursor/mcp.json -o ~/.cursor/mcp.json
Then fully restart Cursor — MCP servers load only at startup. Verify under Settings → MCP.
Skills are not auto-registered under Cursor (no unified plugin format for skills + MCP in one). The MCP tools are still available; invoke them by name in chat (Use rolepod_verify_ui_flow to …).
Teams / Enterprise: add https://github.com/nuttaruj/rolepod-uiproof as a team marketplace under Settings → Plugins for one-click install with skills auto-registered.
Codex CLI
# Install
codex plugin marketplace add nuttaruj/rolepod-uiproof
codex plugin install rolepod-uiproof@rolepod-uiproof
# Update
codex plugin marketplace upgrade rolepod-uiproof
codex plugin install rolepod-uiproof@rolepod-uiproof
Codex reads the plugin from .agents/plugins/marketplace.json + .codex-plugin/plugin.json in this repo. Skills install to ~/.codex/skills/ (Codex's plugin loader handles registration).
Gemini CLI
Not yet shipped. The Gemini extension format is not yet stable enough to commit to; we plan to add gemini-extension.json in v0.4. Track issue #N if you need it.
Direct npm (any MCP-aware tool)
Use this when your tool reads a standard mcpServers config (most non-CLI MCP clients):
{
  "mcpServers": {
    "rolepod-uiproof": {
      "command": "npx",
      "args": ["-y", "@rolepod/uiproof"]
    }
  }
}
15 MCP tools (rolepod_browser_* + rolepod_verify_ui_flow + 4 composites) will appear in your client. Skills are not surfaced via this path — call the tools by name.
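If your client already has an mcpServers config, the entry needs to be merged in, not pasted over the file. A minimal sketch of that merge, assuming the config has the shape shown above; the McpServer and McpConfig types and the withUiproof helper are hypothetical names for illustration:

```typescript
type McpServer = { command: string; args: string[] };
type McpConfig = { mcpServers?: Record<string, McpServer> };

// Return a copy of the config with the rolepod-uiproof entry added,
// leaving any other registered servers untouched.
function withUiproof(existing: McpConfig): McpConfig {
  return {
    ...existing,
    mcpServers: {
      ...existing.mcpServers,
      "rolepod-uiproof": { command: "npx", args: ["-y", "@rolepod/uiproof"] },
    },
  };
}
```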
Quick start
After install, in your Claude Code / Cursor / Codex session:
/verify-ui https://example.com
steps: []
expect: text_visible "Example Domain", text_visible "Learn more"
Returns a run_id, passed: true, and a path under ./.rolepod-uiproof/artifacts/verify_<run_id>/:
.rolepod-uiproof/artifacts/verify_20260524T101512_a1b2c3d4/
├── final.png screenshot at end of run
└── replay.json replay bundle — re-runnable via `npx rolepod-uiproof replay …`
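Scripts that collect artifacts may want to split the directory name into its parts. The sketch below assumes the pattern verify_<YYYYMMDD>T<HHMMSS>_<8 hex characters>, which is inferred only from the single example above; docs/artifacts.md is the authority on the run_id convention. The parseArtifactDir helper is hypothetical.

```typescript
type RunId = { date: string; time: string; suffix: string };

// Assumed pattern, inferred from the one example directory name above.
const ARTIFACT_DIR = /^verify_(\d{8})T(\d{6})_([0-9a-f]{8})$/;

// Returns the parts of the directory name, or null if it does not match.
function parseArtifactDir(dirName: string): RunId | null {
  const m = ARTIFACT_DIR.exec(dirName);
  if (!m) return null;
  const [, date, time, suffix] = m;
  return { date, time, suffix };
}
```

The timestamp is returned as plain strings because the README does not state which time zone the run_id uses.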
Convert that to a Playwright Test file:
/scaffold-e2e from .rolepod-uiproof/artifacts/verify_…/replay.json using playwright-test
Verify your setup
npx rolepod-uiproof doctor
✓ Node ≥20 24.14.0
✓ Playwright Chromium installed ~/Library/Caches/ms-playwright
✓ webdriverio (mobile client, v0.3)
• Appium server (roadmap v0.3) Not reachable at http://127.0.0.1:4723/status
✓ Xcode (iOS, roadmap v0.3) /Applications/Xcode.app
• Android SDK (roadmap v0.3) Set ANDROID_HOME — needed only for Android
• SeleniumEngine (roadmap v0.4) Not implemented — deferred to v0.4
✓ Artifact root writable
✓ = ready · • = optional / deferred · ✗ = blocker.
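In CI you may want to fail a job only when doctor reports a blocker. A minimal sketch that classifies lines by their leading marker, assuming the output keeps the line format shown above; summarizeDoctor is a hypothetical helper, not part of the CLI:

```typescript
type DoctorStatus = "ready" | "optional" | "blocker";

const MARKERS: Record<string, DoctorStatus> = {
  "✓": "ready",
  "•": "optional",
  "✗": "blocker",
};

// Group doctor output lines by the status marker at the start of each line.
function summarizeDoctor(output: string): Record<DoctorStatus, string[]> {
  const summary: Record<DoctorStatus, string[]> = {
    ready: [],
    optional: [],
    blocker: [],
  };
  for (const line of output.split("\n")) {
    const trimmed = line.trim();
    const status = MARKERS[trimmed.charAt(0)];
    if (!status) continue; // not a check line
    const rest = trimmed.slice(1).trim();
    if (rest.startsWith("=")) continue; // skip the legend line
    summary[status].push(rest);
  }
  return summary;
}
```

A CI step could then exit non-zero when summary.blocker is non-empty and ignore the optional entries.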
What's inside
- 15 MCP tools: 10 atomic browser/mobile primitives (browser_open, _close, _snapshot, _click, _type, _key, _scroll, _wait_for, _screenshot, _navigate) + 5 composites (verify_ui_flow, audit_a11y, visual_diff, scaffold_e2e, extract_ui_state). All prefixed rolepod_* to namespace away from other MCP servers.
- 2 engines behind one interface: PlaywrightEngine for web (Chromium / Firefox / WebKit), AppiumEngine for iOS XCUITest + Android UIAutomator2. The Lead sees one unified A11yNode shape regardless of platform.
- Stable refs with explicit invalidation (D-010): every state-changing call invalidates prior refs; the engine returns a structured stale_ref error if you try to reuse one. No silent locator drift.
- Replay bundles: every /verify-ui run writes a JSON replay you can re-run later with npx rolepod-uiproof replay <bundle.json>, agent-free.
- No internal LLM (D-004): your Lead agent makes every decision. We don't double-bill you for inference.
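One way to picture the explicit-invalidation rule for refs is a generation counter: every ref is tagged with the generation it was issued in, and any state-changing call bumps the generation. The sketch below illustrates that idea only. RefTable, StaleRefError, and the ref string format are hypothetical; the only detail taken from this README is the stale_ref error code.

```typescript
class StaleRefError extends Error {
  readonly code = "stale_ref";
  constructor(ref: string) {
    super(`ref ${ref} was invalidated by a later state change`);
  }
}

class RefTable {
  private generation = 0;
  private counter = 0;

  // Issue a ref tagged with the current generation, e.g. "g0:e1".
  issue(): string {
    this.counter += 1;
    return `g${this.generation}:e${this.counter}`;
  }

  // Called after any state-changing action (click, type, navigate, ...).
  invalidateAll(): void {
    this.generation += 1;
  }

  // Throws instead of silently resolving a ref from an older snapshot.
  resolve(ref: string): string {
    if (!ref.startsWith(`g${this.generation}:`)) {
      throw new StaleRefError(ref);
    }
    return ref;
  }
}
```

Under this model the agent has to take a fresh snapshot after each action and use the new refs, which is what rules out silent locator drift.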
Use with parent rolepod
If you also use rolepod (the markdown plugin), its check-work, debug-issue, and review-code skills auto-route to /verify-ui, /audit-a11y, and /visual-diff when the rolepod-uiproof server is present. Nothing breaks if it isn't — parent falls back to Playwright MCP / Chrome DevTools MCP / manual verification.
The two are independent: install rolepod-uiproof standalone and get a complete experience via slash commands, or install both together and let parent's phase router pick the right backend automatically.
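The fallback behavior described above amounts to a priority list. A minimal sketch, assuming the order in which the backends are named in this section; the identifier strings and the pickBackend helper are hypothetical, and the parent plugin's real router may decide differently:

```typescript
// Assumed priority order, taken from the order the backends are listed above.
const BACKEND_PRIORITY = [
  "rolepod-uiproof",
  "playwright-mcp",
  "chrome-devtools-mcp",
] as const;

type Backend = (typeof BACKEND_PRIORITY)[number] | "manual";

// Pick the first available backend, falling back to manual verification.
function pickBackend(available: ReadonlySet<string>): Backend {
  for (const backend of BACKEND_PRIORITY) {
    if (available.has(backend)) return backend;
  }
  return "manual";
}
```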
Docs
- docs/sessions.md: session lifecycle, stale-ref semantics, multi-session
- docs/artifacts.md: .rolepod-uiproof/ layout, run_id convention, replay bundle format
- docs/recipes/: verify-a-checkout-flow, audit-a11y-during-review, visual-baseline-workflow
- CHANGELOG.md: release history with per-version "Not yet verified" notes mapped to milestones
- CONTRIBUTING.md, SECURITY.md, CODE_OF_CONDUCT.md
MIT licensed — see LICENSE and THIRD_PARTY.md. Mobile AT normalizers are alumnium-inspired (UPSTREAM_TRACKING.md). Feedback + runtime reports for Cursor / Codex / Gemini install paths especially welcome via issues.