Evaluate MCP servers for quality and reliability. Verify tool functionality, test error handling, generate tests, and assess response quality with no dependencies other than curl. Use this when validating MCP server implementations, testing OpenAPI-to-MCP conversions, or assessing API tool quality.
MCP Server Evaluations Skill
API-First for AI.
Systematically evaluate MCP servers for correctness, error handling, and response quality using curl + jq (and optional Bun/Node).
Website · Docs · MCP Servers · Run MCP · CLI Tool
🪝 Why this exists
MCP quality is measurable. This skill provides a repeatable evaluation workflow to ensure:
- Tools are discoverable and correctly described
- Tool calls work with valid inputs
- Errors are actionable for invalid inputs
- Answers are reliable under realistic questions
- Performance stays acceptable
⚡ What you’ll find here
✅ Skill-aligned workflow
- Environment verification (MCP server health + ping)
- Tool discovery and completeness checks
- Functional testing per tool
- Question-based evaluation
- Scoring rubric and pass threshold
✅ Minimal dependencies
curlandjqrequired- Optional:
bunornodefor local automation
✅ Reference guides
- Inspector usage
- Evaluation criteria
- Question templates
🚀 Quickstart (Skill Summary)
Requirements
curljq- Optional:
bunornode(v22+)
1) Verify MCP server health
curl -s http://localhost:3030/health
curl -s -X POST http://localhost:3030/mcp \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":1,"method":"ping"}'
2) List available tools
curl -X POST http://localhost:3030/mcp \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"tools/list","id":1}'
3) Call a tool (basic function test)
curl -X POST http://localhost:3030/mcp \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"method": "tools/call",
"params": {"name": "<tool_name>", "arguments": {<valid_arguments>}},
"id": 2
}'
4) Trigger an error (quality check)
curl -X POST http://localhost:3030/mcp \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"method": "tools/call",
"params": {"name": "<tool_name>", "arguments": {}},
"id": 3
}'
✅ Evaluation Checklist (Fast Pass)
# Health check
curl -s http://localhost:3030/health | grep -q "" && echo "✓ Health OK" || echo "✗ Health FAILED"
# MCP ping
curl -s -X POST http://localhost:3030/mcp \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":1,"method":"ping"}' | jq -e '.jsonrpc == "2.0" and .result' > /dev/null && echo "✓ Ping OK" || echo "✗ Ping FAILED"
# Tools list
curl -s -X POST http://localhost:3030/mcp \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"tools/list","id":1}' | jq '.result.tools | length' | xargs -I {} echo "✓ {} tools discovered"
📊 Scoring rubric
| Category | Weight | Criteria | |----------|--------|----------| | Tool Discovery | 20% | All operations exposed, proper naming | | Basic Functionality | 30% | Valid inputs return correct responses | | Error Handling | 20% | Actionable errors, missing args reported | | Question Accuracy | 20% | Test questions answered correctly | | Performance | 10% | Responses < 5s for standard ops |
Pass threshold: 80% overall score
🧪 Question templates
Use these to generate test prompts:
- List/Query: "Show me all [resources] that match [criteria]"
- Get Details: "What are the details of [resource] with ID [id]?"
- Create: "Create a new [resource] with [properties]"
- Update: "Update [resource] [id] to change [field] to [value]"
- Delete: "Remove [resource] with ID [id]"
- Aggregate: "How many [resources] exist with [status]?"
- Search: "Find [resources] where [field] contains [term]"
- Workflow: "Create a [resource], then update it, then list all"
📚 References