MCP Server Evaluations Skill

API-First for AI.
Systematically evaluate MCP servers for correctness, error handling, and response quality using curl + jq (and optional Bun/Node).

Website · Docs · MCP Servers · Run MCP · CLI Tool

🪝 Why this exists

MCP quality is measurable. This skill provides a repeatable evaluation workflow to ensure:

Tools are discoverable and correctly described
Tool calls work with valid inputs
Errors are actionable for invalid inputs
Answers are reliable under realistic questions
Performance stays acceptable

⚡ What you’ll find here

✅ Skill-aligned workflow

Environment verification (MCP server health + ping)
Tool discovery and completeness checks
Functional testing per tool
Question-based evaluation
Scoring rubric and pass threshold

✅ Minimal dependencies

curl and jq required
Optional: bun or node for local automation

✅ Reference guides

Inspector usage
Evaluation criteria
Question templates

🚀 Quickstart (Skill Summary)

Requirements

curl
jq
Optional: bun or node (v22+)

1) Verify MCP server health

curl -s http://localhost:3030/health
curl -s -X POST http://localhost:3030/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"ping"}'

2) List available tools

curl -X POST http://localhost:3030/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"tools/list","id":1}'

3) Call a tool (basic function test)

curl -X POST http://localhost:3030/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {"name": "<tool_name>", "arguments": {<valid_arguments>}},
    "id": 2
  }'

4) Trigger an error (quality check)

curl -X POST http://localhost:3030/mcp \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {"name": "<tool_name>", "arguments": {}},
    "id": 3
  }'

✅ Evaluation Checklist (Fast Pass)

# Health check
curl -s http://localhost:3030/health | grep -q "" && echo "✓ Health OK" || echo "✗ Health FAILED"

# MCP ping
curl -s -X POST http://localhost:3030/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"ping"}' | jq -e '.jsonrpc == "2.0" and .result' > /dev/null && echo "✓ Ping OK" || echo "✗ Ping FAILED"

# Tools list
curl -s -X POST http://localhost:3030/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"tools/list","id":1}' | jq '.result.tools | length' | xargs -I {} echo "✓ {} tools discovered"

📊 Scoring rubric

| Category | Weight | Criteria | |----------|--------|----------| | Tool Discovery | 20% | All operations exposed, proper naming | | Basic Functionality | 30% | Valid inputs return correct responses | | Error Handling | 20% | Actionable errors, missing args reported | | Question Accuracy | 20% | Test questions answered correctly | | Performance | 10% | Responses < 5s for standard ops |

Pass threshold: 80% overall score

🧪 Question templates

Use these to generate test prompts:

List/Query: "Show me all [resources] that match [criteria]"
Get Details: "What are the details of [resource] with ID [id]?"
Create: "Create a new [resource] with [properties]"
Update: "Update [resource] [id] to change [field] to [value]"
Delete: "Remove [resource] with ID [id]"
Aggregate: "How many [resources] exist with [status]?"
Search: "Find [resources] where [field] contains [term]"
Workflow: "Create a [resource], then update it, then list all"

MCP Servers