Created 3/2/2026

# MCP Server Eval Action

A reusable GitHub Action that runs LLM-as-judge evaluations against any MCP (Model Context Protocol) server.

## Usage

Minimal setup for a local server launched over stdio:

```yaml
- uses: mcp-use/eval-action@v1
  with:
    server_config: '{"command": "python", "args": ["-m", "my_mcp_server", "--transport", "stdio"]}'
    eval_cases: "evals/eval_cases.yaml"
    openrouter_api_key: ${{ secrets.OPENROUTER_API_KEY }}
```

### Remote server

For a server that is already deployed, pass a `url` instead of a `command`:

```yaml
- uses: mcp-use/eval-action@v1
  with:
    server_config: '{"url": "https://my-server.example.com/mcp"}'
    eval_cases: "evals/eval_cases.yaml"
    openrouter_api_key: ${{ secrets.OPENROUTER_API_KEY }}
```

### With environment variables for the server

Secrets and other environment variables can be forwarded to the server process via the `env` key:

```yaml
- uses: mcp-use/eval-action@v1
  with:
    server_config: |
      {
        "command": "python",
        "args": ["-m", "my_mcp_server", "--transport", "stdio"],
        "env": {
          "MONGO_PASSWORD": "${{ secrets.MONGO_PASSWORD }}",
          "USE_FAKE_DB": "false"
        }
      }
    eval_cases: "evals/eval_cases.yaml"
    openrouter_api_key: ${{ secrets.OPENROUTER_API_KEY }}
```

### Post results as PR comment

Give the action step an `id`, then feed its Markdown report to a sticky-comment action and to the job summary:

```yaml
- uses: mcp-use/eval-action@v1
  id: evals
  with:
    server_config: '{"command": "python", "args": ["-m", "my_server", "--transport", "stdio"]}'
    eval_cases: "evals/eval_cases.yaml"
    openrouter_api_key: ${{ secrets.OPENROUTER_API_KEY }}

- uses: marocchino/sticky-pull-request-comment@v2
  if: always() && github.event_name == 'pull_request'
  with:
    header: mcp-evals
    path: ${{ steps.evals.outputs.report_md }}

- run: cat ${{ steps.evals.outputs.report_md }} >> "$GITHUB_STEP_SUMMARY"
  if: always()
```

## Inputs

| Input | Required | Description |
|-------|----------|-------------|
| `server_config` | Yes | MCP server config as JSON (`{"command": ...}` or `{"url": ...}`) |
| `eval_cases` | Yes | Path to `eval_cases.yaml` |
| `openrouter_api_key` | Yes | OpenRouter API key for the agent and judge LLM |
| `filter` | No | Filter cases by `id` substring |
| `max_steps` | No | Max agent steps per case (default: 30) |
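The optional inputs are handy while iterating on a single case. A sketch (the filter substring and step cap here are illustrative; case ids come from your own `eval_cases.yaml`):

```yaml
- uses: mcp-use/eval-action@v1
  with:
    server_config: '{"command": "python", "args": ["-m", "my_mcp_server", "--transport", "stdio"]}'
    eval_cases: "evals/eval_cases.yaml"
    openrouter_api_key: ${{ secrets.OPENROUTER_API_KEY }}
    filter: "my_test_case"   # run only cases whose id contains this substring
    max_steps: "10"          # tighter cap than the default of 30
```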

## Outputs

| Output | Description |
|--------|-------------|
| `results_json` | Path to the eval results JSON file |
| `report_md` | Path to the Markdown report file |
| `passed` | `true` if all evals passed, `false` otherwise |
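If you run the action with `continue-on-error: true` (for example, to always post the report first), the `passed` output can still gate the job explicitly. This sketch assumes the step id `evals` from the example above:

```yaml
- name: Fail on eval regressions
  if: always() && steps.evals.outputs.passed != 'true'
  run: exit 1
```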

## Eval cases YAML format

```yaml
# Model used by the LLM judge to score responses
judge_model: openai/gpt-4o-mini

# Models to evaluate the agent with (OpenRouter format)
models:
  - anthropic/claude-sonnet-4
  - openai/gpt-4o-mini

# System prompts — each case runs once per prompt
system_prompts:
  neutral: "You are a helpful assistant."
  domain: "You are a domain expert. Use the available tools."

# Eval cases
cases:
  - id: my_test_case
    prompt: "Ask the agent something"
    rubric: |
      The response should contain relevant information.
      The response should be well-structured.
    threshold: 0.7
```
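A malformed cases file only surfaces once the workflow runs, so a quick local sanity check can save a CI round-trip. This sketch checks only the fields shown in the example above (the action may accept additional keys), and inlines the config as a dict rather than loading YAML to stay dependency-free:

```python
# Sketch: sanity-check an eval cases config before committing it.
# The expected shape is assumed from the example above, not an official schema.
eval_config = {
    "judge_model": "openai/gpt-4o-mini",
    "models": ["anthropic/claude-sonnet-4", "openai/gpt-4o-mini"],
    "system_prompts": {"neutral": "You are a helpful assistant."},
    "cases": [
        {
            "id": "my_test_case",
            "prompt": "Ask the agent something",
            "rubric": "The response should contain relevant information.",
            "threshold": 0.7,
        },
    ],
}

def validate(cfg):
    """Return a list of problems; an empty list means the config looks sane."""
    errors = []
    for key in ("judge_model", "models", "cases"):
        if key not in cfg:
            errors.append(f"missing top-level key: {key}")
    for i, case in enumerate(cfg.get("cases", [])):
        for field in ("id", "prompt", "rubric"):
            if field not in case:
                errors.append(f"case {i}: missing field: {field}")
        threshold = case.get("threshold")
        if threshold is not None and not 0.0 <= threshold <= 1.0:
            errors.append(f"case {i}: threshold must be between 0 and 1")
    return errors

print(validate(eval_config))  # [] when the config is well-formed
```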