MCP server by bcfmtolgahan
Prometheus MCP Server
A production-grade Model Context Protocol (MCP) server for Prometheus, enabling AI agents to query metrics, analyze alerts, and perform SRE operations.
Features
- 38 MCP Tools: Complete Prometheus API coverage plus intelligent analysis tools
- SRE Golden Signals: Built-in tools for error rate, latency, throughput, and saturation
- Intelligent Analysis: Anomaly detection, capacity forecasting, metric correlation
- PromQL Helpers: Query validation, explanation, suggestions, and optimization
- Alertmanager Integration: Manage alerts and silences
- Multiple Transports: stdio (Claude Desktop), HTTP/SSE (Kubernetes, AWS)
- Production Ready: Retries, connection pooling, timeouts, TLS support
- Kubernetes Native: Helm chart, Terraform module, HPA, PDB
Quick Start
Installation
pip install prometheus-mcp-server
Running with Claude Desktop
Add to your Claude Desktop configuration (~/.config/claude/claude_desktop_config.json):
{
  "mcpServers": {
    "prometheus": {
      "command": "prometheus-mcp-server",
      "args": ["--url", "http://localhost:9090"]
    }
  }
}
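If the configuration file already lists other MCP servers, the entry above has to be merged in rather than pasted over them. The following is a minimal sketch of one way to do that with the Python standard library. The helper name `add_prometheus_server` is hypothetical and is not part of this project; only the JSON shape comes from the snippet above.

```python
import json
from pathlib import Path


def add_prometheus_server(config_path: Path, prometheus_url: str) -> dict:
    """Merge a "prometheus" entry into mcpServers, keeping other entries.

    Hypothetical helper for illustration; not shipped with the server.
    """
    config = {}
    if config_path.exists():
        # Treat an empty file the same as an empty JSON object.
        config = json.loads(config_path.read_text(encoding="utf-8") or "{}")

    servers = config.setdefault("mcpServers", {})
    servers["prometheus"] = {
        "command": "prometheus-mcp-server",
        "args": ["--url", prometheus_url],
    }

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling `add_prometheus_server(Path.home() / ".config/claude/claude_desktop_config.json", "http://localhost:9090")` would rewrite the file at the path mentioned above; back the file up first, since the sketch overwrites it in place.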
Running as HTTP Server
# Run with HTTP transport for Kubernetes/AWS deployment
prometheus-mcp-server --transport http --port 8000
# Set the Prometheus URL through the environment instead of --url
export PROMETHEUS_MCP_URL=http://prometheus:9090
prometheus-mcp-server --transport http
Docker
docker run -p 8000:8000 \
-e PROMETHEUS_MCP_URL=http://prometheus:9090 \
ghcr.io/bcfmtolgahan/prometheus-mcp-server:latest
Available Tools (38 Total)
Core Query Tools (3)
| Tool | Description |
|------|-------------|
| prometheus_query | Execute instant PromQL query |
| prometheus_query_range | Execute range PromQL query over time period |
| prometheus_query_exemplars | Query exemplars for tracing correlation |
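For orientation, instant and range queries in Prometheus are served by the documented HTTP API endpoints `/api/v1/query` and `/api/v1/query_range`. The sketch below only builds the request URLs and makes no network call. It assumes, without confirmation from this README, that the query tools wrap these endpoints; the function names are hypothetical.

```python
from urllib.parse import urlencode


def instant_query_url(base_url: str, query: str) -> str:
    """URL for an instant query against the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': query})}"


def range_query_url(base_url: str, query: str, start: int, end: int, step: str) -> str:
    """URL for a range query; start/end are Unix timestamps, step is e.g. '15s'."""
    params = {"query": query, "start": start, "end": end, "step": step}
    return f"{base_url.rstrip('/')}/api/v1/query_range?{urlencode(params)}"


if __name__ == "__main__":
    print(instant_query_url("http://localhost:9090", "up"))
    print(range_query_url("http://localhost:9090", "up", 1700000000, 1700003600, "15s"))
```

`urlencode` percent-encodes PromQL characters such as brackets and parentheses, which is why the query string is never concatenated by hand here.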
Alert & Rule Tools (2)
| Tool | Description |
|------|-------------|
| prometheus_get_alerts | Get active alerts (firing/pending) |
| prometheus_get_rules | Get alerting and recording rules |
Target Tools (2)
| Tool | Description |
|------|-------------|
| prometheus_get_targets | Get scrape targets and health status |
| prometheus_get_target_metadata | Get metadata about target metrics |
Metadata Tools (5)
| Tool | Description |
|------|-------------|
| prometheus_get_metric_names | List all metric names |
| prometheus_get_label_names | Get all label names |
| prometheus_get_label_values | Get values for a specific label |
| prometheus_get_metric_metadata | Get metric type, help, unit |
| prometheus_find_series | Find time series matching selectors |
Status & Health Tools (3)
| Tool | Description |
|------|-------------|
| prometheus_health_check | Check Prometheus health and readiness |
| prometheus_get_status | Get server status (config, flags, runtime, TSDB) |
| prometheus_get_alertmanagers | Get discovered Alertmanager instances |
SRE Golden Signals Tools (6)
| Tool | Description |
|------|-------------|
| prometheus_get_error_rate | Calculate error rate percentage |
| prometheus_get_latency_percentiles | Get p50, p90, p99 latencies |
| prometheus_get_throughput | Calculate requests per second |
| prometheus_get_saturation | Get CPU/memory/disk saturation |
| prometheus_calculate_sli | Calculate SLI and compare to SLO |
| prometheus_check_error_budget | Check error budget consumption |
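The arithmetic behind an SLI and an error budget is short enough to show directly. This is a sketch of the standard definitions, assuming a request-based SLI (good events divided by total events). It is not taken from the server's source, and the function and field names are hypothetical.

```python
def error_budget_report(good_events: float, total_events: float, slo_target: float) -> dict:
    """Compare a request-based SLI to an SLO and report error budget usage.

    slo_target is a fraction, e.g. 0.999 for "99.9% of requests succeed".
    """
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")

    sli = good_events / total_events
    allowed_error = 1 - slo_target   # the error budget as a fraction of requests
    actual_error = 1 - sli           # the observed failure fraction
    consumed = actual_error / allowed_error

    return {
        "sli": sli,
        "slo_met": sli >= slo_target,
        "budget_consumed": consumed,        # 1.0 means the budget is exhausted
        "budget_remaining": 1 - consumed,   # negative once the SLO is violated
    }
```

With a 99.9% SLO, 100,000 requests allow 100 failures; 50 observed failures therefore consume about half of the budget.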
Analysis Tools (5)
| Tool | Description |
|------|-------------|
| prometheus_analyze_service | Comprehensive service health analysis |
| prometheus_find_anomalies | Statistical anomaly detection |
| prometheus_compare_periods | Compare metrics between time periods |
| prometheus_capacity_forecast | Predict future capacity needs |
| prometheus_find_correlations | Find correlations between metrics |
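One common reading of "statistical anomaly detection" is a z-score test: flag samples that lie more than a chosen number of standard deviations from the mean. The sketch below illustrates that technique on a plain list of sample values. Whether `prometheus_find_anomalies` works this way is an assumption; the `threshold_stddev` parameter name is borrowed from the usage example in this README, and the function name is hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def find_anomalies(values: Sequence[float], threshold_stddev: float = 2.0) -> List[Tuple[int, float]]:
    """Return (index, value) pairs whose z-score exceeds threshold_stddev."""
    if len(values) < 2:
        return []

    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation of the window
    if sigma == 0:
        return []  # a constant series has no outliers

    return [
        (i, v)
        for i, v in enumerate(values)
        if abs(v - mu) / sigma > threshold_stddev
    ]


if __name__ == "__main__":
    samples = [10, 11, 9, 10, 12, 10, 11, 50]
    print(find_anomalies(samples))
```

A single large spike inflates the standard deviation it is measured against, so short windows with extreme outliers may need a lower threshold or a robust statistic such as the median absolute deviation.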
PromQL Helper Tools (4)
| Tool | Description |
|------|-------------|
| prometheus_validate_query | Validate PromQL syntax |
| prometheus_explain_query | Explain what a query does |
| prometheus_suggest_query | Get query suggestions for use cases |
| prometheus_optimize_query | Get query optimization tips |
Alertmanager Tools (4)
| Tool | Description |
|------|-------------|
| alertmanager_get_alerts | Get alerts from Alertmanager |
| alertmanager_get_silences | List all silences |
| alertmanager_create_silence | Create a new silence |
| alertmanager_delete_silence | Delete an existing silence |
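Alertmanager's v2 API accepts a silence as a JSON object with `matchers`, `startsAt`, `endsAt`, `createdBy`, and `comment`. The sketch below shows how a comma-separated matcher string and a short duration such as `2h` could be turned into that payload. The parsing rules, the default `created_by` value, and the function names are assumptions for illustration, not the server's actual behavior; only equality matchers are handled.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_duration(text: str) -> timedelta:
    """Parse a simple duration like '30m' or '2h' (single unit only)."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if not match:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(**{_UNITS[match.group(2)]: int(match.group(1))})


def build_silence(
    matchers: str,
    duration: str,
    comment: str,
    created_by: str = "prometheus-mcp-server",  # assumed default, not from the README
    now: Optional[datetime] = None,
) -> dict:
    """Build a silence payload in the shape the Alertmanager v2 API expects."""
    start = now or datetime.now(timezone.utc)
    parsed = []
    for pair in matchers.split(","):
        name, sep, value = pair.partition("=")
        if not sep or not name.strip():
            raise ValueError(f"invalid matcher: {pair!r}")
        parsed.append(
            {"name": name.strip(), "value": value.strip(), "isRegex": False, "isEqual": True}
        )
    return {
        "matchers": parsed,
        "startsAt": start.isoformat(),
        "endsAt": (start + parse_duration(duration)).isoformat(),
        "createdBy": created_by,
        "comment": comment,
    }
```

The resulting dictionary can be serialized with `json.dumps` and sent as the body of a `POST` to Alertmanager's `/api/v2/silences` endpoint.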
Dangerous/Admin Tools (4)
| Tool | Description |
|------|-------------|
| prometheus_delete_series | Delete time series data |
| prometheus_clean_tombstones | Clean deleted data from disk |
| prometheus_create_snapshot | Create TSDB snapshot |
| prometheus_reload_config | Reload Prometheus configuration |
Note: Dangerous tools are disabled by default. Enable them with the --enable-dangerous-tools flag.
Configuration
Environment Variables
# Prometheus Connection
PROMETHEUS_MCP_URL=http://localhost:9090
# Transport
PROMETHEUS_MCP_TRANSPORT=stdio # stdio, http, sse
PROMETHEUS_MCP_HOST=0.0.0.0
PROMETHEUS_MCP_PORT=8000
# Authentication
PROMETHEUS_MCP_AUTH_TYPE=none # none, basic, bearer
PROMETHEUS_MCP_AUTH_USERNAME=admin
PROMETHEUS_MCP_AUTH_PASSWORD=secret
# Features
PROMETHEUS_MCP_ENABLE_DANGEROUS_TOOLS=false
# Logging
PROMETHEUS_MCP_LOG_LEVEL=INFO
PROMETHEUS_MCP_LOG_FORMAT=json
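To make the variable names and accepted values concrete, here is a sketch of a loader that reads a subset of these variables into a typed object. It is not the server's configuration code: the `Settings` class and `load_settings` function are hypothetical, and treating the example values above as defaults is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

PREFIX = "PROMETHEUS_MCP_"
TRANSPORTS = {"stdio", "http", "sse"}
AUTH_TYPES = {"none", "basic", "bearer"}


@dataclass(frozen=True)
class Settings:
    url: str
    transport: str
    host: str
    port: int
    auth_type: str
    enable_dangerous_tools: bool
    log_level: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read PROMETHEUS_MCP_* variables; defaults mirror the example values (assumed)."""
    env = os.environ if env is None else env

    def get(name: str, default: str) -> str:
        return env.get(PREFIX + name, default)

    transport = get("TRANSPORT", "stdio")
    if transport not in TRANSPORTS:
        raise ValueError(f"unknown transport: {transport!r}")

    auth_type = get("AUTH_TYPE", "none")
    if auth_type not in AUTH_TYPES:
        raise ValueError(f"unknown auth type: {auth_type!r}")

    return Settings(
        url=get("URL", "http://localhost:9090"),
        transport=transport,
        host=get("HOST", "0.0.0.0"),
        port=int(get("PORT", "8000")),
        auth_type=auth_type,
        # Accept a few common truthy spellings for the boolean flag.
        enable_dangerous_tools=get("ENABLE_DANGEROUS_TOOLS", "false").strip().lower()
        in {"1", "true", "yes"},
        log_level=get("LOG_LEVEL", "INFO").upper(),
    )
```

Passing a plain dictionary instead of relying on `os.environ` keeps the loader easy to test, for example `load_settings({"PROMETHEUS_MCP_TRANSPORT": "http"})`.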
CLI Options
prometheus-mcp-server --help
Options:
  --url TEXT                    Prometheus server URL
  --transport [stdio|http|sse]  Transport type
  --host TEXT                   Host for HTTP transport
  --port INTEGER                Port for HTTP transport
  --enable-dangerous-tools      Enable dangerous admin tools
  --log-level TEXT              Log level
Kubernetes Deployment
Using Helm
helm install prometheus-mcp ./deploy/helm/prometheus-mcp-server \
--set prometheus.url=http://prometheus-server:9090 \
--set replicaCount=2
Using Terraform
module "prometheus_mcp_server" {
  source         = "./modules/prometheus-mcp-server"
  prometheus_url = "http://prometheus:9090"
  namespace      = "mcp-servers"
  replica_count  = 2
}
Example Usage
With Claude Desktop
Ask Claude:
- "What alerts are currently firing?"
- "Show me the CPU usage for the last hour"
- "What's the error rate for the api-gateway service?"
- "Find anomalies in memory usage"
- "Calculate the SLI for our checkout service"
Programmatic Usage
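# Illustrative calls; run them inside an async function, since top-level
# await is not valid in a regular Python script.
# Query scrape target health (the `up` metric is 1 for reachable targets)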
result = await prometheus_query(query="up")
# Check error rate for a service
error_rate = await prometheus_get_error_rate(job="api-gateway", window="5m")
# Analyze service health
analysis = await prometheus_analyze_service(job="checkout-service")
# Get latency percentiles
latencies = await prometheus_get_latency_percentiles(job="api-gateway")
# Find anomalies
anomalies = await prometheus_find_anomalies(
    query="rate(http_requests_total[5m])",
    threshold_stddev=2.0,
)
# Create a silence for maintenance
silence = await alertmanager_create_silence(
    matchers="alertname=HighCPU,job=batch-processor",
    duration="2h",
    comment="Scheduled maintenance",
)
Architecture
┌─────────────────────────────────────────────────────────┐
│ MCP Client │
│ (Claude Desktop, AI Agent) │
└────────────────────────┬────────────────────────────────┘
│ MCP Protocol (stdio/HTTP)
▼
┌─────────────────────────────────────────────────────────┐
│ Prometheus MCP Server │
│ ┌──────────────────────────────────────────────────┐ │
│ │ 38 MCP Tools │ │
│ │ • Query & Metadata • SRE Golden Signals │ │
│ │ • Alerts & Rules • Analysis Tools │ │
│ │ • PromQL Helpers • Alertmanager │ │
│ └──────────────────────┬───────────────────────────┘ │
│ │ │
│ ┌──────────────────────▼───────────────────────────┐ │
│ │ Prometheus Client │ │
│ │ • Connection pooling • Retry logic │ │
│ │ • Authentication • TLS support │ │
│ └──────────────────────┬───────────────────────────┘ │
└─────────────────────────┼───────────────────────────────┘
│ HTTP/HTTPS
▼
┌─────────────────────────────────────────────────────────┐
│ Prometheus / Alertmanager │
└─────────────────────────────────────────────────────────┘
Development
Setup
git clone https://github.com/bcfmtolgahan/prometheus-mcp-server.git
cd prometheus-mcp-server
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
Running Tests
pytest # All tests
pytest --cov # With coverage
pytest tests/unit/ # Unit tests only
pytest tests/integration/ # Integration tests
Code Quality
ruff check . # Linting
ruff format . # Formatting
mypy src/ # Type checking
Contributing
Contributions are welcome! Please read our Contributing Guide.
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Acknowledgments
- Model Context Protocol by Anthropic
- Prometheus by the Cloud Native Computing Foundation