MCP Servers

A comprehensive directory of Model Context Protocol servers, frameworks, SDKs, and templates.

MCP Observability
By @zavora-ai

Observability MCP — logs, metrics, traces, alerts, incidents, SLOs, dashboards, runbooks for AI agents

Created 5/25/2026
Updated about 3 hours ago

Observability MCP Server


Full-stack observability for AI agents — logs, metrics, distributed traces, alerts, incidents, SLOs, dashboards, service maps, and runbooks. 28 tools for debugging, monitoring, and incident response.

Architecture

[Figure: MCP Observability Architecture]

Tools (28)

Logs (4)

| Tool | Purpose |
|------|---------|
| query_logs | Search logs by query, time, service, level |
| get_log_stats | Log volume and error rate over time |
| get_errors | Recent errors with stack traces |
| tail_logs | Live tail (last 50 entries) |

Metrics (4)

| Tool | Purpose |
|------|---------|
| query_metric | Query metric time-series (CPU, latency, etc.) |
| list_metrics | Available metrics for a service |
| get_system_health | Current CPU/memory/disk across services |
| compare_metrics | Compare metric across services or periods |

Traces (4)

| Tool | Purpose |
|------|---------|
| search_traces | Find traces by service, duration, status |
| get_trace | Full trace with all spans and timings |
| get_service_map | Service dependency graph with latencies |
| get_latency_breakdown | p50/p95/p99 by operation |
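The `get_latency_breakdown` tool reports p50/p95/p99 per operation. As an illustration of what those figures mean, here is a minimal Python sketch that computes nearest-rank percentiles from raw span durations. The `percentile` and `latency_breakdown` helpers and the `(operation, duration_ms)` input shape are hypothetical; the server's backends may well use histograms or interpolation instead.

```python
def percentile(sorted_values, p):
    """Nearest-rank percentile for an integer p in 1..100.

    Returns the smallest sample such that at least p% of samples are <= it.
    """
    if not sorted_values:
        raise ValueError("no samples")
    # Integer ceiling of p * n / 100, which avoids floating-point rounding.
    rank = max(1, (p * len(sorted_values) + 99) // 100)
    return sorted_values[rank - 1]


def latency_breakdown(spans):
    """Group (operation, duration_ms) pairs and report p50/p95/p99 per operation."""
    by_operation = {}
    for operation, duration_ms in spans:
        by_operation.setdefault(operation, []).append(duration_ms)
    return {
        operation: {f"p{p}": percentile(sorted(durations), p) for p in (50, 95, 99)}
        for operation, durations in by_operation.items()
    }


if __name__ == "__main__":
    # Hypothetical samples: 100 spans lasting 1..100 ms for one operation.
    samples = [("GET /orders", ms) for ms in range(1, 101)]
    print(latency_breakdown(samples))
```

A jump in p99 while p50 stays flat, as in the debugging example later in this README, usually points at a slow minority of requests rather than a uniform slowdown.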

Alerts (4)

| Tool | Purpose |
|------|---------|
| list_alerts | Active alerts (filter: status, severity, service) |
| get_alert | Alert details + history + related metrics |
| create_alert | Create alert rule (threshold/anomaly) |
| acknowledge_alert | Ack a firing alert |

Incidents (4)

| Tool | Purpose |
|------|---------|
| list_incidents | Open/investigating/resolved incidents |
| get_incident | Timeline, affected services, responders |
| create_incident | Declare a new incident |
| update_incident | Update status or add resolution |

SLOs (3)

| Tool | Purpose |
|------|---------|
| list_slos | SLOs with burn rate and error budget |
| get_slo | SLO target vs current value |
| forecast_slo | When will error budget run out? |

Dashboards & Runbooks (3)

| Tool | Purpose |
|------|---------|
| list_dashboards | Available dashboards |
| get_dashboard | Dashboard with panels and values |
| get_runbook | Find runbook for alert/service issue |

Services (2)

| Tool | Purpose |
|------|---------|
| list_services | All monitored services + health |
| get_service | Service overview: health, deps, alerts, SLOs |

Installation

cargo install mcp-observability

Configuration

| Backend | Environment Variables | Provides |
|---------|-----------------------|----------|
| Datadog | DATADOG_API_KEY + DATADOG_APP_KEY | Logs, metrics, traces, monitors, dashboards |
| Grafana Cloud | GRAFANA_URL + GRAFANA_API_TOKEN | Loki (logs), Prometheus (metrics), Tempo (traces) |
| New Relic | NEWRELIC_API_KEY + NEWRELIC_ACCOUNT_ID | APM, logs, dashboards, alerts |
| Custom API | OBSERVABILITY_API_URL + OBSERVABILITY_API_KEY | Your own monitoring stack |
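Each backend in the table needs both of its environment variables. A minimal Python sketch of a pre-launch check follows; the `BACKENDS` mapping mirrors the table, while `configured_backends` and `missing_variables` are hypothetical helpers. How the server itself chooses a backend when several are configured is not documented here, so this only reports what is set.

```python
import os

# Required environment variables per backend, copied from the configuration table.
BACKENDS = {
    "Datadog": ("DATADOG_API_KEY", "DATADOG_APP_KEY"),
    "Grafana Cloud": ("GRAFANA_URL", "GRAFANA_API_TOKEN"),
    "New Relic": ("NEWRELIC_API_KEY", "NEWRELIC_ACCOUNT_ID"),
    "Custom API": ("OBSERVABILITY_API_URL", "OBSERVABILITY_API_KEY"),
}


def configured_backends(env=None):
    """Return the backends whose required variables are all set and non-empty."""
    env = os.environ if env is None else env
    return [
        name
        for name, required in BACKENDS.items()
        if all(env.get(var) for var in required)
    ]


def missing_variables(backend, env=None):
    """Return the variables still missing for one backend."""
    env = os.environ if env is None else env
    return [var for var in BACKENDS[backend] if not env.get(var)]


if __name__ == "__main__":
    ready = configured_backends()
    print("configured backends:", ready or "none")
    for name in BACKENDS:
        if name not in ready:
            print(f"{name}: missing {missing_variables(name)}")
```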

Client Configuration

{
  "mcpServers": {
    "observability": {
      "command": "mcp-observability",
      "args": [],
      "env": {
        "DATADOG_API_KEY": "your-api-key",
        "DATADOG_APP_KEY": "your-app-key"
      }
    }
  }
}
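The same entry shape should apply to the other backends, with only the `env` block changing. The sketch below generates the entry for Grafana Cloud in Python; `client_config` is a hypothetical helper, and the URL and token values are placeholders rather than real endpoints.

```python
import json


def client_config(env, server_name="observability"):
    """Build an mcpServers entry that launches the mcp-observability binary."""
    return {
        "mcpServers": {
            server_name: {
                "command": "mcp-observability",
                "args": [],
                "env": dict(env),  # copy so the caller's mapping is not shared
            }
        }
    }


if __name__ == "__main__":
    grafana = client_config(
        {
            # Placeholder values; substitute your own stack URL and token.
            "GRAFANA_URL": "https://your-stack.example.com",
            "GRAFANA_API_TOKEN": "your-api-token",
        }
    )
    print(json.dumps(grafana, indent=2))
```

Generating the file this way keeps credentials out of version control if the values are read from a secret store at build time instead of being hard-coded.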

Usage Examples

Debug a production issue

"Why is the API slow?"
→ get_system_health() — CPU normal, memory normal
→ get_latency_breakdown(service="api-gateway") — p99 jumped from 200ms to 2s
→ search_traces(service="api-gateway", min_duration_ms=1000) — find slow traces
→ get_trace(id="trace-abc") — database span taking 1.8s
→ query_logs(query="slow query", service="postgres") — found the culprit
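On the wire, each step above would be an MCP `tools/call` request, which is a JSON-RPC 2.0 message. The Python sketch below builds that sequence. The argument names are taken from the example calls in this README; the authoritative input schemas are whatever the server reports via `tools/list`, so treat the arguments as assumptions. `tool_call` is a hypothetical helper.

```python
import itertools
import json

# Monotonic request ids, as JSON-RPC requires a unique id per in-flight request.
_ids = itertools.count(1)


def tool_call(name, **arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# The investigation above, expressed as the requests a client would send in order.
WORKFLOW = [
    tool_call("get_system_health"),
    tool_call("get_latency_breakdown", service="api-gateway"),
    tool_call("search_traces", service="api-gateway", min_duration_ms=1000),
    tool_call("get_trace", id="trace-abc"),
    tool_call("query_logs", query="slow query", service="postgres"),
]

if __name__ == "__main__":
    for request in WORKFLOW:
        print(json.dumps(request))
```

In practice an agent picks each next call based on the previous result, so the fixed list here only records the order of this particular investigation.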

Incident response

"There's a spike in errors"
→ list_alerts(status="firing") — "Error rate > 5% on payment-service"
→ get_errors(service="payment-service") — NullPointerException in checkout
→ create_incident(title="Payment failures", severity="high", service="payment-service")
→ get_runbook(service="payment-service", alert_type="error_rate")
→ acknowledge_alert(alert_id="alert-123", message="Investigating")

SLO monitoring

"Are we meeting our SLOs?"
→ list_slos() — API availability at 99.92% (target 99.9%) ✅, Latency SLO burning fast ⚠️
→ forecast_slo(id="slo-latency") — "Error budget exhausted in 3 days at current rate"
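The availability figures above can be turned into an error budget by hand. The Python sketch below works through the arithmetic for a 99.9% target at 99.92% current availability, and adds a simple linear forecast. Both helpers are hypothetical, and linear extrapolation is an assumption made for illustration; how `forecast_slo` actually forecasts is not described in this README.

```python
def error_budget_remaining(target, current):
    """Fraction of the error budget still unspent (1.0 = untouched, 0.0 = exhausted).

    `target` and `current` are availability fractions, e.g. 0.999 for 99.9%.
    """
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    budget = 1.0 - target   # allowed failure fraction
    spent = 1.0 - current   # observed failure fraction
    return 1.0 - spent / budget


def days_until_exhausted(remaining, burn_per_day):
    """Linear forecast: days left if `burn_per_day` of the whole budget is spent daily."""
    if burn_per_day <= 0:
        return float("inf")
    return remaining / burn_per_day


if __name__ == "__main__":
    remaining = error_budget_remaining(target=0.999, current=0.9992)
    print(f"budget remaining: {remaining:.0%}")
    # Hypothetical burn rate: 10% of the total budget consumed per day.
    print(f"days left: {days_until_exhausted(remaining, 0.10):.1f}")
```

With these inputs the allowed failure rate is 0.1% and the observed rate is 0.08%, so about 80% of the budget is spent and about 20% remains, even though the SLO is currently met.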

MCP Server Manifest

server_id = "mcp_observability"
display_name = "Observability"
version = "1.0.0"
domain = "infrastructure"
risk_level = "low"
writes_allowed = "gated"

License

Apache-2.0


Part of the ADK-Rust Enterprise MCP server ecosystem.

Built with ❤️ by Zavora AI

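Quick Setup
Installation guide for this server

Install command (package not published)

git clone https://github.com/zavora-ai/mcp-observability
Manual installation: see the README for detailed setup instructions and any additional required dependencies.

Cursor configuration (mcp.json)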

{
  "mcpServers": {
    "zavora-ai-mcp-observability": {
      "command": "mcp-observability",
      "args": [],
      "env": {
        "DATADOG_API_KEY": "your-api-key",
        "DATADOG_APP_KEY": "your-app-key"
      }
    }
  }
}