Gpu Cluster Operator MCP Agent
Author: @SkyPhy
An autonomous Site Reliability Engineer (SRE) agent powered by the Model Context Protocol (MCP) and Gemini. This agent connects to Cherry Studio (or other MCP clients), diagnoses Linux server issues, performs network scans, and executes remediation steps using an OODA Loop (Observe, Orient, Decide, Act) strategy.
Created on 1/6/2026
Updated 2 days ago
README
Repository documentation and setup instructions
🤖 Linux SRE MCP Agent
An autonomous GPU cluster Site Reliability Engineer (SRE) agent powered by the Model Context Protocol (MCP) and Gemini. This agent connects to your MCP clients (such as Cherry Studio), diagnoses Linux server issues, performs network scans, and executes remediation steps using an OODA Loop (Observe, Orient, Decide, Act) strategy.
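As a rough illustration of the OODA strategy, the cycle can be sketched as four pluggable steps that repeat until no further action is needed. This is a minimal sketch, not the agent's actual code: the `OODALoop` class, its field names, and the toy disk scenario are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class OODALoop:
    """Hypothetical skeleton of an Observe-Orient-Decide-Act cycle."""
    observe: Callable[[], dict]             # collect raw system state
    orient: Callable[[dict], str]           # interpret the state (e.g. via an LLM)
    decide: Callable[[str], Optional[str]]  # pick an action, or None when healthy
    act: Callable[[str], None]              # carry out the chosen action
    history: list = field(default_factory=list)

    def run(self, max_cycles: int = 5) -> list:
        # Repeat the cycle until nothing needs fixing or the budget runs out.
        for _ in range(max_cycles):
            state = self.observe()
            diagnosis = self.orient(state)
            action = self.decide(diagnosis)
            if action is None:
                break
            self.act(action)
            self.history.append((diagnosis, action))
        return self.history


# Toy scenario (made up): a full disk that one cleanup action fixes.
disk = {"used_pct": 95}
loop = OODALoop(
    observe=lambda: dict(disk),
    orient=lambda s: "disk_full" if s["used_pct"] > 90 else "healthy",
    decide=lambda d: "clean_tmp" if d == "disk_full" else None,
    act=lambda a: disk.update(used_pct=40),
)
result = loop.run()
```

The `max_cycles` cap is a design choice of this sketch: it keeps a loop that never reaches a healthy state from running forever.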
✨ Features
- 🧠 Batch Diagnostics: Uses Gemini to analyze multiple system states (Processes, Logs, Network) in a single pass.
- 🚀 SSH Multiplexing: Implements ControlMaster for millisecond-latency executions.
- 🛡️ Hybrid Execution: Automatically detects if the target is Local or Remote.
- 🔑 Key-Based Auth: Secure, password-less operation using SSH keys and sudo NOPASSWD.
- 🕵️ Network Awareness: Capable of scanning local subnets.
- 🔄 Self-Healing: Detects errors and autonomously digs for root causes.
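The SSH Multiplexing, Hybrid Execution, and Key-Based Auth features could be combined roughly as follows. This is a sketch under stated assumptions, not the repository's implementation: `is_local`, `build_command`, the control-socket path, and the 60-second persist time are hypothetical. The `ControlMaster`, `ControlPath`, `ControlPersist`, and `BatchMode` options are standard OpenSSH client options, and `sudo -n` is sudo's non-interactive flag.

```python
import socket
from typing import List

# Names treated as "this machine" (an assumption of this sketch).
LOCAL_NAMES = {"localhost", "127.0.0.1", "::1"}


def is_local(target: str) -> bool:
    """Return True when the target (optionally user@host) is this machine."""
    host = target.rsplit("@", 1)[-1]
    return host in LOCAL_NAMES or host == socket.gethostname()


def build_command(target: str, command: str, sudo: bool = False,
                  persist: str = "60s") -> List[str]:
    """Build an argv list that runs `command` locally or over multiplexed SSH."""
    if sudo:
        # -n makes sudo fail instead of prompting, which suits NOPASSWD setups.
        command = f"sudo -n {command}"
    if is_local(target):
        return ["sh", "-c", command]
    return [
        "ssh",
        "-o", "BatchMode=yes",            # key-based auth only, never prompt
        "-o", "ControlMaster=auto",       # reuse or create a master connection
        "-o", "ControlPath=~/.ssh/cm-%r@%h:%p",
        "-o", f"ControlPersist={persist}",  # keep the master open after exit
        target,
        command,
    ]


local_cmd = build_command("localhost", "uptime")
remote_cmd = build_command("ops@10.0.0.5", "nvidia-smi", sudo=True)
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`. Only the first remote call pays the full connection handshake; later calls reuse the master socket while it persists.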
🚀 Installation
- Install Dependencies
  pip install -r requirements.txt
- Configure Environment
  cp .env.example .env # Edit .env with your API keys
- Run
  python src/server.py
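For readers unfamiliar with `.env` files, the snippet below shows one way such a file might be read at startup using only the standard library. It is an illustrative sketch: the README does not say how `src/server.py` loads its configuration, and `parse_env`, `load_env`, and the `GEMINI_API_KEY` variable name are assumptions.

```python
import os
from pathlib import Path


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blanks, comments, and malformed lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_env(path: str = ".env") -> dict:
    """Load a .env file into os.environ without overriding existing variables."""
    env_file = Path(path)
    if not env_file.exists():
        return {}
    values = parse_env(env_file.read_text(encoding="utf-8"))
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values


# Hypothetical file contents; the real variable names live in .env.example.
sample = parse_env("# API keys\nGEMINI_API_KEY='abc123'\n\nnot a pair\n")
```

Using `setdefault` means a variable already exported in the shell takes precedence over the file, which is a common convention rather than something the README specifies.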
Quick Setup
Installation guide for this server
Install the package (if needed)
uvx gpu-cluster-operator-mcp-agent
Cursor configuration (mcp.json)
{
  "mcpServers": {
    "skyphy-gpu-cluster-operator-mcp-agent": {
      "command": "uvx",
      "args": [
        "gpu-cluster-operator-mcp-agent"
      ]
    }
  }
}