Hadoop Cluster MCP Server
An MCP (Model Context Protocol) server that connects Claude to your Hadoop cluster over SSH. Manage HDFS, YARN, Kafka, Flink, ZooKeeper, and more, all through natural language.
Features
- 📂 HDFS: ls, du, cat, stat
- 🧶 YARN: list/kill apps, view logs, cluster metrics, node status
- 📨 Kafka: topics, consumer groups, lag monitoring
- 🌊 Flink: list jobs running on YARN
- 🐘 ZooKeeper: node status (leader/follower)
- 🖥️ Node Ops: jps, CPU/memory/disk, read files, tail logs
- 🔧 General: execute any shell command on any node
Quick Start
1. Install dependencies
pip install -r requirements.txt
2. Configure your cluster
cp config.example.yaml config.yaml
Edit config.yaml with your cluster details (hostnames, IPs, credentials).
⚠️ config.yaml contains credentials and is git-ignored. Never commit it.
3. Add to Claude Desktop
Add to your Claude Desktop MCP config (claude_desktop_config.json):
{
  "mcpServers": {
    "cluster": {
      "command": "python",
      "args": ["path/to/server.py"]
    }
  }
}
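If claude_desktop_config.json already lists other MCP servers, the new entry has to be merged in rather than pasted over the file. The following is a minimal sketch of that merge, not part of this project: the helper name `add_mcp_server` is hypothetical, and the file path is passed in by the caller because its location depends on your platform.

```python
import json
from pathlib import Path


def add_mcp_server(config_path: Path, name: str, server_script: str) -> dict:
    """Add or update one entry under "mcpServers", keeping existing entries.

    Hypothetical helper for illustration; it writes the same entry shown
    in the JSON snippet of this step.
    """
    # Start from the existing file if there is one, otherwise from an empty config.
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}

    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": "python", "args": [server_script]}

    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, `add_mcp_server(Path("claude_desktop_config.json"), "cluster", "path/to/server.py")` leaves any other servers in the file untouched.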
4. Use it
Talk to Claude:
- "Show me HDFS usage under /data"
- "List running YARN applications"
- "Check Kafka consumer lag for group my-consumer"
- "What's the CPU and memory usage on worker3?"
- "Kill YARN application application_1234567890_0001"
Configuration
config.yaml Structure
default_node: "master"            # Default node for commands
kafka_bootstrap: "broker1:9092"   # Kafka bootstrap server
yarn_rm_host: "master"            # YARN ResourceManager host
clusters:
  - name: "production"
    nodes:
      - name: master                  # Node name (used in commands)
        host: "192.168.1.10"          # IP or hostname
        user: "hadoop"                # SSH user
        password: "xxx"               # Option A: password
        # key_file: "~/.ssh/id_rsa"   # Option B: SSH key (recommended)
        # port: 22                    # SSH port (default: 22)
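The nesting of clusters and nodes may be easier to follow once the file is parsed. The sketch below shows the example config as the Python dictionary a YAML parser such as PyYAML's `yaml.safe_load` would return, and one way a node could be looked up by name. The `find_node` helper is hypothetical, and falling back to `default_node` when no name is given is an assumption about how the server behaves.

```python
from typing import Any, Optional

# Parsed form of the example config.yaml, written out as a dict so the
# sketch runs without a YAML library.
EXAMPLE_CONFIG: dict[str, Any] = {
    "default_node": "master",
    "kafka_bootstrap": "broker1:9092",
    "yarn_rm_host": "master",
    "clusters": [
        {
            "name": "production",
            "nodes": [
                {"name": "master", "host": "192.168.1.10",
                 "user": "hadoop", "password": "xxx"},
            ],
        }
    ],
}


def find_node(config: dict[str, Any], name: Optional[str] = None) -> dict[str, Any]:
    """Return the node entry called `name`, or the default node if omitted.

    Hypothetical helper: the real lookup in server.py may differ.
    """
    target = name or config["default_node"]
    for cluster in config.get("clusters", []):
        for node in cluster.get("nodes", []):
            if node["name"] == target:
                return node
    raise KeyError(f"node {target!r} not found in config")
```

Calling `find_node(EXAMPLE_CONFIG)` returns the `master` entry, since that is the `default_node`.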
Authentication
Two methods are supported:
- Password: set the password field
- SSH key: set the key_file field (recommended for production)
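To illustrate how the two options might translate into an SSH connection, here is a sketch that turns a node entry into keyword arguments matching Paramiko's `SSHClient.connect` (`hostname`, `port`, `username`, `password`, `key_filename`). This README does not say which SSH library the server uses, so Paramiko is an assumption, as are the helper name `ssh_connect_kwargs` and the choice to prefer the key when both fields are set.

```python
import os
from typing import Any


def ssh_connect_kwargs(node: dict[str, Any]) -> dict[str, Any]:
    """Build connection arguments from one node entry in config.yaml.

    Hypothetical helper; assumes key_file takes precedence over password.
    """
    kwargs: dict[str, Any] = {
        "hostname": node["host"],
        "port": int(node.get("port", 22)),  # documented default is 22
        "username": node["user"],
    }
    if node.get("key_file"):
        # Expand "~" so "~/.ssh/id_rsa" resolves to the user's home directory.
        kwargs["key_filename"] = os.path.expanduser(node["key_file"])
    elif node.get("password"):
        kwargs["password"] = node["password"]
    else:
        raise ValueError(f"node {node.get('name')!r} has neither key_file nor password")
    return kwargs
```

With Paramiko installed, the result could be passed as `client.connect(**ssh_connect_kwargs(node))`.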
Environment Variable
Override config path:
export CLUSTER_MCP_CONFIG=/path/to/my-config.yaml
python server.py
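The override presumably works along these lines: read `CLUSTER_MCP_CONFIG` if it is set, otherwise fall back to a default file. This is a sketch only; the function name `resolve_config_path` is hypothetical, and the fallback of `config.yaml` relative to the working directory is an assumption based on the Quick Start step.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_config_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the config path, honoring the CLUSTER_MCP_CONFIG override.

    `env` defaults to the process environment; passing a mapping makes
    the function easy to test.
    """
    env = os.environ if env is None else env
    # An unset or empty variable falls back to the assumed default name.
    return Path(env.get("CLUSTER_MCP_CONFIG") or "config.yaml")
```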
Available Tools
| Tool | Description |
|------|-------------|
| exec_command | Execute any shell command on a node |
| hdfs_ls | List HDFS directory contents |
| hdfs_du | Check HDFS space usage |
| hdfs_cat | View HDFS file content (size-limited) |
| hdfs_stat | View HDFS file/directory metadata |
| yarn_apps | List YARN applications by state |
| yarn_app_detail | View application details |
| yarn_app_log | Get application logs |
| yarn_cluster_metrics | YARN cluster resource overview |
| yarn_nodes | List all YARN nodes |
| yarn_kill_app | Kill a YARN application |
| kafka_topics | List Kafka topics |
| kafka_topic_detail | View topic partitions and replicas |
| kafka_consumer_groups | List consumer groups |
| kafka_consumer_lag | Check consumer group lag |
| service_status | View Java processes (jps) |
| node_resources | Check CPU, memory, disk usage |
| cluster_overview | HDFS + YARN overview |
| flink_jobs | List Flink jobs on YARN |
| zk_status | Check ZooKeeper status |
| read_file | Read a file on a node |
| tail_log | Tail a log file |
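Since the server works over SSH, each tool most likely wraps a command-line call on the target node. The README does not show those commands, so the sketch below is an assumption: it maps two of the tools to their standard Hadoop CLI equivalents (`hdfs dfs -du -h` and `yarn application -kill`), with hypothetical function names, quoting of user-supplied paths, and a format check on the application ID.

```python
import re
import shlex

# YARN application IDs look like application_<cluster timestamp>_<sequence>.
_APP_ID = re.compile(r"application_\d+_\d+")


def hdfs_du_command(path: str) -> str:
    """Shell command a tool like hdfs_du might run (assumed mapping)."""
    # shlex.quote protects paths containing spaces or shell metacharacters.
    return f"hdfs dfs -du -h {shlex.quote(path)}"


def yarn_kill_command(app_id: str) -> str:
    """Shell command a tool like yarn_kill_app might run (assumed mapping)."""
    if not _APP_ID.fullmatch(app_id):
        raise ValueError(f"not a YARN application ID: {app_id!r}")
    return f"yarn application -kill {app_id}"
```

Validating the ID before building the kill command keeps a mistyped or malformed argument from reaching the shell.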
Security Notes
- config.yaml is in .gitignore, so credentials stay local
- SSH key authentication is recommended over passwords
- The exec_command tool can run arbitrary commands; use with caution in shared environments
- Consider restricting SSH users to minimal required permissions
License
MIT