MCP Servers

A comprehensive directory of Model Context Protocol servers, frameworks, SDKs, and templates.

System Perception MCP
By @haji-mi


Created 4/1/2026
Updated about 4 hours ago

System Perception MCP Server

English | 简体中文

A high-performance Model Context Protocol (MCP) server designed for AI Agents to perceive and control the Windows operating system with ultra-low latency and zero physical mouse/keyboard interference.

🌟 Core Features

  • Ultra-Low Latency Screen Perception: Bypasses traditional slow screenshot methods. Utilizes dxcam for direct DXGI VRAM capture and OpenCV for in-memory compression, delivering screen frames to the agent in roughly 120 ms.
  • Silent Background Control: Eliminates the fragile and disruptive nature of physical mouse/keyboard simulation. Uses win32api and uiautomation to send underlying system messages (PostMessage) and invoke UI elements silently.
  • UI Tree Parsing (get_ui_tree): Instantly reads the accessibility tree of standard Windows applications, bypassing the slow Vision-Language Model (VLM) coordinate calculation bottleneck.
  • Instant Execution (invoke_ui_element): Directly triggers standard OS elements (like desktop icons, buttons, and text fields) in less than a second based on UI definitions rather than screen coordinates.
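The capture path described above can be sketched in a few lines. This is an illustrative example, not the server's actual implementation: the function names and the downscale step are assumptions, while the `dxcam.create()` / `camera.grab()` calls and OpenCV's `cv2.imencode` are the libraries' real APIs. It requires Windows with the `dxcam` and `opencv-python` packages installed.

```python
# Illustrative sketch of DXGI VRAM capture plus in-memory JPEG compression.
# Requires Windows, dxcam, and opencv-python; names here are assumptions.
import sys


def scaled_size(width, height, max_dim=1280):
    """Pick downscale dimensions that preserve the aspect ratio."""
    if max(width, height) <= max_dim:
        return width, height
    scale = max_dim / max(width, height)
    return round(width * scale), round(height * scale)


def grab_compressed_frame(quality=70):
    """Capture one frame straight from the GPU and compress it in memory."""
    import cv2
    import dxcam

    camera = dxcam.create()   # bind to the primary display output
    frame = camera.grab()     # numpy array copied directly from VRAM
    if frame is None:         # dxcam returns None when nothing changed
        return None
    h, w = frame.shape[:2]
    frame = cv2.resize(frame, scaled_size(w, h))
    ok, buf = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, quality])
    return buf.tobytes() if ok else None


if __name__ == "__main__" and sys.platform == "win32":
    data = grab_compressed_frame()
    print(f"captured {len(data) if data else 0} bytes")
```

Compressing in memory (rather than writing a screenshot to disk) is what keeps the per-frame cost low enough for the latency budget quoted above.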

🛠️ Requirements

  • OS: Windows 10 / 11 (Requires DXGI and Windows UIAutomation APIs)
  • Python: 3.8+
  • Agent Harness: Any MCP-compatible client (e.g., Claude Desktop, DeerFlow)

📦 Installation

  1. Clone this repository:

    git clone <YOUR_GITHUB_REPO_URL>
    cd system-perception-mcp
    
  2. Install the required dependencies:

    pip install -r requirements.txt
    

🚀 Exposed Tools

Once connected to an MCP client, the following tools become available to the LLM/Agent:

  • get_gpu_frame(): Instantly captures the current screen from the GPU frame buffer.
  • get_ui_tree(): Scans and returns the current window's hierarchical UI structure.
  • invoke_ui_element(element_name/id): Directly interacts with a specific UI node without moving the physical cursor.
  • silent_mouse_click(x, y, hwnd): Sends a background click event to specific coordinates within a target window.
  • silent_keyboard_type(text, hwnd): Injects keystrokes directly into a background application's message queue.
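To illustrate how a tool like silent_mouse_click can act without touching the physical cursor, here is a minimal sketch of posting mouse messages to a window's queue. The helper names are hypothetical; the `PostMessage` call, the `WM_LBUTTONDOWN`/`WM_LBUTTONUP` constants, and the LPARAM coordinate packing are standard Win32 mechanics (via the pywin32 package on Windows).

```python
# Illustrative sketch: deliver a click to a background window via PostMessage,
# without moving the physical cursor. Windows-only; names are assumptions.
import sys

WM_LBUTTONDOWN = 0x0201
WM_LBUTTONUP = 0x0202
MK_LBUTTON = 0x0001


def make_lparam(x, y):
    """Pack client-area coordinates into a mouse message's LPARAM:
    low word = x, high word = y."""
    return ((y & 0xFFFF) << 16) | (x & 0xFFFF)


def silent_click(hwnd, x, y):
    """Post button-down/up messages straight into the target window's queue."""
    import win32api  # from pywin32; assumed available on Windows

    win32api.PostMessage(hwnd, WM_LBUTTONDOWN, MK_LBUTTON, make_lparam(x, y))
    win32api.PostMessage(hwnd, WM_LBUTTONUP, 0, make_lparam(x, y))


if __name__ == "__main__" and sys.platform == "win32":
    import win32gui

    hwnd = win32gui.FindWindow(None, "Untitled - Notepad")
    if hwnd:
        silent_click(hwnd, 50, 50)
```

Because the messages go directly to the target window's handle, the user's own cursor and keyboard focus are never hijacked while the agent works.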

💡 Why This Approach?

Traditional visual AI agents rely on taking screenshots, sending them to a VLM, waiting 2-4 seconds for coordinate calculation, and then physically moving the user's cursor. This is slow, fragile, and prevents the user from using their computer while the agent is working.

System Perception MCP solves this by fusing computer vision with native OS UI Automation, allowing the agent to "see" instantly and "act" invisibly.

📝 License

MIT License

Quick Setup
Installation guide for this server

Install the package (if needed)

    uvx -system-perception-mcp-

Cursor 配置 (mcp.json)

    {
      "mcpServers": {
        "haji-mi-system-perception-mcp": {
          "command": "uvx",
          "args": ["-system-perception-mcp-"]
        }
      }
    }