MCP server by sarathi-eng
YouTube Intelligence MCP
An autonomous Model Context Protocol (MCP) server that transforms YouTube tutorials into executable workflows and fully replicated projects. It uses a multi-agent system (Playwright, ChatGPT, and Gemini) to extract, analyze, and execute tutorial steps.
Features
- Automated Extraction: Pulls transcripts, metadata, links, and captures screenshots from YouTube videos.
- Deep Analysis: Uses Gemini Pro Vision (via Web UI) to analyze video frames and transcript context.
- Autonomous Execution: ChatGPT Orchestrator plans and executes terminal commands and file operations.
- Self-Healing: Robust browser management with CDP connection and automatic page recovery.
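The self-healing behaviour can be pictured as a retry loop that runs a recovery step between attempts. The sketch below is illustrative only and is not taken from the project's source: `withRecovery`, `action`, `recover`, and `maxAttempts` are hypothetical names, and the real server may recover pages differently.

```typescript
// Hypothetical helper (not the project's actual API): retry an async action,
// running a recovery callback between failed attempts.
async function withRecovery<T>(
  action: () => Promise<T>,
  recover: (error: unknown) => Promise<void>,
  maxAttempts: number = 3,
): Promise<T> {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await action();
    } catch (error) {
      lastError = error;
      // Only run recovery (e.g. reopening a crashed page) if another attempt remains.
      if (attempt < maxAttempts) {
        await recover(error);
      }
    }
  }
  // Every attempt failed: surface the most recent error to the caller.
  throw lastError;
}
```

In a browser-automation setting, `action` might wrap a page interaction and `recover` might reconnect to Chrome or reopen the page; both are assumptions made for illustration.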
Prerequisites
Before you begin, ensure you have the following installed:
- Node.js: v18.0.0 or higher
- npm: v9.0.0 or higher
- Google Chrome: A recent version (needed for CDP connection)
- Playwright Browsers: The server connects to Chrome over CDP, but Playwright's local browser drivers must still be installed (typically with npx playwright install).
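The version floors above can be checked with a small numeric comparison. This is a minimal sketch, not part of the project: `parseVersion` and `meetsMinimum` are hypothetical helpers, and they assume plain `major.minor.patch` strings without pre-release tags.

```typescript
// Hypothetical helpers for comparing dotted version strings numerically.
function parseVersion(version: string): number[] {
  // Accepts "v18.0.0" or "18.0.0"; parts that are not numbers count as 0.
  return version
    .replace(/^v/, "")
    .split(".")
    .map((part) => parseInt(part, 10) || 0);
}

function meetsMinimum(installed: string, minimum: string): boolean {
  const a = parseVersion(installed);
  const b = parseVersion(minimum);
  const length = Math.max(a.length, b.length);
  for (let i = 0; i < length; i++) {
    // Missing components are treated as 0, so "18" equals "18.0.0".
    const diff = (a[i] ?? 0) - (b[i] ?? 0);
    if (diff !== 0) return diff > 0;
  }
  return true; // Versions are equal.
}
```

In Node.js, the installed version could be passed as `process.versions.node` with `"18.0.0"` as the minimum. Comparing component by component avoids the string-comparison pitfall where `"10.2.4"` sorts before `"9.0.0"`.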
Installation
Follow these steps to set up the YouTube Intelligence MCP server on your machine:
1. Clone the Repository
git clone https://github.com/sarathi-eng/youtube-intelligence-mcp.git
cd youtube-intelligence-mcp
2. Install Dependencies
npm install
3. Build the Project
npm run build
Configuration
MCP Client Setup (e.g., Claude Desktop)
Option A: Using npx (Recommended for latest version)
Add the following to your claude_desktop_config.json:
{
  "mcpServers": {
    "youtube-intelligence": {
      "command": "npx",
      "args": ["-y", "youtube-intelligence-mcp"],
      "env": {
        "NODE_ENV": "production"
      }
    }
  }
}
Option B: Local Path
Add the following to your claude_desktop_config.json:
{
  "mcpServers": {
    "youtube-intelligence": {
      "command": "node",
      "args": ["/absolute/path/to/youtube-intelligence-mcp/build/index.js"],
      "env": {
        "NODE_ENV": "production"
      }
    }
  }
}
Note: Replace /absolute/path/to/ with the actual path on your system.
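If your claude_desktop_config.json already lists other servers, the new entry should be merged in rather than pasted over them. The following sketch shows one way to do that; `ServerEntry`, `ClientConfig`, `localEntry`, and `addServer` are hypothetical names introduced here, and `localEntry` assumes a POSIX-style absolute path.

```typescript
// Hypothetical types mirroring the config shape shown above.
interface ServerEntry {
  command: string;
  args: string[];
  env?: Record<string, string>;
}

interface ClientConfig {
  mcpServers?: Record<string, ServerEntry>;
  [key: string]: unknown;
}

// Build the Option B entry for a local checkout (assumes a POSIX-style path).
function localEntry(repoRoot: string): ServerEntry {
  const root = repoRoot.replace(/\/+$/, ""); // Drop trailing slashes.
  return {
    command: "node",
    args: [`${root}/build/index.js`],
    env: { NODE_ENV: "production" },
  };
}

// Return a new config with the server added; existing servers are preserved
// and the input object is not mutated.
function addServer(config: ClientConfig, name: string, entry: ServerEntry): ClientConfig {
  return {
    ...config,
    mcpServers: { ...(config.mcpServers ?? {}), [name]: entry },
  };
}
```

Serializing the result with `JSON.stringify(config, null, 2)` yields a file in the same shape as the examples above.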
Usage
Once configured, the following tool becomes available to your MCP-enabled AI:
replicate_tutorial
Arguments:
url (string): The full URL of the YouTube tutorial you want to replicate.
Example: "Replicate this tutorial for me: https://www.youtube.com/watch?v=dQw4w9WgXcQ"
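Behind the natural-language prompt, an MCP client invokes the tool with a JSON-RPC `tools/call` request. The sketch below builds such a request for `replicate_tutorial`; the `buildReplicateRequest` helper and its basic URL check are illustrative assumptions, not part of this server, which may validate input differently.

```typescript
// Shape of an MCP "tools/call" JSON-RPC request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: build the request an MCP client would send.
function buildReplicateRequest(id: number, url: string): ToolCallRequest {
  // Light sanity check only (an assumption made for this sketch).
  if (!/^https?:\/\//.test(url)) {
    throw new Error(`Expected a full http(s) URL, got: ${url}`);
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "replicate_tutorial", arguments: { url } },
  };
}
```

In practice the MCP client constructs this message for you; the sketch only makes the single `url` argument and the tool name explicit.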
What Happens Next?
- Extraction: The server opens a persistent Chrome instance and extracts the transcript, metadata, links, and screenshots.
- Analysis: Data is sent to Gemini to generate a structured execution graph.
- Execution: ChatGPT receives the graph and starts executing steps (creating files, running bash commands).
- Final Output: A full report is saved to replication_results/replication_report.md.
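The four stages above can be sketched as a simple sequential pipeline. Everything in this block is a hypothetical illustration: the type names, the `Stages` interface, and the `replicate` function are not the project's actual code, and the execution graph is modeled as a flat ordered list of steps, which is a simplification.

```typescript
// Hypothetical data shapes for each stage of the workflow.
interface ExtractedData { transcript: string; links: string[]; screenshots: string[]; }
interface ExecutionStep { id: string; description: string; command?: string; }
interface ExecutionGraph { steps: ExecutionStep[]; }
interface StepResult { stepId: string; ok: boolean; }

// Each stage is injected, so browser-driven agents can be swapped for stubs.
interface Stages {
  extract(url: string): Promise<ExtractedData>;
  analyze(data: ExtractedData): Promise<ExecutionGraph>;
  execute(step: ExecutionStep): Promise<StepResult>;
}

// Run extraction, analysis, and execution in order, then build a report.
async function replicate(url: string, stages: Stages): Promise<string> {
  const data = await stages.extract(url);
  const graph = await stages.analyze(data);

  const results: StepResult[] = [];
  for (const step of graph.steps) {
    // Steps run one at a time; a failed step is recorded and the loop continues.
    results.push(await stages.execute(step));
  }

  const lines = results.map((r) => `- ${r.stepId}: ${r.ok ? "ok" : "failed"}`);
  return [`# Replication report for ${url}`, ...lines].join("\n");
}
```

Passing the stages in as an interface keeps the orchestration logic testable without a live browser session; writing the returned text to replication_results/replication_report.md is left to the caller.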
Technical Architecture
- Runtime Agent: Manages the Playwright/Chrome lifecycle.
- ChatGPT Agent: Interacts with the ChatGPT Web UI for high-level reasoning and command generation.
- Gemini Agent: Interacts with the Gemini Web UI for vision-based video analysis.
- Execution Engine: The central "brain" that coordinates the 7-phase workflow.