YouTube MCP Server

A powerful Model Context Protocol (MCP) server for YouTube video transcription and metadata extraction. This server provides advanced tools for AI agents to retrieve video metadata and generate high-quality transcriptions with native language support.

🌟 Features

Metadata Extraction: Retrieve comprehensive video details (title, description, views, duration, etc.) without downloading the video.
Smart Transcription:
- In-Memory Processing: fast, efficient, and disk-I/O free pipeline.
- VAD (Voice Activity Detection): uses Silero VAD for precise segmentation.
- Multilingual Support: supports 99 languages.
- Translation: Transcribe to any supported language.
Caching: Intelligent file-based caching to avoid redundant processing.
Optimized Performance:
- Uses yt-dlp for robust extraction.
- Hardware acceleration (MPS/CUDA) for Whisper inference.
- Parallel processing for transcription segments.

🛠️ Prerequisites

Python 3.10+
ffmpeg: Required for audio processing.
- Mac: brew install ffmpeg
- Linux: sudo apt install ffmpeg
- Windows: Download and add to PATH.

📦 Installation

Clone the repository:

git clone https://github.com/mourad-ghafiri/youtube-mcp-server
cd youtube-mcp-server

Install dependencies: Using uv (recommended):
```
uv sync
```

⚙️ Configuration

The server configuration is located in src/youtube_mcp_server/config.py. You can adjust the following parameters:

Directories

TRANSCRIPTIONS_DIR: Directory where transcription JSON files are cached (default: "transcriptions").

Models

WHISPER_MODEL_NAME: OpenAI Whisper model to use. Options: "tiny", "base", "small", "medium", "large", "turbo". (default: "tiny").

Note: Larger models require more RAM and a GPU (CUDA/MPS).
SILERO_REPO / SILERO_MODEL: VAD model repository and ID.

Audio Processing

SAMPLING_RATE: Audio sampling rate for Whisper/VAD (default: 16000 Hz).
SEGMENT_PADDING_MS: Padding added to each audio segment to avoid cutting off words (default: 200 ms).

Concurrency

MAX_WORKERS: Number of parallel threads for transcribing audio segments (default: 4). Increasing this speeds up transcription but uses more CPU/Memory.

🚀 Usage

1. Start the Server

uv run main.py

The server runs on SSE (Server-Sent Events) transport at http://127.0.0.1:8000/sse.

2. Configure MCP Client

Add the server configuration to your MCP client:

{
  "mcpServers": {
    "youtube": {
      "url": "http://127.0.0.1:8000/sse"
    }
  }
}

🛠️ Tools Reference

`get_video_info`

Retrieves metadata for a given YouTube video.

Input: url (string)

Output: JSON object with title, views, description, thumbnails, etc.

{
  "id": "VIDEO_ID",
  "title": "Video Title",
  "description": "Video description...",
  "view_count": 1000000,
  "duration": 212,
  "uploader": "Channel Name",
  "upload_date": "20091025",
  "thumbnail": "https://i.ytimg.com/...",
  "tags": ["tag1", "tag2"],
  "categories": ["Music"]
}

`transcribe_video`

Transcribes a video with optional translation.

Inputs:
- url (string): Video URL.
- language (string, default="auto"):
  - "auto": Transcribe in detected language.
  - "en": Translate to English.
  - "fr", "es", etc.: Transcribe in specific language.

Output: JSON with segments and metadata.

{
  "id": "VIDEO_ID",
  "title": "Video Title",
  "duration": 212,
  "transcription": [
    {
      "from": "00:00:00",
      "to": "00:00:05",
      "transcription": "First segment text..."
    },
    {
      "from": "00:00:05",
      "to": "00:00:10",
      "transcription": "Second segment text..."
    }
  ]
}

🏗️ Technical Architecture

Services: DownloadService, VADService (Silero), WhisperService (OpenAI), CacheService.
In-Memory Pipeline: Audio is downloaded -> loaded to RAM -> segmented by VAD -> transcribed by Whisper -> Cached.
Concurrency: Parallel segment transcription.

🌍 Appendix: Supported Languages

| Country (Primary/Region) | Language | Code | |:-------------------------|:---------|:-----| | South Africa | Afrikaans | af | | Ethiopia | Amharic | am | | Arab World | Arabic | ar | | India | Assamese | as | | Azerbaijan | Azerbaijani | az | | Russia | Bashkir | ba | | Belarus | Belarusian | be | | Bulgaria | Bulgarian | bg | | Bangladesh | Bengali | bn | | Tibet | Tibetan | bo | | France (Brittany) | Breton | br | | Bosnia and Herzegovina | Bosnian | bs | | Spain (Catalonia) | Catalan | ca | | Czech Republic | Czech | cs | | Wales | Welsh | cy | | Denmark | Danish | da | | Germany | German | de | | Greece | Greek | el | | USA / UK | English | en | | Spain | Spanish | es | | Estonia | Estonian | et | | Spain (Basque) | Basque | eu | | Iran | Persian | fa | | Finland | Finnish | fi | | Faroe Islands | Faroese | fo | | France | French | fr | | Spain (Galicia) | Galician | gl | | India | Gujarati | gu | | Nigeria | Hausa | ha | | Hawaii | Hawaiian | haw | | Israel | Hebrew | he | | India | Hindi | hi | | Croatia | Croatian | hr | | Haiti | Haitian Creole | ht | | Hungary | Hungarian | hu | | Armenia | Armenian | hy | | Indonesia | Indonesian | id | | Iceland | Icelandic | is | | Italy | Italian | it | | Japan | Japanese | ja | | Indonesia (Java) | Javanese | jw | | Georgia | Georgian | ka | | Kazakhstan | Kazakh | kk | | Cambodia | Khmer | km | | India | Kannada | kn | | South Korea | Korean | ko | | Ancient Rome | Latin | la | | Luxembourg | Luxembourgish | lb | | Congo | Lingala | ln | | Laos | Lao | lo | | Lithuania | Lithuanian | lt | | Latvia | Latvian | lv | | Madagascar | Malagasy | mg | | New Zealand | Maori | mi | | North Macedonia | Macedonian | mk | | India | Malayalam | ml | | Mongolia | Mongolian | mn | | India | Marathi | mr | | Malaysia | Malay | ms | | Malta | Maltese | mt | | Myanmar | Myanmar | my | | Nepal | Nepali | ne | | Netherlands | Dutch | nl | | Norway | Nynorsk | nn | | Norway | Norwegian | no | | France (Occitania) | Occitan | oc | | India (Punjab) | Punjabi | pa | | Poland | Polish | pl | | Afghanistan | Pashto | ps | | Portugal / Brazil | Portuguese | pt | | Romania | Romanian | ro | | Russia | Russian | ru | | India | Sanskrit | sa | | Pakistan | Sindhi | sd | | Sri Lanka | Sinhala | si | | Slovakia | Slovak | sk | | Slovenia | Slovenian | sl | | Zimbabwe | Shona | sn | | Somalia | Somali | so | | Albania | Albanian | sq | | Serbia | Serbian | sr | | Indonesia | Sundanese | su | | Sweden | Swedish | sv | | East Africa | Swahili | sw | | India | Tamil | ta | | India | Telugu | te | | Tajikistan | Tajik | tg | | Thailand | Thai | th | | Turkmenistan | Turkmen | tk | | Philippines | Tagalog | tl | | Turkey | Turkish | tr | | Russia (Tatarstan) | Tatar | tt | | Ukraine | Ukrainian | uk | | Pakistan | Urdu | ur | | Uzbekistan | Uzbek | uz | | Vietnam | Vietnamese | vi | | Ashkenazi Jewish | Yiddish | yi | | Nigeria | Yoruba | yo | | China (Guangdong) | Cantonese | yue | | China | Chinese | zh |

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the project
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

📄 License

Distributed under the MIT License. See LICENSE for more information.

Built with love ❤️

MCP Servers

YouTube MCP Server

🌟 Features

🛠️ Prerequisites

📦 Installation

⚙️ Configuration

Directories

Models

Audio Processing

Concurrency

🚀 Usage

1. Start the Server

2. Configure MCP Client

🛠️ Tools Reference

`get_video_info`

`transcribe_video`

🏗️ Technical Architecture

🌍 Appendix: Supported Languages

🤝 Contributing

📄 License

安装包（如果需要）

Cursor 配置 (mcp.json)

YouTube MCP Server

🌟 Features

🛠️ Prerequisites

📦 Installation

⚙️ Configuration

Directories

Models

Audio Processing

Concurrency

🚀 Usage

1. Start the Server

2. Configure MCP Client

🛠️ Tools Reference

get_video_info

transcribe_video

🏗️ Technical Architecture

🌍 Appendix: Supported Languages

🤝 Contributing

📄 License

安装包 （如果需要）

Cursor 配置 (mcp.json)

`get_video_info`

`transcribe_video`

安装包（如果需要）