MCP server by bogdanovich

Created 5/4/2026

media-mcp

Backend-agnostic MCP server for durable asynchronous media download and transcription jobs.

media-mcp is designed for chat agents such as PicoClaw and OpenClaw, or any other MCP-capable agent runtime. It downloads media with yt-dlp, stores artifacts on disk, persists job state in SQLite, and returns artifact paths plus metadata to the calling agent.

It deliberately does not send files to Telegram, Matrix, Discord, or any other chat backend. Delivery should be handled by the agent runtime's native send_file / message tools.

Why Async Jobs

Media work is slow and failure-prone:

  • social/video downloads may take seconds or minutes;
  • transcription can be CPU-heavy;
  • an agent runtime may restart while work is running;
  • users often need status/result inspection later.

For that reason, media-mcp stores jobs durably in SQLite and runs workers in detached child processes.
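The durable-job pattern can be sketched with Python's sqlite3 module (the table and field names here are illustrative, not the server's actual schema):

```python
import json
import sqlite3

# Illustrative schema; the real media-mcp schema may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE jobs (
        job_id TEXT PRIMARY KEY,
        type   TEXT NOT NULL,
        status TEXT NOT NULL,   -- queued | running | succeeded | failed
        result TEXT             -- JSON payload once finished
    )"""
)

# Enqueue: the MCP tool call returns immediately after this insert.
conn.execute(
    "INSERT INTO jobs VALUES (?, ?, ?, ?)",
    ("job_20260504231751_bf964e43", "download", "queued", None),
)

# A detached worker later records the outcome. The state survives agent
# restarts because it lives in SQLite, not in the agent process.
conn.execute(
    "UPDATE jobs SET status = ?, result = ? WHERE job_id = ?",
    ("succeeded", json.dumps({"media_path": "/tmp/video.mp4"}),
     "job_20260504231751_bf964e43"),
)

row = conn.execute(
    "SELECT status FROM jobs WHERE job_id = ?",
    ("job_20260504231751_bf964e43",),
).fetchone()
print(row[0])  # succeeded
```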

Tools

download_async

Starts a durable async media download job.

Arguments:

  • url string, required: URL supported by yt-dlp.
  • mode string: audio or video. Defaults to audio.
  • quality string: low or best. Defaults to low.
  • prepare_transcription_audio boolean: create 16 kHz mono WAV for transcription. Defaults to true.

Returns immediately with:

{
  "ok": true,
  "accepted": true,
  "job_id": "job_20260504231751_bf964e43",
  "type": "download",
  "status": "queued"
}

When finished, the job result includes fields such as:

  • asset_id
  • media_path
  • audio_path
  • normalized_audio_path
  • sendable_file_path
  • title
  • platform
  • duration_seconds

transcribe_async

Starts a durable async transcription job for a previously downloaded asset.

Arguments:

  • asset_id string, required.
  • model string: transcription model name passed to the helper. Defaults to base.
  • language string: optional language hint such as en or ru.
  • timestamps boolean: request segment timestamps when supported.

job_status

Fast status check for a job.

Use this for quick checks or when the user explicitly asks whether a job is still running.

job_wait

Waits for a job to finish and returns the same payload shape as job_result.

Arguments:

  • job_id string, required.
  • timeout_seconds number: defaults to 60, maximum 180.
  • poll_interval_seconds number: defaults to 2.

Agents should prefer job_wait over repeated job_status polling when a job is expected to finish during the current turn. This avoids wasting LLM/tool iterations.
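The difference can be illustrated with a client-side sketch: job_wait effectively runs this loop on the server, so the agent spends one tool call instead of many (get_status below is a hypothetical stand-in for a job_status call):

```python
import time

def wait_for_job(get_status, timeout_seconds=60, poll_interval_seconds=2):
    """Poll until the job leaves queued/running or the timeout elapses."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        status = get_status()
        if status not in ("queued", "running"):
            return status
        time.sleep(poll_interval_seconds)
    return "timeout"

# Simulated job that finishes on its third status check.
statuses = iter(["queued", "running", "succeeded"])
print(wait_for_job(lambda: next(statuses), poll_interval_seconds=0))  # succeeded
```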

job_result

Reads the durable result or error for a job.

Use this when a job is already known to be finished, or when resuming a previously started job by id.

Install

git clone https://github.com/bogdanovich/media-mcp.git
cd media-mcp
go build -o media-mcp ./cmd/media-mcp

Runtime dependencies:

  • Go 1.24+ to build.
  • yt-dlp for downloads.
  • ffmpeg for audio extraction/normalization.
  • Optional transcriber command for transcribe_async.

Run

./media-mcp \
  --db-path /var/lib/media-mcp/media.db \
  --asset-root /var/lib/media-mcp/assets \
  --yt-dlp /usr/local/bin/yt-dlp \
  --ffmpeg /usr/bin/ffmpeg \
  --transcriber-command python3 \
  --transcriber-args "/opt/media-mcp/examples/transcribers/faster_whisper.py {audio} --model {model} {language_arg} {timestamps_arg}"

Flags:

  • --db-path: SQLite job/asset database path. Required.
  • --asset-root: root directory for downloaded media and transcripts. Required.
  • --yt-dlp: yt-dlp executable. Defaults to yt-dlp.
  • --ffmpeg: ffmpeg executable. Defaults to ffmpeg.
  • --transcriber-command: executable used by transcribe_async, for example python3.
  • --transcriber-args: argument template for the transcriber command.
  • --transcribe-helper: deprecated compatibility shortcut for Python helpers. It expands to --transcriber-command python3 --transcriber-args "<helper> {audio} --model {model} {language_arg} {timestamps_arg}".

Transcriber Adapter Contract

media-mcp delegates speech-to-text to an external command so the core server remains provider-agnostic. You can use local faster-whisper, whisper.cpp, OpenAI, Deepgram, AssemblyAI, or any other backend as long as your adapter follows the stdout JSON contract.

The command is configured with --transcriber-command and --transcriber-args.

Supported argument placeholders:

  • {audio}: normalized 16 kHz mono WAV path.
  • {model}: model requested by the tool call.
  • {language}: raw language hint, if any.
  • {language_arg}: expands to --language <language> when language is set, otherwise empty.
  • {timestamps_arg}: expands to --timestamps when timestamps are requested, otherwise empty.
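The placeholder expansion above can be sketched in Python (a hypothetical re-implementation of what the server does, not its actual code):

```python
def expand_transcriber_args(template, audio, model, language=None, timestamps=False):
    """Substitute the supported placeholders into a --transcriber-args template."""
    replacements = {
        "{audio}": audio,
        "{model}": model,
        "{language}": language or "",
        "{language_arg}": f"--language {language}" if language else "",
        "{timestamps_arg}": "--timestamps" if timestamps else "",
    }
    for placeholder, value in replacements.items():
        template = template.replace(placeholder, value)
    # Collapse the doubled spaces left behind by empty expansions.
    return " ".join(template.split())

print(expand_transcriber_args(
    "helper.py {audio} --model {model} {language_arg} {timestamps_arg}",
    audio="/tmp/audio_16k.wav", model="base", language="ru", timestamps=True,
))
# helper.py /tmp/audio_16k.wav --model base --language ru --timestamps
```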

Example:

--transcriber-command python3
--transcriber-args "/opt/media-mcp/examples/transcribers/faster_whisper.py {audio} --model {model} {language_arg} {timestamps_arg}"

The adapter should print JSON to stdout:

{
  "backend": "local-faster-whisper",
  "model": "base",
  "requested_language": "ru",
  "detected_language": "ru",
  "language_probability": 0.98,
  "text": "Transcript text...",
  "segments": [
    { "start": 0.0, "end": 2.4, "text": "Transcript text..." }
  ]
}

segments may be omitted when timestamps are not requested.

A reference faster-whisper adapter is included at examples/transcribers/faster_whisper.py.
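A minimal adapter skeleton that satisfies the stdout JSON contract might look like this (the transcription itself is stubbed out; a real adapter would parse its command-line arguments and call faster-whisper, an HTTP API, or another backend):

```python
import json
import sys

def build_result(audio_path, model, language=None, timestamps=False):
    """Produce the stdout JSON contract; transcription is stubbed here."""
    text = f"(stub transcript of {audio_path})"  # replace with a real backend call
    result = {
        "backend": "stub",
        "model": model,
        "requested_language": language,
        "detected_language": language or "en",
        "text": text,
    }
    # segments may be omitted when timestamps are not requested.
    if timestamps:
        result["segments"] = [{"start": 0.0, "end": 1.0, "text": text}]
    return result

# The server reads whatever the adapter prints to stdout.
json.dump(build_result("/tmp/audio_16k.wav", "base", language="ru"), sys.stdout)
```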

MCP Config Examples

PicoClaw-style config

{
  "tools": {
    "mcp": {
      "enabled": true,
      "servers": {
        "media": {
          "enabled": true,
          "deferred": false,
          "command": "/opt/media-mcp/media-mcp",
          "args": [
            "--db-path",
            "/var/lib/media-mcp/media.db",
            "--asset-root",
            "/var/lib/media-mcp/assets",
            "--yt-dlp",
            "/usr/local/bin/yt-dlp",
            "--ffmpeg",
            "/usr/bin/ffmpeg",
            "--transcriber-command",
            "python3",
            "--transcriber-args",
            "/opt/media-mcp/examples/transcribers/faster_whisper.py {audio} --model {model} {language_arg} {timestamps_arg}"
          ],
          "type": "stdio"
        }
      }
    }
  }
}

OpenClaw-style config

{
  "mcp": {
    "servers": {
      "media": {
        "command": "/opt/media-mcp/media-mcp",
        "args": [
          "--db-path",
          "/var/lib/media-mcp/media.db",
          "--asset-root",
          "/var/lib/media-mcp/assets",
          "--yt-dlp",
          "/usr/local/bin/yt-dlp",
          "--ffmpeg",
          "/usr/bin/ffmpeg",
          "--transcriber-command",
          "python3",
          "--transcriber-args",
          "/opt/media-mcp/examples/transcribers/faster_whisper.py {audio} --model {model} {language_arg} {timestamps_arg}"
        ]
      }
    }
  }
}

Different runtimes expose MCP tool names differently. For a server named media, the agent may see tools as mcp_media_download_async, media__download_async, or similar.

Recommended Agent Instructions

See examples/TOOLS.md for a copy-pasteable instruction block.

The important rules are:

  • Prefer quality: "low" for normal downloads.
  • Use quality: "best" only when explicitly requested or when low quality is unusable.
  • Use job_wait instead of repeatedly polling job_status.
  • Do not treat the initial accepted: true response as completion.
  • Do not ask media-mcp to send files to chat. Use the agent runtime's native delivery tool with sendable_file_path, transcript_path, or another artifact path.

Typical Flows

Download a video and send it to the user

  1. Call download_async with mode: "video" and quality: "low".
  2. Call job_wait with the returned job_id.
  3. Read result.sendable_file_path.
  4. Use the runtime's native file delivery tool.
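In agent pseudocode, the flow above looks roughly like this (call_tool and send_file are hypothetical stand-ins for the runtime's MCP dispatch and native delivery tool; real tool names vary by runtime, as noted above):

```python
def download_and_send(call_tool, send_file, url):
    """Download media at low quality, wait for the job, deliver the file."""
    job = call_tool("download_async", {"url": url, "mode": "video", "quality": "low"})
    done = call_tool("job_wait", {"job_id": job["job_id"], "timeout_seconds": 120})
    if done["status"] != "succeeded":
        raise RuntimeError(f"download failed: {done}")
    send_file(done["result"]["sendable_file_path"])

# Demo with stubbed tool responses.
sent = []
fake_results = {
    "download_async": {"job_id": "job_1"},
    "job_wait": {"status": "succeeded",
                 "result": {"sendable_file_path": "/tmp/v.mp4"}},
}
download_and_send(lambda name, args: fake_results[name], sent.append,
                  "https://example.com/v")
print(sent)  # ['/tmp/v.mp4']
```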

Extract a recipe from Instagram/TikTok/YouTube media

  1. Call download_async with:
    • mode: "video"
    • quality: "low"
    • prepare_transcription_audio: true
  2. Call job_wait.
  3. Call transcribe_async with the resulting asset_id.
  4. Call job_wait.
  5. Use result.text or result.transcript_path to extract ingredients and steps.
  6. Reply with a concise structured recipe.

Resume an old job

  1. Call job_status with the known job_id.
  2. If succeeded or failed, call job_result.
  3. If still running, call job_wait with a reasonable timeout or tell the user it is still running.
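The resume pattern can be sketched the same way (again with a hypothetical call_tool stand-in): a cheap status check first, then either read the durable result or wait for completion.

```python
def resume_job(call_tool, job_id):
    """Resume a previously started job by id."""
    status = call_tool("job_status", {"job_id": job_id})["status"]
    if status in ("succeeded", "failed"):
        return call_tool("job_result", {"job_id": job_id})
    return call_tool("job_wait", {"job_id": job_id, "timeout_seconds": 60})

# Demo with stubbed tool responses for an already-finished job.
responses = {
    "job_status": {"status": "succeeded"},
    "job_result": {"ok": True, "status": "succeeded"},
}
print(resume_job(lambda name, args: responses[name], "job_1"))
```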

Storage Layout

SQLite stores job and asset metadata. Media artifacts are stored under --asset-root by asset_id.

Example:

assets/
  20260504231751_7ad348ff/
    source.mp4
    source.info.json
    source.m4a
    audio_16k.wav
    transcript.txt
    transcript.json
  _worker-logs/
    job_20260504231818_bbb042b2.log

Security Notes

  • Treat downloaded media as untrusted user-controlled files.
  • Do not expose --asset-root publicly without an access-control layer.
  • Keep SQLite and artifacts outside your agent prompt/workspace if users should not browse them directly.
  • Use OS permissions to restrict who can read transcripts and media files.
  • Be careful with cookies/browser profiles used by yt-dlp; this server does not manage secrets for you.

Development

go test ./...
go build -o media-mcp ./cmd/media-mcp