TTS macOS MCP Server

Created 2/10/2026 · Updated 14 days ago

tts-mcp-server

Local text-to-speech for Claude Code via MCP. Runs Kokoro-82M on Apple Silicon through MLX-audio, giving Claude the ability to speak task notifications and arbitrary text aloud.

Requirements: macOS on Apple Silicon (M1+), Python 3.12+.

Setup

cd ~/projects/tts-mcp-server
uv venv && uv pip install -e .

# Pre-download the TTS model (~200 MB, one-time):
uv run tts-mcp-init

Register with Claude Code (user-wide, available in all projects):

claude mcp add --transport stdio --scope user tts -- \
  /path/to/tts-mcp-server/.venv/bin/tts-mcp-server

Verify:

claude mcp list   # should show tts: ✓ Connected

Tools

notify(message)

Quick task-completion alert. Speaks at 1.2x speed with the default voice (af_heart). Designed for short messages like "Build finished" or "Tests passed".

speak(text, voice?, speed?)

Full TTS with voice and speed control.

| Parameter | Default    | Range / Options |
| --------- | ---------- | --------------- |
| text      | (required) | Any string      |
| voice     | af_heart   | See voices      |
| speed     | 1.2        | 0.5 – 2.0       |
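The defaults and clamping implied by the table can be sketched as follows. This is an illustrative helper, not the server's actual code; `normalize_args` and the clamping behavior for out-of-range speeds are assumptions.

```python
DEFAULT_VOICE = "af_heart"
DEFAULT_SPEED = 1.2

def normalize_args(text, voice=None, speed=None):
    """Apply the documented defaults and keep speed within 0.5-2.0."""
    if not text:
        raise ValueError("text is required")
    voice = voice or DEFAULT_VOICE
    # Clamp rather than reject, so slightly out-of-range values still speak
    speed = DEFAULT_SPEED if speed is None else min(max(float(speed), 0.5), 2.0)
    return text, voice, speed
```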

Voices

Kokoro ships 54 presets. A useful subset:

| ID        | Description               |
| --------- | ------------------------- |
| af_heart  | American female (default) |
| af_bella  | American female           |
| af_nova   | American female           |
| am_adam   | American male             |
| am_echo   | American male             |
| bf_emma   | British female            |
| bm_george | British male              |

Full list: prefix af_ / am_ (American), bf_ / bm_ (British), jf_ / jm_ (Japanese), zf_ / zm_ (Chinese).
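The prefix convention decodes mechanically. A small illustrative helper (`describe_voice` is a hypothetical name, not something the server exposes):

```python
ACCENTS = {"a": "American", "b": "British", "j": "Japanese", "z": "Chinese"}
GENDERS = {"f": "female", "m": "male"}

def describe_voice(voice_id: str) -> str:
    """Decode a Kokoro voice ID like 'bf_emma' into 'British female'."""
    prefix = voice_id.split("_", 1)[0]  # e.g. "bf"
    return f"{ACCENTS[prefix[0]]} {GENDERS[prefix[1]]}"
```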

Architecture

Claude Code  ──stdio──>  FastMCP server  ──>  MLX-audio/Kokoro  ──>  afplay
  • Lazy loading: The model, spacy G2P pipeline, and Metal shaders all initialize on the first tool call (~6 s). This keeps the MCP handshake instant so health checks pass. Subsequent calls run in ~0.1 s.
  • No persistent daemon: Claude Code spawns the server on session start and kills it on exit. No LaunchAgent needed.
  • Temp files: Audio is written to a temp WAV, played with afplay, then deleted. No disk accumulation.
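The temp-file playback step can be sketched in stdlib Python. `play_audio` is a hypothetical stand-in for the server's `_play()`; the return value exists only so cleanup can be verified.

```python
import os
import subprocess
import tempfile

def play_audio(wav_bytes: bytes, player: str = "afplay") -> str:
    """Write audio to a temp WAV, play it, then delete it (no disk accumulation)."""
    # delete=False so the external player can open the path after we close it
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
        f.write(wav_bytes)
        path = f.name
    try:
        subprocess.run([player, path], check=True)
    finally:
        os.unlink(path)  # always remove the file, even if playback fails
    return path
```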

Performance (M2 Max)

| Metric                 | First call             | Subsequent |
| ---------------------- | ---------------------- | ---------- |
| Latency (short phrase) | ~6 s                   | ~0.1 s     |
| Memory                 | ~420 MB                | ~420 MB    |
| CPU                    | < 5% (GPU-accelerated) | < 5%       |

Troubleshooting

Server shows ✗ Failed to connect: The model is loading during the health check. This was fixed by deferring model load to first tool call. If you still see this, ensure main() calls mcp.run() immediately (before any model loading).
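The lazy-load pattern behind that fix looks roughly like this. The loader body is a placeholder, not the server's actual initialization code:

```python
import functools

def _load_model() -> dict:
    # Placeholder for the expensive Kokoro/MLX/spacy initialization (~6 s)
    return {"name": "Kokoro-82M"}

@functools.lru_cache(maxsize=1)
def get_model() -> dict:
    """First tool call pays the load cost; later calls return the cached model."""
    return _load_model()
```

Because nothing touches `get_model()` until a tool is invoked, `mcp.run()` can start serving the handshake immediately.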

First call is slow (~6 s): Expected. Spacy G2P pipeline and Metal shader compilation happen once per server lifetime. After that, calls are sub-200 ms.

First call hangs indefinitely: The spacy en_core_web_sm model is missing. Misaki's G2P calls spacy.cli.download() at runtime, which shells out to pip -- but uv-managed venvs don't have pip, so it hangs forever. This model is declared as a dependency in pyproject.toml, so uv pip install -e . should handle it. If you somehow end up without it, reinstall.

afplay not found: You're not on macOS. Replace the afplay call in _play() with your platform's audio player (e.g., paplay on Linux, sox cross-platform).
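One way to make that substitution portable is to pick the player at runtime. This is a sketch, not code the repo ships; the fallback command names are common system players:

```python
import shutil
import sys

def pick_player(platform: str = sys.platform) -> list[str]:
    """Choose an audio-player command for the given platform."""
    if platform == "darwin":
        return ["afplay"]
    # Fall back to whatever common player is on PATH (sox installs `play`)
    for cmd in ("paplay", "aplay", "play"):
        if shutil.which(cmd):
            return [cmd]
    raise RuntimeError("no audio player found; install pulseaudio-utils or sox")
```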

Model download fails: Pre-download with the init script, or manually via huggingface-cli:

uv run tts-mcp-init
# or:
.venv/bin/huggingface-cli download mlx-community/Kokoro-82M-bf16

License

MIT

Quick Setup

Installation guide for this server.

Install the package (if needed):

uvx tts-mcp-server

Cursor configuration (mcp.json):

{
  "mcpServers": {
    "david-kraemer-tts-mcp-server": {
      "command": "uvx",
      "args": ["tts-mcp-server"]
    }
  }
}