docs mcp server
mpaas-docs
MCP server for semantic search over Alipay+ Mini Program documentation. It combines hybrid search (vector + FTS5) with graph navigation and exposes MCP tools, prompts, and resources for AI assistants.
Pipeline
crawl ──────► extract ────► parse ─────► migrate ──► load ──────► serve
  │             │             │            │           │            │
  ▼             ▼             ▼            ▼           ▼            ▼
graph.json    state.json    *.md         SQLite      chunks +     FastMCP
(URL-graph)   (URL list)    (markdown)   schema      embeddings   server
                                                                  (tools)
Source docs: Alipay+ Mini Program — ~170 pages covering Quick Start, Mini Program Studio, IAPMiniProgram SDK (Android/iOS/Flutter/React Native), OpenAPIs, JSAPI reference, Capabilities, Framework, Custom Components, and antd-mini extended components.
Quick start
uv pip install -e ".[dev]"
playwright install chromium
Full pipeline
Downloads, chunks, embeds, and indexes all documentation:
mpaas-docs pipeline
Skip completed steps on re-run:
mpaas-docs pipeline --skip-crawl --skip-extract --skip-parse
mpaas-docs pipeline --skip-migrate --skip-load
Step by step
| Command | What it does | Produces |
|---------|-------------|----------|
| mpaas-docs crawl | Scrapes sidebar URL graph via Playwright | data/raw/graph.json |
| mpaas-docs extract | Flattens graph into page list | data/raw/state.json |
| mpaas-docs parse | Downloads each page as markdown | data/raw/md/**/*.md |
| mpaas-docs migrate | Creates SQLite tables (pages, chunks, fts5, vec0, graph) | data/sqlite/docs.db |
| mpaas-docs load | Chunks text + generates embeddings + builds graph | Populates DB |
| mpaas-docs serve | Starts MCP server | — |
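The load step splits page text into chunks before embedding. The following is a minimal sketch of overlapping fixed-size chunking driven by the chunk_size and chunk_overlap_ratio settings from the configuration file. It is not the project's loader: the function name chunk_text is hypothetical, and it assumes chunk_size counts characters, which the configuration does not state.

```python
def chunk_text(text: str, chunk_size: int = 1024, overlap_ratio: float = 0.2) -> list[str]:
    """Split text into fixed-size windows that overlap by overlap_ratio.

    Assumption: sizes are measured in characters, not tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap_ratio < 1:
        raise ValueError("overlap_ratio must be in [0, 1)")

    overlap = int(chunk_size * overlap_ratio)
    step = chunk_size - overlap  # how far the window advances each time

    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

With the default values the overlap is 204 characters, so each window starts 820 characters after the previous one.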
MCP server
# HTTP (default)
mpaas-docs serve
# stdio — for AI assistant integration
mpaas-docs serve --transport stdio
HTTP flags:
| Flag | Default |
|------|---------|
| --host | 127.0.0.1 |
| --port | 8000 |
| --path | /mcp |
MCP interface
Tools
| Tool | Description |
|------|-------------|
| search_docs(query, mode, code_only, include_context, filter_section, limit) | Hybrid search — vector (cosine via sqlite-vec), FTS5 (BM25), or combined with score summation |
| get_page(page_id, url) | Full page with all chunks |
| get_related_pages(page_id, direction, depth) | Graph-traverse related pages via CONTAINS edges |
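The combined mode of search_docs is described above as score summation. One way this could work is sketched below, under stated assumptions: both inputs map chunk IDs to scores where higher is better, and each list is min-max normalized before summing so that neither scale dominates. The normalization step and the function names are assumptions, not taken from the project.

```python
def min_max(scores: dict[int, float]) -> dict[int, float]:
    """Rescale scores to [0, 1]; if all values are equal, map them to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {cid: 1.0 for cid in scores}
    return {cid: (s - lo) / (hi - lo) for cid, s in scores.items()}


def combine(vec_scores: dict[int, float],
            fts_scores: dict[int, float],
            limit: int = 10) -> list[tuple[int, float]]:
    """Sum normalized vector and FTS scores per chunk ID, best first."""
    vec_n, fts_n = min_max(vec_scores), min_max(fts_scores)
    # A chunk missing from one result list contributes 0 from that side.
    total = {
        cid: vec_n.get(cid, 0.0) + fts_n.get(cid, 0.0)
        for cid in vec_n.keys() | fts_n.keys()
    }
    return sorted(total.items(), key=lambda kv: kv[1], reverse=True)[:limit]


# Illustrative scores only (higher is better on both sides).
vec = {1: 0.9, 2: 0.5, 3: 0.1}
fts = {2: 4.0, 3: 2.0, 4: 1.0}
ranked = combine(vec, fts)
```

Raw scores usually need converting first: SQLite's bm25() returns smaller values for better matches, and a cosine distance is smaller for closer vectors, so both would be negated or inverted before being passed to combine.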
Prompts
| Prompt | Purpose |
|--------|---------|
| implement_feature(requirement) | Structured plan with step breakdown, code, and tradeoffs |
| explain_concept(concept) | 3-layer explanation: analogy → technical → code |
| debug_code(code) | Systematic root-cause analysis before proposing a fix |
Resources
| URI | Description |
|-----|-------------|
| docs://structure | JSON navigation tree (section/page IDs and names) |
Database schema
┌─────────┐       ┌────────────────┐       ┌───────────────────┐
│ pages   │       │ chunks         │       │ chunk_embeddings  │
├─────────┤       ├────────────────┤       ├───────────────────┤
│ id (PK) │──┐    │ id (PK)        │       │ chunk_id (PK)     │
│ content │  └──► │ source_page_id ├──────►│ embedding FLOAT[] │
│ url     │       │ content        │       └───────────────────┘
└─────────┘       │ chunk_index    │       ┌──────────────┐
                  │ metadata_json  │       │ chunks_fts   │
                  └────────────────┘       │ (FTS5)       │
                                           └──────────────┘
┌─────────────────────────────────────────┐
│ graphqlite: nodes + edges (CONTAINS)    │
│ hierarchical nav tree (Section→Page→    │
│ Chunk)                                  │
└─────────────────────────────────────────┘
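The CONTAINS edges form a Section→Page→Chunk hierarchy, and get_related_pages walks it with a direction and a depth. A minimal in-memory sketch of such a depth-limited traversal follows. It does not use graphqlite; the direction values "down" and "up", the function name, and the node IDs are all hypothetical.

```python
from collections import deque


def related(edges, start, direction="down", depth=1):
    """Breadth-first walk over (parent, child) CONTAINS edges.

    Returns (node, distance) pairs for nodes within `depth` hops of `start`.
    "down" follows parent -> child, "up" follows child -> parent.
    """
    if direction not in ("down", "up"):
        raise ValueError("direction must be 'down' or 'up'")

    adjacency = {}
    for parent, child in edges:
        src, dst = (parent, child) if direction == "down" else (child, parent)
        adjacency.setdefault(src, []).append(dst)

    seen, found = {start}, []
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == depth:
            continue  # do not expand past the requested depth
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                found.append((nxt, dist + 1))
                queue.append((nxt, dist + 1))
    return found


# Illustrative graph: one section, two pages, one chunk.
edges = [
    ("section:jsapi", "page:a"),
    ("section:jsapi", "page:b"),
    ("page:b", "chunk:b0"),
]
children = related(edges, "section:jsapi", "down", depth=1)
ancestors = related(edges, "chunk:b0", "up", depth=2)
```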
Configuration
Single YAML file (mpaas-docs.yaml at project root):
paths:
  data_dir: data
  db_path: data/sqlite/docs.db
crawler:
  start_url: "https://miniprogram.alipayplus.com/docs/miniprogram/mpdev"
  base_url: "https://miniprogram.alipayplus.com"
  headless: true
  wait_timeout: 10000
  request_delay: 2
llm:
  base_url: "http://localhost:1234/v1"
  api_key: "lm-studio"
  embed_model: "text-embedding-qwen3-embedding-0.6b"
  embedding_dim: 1024
chunking:
  chunk_size: 1024
  chunk_overlap_ratio: 0.2
server:
  name: "docs-search"
  host: "127.0.0.1"
  port: 8000
  transport: "http"
  path: "/mcp"
  search_default_limit: 10
  resource_uri: "docs://structure"
fts:
  tokenizer: "porter unicode61"
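To illustrate what the fts tokenizer setting does, here is a small standalone sketch using Python's built-in sqlite3 module. It assumes an SQLite build with FTS5 enabled and a single-column chunks_fts table; the project's real table layout may differ. With the porter stemmer wrapped around unicode61, a query for "navigate" also matches "Navigating".

```python
import sqlite3


def make_index(rows):
    """Create an in-memory FTS5 table with the configured tokenizer and fill it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE chunks_fts "
        "USING fts5(content, tokenize = 'porter unicode61')"
    )
    conn.executemany(
        "INSERT INTO chunks_fts(rowid, content) VALUES (?, ?)", rows
    )
    return conn


def search(conn, query, limit=10):
    """Return matching rowids, best BM25 match first (smaller bm25() is better)."""
    cur = conn.execute(
        "SELECT rowid FROM chunks_fts WHERE chunks_fts MATCH ? "
        "ORDER BY bm25(chunks_fts) LIMIT ?",
        (query, limit),
    )
    return [row[0] for row in cur]


# Illustrative content only.
conn = make_index([
    (1, "Navigating between pages"),
    (2, "Requesting a user payment"),
])
hits = search(conn, "navigate")
```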
Requires a running OpenAI-compatible embedding endpoint (default: LM Studio at http://localhost:1234/v1).
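For reference, an OpenAI-compatible embedding endpoint accepts a POST to /embeddings with a model name and a list of input strings. The sketch below builds such a request from the llm settings above using only the standard library. The function names are hypothetical, and how the project itself calls the endpoint is not shown in this document.

```python
import json
import urllib.request


def build_embedding_request(texts,
                            base_url="http://localhost:1234/v1",
                            api_key="lm-studio",
                            model="text-embedding-qwen3-embedding-0.6b"):
    """Build (but do not send) a POST request for the /embeddings endpoint."""
    body = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/embeddings",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def parse_embeddings(payload):
    """Return the vectors from a response body, ordered by their index field."""
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def embed(texts, **kwargs):
    """Send the request; this needs the embedding endpoint to be running."""
    request = build_embedding_request(texts, **kwargs)
    with urllib.request.urlopen(request) as response:
        return parse_embeddings(json.load(response))
```

Each returned vector would be expected to have embedding_dim (1024) entries, matching the configured model.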
Project structure
src/mpaas_docs/
├── cli.py # CLI entry point (argparse)
├── config/settings.py # YAML config loader (lru_cached)
├── crawler/
│ ├── spider.py # Playwright sidebar graph scraper
│ ├── extract.py # Graph → flat URL list
│ └── parser.py # Page download → markdown
├── db/
│ ├── connection.py # sqlite-vec factory
│ └── migrations.py # Schema (pages, chunks, FTS5, vec0)
├── etl/
│ ├── chunker.py # Legacy chunker (MarkdownSyntax splitter)
│ └── loader.py # Chunking + embeddings + graph building
├── server/
│ ├── app.py # FastMCP instance + registration
│ ├── tools.py # search_docs, get_page, get_related_pages
│ ├── prompts.py # implement_feature, explain_concept, debug_code
│ └── resources.py # docs://structure nav tree
├── __init__.py
└── __main__.py # python -m mpaas_docs
Development
ruff check src/mpaas_docs/ tests/
ruff format src/mpaas_docs/ tests/
Manual search test (requires populated DB + running embedding endpoint):
python tests/test_vec_search.py