Universal Local RAG Server for Model Context Protocol (MCP)
🧠 Markdown RAG MCP Server
Your Personal High-Performance Local RAG Knowledge Base
Seamlessly connect your Agentic IDEs (Antigravity, Windsurf, Claude Code, Cursor) to your local Markdown documentation via intelligent hybrid semantic search.
Features • Architecture • Quickstart • Configuration • FAQ
✨ Features
- Hybrid Search: Fuses ChromaDB (Vector Search) and BM25 (Keyword Search) using Reciprocal Rank Fusion ($k=60$) for both semantic understanding and exact-term retrieval.
- Cross-Encoder Reranking: Re-scores the top candidates with a specialized `ms-marco` AI model to ensure surgical precision on the final output.
- Heading-Aware Chunking: Intelligently splits Markdown files at `##` and `###` boundaries, and includes sentence overlaps to prevent context loss between chunks.
- Multilingual Context: Powered by `paraphrase-multilingual-MiniLM-L12-v2`, natively supporting queries and documents in English, Russian, and 50+ other languages.
- Auto-Categorizing: Automatically tags every indexed document using its H1 heading as the category; no folder structure or frontmatter required.
- 100% Local & Free: No Docker required, no OpenAI API keys, no monthly fees. Runs natively on Windows, macOS, and Linux.
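To make the heading-aware chunking idea concrete, here is a minimal sketch (not the server's actual implementation) that splits a Markdown document at `##`/`###` boundaries while keeping each heading with its section; the sentence-overlap step is omitted for brevity:

```python
import re

def split_by_headings(markdown: str):
    """Split a Markdown document into chunks at ## / ### boundaries.

    Each chunk keeps its heading line so downstream search can build
    breadcrumbs; overlap handling is omitted for brevity.
    """
    chunks = []
    current = []
    for line in markdown.splitlines():
        # Start a new chunk whenever a ## or ### heading begins.
        if re.match(r"^#{2,3} ", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return chunks

doc = "# Title\nIntro text.\n## Setup\nSteps here.\n### Windows\nDetails."
chunks = split_by_headings(doc)
# Three chunks: the intro, the "## Setup" section, and the "### Windows" section.
```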
🏗️ Architecture
```mermaid
flowchart TD
Q(["🔍 User Query"])
Q --> VS
Q --> BM
subgraph HYBRID["⚡ Stage 1 — Retrieval"]
VS["🧠 Vector Search<br>(ChromaDB)"]
BM["📝 Keyword Search<br>(BM25 Okapi)"]
end
VS --> RRF
BM --> RRF
subgraph FUSION["🔀 Stage 2 — Fusion"]
RRF["Reciprocal Rank Fusion<br>(RRF algorithm)"]
end
RRF --> CE
subgraph RERANK["🎯 Stage 3 — Reranking"]
CE["Cross-Encoder<br>Scoring"]
end
CE --> OUT
subgraph OUTPUT["📋 Stage 4 — Result"]
OUT["Top-N Documents<br>with Breadcrumbs"]
end
style Q fill:#6366f1,color:#fff,stroke:#4338ca
style HYBRID fill:#0f172a,color:#e2e8f0,stroke:#334155
style FUSION fill:#0f172a,color:#e2e8f0,stroke:#334155
style RERANK fill:#0f172a,color:#e2e8f0,stroke:#334155
style OUTPUT fill:#0f172a,color:#e2e8f0,stroke:#334155
style VS fill:#1e40af,color:#bfdbfe,stroke:#3b82f6
style BM fill:#065f46,color:#a7f3d0,stroke:#10b981
style RRF fill:#7c3aed,color:#ede9fe,stroke:#8b5cf6
style CE fill:#b45309,color:#fef3c7,stroke:#f59e0b
style OUT fill:#1e3a5f,color:#bae6fd,stroke:#38bdf8
```
How the Pipeline Works
- Hybrid Retrieval: The user query is searched simultaneously using `paraphrase-multilingual-MiniLM-L12-v2` for semantic meaning (cosine similarity) and `BM25` for exact keyword matching.
- Reciprocal Rank Fusion: Ranks from both engines are mathematically combined, prioritizing chunks that perform well in both abstract context and exact terminology.
- Cross-Encoder Reranking: The top candidates are passed to a secondary model (`ms-marco-MiniLM-L-6-v2`) which deeply computes relevance across the full query and document text.
- Structured Output: The final results are returned to the LLM agent formatted with breadcrumbs (e.g., `README.md > Quickstart > Installation`) to establish positional context.
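The fusion step above can be sketched in a few lines. This is an illustrative implementation of Reciprocal Rank Fusion with $k=60$, not the server's exact code; the chunk IDs are hypothetical:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one.

    Each document's fused score is the sum over rankings of
    1 / (k + rank), so items that rank well in *both* vector
    and keyword search float to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical top-3 lists from each retriever:
vector_hits = ["chunk_a", "chunk_b", "chunk_c"]
bm25_hits = ["chunk_b", "chunk_d", "chunk_a"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
# "chunk_a" and "chunk_b" appear in both lists, so they lead the fused ranking.
```

The constant $k=60$ dampens the advantage of a single #1 ranking, which is why RRF rewards agreement between the two retrieval engines.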
🚀 Quickstart
1. Prerequisites
- Python 3.10 or higher
- `git` installed
2. Installation
Clone the repository and run the setup script. It will install dependencies, download AI models (~520MB), and automatically configure your IDE.
```bash
git clone https://github.com/ElvinBayramov/Markdown-RAG-MCP-Server.git
cd Markdown-RAG-MCP-Server
python install.py
```
The installer auto-detects your IDE (Antigravity, Claude Desktop, Windsurf) and injects the correct config — no manual path editing required. Just restart your IDE after installation.
3. Point it to your documents
By default, the server scans the parent directory of the repository for all .md files recursively. No setup needed — if your docs are anywhere in that tree, they'll be found.
Want to index a specific folder? Add an env block to your MCP config:
```json
{
  "mcpServers": {
    "markdown-rag": {
      "command": "python",
      "args": ["C:\\path\\to\\Markdown-RAG-MCP-Server\\server.py"],
      "env": {
        "RAG_DOCS_PATH": "C:\\Users\\you\\MyProject\\docs"
      }
    }
  }
}
```
Multiple folders? Point RAG_DOCS_PATH to their common parent directory — the server scans recursively, so all subfolders are indexed automatically:
```text
C:\Docs\              ← set RAG_DOCS_PATH to this
├── ProjectA\docs\    ← indexed
├── ProjectB\wiki\    ← indexed
└── SharedNotes\      ← indexed
```
Alternatively, ask your AI agent at any time:
> index_documents("C:\\Users\\you\\ProjectA\\docs")
This re-indexes on demand without changing any config file.
⚙️ Usage & Configuration
The installer handles configuration automatically. If you need to configure manually (e.g. for Cursor or other MCP hosts), add this to your mcp_config.json:
```json
{
  "mcpServers": {
    "markdown-rag": {
      "command": "python",
      "args": ["C:\\absolute\\path\\to\\Markdown-RAG-MCP-Server\\server.py"]
    }
  }
}
```
Windows tip: If `python` doesn't work in the `"command"` field, use the full path to your Python executable: `"C:\\Users\\you\\AppData\\Local\\Programs\\Python\\Python313\\python.exe"`.
`env` variables (both optional):
- `RAG_DOCS_PATH` — folder to scan for `.md` files (default: parent dir of the server)
- `RAG_DB_PATH` — where to store the vector database (default: `chroma_db/` inside the server folder)
🛠️ MCP Tools
Once connected, your AI assistant will have access to three new automated tools:
`index_documents(docs_path?)`
Indexes all .md files found in your documentation folder into ChromaDB.
Note: You only need to run this once, or whenever you substantially update your documentation files.
`search_docs(query, n_results?, category?, filename?)`
Performs the hybrid search across your indexed docs. Your AI agent can use this tool to ask questions and optionally filter down the search space.
`rag_status()`
Returns current index statistics, including file count, chunk count, and categories loaded in memory.
🏷️ Auto-Categorization
The server automatically derives a category for every indexed file without any hardcoded rules. It uses a three-priority system:
Priority 1: YAML Frontmatter (Explicit override)
Add a category: key to the top of any .md file to force a specific category:
```markdown
---
category: architecture
---
# System Design
```
Priority 2: H1 Heading (Automatic, zero-effort)
If there is no frontmatter, the server reads your file's first # Title heading and uses that as the category. Since every document already has a title, categorization is completely automatic with no folders or config needed.
```text
# Game Audio Design Document  → category: game audio design document
# API Endpoints Reference     → category: api endpoints reference
```
Priority 3: Filename (Ultimate fallback)
If there's no H1 heading either, the filename stem is used as the category.
```text
system_overview.md → category: system overview
```
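The three-priority rule above can be sketched as follows. Note that `derive_category` is a hypothetical helper name used for illustration, not the server's actual function:

```python
import re
from pathlib import Path

def derive_category(path: str, text: str) -> str:
    """Three-priority category rule: frontmatter > H1 heading > filename stem."""
    # Priority 1: a `category:` key in YAML frontmatter.
    if text.startswith("---"):
        m = re.search(r"^category:\s*(.+)$", text, flags=re.MULTILINE)
        if m:
            return m.group(1).strip().lower()
    # Priority 2: the first H1 (`# ...`) heading.
    m = re.search(r"^#\s+(.+)$", text, flags=re.MULTILINE)
    if m:
        return m.group(1).strip().lower()
    # Priority 3: the filename stem, with underscores as spaces.
    return Path(path).stem.replace("_", " ").lower()

derive_category("notes.md", "---\ncategory: architecture\n---\n# System Design")
# → "architecture"
derive_category("notes.md", "# API Endpoints Reference\n...")
# → "api endpoints reference"
derive_category("system_overview.md", "plain text, no heading")
# → "system overview"
```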
Want a fixed folder instead of scanning the whole project? Set `RAG_DOCS_PATH` in your MCP config `env`, or directly modify the `DOCS_PATH` default in `server.py`.
❓ FAQ
Q: Does this send my documentation data anywhere?
A: No. Everything runs 100% locally on your machine. The embedding and reranking models are downloaded from HuggingFace once during installation. After that, the server can run entirely offline. There are no API keys required and zero usage costs.
Q: Do I need a dedicated GPU to run this?
A: No. The server uses highly optimized, small-parameter NLP models (the MiniLM family). They are specifically designed for fast CPU inference, meaning searches run in milliseconds on standard processors without requiring a heavy GPU.
Q: How do I update the index when my documents change?
A: Simply ask your AI agent to call the index_documents() tool again. It will automatically clear the old collection and re-index all current .md files.
Q: Why does the very first search take a few seconds?
A: The Cross-Encoder reranking model is loaded lazily into RAM on the first query. This is an intentional design choice to save background memory while your IDE is idle. All subsequent searches execute instantly.
Q: Does it support my language?
A: Yes. The default embedding model (paraphrase-multilingual-MiniLM-L12-v2) natively supports over 50 languages, including English, Russian, Spanish, Chinese, and more. Semantic matching works even if the query and the document are in different languages.
📄 License
Licensed under the Apache License 2.0. See the LICENSE file for more details. Free to use, modify, and distribute for personal and commercial usage.