Self-RAG MCP Server

A backend-only Model Context Protocol (MCP) server for self-reflective retrieval-augmented generation. It can answer simple questions directly, retrieve indexed PDF content from Qdrant Cloud, and fall back to Tavily web search for a bounded number of attempts.

Features

FastMCP server with local stdio and Streamable HTTP transports.
LangGraph workflow with retrieval decisions, relevance grading, source extraction, groundedness checks, and bounded web-search retries.
Qdrant Cloud hybrid retrieval using Gemini dense embeddings and local BM25 sparse embeddings.
Idempotent PDF ingestion with stable chunk IDs and replacement of stale chunks when a file is re-ingested.
User-facing citations with PDF page numbers or web URLs.
Lazy external clients: the server can start with empty API keys and report configuration errors through MCP tools.
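
The idempotent ingestion described above depends on chunk IDs that do not change between runs. A minimal sketch of one way to derive such IDs follows. The namespace value, the function name, and the choice of source name plus chunk index as the key are assumptions for illustration, not the project's actual scheme. Qdrant accepts UUIDs as point IDs, so a name-based UUID is a natural fit.

```python
import uuid

# Hypothetical fixed namespace; any constant UUID works as long as it never changes.
CHUNK_NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")


def stable_chunk_id(source_name: str, chunk_index: int) -> str:
    """Return a deterministic UUID string for one chunk of one source file.

    The same (source_name, chunk_index) pair always yields the same ID, so
    re-ingesting a file overwrites its existing points instead of adding new ones.
    """
    return str(uuid.uuid5(CHUNK_NAMESPACE, f"{source_name}:{chunk_index}"))
```

With IDs like these, a re-ingested file that produces fewer chunks than before still leaves its old higher-index chunks behind, which is why stale chunks also have to be deleted.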

Architecture

Open the standalone LangGraph visual guide in a browser for an interactive execution map, node-by-node reference, state dictionary, routing rules, and ingestion pipeline.

flowchart TD
    client["MCP client"] --> server["FastMCP server"]
    server --> graph["LangGraph Self-RAG workflow"]
    graph --> gemini["Gemini chat and embeddings"]
    graph --> qdrant["Qdrant Cloud hybrid collection"]
    graph --> tavily["Tavily web search"]

The graph contains ten nodes:

decide_retrieval
  |-- no  --> generate_direct --> END
  `-- yes --> retrieve --> is_relevant
                         |-- yes --> extract_sources --> generate_from_context
                         |                               --> grade_hallucination --> END or retry
                         `-- no  --> rewrite_query --> web_search --> is_relevant

Web-search retries stop once MAX_WEB_ATTEMPTS is reached. If no useful evidence is found by then, the no_docs_fallback node returns a clear response saying so.
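
The two main branch points in the diagram can be sketched as plain routing functions. This is an illustration only: the state field names are hypothetical, the limit is hard-coded rather than read from configuration, and the retry branch after grade_hallucination is omitted because its exact rule is not stated here.

```python
from typing import TypedDict

MAX_WEB_ATTEMPTS = 2  # assumed limit for this sketch


class GraphState(TypedDict):
    # Hypothetical state fields; the real state dictionary may differ.
    needs_retrieval: bool
    relevant_docs: list[str]
    web_search_attempts: int


def route_after_decision(state: GraphState) -> str:
    """Pick the node that follows decide_retrieval."""
    return "retrieve" if state["needs_retrieval"] else "generate_direct"


def route_after_relevance(state: GraphState) -> str:
    """Pick the node that follows is_relevant."""
    if state["relevant_docs"]:
        return "extract_sources"
    if state["web_search_attempts"] < MAX_WEB_ATTEMPTS:
        return "rewrite_query"
    # The web-search budget is spent and nothing relevant was found.
    return "no_docs_fallback"
```

Keeping the routers as pure functions of the state makes the retry bound easy to unit test without calling any external service.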

MCP Tools

| Tool | Purpose |
| --- | --- |
| query_knowledge_base(question) | Answer a question through the Self-RAG graph. |
| ingest_documents(file_paths) | Index server-side PDF files in Qdrant Cloud. |
| get_knowledge_base_info() | Report collection status and indexed chunk count. |

Query results include answer, sources, is_grounded, answer_status, used_retrieval, used_web, and web_search_attempts.

answer_status distinguishes verified RAG answers from direct general-knowledge answers:

| Status | Meaning |
| --- | --- |
| verified | The answer passed the context-groundedness check. |
| direct_unverified | The graph answered directly from Gemini general knowledge. |
| unverified | The answer could not be fully supported after retrying. |
| not_found | No useful knowledge-base or web evidence was found. |
| error | Configuration or external-service failure. |
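
A client can branch on answer_status when deciding how to present a result. The sketch below shows one possible policy. The presentation labels and the function name are hypothetical, and the result is assumed to arrive as a plain dictionary.

```python
from typing import Any

# Hypothetical client-side policy: one presentation mode per answer_status value.
PRESENTATION = {
    "verified": "show",
    "direct_unverified": "show_with_caveat",
    "unverified": "show_with_caveat",
    "not_found": "show_not_found",
    "error": "show_error",
}


def presentation_for(result: dict[str, Any]) -> str:
    """Map a query result to a presentation mode, treating unknown statuses as errors."""
    return PRESENTATION.get(result.get("answer_status"), "show_error")
```

Treating a missing or unrecognised status as an error keeps the client conservative if new statuses are added later.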

Requirements

Python 3.12 or newer.
A Gemini API key.
A Qdrant Cloud cluster URL and API key.
A Tavily API key if web-search fallback is required.

Setup

git clone https://github.com/vvanshkkumar/Self-RAG-MCP-Server.git
cd Self-RAG-MCP-Server
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
cp .env.example .env

Add your keys to .env. Real keys must never be committed.

GOOGLE_API_KEY=
QDRANT_URL=
QDRANT_API_KEY=
TAVILY_API_KEY=

The defaults use gemini-2.5-flash and gemini-embedding-001. Both can be changed through .env without editing Python files.

Environment Variables

| Variable | Default | Purpose |
| --- | --- | --- |
| GOOGLE_API_KEY | empty | Gemini chat and embedding API key. |
| GEMINI_CHAT_MODEL | gemini-2.5-flash | Chat model for generation and grading. |
| GEMINI_EMBEDDING_MODEL | gemini-embedding-001 | Dense embedding model. |
| GEMINI_EMBEDDING_DIMENSIONS | 768 | Dense Qdrant vector size. |
| QDRANT_URL | empty | Qdrant Cloud cluster URL. |
| QDRANT_API_KEY | empty | Qdrant Cloud API key. |
| QDRANT_COLLECTION | self_rag_documents_v1 | Hybrid collection name. |
| TAVILY_API_KEY | empty | Tavily web-search API key. |
| RETRIEVAL_TOP_K | 6 | Hybrid chunks retrieved before grading. |
| MAX_WEB_ATTEMPTS | 2 | Maximum Tavily fallback calls per query. |
| CHUNK_SIZE | 800 | PDF chunk size in characters. |
| CHUNK_OVERLAP | 150 | PDF chunk overlap in characters. |
| INGEST_ALLOWED_ROOT | empty | Optional directory boundary for PDF ingestion. |
| MCP_HOST | 127.0.0.1 | Streamable HTTP bind address. |
| MCP_PORT | 8000 | Streamable HTTP port. |

Changing the embedding model or dimensions requires a new Qdrant collection name and re-ingestion of your PDFs.
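
To show how CHUNK_SIZE and CHUNK_OVERLAP interact, here is a minimal character-window splitter. It is a sketch of the general technique under the default values, not the project's actual splitter, which may respect sentence or page boundaries. The function name is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 800, chunk_overlap: int = 150) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks
```

With the defaults, each window starts 650 characters after the previous one, so the last 150 characters of one chunk reappear at the start of the next.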

Ingest PDFs

Use the CLI for initial ingestion:

source .venv/bin/activate
python ingest.py ./documents/
python ingest.py ./documents/policy.pdf
python ingest.py "./documents/*.pdf"

Folders are searched recursively. PDF extension matching supports .pdf and .PDF.
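
A recursive search with this extension rule can be sketched with pathlib. The function name is hypothetical, and the sketch matches exactly the two spellings mentioned above rather than every mixed-case variant.

```python
from pathlib import Path

PDF_SUFFIXES = {".pdf", ".PDF"}


def find_pdfs(root: Path) -> list[Path]:
    """Return every PDF file under root, searching subfolders, in sorted order."""
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix in PDF_SUFFIXES
    )
```

Sorting the result gives a deterministic ingestion order, which makes repeated runs easier to compare.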

The MCP ingest_documents tool accepts absolute file paths on the server machine. Set INGEST_ALLOWED_ROOT in .env when you want to prevent ingestion outside a known documents directory.
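
A directory boundary of this kind is typically enforced by resolving both paths and checking containment. The sketch below shows the idea. The function name is hypothetical and the real check may differ.

```python
from pathlib import Path


def is_within_root(candidate: str, allowed_root: str) -> bool:
    """Return True if candidate resolves to a location inside allowed_root.

    Resolving first collapses ".." segments and follows symlinks, so a path
    such as "/srv/docs/../secret.pdf" is rejected.
    """
    root = Path(allowed_root).resolve()
    path = Path(candidate).resolve()
    return path.is_relative_to(root)
```

Because is_relative_to compares whole path components, a sibling directory such as /srv/docs-old is not mistaken for a child of /srv/docs.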

Run the MCP Server

For desktop clients using stdio:

source .venv/bin/activate
python mcp_server.py

For local Streamable HTTP:

source .venv/bin/activate
python mcp_server.py --transport http --host 127.0.0.1 --port 8000

The HTTP server binds to localhost by default. Do not expose it publicly without authentication and network controls. The ingestion tool can read server-side PDF paths, so INGEST_ALLOWED_ROOT should also be configured before any remote deployment.

Claude Desktop Configuration

Use absolute paths and point the command at this project's virtual-environment Python:

{
  "mcpServers": {
    "self-rag": {
      "command": "/absolute/path/to/Self-RAG-MCP-Server/.venv/bin/python",
      "args": ["/absolute/path/to/Self-RAG-MCP-Server/mcp_server.py"]
    }
  }
}

The server loads API keys from .env, so keys do not need to be duplicated in the client configuration.

Development Checks

source .venv/bin/activate
pytest
ruff check .
ruff format --check .
mypy rag ingest.py mcp_server.py
python -m compileall -q rag ingest.py mcp_server.py tests

Tests use local fakes and FastMCP's in-memory client. They do not require API keys or external network calls.
