MCP Doc Reader
A Model Context Protocol (MCP) server that enables AI assistants to read and extract content from PDF, Excel, and Word documents.
Features
- PDF Reading: Extract text content from PDF files using pdfminer.six
- Excel Reading: Read .xlsx and .xls files with formatted table output
- Word Reading: Extract text and tables from .docx files
- Cross-Platform: Works on Windows, Linux, and macOS
- Unicode Support: Full support for non-ASCII characters (Chinese, Japanese, etc.)
Installation
Using uvx (Recommended)
uvx mcp-doc-reader
Using pip
pip install mcp-doc-reader
From Source
git clone https://github.com/yourusername/mcp-doc-reader.git
cd mcp-doc-reader
pip install -e .
Configuration
Add the following to your MCP client configuration (e.g., Claude Desktop, Cursor):
Option 1: Using uvx (Recommended)
{
"mcpServers": {
"DocReader": {
"command": "uvx",
"args": ["mcp-doc-reader"]
}
}
}
Option 2: Using pip-installed command
{
"mcpServers": {
"DocReader": {
"command": "mcp-doc-reader"
}
}
}
Option 3: Windows with Unicode Support
For Windows systems with non-ASCII file paths (e.g., Chinese characters):
{
"mcpServers": {
"DocReader": {
"command": "cmd",
"args": [
"/c",
"chcp 65001 >nul && uvx mcp-doc-reader"
]
}
}
}
Option 4: Linux/macOS with Python module
{
"mcpServers": {
"DocReader": {
"command": "python",
"args": ["-m", "docreader"]
}
}
}
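If your client configuration already lists other servers, the DocReader entry has to be merged into the existing "mcpServers" object rather than replacing it. The following is a minimal sketch of that merge, reusing the Option 1 and Option 3 entries shown above. The helper name add_doc_reader, the windows_utf8 flag, and the "OtherServer" entry are hypothetical and exist only for illustration; where the configuration file lives depends on your client and is not covered here.

```python
import copy
import json


def add_doc_reader(config: dict, windows_utf8: bool = False) -> dict:
    """Return a copy of `config` with a DocReader entry under "mcpServers".

    Hypothetical helper for illustration; not part of mcp-doc-reader.
    """
    merged = copy.deepcopy(config)  # leave the caller's dict untouched
    servers = merged.setdefault("mcpServers", {})
    if windows_utf8:
        # Option 3: switch the console code page to UTF-8 before launching.
        servers["DocReader"] = {
            "command": "cmd",
            "args": ["/c", "chcp 65001 >nul && uvx mcp-doc-reader"],
        }
    else:
        # Option 1: launch the server through uvx.
        servers["DocReader"] = {"command": "uvx", "args": ["mcp-doc-reader"]}
    return merged


# "OtherServer" is a made-up entry standing in for whatever is already configured.
existing = {"mcpServers": {"OtherServer": {"command": "other-server"}}}
print(json.dumps(add_doc_reader(existing), indent=2))
```

The function works on a deep copy, so the original configuration stays unchanged until you decide to write the merged result back to disk.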
Available Tools
read_pdf
Read text content from a PDF file.
Parameters:
file_path (string, required): Absolute path to the PDF file
Example:
{
"name": "read_pdf",
"arguments": {
"file_path": "/path/to/document.pdf"
}
}
read_excel
Read content from an Excel file (.xlsx or .xls).
Parameters:
file_path (string, required): Absolute path to the Excel file
Example:
{
"name": "read_excel",
"arguments": {
"file_path": "/path/to/spreadsheet.xlsx"
}
}
read_word
Read text and tables from a Word file (.docx).
Parameters:
file_path (string, required): Absolute path to the Word file
Example:
{
"name": "read_word",
"arguments": {
"file_path": "/path/to/document.docx"
}
}
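All three tools take the same single argument, so a client can pick the tool from the file extension and wrap the call in a JSON-RPC 2.0 request using the MCP "tools/call" method. The sketch below shows one way to do that. The names TOOL_BY_SUFFIX, tool_for, and build_tool_call are hypothetical helpers, not part of this package, and the code only builds the request; sending it over a transport is left to your MCP client.

```python
import json
from pathlib import PurePath

# Extension-to-tool mapping, taken from the tool descriptions above.
TOOL_BY_SUFFIX = {
    ".pdf": "read_pdf",
    ".xlsx": "read_excel",
    ".xls": "read_excel",
    ".docx": "read_word",
}


def tool_for(file_path: str) -> str:
    """Pick the tool name for a file, based on its (case-insensitive) suffix."""
    suffix = PurePath(file_path).suffix.lower()
    try:
        return TOOL_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"unsupported file type: {suffix or file_path!r}") from None


def build_tool_call(file_path: str, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 "tools/call" request for the matching tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": tool_for(file_path),
            "arguments": {"file_path": file_path},
        },
    }


print(json.dumps(build_tool_call("/path/to/spreadsheet.xlsx"), indent=2))
```

Unsupported extensions such as .doc raise ValueError here, so the caller finds out before any request reaches the server.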
Usage Examples
Once configured, you can ask your AI assistant to:
- "Read the contents of /path/to/report.pdf"
- "Extract data from /path/to/data.xlsx"
- "What does the document /path/to/memo.docx contain?"
Development
Setup Development Environment
git clone https://github.com/yourusername/mcp-doc-reader.git
cd mcp-doc-reader
pip install -e ".[dev]"
Run Tests
pytest
Build Package
pip install build
python -m build
Publish to PyPI
pip install twine
twine upload dist/*
Project Structure
mcp-doc-reader/
├── src/
│ └── docreader/
│ ├── __init__.py
│ ├── __main__.py
│ ├── server.py
│ └── readers/
│ ├── __init__.py
│ ├── pdf_reader.py
│ ├── excel_reader.py
│ └── word_reader.py
├── examples/
│ ├── mcp_config_pip.json
│ ├── mcp_config_uvx.json
│ ├── mcp_config_windows.json
│ └── mcp_config_linux.json
├── pyproject.toml
├── README.md
└── LICENSE
Troubleshooting
Windows: Unicode/Chinese filename issues
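If file paths containing non-ASCII characters fail on Windows, use the Windows-specific configuration, which sets the console code page to UTF-8 (65001) before starting the server. The example below launches the pip-installed mcp-doc-reader command; if you run the server through uvx, use the command string from Option 3 instead: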
{
"mcpServers": {
"DocReader": {
"command": "cmd",
"args": ["/c", "chcp 65001 >nul && mcp-doc-reader"]
}
}
}
.doc files not supported
The Word reader only supports .docx format. To read .doc files, please convert them to .docx first using Microsoft Word or LibreOffice.
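If you have many .doc files, the LibreOffice conversion can be scripted with its headless command-line mode. The sketch below assumes LibreOffice is installed and its soffice executable is on your PATH; the function names conversion_command and convert_doc_to_docx are hypothetical and not part of this package.

```python
import shutil
import subprocess
from pathlib import Path


def conversion_command(doc_path: str, out_dir: str, soffice: str = "soffice") -> list[str]:
    """Build the LibreOffice command that converts one file to .docx."""
    return [
        soffice,
        "--headless",            # run without opening a window
        "--convert-to", "docx",  # target format
        "--outdir", out_dir,     # directory that receives the converted file
        doc_path,
    ]


def convert_doc_to_docx(doc_path: str, out_dir: str) -> Path:
    """Run the conversion and return the expected path of the .docx file."""
    soffice = shutil.which("soffice")  # assumption: LibreOffice is on PATH
    if soffice is None:
        raise FileNotFoundError("LibreOffice 'soffice' executable not found on PATH")
    subprocess.run(conversion_command(doc_path, out_dir, soffice), check=True)
    # LibreOffice keeps the input file's base name and swaps the extension.
    return Path(out_dir) / (Path(doc_path).stem + ".docx")
```

The path returned by convert_doc_to_docx can then be passed to read_word as its file_path argument.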
License
MIT License - see LICENSE for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.