MCP Doc Reader

A Model Context Protocol (MCP) server that enables AI assistants to read and extract content from PDF, Excel, and Word documents.

Features

PDF Reading: Extract text content from PDF files using pdfminer.six
Excel Reading: Read .xlsx and .xls files with formatted table output
Word Reading: Extract text and tables from .docx files
Cross-Platform: Works on Windows, Linux, and macOS
Unicode Support: Full support for non-ASCII characters (Chinese, Japanese, etc.)

Installation

Using uvx (Recommended)

uvx mcp-doc-reader

Using pip

pip install mcp-doc-reader

From Source

git clone https://github.com/yourusername/mcp-doc-reader.git
cd mcp-doc-reader
pip install -e .

Configuration

Add the following to your MCP client configuration (e.g., Claude Desktop, Cursor):

Option 1: Using uvx (Recommended)

{
  "mcpServers": {
    "DocReader": {
      "command": "uvx",
      "args": ["mcp-doc-reader"]
    }
  }
}

Option 2: Using pip-installed command

{
  "mcpServers": {
    "DocReader": {
      "command": "mcp-doc-reader"
    }
  }
}

Option 3: Windows with Unicode Support

For Windows systems where file paths contain non-ASCII characters (e.g., Chinese), switch the console code page to UTF-8 with chcp 65001 before launching the server:

{
  "mcpServers": {
    "DocReader": {
      "command": "cmd",
      "args": [
        "/c",
        "chcp 65001 >nul && uvx mcp-doc-reader"
      ]
    }
  }
}

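Option 4: Linux/macOS with Python module

This runs the package as a module, so mcp-doc-reader must already be installed in the environment that python resolves to: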

{
  "mcpServers": {
    "DocReader": {
      "command": "python",
      "args": ["-m", "docreader"]
    }
  }
}
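
The four options differ only in the command and args used to launch the server, so the choice can be scripted. The following is a minimal sketch, not part of mcp-doc-reader: `build_config` and its `launcher` parameter are hypothetical names, and the output simply mirrors the JSON shown in the options above.

```python
import json
import sys

# Hypothetical helper, not part of mcp-doc-reader: it only reproduces the
# JSON snippets from the four configuration options.
LAUNCHERS = {
    "uvx": {"command": "uvx", "args": ["mcp-doc-reader"]},
    "pip": {"command": "mcp-doc-reader"},
    "windows": {
        "command": "cmd",
        "args": ["/c", "chcp 65001 >nul && uvx mcp-doc-reader"],
    },
    "module": {"command": "python", "args": ["-m", "docreader"]},
}


def build_config(launcher: str = "uvx") -> dict:
    """Return an mcpServers configuration for the chosen launch option."""
    if launcher not in LAUNCHERS:
        raise ValueError(f"unknown launcher: {launcher!r}")
    # Deep-copy the entry so callers cannot mutate the shared table.
    entry = json.loads(json.dumps(LAUNCHERS[launcher]))
    return {"mcpServers": {"DocReader": entry}}


if __name__ == "__main__":
    # Assumption: default to the Windows variant on Windows, uvx elsewhere.
    default = "windows" if sys.platform == "win32" else "uvx"
    print(json.dumps(build_config(default), indent=2))
```

The printed JSON can be pasted into the client's configuration file; where that file lives depends on the client.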

Available Tools

`read_pdf`

Read text content from a PDF file.

Parameters:

file_path (string, required): Absolute path to the PDF file

Example:

{
  "name": "read_pdf",
  "arguments": {
    "file_path": "/path/to/document.pdf"
  }
}

`read_excel`

Read content from an Excel file (.xlsx or .xls).

Parameters:

file_path (string, required): Absolute path to the Excel file

Example:

{
  "name": "read_excel",
  "arguments": {
    "file_path": "/path/to/spreadsheet.xlsx"
  }
}

`read_word`

Read text and table content from a Word file (.docx).

Parameters:

file_path (string, required): Absolute path to the Word file

Example:

{
  "name": "read_word",
  "arguments": {
    "file_path": "/path/to/document.docx"
  }
}
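
MCP clients normally build these calls for you. For reference, MCP carries tool invocations as JSON-RPC 2.0 requests with the method tools/call, and the name and arguments objects shown in the examples above become that request's params. The sketch below picks a tool from the file extension and wraps it in such a request; `pick_tool` and `make_tool_call` are hypothetical helper names, not part of mcp-doc-reader.

```python
import json
from pathlib import PurePosixPath

# Extension-to-tool mapping taken from the tool descriptions above.
TOOL_BY_SUFFIX = {
    ".pdf": "read_pdf",
    ".xlsx": "read_excel",
    ".xls": "read_excel",
    ".docx": "read_word",
}


def pick_tool(file_path: str) -> str:
    """Return the tool name that handles file_path, based on its suffix."""
    suffix = PurePosixPath(file_path).suffix.lower()
    if suffix not in TOOL_BY_SUFFIX:
        raise ValueError(f"no tool for {suffix or 'files without an extension'}")
    return TOOL_BY_SUFFIX[suffix]


def make_tool_call(file_path: str, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 tools/call request for file_path."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": pick_tool(file_path),
            "arguments": {"file_path": file_path},
        },
    }


print(json.dumps(make_tool_call("/path/to/document.pdf"), indent=2))
```

Rejecting unknown extensions up front, including legacy .doc, gives a clearer error than sending a file to a tool that cannot parse it.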

Usage Examples

Once configured, you can ask your AI assistant to:

"Read the contents of /path/to/report.pdf"
"Extract data from /path/to/data.xlsx"
"What does the document /path/to/memo.docx contain?"

Development

Setup Development Environment

git clone https://github.com/yourusername/mcp-doc-reader.git
cd mcp-doc-reader
pip install -e ".[dev]"

Run Tests

pytest

Build Package

pip install build
python -m build

Publish to PyPI

pip install twine
twine upload dist/*

Project Structure

mcp-doc-reader/
├── src/
│   └── docreader/
│       ├── __init__.py
│       ├── __main__.py
│       ├── server.py
│       └── readers/
│           ├── __init__.py
│           ├── pdf_reader.py
│           ├── excel_reader.py
│           └── word_reader.py
├── examples/
│   ├── mcp_config_pip.json
│   ├── mcp_config_uvx.json
│   ├── mcp_config_windows.json
│   └── mcp_config_linux.json
├── pyproject.toml
├── README.md
└── LICENSE

Troubleshooting

Windows: Unicode/Chinese filename issues

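If non-ASCII characters in file paths cause errors on Windows, use the Windows-specific configuration, which switches the console code page to UTF-8 (chcp 65001) before starting the server. The example below launches the pip-installed command; Option 3 under Configuration shows the uvx equivalent: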

{
  "mcpServers": {
    "DocReader": {
      "command": "cmd",
      "args": ["/c", "chcp 65001 >nul && mcp-doc-reader"]
    }
  }
}

.doc files not supported

The Word reader only supports .docx format. To read .doc files, please convert them to .docx first using Microsoft Word or LibreOffice.
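
Conversion can be scripted with LibreOffice's headless mode. The sketch below is one possible approach, not something mcp-doc-reader provides: it assumes the LibreOffice binary is on PATH under the name soffice (the name and location vary by platform and install), and `doc_to_docx_command` and `convert` are hypothetical helpers.

```python
import subprocess
from pathlib import Path


def doc_to_docx_command(doc_path: str, out_dir: str, soffice: str = "soffice") -> list[str]:
    """Build a LibreOffice command that converts one .doc file to .docx."""
    if Path(doc_path).suffix.lower() != ".doc":
        raise ValueError(f"expected a .doc file, got {doc_path!r}")
    return [soffice, "--headless", "--convert-to", "docx", "--outdir", out_dir, doc_path]


def convert(doc_path: str, out_dir: str) -> Path:
    """Run the conversion and return the path where the .docx is expected."""
    # check=True raises CalledProcessError if LibreOffice exits with an error.
    subprocess.run(doc_to_docx_command(doc_path, out_dir), check=True)
    return Path(out_dir) / (Path(doc_path).stem + ".docx")
```

The path returned by `convert` can then be passed to read_word as file_path, provided it is absolute.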

License

MIT License - see LICENSE for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
