ENA REST MCP - Model Context Protocol Server

Expose a carefully selected subset of European Nucleotide Archive (ENA) REST services through the Model Context Protocol (MCP), enabling AI agents to interact with ENA programmatically in a safe, structured, and reproducible way.

Overview

The European Nucleotide Archive (ENA) provides a rich set of REST APIs that allow users to query genomic metadata, sequence records, and submission information. This project bridges ENA services with modern AI agents through MCP.

Key Features

Flexible Data Query Interface: Support for natural language, structured queries, and API endpoint filtering
Multi-Format Response Support: CSV, Excel, JSON outputs with downloadable reports
Downloadable Summary & Reports: Concise summaries with analytics and links to raw datasets
Internal Authentication: Secure handling of authentication for protected ENA services
Performance Optimized: Response times under 2 seconds for typical queries (see Performance Targets) and accurate results
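To make the multi-format idea concrete, here is a minimal sketch of how a list of flat records might be rendered as JSON or CSV using only the standard library. The function name render_records and the sample field names are illustrative assumptions, not part of this project's code. Excel output is left out because it would need a third-party library.

```python
import csv
import io
import json


def render_records(records, fmt="json"):
    """Serialize a list of flat dicts as JSON or CSV text (hypothetical helper)."""
    if fmt == "json":
        return json.dumps(records, indent=2)
    if fmt == "csv":
        if not records:
            return ""
        buffer = io.StringIO()
        # Column order follows the keys of the first record.
        writer = csv.DictWriter(
            buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(records)
        return buffer.getvalue()
    raise ValueError(f"Unsupported format: {fmt!r}")


if __name__ == "__main__":
    # Placeholder records, not real ENA data.
    rows = [{"acc": "A1", "name": "example sample"}]
    print(render_records(rows, "csv"))
```

Keeping the serializer separate from the query logic means each tool can return plain records and let the caller pick the output format.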

Project Objectives

Monolithic MCP Server: A single MCP server exposing REST APIs from multiple ENA services
Data Access: Users can request data in simple formats or plain text with clear summaries
Full Stack Delivery: Backend MCP service with proper test cases and optional UI

Services Integrated

Core Services

Webin-REST: ENA core service for metadata storage and access (requires authentication)
Webin-Report: Service for reporting queries from ENA databases (requires authentication)
EBI Search Service: BM25 text-based search service (https://www.ebi.ac.uk/ebisearch/ws/rest/sra-sample)
ENA Browser: Public access to genomic metadata

Project Structure

ENA-MCP/
├── src/
│   ├── mcp_server/        # MCP server implementation
│   ├── ena_api/           # ENA API client and wrappers
│   ├── utils/             # Utility functions
│   └── __init__.py
├── tests/                 # Test suite
├── docs/                  # Documentation
├── config/                # Configuration files
├── requirements.txt       # Python dependencies
├── setup.py              # Package setup
├── .gitignore            # Git ignore rules
├── .env.example          # Environment variables template
└── README.md             # This file

Installation

Prerequisites

Python 3.8+ (the MCP Python SDK and FastMCP listed under References require Python 3.10 or newer)
pip or conda

Setup

Clone the repository:

git clone https://github.com/Sumithraju/ena-mcp.git
cd ena-mcp

Create a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Configure environment variables:

cp .env.example .env
# Edit .env with your ENA credentials (if needed)

Usage

Start the MCP Server

python src/mcp_server/main.py

Example Queries

Natural Language Query:

"Show widely used Mpox samples"
"Give me the latest COVID-related submission details"

Structured Query:

{
  "query_type": "sample_search",
  "keywords": ["Mpox"],
  "fields": ["acc", "description", "name"],
  "format": "json",
  "limit": 100
}
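As a rough illustration, the structured query above could be translated into a request URL for the EBI Search endpoint listed under Core Services. This is a sketch under stated assumptions: the mapping from query_type to the sra-sample domain, the joining of keywords with AND, and the function name build_ebi_search_url are all hypothetical. The query, fields, format, and size parameters are EBI Search request parameters.

```python
from urllib.parse import urlencode

EBI_SEARCH_BASE = "https://www.ebi.ac.uk/ebisearch/ws/rest"

# Assumed mapping from this README's query_type values to EBI Search domains.
DOMAINS = {"sample_search": "sra-sample"}


def build_ebi_search_url(query):
    """Build an EBI Search URL from a structured query dict (hypothetical helper)."""
    domain = DOMAINS[query["query_type"]]
    params = {
        # Assumption: multiple keywords are combined with AND.
        "query": " AND ".join(query["keywords"]),
        "fields": ",".join(query.get("fields", [])),
        "format": query.get("format", "json"),
        # Arbitrary fallback when the caller gives no limit.
        "size": query.get("limit", 10),
    }
    return f"{EBI_SEARCH_BASE}/{domain}?{urlencode(params)}"


if __name__ == "__main__":
    structured = {
        "query_type": "sample_search",
        "keywords": ["Mpox"],
        "fields": ["acc", "description", "name"],
        "format": "json",
        "limit": 100,
    }
    print(build_ebi_search_url(structured))
```

Building the URL in one pure function keeps it easy to unit test without touching the network.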

API Endpoints

The MCP server exposes the following main tools:

search_samples: Search ENA sample records
search_studies: Query study metadata
search_runs: Look up run/sequence records
get_submission_details: Retrieve submission information
download_data: Generate downloadable reports
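The name-to-handler dispatch behind a tool list like this can be sketched without any dependencies. The MCP Python SDK listed under References provides its own tool registration, so the decorator, registry, and placeholder handler bodies below are only an illustrative stand-in, not this project's implementation.

```python
from typing import Any, Callable, Dict

# Hypothetical registry mapping tool names to handler functions.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(func):
    """Register func under its own name so it can be looked up by tool name."""
    TOOLS[func.__name__] = func
    return func


@tool
def search_samples(keywords, limit=10):
    # Placeholder body: a real handler would call an ENA service here.
    return {"tool": "search_samples", "keywords": list(keywords), "limit": limit}


@tool
def search_studies(keywords, limit=10):
    # Placeholder body, same shape as search_samples.
    return {"tool": "search_studies", "keywords": list(keywords), "limit": limit}


def call_tool(name, arguments):
    """Dispatch a tool call by name, passing arguments as keyword arguments."""
    if name not in TOOLS:
        raise KeyError(f"Unknown tool: {name}")
    return TOOLS[name](**arguments)


if __name__ == "__main__":
    print(call_tool("search_samples", {"keywords": ["Mpox"], "limit": 5}))
```

Rejecting unknown tool names up front keeps an agent from reaching anything outside the selected subset of services.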

Configuration

Configuration is managed through:

.env file for credentials
config/ directory for server settings
Command-line arguments for runtime options
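One way these sources might be combined is sketched below. The environment variable names ENA_WEBIN_USERNAME and ENA_WEBIN_PASSWORD, the --host and --port flags, and the function load_settings are assumptions made for illustration; check .env.example for the actual variable names. The sketch assumes the values from .env are already present in the process environment.

```python
import argparse
import os


def load_settings(argv=None, environ=None):
    """Merge command-line options with credentials from the environment."""
    environ = os.environ if environ is None else environ

    parser = argparse.ArgumentParser(description="ENA MCP server (sketch)")
    # Hypothetical runtime options.
    parser.add_argument("--host", default="127.0.0.1")
    parser.add_argument("--port", type=int, default=8000)
    args = parser.parse_args(argv)

    return {
        "host": args.host,
        "port": args.port,
        # Hypothetical variable names; None when credentials are not set.
        "webin_username": environ.get("ENA_WEBIN_USERNAME"),
        "webin_password": environ.get("ENA_WEBIN_PASSWORD"),
    }


if __name__ == "__main__":
    settings = load_settings()
    # Avoid printing the password itself.
    print({k: v for k, v in settings.items() if k != "webin_password"})
```

Reading credentials only from the environment keeps them out of command history and out of version control.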

Testing

Run the test suite:

pytest tests/ -v

Generate coverage report:

pytest tests/ --cov=src --cov-report=html

Documentation

API Reference: See docs/API.md
Architecture: See docs/ARCHITECTURE.md
ENA Programmatic Access: https://ena-docs.readthedocs.io/en/latest/retrieval/programmatic-access.html

Development

Contributing

Create a feature branch: git checkout -b feature/your-feature
Make your changes and write tests
Commit with clear messages: git commit -m "Add feature: description"
Push to your fork: git push origin feature/your-feature
Create a Pull Request

Code Style

This project follows PEP 8 style guidelines. Use black and flake8 for formatting and linting:

black src/
flake8 src/

Performance Targets

Response time: < 2 seconds for typical queries
Test coverage: ≥ 95%
Support for large datasets with downloadable reports

Project Timeline (GSoC 2026)

Phase 1: Research and setup
Phase 2: Core MCP server implementation
Phase 3: ENA API integration
Phase 4: Testing and optimization
Phase 5: Documentation and UI (optional)

References

ENA Browser: https://www.ebi.ac.uk/ena/browser/home
Webin-Reports Swagger: https://www.ebi.ac.uk/ena/submit/report/swagger-ui/index.html
ENA Programmatic Access: https://ena-docs.readthedocs.io/en/latest/retrieval/programmatic-access.html
FastMCP: https://gofastmcp.com/getting-started/welcome
MCP Python SDK: https://github.com/modelcontextprotocol/python-sdk
GSoC 2026 Timeline: https://developers.google.com/open-source/gsoc/timeline

License

This project is part of EMBL-EBI's Google Summer of Code 2026 initiative.

Author

Sumithra Raju - GSoC 2026 Contributor at EMBL-EBI

Support

For issues, questions, or contributions, please open an issue on GitHub or contact the development team.

Last Updated: March 16, 2026
