# MCP DevBench

**Production-ready Docker container management server implementing the Model Context Protocol (MCP).**

MCP DevBench provides isolated, persistent development workspaces through a secure, audited, and observable container management API. Built for AI assistants such as Claude, it enables safe command execution and filesystem operations inside Docker containers.
## ✨ Features

### Core Capabilities

- 🚀 **Container Lifecycle Management** - Create, start, stop, and remove Docker containers with fine-grained control
- 📁 **Secure Filesystem Operations** - Read, write, and delete files with path validation and ETag-based concurrency control
- ⚡ **Async Command Execution** - Non-blocking execution with streaming output and timeout handling
- 🔐 **Enterprise Security** - Capability dropping, read-only rootfs, resource limits, and comprehensive audit logging
- 📊 **Production Observability** - Prometheus metrics, structured JSON logging, and system health monitoring
- ⚙️ **True Async I/O** - All blocking operations wrapped in thread pools for optimal concurrency
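The "true async I/O" approach can be sketched with `asyncio.to_thread`, which offloads a blocking call to a thread pool so the event loop stays responsive. This is an illustrative pattern only, not the project's actual code; `blocking_docker_call` is a hypothetical stand-in for a synchronous docker-py call.

```python
import asyncio
import time


def blocking_docker_call(name: str) -> str:
    # Hypothetical stand-in for a synchronous Docker SDK call
    time.sleep(0.05)
    return f"container:{name}"


async def inspect_container(name: str) -> str:
    # Offload the blocking call so the event loop is never blocked
    return await asyncio.to_thread(blocking_docker_call, name)


async def main() -> list[str]:
    # Both inspections run concurrently instead of serially
    return await asyncio.gather(inspect_container("a"), inspect_container("b"))
```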
### Advanced Features

- **Warm Container Pool** - Sub-second container provisioning for instant attach
- **Graceful Shutdown** - Drain active operations before server termination
- **Automatic Recovery** - Reconciles Docker state with the database on startup
- **Image Policy Enforcement** - Allow-list validation with digest pinning
- **Multi-Transport Support** - stdio, SSE, or HTTP-based MCP transports
- **Flexible Authentication** - None, Bearer token, or OIDC authentication modes
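The warm-pool idea can be approximated with a pre-filled `asyncio.Queue`: containers are provisioned ahead of time, and each acquisition triggers a background refill. This `WarmPool` class and its `provision` factory are hypothetical sketches, not the server's implementation.

```python
import asyncio


class WarmPool:
    """Keeps pre-provisioned container IDs ready so attach is near-instant."""

    def __init__(self, size: int, provision):
        self._queue: asyncio.Queue[str] = asyncio.Queue(maxsize=size)
        self._provision = provision  # async factory that creates a container

    async def fill(self) -> None:
        # Pre-provision until the pool is at capacity
        while not self._queue.full():
            await self._queue.put(await self._provision())

    async def acquire(self) -> str:
        cid = await self._queue.get()
        # Refill in the background so the pool stays warm
        asyncio.create_task(self._refill_one())
        return cid

    async def _refill_one(self) -> None:
        await self._queue.put(await self._provision())
```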
## 🚀 Quick Start

### Prerequisites

- Python 3.11+
- Docker Engine
- uv package manager

### Installation

```bash
# Install uv
pip install uv

# Clone the repository
git clone https://github.com/pvliesdonk/mcp-devbench.git
cd mcp-devbench

# Install dependencies
uv sync
```
### Running the Server

#### Development Mode (stdio)

```bash
uv run python -m mcp_devbench.server
```

#### Production Mode (HTTP)

```bash
export MCP_TRANSPORT_MODE=streamable-http
export MCP_HOST=0.0.0.0
export MCP_PORT=8000
uv run python -m mcp_devbench.server
```

### Using Docker

```bash
docker build -t mcp-devbench .
docker run -v /var/run/docker.sock:/var/run/docker.sock \
  -p 8000:8000 \
  -e MCP_TRANSPORT_MODE=streamable-http \
  mcp-devbench
```

### Using Docker Compose

```bash
docker-compose up -d
```
## 🏗️ Architecture

```
┌─────────────────────────────────────────────────────┐
│                  MCP DevBench API                   │
│             (FastMCP Server with Auth)              │
└─────────────────┬───────────────────────────────────┘
                  │
    ┌─────────────┼─────────────┐
    │             │             │
    ▼             ▼             ▼
┌─────────┐  ┌─────────┐  ┌──────────┐
│Container│  │  Exec   │  │Filesystem│
│ Manager │  │ Manager │  │ Manager  │
└────┬────┘  └────┬────┘  └────┬─────┘
     │            │            │
     └────────────┼────────────┘
                  │
         ┌────────┼────────┐
         │        │        │
         ▼        ▼        ▼
    ┌────────┐  ┌────┐  ┌──────────┐
    │ Docker │  │ DB │  │  Audit   │
    │ Daemon │  │    │  │  Logger  │
    └────────┘  └────┘  └──────────┘
```

**Design Patterns:**

- **Repository Pattern** - Data access abstraction
- **Manager Pattern** - Business logic encapsulation
- **Dependency Injection** - Loose coupling via factory functions
- **Async/Await** - Non-blocking I/O throughout, with a thread pool for blocking operations
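The factory-function dependency injection pattern might look like the sketch below. The names `get_database` and `get_container_manager` are illustrative, not the project's actual factories; the point is that wiring happens in the factories, so the manager class stays decoupled from construction details.

```python
from functools import lru_cache


class Database:
    def __init__(self, url: str):
        self.url = url


class ContainerManager:
    # Depends on an abstract Database, not on how it is created
    def __init__(self, db: Database):
        self.db = db


@lru_cache
def get_database() -> Database:
    # lru_cache makes this a lazily created singleton
    return Database("sqlite:///./state.db")


def get_container_manager() -> ContainerManager:
    # Wiring happens here, keeping ContainerManager itself decoupled
    return ContainerManager(get_database())
```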
## 📚 Usage Examples

### Basic Container Workflow

```python
from mcp_devbench.mcp_tools import *

# 1. Spawn a container
result = await spawn(SpawnInput(
    image="python:3.11-slim",
    persistent=False,
    alias="dev-workspace",
))
container_id = result.container_id

# 2. Attach to the container
await attach(AttachInput(
    target=container_id,
    client_name="my-client",
    session_id="session-123",
))

# 3. Execute a command
exec_result = await exec_start(ExecInput(
    container_id=container_id,
    cmd=["python", "--version"],
    timeout_s=30,
))

# 4. Poll for output
output = await exec_poll(ExecPollInput(
    exec_id=exec_result.exec_id,
    after_seq=0,
))

# 5. Write a file
await fs_write(FileWriteInput(
    container_id=container_id,
    path="/workspace/hello.py",
    content=b"print('Hello, World!')",
))

# 6. Clean up
await kill(KillInput(
    container_id=container_id,
    force=True,
))
```
## ⚙️ Configuration

All configuration is managed through environment variables with the `MCP_` prefix.
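Conceptually, prefix-based loading works like this minimal sketch (illustrative only; the real server uses its own configuration layer with typed settings and defaults):

```python
import os


def load_settings(prefix: str = "MCP_") -> dict[str, str]:
    """Collect environment variables with the given prefix into
    lowercase setting names, e.g. MCP_PORT -> settings["port"]."""
    return {
        key[len(prefix):].lower(): value
        for key, value in os.environ.items()
        if key.startswith(prefix)
    }
```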
### Essential Settings

| Variable | Default | Description |
|----------|---------|-------------|
| `MCP_TRANSPORT_MODE` | `streamable-http` | Transport: `stdio`, `sse`, or `streamable-http` |
| `MCP_HOST` | `0.0.0.0` | Server bind address (HTTP transports only) |
| `MCP_PORT` | `8000` | Server port (HTTP transports only) |
| `MCP_ALLOWED_REGISTRIES` | `docker.io,ghcr.io` | Comma-separated list of allowed registries |
| `MCP_LOG_LEVEL` | `INFO` | Logging level: `DEBUG`, `INFO`, `WARNING`, `ERROR` |
| `MCP_LOG_FORMAT` | `json` | Log format: `json` or `text` |
### Authentication

| Variable | Default | Description |
|----------|---------|-------------|
| `MCP_AUTH_MODE` | `none` | Auth mode: `none`, `bearer`, or `oidc` |
| `MCP_BEARER_TOKEN` | - | Bearer token (when `auth_mode=bearer`) |
| `MCP_OAUTH_CLIENT_ID` | - | OIDC client ID |
| `MCP_OAUTH_CLIENT_SECRET` | - | OIDC client secret |
| `MCP_OAUTH_CONFIG_URL` | - | OIDC discovery URL |
### Advanced Settings

| Variable | Default | Description |
|----------|---------|-------------|
| `MCP_STATE_DB` | `./state.db` | SQLite database path |
| `MCP_DRAIN_GRACE_S` | `60` | Shutdown grace period (seconds) |
| `MCP_TRANSIENT_GC_DAYS` | `7` | Transient container retention (days) |
| `MCP_WARM_POOL_ENABLED` | `true` | Enable warm container pool |
| `MCP_DEFAULT_IMAGE_ALIAS` | `python:3.11-slim` | Default warm pool image |
### Example Configurations

#### Local Development (stdio)

```bash
MCP_TRANSPORT_MODE=stdio
MCP_AUTH_MODE=none
MCP_LOG_LEVEL=DEBUG
MCP_LOG_FORMAT=text
```

#### Production (HTTP + OIDC)

```bash
MCP_TRANSPORT_MODE=streamable-http
MCP_HOST=0.0.0.0
MCP_PORT=8000
MCP_AUTH_MODE=oidc
MCP_OAUTH_CLIENT_ID=your-client-id
MCP_OAUTH_CLIENT_SECRET=your-secret
MCP_OAUTH_CONFIG_URL=https://auth.example.com/.well-known/openid-configuration
MCP_LOG_LEVEL=INFO
MCP_LOG_FORMAT=json
```
## 🔧 MCP Tools Reference

### Container Management

#### spawn

Create and start a new container.

**Input:**

- `image` (string) - Docker image reference
- `persistent` (boolean) - Persist across restarts
- `alias` (string, optional) - User-friendly name
- `ttl_s` (integer, optional) - Time-to-live for transient containers

**Output:**

- `container_id` (string) - Opaque container ID
- `alias` (string) - Container alias
- `status` (string) - Container status
#### attach

Attach a client to a container for session tracking.

**Input:**

- `target` (string) - Container ID or alias
- `client_name` (string) - Client identifier
- `session_id` (string) - Session identifier

**Output:**

- `container_id` (string) - Actual container ID
- `alias` (string) - Container alias
- `roots` (array) - Workspace roots
#### kill

Stop and remove a container.

**Input:**

- `container_id` (string) - Container to remove
- `force` (boolean) - Force immediate removal

**Output:**

- `status` (string) - Operation status
### Command Execution

#### exec_start

Start command execution in a container.

**Input:**

- `container_id` (string) - Target container
- `cmd` (array) - Command and arguments
- `cwd` (string) - Working directory (default: `/workspace`)
- `env` (object) - Environment variables
- `as_root` (boolean) - Execute as root
- `timeout_s` (integer) - Execution timeout
- `idempotency_key` (string) - Prevent duplicate execution

**Output:**

- `exec_id` (string) - Execution ID
- `status` (string) - Initial status
#### exec_cancel

Cancel a running execution.

#### exec_poll

Poll for execution output and status.

**Input:**

- `exec_id` (string) - Execution ID
- `after_seq` (integer) - Return messages after this sequence number

**Output:**

- `messages` (array) - Stream messages
- `complete` (boolean) - Execution complete flag
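A client-side polling loop over `exec_poll` could look like the sketch below, where `poll` is a stand-in for the actual tool call and the response shape (`messages` with `seq`/`data`, plus `complete`) is an assumption for illustration:

```python
import asyncio


async def collect_output(poll, exec_id: str, interval_s: float = 0.1) -> list[str]:
    """Poll until the execution reports complete, advancing after_seq as we go.

    `poll` stands in for the exec_poll tool: it takes (exec_id, after_seq) and
    returns a dict with "messages" (each with "seq" and "data") and "complete".
    """
    after_seq, lines = 0, []
    while True:
        result = await poll(exec_id, after_seq)
        for msg in result["messages"]:
            lines.append(msg["data"])
            after_seq = max(after_seq, msg["seq"])  # only ask for newer output
        if result["complete"]:
            return lines
        await asyncio.sleep(interval_s)
```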
### Filesystem Operations

#### fs_read

Read a file from the container workspace. Output includes `content`, `etag`, `size`, and `mime_type`.

#### fs_write

Write a file to the container workspace. Supports ETag-based concurrency control via `if_match_etag`.

#### fs_delete

Delete a file or directory.

#### fs_stat

Get file or directory metadata.

#### fs_list

List directory contents.
### System & Monitoring

#### system_status

Get system health and status. Output includes:

- Docker connectivity status
- Active container/attachment counts
- Database status
- Server version

#### metrics

Retrieve Prometheus-formatted metrics.

#### reconcile

Manually trigger container reconciliation.

#### garbage_collect

Trigger manual garbage collection.

#### list_containers / list_execs

List all containers or active executions.
## 🔐 Security

### Built-in Security Features

- **Capability Dropping** - All Linux capabilities dropped by default
- **Read-Only Root Filesystem** - Prevents container modification
- **Resource Limits** - 512 MB memory, 1 CPU, and a 256-PID limit per container
- **Path Validation** - Prevents directory traversal attacks
- **Image Allow-List** - Only approved registries are allowed
- **Audit Logging** - Complete audit trail with PII redaction
- **User Isolation** - Configurable UID (default: 1000)
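Directory-traversal protection can be sketched as below; this is an illustration of the idea, not the server's validator, and it assumes user paths are resolved relative to the workspace root:

```python
from pathlib import PurePosixPath


def resolve_workspace_path(rel_path: str, root: str = "/workspace") -> str:
    """Resolve a user-supplied path against the workspace root,
    rejecting any '..' sequence that would escape it."""
    candidate = PurePosixPath(root) / rel_path.lstrip("/")
    root_depth = len(PurePosixPath(root).parts)
    parts: list[str] = []
    for part in candidate.parts:
        if part == "..":
            if len(parts) <= root_depth:  # would climb out of the root
                raise ValueError(f"path escapes workspace: {rel_path}")
            parts.pop()
        elif part != ".":
            parts.append(part)
    return str(PurePosixPath(*parts))
```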
### Security Best Practices

- **Use OIDC authentication** in production
- **Restrict allowed registries** to trusted sources only
- **Enable audit logging** and monitor for suspicious activity
- **Run with least privilege** - never run as root
- **Keep images updated** - use digest pinning for reproducibility
- **Isolate network access** - use Docker network policies
## 📊 Observability

### Structured Logging

All operations are logged in JSON format with:

- ISO 8601 timestamps
- Correlation IDs
- Contextual metadata
- Automatic PII redaction
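A log record of this general shape can be produced with a small `logging.Formatter` subclass. This is a sketch under the assumption of the fields listed above; the project's own formatter carries more context and performs redaction.

```python
import json
import logging
from datetime import datetime, timezone


class JSONFormatter(logging.Formatter):
    """Minimal JSON log formatter with ISO 8601 timestamps (illustrative)."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # A real server would pull this from request context
            "correlation_id": getattr(record, "correlation_id", None),
        }
        return json.dumps(entry)
```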
### Prometheus Metrics

Available via the `metrics` tool:

- `mcp_devbench_container_spawns_total` - Container creation count
- `mcp_devbench_exec_total` - Command execution count
- `mcp_devbench_exec_duration_seconds` - Execution duration histogram
- `mcp_devbench_fs_operations_total` - Filesystem operation count
- `mcp_devbench_active_containers` - Active container gauge
- `mcp_devbench_memory_usage_bytes` - Container memory usage
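These metrics are exposed in the Prometheus text exposition format, which for a counter is just a `# TYPE` line followed by `name value`. The sketch below renders that format by hand for illustration; a real exporter would typically use the `prometheus_client` library instead.

```python
def render_metrics(counters: dict[str, int]) -> str:
    """Render counter values in Prometheus text exposition format (sketch)."""
    lines = []
    for name, value in sorted(counters.items()):
        lines.append(f"# TYPE {name} counter")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"
```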
### Audit Events

All operations generate audit events:

- `CONTAINER_SPAWN`, `CONTAINER_ATTACH`, `CONTAINER_KILL`
- `EXEC_START`, `EXEC_CANCEL`
- `FS_READ`, `FS_WRITE`, `FS_DELETE`
- `SYSTEM_RECONCILE`, `SYSTEM_GC`
## 🧪 Development

For detailed development guidelines, see the Project Style Guide.

### Running Tests

```bash
# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=mcp_devbench --cov-report=html

# Run a specific test file
uv run pytest tests/unit/test_container_manager.py

# Run integration tests only
uv run pytest tests/integration/
```

### Code Quality

```bash
# Lint with ruff
uv run ruff check .

# Format code
uv run ruff format .

# Type checking (recommended)
uv run pyright src/
```
### Project Structure

```
mcp-devbench/
├── src/mcp_devbench/
│   ├── config/          # Configuration management
│   ├── models/          # SQLAlchemy ORM models
│   ├── managers/        # Business logic layer
│   ├── repositories/    # Data access layer
│   ├── utils/           # Utilities (logging, Docker, metrics)
│   ├── server.py        # FastMCP server
│   └── mcp_tools.py     # Pydantic models for MCP
├── tests/
│   ├── unit/            # Unit tests
│   └── integration/     # Integration tests
├── alembic/             # Database migrations
└── .github/workflows/   # CI/CD pipelines
```
## 📈 Project Status

**Current Version:** 0.1.0

### Completed Epics

- ✅ **Epic 1: Foundation Layer** - Configuration, state store, Docker lifecycle
- ✅ **Epic 2: Command Execution** - Async exec, streaming, idempotency
- ✅ **Epic 3: Filesystem Operations** - CRUD, batch ops, import/export
- ✅ **Epic 4: MCP Integration** - Tools, resources, streaming transport
- ✅ **Epic 5: Security** - Image policy, hardening, warm pool
- ✅ **Epic 6: State Management** - Shutdown, recovery, maintenance
- ✅ **Epic 7: Observability** - Audit logging, metrics, admin tools

**Test Coverage:** ~72% (201 tests)
**Code Quality:** Zero linting issues (ruff)
**Production Ready:** Yes, for small-to-medium deployments

### Roadmap

See IMPLEMENTATION_ROADMAP.md for detailed future plans.
## 🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details on:

- Development workflow
- Testing guidelines
- Code style requirements
- Submission process

**Important:** We use a main/next branching model. See BRANCHING_STRATEGY.md for details.

### Quick Contribution Steps

1. Fork the repository
2. Create a feature branch from `next`: `git checkout next && git pull && git checkout -b feature/amazing-feature`
3. Make your changes and add tests
4. Run the tests: `uv run pytest`
5. Lint the code: `uv run ruff check .`
6. Commit with conventional commits: `git commit -m "feat: add amazing feature"`
7. Push and create a Pull Request to the `next` branch
## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

## 🔗 Resources

- **Documentation:** docs/
- **Issue Tracker:** GitHub Issues
- **Discussions:** GitHub Discussions
- **Changelog:** CHANGELOG.md

## 💬 Support

- **Questions?** Open a Discussion
- **Bug Reports:** File an Issue
- **Security Issues:** See SECURITY.md

---

Built with ❤️ using FastMCP, Docker, and modern Python async.