Code Execution with MCP - Template Repository

A template for building AI agents with the Code Execution with MCP pattern. The harness lets agents discover MCP tools at runtime and call them from code that runs in a secure sandbox. The LLM and MCP integrations ship as placeholders that you implement for your own stack.

Inspired by: This template implements the architectural patterns and design philosophy from Anthropic's Code Execution with MCP engineering blog post and their Skills Repository. We are grateful to Anthropic for openly sharing these patterns.

🌟 Key Features

Dynamic Tool Discovery - Tools are discovered at runtime with list_mcp_tools() and get_mcp_tool_details(), so no static tool definition files are needed
Secure Sandbox Execution - Docker-based isolation with resource limits, read-only filesystem, and network restrictions
PII Protection - Automatic tokenization/de-tokenization of sensitive data
Persistent Skills - /skills directory for reusable agent code
Ephemeral Workspace - /workspace directory for temporary task files
Multi-Turn Conversations - Support for complex agent workflows
Extensible Architecture - Easy to customize and extend

💡 Why Code Execution?

The Token Efficiency Problem: Traditional AI agents must describe every computational step in natural language, consuming valuable context window space. Processing 1,000 records might use 50,000 tokens just to describe the transformations.

The Solution: Code execution lets agents write and run code, delegating computation to traditional software while focusing their intelligence on high-level reasoning. The same 1,000-record task uses just ~500 tokens of code.
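
As a rough illustration of that idea, the sketch below aggregates 1,000 synthetic records in code and hands back only a compact summary. The `SaleRecord` shape and the `summarizeSales` name are assumptions made for this example and are not part of the template.

```typescript
// Hypothetical record shape, used only for this illustration.
interface SaleRecord {
  region: string;
  amount: number;
}

interface SalesSummary {
  count: number;
  total: number;
  byRegion: Record<string, number>;
}

// Aggregate in code so that only the compact summary reaches the model.
function summarizeSales(records: SaleRecord[]): SalesSummary {
  const byRegion: Record<string, number> = {};
  let total = 0;
  for (const r of records) {
    total += r.amount;
    byRegion[r.region] = (byRegion[r.region] ?? 0) + r.amount;
  }
  return { count: records.length, total, byRegion };
}

// Synthetic data: 1,000 records alternating between two regions.
const records: SaleRecord[] = Array.from({ length: 1000 }, (_, i) => ({
  region: i % 2 === 0 ? "north" : "south",
  amount: 10,
}));

const salesSummary = summarizeSales(records);
console.log(JSON.stringify(salesSummary));
```

The serialized summary is a few dozen characters, while the serialized input runs to tens of thousands. Only the summary needs to enter the model's context.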

Key Benefits:

📊 Scalability: Process large datasets in code without filling the context window
🔄 Reusability: Save code to /skills for future use
🔒 Privacy: PII tokenized before reaching the LLM
🎯 Reliability: Deterministic code execution vs. natural language descriptions

📖 Read the full philosophy in docs/PHILOSOPHY.md - explains the "why" behind this architecture based on Anthropic's research.

🏗️ Architecture

┌─────────────────────────────────────────────────────────────┐
│                    User / Application                        │
└────────────────────┬────────────────────────────────────────┘
                     │ HTTP Request
                     ▼
┌─────────────────────────────────────────────────────────────┐
│                  Agent Orchestrator                          │
│  ┌──────────────┐  ┌──────────────┐  ┌─────────────────┐   │
│  │ AgentManager │◄─┤ PII Censor   │◄─┤ MCP Client      │   │
│  └──────┬───────┘  └──────────────┘  └─────────────────┘   │
│         │                                                     │
│         ▼                                                     │
│  ┌──────────────┐                                           │
│  │ LLM Provider │ (OpenAI, Anthropic, etc.)                 │
│  └──────┬───────┘                                           │
│         │                                                     │
│         ▼                                                     │
│  ┌──────────────────────────────────┐                       │
│  │   Sandbox Manager (Docker)       │                       │
│  └──────┬───────────────────────────┘                       │
└─────────┼────────────────────────────────────────────────────┘
          │
          ▼
┌─────────────────────────────────────────────────────────────┐
│              Secure Docker Container                         │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  Agent Code Execution                                 │   │
│  │  - Runtime API (callMCPTool, fs, utils)             │   │
│  │  - Dynamic Tool Discovery                            │   │
│  │  - /skills (persistent, mounted)                     │   │
│  │  - /workspace (ephemeral, mounted)                   │   │
│  └──────────────────────────────────────────────────────┘   │
│                                                              │
│  Security: Non-root user, read-only rootfs, resource limits │
└─────────────────────────────────────────────────────────────┘
          │
          │ Authenticated API Call
          ▼
┌─────────────────────────────────────────────────────────────┐
│                    MCP Servers                               │
│  (File System, Databases, APIs, Custom Tools)               │
└─────────────────────────────────────────────────────────────┘

🚀 Quick Start

Prerequisites

Node.js >= 18.0.0
Docker (for sandbox execution)
TypeScript knowledge

Installation

# Clone the repository
git clone <your-repo-url>
cd code-execution-with-MCP

# Install dependencies
npm install

# Build the project
npm run build

# Build the Docker sandbox image
npm run build-sandbox

# Create required directories
npm run prepare-workspace

# Start the server
npm start

Development

# Run in development mode with auto-reload
npm run dev

# Type checking only
npm run type-check

# Clean build artifacts
npm run clean

📁 Project Structure

mcp-code-exec-harness/
├── src/
│   ├── agent_orchestrator/     # Main agent logic
│   │   ├── AgentManager.ts     # Agent execution loop
│   │   └── prompt_templates.ts # System prompts
│   │
│   ├── sandbox_manager/        # Secure code execution
│   │   ├── SandboxManager.ts   # Abstract interface
│   │   └── DockerSandbox.ts    # Docker implementation
│   │
│   ├── mcp_client/            # MCP communication
│   │   ├── McpClient.ts       # MCP server client
│   │   └── PiiCensor.ts       # PII tokenization
│   │
│   ├── agent_runtime/         # Sandbox runtime API
│   │   └── runtime_api.ts     # Injected helper functions
│   │
│   ├── tools_interface/       # Dynamic tool discovery
│   │   └── DynamicToolManager.ts
│   │
│   └── index.ts               # Main server entry point
│
├── servers/                   # MCP server collection
│   ├── official/              # Official MCP servers
│   ├── archived/              # Archived reference servers
│   ├── community/             # Community-contributed servers
│   ├── README.md              # Server collection documentation
│   ├── catalog.json           # Structured server index
│   └── QUICKSTART.md          # Quick start guide
│
├── skills/                    # Persistent agent skills (user-specific)
├── workspace/                 # Ephemeral execution workspace
├── Dockerfile.sandbox         # Secure sandbox container
├── package.json
├── tsconfig.json
└── README.md

🔧 Configuration

Environment Variables

Create a .env file in the root directory:

# Server Configuration
PORT=3000
NODE_ENV=development

# Sandbox Configuration
SANDBOX_IMAGE=sandbox-image-name
SANDBOX_TIMEOUT_MS=30000
SANDBOX_MEMORY_MB=100
SANDBOX_CPU_QUOTA=50000

# LLM Provider (configure for your provider)
LLM_API_KEY=your-api-key-here
LLM_MODEL=your-model-name

# MCP Servers (customize for your setup)
# Add your MCP server configurations here
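
One way to read the sandbox variables above into typed settings is sketched here. It assumes the variables are already present in the environment (for example through a loader such as dotenv); the `SandboxConfig` shape and the `loadSandboxConfig` and `readInt` helpers are hypothetical names, not functions from this template.

```typescript
// Hypothetical config shape mirroring the sandbox variables listed above.
interface SandboxConfig {
  image: string;
  timeoutMs: number;
  memoryMb: number;
  cpuQuota: number;
}

type Env = Record<string, string | undefined>;

// Parse a positive integer, falling back to a default when the variable is unset.
function readInt(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw === "") return fallback;
  const value = Number.parseInt(raw, 10);
  if (!Number.isFinite(value) || value <= 0) {
    throw new Error(`${key} must be a positive integer, got "${raw}"`);
  }
  return value;
}

// Defaults match the example values shown in the .env listing.
function loadSandboxConfig(env: Env): SandboxConfig {
  return {
    image: env["SANDBOX_IMAGE"] ?? "sandbox-image-name",
    timeoutMs: readInt(env, "SANDBOX_TIMEOUT_MS", 30000),
    memoryMb: readInt(env, "SANDBOX_MEMORY_MB", 100),
    cpuQuota: readInt(env, "SANDBOX_CPU_QUOTA", 50000),
  };
}

// Example: override only the memory limit.
const sandboxConfig = loadSandboxConfig({ SANDBOX_MEMORY_MB: "256" });
console.log(sandboxConfig);
```

In a running server you would pass `process.env` instead of the literal object. Failing fast on a malformed value keeps a typo in `.env` from silently disabling a resource limit.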

Customizing the Agent

Implement LLM Integration - Edit src/agent_orchestrator/AgentManager.ts:

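async function callLLM(prompt: string, tools: any[]): Promise<LLMResponse> {
  // Add your LLM API call here
  // Examples: OpenAI, Anthropic, Google Gemini, etc.
  // Until then, fail loudly instead of silently returning undefined.
  throw new Error("callLLM is not implemented: wire in your LLM provider");
}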

Connect MCP Servers - Edit src/mcp_client/McpClient.ts:

private initializeServers(): void {
  // Add your MCP server connections
  // Use @modelcontextprotocol/sdk
}

Customize System Prompts - Edit src/agent_orchestrator/prompt_templates.ts
Adjust Sandbox Security - Edit src/sandbox_manager/DockerSandbox.ts

🔐 Security Features

Sandbox Isolation

Non-root execution - Runs as sandboxuser
Read-only root filesystem - Prevents system modifications
Resource limits - CPU and memory constraints
Network restrictions - Configurable network access
Capability dropping - Minimal container privileges
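
The measures above map onto standard `docker run` flags. The sketch below shows one possible mapping; the `SandboxRunOptions` shape and `buildDockerRunArgs` are hypothetical and may not match what DockerSandbox.ts actually does.

```typescript
// Hypothetical options; the names are chosen for this example only.
interface SandboxRunOptions {
  image: string;
  memoryMb: number;
  cpuQuota: number; // microseconds of CPU time per CFS period (100 ms by default)
  skillsDir: string; // host path mounted at /skills
  workspaceDir: string; // host path mounted at /workspace
  allowNetwork: boolean;
}

// Build the argument list for `docker run` using standard Docker CLI flags.
function buildDockerRunArgs(opts: SandboxRunOptions): string[] {
  return [
    "run", "--rm",
    "--user", "sandboxuser", // non-root execution
    "--read-only", // read-only root filesystem
    "--tmpfs", "/tmp", // writable scratch space despite --read-only
    "--memory", `${opts.memoryMb}m`,
    "--cpu-quota", String(opts.cpuQuota),
    "--cap-drop", "ALL", // drop all Linux capabilities
    "--security-opt", "no-new-privileges",
    "--network", opts.allowNetwork ? "bridge" : "none",
    "-v", `${opts.skillsDir}:/skills`,
    "-v", `${opts.workspaceDir}:/workspace`,
    opts.image,
  ];
}

const dockerArgs = buildDockerRunArgs({
  image: "sandbox-image-name",
  memoryMb: 100,
  cpuQuota: 50000,
  skillsDir: "/srv/skills/user123",
  workspaceDir: "/srv/workspace/session-1",
  allowNetwork: false,
});
console.log(["docker", ...dockerArgs].join(" "));
```

With the default 100 ms period, a quota of 50000 caps the container at roughly half of one CPU core.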

PII Protection

Automatic detection and tokenization of:

Email addresses
Phone numbers
Social Security Numbers
Credit card numbers
IP addresses
Custom patterns (extensible)
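
A minimal sketch of how tokenization and de-tokenization could work is shown below. `SketchPiiTokenizer`, its methods, and the two regular expressions are assumptions made for illustration; the template's PiiCensor may use different patterns and token formats.

```typescript
// Illustrative tokenizer; not the template's PiiCensor implementation.
class SketchPiiTokenizer {
  private patterns = new Map<string, RegExp>();
  private tokenToValue = new Map<string, string>();
  private valueToToken = new Map<string, string>();

  addPattern(label: string, pattern: RegExp): void {
    // Force the global flag so replace() visits every match.
    const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
    this.patterns.set(label, new RegExp(pattern.source, flags));
  }

  // Replace each match with a stable placeholder such as [EMAIL_1].
  tokenize(text: string): string {
    let result = text;
    this.patterns.forEach((pattern, label) => {
      result = result.replace(pattern, (match) => {
        const known = this.valueToToken.get(match);
        if (known !== undefined) return known;
        const token = `[${label.toUpperCase()}_${this.tokenToValue.size + 1}]`;
        this.tokenToValue.set(token, match);
        this.valueToToken.set(match, token);
        return token;
      });
    });
    return result;
  }

  // Restore the original values, for example just before calling an MCP tool.
  detokenize(text: string): string {
    let result = text;
    this.tokenToValue.forEach((value, token) => {
      result = result.split(token).join(value);
    });
    return result;
  }
}

const tokenizer = new SketchPiiTokenizer();
tokenizer.addPattern("email", /[\w.+-]+@[\w-]+\.[\w.-]+/);
tokenizer.addPattern("ssn", /\b\d{3}-\d{2}-\d{4}\b/);

const original = "Contact jane@example.com, SSN 123-45-6789.";
const masked = tokenizer.tokenize(original);
console.log(masked); // Contact [EMAIL_1], SSN [SSN_2].
console.log(tokenizer.detokenize(masked) === original); // true
```

Keeping the value-to-token map per session means the same email always maps to the same placeholder, so the model can still reason about identity without seeing the raw value.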

Authentication

Session-specific auth tokens for sandbox ↔ host communication
Validate tokens in production deployment
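
Session tokens of this kind can be issued and checked with Node's built-in crypto module. The sketch below is one possible shape; the in-memory `sessionTokens` map and the three function names are hypothetical and stand in for whatever storage and naming the template uses.

```typescript
import { randomBytes, timingSafeEqual } from "node:crypto";

// Hypothetical in-memory store mapping session IDs to their tokens.
const sessionTokens = new Map<string, string>();

// Issue a fresh random token when a sandbox session starts.
function issueSessionToken(sessionId: string): string {
  const token = randomBytes(32).toString("hex");
  sessionTokens.set(sessionId, token);
  return token;
}

// Compare in constant time so response timing does not leak the token.
function isValidSessionToken(sessionId: string, presented: string): boolean {
  const expected = sessionTokens.get(sessionId);
  if (expected === undefined) return false;
  const a = Buffer.from(expected, "utf8");
  const b = Buffer.from(presented, "utf8");
  // timingSafeEqual throws on a length mismatch, so check lengths first.
  return a.length === b.length && timingSafeEqual(a, b);
}

// Drop the token when the session ends so it cannot be replayed.
function revokeSessionToken(sessionId: string): void {
  sessionTokens.delete(sessionId);
}
```

The host would call `issueSessionToken` before starting a container, pass the token into the sandbox, and reject any tool call whose token fails `isValidSessionToken`.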

📚 Usage Examples

Making a Request

curl -X POST http://localhost:3000/task \
  -H "Content-Type: application/json" \
  -d '{
    "userId": "user123",
    "task": "Analyze the latest sales data and create a summary report"
  }'

Agent Code Example

The agent writes code like this (executed in sandbox):

// 1. Discover available tools
const tools = await list_mcp_tools();
console.log("Available tools:", tools);

// 2. Get tool details
const dbTool = await get_mcp_tool_details("database__query");
console.log("Tool info:", dbTool.description);

// 3. Execute tools
const salesData = await callMCPTool("database__query", {
  query: "SELECT * FROM sales WHERE date > '2024-01-01'"
});

// 4. Process data in code
const summary = salesData.reduce((acc, sale) => {
  acc.total += sale.amount;
  acc.count += 1;
  return acc;
}, { total: 0, count: 0 });

// 5. Save to skills for reuse
await fs.writeFile('/skills/sales_summary.js', `
  module.exports = async function summarizeSales(data) {
    return data.reduce((acc, sale) => {
      acc.total += sale.amount;
      acc.count += 1;
      return acc;
    }, { total: 0, count: 0 });
  };
`);

// 6. Return results
return { summary, totalSales: summary.total, count: summary.count };
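
The discovery calls in steps 1 and 2 could be backed by a registry along these lines. This is a sketch under stated assumptions: `SketchToolRegistry` and its methods are hypothetical, and the `server__tool` naming is inferred from the `database__query` name used above.

```typescript
// Tool metadata with the fields MCP tools commonly expose.
interface ToolDetails {
  name: string;
  description: string;
  inputSchema: Record<string, unknown>;
}

class SketchToolRegistry {
  private tools = new Map<string, ToolDetails>();

  // Namespace each tool as "<server>__<tool>", e.g. "database__query".
  register(server: string, tool: ToolDetails): void {
    this.tools.set(`${server}__${tool.name}`, tool);
  }

  // Cheap listing: names only, so discovery costs few tokens.
  listTools(): string[] {
    return Array.from(this.tools.keys()).sort();
  }

  // The full schema is fetched only for the tool the agent decides to use.
  getToolDetails(qualifiedName: string): ToolDetails {
    const tool = this.tools.get(qualifiedName);
    if (tool === undefined) throw new Error(`Unknown tool: ${qualifiedName}`);
    return tool;
  }
}

const registry = new SketchToolRegistry();
registry.register("database", {
  name: "query",
  description: "Run a read-only SQL query",
  inputSchema: { type: "object", properties: { query: { type: "string" } } },
});
registry.register("filesystem", {
  name: "read_file",
  description: "Read a file from an allowed directory",
  inputSchema: { type: "object", properties: { path: { type: "string" } } },
});

console.log(registry.listTools()); // [ 'database__query', 'filesystem__read_file' ]
console.log(registry.getToolDetails("database__query").description);
```

Splitting the listing from the detail lookup is what keeps discovery cheap: the agent reads a short list of names first and pulls a schema only when it needs one.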

🛠️ Extending the Template

MCP Servers Collection

This repository includes a comprehensive collection of 18 MCP servers organized for progressive discovery:

📦 7 Official Servers - Filesystem, Git, Memory, Fetch, Everything, Time, Sequential Thinking
🗄️ 5 Archived Servers - PostgreSQL, Redis, SQLite, Puppeteer, Sentry
🌍 6 Community Servers - MongoDB, GreptimeDB, Unstructured, Semgrep, MCP Installer, PostgreSQL Community Fork

Quick Start:

# Browse the server collection
cd servers/

# Read the documentation
cat README.md

# Check the quick start guide
cat QUICKSTART.md

# View the structured catalog
cat catalog.json

Documentation:

servers/README.md - Complete server collection documentation
servers/QUICKSTART.md - Quick start guide with common use cases
servers/catalog.json - Structured server index for programmatic discovery
Category-specific READMEs in servers/official/, servers/archived/, and servers/community/

Adding New MCP Servers

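// In src/mcp_client/McpClient.ts
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

// Method of the McpClient class
async addServer(config: MCPServerConfig): Promise<void> {
  const client = new Client({
    name: config.name,
    version: '1.0.0'
  }, {
    capabilities: { tools: {} }
  });

  // Launch the server as a child process and talk to it over stdio
  const transport = new StdioClientTransport({
    command: config.command,
    args: config.args,
    env: config.env // optional; used by the MongoDB example below
  });

  await client.connect(transport);

  // Discover and register tools; listTools() resolves to an object with a `tools` array
  const { tools } = await client.listTools();
  tools.forEach(tool => this.registerTool(tool));
}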

Example configurations for servers from the collection:

// Filesystem server (official)
await this.addServer({
  name: 'filesystem',
  command: 'npx',
  args: ['-y', '@modelcontextprotocol/server-filesystem', '/workspace', '/skills']
});

// MongoDB server (community)
await this.addServer({
  name: 'mongodb',
  command: 'npx',
  args: ['-y', 'mongodb-mcp-server', '--readOnly'],
  env: { MDB_MCP_CONNECTION_STRING: process.env.MONGODB_URI }
});

// Git server (official; distributed as a Python package, so it runs with uvx rather than npx)
await this.addServer({
  name: 'git',
  command: 'uvx',
  args: ['mcp-server-git']
});

Custom PII Patterns

// In your code
const piiCensor = new PiiCensor();
piiCensor.addPattern('custom_id', /\bID-\d{6}\b/g);

Alternative Sandbox Implementations

Extend SandboxManager to create custom execution environments:

WebAssembly-based sandboxes
Cloud function execution
Process-based isolation
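
As an example of the last option, the sketch below runs code in a separate Node.js process with a timeout. The `CodeSandbox` interface, `ExecutionResult` shape, and `ProcessSandbox` class are hypothetical; the template's SandboxManager may define a different (likely asynchronous) contract.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical result and interface shapes; the template's SandboxManager may differ.
interface ExecutionResult {
  stdout: string;
  stderr: string;
  exitCode: number | null;
  timedOut: boolean;
}

interface CodeSandbox {
  execute(code: string, timeoutMs: number): ExecutionResult;
}

// Runs code in a separate Node.js process. This gives crash and timeout
// isolation only; it does not restrict filesystem or network access.
class ProcessSandbox implements CodeSandbox {
  execute(code: string, timeoutMs: number): ExecutionResult {
    const child = spawnSync(process.execPath, ["-e", code], {
      encoding: "utf8",
      timeout: timeoutMs,
      env: {}, // do not pass host environment variables to the child
    });
    const err = child.error as NodeJS.ErrnoException | undefined;
    return {
      stdout: child.stdout ?? "",
      stderr: child.stderr ?? "",
      exitCode: child.status,
      timedOut: err?.code === "ETIMEDOUT",
    };
  }
}

const processSandbox = new ProcessSandbox();
const runResult = processSandbox.execute("console.log(1 + 1)", 5000);
console.log(runResult.stdout.trim(), runResult.exitCode);
```

Because a plain child process shares the host's filesystem and network, this variant suits trusted development setups rather than untrusted agent code.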

🧪 Testing

# Test the sandbox
curl -X POST http://localhost:3000/task \
  -H "Content-Type: application/json" \
  -d '{
    "userId": "test",
    "task": "Write a simple hello world function and save it to skills"
  }'

# Check health
curl http://localhost:3000/health

📖 Documentation & References

Core Documentation

PHILOSOPHY.md - ⭐ Start here! Explains the "why" behind code execution, token efficiency, and design principles based on Anthropic's research
QUICK_START.md - Get running in 5 minutes
ARCHITECTURE.md - Technical deep dive into system components
SECURITY.md - Security best practices and hardening checklist
DEPLOYMENT.md - Production deployment guides (Docker, K8s, Cloud)
API_EXAMPLES.md - Usage examples and patterns

Skills & Examples

skills/examples/ - Example skills following the Anthropic skills pattern
- template-skill/ - Template for creating new skills
- data-processor/ - Token-efficient data transformation example

External References

Code Execution with MCP - Anthropic's engineering blog post describing the dynamic execution model and philosophy
Anthropic Skills Repository - Open-source examples of skills that extend agent capabilities
Equipping Agents for the Real World with Agent Skills - Philosophy behind persistent agent capabilities
Model Context Protocol Documentation - MCP specification and guides
Docker Security Best Practices - Container security hardening

🤝 Contributing & Community Collaboration

This template repository demonstrates an approach to AI agent development in which code execution, sandbox security, and persistent skills work together. We believe this approach can change how AI agents are built and deployed at scale.

We're Inviting You to Build This Together

The open-source community is fundamental to advancing this paradigm. We welcome contributions in all forms:

Areas We're Looking For Help

LLM Integrations - Add support for more providers (Claude, GPT-4, Gemini, Llama, etc.)
MCP Server Connectors - Build adapters for popular services (databases, APIs, file systems)
Security Hardening - Audit the sandbox, propose additional security measures
Performance Optimizations - Container pooling, caching strategies, resource tuning
Monitoring & Observability - Prometheus metrics, logging, distributed tracing
Skills Library - Create reusable, domain-specific skills for the community
Documentation - Tutorials, deployment guides, best practices
Testing & Examples - Integration tests, real-world use cases, benchmarks
Alternative Sandboxes - WebAssembly, cloud functions, process isolation implementations
Frontend UI - Dashboard, skill explorer, task monitoring interface

How to Contribute

Fork & Customize - Start with this template for your specific use case
Share Improvements - Submit PRs with general-purpose enhancements
Build Skills - Create reusable skills and submit to the community skills library
Report Issues - Help us identify bugs and security concerns
Discuss Ideas - Join conversations about the architecture and design
Write Documentation - Help others understand and adopt the pattern

The Vision

We're building toward a future where:

🧠 AI agents scale beyond token limitations through code execution
🔄 Skills accumulate over time, making agents continuously smarter
🔒 Privacy is built-in with automatic PII protection
🛡️ Security is layered with multiple defense mechanisms
🌐 Tools are discovered dynamically, not statically configured
📚 Community-driven with shared skills and best practices

Customization Guide for Your Organization

Customize this template for your specific needs:

Implement your LLM integration - Choose your preferred provider
Connect your MCP servers - Wire up your tools and data sources
Customize security policies - Adjust for your threat model
Extend PII detection - Add patterns for your domain
Add monitoring and logging - Integrate with your observability stack
Build domain-specific skills - Create your organization's capability library
Share back - Contribute generic improvements to help the community

Community Resources

Issues & Discussions - Ask questions, propose features, discuss architecture
Skills Repository - Contribute reusable skills to skills/examples/
Documentation - Help improve guides and examples
Partnerships - Collaborate on larger initiatives

Recognition

Contributors will be recognized in:

Project README
Release notes
Community Hall of Fame
Speaking opportunities at community events

Together, we can build the next generation of AI agent infrastructure. Whether you're an AI researcher, DevOps engineer, security expert, or full-stack developer, there's a place for your contributions. Join us in advancing this paradigm!

📝 License

MIT License - See LICENSE file for details

⚠️ Important Notes

TODO Items: Search for TODO comments in the code for areas requiring implementation
Security: Review and harden security settings before production deployment
LLM Integration: The LLM calling function is a placeholder - implement with your provider
MCP Servers: Mock implementations are provided - replace with actual MCP connections
Production Readiness: The template is a starting point; additional hardening is required before production use (monitoring, error handling, scaling)

Built with the Code Execution with MCP pattern for dynamic, secure AI agent workflows.