VectorSage MCP RAG Server

Production-ready Pinecone-based RAG server for Claude Desktop with large-scale PDF ingestion, evaluation pipelines, and AI teaching tools


🏗️ Architecture Overview

┌──────────────────┐    ┌──────────────────┐    ┌──────────────────┐
│  Claude Desktop  │────│    MCP Server    │────│    Vector DB     │
│ (User Interface) │    │    (FastMCP)     │    │   (Pinecone /    │
└──────────────────┘    └──────────────────┘    │   OpenSearch)    │
                                  │             └──────────────────┘
                                  │
                                  ▼
                        ┌──────────────────┐
                        │     Document     │
                        │    Processing    │
                        │     Pipeline     │
                        └──────────────────┘
                                  │
                                  ▼
┌──────────────────┐    ┌──────────────────┐    ┌──────────────────┐
│      AWS S3      │    │    AWS Lambda    │    │     AWS ECS      │
│ (Document Store) │    │   (Serverless)   │    │ (Containerized)  │
└──────────────────┘    └──────────────────┘    └──────────────────┘
                                  │
                                  ▼
                        ┌──────────────────┐
                        │    OpenAI API    │
                        │      (LLM)       │
                        └──────────────────┘

🧩 System Components

  • Claude Desktop: User interface for natural language interactions
  • MCP Server: FastMCP-based server handling tool calls and RAG operations
  • Vector Databases: Pinecone (primary) or AWS OpenSearch (alternative)
  • Document Processing: PDF parsing, text chunking, and embedding generation
  • Storage: AWS S3 for document persistence
  • Compute: AWS ECS (containers) or Lambda (serverless)
  • AI Services: OpenAI GPT models for generation and evaluation

🔄 Data Flow

  1. Document Upload → S3 storage → Processing pipeline → Vector embeddings → Database
  2. User Query → Claude Desktop → MCP Server → Vector search → Context retrieval → LLM generation → Response
  3. Evaluation → Test sets → RAG metrics → Performance analysis
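To make the query path concrete, here is a minimal sketch of a FastMCP retrieval tool (illustrative only; the real main.py defines its own tool names and prompts, and search_book is a hypothetical name):

import os

from fastmcp import FastMCP
from pinecone import Pinecone
from sentence_transformers import SentenceTransformer

mcp = FastMCP("VectorSage")
model = SentenceTransformer("BAAI/bge-large-en-v1.5")
index = Pinecone(api_key=os.getenv("PINECONE_API_KEY")).Index("rag-storage")

@mcp.tool()
def search_book(question: str, top_k: int = 5) -> str:
    """Embed the question, query Pinecone, and return the top chunks."""
    vector = model.encode(question).tolist()
    results = index.query(vector=vector, top_k=top_k, include_metadata=True)
    return "\n\n".join(match.metadata["text"] for match in results.matches)

if __name__ == "__main__":
    mcp.run()  # stdio transport, as Claude Desktop expects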

Quick Setup

1. Clone & Install Dependencies

git clone https://github.com/Sri22082/vectorSage_MCP.git
cd vectorSage_MCP

uv pip install -r requirements.txt
# OR
uv sync  # uses pyproject.toml

2. Pinecone Setup

  • Create an account at https://www.pinecone.io/
  • Go to API Keys → copy your API key
  • Create an index:

Name: rag-storage   # choose your own index name
Modality: Text
Vector type: Dense
Dimension: 1024     # matches the 1024-dim BAAI/bge-large-en-v1.5 embeddings
Metric: cosine
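
The same index can also be created programmatically (a minimal sketch assuming the pinecone client and a serverless index; adjust cloud and region to your account):

import os

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))
pc.create_index(
    name="rag-storage",
    dimension=1024,   # must match the embedding model's output size
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)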

3. Create .env

PINECONE_API_KEY=your_api_key_here

4. main.py Changes

import os
from pinecone import Pinecone

# ----------------------------------------------------------------------
# Global Configuration
# ----------------------------------------------------------------------
pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))
index = pc.Index("rag-storage")  # use the index name you created in step 2

5. Install to Claude Desktop

uv run fastmcp install claude-desktop main.py

6. Configure Claude Desktop

Add this entry under the "mcpServers" key in claude_desktop_config.json, adjusting the paths for your machine:
{
  "SERVER_NAME": {
    "command": "C:/Users/YOUR_USERNAME/VectorSage-MCP-RAG-server/.venv/Scripts/python.exe",
    "args": ["C:/Users/YOUR_USERNAME/VectorSage-MCP-RAG-server/main.py"],
    "env": {
      "PINECONE_API_KEY": "your api key"
    },
    "transport": "stdio",
    "cwd": "C:/Users/YOUR_USERNAME/VectorSage-MCP-RAG-server",
    "timeout": 600
  }
}

7. Launch

  1. Save the JSON file

  2. Close Claude Desktop completely

  3. Open Task Manager → end all Claude processes

  4. Restart Claude Desktop

Configuration

VectorSage uses environment variables for all API keys, file paths, credentials, and model configuration. This makes the project portable across machines and operating systems.

Create a .env file using .env.example as reference.

Required Environment Variables

# SAMPLE .ENV

PINECONE_API_KEY=your_key_here
PINECONE_INDEX_NAME=athena-rag
PINECONE_NAMESPACE=ml-theory-algorithms-text
OPENAI_API_KEY=your_api_key_here

# Documents
BOOK_PDF=/absolute/path/to/your/book.pdf
BOOK_NAME=Understanding Machine Learning: Theory and Algorithms

# Evaluation
TESTSET_CSV_PATH=/absolute/path/to/testset.csv

# Models
EMBEDDING_MODEL_NAME=BAAI/bge-large-en-v1.5
LLM_MODEL_NAME=gpt-4o-mini
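
When the scripts are run directly (outside Claude Desktop), these variables must be present in the process environment. A common pattern, assuming the python-dotenv package is available, is:

import os

from dotenv import load_dotenv

load_dotenv()  # reads the .env file from the current working directory

api_key = os.getenv("PINECONE_API_KEY")
if not api_key:
    raise RuntimeError("PINECONE_API_KEY is not set - check your .env file")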

Project Structure

VectorSage-MCP-RAG-server/
├─ main.py                 # MCP server for Claude Desktop
├─ lambda_main.py          # AWS Lambda handler
├─ aws_utils.py            # AWS S3 and OpenSearch utilities
├─ pinecone_ingestion.py   # Offline PDF ingestion
├─ testset_generator.py    # Synthetic QA generation (RAGAS)
├─ evaluate_rag.py         # RAG evaluation pipeline
├─ Dockerfile              # Docker container definition
├─ docker-compose.yml      # Local development setup
├─ docker-compose.aws.yml  # AWS development setup
├─ aws/                    # AWS deployment configurations
│   ├─ cloudformation.yml      # ECS CloudFormation template
│   ├─ lambda-cloudformation.yml # Lambda CloudFormation template
│   └─ deploy.sh               # AWS deployment script
├─ .env.example            # Environment variable template
├─ requirements.txt
├─ pyproject.toml
├─ testset_small(10Q).csv
├─ testset_large(100Q).csv
└─ README.md

Pinecone Ingestion

This script ingests a textbook-scale PDF into Pinecone using a production-quality semantic ingestion pipeline.

Key features (see the sketch after this list):

  1. Page-aware PDF parsing using pdfplumber
  2. Text cleaning to remove common PDF artifacts
  3. Semantic chunking (512 tokens, 100 overlap)
  4. Context-enriched chunks (book title + page number)
  5. High-quality embeddings using BAAI/bge-large-en-v1.5
  6. Batched upserts into Pinecone
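
A condensed sketch of these steps (illustrative only, not the actual pinecone_ingestion.py; it assumes pdfplumber, sentence-transformers, and the pinecone client are installed, and uses a simple sliding-window chunker where the real script does semantic chunking):

import os

import pdfplumber
from pinecone import Pinecone
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-large-en-v1.5")
pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))
index = pc.Index(os.getenv("PINECONE_INDEX_NAME"))

def chunk(text: str, size: int = 512, overlap: int = 100) -> list[str]:
    """Sliding window over whitespace tokens (512 tokens, 100 overlap)."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

vectors = []
with pdfplumber.open(os.getenv("BOOK_PDF")) as pdf:
    for page_no, page in enumerate(pdf.pages, start=1):
        text = page.extract_text() or ""          # page-aware parsing
        for j, piece in enumerate(chunk(text)):
            # Context enrichment: prepend book title and page number
            enriched = f"{os.getenv('BOOK_NAME')} (p. {page_no}): {piece}"
            vectors.append({
                "id": f"p{page_no}-c{j}",
                "values": model.encode(enriched).tolist(),
                "metadata": {"page": page_no, "text": enriched},
            })

# Batched upserts keep each request within Pinecone's size limits
for i in range(0, len(vectors), 100):
    index.upsert(vectors=vectors[i:i + 100],
                 namespace=os.getenv("PINECONE_NAMESPACE"))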

Required configuration:

  1. BOOK_PDF – path to the textbook PDF
  2. BOOK_NAME – title of the document
  3. PINECONE_INDEX_NAME
  4. PINECONE_NAMESPACE
  5. PINECONE_API_KEY

Run ingestion:

python pinecone_ingestion.py

Note: Due to context window and transport limits, large PDFs are best ingested offline via pinecone_ingestion.py, while smaller documents can be safely uploaded through the Claude Desktop MCP interface.

RAG Evaluation

VectorSage includes a complete RAG evaluation pipeline built using the RAGAS framework. This allows users to quantitatively assess retrieval and generation quality rather than relying on subjective judgment.

Metrics Evaluated

The evaluation pipeline reports the following metrics:

  1. Context Precision – How relevant the retrieved chunks are.
  2. Context Recall – How well the retriever covers the required information.
  3. Faithfulness – Whether the answer is grounded in retrieved context.
  4. Answer Relevancy – How well the answer addresses the question.

The evaluation logic is implemented in:

evaluate_rag.py

Evaluation process:

  1. Load questions from a CSV testset
  2. Retrieve context from Pinecone
  3. Generate answers using the LLM
  4. Compute RAGAS metrics

Users can run the evaluation directly once documents are ingested into Pinecone.
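
The metric computation itself looks roughly like this (RAGAS 0.1.x import paths; newer releases may rename them, and the call needs OPENAI_API_KEY since RAGAS judges with an LLM):

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    context_precision,
    context_recall,
    faithfulness,
)

# One row per test question: contexts from the Pinecone retriever,
# answers from the LLM, ground truth from the CSV testset.
data = Dataset.from_dict({
    "question": ["What is PAC learnability?"],
    "contexts": [["...retrieved chunk 1...", "...retrieved chunk 2..."]],
    "answer": ["...generated answer..."],
    "ground_truth": ["...reference answer from the testset..."],
})

result = evaluate(
    data,
    metrics=[context_precision, context_recall, faithfulness, answer_relevancy],
)
print(result)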

TestSet Generation

This script generates synthetic question–answer pairs from the source PDF using the RAGAS testset generator.

Generation process (see the sketch after this list):

  1. Loads the source PDF
  2. Splits the text into semantic chunks
  3. Generates high-quality QA pairs using an LLM
  4. Exports the dataset as a CSV file
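
A hedged sketch of that flow with the RAGAS generator (import paths and signatures vary across RAGAS releases, so treat the names below as assumptions rather than a drop-in script):

from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_openai import ChatOpenAI
from ragas.testset.generator import TestsetGenerator

# Load the source PDF as LangChain documents
documents = PyPDFLoader("/absolute/path/to/your/book.pdf").load()

generator = TestsetGenerator.from_langchain(
    generator_llm=ChatOpenAI(model="gpt-4o-mini"),
    critic_llm=ChatOpenAI(model="gpt-4o-mini"),
    # MiniLM embeddings are used here, per the note below
    embeddings=HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"),
)

testset = generator.generate_with_langchain_docs(documents, test_size=10)
testset.to_pandas().to_csv("testset.csv", index=False)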

Required configuration:

  1. TESTSET_CSV_PATH
  2. PINECONE_INDEX_NAME
  3. PINECONE_NAMESPACE
  4. EMBEDDING_MODEL_NAME
  5. LLM_MODEL_NAME

Run testset generation:

python testset_generator.py

Output:

testset.csv

Note: MiniLM embeddings are used only during testset generation; retrieval and evaluation use BGE-large embeddings.

Evaluation Testsets

To make evaluation reproducible and easy to run, this repository includes two pre-generated testsets derived from the textbook Understanding Machine Learning.

Available Testsets

  1. testset_small(10Q).csv – lightweight 10-question testset for quick sanity checks
  2. testset_large(100Q).csv – comprehensive 100-question testset for robust evaluation

Each testset contains:

  1. High-quality synthetic questions
  2. Ground-truth reference answers
  3. Questions designed to test definitions, theoretical concepts, and multi-hop reasoning

These CSV files allow users to evaluate VectorSage's RAG performance immediately without regenerating test data.

Reference Textbook

This project is built around the book:

Understanding Machine Learning: From Theory to Algorithms, by Shai Shalev-Shwartz and Shai Ben-David.

The book is used for:

  1. Document ingestion
  2. Testset generation
  3. Quantitative RAG evaluation

AWS Deployment

VectorSage supports multiple AWS deployment options for production workloads.

Prerequisites

  1. AWS CLI installed and configured
  2. Docker installed (for containerized deployment)
  3. AWS Account with appropriate permissions
  4. SSM Parameters for API keys:
    # Store your API keys securely in SSM Parameter Store
    aws ssm put-parameter --name "/vectorsage/pinecone-api-key" --value "your-pinecone-key" --type "SecureString"
    aws ssm put-parameter --name "/vectorsage/openai-api-key" --value "your-openai-key" --type "SecureString"
    

Option 1: AWS ECS (Containerized)

Deploy VectorSage as a containerized service on Amazon ECS with Fargate.

Quick Deploy

# Set your environment variables
export ENVIRONMENT=prod
export AWS_DEFAULT_REGION=us-east-1
export VPC_ID=vpc-12345678  # Your VPC ID
export SUBNET_IDS=subnet-12345678,subnet-87654321  # Your subnet IDs

# Run the deployment script
chmod +x aws/deploy.sh
./aws/deploy.sh

Manual Deployment

  1. Build and push Docker image to ECR:
# Authenticate Docker with ECR
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin YOUR_ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com

# Build and tag the image
docker build -t vectorsage:latest .
docker tag vectorsage:latest YOUR_ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/vectorsage:latest

# Push to ECR
docker push YOUR_ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/vectorsage:latest
  2. Deploy CloudFormation stack:
aws cloudformation create-stack \
  --stack-name vectorsage-ecs-prod \
  --template-body file://aws/cloudformation.yml \
  --parameters \
    ParameterKey=ProjectName,ParameterValue=vectorsage \
    ParameterKey=Environment,ParameterValue=prod \
    ParameterKey=VpcId,ParameterValue=YOUR_VPC_ID \
    ParameterKey=SubnetIds,ParameterValue='YOUR_SUBNET_ID_1\,YOUR_SUBNET_ID_2' \
  --capabilities CAPABILITY_IAM

Option 2: AWS Lambda (Serverless)

Deploy VectorSage as a serverless function with API Gateway.
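
The handler's shape might look like this (illustrative only; the real routes and logic live in lambda_main.py):

import json

def handler(event, context):
    """Minimal API Gateway -> Lambda routing sketch."""
    path = event.get("rawPath") or event.get("path", "/")
    if path == "/health":
        # Health endpoint used for monitoring (see Monitoring and Logging)
        return {"statusCode": 200, "body": json.dumps({"status": "ok"})}

    body = json.loads(event.get("body") or "{}")
    question = body.get("question", "")
    # ... embed the question, query the vector store, call the LLM ...
    return {"statusCode": 200, "body": json.dumps({"answer": f"(stub) {question}"})}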

Deploy Lambda Version

  1. Create deployment package:
# Install dependencies
pip install -r requirements.txt -t lambda-package/

# Copy source code (the wildcard already includes aws_utils.py)
cp *.py lambda-package/

# Create deployment zip
cd lambda-package && zip -r ../vectorsage-lambda.zip . && cd ..
  2. Deploy CloudFormation stack:
aws cloudformation create-stack \
  --stack-name vectorsage-lambda-prod \
  --template-body file://aws/lambda-cloudformation.yml \
  --capabilities CAPABILITY_IAM
  3. Update Lambda function code:
aws lambda update-function-code \
  --function-name vectorsage-prod \
  --zip-file fileb://vectorsage-lambda.zip

Option 3: Docker Compose (Local/Development)

For local development with AWS services:

# Copy environment file
cp .env.example .env

# Edit .env with your AWS credentials and settings
nano .env

# Run with Docker Compose
docker-compose -f docker-compose.aws.yml up -d

AWS Services Used

  • Amazon S3: Document storage and retrieval
  • Amazon OpenSearch: Alternative vector database to Pinecone
  • Amazon ECS/EKS: Container orchestration
  • AWS Lambda: Serverless deployment
  • Amazon API Gateway: API management for Lambda
  • AWS Systems Manager: Secure parameter storage
  • AWS CloudFormation: Infrastructure as code

Environment Variables for AWS

Add these to your .env file when deploying to AWS:

# AWS Configuration
AWS_ACCESS_KEY_ID=your_aws_access_key
AWS_SECRET_ACCESS_KEY=your_aws_secret_key
AWS_DEFAULT_REGION=us-east-1
AWS_S3_BUCKET_NAME=vectorsage-documents-prod
AWS_OPENSEARCH_ENDPOINT=https://your-opensearch-domain.us-east-1.es.amazonaws.com
AWS_OPENSEARCH_INDEX=vectorsage-documents

# Application Configuration
ENVIRONMENT=prod
LOG_LEVEL=INFO

Monitoring and Logging

  • CloudWatch Logs: All application logs are sent to CloudWatch
  • X-Ray: Distributed tracing for Lambda functions
  • CloudWatch Metrics: Performance monitoring
  • Health Checks: Built-in health endpoints for monitoring

Cost Optimization

  • ECS: Use Fargate for serverless containers
  • Lambda: Pay-per-request pricing
  • S3: Low-cost object storage
  • OpenSearch: Reserved instances for production workloads

VectorSage is ready for use!

Cursor Setup

Install the package (if required):

uvx vectorsage_mcp

Cursor configuration (mcp.json):

{
  "mcpServers": {
    "sri22082-vectorsage-mcp": {
      "command": "uvx",
      "args": ["vectorsage_mcp"]
    }
  }
}