VectorSage MCP RAG Server
Production-ready Pinecone-based RAG server for Claude Desktop with large-scale PDF ingestion, evaluation pipelines, and AI teaching tools
🏗️ Architecture Overview
┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐
│ Claude Desktop  │──────│   MCP Server    │──────│   Vector DB     │
│ (User Interface)│      │    (FastMCP)    │      │  (Pinecone/     │
└─────────────────┘      └─────────────────┘      │   OpenSearch)   │
                                  │               └─────────────────┘
                                  │
                                  ▼
                         ┌─────────────────┐
                         │    Document     │
                         │   Processing    │
                         │    Pipeline     │
                         └─────────────────┘
                                  │
                                  ▼
┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐
│     AWS S3      │      │   AWS Lambda    │      │     AWS ECS     │
│ (Document Store)│      │  (Serverless)   │      │ (Containerized) │
└─────────────────┘      └─────────────────┘      └─────────────────┘
                                  │
                                  ▼
                         ┌─────────────────┐
                         │   OpenAI API    │
                         │      (LLM)      │
                         └─────────────────┘
🧩 System Components
- Claude Desktop: User interface for natural language interactions
- MCP Server: FastMCP-based server handling tool calls and RAG operations
- Vector Databases: Pinecone (primary) or AWS OpenSearch (alternative)
- Document Processing: PDF parsing, text chunking, and embedding generation
- Storage: AWS S3 for document persistence
- Compute: AWS ECS (containers) or Lambda (serverless)
- AI Services: OpenAI GPT models for generation and evaluation
🔄 Data Flow
- Document Upload → S3 storage → Processing pipeline → Vector embeddings → Database
- User Query → Claude Desktop → MCP Server → Vector search → Context retrieval → LLM generation → Response
- Evaluation → Test sets → RAG metrics → Performance analysis
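The query path above maps onto an MCP tool exposed by the server. The snippet below is a minimal, illustrative sketch of that pattern, not the actual main.py: it assumes the FastMCP Python API, the sentence-transformers package for BAAI/bge-large-en-v1.5 embeddings, and the Pinecone SDK; the tool name search_book and the "text" metadata field are assumptions made for the example.

# Minimal sketch of the query path as an MCP tool (illustrative only, not the real main.py)
import os
from fastmcp import FastMCP
from pinecone import Pinecone
from sentence_transformers import SentenceTransformer

mcp = FastMCP("VectorSage")
pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))
index = pc.Index(os.getenv("PINECONE_INDEX_NAME", "rag-storage"))
embedder = SentenceTransformer("BAAI/bge-large-en-v1.5")

@mcp.tool()
def search_book(question: str, top_k: int = 5) -> str:
    """Return the most relevant chunks for a question (Claude uses them as context)."""
    vector = embedder.encode(question).tolist()
    results = index.query(
        vector=vector,
        top_k=top_k,
        include_metadata=True,
        namespace=os.getenv("PINECONE_NAMESPACE"),
    )
    # Assumes each chunk was stored with its text in the "text" metadata field.
    return "\n\n".join(match.metadata["text"] for match in results.matches)

if __name__ == "__main__":
    mcp.run()  # stdio transport, as configured for Claude Desktop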
Quick Setup
1. Clone & Install Dependencies
git clone https://github.com/Sri22082/vectorSage_MCP.git
cd vectorSage_MCP
uv pip install -r requirements.txt
# OR
uv sync   # uses pyproject.toml
2. Pinecone Setup
- Create an account at https://www.pinecone.io/
- Go to API Keys → copy your API key
- Create an index:
  Name: rag-storage        # name your RAG storage index
  Modality: Text
  Vector type: Dense
  Dimension: 1024
  Metric: cosine
3. Create .env
PINECONE_API_KEY=your_api_key_here
4. Main.py Changes
# ----------------------------------------------------------------------
# Global Configuration
# ----------------------------------------------------------------------
pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))
index = pc.Index("rag-storage")  # use the name of the index you created in step 2
5. Install to Claude Desktop
uv run fastmcp install claude-desktop main.py
6. Configure Claude Desktop
{
  "SERVER_NAME": {
    "command": "C:/Users/YOUR_USERNAME/VectorSage-MCP-RAG-server/.venv/Scripts/python.exe",
    "args": ["C:/Users/YOUR_USERNAME/VectorSage-MCP-RAG-server/main.py"],
    "env": {
      "PINECONE_API_KEY": "your api key"
    },
    "transport": "stdio",
    "cwd": "C:/Users/YOUR_USERNAME/VectorSage-MCP-RAG-server",
    "timeout": 600
  }
}
7. Launch
- Save the JSON file
- Close Claude Desktop completely
- Open Task Manager → end all Claude processes
- Restart Claude Desktop
Configuration
VectorSage uses environment variables for all API keys, file paths, credentials, and model configuration, which makes the project portable across machines and operating systems.
Create a .env file using .env.example as reference.
Required Environment Variables
# SAMPLE .ENV
PINECONE_API_KEY=your_key_here
PINECONE_INDEX_NAME=athena-rag
PINECONE_NAMESPACE=ml-theory-algorithms-text
OPENAI_API_KEY=your_api_key_here
# Documents
BOOK_PDF=/absolute/path/to/your/book.pdf
BOOK_NAME=Understanding Machine Learning: Theory and Algorithms
# Evaluation
TESTSET_CSV_PATH=/absolute/path/to/testset.csv
# Models
EMBEDDING_MODEL_NAME=BAAI/bge-large-en-v1.5
LLM_MODEL_NAME=gpt-4o-mini
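A minimal sketch of how these variables can be loaded at startup, assuming the python-dotenv package and the variable names from the sample above:

# Load .env so the same code runs on any machine (sketch; assumes python-dotenv)
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
PINECONE_INDEX_NAME = os.getenv("PINECONE_INDEX_NAME", "athena-rag")
EMBEDDING_MODEL_NAME = os.getenv("EMBEDDING_MODEL_NAME", "BAAI/bge-large-en-v1.5")
LLM_MODEL_NAME = os.getenv("LLM_MODEL_NAME", "gpt-4o-mini")

if not PINECONE_API_KEY:
    raise RuntimeError("PINECONE_API_KEY is not set; check your .env file")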
Project Structure
VectorSage-MCP-RAG-server/
├─ main.py # MCP server for Claude Desktop
├─ lambda_main.py # AWS Lambda handler
├─ aws_utils.py # AWS S3 and OpenSearch utilities
├─ pinecone_ingestion.py # Offline PDF ingestion
├─ testset_generator.py # Synthetic QA generation (RAGAS)
├─ evaluate_rag.py # RAG evaluation pipeline
├─ Dockerfile # Docker container definition
├─ docker-compose.yml # Local development setup
├─ docker-compose.aws.yml # AWS development setup
├─ aws/ # AWS deployment configurations
│ ├─ cloudformation.yml # ECS CloudFormation template
│ ├─ lambda-cloudformation.yml # Lambda CloudFormation template
│ └─ deploy.sh # AWS deployment script
├─ .env.example # Environment variable template
├─ requirements.txt
├─ pyproject.toml
├─ testset_small(10Q).csv
├─ testset_large(100Q).csv
└─ README.md
Pinecone Ingestion
This script ingests a textbook-scale PDF into Pinecone using a production-quality semantic ingestion pipeline.
Key features:
- Page-aware PDF parsing using pdfplumber
- Text cleaning to remove common PDF artifacts
- Semantic chunking (512 tokens, 100 overlap)
- Context-enriched chunks (book title + page number)
- High-quality embeddings using BAAI/bge-large-en-v1.5
- Batched upserts into Pinecone
Required configuration:
- BOOK_PDF – path to the textbook PDF
- BOOK_NAME – title of the document
- PINECONE_INDEX_NAME
- PINECONE_NAMESPACE
- PINECONE_API_KEY
Run ingestion:
python pinecone_ingestion.py
Note: Due to context window and transport limits, large PDFs are best ingested offline via pinecone_ingestion.py, while smaller documents can be safely uploaded through the Claude Desktop MCP interface.
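For orientation, the following condensed sketch shows the shape of such a pipeline. It is illustrative only, not the actual pinecone_ingestion.py: it assumes pdfplumber, sentence-transformers, and the Pinecone SDK, and it uses a simple fixed-size character split in place of the 512-token semantic chunker.

# Condensed ingestion sketch: parse pages, chunk, embed, and upsert in batches
import os
import pdfplumber
from pinecone import Pinecone
from sentence_transformers import SentenceTransformer

pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))
index = pc.Index(os.getenv("PINECONE_INDEX_NAME"))
embedder = SentenceTransformer("BAAI/bge-large-en-v1.5")
book_name = os.getenv("BOOK_NAME", "Untitled")

chunks = []
with pdfplumber.open(os.getenv("BOOK_PDF")) as pdf:
    for page_no, page in enumerate(pdf.pages, start=1):
        text = (page.extract_text() or "").strip()
        # Simple fixed-size split; the real pipeline chunks by tokens with overlap.
        for i in range(0, len(text), 1500):
            chunk = f"{book_name} (page {page_no}): {text[i:i + 1500]}"
            chunks.append((f"p{page_no}-c{i}", chunk, page_no))

BATCH = 100
for start in range(0, len(chunks), BATCH):
    batch = chunks[start:start + BATCH]
    vectors = embedder.encode([c[1] for c in batch]).tolist()
    index.upsert(
        vectors=[
            {"id": cid, "values": vec, "metadata": {"text": txt, "page": page}}
            for (cid, txt, page), vec in zip(batch, vectors)
        ],
        namespace=os.getenv("PINECONE_NAMESPACE"),
    )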
RAG Evaluation
VectorSage includes a complete RAG evaluation pipeline built using the RAGAS framework.
This allows users to quantitatively assess retrieval and generation quality rather than relying on subjective judgment.
Metrics Evaluated
The evaluation pipeline reports the following metrics:
- Context Precision – How relevant the retrieved chunks are.
- Context Recall – How well the retriever covers the required information.
- Faithfulness – Whether the answer is grounded in retrieved context.
- Answer Relevancy – How well the answer addresses the question.
The evaluation logic is implemented in:
evaluate_rag.py
Evaluation process:
- Load questions from a CSV testset
- Retrieve context from Pinecone
- Generate answers using the LLM
- Compute RAGAS metrics
Users can run the evaluation directly once documents are ingested into Pinecone.
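For orientation, the sketch below shows how question/answer/context records feed into RAGAS. It assumes the ragas 0.1-style API and the datasets library, with the column names RAGAS expects; it is not the actual evaluate_rag.py.

# Evaluation sketch (illustrative; assumes ragas 0.1-style API and the datasets library)
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    context_precision,
    context_recall,
    faithfulness,
    answer_relevancy,
)

records = {
    "question": ["What is PAC learning?"],
    "contexts": [["Retrieved chunk 1 ...", "Retrieved chunk 2 ..."]],
    "answer": ["Answer generated by the LLM ..."],
    "ground_truth": ["Reference answer from the testset ..."],
}

result = evaluate(
    Dataset.from_dict(records),
    metrics=[context_precision, context_recall, faithfulness, answer_relevancy],
)
print(result)  # per-metric scores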
TestSet Generation
This script generates synthetic question–answer pairs from the source PDF using the RAGAS testset generator.
Generation process:
- Loads the source PDF
- Splits the text into semantic chunks
- Generates high-quality QA pairs using an LLM
- Exports the dataset as a CSV file
Required configuration:
- TESTSET_CSV_PATH
- PINECONE_INDEX_NAME
- PINECONE_NAMESPACE
- EMBEDDING_MODEL_NAME
- LLM_MODEL_NAME
Run testset generation:
python testset_generator.py
Output:
testset.csv
Note: MiniLM embeddings are used only during testset generation; retrieval and evaluation use BGE-large embeddings.
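The generator API differs between RAGAS releases; the sketch below follows the 0.1-style interface (LangChain document loaders plus TestsetGenerator.from_langchain) purely to illustrate the shape of the process, and is not the actual testset_generator.py.

# Testset-generation sketch (illustrative; assumes ragas 0.1-style API,
# langchain-community, and langchain-openai)
import os
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_openai import ChatOpenAI
from ragas.testset import TestsetGenerator

docs = PyPDFLoader(os.getenv("BOOK_PDF")).load()

generator = TestsetGenerator.from_langchain(
    generator_llm=ChatOpenAI(model=os.getenv("LLM_MODEL_NAME", "gpt-4o-mini")),
    critic_llm=ChatOpenAI(model=os.getenv("LLM_MODEL_NAME", "gpt-4o-mini")),
    embeddings=HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2"),
)

testset = generator.generate_with_langchain_docs(docs, test_size=10)
testset.to_pandas().to_csv(os.getenv("TESTSET_CSV_PATH", "testset.csv"), index=False)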
Evaluation Testsets
To make evaluation reproducible and easy to run, this repository includes two pre-generated testsets derived from the textbook Understanding Machine Learning.
Available Testsets
- testset_small(10Q).csv – 10-question lightweight testset for quick sanity checks
- testset_large(100Q).csv – 100-question comprehensive testset for robust evaluation
Each testset contains:
- High-quality synthetic questions
- Ground-truth reference answers
- Questions designed to test definitions, theoretical concepts, and multi-hop reasoning
These CSV files allow users to evaluate VectorSage's RAG performance immediately without regenerating test data.
Reference Textbook
This project is built around the book:
Understanding Machine Learning: From Theory to Algorithms by Shai Shalev-Shwartz and Shai Ben-David
The book is used for:
- Document ingestion
- Testset generation
- Quantitative RAG evaluation
AWS Deployment
VectorSage supports multiple AWS deployment options for production workloads.
Prerequisites
- AWS CLI installed and configured
- Docker installed (for containerized deployment)
- AWS Account with appropriate permissions
- SSM Parameters for API keys:
# Store your API keys securely in SSM Parameter Store
aws ssm put-parameter --name "/vectorsage/pinecone-api-key" --value "your-pinecone-key" --type "SecureString"
aws ssm put-parameter --name "/vectorsage/openai-api-key" --value "your-openai-key" --type "SecureString"
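At runtime the application can read these parameters back with boto3. A minimal sketch, assuming the parameter names above and an IAM role that allows ssm:GetParameter (plus kms:Decrypt for SecureString values):

# Read the SecureString parameters at startup (sketch; assumes boto3 is available)
import boto3

ssm = boto3.client("ssm", region_name="us-east-1")

def get_secret(name: str) -> str:
    response = ssm.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]

pinecone_api_key = get_secret("/vectorsage/pinecone-api-key")
openai_api_key = get_secret("/vectorsage/openai-api-key")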
Option 1: AWS ECS (Containerized)
Deploy VectorSage as a containerized service on Amazon ECS with Fargate.
Quick Deploy
# Set your environment variables
export ENVIRONMENT=prod
export AWS_DEFAULT_REGION=us-east-1
export VPC_ID=vpc-12345678 # Your VPC ID
export SUBNET_IDS=subnet-12345678,subnet-87654321 # Your subnet IDs
# Run the deployment script
chmod +x aws/deploy.sh
./aws/deploy.sh
Manual Deployment
- Build and push Docker image to ECR:
# Authenticate Docker with ECR
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin YOUR_ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com
# Build and tag the image
docker build -t vectorsage:latest .
docker tag vectorsage:latest YOUR_ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/vectorsage:latest
# Push to ECR
docker push YOUR_ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/vectorsage:latest
- Deploy CloudFormation stack:
aws cloudformation create-stack \
--stack-name vectorsage-ecs-prod \
--template-body file://aws/cloudformation.yml \
--parameters \
ParameterKey=ProjectName,ParameterValue=vectorsage \
ParameterKey=Environment,ParameterValue=prod \
ParameterKey=VpcId,ParameterValue=YOUR_VPC_ID \
ParameterKey=SubnetIds,ParameterValue="YOUR_SUBNET_ID_1,YOUR_SUBNET_ID_2" \
--capabilities CAPABILITY_IAM
Option 2: AWS Lambda (Serverless)
Deploy VectorSage as a serverless function with API Gateway.
Deploy Lambda Version
- Create deployment package:
# Install dependencies
pip install -r requirements.txt -t lambda-package/
# Copy source code (the wildcard already includes lambda_main.py and aws_utils.py)
cp *.py lambda-package/
# Create deployment zip
cd lambda-package && zip -r ../vectorsage-lambda.zip . && cd ..
- Deploy CloudFormation stack:
aws cloudformation create-stack \
--stack-name vectorsage-lambda-prod \
--template-body file://aws/lambda-cloudformation.yml \
--capabilities CAPABILITY_IAM
- Update Lambda function code:
aws lambda update-function-code \
--function-name vectorsage-prod \
--zip-file fileb://vectorsage-lambda.zip
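For orientation, an API Gateway proxy integration invokes a handler shaped roughly as follows. This is a sketch only, not the actual lambda_main.py; run_rag_query is a hypothetical placeholder for the retrieval-and-generation step.

# Shape of an API Gateway (proxy) → Lambda handler for the query path (illustrative only)
import json

def run_rag_query(question: str) -> str:
    # Hypothetical placeholder for the retrieval + generation step
    # (vector search in Pinecone/OpenSearch, then an OpenAI completion).
    return f"Answer for: {question}"

def lambda_handler(event, context):
    body = json.loads(event.get("body") or "{}")
    question = body.get("question", "")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"answer": run_rag_query(question)}),
    }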
Option 3: Docker Compose (Local/Development)
For local development with AWS services:
# Copy environment file
cp .env.example .env
# Edit .env with your AWS credentials and settings
nano .env
# Run with Docker Compose
docker-compose -f docker-compose.aws.yml up -d
AWS Services Used
- Amazon S3: Document storage and retrieval
- Amazon OpenSearch: Alternative vector database to Pinecone
- Amazon ECS/EKS: Container orchestration
- AWS Lambda: Serverless deployment
- Amazon API Gateway: API management for Lambda
- AWS Systems Manager: Secure parameter storage
- AWS CloudFormation: Infrastructure as code
Environment Variables for AWS
Add these to your .env file when deploying to AWS:
# AWS Configuration
AWS_ACCESS_KEY_ID=your_aws_access_key
AWS_SECRET_ACCESS_KEY=your_aws_secret_key
AWS_DEFAULT_REGION=us-east-1
AWS_S3_BUCKET_NAME=vectorsage-documents-prod
AWS_OPENSEARCH_ENDPOINT=https://your-opensearch-domain.us-east-1.es.amazonaws.com
AWS_OPENSEARCH_INDEX=vectorsage-documents
# Application Configuration
ENVIRONMENT=prod
LOG_LEVEL=INFO
Monitoring and Logging
- CloudWatch Logs: All application logs are sent to CloudWatch
- X-Ray: Distributed tracing for Lambda functions
- CloudWatch Metrics: Performance monitoring
- Health Checks: Built-in health endpoints for monitoring
Cost Optimization
- ECS: Use Fargate for serverless containers
- Lambda: Pay-per-request pricing
- S3: Low-cost object storage
- OpenSearch: Reserved instances for production workloads