VectorSage MCP RAG Server
Production-ready Pinecone-based RAG server for Claude Desktop with large-scale PDF ingestion, evaluation pipelines, and AI teaching tools
🏗️ Architecture Overview
┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐
│ Claude Desktop  │──────│   MCP Server    │──────│   Vector DB     │
│ (User Interface)│      │    (FastMCP)    │      │  (Pinecone/     │
└─────────────────┘      └─────────────────┘      │   OpenSearch)   │
                                  │               └─────────────────┘
                                  │
                                  ▼
                         ┌─────────────────┐
                         │    Document     │
                         │   Processing    │
                         │    Pipeline     │
                         └─────────────────┘
                                  │
                                  ▼
┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐
│     AWS S3      │      │   AWS Lambda    │      │     AWS ECS     │
│ (Document Store)│      │  (Serverless)   │      │ (Containerized) │
└─────────────────┘      └─────────────────┘      └─────────────────┘
                                  │
                                  ▼
                         ┌─────────────────┐
                         │   OpenAI API    │
                         │      (LLM)      │
                         └─────────────────┘
🧩 System Components
- Claude Desktop: User interface for natural language interactions
- MCP Server: FastMCP-based server handling tool calls and RAG operations
- Vector Databases: Pinecone (primary) or AWS OpenSearch (alternative)
- Document Processing: PDF parsing, text chunking, and embedding generation
- Storage: AWS S3 for document persistence
- Compute: AWS ECS (containers) or Lambda (serverless)
- AI Services: OpenAI GPT models for generation and evaluation
🔄 Data Flow
- Document Upload → S3 storage → Processing pipeline → Vector embeddings → Database
- User Query → Claude Desktop → MCP Server → Vector search → Context retrieval → LLM generation → Response
- Evaluation → Test sets → RAG metrics → Performance analysis
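The query path above maps onto an MCP tool exposed by the server. The snippet below is a minimal, illustrative sketch of that pattern, not the actual main.py: it assumes the FastMCP Python API, the sentence-transformers package for BAAI/bge-large-en-v1.5 embeddings, and the Pinecone SDK; the tool name search_book and the "text" metadata field are assumptions made for the example.

# Minimal sketch of the query path as an MCP tool (illustrative only, not the real main.py)
import os
from fastmcp import FastMCP
from pinecone import Pinecone
from sentence_transformers import SentenceTransformer

mcp = FastMCP("VectorSage")
pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))
index = pc.Index(os.getenv("PINECONE_INDEX_NAME", "rag-storage"))
embedder = SentenceTransformer("BAAI/bge-large-en-v1.5")

@mcp.tool()
def search_book(question: str, top_k: int = 5) -> str:
    """Return the most relevant chunks for a question (Claude uses them as context)."""
    vector = embedder.encode(question).tolist()
    results = index.query(
        vector=vector,
        top_k=top_k,
        include_metadata=True,
        namespace=os.getenv("PINECONE_NAMESPACE"),
    )
    # Assumes each chunk was stored with its text in the "text" metadata field.
    return "\n\n".join(match.metadata["text"] for match in results.matches)

if __name__ == "__main__":
    mcp.run()  # stdio transport, as configured for Claude Desktop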
Quick Setup
1. Clone & Install Dependencies
git clone https://github.com/Sri22082/vectorSage_MCP.git
cd vectorSage_MCP
uv pip install -r requirements.txt
# OR
uv sync   # uses pyproject.toml
2. Pinecone Setup
- Create an account at https://www.pinecone.io/
- Go to API Keys → copy your API key
- Create an index:
  Name: rag-storage        # name your RAG storage index
  Modality: Text
  Vector type: Dense
  Dimension: 1024
  Metric: cosine
3. Create .env
PINECONE_API_KEY=your_api_key_here
4. Main.py Changes
# ----------------------------------------------------------------------
# Global Configuration
# ----------------------------------------------------------------------
pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))
index = pc.Index("rag-storage")  # use the name of the index you created in step 2
5. Install to Claude Desktop
uv run fastmcp install claude-desktop main.py
6. Configure Claude Desktop
{
  "SERVER_NAME": {
    "command": "C:/Users/YOUR_USERNAME/VectorSage-MCP-RAG-server/.venv/Scripts/python.exe",
    "args": ["C:/Users/YOUR_USERNAME/VectorSage-MCP-RAG-server/main.py"],
    "env": {
      "PINECONE_API_KEY": "your api key"
    },
    "transport": "stdio",
    "cwd": "C:/Users/YOUR_USERNAME/VectorSage-MCP-RAG-server",
    "timeout": 600
  }
}
7. Launch
- Save the JSON file
- Close Claude Desktop completely
- Open Task Manager → end all Claude processes
- Restart Claude Desktop
Configuration
VectorSage uses environment variables for all API keys, file paths, credentials, and model configuration, which makes the project portable across machines and operating systems.
Create a .env file using .env.example as reference.
Required Environment Variables
# SAMPLE .ENV
PINECONE_API_KEY=your_key_here
PINECONE_INDEX_NAME=athena-rag
PINECONE_NAMESPACE=ml-theory-algorithms-text
OPENAI_API_KEY=your_api_key_here
# Documents
BOOK_PDF=/absolute/path/to/your/book.pdf
BOOK_NAME=Understanding Machine Learning: Theory and Algorithms
# Evaluation
TESTSET_CSV_PATH=/absolute/path/to/testset.csv
# Models
EMBEDDING_MODEL_NAME=BAAI/bge-large-en-v1.5
LLM_MODEL_NAME=gpt-4o-mini
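A minimal sketch of how these variables can be loaded at startup, assuming the python-dotenv package and the variable names from the sample above:

# Load .env so the same code runs on any machine (sketch; assumes python-dotenv)
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory

PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
PINECONE_INDEX_NAME = os.getenv("PINECONE_INDEX_NAME", "athena-rag")
EMBEDDING_MODEL_NAME = os.getenv("EMBEDDING_MODEL_NAME", "BAAI/bge-large-en-v1.5")
LLM_MODEL_NAME = os.getenv("LLM_MODEL_NAME", "gpt-4o-mini")

if not PINECONE_API_KEY:
    raise RuntimeError("PINECONE_API_KEY is not set; check your .env file")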
Project Structure
VectorSage-MCP-RAG-server/
├─ main.py # MCP server for Claude Desktop
├─ lambda_main.py # AWS Lambda handler
├─ aws_utils.py # AWS S3 and OpenSearch utilities
├─ pinecone_ingestion.py # Offline PDF ingestion
├─ testset_generator.py # Synthetic QA generation (RAGAS)
├─ evaluate_rag.py # RAG evaluation pipeline
├─ Dockerfile # Docker container definition
├─ docker-compose.yml # Local development setup
├─ docker-compose.aws.yml # AWS development setup
├─ aws/ # AWS deployment configurations
│ ├─ cloudformation.yml # ECS CloudFormation template
│ ├─ lambda-cloudformation.yml # Lambda CloudFormation template
│ └─ deploy.sh # AWS deployment script
├─ .env.example # Environment variable template
├─ requirements.txt
├─ pyproject.toml
├─ testset_small(10Q).csv
├─ testset_large(100Q).csv
└─ README.md
Pinecone Ingestion
This script ingests a textbook-scale PDF into Pinecone using a production-quality semantic ingestion pipeline.
Key features:
- Page-aware PDF parsing using pdfplumber
- Text cleaning to remove common PDF artifacts
- Semantic chunking (512 tokens, 100 overlap)
- Context-enriched chunks (book title + page number)
- High-quality embeddings using BAAI/bge-large-en-v1.5
- Batched upserts into Pinecone
Required configuration:
- BOOK_PDF – path to the textbook PDF
- BOOK_NAME – title of the document
- PINECONE_INDEX_NAME
- PINECONE_NAMESPACE
- PINECONE_API_KEY
Run ingestion:
python pinecone_ingestion.py
Note: Due to context window and transport limits, large PDFs are best ingested offline via pinecone_ingestion.py, while smaller documents can be safely uploaded through the Claude Desktop MCP interface.
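For orientation, the following condensed sketch shows the shape of such a pipeline. It is illustrative only, not the actual pinecone_ingestion.py: it assumes pdfplumber, sentence-transformers, and the Pinecone SDK, and it uses a simple fixed-size character split in place of the 512-token semantic chunker.

# Condensed ingestion sketch: parse pages, chunk, embed, and upsert in batches
import os
import pdfplumber
from pinecone import Pinecone
from sentence_transformers import SentenceTransformer

pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))
index = pc.Index(os.getenv("PINECONE_INDEX_NAME"))
embedder = SentenceTransformer("BAAI/bge-large-en-v1.5")
book_name = os.getenv("BOOK_NAME", "Untitled")

chunks = []
with pdfplumber.open(os.getenv("BOOK_PDF")) as pdf:
    for page_no, page in enumerate(pdf.pages, start=1):
        text = (page.extract_text() or "").strip()
        # Simple fixed-size split; the real pipeline chunks by tokens with overlap.
        for i in range(0, len(text), 1500):
            chunk = f"{book_name} (page {page_no}): {text[i:i + 1500]}"
            chunks.append((f"p{page_no}-c{i}", chunk, page_no))

BATCH = 100
for start in range(0, len(chunks), BATCH):
    batch = chunks[start:start + BATCH]
    vectors = embedder.encode([c[1] for c in batch]).tolist()
    index.upsert(
        vectors=[
            {"id": cid, "values": vec, "metadata": {"text": txt, "page": page}}
            for (cid, txt, page), vec in zip(batch, vectors)
        ],
        namespace=os.getenv("PINECONE_NAMESPACE"),
    )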
RAG Evaluation
VectorSage includes a complete RAG evaluation pipeline built using the RAGAS framework.
This allows users to quantitatively assess retrieval and generation quality rather than relying on subjective judgment.
Metrics Evaluated
The evaluation pipeline reports the following metrics:
- Context Precision – How relevant the retrieved chunks are.
- Context Recall – How well the retriever covers the required information.
- Faithfulness – Whether the answer is grounded in retrieved context.
- Answer Relevancy – How well the answer addresses the question.
The evaluation logic is implemented in:
evaluate_rag.py
Evaluation process:
- Load questions from a CSV testset
- Retrieve context from Pinecone
- Generate answers using the LLM
- Compute RAGAS metrics
Users can run the evaluation directly once documents are ingested into Pinecone.
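For orientation, the sketch below shows how question/answer/context records feed into RAGAS. It assumes the ragas 0.1-style API and the datasets library, with the column names RAGAS expects; it is not the actual evaluate_rag.py.

# Evaluation sketch (illustrative; assumes ragas 0.1-style API and the datasets library)
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    context_precision,
    context_recall,
    faithfulness,
    answer_relevancy,
)

records = {
    "question": ["What is PAC learning?"],
    "contexts": [["Retrieved chunk 1 ...", "Retrieved chunk 2 ..."]],
    "answer": ["Answer generated by the LLM ..."],
    "ground_truth": ["Reference answer from the testset ..."],
}

result = evaluate(
    Dataset.from_dict(records),
    metrics=[context_precision, context_recall, faithfulness, answer_relevancy],
)
print(result)  # per-metric scores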
TestSet Generation
This script generates synthetic question–answer pairs from the source PDF using the RAGAS testset generator.
Generation process:
- Loads the source PDF
- Splits the text into semantic chunks
- Generates high-quality QA pairs using an LLM
- Exports the dataset as a CSV file
Required configuration:
- TESTSET_CSV_PATH
- PINECONE_INDEX_NAME
- PINECONE_NAMESPACE
- EMBEDDING_MODEL_NAME
- LLM_MODEL_NAME
Run testset generation:
python testset_generator.py
Output:
testset.csv
Note: MiniLM embeddings are used only during testset generation; retrieval and evaluation use BGE-large embeddings.
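The generator API differs between RAGAS releases; the sketch below follows the 0.1-style interface (LangChain document loaders plus TestsetGenerator.from_langchain) purely to illustrate the shape of the process, and is not the actual testset_generator.py.

# Testset-generation sketch (illustrative; assumes ragas 0.1-style API,
# langchain-community, and langchain-openai)
import os
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_openai import ChatOpenAI
from ragas.testset import TestsetGenerator

docs = PyPDFLoader(os.getenv("BOOK_PDF")).load()

generator = TestsetGenerator.from_langchain(
    generator_llm=ChatOpenAI(model=os.getenv("LLM_MODEL_NAME", "gpt-4o-mini")),
    critic_llm=ChatOpenAI(model=os.getenv("LLM_MODEL_NAME", "gpt-4o-mini")),
    embeddings=HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2"),
)

testset = generator.generate_with_langchain_docs(docs, test_size=10)
testset.to_pandas().to_csv(os.getenv("TESTSET_CSV_PATH", "testset.csv"), index=False)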
Evaluation Testsets
To make evaluation reproducible and easy to run, this repository includes two pre-generated testsets derived from the textbook Understanding Machine Learning.
Available Testsets
- testset_small(10Q).csv – 10-question lightweight testset for quick sanity checks
- testset_large(100Q).csv – 100-question comprehensive testset for robust evaluation
Each testset contains:
- High-quality synthetic questions
- Ground-truth reference answers
- Questions designed to test definitions, theoretical concepts, and multi-hop reasoning
These CSV files allow users to evaluate VectorSage's RAG performance immediately without regenerating test data.
Reference Textbook
This project is built around the book:
Understanding Machine Learning: From Theory to Algorithms by Shai Shalev-Shwartz and Shai Ben-David
The book is used for:
- Document ingestion
- Testset generation
- Quantitative RAG evaluation
AWS Deployment
VectorSage supports multiple AWS deployment options for production workloads.
Prerequisites
- AWS CLI installed and configured
- Docker installed (for containerized deployment)
- AWS Account with appropriate permissions
- SSM Parameters for API keys:
# Store your API keys securely in SSM Parameter Store
aws ssm put-parameter --name "/vectorsage/pinecone-api-key" --value "your-pinecone-key" --type "SecureString"
aws ssm put-parameter --name "/vectorsage/openai-api-key" --value "your-openai-key" --type "SecureString"
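At runtime the application can read these parameters back with boto3. A minimal sketch, assuming the parameter names above and an IAM role that allows ssm:GetParameter (plus kms:Decrypt for SecureString values):

# Read the SecureString parameters at startup (sketch; assumes boto3 is available)
import boto3

ssm = boto3.client("ssm", region_name="us-east-1")

def get_secret(name: str) -> str:
    response = ssm.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]

pinecone_api_key = get_secret("/vectorsage/pinecone-api-key")
openai_api_key = get_secret("/vectorsage/openai-api-key")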
Option 1: AWS ECS (Containerized)
Deploy VectorSage as a containerized service on Amazon ECS with Fargate.
Quick Deploy
# Set your environment variables
export ENVIRONMENT=prod
export AWS_DEFAULT_REGION=us-east-1
export VPC_ID=vpc-12345678 # Your VPC ID
export SUBNET_IDS=subnet-12345678,subnet-87654321 # Your subnet IDs
# Run the deployment script
chmod +x aws/deploy.sh
./aws/deploy.sh
Manual Deployment
- Build and push Docker image to ECR:
# Authenticate Docker with ECR
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin YOUR_ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com
# Build and tag the image
docker build -t vectorsage:latest .
docker tag vectorsage:latest YOUR_ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/vectorsage:latest
# Push to ECR
docker push YOUR_ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/vectorsage:latest
- Deploy CloudFormation stack:
aws cloudformation create-stack \
--stack-name vectorsage-ecs-prod \
--template-body file://aws/cloudformation.yml \
--parameters \
ParameterKey=ProjectName,ParameterValue=vectorsage \
ParameterKey=Environment,ParameterValue=prod \
ParameterKey=VpcId,ParameterValue=YOUR_VPC_ID \
ParameterKey=SubnetIds,ParameterValue="YOUR_SUBNET_ID_1,YOUR_SUBNET_ID_2" \
--capabilities CAPABILITY_IAM
Option 2: AWS Lambda (Serverless)
Deploy VectorSage as a serverless function with API Gateway.
Deploy Lambda Version
- Create deployment package:
# Install dependencies
pip install -r requirements.txt -t lambda-package/
# Copy source code (the wildcard already includes lambda_main.py and aws_utils.py)
cp *.py lambda-package/
# Create deployment zip
cd lambda-package && zip -r ../vectorsage-lambda.zip . && cd ..
- Deploy CloudFormation stack:
aws cloudformation create-stack \
--stack-name vectorsage-lambda-prod \
--template-body file://aws/lambda-cloudformation.yml \
--capabilities CAPABILITY_IAM
- Update Lambda function code:
aws lambda update-function-code \
--function-name vectorsage-prod \
--zip-file fileb://vectorsage-lambda.zip
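For orientation, an API Gateway proxy integration invokes a handler shaped roughly as follows. This is a sketch only, not the actual lambda_main.py; run_rag_query is a hypothetical placeholder for the retrieval-and-generation step.

# Shape of an API Gateway (proxy) → Lambda handler for the query path (illustrative only)
import json

def run_rag_query(question: str) -> str:
    # Hypothetical placeholder for the retrieval + generation step
    # (vector search in Pinecone/OpenSearch, then an OpenAI completion).
    return f"Answer for: {question}"

def lambda_handler(event, context):
    body = json.loads(event.get("body") or "{}")
    question = body.get("question", "")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"answer": run_rag_query(question)}),
    }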
Option 3: Docker Compose (Local/Development)
For local development with AWS services:
# Copy environment file
cp .env.example .env
# Edit .env with your AWS credentials and settings
nano .env
# Run with Docker Compose
docker-compose -f docker-compose.aws.yml up -d
AWS Services Used
- Amazon S3: Document storage and retrieval
- Amazon OpenSearch: Alternative vector database to Pinecone
- Amazon ECS/EKS: Container orchestration
- AWS Lambda: Serverless deployment
- Amazon API Gateway: API management for Lambda
- AWS Systems Manager: Secure parameter storage
- AWS CloudFormation: Infrastructure as code
Environment Variables for AWS
Add these to your .env file when deploying to AWS:
# AWS Configuration
AWS_ACCESS_KEY_ID=your_aws_access_key
AWS_SECRET_ACCESS_KEY=your_aws_secret_key
AWS_DEFAULT_REGION=us-east-1
AWS_S3_BUCKET_NAME=vectorsage-documents-prod
AWS_OPENSEARCH_ENDPOINT=https://your-opensearch-domain.us-east-1.es.amazonaws.com
AWS_OPENSEARCH_INDEX=vectorsage-documents
# Application Configuration
ENVIRONMENT=prod
LOG_LEVEL=INFO
Monitoring and Logging
- CloudWatch Logs: All application logs are sent to CloudWatch
- X-Ray: Distributed tracing for Lambda functions
- CloudWatch Metrics: Performance monitoring
- Health Checks: Built-in health endpoints for monitoring
Cost Optimization
- ECS: Use Fargate for serverless containers
- Lambda: Pay-per-request pricing
- S3: Low-cost object storage
- OpenSearch: Reserved instances for production workloads