MCP-Driven Multi-Agent RAG-Enhanced LangGraph-Orchestrated LLM System for Prostate Cancer Decision Support

📍 Presented at SIIM-CAIMI 2025 (Society for Imaging Informatics in Medicine)
🏥 University of Chicago – MS in Applied Data Science Capstone

🔬 Project Overview

This project presents an auditable, multi-agent Retrieval-Augmented Generation (RAG) framework designed to assist oncology workflows through:

Longitudinal temporal summarization
Evidence-grounded literature integration
Treatment recommendation via supervised ML
Lifespan estimation via survival analysis
Hallucination detection and validation loop

The system processes 500 synthetic longitudinal prostate cancer records and produces:

Structured timeline summaries
Literature-verified clinical context
Ranked treatment recommendations (with probabilities)
Survival probabilities at 5-, 10-, and 15-year horizons
Expected lifespan estimates (in years)

🏗 System Architecture

The architecture follows an MCP-governed modular design with strict tool mediation and validation.

High-Level Flow

Patient ID
   ↓
MCP Server (secure context retrieval)
   ↓
LangGraph Orchestration
   ├── Retrieval Tool (PubMed phenotype-aware query builder)
   ├── Summarizer Agent (structured clinical report generation)
   └── Validator Agent (hallucination + missing data detection)
   ↓
Structured Model APIs
   ├── XGBoost Treatment Model
   └── Survival Ensemble (Cox + Weibull + RSF)
   ↓
Final Validated Clinical Report

🧠 Model Architecture Diagram

flowchart TD

A[Patient ID] --> B[MCP Server<br/>Longitudinal Context Retrieval]

B --> C[LangGraph Orchestrator]

C --> D[Phenotype-Aware RAG Tool<br/>PubMed Query + Scoring]
C --> E[Summarizer Agent<br/>Structured Clinical Report]
C --> F[Validator Agent<br/>Hallucination & Missing Data Check]

E --> F
F -->|Retry if Needed| E

F --> G[Treatment Prediction API<br/>XGBoost Classifier]
F --> H[Lifespan Estimation API<br/>Cox + Weibull + RSF]

G --> I[Ranked Therapy + Probabilities]
H --> J[5/10/15-Year Survival + Expected Years]

I --> K[Final MCP-Audited Clinical Report]
J --> K

🛠 Technical Stack

🧩 LLM & Orchestration

LangGraph (multi-agent workflow control)
OpenAI GPT-4 (summarizer + validator agents)
Prompt-constrained structured generation
Retry routing controller with bounded iterations
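The bounded retry routing between the summarizer and the validator can be sketched in plain Python. This is a minimal illustration of the control flow only, not the project's LangGraph graph: `run_with_retries`, `ValidationResult`, `MAX_RETRIES`, and the two toy agents are hypothetical names, and the retry bound of 3 is an assumption because the actual limit is not stated.

```python
from dataclasses import dataclass, field
from typing import Callable, List

MAX_RETRIES = 3  # assumed bound; the project's actual limit is not stated


@dataclass
class ValidationResult:
    ok: bool
    issues: List[str] = field(default_factory=list)


def run_with_retries(
    summarize: Callable[[dict, List[str]], str],
    validate: Callable[[dict, str], ValidationResult],
    context: dict,
    max_retries: int = MAX_RETRIES,
) -> dict:
    """Call the summarizer, then the validator; retry with the validator's
    feedback until the draft passes or the retry budget is exhausted."""
    feedback: List[str] = []
    for attempt in range(1, max_retries + 1):
        draft = summarize(context, feedback)
        result = validate(context, draft)
        if result.ok:
            return {"status": "validated", "attempts": attempt, "summary": draft}
        feedback = result.issues  # hand the findings to the next attempt
    return {"status": "rejected", "attempts": max_retries, "summary": None}


# Toy stand-ins for the LLM agents (purely illustrative, no model calls).
def toy_summarize(context: dict, feedback: List[str]) -> str:
    if feedback:  # later attempt: include the value the validator asked for
        return f"PSA {context['psa']} ng/mL, Gleason {context['gleason']}"
    return f"Gleason {context['gleason']}"


def toy_validate(context: dict, draft: str) -> ValidationResult:
    issues = [] if "PSA" in draft else ["missing PSA"]
    return ValidationResult(ok=not issues, issues=issues)


outcome = run_with_retries(toy_summarize, toy_validate, {"psa": 8.4, "gleason": 7})
```

Returning an explicit `rejected` status when the budget runs out keeps the loop bounded and makes the failure visible to whatever consumes the result.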

🔐 Governance Layer

Model Context Protocol (MCP) Server
- Versioned patient context retrieval
- Tool mediation
- Auditable endpoint calls
- Metadata tracking (model name, version, timestamp)
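One way to picture tool mediation with metadata tracking is a single wrapper through which every tool call passes. The sketch below is not the MCP SDK and not the project's server: `audited_call`, `AUDIT_LOG`, and `get_patient_context` are hypothetical names, and the in-memory list stands in for whatever persistent audit store a real deployment would use.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List

AUDIT_LOG: List[Dict[str, Any]] = []  # in-memory stand-in for an audit store


def audited_call(tool_name: str, model_name: str, model_version: str,
                 fn: Callable[..., Any], **kwargs: Any) -> Any:
    """Run a tool through one mediation point and record what was called,
    which model/version served it, and when."""
    result = fn(**kwargs)
    payload = json.dumps(kwargs, sort_keys=True, default=str)
    AUDIT_LOG.append({
        "tool": tool_name,
        "model_name": model_name,
        "model_version": model_version,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # Hash of the arguments, so the log does not store patient data itself.
        "args_sha256": hashlib.sha256(payload.encode("utf-8")).hexdigest(),
    })
    return result


def get_patient_context(patient_id: str, version: int) -> dict:
    # Hypothetical versioned lookup; returns a fixed toy record.
    return {"patient_id": patient_id, "version": version, "visits": []}


ctx = audited_call("get_patient_context", "context-store", "v1",
                   get_patient_context, patient_id="P001", version=2)
```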

📚 Retrieval (RAG)

PubMed XML API
Phenotype-aware query builder
Signal extraction via regex parsers
Evidence scoring:
- Clinical alignment
- Recency filtering (≥2016)
- Endpoint relevance
- Novelty weighting
Deterministic citation embedding (verbatim insertion)
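The evidence scoring criteria above could be combined roughly as follows. This is a sketch under stated assumptions: the weights, the Jaccard overlap used for clinical alignment, and the publication-year proxy used for novelty are all illustrative choices, not the project's documented formula. Only the 2016 cutoff comes from the text. The article IDs are placeholders, not real PubMed records.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

MIN_YEAR = 2016  # recency cutoff stated in the text
# Hypothetical weights; the project's actual weighting is not documented here.
WEIGHTS = {"alignment": 0.5, "endpoint": 0.3, "novelty": 0.2}


@dataclass
class Article:
    article_id: str
    year: int
    phenotype_terms: Set[str]  # terms extracted from the abstract
    endpoints: Set[str]        # e.g. {"overall survival"}


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score_articles(articles: List[Article], patient_terms: Set[str],
                   target_endpoints: Set[str],
                   current_year: int) -> List[Tuple[float, str]]:
    """Drop articles older than MIN_YEAR, then rank the rest by weighted score."""
    span = max(current_year - MIN_YEAR, 1)
    scored = []
    for art in articles:
        if art.year < MIN_YEAR:
            continue  # recency filter
        alignment = jaccard(art.phenotype_terms, patient_terms)
        endpoint = 1.0 if art.endpoints & target_endpoints else 0.0
        # Novelty proxy (assumption): 0 for MIN_YEAR, 1 for the current year.
        novelty = min(max((art.year - MIN_YEAR) / span, 0.0), 1.0)
        total = (WEIGHTS["alignment"] * alignment
                 + WEIGHTS["endpoint"] * endpoint
                 + WEIGHTS["novelty"] * novelty)
        scored.append((round(total, 3), art.article_id))
    return sorted(scored, reverse=True)


articles = [
    Article("placeholder-a", 2014, {"m1", "crpc"}, {"overall survival"}),
    Article("placeholder-b", 2020, {"m1", "crpc"}, {"overall survival"}),
    Article("placeholder-c", 2024, {"localized"}, {"toxicity"}),
]
ranked = score_articles(articles, {"m1", "crpc"}, {"overall survival"},
                        current_year=2024)
```

In this toy run the 2014 article is filtered out, and the phenotype-matched 2020 article outranks the newer but unrelated one.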

📊 Treatment Recommendation Model

XGBoost classifier
Features:
- TNM stage
- Gleason grade
- PSA trajectory
- PSA velocity
- Metastatic indicators
- Treatment history
Output:
- Top-N ranked therapies
- Class probabilities
- Feature-driven rationale
Patient-level train/test split
Accuracy on the synthetic dataset: 1.00 (an upper bound on synthetic data, not a claim of clinical performance)
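Two pieces of this section lend themselves to a short sketch: splitting by patient rather than by visit, and turning class probabilities into a top-N ranking. The helper names, the 20% test fraction, and the therapy labels with their probabilities are assumptions made for illustration. In the real pipeline the probabilities would come from the XGBoost classifier.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple


def patient_level_split(patient_ids: Sequence[str], test_fraction: float = 0.2,
                        seed: int = 0) -> Tuple[Set[str], Set[str]]:
    """Split on unique patient IDs so all visits of one patient land on the
    same side, which avoids leakage between train and test."""
    unique = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_test = max(1, round(len(unique) * test_fraction))
    return set(unique[n_test:]), set(unique[:n_test])


def top_n_therapies(class_probs: Dict[str, float],
                    n: int = 3) -> List[Tuple[str, float]]:
    """Rank therapy classes by predicted probability, highest first."""
    return sorted(class_probs.items(), key=lambda kv: kv[1], reverse=True)[:n]


# One row per visit: each of 10 toy patients appears 5 times.
visit_patient_ids = [f"P{i:03d}" for i in range(10) for _ in range(5)]
train_ids, test_ids = patient_level_split(visit_patient_ids)

# Hypothetical class probabilities standing in for classifier output.
probs = {"active surveillance": 0.05, "ADT": 0.62,
         "radiotherapy": 0.25, "chemotherapy": 0.08}
ranked_therapies = top_n_therapies(probs, n=2)
```

Rows are then assigned to train or test by checking which set their patient ID belongs to, so no patient contributes visits to both sides.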

📈 Lifespan Estimation Model (Ensemble Survival Framework)

Three complementary models:

Cox Proportional Hazards (interpretable hazard ratios)
Weibull Regression (stage-stratified baseline survival curves)
Random Survival Forest (nonlinear feature interactions)

Workflow:

TNM-based stratification (localized / N1 / M1)
Weibull baseline curve
Cox-based patient-specific risk shift
RSF nonlinear modulation
Ensemble averaging
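The workflow above can be sketched numerically. This is one possible reading, not the project's fitted models: the Weibull parameters are made-up placeholders, the RSF member is replaced by a toy exponential curve, and equal-weight averaging of the three members is an assumption. The expected survival time is computed as a restricted mean over a 30-year horizon, which is also an illustrative choice.

```python
import math
from typing import Callable, Dict, Tuple

# Hypothetical Weibull (scale in years, shape) per TNM stratum; not fitted values.
WEIBULL_PARAMS: Dict[str, Tuple[float, float]] = {
    "localized": (40.0, 1.3),
    "N1": (20.0, 1.2),
    "M1": (6.0, 1.1),
}


def weibull_survival(t: float, scale: float, shape: float) -> float:
    """Weibull baseline S0(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def cox_shifted_survival(s0: float, linear_predictor: float) -> float:
    """Proportional-hazards shift: S(t | x) = S0(t) ** exp(linear_predictor)."""
    return s0 ** math.exp(linear_predictor)


def ensemble_survival(t: float, stage: str, linear_predictor: float,
                      rsf_survival: Callable[[float], float]) -> float:
    """Equal-weight average of baseline, Cox-shifted, and RSF-style curves."""
    scale, shape = WEIBULL_PARAMS[stage]
    s0 = weibull_survival(t, scale, shape)
    members = [s0, cox_shifted_survival(s0, linear_predictor), rsf_survival(t)]
    return sum(members) / len(members)


def restricted_mean_years(surv: Callable[[float], float],
                          horizon: float = 30.0, step: float = 0.25) -> float:
    """Trapezoidal integral of S(t) from 0 to horizon."""
    n = int(round(horizon / step))
    values = [surv(i * step) for i in range(n + 1)]
    return sum((values[i] + values[i + 1]) / 2 * step for i in range(n))


def rsf_toy(t: float) -> float:
    # Stand-in for a Random Survival Forest prediction (assumption).
    return math.exp(-0.04 * t)


def patient_curve(t: float) -> float:
    return ensemble_survival(t, "N1", 0.3, rsf_toy)


horizons = {h: round(patient_curve(h), 3) for h in (5, 10, 15)}
expected_years = restricted_mean_years(patient_curve)
```

Because every member curve is non-increasing in `t`, their average is as well, which is the property the monotonicity checks rely on.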

Outputs:

5-, 10-, 15-year survival probabilities
Expected survival time (years)
Monotonic survival validation checks

Internal QA:

Survival ordering check (M1 < N1 < localized)
Probability bounds enforcement
Curve monotonicity
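The three QA checks can be expressed as small predicates over survival probabilities at increasing horizons. The function names and the toy curves below are illustrative assumptions. The survival values are made up to exercise the checks and are not model output.

```python
from typing import Dict, List, Sequence


def check_bounds(curve: Sequence[float]) -> bool:
    """Every survival probability must lie in [0, 1]."""
    return all(0.0 <= s <= 1.0 for s in curve)


def check_monotone(curve: Sequence[float], tol: float = 1e-9) -> bool:
    """Survival must never increase as the horizon grows."""
    return all(b <= a + tol for a, b in zip(curve, curve[1:]))


def check_stage_ordering(by_stage: Dict[str, Sequence[float]]) -> bool:
    """At every horizon, expect M1 < N1 < localized."""
    return all(m < n < loc for m, n, loc in
               zip(by_stage["M1"], by_stage["N1"], by_stage["localized"]))


def run_qa(by_stage: Dict[str, Sequence[float]]) -> List[str]:
    """Return a list of human-readable failures; empty means all checks pass."""
    failures = []
    for stage, curve in by_stage.items():
        if not check_bounds(curve):
            failures.append(f"{stage}: probability out of [0, 1]")
        if not check_monotone(curve):
            failures.append(f"{stage}: curve not monotone")
    if not check_stage_ordering(by_stage):
        failures.append("stage ordering violated")
    return failures


# Toy survival probabilities at 5, 10, and 15 years (made-up values).
good = {"localized": [0.95, 0.88, 0.80], "N1": [0.85, 0.70, 0.55],
        "M1": [0.45, 0.20, 0.10]}
bad = {**good, "M1": [0.45, 0.50, 0.10]}  # survival rises between 5 and 10 years

good_failures = run_qa(good)
bad_failures = run_qa(bad)
```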

🏥 Conference Presentation

This work was presented at:

SIIM-CAIMI 2025
Society for Imaging Informatics in Medicine – Conference on Artificial Intelligence in Medical Imaging

Poster: AI-in-oncology-Poster.pdf
Featured in [UChicago Data Science Institute News]

📁 Data

500 synthetic longitudinal prostate cancer records
5–7 time-stamped visits per patient
Variables include:
- PSA (with kinetics)
- Gleason grade
- TNM stage
- Bone lesion count
- Visceral metastasis flag
- ALP, LDH, albumin, hemoglobin
- Treatment history
- Weight trends
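As an illustration of how PSA kinetics might be derived from time-stamped visits, the sketch below fits a least-squares slope to PSA (velocity, in ng/mL per year) and to log PSA (doubling time). The `Visit` fields and helper names are hypothetical and the values are toy data. The project's actual feature definitions may differ.

```python
import math
from dataclasses import dataclass
from datetime import date
from typing import List, Sequence


@dataclass
class Visit:
    visit_date: date
    psa_ng_ml: float


def _slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct visit dates")
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx


def _years_since_first(visits: List[Visit]) -> List[float]:
    # Assumes visits are already in chronological order.
    t0 = visits[0].visit_date
    return [(v.visit_date - t0).days / 365.25 for v in visits]


def psa_velocity(visits: List[Visit]) -> float:
    """PSA velocity in ng/mL per year."""
    return _slope(_years_since_first(visits), [v.psa_ng_ml for v in visits])


def psa_doubling_time_years(visits: List[Visit]) -> float:
    """ln(2) divided by the slope of ln(PSA); infinite if PSA is not rising."""
    rate = _slope(_years_since_first(visits),
                  [math.log(v.psa_ng_ml) for v in visits])
    return math.log(2) / rate if rate > 0 else math.inf


# Toy patient whose PSA doubles roughly once a year.
visits = [Visit(date(2020, 1, 1), 4.0), Visit(date(2021, 1, 1), 8.0),
          Visit(date(2022, 1, 1), 16.0)]
velocity = psa_velocity(visits)
doubling_time = psa_doubling_time_years(visits)
```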

The synthetic records were validated for medical plausibility by a practicing radiologist.

🔎 Evaluation & Validation

Structured Models

Held-out patient-level testing
Directional consistency validation
Internal statistical QA checks

LLM Components

Dedicated validator agent
Detection of:
- Hallucinated content
- Missing patient data
Iterative retry loop
Verbatim literature line insertion (no citation hallucination)
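The validator in this project is an LLM agent, but the kinds of checks listed above can be illustrated with simple deterministic rules. The sketch below is a crude stand-in, not the project's validator: the function names, the record fields, and the number-matching heuristic are assumptions, and the evidence line is a placeholder rather than a real citation.

```python
import re
from typing import Dict, List, Optional

NUMBER = re.compile(r"\d+(?:\.\d+)?")


def unsupported_numbers(summary: str, record: Dict[str, object]) -> List[str]:
    """Numbers in the summary that occur nowhere in the source record
    (a crude proxy for hallucinated values)."""
    allowed = set(NUMBER.findall(" ".join(str(v) for v in record.values())))
    return [n for n in NUMBER.findall(summary) if n not in allowed]


def missing_fields(record: Dict[str, Optional[object]],
                   required: List[str]) -> List[str]:
    """Required fields that are absent from the record or have no value."""
    return [f for f in required if record.get(f) is None]


def non_verbatim_citations(report: str, evidence_lines: List[str]) -> List[str]:
    """Evidence lines that were altered or dropped instead of inserted verbatim."""
    return [line for line in evidence_lines if line not in report]


record = {"psa": 12.5, "gleason": 8, "stage": "T3N1M0", "ldh": None}
summary = "PSA 12.5 ng/mL, Gleason 9, stage T3N1M0."  # Gleason value is wrong
evidence = ["[placeholder citation] Example evidence sentence."]
report = summary + "\n" + evidence[0]

hallucinated = unsupported_numbers(summary, record)
missing = missing_fields(record, ["psa", "gleason", "ldh"])
altered = non_verbatim_citations(report, evidence)
```

Here the incorrect Gleason value and the empty LDH field are both flagged, while the verbatim evidence line passes.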

No summary is finalized while hallucinations flagged by the validator remain unresolved.

🎯 Key Contributions

MCP-governed agentic clinical AI framework
Hallucination-resistant RAG integration
Survival ensemble integrated into LLM workflow
Deterministic literature grounding
Modular API-based predictive model integration
Fully auditable report generation pipeline

📌 Research Context

This project addresses limitations in current oncology AI systems:

Lack of temporal reasoning
Hallucination in generative summaries
Non-auditable clinical AI outputs
Separation between ML survival models and narrative reasoning

The architecture demonstrates a reproducible pattern for safe LLM deployment in healthcare.

⚠ Disclaimer

This system was trained and evaluated on synthetic data and is intended for research demonstration only.
