An MCP-governed, RAG-enhanced multi-agent LLM system for prostate cancer care, developed in collaboration with UChicago Medicine. Built using LangGraph, LangChain, ChromaDB, FastAPI, OpenAI GPT-4, HuggingFace embeddings, and ML & survival models (XGBoost, Cox, Weibull, RSF) for clinical reasoning and treatment prediction.
MCP-Driven Multi-Agent RAG-Enhanced LangGraph-Orchestrated LLM System for Prostate Cancer Decision Support
📍 Presented at SIIM-CAIMI 2025 (Society for Imaging Informatics in Medicine)
🏥 University of Chicago – MS in Applied Data Science Capstone
🔬 Project Overview
This project presents an auditable, multi-agent Retrieval-Augmented Generation (RAG) framework designed to assist oncology workflows through:
- Longitudinal temporal summarization
- Evidence-grounded literature integration
- Treatment recommendation via supervised ML
- Lifespan estimation via survival analysis
- Hallucination detection and validation loop
The system processes 500 synthetic longitudinal prostate cancer records and produces:
- Structured timeline summaries
- Literature-verified clinical context
- Ranked treatment recommendations (with probabilities)
- Survival probabilities at 5-, 10-, and 15-year horizons
- Expected lifespan estimates (in years)
🏗 System Architecture
The architecture follows an MCP-governed modular design with strict tool mediation and validation.
High-Level Flow
```
Patient ID
   ↓
MCP Server (secure context retrieval)
   ↓
LangGraph Orchestration
   ├── Retrieval Tool (PubMed phenotype-aware query builder)
   ├── Summarizer Agent (structured clinical report generation)
   └── Validator Agent (hallucination + missing data detection)
   ↓
Structured Model APIs
   ├── XGBoost Treatment Model
   └── Survival Ensemble (Cox + Weibull + RSF)
   ↓
Final Validated Clinical Report
```
🧠 Model Architecture Diagram
```mermaid
flowchart TD
    A[Patient ID] --> B[MCP Server<br/>Longitudinal Context Retrieval]
    B --> C[LangGraph Orchestrator]
    C --> D[Phenotype-Aware RAG Tool<br/>PubMed Query + Scoring]
    C --> E[Summarizer Agent<br/>Structured Clinical Report]
    C --> F[Validator Agent<br/>Hallucination & Missing Data Check]
    E --> F
    F -->|Retry if Needed| E
    F --> G[Treatment Prediction API<br/>XGBoost Classifier]
    F --> H[Lifespan Estimation API<br/>Cox + Weibull + RSF]
    G --> I[Ranked Therapy + Probabilities]
    H --> J[5/10/15-Year Survival + Expected Years]
    I --> K[Final MCP-Audited Clinical Report]
    J --> K
```
🛠 Technical Stack
🧩 LLM & Orchestration
- LangGraph (multi-agent workflow control)
- OpenAI GPT-4 (summarizer + validator agents)
- Prompt-constrained structured generation
- Retry routing controller with bounded iterations
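The summarizer/validator retry loop with a bounded number of iterations can be sketched in plain Python. This is a minimal sketch, not the project's LangGraph graph: `RunState`, `run_with_retries`, and `MAX_RETRIES` are hypothetical, and `summarize`/`validate` stand in for the GPT-4 agents. Validator feedback from one attempt is passed into the next draft, and only an issue-free summary is marked final.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical attempt budget; the project's actual bound is not stated here.
MAX_RETRIES = 3


@dataclass
class RunState:
    """State passed between the summarizer and validator steps."""
    patient_id: str
    summary: str = ""
    issues: List[str] = field(default_factory=list)
    attempts: int = 0
    finalized: bool = False


def run_with_retries(
    state: RunState,
    summarize: Callable[[str, List[str]], str],
    validate: Callable[[str], List[str]],
    max_retries: int = MAX_RETRIES,
) -> RunState:
    """Alternate summarize -> validate until no issues remain or the budget is spent."""
    while state.attempts < max_retries:
        state.attempts += 1
        # Previous validator issues are fed back so the next draft can address them.
        state.summary = summarize(state.patient_id, state.issues)
        state.issues = validate(state.summary)
        if not state.issues:
            state.finalized = True  # only issue-free summaries are finalized
            break
    return state
```

If the budget runs out, the state comes back with `finalized=False` and the outstanding issues attached, so a caller can escalate instead of emitting the draft.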
🔐 Governance Layer
- Model Context Protocol (MCP) Server
  - Versioned patient context retrieval
  - Tool mediation
  - Auditable endpoint calls
  - Metadata tracking (model name, version, timestamp)
📚 Retrieval (RAG)
- PubMed XML API
- Phenotype-aware query builder
- Signal extraction via regex parsers
- Evidence scoring:
  - Clinical alignment
  - Recency filtering (≥2016)
  - Endpoint relevance
  - Novelty weighting
- Deterministic citation embedding (verbatim insertion)
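A phenotype-aware query builder, regex signal extraction, and the four scoring signals can be sketched together. This is an illustrative sketch under stated assumptions: the stage-to-term mapping, the regex patterns, the grade-group threshold, and the scoring weights are all hypothetical. Only the PubMed E-utilities `esearch.fcgi` endpoint and its `db`, `term`, `datetype`, `mindate`, `maxdate`, and `retmax` parameters are real; the code builds the request URL but does not fetch it.

```python
import re
from datetime import date
from typing import Set
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
MIN_YEAR = 2016  # recency filter: publications from 2016 onward

# Hypothetical mapping from TNM stage group to PubMed search phrases.
PHENOTYPE_TERMS = {
    "localized": '"localized prostate cancer"',
    "N1": '"node-positive prostate cancer"',
    "M1": '"metastatic prostate cancer"',
}

# Hypothetical regex parsers for endpoint signals in abstracts.
ENDPOINT_PATTERNS = {
    "overall_survival": re.compile(r"overall survival|\bOS\b", re.I),
    "progression_free": re.compile(r"progression[- ]free survival|\bPFS\b", re.I),
    "hazard_ratio": re.compile(r"hazard ratio|\bHR\b\s*[=:]?\s*0?\.\d+", re.I),
}


def build_query_url(stage_group: str, grade_group: int, retmax: int = 20) -> str:
    """Build a PubMed esearch URL for the patient's phenotype, limited by publication date."""
    terms = [PHENOTYPE_TERMS[stage_group]]
    if grade_group >= 4:  # hypothetical: add a high-risk qualifier for grade group 4-5
        terms.append('"high-risk"')
    params = {
        "db": "pubmed",
        "term": " AND ".join(terms),
        "datetype": "pdat",  # filter on publication date
        "mindate": str(MIN_YEAR),
        "maxdate": str(date.today().year),
        "retmax": str(retmax),
    }
    return f"{ESEARCH}?{urlencode(params)}"


def score_abstract(abstract: str, year: int, phenotype_words: Set[str],
                   seen_signals: Set[str]) -> float:
    """Combine clinical alignment, recency, endpoint relevance, and novelty into one score."""
    if year < MIN_YEAR:  # recency filter: older papers are excluded outright
        return 0.0
    words = set(re.findall(r"[a-z0-9-]+", abstract.lower()))
    alignment = len(words & phenotype_words) / max(len(phenotype_words), 1)
    signals = {name for name, pat in ENDPOINT_PATTERNS.items() if pat.search(abstract)}
    endpoint = len(signals) / len(ENDPOINT_PATTERNS)
    # Novelty: full weight if the abstract reports an endpoint not yet covered.
    novelty = 1.0 if signals - seen_signals else 0.5
    # Hypothetical weights; they sum to 1, so scores stay in [0, 1].
    return round(0.5 * alignment + 0.3 * endpoint + 0.2 * novelty, 3)
```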
📊 Treatment Recommendation Model
- XGBoost classifier
- Features:
  - TNM stage
  - Gleason grade
  - PSA trajectory
  - PSA velocity
  - Metastatic indicators
  - Treatment history
- Output:
  - Top-N ranked therapies
  - Class probabilities
  - Feature-driven rationale
- Patient-level train/test split
- Held-out accuracy on the synthetic test set: 1.00 (an optimistic upper bound, not a clinical performance claim)
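The patient-level split and Top-N ranking can be sketched without the model itself. This is a minimal sketch: `patient_level_split` and `rank_therapies` are hypothetical helpers, records are assumed to carry a `patient_id` key, and `probabilities` is assumed to be one row of a classifier's `predict_proba` output (as returned by XGBoost's scikit-learn-style `XGBClassifier`). Splitting by patient rather than by visit keeps any one patient's visits from appearing in both train and test.

```python
import random
from typing import Dict, List, Sequence, Tuple


def patient_level_split(records: Sequence[Dict], test_frac: float = 0.2,
                        seed: int = 42) -> Tuple[List[Dict], List[Dict]]:
    """Assign whole patients (all of their visits) to either train or test."""
    patient_ids = sorted({r["patient_id"] for r in records})
    rng = random.Random(seed)  # seeded for a reproducible split
    rng.shuffle(patient_ids)
    n_test = max(1, round(len(patient_ids) * test_frac))
    test_ids = set(patient_ids[:n_test])
    train = [r for r in records if r["patient_id"] not in test_ids]
    test = [r for r in records if r["patient_id"] in test_ids]
    return train, test


def rank_therapies(probabilities: Sequence[float], class_names: Sequence[str],
                   top_n: int = 3) -> List[Tuple[str, float]]:
    """Return the top-N (therapy, probability) pairs, highest probability first."""
    if len(probabilities) != len(class_names):
        raise ValueError("probabilities and class_names must align")
    ranked = sorted(zip(class_names, probabilities), key=lambda p: p[1], reverse=True)
    return [(name, float(p)) for name, p in ranked[:top_n]]
```

For example, `rank_therapies([0.1, 0.6, 0.3], ["ADT", "ARPI", "Docetaxel"], top_n=2)` returns `[("ARPI", 0.6), ("Docetaxel", 0.3)]`; the therapy labels here are illustrative only.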
📈 Lifespan Estimation Model (Ensemble Survival Framework)
Three complementary models:
- Cox Proportional Hazards (interpretable hazard ratios)
- Weibull Regression (stage-stratified baseline survival curves)
- Random Survival Forest (nonlinear feature interactions)
Workflow:
- TNM-based stratification (localized / N1 / M1)
- Weibull baseline curve
- Cox-based patient-specific risk shift
- RSF nonlinear modulation
- Ensemble averaging
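The workflow above can be sketched as three survival curves averaged pointwise. This is a minimal sketch under several assumptions: the Weibull (scale, shape) parameters per stage group are hypothetical, the Cox shift is applied to the Weibull baseline under proportional hazards as S0(t) ** exp(lp), the RSF curve is passed in as a plain function standing in for a fitted forest, and the three members get equal weights. `stage_group`, `ensemble_survival`, and `horizon_probabilities` are hypothetical names.

```python
import math
from typing import Callable, Dict, Tuple

# Hypothetical Weibull (scale in years, shape) per TNM stage group.
WEIBULL_PARAMS: Dict[str, Tuple[float, float]] = {
    "localized": (30.0, 1.3),
    "N1": (14.0, 1.2),
    "M1": (5.0, 1.1),
}


def stage_group(n_stage: str, m_stage: str) -> str:
    """TNM-based stratification: any M1 -> 'M1', else any N1 -> 'N1', else 'localized'."""
    if m_stage.upper().startswith("M1"):
        return "M1"
    if n_stage.upper().startswith("N1"):
        return "N1"
    return "localized"


def weibull_survival(t: float, scale: float, shape: float) -> float:
    """Stage-stratified Weibull baseline S0(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def ensemble_survival(t: float, group: str, linear_predictor: float,
                      rsf_curve: Callable[[float], float]) -> float:
    """Equal-weight average of the Weibull baseline, its Cox-shifted version, and an RSF curve."""
    scale, shape = WEIBULL_PARAMS[group]
    s_weibull = weibull_survival(t, scale, shape)
    # Proportional-hazards shift: lp > 0 (higher risk) lowers survival.
    s_cox = s_weibull ** math.exp(linear_predictor)
    s_rsf = min(max(rsf_curve(t), 0.0), 1.0)  # clamp to a valid probability
    return (s_weibull + s_cox + s_rsf) / 3.0


def horizon_probabilities(group: str, lp: float, rsf_curve: Callable[[float], float],
                          horizons: Tuple[int, ...] = (5, 10, 15)) -> Dict[int, float]:
    """Ensemble survival at the reporting horizons (years), rounded for display."""
    return {h: round(ensemble_survival(h, group, lp, rsf_curve), 3) for h in horizons}
```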
Outputs:
- 5-, 10-, 15-year survival probabilities
- Expected survival time (years)
- Monotonic survival validation checks
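The outputs above can be written out as follows. This assumes equal ensemble weights and reads "expected survival time" as a restricted mean survival time (RMST) up to a horizon τ; neither formula is stated in this README, so treat both as one reasonable formulation. Here g is the stage group (localized / N1 / M1), βᵀx is the Cox linear predictor, and 0 = t₀ < t₁ < … < tₘ = τ is the integration grid.

```math
\begin{aligned}
S_{W,g}(t) &= \exp\!\left[-\left(t/\lambda_g\right)^{k_g}\right] \\
\hat S(t \mid x) &= \tfrac{1}{3}\left[S_{W,g}(t) + S_{W,g}(t)^{\exp(\beta^\top x)} + S_{\mathrm{RSF}}(t \mid x)\right] \\
\mathrm{RMST}(\tau \mid x) &= \int_0^{\tau} \hat S(t \mid x)\,dt \approx \sum_{i=1}^{m} \tfrac{1}{2}\left[\hat S(t_{i-1} \mid x) + \hat S(t_i \mid x)\right](t_i - t_{i-1})
\end{aligned}
```

The 5-, 10-, and 15-year probabilities are then Ŝ(5 | x), Ŝ(10 | x), and Ŝ(15 | x). An unrestricted mean would instead require extrapolating the curve until it reaches zero.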
Internal QA:
- Survival ordering check (M1 < N1 < localized)
- Probability bounds enforcement
- Curve monotonicity
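The three internal QA checks can be expressed as simple predicates over survival curves. This is a minimal sketch: the function names are hypothetical, each curve is assumed to be a list of survival probabilities on a shared, ascending time grid, and the stage-ordering check is evaluated at a single horizon index.

```python
from typing import Dict, Sequence

STAGE_ORDER = ("M1", "N1", "localized")  # expected order: worst to best prognosis


def check_bounds(curve: Sequence[float]) -> bool:
    """Every survival probability must lie in [0, 1]."""
    return all(0.0 <= s <= 1.0 for s in curve)


def check_monotone(curve: Sequence[float], tol: float = 1e-9) -> bool:
    """Survival must be non-increasing over ascending time points."""
    return all(later <= earlier + tol for earlier, later in zip(curve, curve[1:]))


def check_stage_ordering(by_stage: Dict[str, float]) -> bool:
    """At a fixed horizon, survival must satisfy M1 < N1 < localized."""
    values = [by_stage[g] for g in STAGE_ORDER]
    return all(a < b for a, b in zip(values, values[1:]))


def run_qa(curves: Dict[str, Sequence[float]], horizon_index: int) -> Dict[str, bool]:
    """Run all checks; `curves` maps stage group -> survival values on the shared grid."""
    return {
        "bounds": all(check_bounds(c) for c in curves.values()),
        "monotone": all(check_monotone(c) for c in curves.values()),
        "stage_ordering": check_stage_ordering(
            {g: c[horizon_index] for g, c in curves.items()}),
    }
```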
🏥 Conference Presentation
This work was presented at:
SIIM-CAIMI 2025
Society for Imaging Informatics in Medicine – Conference on Artificial Intelligence in Medical Imaging
- Poster: AI-in-oncology-Poster.pdf
- Featured in UChicago Data Science Institute News
📁 Data
- 500 synthetic longitudinal prostate cancer records
- 5–7 time-stamped visits per patient
- Variables include:
  - PSA (with kinetics)
  - Gleason grade
  - TNM stage
  - Bone lesion count
  - Visceral metastasis flag
  - ALP, LDH, albumin, hemoglobin
  - Treatment history
  - Weight trends
The synthetic records were validated for medical plausibility by a practicing radiologist.
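PSA kinetics can be derived per patient from the time-stamped visits. This sketch uses two common definitions, PSA velocity as the least-squares slope of PSA over time (ng/mL per year) and PSA doubling time as ln(2) divided by the slope of ln(PSA) over time; whether the project computes kinetics exactly this way is an assumption, and the function names are hypothetical.

```python
import math
from typing import Optional, Sequence


def _ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two visits")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("visit times must not all be identical")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def psa_velocity(years: Sequence[float], psa: Sequence[float]) -> float:
    """PSA velocity in ng/mL per year: regression slope of PSA on time."""
    return _ols_slope(years, psa)


def psa_doubling_time(years: Sequence[float], psa: Sequence[float]) -> Optional[float]:
    """PSA doubling time in years: ln(2) / slope of ln(PSA) on time.
    Returns None when PSA is flat or falling (doubling time undefined)."""
    if any(v <= 0 for v in psa):
        raise ValueError("PSA values must be positive for the log transform")
    slope = _ols_slope(years, [math.log(v) for v in psa])
    return math.log(2) / slope if slope > 0 else None
```

For instance, PSA values of 1, 2, and 4 ng/mL at years 0, 1, and 2 give a doubling time of 1.0 year.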
🔎 Evaluation & Validation
Structured Models
- Held-out patient-level testing
- Directional consistency validation
- Internal statistical QA checks
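Directional consistency validation for the survival outputs can be sketched as a perturbation test: raise a risk feature and confirm that predicted survival does not rise. This is a minimal sketch; `RISK_FEATURES`, its step sizes, and `directional_violations` are hypothetical, and `predict_5y` stands in for any structured-model endpoint that returns 5-year survival for a patient's feature dict.

```python
from typing import Callable, Dict, List

# Hypothetical risk features and upward step sizes; survival should not rise when these increase.
RISK_FEATURES: Dict[str, float] = {"psa": 5.0, "bone_lesion_count": 2, "ldh": 100.0}


def directional_violations(predict_5y: Callable[[Dict[str, float]], float],
                           patients: List[Dict[str, float]],
                           tol: float = 1e-6) -> List[str]:
    """Bump each risk feature upward and report cases where predicted
    5-year survival increases, contradicting the expected clinical direction."""
    problems: List[str] = []
    for i, patient in enumerate(patients):
        base = predict_5y(patient)
        for feature, step in RISK_FEATURES.items():
            bumped = dict(patient, **{feature: patient[feature] + step})
            if predict_5y(bumped) > base + tol:
                problems.append(f"patient {i}: survival rose when {feature} increased")
    return problems
```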
LLM Components
- Dedicated validator agent
- Detection of:
  - Hallucinated content
  - Missing patient data
- Iterative retry loop
- Verbatim literature line insertion (no citation hallucination)
No summary is finalized while hallucinations remain unresolved.
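The validator's three checks can be approximated deterministically to show what they mean operationally. In this project the validator is a GPT-4 agent; the sketch below swaps in simple rule-based checks instead, and `REQUIRED_FIELDS`, the number-matching rule, and `validate_summary` are hypothetical. Number matching is deliberately crude: it would also flag legitimate derived values (for example "5-year").

```python
import re
from typing import Dict, List, Sequence

# Hypothetical set of fields a complete summary depends on.
REQUIRED_FIELDS = ("psa", "gleason_grade", "tnm_stage", "treatment_history")
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def validate_summary(summary: str, record: Dict[str, object],
                     quoted_lines: Sequence[str], retrieved_text: str) -> List[str]:
    """Return validator issues; an empty list means the summary can be finalized."""
    issues: List[str] = []
    # 1. Missing patient data: required fields absent or empty in the source record.
    for name in REQUIRED_FIELDS:
        if record.get(name) in (None, "", []):
            issues.append(f"missing data: {name}")
    # 2. Citation grounding: each quoted literature line must appear verbatim in retrieval.
    body = summary
    for line in quoted_lines:
        if line not in retrieved_text:
            issues.append(f"non-verbatim citation: {line[:40]}")
        body = body.replace(line, "")  # quoted text is checked here, not as patient data
    # 3. Hallucinated values: every remaining number must occur in the patient record.
    record_numbers = set(NUMBER.findall(" ".join(str(v) for v in record.values())))
    for num in NUMBER.findall(body):
        if num not in record_numbers:
            issues.append(f"unsupported value: {num}")
    return issues
```

A non-empty result is exactly what the retry loop would feed back to the summarizer for the next attempt.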
🎯 Key Contributions
- MCP-governed agentic clinical AI framework
- Hallucination-resistant RAG integration
- Survival ensemble integrated into LLM workflow
- Deterministic literature grounding
- Modular API-based predictive model integration
- Fully auditable report generation pipeline
📌 Research Context
This project addresses limitations in current oncology AI systems:
- Lack of temporal reasoning
- Hallucination in generative summaries
- Non-auditable clinical AI outputs
- Separation between ML survival models and narrative reasoning
The architecture demonstrates a reproducible pattern for auditable, safety-focused LLM deployment in healthcare.
⚠ Disclaimer
This system was trained and evaluated on synthetic data and is intended for research demonstration only.