# mcp-federated-data
Federated MCP server that joins relational metadata (MySQL) with timeseries values (InfluxDB) behind a single, LLM-friendly entity layer.
## The problem
When an LLM needs to answer a question like:
> "show me last-7-day temperature trends for active sensors in zone A"
it has to traverse two stores:
- Relational metadata (what sensors exist, where, what kind, what units) — MySQL
- Timeseries values (the readings over time) — InfluxDB
Plugging in two independent MCP servers forces the LLM to:
1. Query MySQL → find matching entities → extract their ids
2. Push those ids into an InfluxDB query
3. Stitch the two result sets in its own context
LLMs reliably stumble at step 3 — especially when the join key is composite (e.g. an InfluxDB tag is the concatenation of two relational fields).
This server collapses all three steps into one tool call.
## What the LLM sees
Five tools, no SQL or Flux in the prompt:
| Tool | Purpose |
|---|---|
| `list_entities` | Filter business entities by their fields |
| `get_entity` | Look up a single entity by primary key |
| `list_related` | Walk a configured relation between entities |
| `get_entity_timeseries` | Federated — metadata + timeseries in one call |
| `compare_timeseries` | Compare 2–20 specific entities over the same window |
Every tool ships with rich JSON-Schema descriptions so the LLM picks the right arguments without prompt tricks.
## How it differs from naive two-server setups
| Scenario | Two independent MCP servers | mcp-federated-data |
|---|---|---|
| "trends for sensors in zone A" | LLM: query MySQL → extract ids → push into Flux → join in its head | one `get_entity_timeseries` call |
| Composite tag like `{deviceId}.{signalId}` | LLM composes strings in-prompt — error-prone | `composer: "{deviceId}.{signalId}"` in YAML, server does it |
| Metadata ↔ timeseries alignment | LLM does the join, frequently mis-pairs | server joins by configured key |
| Data-volume blow-up | unprotected | enforced max entities + auto-downsampling + per-entity point cap |
| Business semantics for the LLM | raw `CREATE TABLE` only | YAML `fields` with `description` |
| Audit trail | none | structured per-call audit log |
## Quickstart
```bash
git clone https://github.com/baller-coder/mcp-federated-data.git
cd mcp-federated-data
pnpm install

# Sample environment (MySQL + InfluxDB in Docker, with seeded data)
docker compose -f examples/industrial-monitoring/docker-compose.yml up -d
pnpm seed

# Start the MCP server over stdio
pnpm dev -- --config examples/industrial-monitoring/config.yaml
```
Connect from any MCP client (Claude Desktop, Cursor, mcp-inspector).
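For Claude Desktop, an entry in `claude_desktop_config.json` along these lines should work (the path is a placeholder, and launching via `pnpm --dir` is just one way to run the server; adjust to your checkout):

```json
{
  "mcpServers": {
    "federated-data": {
      "command": "pnpm",
      "args": [
        "--dir", "/path/to/mcp-federated-data",
        "dev", "--", "--config", "examples/industrial-monitoring/config.yaml"
      ]
    }
  }
}
```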
## Try it
List active sites:

```json
{
  "name": "list_entities",
  "arguments": {
    "entity": "site",
    "filters": [{ "field": "status", "op": "eq", "value": "active" }]
  }
}
```

List sensors attached to site 1:

```json
{
  "name": "list_related",
  "arguments": {
    "source_entity": "site",
    "source_id": 1,
    "target_entity": "sensor"
  }
}
```

Federated query — metadata + timeseries in ONE call:

```json
{
  "name": "get_entity_timeseries",
  "arguments": {
    "entity": "sensor",
    "filters": [
      { "field": "site_id", "op": "eq", "value": 1 },
      { "field": "kind", "op": "eq", "value": "temperature" }
    ],
    "time_range": { "start": "-7d" },
    "aggregation": { "window": "1h", "fn": "mean" }
  }
}
```

Compare 3 specific sensors over the same window:

```json
{
  "name": "compare_timeseries",
  "arguments": {
    "entity": "sensor",
    "ids": [101, 201, 301],
    "time_range": { "start": "-24h" },
    "aggregation": { "window": "10m", "fn": "mean" }
  }
}
```
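A federated call returns each matching entity's metadata row paired with its series. The response shape below is illustrative (an assumption for this README, not the server's guaranteed schema; metadata fields come from your entity config):

```json
{
  "entity": "sensor",
  "results": [
    {
      "metadata": { "id": 101, "site_id": 1, "name": "T-101", "kind": "temperature", "unit": "C" },
      "series": [
        { "time": "2024-05-01T00:00:00Z", "value": 21.4 },
        { "time": "2024-05-01T01:00:00Z", "value": 21.9 }
      ]
    }
  ]
}
```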
## Configuration

The entire server behavior is defined in a single YAML file, in three sections.
### 1. Datasources

```yaml
datasources:
  - name: business
    type: mysql
    host: localhost
    port: 3306
    database: my_db
    username: readonly_user
    password: secret

  - name: timeseries
    type: influxdb
    url: http://localhost:8086
    token: my-token
    org: my_org
    bucket: my_bucket
```
### 2. Entities

Each entity binds to a relational table or view, with optional relations and an optional timeseries binding.

```yaml
entities:
  - name: site
    description: Physical monitoring location.
    source:
      datasource: business
      table: sites
      primary_key: id
    fields:
      - { name: id, type: number }
      - { name: name, type: string, description: Display name }
      - { name: region, type: string }
      - { name: status, type: string, description: "active / inactive / maintenance" }

  - name: sensor
    description: A sensor attached to a site.
    source:
      datasource: business
      table: sensors
      primary_key: id
    fields:
      - { name: id, type: number }
      - { name: site_id, type: number }
      - { name: name, type: string }
      - { name: kind, type: string, description: "temperature / humidity / voltage / ..." }
      - { name: unit, type: string }
    relations:
      - target: site
        type: many-to-one
        local_key: site_id
        foreign_key: id
    timeseries:
      datasource: timeseries
      measurement: sensor_data
      value_field: value
      join_key:
        local: id
        remote_tag: sensor_id
```
### 3. Defaults (guard rails)

```yaml
defaults:
  max_entities_per_query: 50
  max_points_per_entity: 500
  query_timeout_ms: 15000
```
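Guard rails like these can be enforced before any query is issued. A minimal TypeScript sketch of the idea (function names and behavior are illustrative, not the server's actual internals):

```typescript
interface Defaults {
  max_entities_per_query: number;
  max_points_per_entity: number;
  query_timeout_ms: number;
}

// Reject over-broad requests up front instead of truncating silently.
function enforceEntityCap(ids: unknown[], defaults: Defaults): void {
  if (ids.length > defaults.max_entities_per_query) {
    throw new Error(
      `query matches ${ids.length} entities; limit is ${defaults.max_entities_per_query}. ` +
        `Narrow the filters or raise max_entities_per_query.`
    );
  }
}

// Auto-downsampling: widen the aggregation window (in seconds) so each
// entity returns at most maxPoints points over the requested range.
function downsampleWindow(rangeSeconds: number, requestedWindowSeconds: number, maxPoints: number): number {
  const minWindow = Math.ceil(rangeSeconds / maxPoints);
  return Math.max(requestedWindowSeconds, minWindow);
}
```

With a 7-day range and a 500-point cap, a requested 1-minute window would be widened to roughly 20 minutes, while a 1-hour window passes through unchanged.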
## Composite join keys (v0.2 highlight)

When the InfluxDB tag value is a composite of multiple relational fields — common in IoT / industrial systems where a tag like `400001240.438000066` encodes `{deviceId}.{signalId}` — declare it like this:

```yaml
timeseries:
  measurement: sensor_data
  value_field: value
  join_key:
    local: [device_id, signal_id]
    remote_tag: signal_id
    composer: "{device_id}.{signal_id}"
```
The server:

- Pulls metadata from MySQL (the composer's local fields are auto-included in the SELECT).
- Composes each row's tag value using the template.
- Pushes the composed list into the InfluxDB tag filter.
- Joins results back by the same template.
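The composition itself is a simple template fill. A TypeScript sketch with illustrative names (not the server's actual code):

```typescript
// Fill a composer template like "{device_id}.{signal_id}" from one metadata row.
function composeTag(template: string, row: Record<string, string | number>): string {
  return template.replace(/\{(\w+)\}/g, (_match, field: string) => {
    if (!(field in row)) throw new Error(`composer references unknown field "${field}"`);
    return String(row[field]);
  });
}

const row = { device_id: 400001240, signal_id: 438000066 };
composeTag("{device_id}.{signal_id}", row); // → "400001240.438000066"
```

Because the same template runs in both directions — composing tag values for the filter and re-deriving them during the join — the LLM never handles the composite key at all.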
Single-field bindings (the v0.1 form) keep working unchanged — the server treats them as a one-element composite, so all paths stay uniform.
## How it works — the 50-line core

Every federated tool follows the same three steps:

1. **Fetch metadata** — relational query against the business store, with safe-identifier checks and parameterised WHERE. Allowed filter fields are restricted to those declared in the entity config.
2. **Fetch timeseries** — pull join-key values from step 1, push them into a tag filter against the timeseries store, with optional aggregation and per-call point cap.
3. **Merge** — group timeseries points by the remote tag value, then stitch each metadata row with its sorted series.
No SQL parser. No cross-store query planner. By design.
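The merge step can be sketched in a few lines of TypeScript (simplified, with illustrative field names):

```typescript
interface Point {
  tag: string;   // remote tag value, e.g. the composed "{device_id}.{signal_id}"
  time: string;  // ISO-8601 timestamp
  value: number;
}

// Group timeseries points by their tag value, then attach each metadata
// row's series, sorted by time. Rows with no points get an empty series.
function mergeRows<Row>(
  rows: Row[],
  tagOf: (row: Row) => string,
  points: Point[]
): Array<{ metadata: Row; series: Point[] }> {
  const byTag = new Map<string, Point[]>();
  for (const p of points) {
    const bucket = byTag.get(p.tag) ?? [];
    bucket.push(p);
    byTag.set(p.tag, bucket);
  }
  return rows.map((row) => ({
    metadata: row,
    series: (byTag.get(tagOf(row)) ?? []).sort((a, b) => a.time.localeCompare(b.time)),
  }));
}
```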
## Architecture

```
┌──────────────────────────────────┐
│   MCP client (Claude / Cursor)   │
└────────────────┬─────────────────┘
                 │ stdio (JSON-RPC)
                 ▼
┌──────────────────────────────────┐
│        mcp-federated-data        │
│  ┌────────────────────────────┐  │
│  │ Tools (5 tools)            │  │
│  ├────────────────────────────┤  │
│  │ Entity registry            │  │
│  │ Join-key normalizer        │  │
│  │ Composer engine            │  │
│  │ Guards (limits/timeout)    │  │
│  │ Audit logger               │  │
│  ├────────────────────────────┤  │
│  │ Datasource adapters        │  │
│  └─────────┬────────┬─────────┘  │
└────────────┼────────┼────────────┘
             ▼        ▼
        ┌───────┐ ┌──────────┐
        │ MySQL │ │ InfluxDB │
        └───────┘ └──────────┘
```
## Where this fits
mcp-federated-data is schema-driven, not domain-specific. It applies anywhere business metadata lives in a relational store and observed values live in a timeseries store:
- IoT device telemetry
- Industrial process monitoring
- Asset performance management
- Building automation
- Energy / power monitoring
- Environmental sensor networks
- Network device monitoring
If your stack is MySQL + InfluxDB and you want LLMs to reason over it — this server is for you.
## Comparison with related projects

| | mcp-federated-data | mcp-server-mysql | mcp-server-influxdb | Wren AI / Vanna |
|---|:---:|:---:|:---:|:---:|
| Cross-store join | ✅ | ❌ | ❌ | partial (own engine) |
| Composite tag composer | ✅ | n/a | ❌ | ❌ |
| LLM-safe SQL guards | ✅ | varies | n/a | ✅ |
| Schema as semantic layer | ✅ | ❌ | ❌ | ✅ |
| Configuration | YAML | env / args | env / args | dedicated DSL |
| Scope | narrow but deep | thin wrapper | thin wrapper | full BI platform |
## Contributing
Issues and PRs welcome. Before opening one:
- For new data-source adapters — open an issue first so we can align on the interface.
- For new tools — match the existing JSON-Schema + audit-log pattern.
- Public APIs must keep backward compatibility within minor versions.
## Acknowledgments

- Model Context Protocol — the standard this server speaks
- `@modelcontextprotocol/sdk` — TypeScript reference SDK
- Cube.dev / dbt — prior art on the "semantic layer" idea
- InfluxDB and MySQL communities