Data Query Builder — MCP Server

Overview

An MCP server that turns natural-language data questions into SQL queries. Load any CSV file into an in-memory SQLite database, explore schemas, run read-only queries, and compute column statistics — all through AI-powered tool use via Gemini CLI.

Setup

cd Project
python -m venv .venv

# Windows
.venv\Scripts\activate

# macOS / Linux
# source .venv/bin/activate

pip install "mcp[cli]"

No additional dependencies — sqlite3 and csv are part of the Python standard library.

Gemini CLI Configuration

Add the following to ~/.gemini/settings.json:

{
  "mcpServers": {
    "data-query-builder": {
      "command": "C:\\Users\\abdul\\Downloads\\Electives\\MCP\\Project\\.venv\\Scripts\\python.exe",
      "args": ["C:\\Users\\abdul\\Downloads\\Electives\\MCP\\Project\\server.py"]
    }
  }
}

After editing, relaunch Gemini CLI to pick up the new server.

Tools

`load_csv`

Description: Load a CSV file into a new SQLite table with auto-detected column types (INTEGER, REAL, TEXT).

| Parameter | Type | Required | Description | |-------------|-------|----------|------------------------------------------| | file_path | str | Yes | Absolute or relative path to the CSV file | | table_name | str | Yes | Name for the new SQLite table |

Example: "Load the file sample_data.csv as a table called sales" → calls load_csv

`list_tables`

Description: List all tables currently loaded in the database with their row counts and column info.

| Parameter | Type | Required | Description | |-----------|------|----------|-------------| | (none) | | | |

Example: "What data do I have loaded?" → calls list_tables

`describe_schema`

Description: Show the full database schema — all tables, columns, and their data types.

| Parameter | Type | Required | Description | |-----------|------|----------|-------------| | (none) | | | |

Example: "What columns does the sales table have?" → calls describe_schema

`run_query`

Description: Execute a read-only SQL SELECT query and return formatted results. Rejects any write operations (DROP, DELETE, ALTER, INSERT, UPDATE) for safety.

| Parameter | Type | Required | Description | |-----------|------|----------|--------------------------------------------| | sql | str | Yes | A SQL SELECT query | | limit | int | No | Max rows to return (default 50, max 500) |

Example: "Show me the average price per category" → calls run_query with SELECT category, AVG(price) FROM sales GROUP BY category

`get_statistics`

Description: Compute summary statistics for a column — count, min, max, mean, sum, and null count (numeric columns) or count, distinct, nulls (text columns).

| Parameter | Type | Required | Description | |-------------|------|----------|---------------------------------| | table_name | str | Yes | Name of the table to analyze | | column | str | Yes | Column to compute statistics for |

Example: "Give me stats on the price column" → calls get_statistics with table_name="sales", column="price"

Resources

`db://schema`

Current database schema as JSON — lists all tables, their columns, and data types.

`db://query-history`

JSON list of all SQL queries executed this session, including SQL text, row count, and column names.

Security

run_query rejects any SQL containing DROP, DELETE, ALTER, INSERT, UPDATE, CREATE, or other write keywords.
The database is in-memory only — no persistent changes to disk.
A default row limit of 50 prevents excessively large outputs.

Limitations

In-memory database: Data does not persist between server restarts. CSVs must be reloaded each session.
CSV only: Does not support Excel, JSON, or other file formats directly.
No joins at load time: Each CSV becomes a separate table; joins must be done via SQL queries.
Type detection is heuristic: Based on the first 100 rows — mixed-type columns may be misdetected.

Test Scenarios

| Scenario | Expected Tool Sequence | |----------|----------------------| | "Load sales data and find highest revenue region" | load_csv → describe_schema → run_query | | "Average price per product category?" | load_csv → run_query → get_statistics | | "What tables are loaded and what columns do they have?" | list_tables → describe_schema |

Comparison Results

With vs. Without Tools

| Dimension | Without Tools | With Tools | |-------------|-------------------------------------------|---------------------------------------------| | Accuracy | Hallucinates numbers or uses only pasted data | Uses real computed values from SQL queries | | Specificity | Generic advice about data analysis | Specific answers from actual data | | Completeness | Limited to what's in the prompt | Can explore schema, run multiple queries | | Confidence | Hedges and qualifies heavily | Cites exact tool results and row counts | | Latency | One fast response | Multiple tool-call round-trips |

Prompting Strategy Comparison

| Aspect | Strategy 1 (Minimal) | Strategy 3 (Expert Workflow) | |----------------------|------------------------------------------|--------------------------------------------------| | Tool calls triggered | Fewer, less targeted | More, systematically sequenced | | Planning | Dives in immediately | States plan before acting | | Synthesis quality | Shallow summary | Structured analysis citing tool outputs | | Errors / dead ends | May skip schema exploration | Explores schema first, then queries purposefully |

MCP Servers