# Data Query Builder — MCP Server

An MCP server that turns natural-language questions into SQL queries on CSV data. Built with Python, FastMCP, and SQLite for Gemini CLI integration.

## Overview

An MCP server that turns natural-language data questions into SQL queries. Load any CSV file into an in-memory SQLite database, explore schemas, run read-only queries, and compute column statistics — all through AI-powered tool use via Gemini CLI.
## Setup

```shell
cd Project
python -m venv .venv

# Windows
.venv\Scripts\activate
# macOS / Linux
# source .venv/bin/activate

pip install "mcp[cli]"
```
No additional dependencies are required — `sqlite3` and `csv` are part of the Python standard library.
## Gemini CLI Configuration

Add the following to `~/.gemini/settings.json`:

```json
{
  "mcpServers": {
    "data-query-builder": {
      "command": "C:\\Users\\abdul\\Downloads\\Electives\\MCP\\Project\\.venv\\Scripts\\python.exe",
      "args": ["C:\\Users\\abdul\\Downloads\\Electives\\MCP\\Project\\server.py"]
    }
  }
}
```

After editing, relaunch Gemini CLI to pick up the new server.
## Tools

### load_csv

Description: Load a CSV file into a new SQLite table with auto-detected column types (INTEGER, REAL, TEXT).
| Parameter | Type | Required | Description |
|-------------|-------|----------|------------------------------------------|
| file_path | str | Yes | Absolute or relative path to the CSV file |
| table_name | str | Yes | Name for the new SQLite table |
Example: "Load the file sample_data.csv as a table called sales" → calls `load_csv`
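The load step can be sketched with only the standard library. The function below is an assumed implementation, not the server's actual code: the name and signature are illustrative, and the type detection the real tool performs is omitted here for brevity.

```python
import csv
import sqlite3

def load_csv(conn: sqlite3.Connection, file_path: str, table_name: str) -> int:
    """Load a CSV file into a new SQLite table; return the row count."""
    with open(file_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)       # first row supplies the column names
        rows = list(reader)

    # Quote identifiers so arbitrary CSV headers work as column names.
    cols = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table_name}" ({cols})')
    conn.executemany(f'INSERT INTO "{table_name}" VALUES ({placeholders})', rows)
    return len(rows)
```

Parameterized placeholders handle the row values, but table and column names cannot be parameterized in SQLite, hence the quoted f-string identifiers.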
### list_tables

Description: List all tables currently loaded in the database with their row counts and column info.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| (none)    |      |          |             |

Example: "What data do I have loaded?" → calls `list_tables`
### describe_schema

Description: Show the full database schema — all tables, columns, and their data types.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| (none)    |      |          |             |

Example: "What columns does the sales table have?" → calls `describe_schema`
### run_query

Description: Execute a read-only SQL SELECT query and return formatted results. Rejects write operations (DROP, DELETE, ALTER, INSERT, UPDATE) for safety.
| Parameter | Type | Required | Description |
|-----------|------|----------|--------------------------------------------|
| sql | str | Yes | A SQL SELECT query |
| limit | int | No | Max rows to return (default 50, max 500) |
Example: "Show me the average price per category" → calls `run_query` with `SELECT category, AVG(price) FROM sales GROUP BY category`
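A minimal sketch of the read-only guard, assuming a simple keyword blocklist plus a SELECT-prefix check (the exact keyword set, error messages, and function signature in the real server may differ):

```python
import re
import sqlite3

WRITE_KEYWORDS = {"DROP", "DELETE", "ALTER", "INSERT", "UPDATE", "CREATE",
                  "REPLACE", "ATTACH", "PRAGMA", "VACUUM"}

def run_query(conn: sqlite3.Connection, sql: str, limit: int = 50) -> list[tuple]:
    """Run a read-only SELECT, rejecting write statements and capping rows."""
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("Only SELECT queries are allowed")
    # Tokenize on word boundaries so substrings inside identifiers don't match.
    tokens = set(re.findall(r"[A-Za-z_]+", sql.upper()))
    if tokens & WRITE_KEYWORDS:
        raise ValueError("Write operations are not permitted")
    limit = min(limit, 500)         # hard cap per the table above
    return conn.execute(sql).fetchmany(limit)
```

Note that a bare keyword scan is conservative: a column literally named `update` would be rejected. A production guard might instead open the connection in read-only mode or use SQLite's authorizer callback.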
### get_statistics

Description: Compute summary statistics for a column — count, min, max, mean, sum, and null count for numeric columns, or count, distinct values, and null count for text columns.
| Parameter | Type | Required | Description |
|-------------|------|----------|---------------------------------|
| table_name | str | Yes | Name of the table to analyze |
| column | str | Yes | Column to compute statistics for |
Example: "Give me stats on the price column" → calls `get_statistics` with `table_name="sales"`, `column="price"`
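The per-column statistics can each be computed with a single aggregate query, branching on the column's declared type. The helper below is a hedged sketch under that assumption, not the server's actual implementation:

```python
import sqlite3

def get_statistics(conn: sqlite3.Connection, table_name: str, column: str) -> dict:
    """Summary statistics for one column, branching on its declared type."""
    # Look up the declared type: PRAGMA table_info rows are (cid, name, type, ...).
    info = conn.execute(f'PRAGMA table_info("{table_name}")').fetchall()
    col_type = next(row[2] for row in info if row[1] == column)

    q, t = f'"{column}"', f'"{table_name}"'
    if col_type.upper() in ("INTEGER", "REAL"):
        row = conn.execute(
            f"SELECT COUNT({q}), MIN({q}), MAX({q}), AVG({q}), SUM({q}), "
            f"SUM(CASE WHEN {q} IS NULL THEN 1 ELSE 0 END) FROM {t}"
        ).fetchone()
        keys = ("count", "min", "max", "mean", "sum", "nulls")
    else:
        row = conn.execute(
            f"SELECT COUNT({q}), COUNT(DISTINCT {q}), "
            f"SUM(CASE WHEN {q} IS NULL THEN 1 ELSE 0 END) FROM {t}"
        ).fetchone()
        keys = ("count", "distinct", "nulls")
    return dict(zip(keys, row))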
## Resources

### db://schema

Current database schema as JSON — lists all tables, their columns, and data types.
### db://query-history

JSON list of all SQL queries executed this session, including SQL text, row count, and column names.
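A session history resource can be as simple as an in-memory list serialized to JSON. This sketch uses plain functions with illustrative names; the real server would register the resource with FastMCP (e.g. via its resource decorator) rather than expose it directly:

```python
import json

# Appended to after every successful run_query call (assumed structure).
query_history: list[dict] = []

def record_query(sql: str, rows: list[tuple], columns: list[str]) -> None:
    """Append one executed query to the session history."""
    query_history.append({"sql": sql, "row_count": len(rows), "columns": columns})

def query_history_resource() -> str:
    """Serve db://query-history as a JSON string."""
    return json.dumps(query_history, indent=2)
```

Because the list lives in process memory, it resets on restart, matching the in-memory database's lifetime.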
## Security

- `run_query` rejects any SQL containing `DROP`, `DELETE`, `ALTER`, `INSERT`, `UPDATE`, `CREATE`, or other write keywords.
- The database is in-memory only — no persistent changes to disk.
- A default row limit of 50 prevents excessively large outputs.
## Limitations
- In-memory database: Data does not persist between server restarts. CSVs must be reloaded each session.
- CSV only: Does not support Excel, JSON, or other file formats directly.
- No joins at load time: Each CSV becomes a separate table; joins must be done via SQL queries.
- Type detection is heuristic: Based on the first 100 rows — mixed-type columns may be misdetected.
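The first-100-rows heuristic above might look like the following sketch (an assumed implementation, not the server's actual code): try INTEGER first, fall back to REAL, then TEXT.

```python
def infer_type(values: list[str], sample_size: int = 100) -> str:
    """Infer a SQLite column type from a sample of string cell values."""
    sample = [v for v in values[:sample_size] if v != ""]  # skip empty cells
    if not sample:
        return "TEXT"
    try:
        for v in sample:
            int(v)                  # every sampled value parses as an integer
        return "INTEGER"
    except ValueError:
        pass
    try:
        for v in sample:
            float(v)                # every sampled value parses as a float
        return "REAL"
    except ValueError:
        return "TEXT"
```

The limitation noted above falls out directly: a value past row 100 that breaks the pattern is never sampled, so the column gets a too-narrow type.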
## Test Scenarios

| Scenario | Expected Tool Sequence |
|----------|------------------------|
| "Load sales data and find highest revenue region" | load_csv → describe_schema → run_query |
| "Average price per product category?" | load_csv → run_query → get_statistics |
| "What tables are loaded and what columns do they have?" | list_tables → describe_schema |
## Comparison Results

### With vs. Without Tools

| Dimension    | Without Tools                                 | With Tools                                 |
|--------------|-----------------------------------------------|--------------------------------------------|
| Accuracy     | Hallucinates numbers or uses only pasted data | Uses real computed values from SQL queries |
| Specificity  | Generic advice about data analysis            | Specific answers from actual data          |
| Completeness | Limited to what's in the prompt               | Can explore schema, run multiple queries   |
| Confidence   | Hedges and qualifies heavily                  | Cites exact tool results and row counts    |
| Latency      | One fast response                             | Multiple tool-call round-trips             |
### Prompting Strategy Comparison

| Aspect               | Strategy 1 (Minimal)        | Strategy 3 (Expert Workflow)                     |
|----------------------|-----------------------------|--------------------------------------------------|
| Tool calls triggered | Fewer, less targeted        | More, systematically sequenced                   |
| Planning             | Dives in immediately        | States plan before acting                        |
| Synthesis quality    | Shallow summary             | Structured analysis citing tool outputs          |
| Errors / dead ends   | May skip schema exploration | Explores schema first, then queries purposefully |