# MCP AI Assistant: Model Context Protocol with llama3.2 & Ollama

Replace the fragile regex parsing of TP2 with MCP (Model Context Protocol) — an open standard for LLM ↔ tool communication.
## Overview
In TP2 we built a ReAct agent that worked, but it relied on brittle regex parsing of the LLM's free-text output. If Gemma 3 wrote `Action: light_on(salon)` instead of `Action: light_on`, the parser could silently fail.

MCP (Model Context Protocol) fixes this by replacing free-text commands with structured JSON-RPC: the LLM emits a proper function-call object, the runtime parses it natively, and the server executes it with full type validation.
## Architecture

```
┌───────────────────────────────────────────────────────────────┐
│                         YOUR MACHINE                          │
│                                                               │
│  ┌─────────────────┐   stdio pipe   ┌──────────────────────┐  │
│  │    client.py    │ ◄────────────► │      server.py       │  │
│  │  (MCP Client)   │    JSON-RPC    │     (MCP Server)     │  │
│  └────────┬────────┘                └──────────┬───────────┘  │
│           │                                    │              │
│           │ HTTP                               │ os.listdir   │
│           │ localhost:11434                    │ open(file)   │
│  ┌────────▼────────┐                ┌──────────▼───────────┐  │
│  │  Ollama Server  │                │  logs_simulation/    │  │
│  │  llama3.2:3b    │                │    system.log        │  │
│  └─────────────────┘                │    access.log        │  │
│                                     └──────────────────────┘  │
└───────────────────────────────────────────────────────────────┘
```
### The data journey

```
Python function → @mcp.tool() → JSON Schema → Ollama tools=[]
       ↑                                             ↓
  Tool result ← session.call_tool() ← tool_call JSON ← LLM
```
## Features
| Feature | Description |
|---|---|
| MCP server | `FastMCP` auto-generates JSON schemas from type hints + docstrings |
| MCP client | Discovers tools at runtime — never hardcodes function names |
| Native tool calls | `llama3.2:3b` returns structured JSON, no regex needed |
| Type-safe | `inputSchema` enforces `int` vs `str` before execution |
| Modular | Swap `server.py` for `smart_home_server.py` with zero client changes |
| Secure | Path-traversal sanitization in `read_log_head()` |
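One possible shape for that path-traversal check, as a minimal stdlib sketch. It assumes the logs live in a `logs_simulation/` directory (as `setup.py` creates), and the helper name `safe_log_path` is hypothetical, not the actual `server.py` code:

```python
from pathlib import Path

LOG_DIR = Path("logs_simulation")  # assumption: the directory setup.py creates

def safe_log_path(filename: str) -> Path:
    """Resolve the requested file and refuse anything outside LOG_DIR."""
    candidate = (LOG_DIR / filename).resolve()
    if LOG_DIR.resolve() not in candidate.parents:
        raise ValueError(f"illegal path: {filename!r}")
    return candidate

safe_log_path("system.log")       # OK: stays inside logs_simulation/
# safe_log_path("../etc/passwd")  # raises ValueError
```

Resolving before comparing is the important part: it collapses `..` segments, so a crafted filename cannot escape the log directory.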
## Project Structure

```
tp3-mcp-agent/
├── launcher.py             # Spawns the MCP server subprocess & connects client
├── client.py               # MCP client + Ollama chat loop
├── server.py               # MCP server exposing log-reading tools
├── smart_home_server.py    # Bonus: MCP server re-using TP2's SmartHome class
├── config.py               # Model name + terminal color codes
├── setup.py                # Generates dummy log files for simulation
├── smart_home.py           # SmartHome class (from TP2, needed for bonus server)
└── README.md
```
## Setup

Prerequisites: Ollama running (see TP1). This TP uses `llama3.2:3b` (not Gemma 3) because it natively supports Ollama's function-calling protocol.
```bash
# 1. Pull the model
ollama pull llama3.2:3b

# 2. Install the MCP Python library
pip install mcp

# 3. Generate simulation log files
python setup.py

# 4. Launch the full system (server + client)
python launcher.py
```
## Key Concepts

### Why not Gemma 3?

Gemma 3 was not specifically trained for structured tool calling. Getting it to reliably output valid JSON function calls requires either fine-tuning or constrained generation (see the Bonus section). `llama3.2:3b` supports Ollama's native `tools=[]` parameter out of the box.
### `@mcp.tool()` — Zero-boilerplate tool registration

```python
@mcp.tool()
def read_log_head(filename: str, lines: int = 10) -> str:
    """Read the first N lines of a log file."""
    ...
```
FastMCP reads the type annotations (str, int) and docstring to auto-generate the full JSON Schema. No manual JSON writing required.
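To see roughly what that derivation involves, here is a stdlib-only sketch of the same idea. It is illustrative, not FastMCP's actual implementation, and the `tool_schema` helper is hypothetical:

```python
import inspect
from typing import get_type_hints

# Map Python annotations to JSON Schema type names
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def tool_schema(fn):
    """Build a JSON Schema-style tool description from a function's
    type hints, defaults, and docstring (what FastMCP does for you)."""
    hints = get_type_hints(fn)
    hints.pop("return", None)
    sig = inspect.signature(fn)
    props = {name: {"type": PY_TO_JSON[tp]} for name, tp in hints.items()}
    # Parameters without a default value are required
    required = [n for n, p in sig.parameters.items() if p.default is p.empty]
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn),
        "inputSchema": {"type": "object", "properties": props, "required": required},
    }

def read_log_head(filename: str, lines: int = 10) -> str:
    """Read the first N lines of a log file."""
    ...

schema = tool_schema(read_log_head)
# schema["inputSchema"]["required"] is ["filename"]; "lines" has a default
```

Because `lines` has a default, it is optional in the schema, which is exactly the contract the LLM sees.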
Tool conversion: MCP → Ollama
{
"type": "function",
"function": {
"name": tool.name,
"description": tool.description,
"parameters": tool.inputSchema, # Already a valid JSON Schema
}
}
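Applied to every tool the client discovers, the conversion might be wrapped like this. The `SimpleNamespace` below is a stand-in for a tool object returned by `session.list_tools()`, so the sketch runs without a live MCP session:

```python
from types import SimpleNamespace

def mcp_tools_to_ollama(tools):
    """Wrap each discovered MCP tool in the dict shape Ollama's tools=[] expects."""
    return [
        {
            "type": "function",
            "function": {
                "name": t.name,
                "description": t.description,
                "parameters": t.inputSchema,  # already a valid JSON Schema
            },
        }
        for t in tools
    ]

# Stand-in for one discovered tool (hypothetical sample data)
fake_tool = SimpleNamespace(
    name="read_log_head",
    description="Read the first N lines of a log file.",
    inputSchema={"type": "object", "properties": {"filename": {"type": "string"}}},
)
ollama_tools = mcp_tools_to_ollama([fake_tool])
```

Nothing about the tool is hardcoded: whatever the server exposes, the same loop forwards it to Ollama.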
### The chat loop (Question 5 — explanation)

1. User types a message.
2. Client sends `messages` + `tools=ollama_tools` → llama3.2.
3. LLM responds with `tool_calls=[{"function": {"name": "read_log_head", "arguments": {...}}}]`.
4. Client calls `session.call_tool("read_log_head", {"filename": "system.log", "lines": 5})`.
5. MCP server executes the Python function and returns the result.
6. Client appends `{"role": "tool", "content": result}` to the history.
7. Client sends the updated history → llama3.2 again (no tools this time).
8. LLM generates a natural-language answer using the tool result.
9. Print and loop.
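Steps 4–6 can be sketched as a small dispatcher. Here the registry of plain Python callables is a hypothetical stand-in for `session.call_tool`, so the sketch runs without a live MCP server:

```python
def run_tool_calls(tool_calls, registry, messages):
    """Execute each tool call the LLM requested (steps 4-5) and append
    the result to the chat history as a role="tool" message (step 6)."""
    for call in tool_calls:
        fn = call["function"]
        result = registry[fn["name"]](**fn["arguments"])
        messages.append({"role": "tool", "content": str(result)})
    return messages

# Hypothetical stand-in for the tools the MCP server exposes
registry = {
    "read_log_head": lambda filename, lines=10: f"(first {lines} lines of {filename})",
}

history = [{"role": "user", "content": "What's in system.log?"}]
run_tool_calls(
    [{"function": {"name": "read_log_head",
                   "arguments": {"filename": "system.log", "lines": 5}}}],
    registry,
    history,
)
```

After this, the client re-sends `history` to the model (step 7) so it can phrase the tool output as a natural-language answer.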
## Reliability Comparison

| Criterion | Manual ReAct (TP2) | MCP |
|---|---|---|
| Parsing method | Regex on free text | Native JSON structure |
| Type validation | None | JSON Schema (`inputSchema`) |
| Robustness to format drift | Breaks silently | Structured contract |
| Adding a new tool | Edit regex + prompt | Add `@mcp.tool()` |
| Client knows tool names | Hardcoded in prompt | Discovered at runtime |
| Arbitrary code exec risk | Medium (whitelist helps) | Low (schema-validated) |
## Swapping Servers (Question 9)

To control the smart home via MCP instead of reading logs, change one line in `launcher.py`:

```python
# Log server (default)
server_script = os.path.join(script_dir, "server.py")

# Smart home server (Question 9)
server_script = os.path.join(script_dir, "smart_home_server.py")
```
The client discovers the new tools automatically. No other changes needed.
## Bonus — Constrained Generation (Deterministic JSON)

MCP with llama3.2 is still probabilistic: we ask the model nicely to output JSON, and it usually complies. For JSON that is guaranteed valid by construction, use constrained generation with llama.cpp:

```python
from llama_cpp import Llama, LlamaGrammar

# json_grammar: a GBNF grammar string describing valid JSON
# (llama.cpp ships a ready-made json.gbnf for this)
llama = Llama(model_path="model.gguf", n_ctx=2048)

# Force every token to comply with the JSON grammar
output = llama(
    prompt,
    grammar=LlamaGrammar.from_string(json_grammar),
)
```
The engine sets the probability of any token that would violate the grammar to 0 before sampling. The model is physically unable to produce invalid JSON — turning a probabilistic prototype into a deterministic industrial tool.
## References
- Model Context Protocol — Anthropic
- FastMCP documentation
- llama3.2 — Meta AI
- llama.cpp constrained generation
## License
Academic project — Polytech Nantes, IDIA.