UserGate MCP
Offline-first Model Context Protocol server for UserGate public documentation.
UserGate MCP builds a reproducible local documentation corpus from official public UserGate documentation sources, indexes it with SQLite FTS, and exposes it to MCP clients through search, retrieval, and task-oriented operational workflow tools.
It is designed for documentation-backed administration work where responses should be grounded in a local, auditable corpus instead of live web lookup at query time.
This project is an independent documentation tooling project and is not affiliated with or endorsed by UserGate.
What It Provides
- Offline-first ingestion from official public UserGate documentation surfaces.
- Coverage audit before indexing, so missing or failed upstream files are visible.
- Local SQLite FTS index over normalized HTML and PDF text chunks.
- Content-hash deduplication with alternate source URL preservation.
- Product, version, language, and document-type metadata for targeted retrieval.
- MCP tools for status, facets, document search, exact document fetch, and section-level retrieval.
- MCP resources and prompts for guided documentation usage and quick corpus orientation.
- Admin-oriented workflow packs for upgrade planning, VPN, authentication, routing, web policy, captive portal, HA, certificates, SIEM, Management Center, and Log Analyzer operations.
- Retrieval regression evaluation for ranking and source-selection changes.
- Workflow-pack regression evaluation for operational checklist and source quality.
- Query expansion for common English/Russian administration terminology gaps.
- Human inspection commands for exploring the generated SQLite index without an MCP client.
- Repository hygiene that keeps downloaded manuals, raw pages, reports, and the SQLite index out of Git.
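The content-hash deduplication listed above could be sketched roughly as follows. This is an illustrative sketch only, not the project's actual code: the `DocumentRecord` shape, the whitespace normalization step, the choice of SHA-256, and the example URL paths are all assumptions.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DocumentRecord:
    """One unique document body plus every URL that served it (hypothetical shape)."""
    content_hash: str
    primary_url: str
    alternate_urls: list = field(default_factory=list)


def deduplicate(pages):
    """Collapse (url, text) pairs whose normalized text is identical."""
    by_hash = {}
    for url, text in pages:
        # Assumption: collapse whitespace so trivial formatting differences hash the same.
        normalized = " ".join(text.split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        record = by_hash.get(digest)
        if record is None:
            # First URL seen for this content becomes the primary source.
            by_hash[digest] = DocumentRecord(digest, url)
        elif url != record.primary_url and url not in record.alternate_urls:
            # Later URLs with the same content are preserved as alternates.
            record.alternate_urls.append(url)
    return list(by_hash.values())


# Example URLs are made up for illustration.
pages = [
    ("https://docs.usergate.com/a", "VPN  setup guide"),
    ("https://docs.usergate.com/b", "VPN setup guide"),
    ("https://docs.usergate.com/c", "Routing overview"),
]
records = deduplicate(pages)
```

Keeping the alternate URLs on the surviving record means a search hit can still be traced to every public location that served the same text.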
Corpus Baseline
The generated corpus is intentionally not committed. A current local sync builds an index with this approximate shape:
| Metric | Value |
| --- | ---: |
| Public documentation URLs discovered | 3,114 |
| Indexed documents | 2,683 |
| Indexed chunks | 40,633 |
UserGate documentation changes over time. Run the sync pipeline before relying on corpus statistics in a new environment.
Architecture
src/usergate_mcp/
  discovery.py   Discover sitemap, crawl-discovered links, and PDF catalog entries
  download.py    Store raw official documentation files locally
  audit.py       Verify manifest coverage and report upstream failures
  extract.py     Normalize HTML and PDF text
  metadata.py    Infer product, version, language, and document type
  index.py       Build and query the SQLite FTS index
  cli.py         Expose status, search, document, section, catalog, and workflow commands
  evaluation.py  Run retrieval regression checks against golden queries
  inspect.py     Print human-readable index inventory as JSON
  workflows.py   Assemble operational source packs and checklists
  server.py      Expose the local corpus through MCP tools
  update.py      Bootstrap or incrementally refresh the generated corpus
  sync.py        Run discovery, download, audit, and indexing in order
data/
  raw/           Generated downloaded corpus, ignored by Git
  reports/       Generated audit and download reports, ignored by Git
  index/         Generated SQLite index, ignored by Git
  manifests/     Generated discovery manifests, ignored by Git
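To make the indexing stage more concrete, here is a minimal sketch of a chunk index with metadata filtering, built with SQLite FTS5 from the Python standard library. The table name, column names, and sample rows are hypothetical, and FTS5 itself is an assumption: the project only states that it uses SQLite FTS, and its real schema may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: metadata columns are stored but not tokenized (UNINDEXED).
conn.execute(
    "CREATE VIRTUAL TABLE chunks USING fts5("
    "text, product UNINDEXED, version UNINDEXED, language UNINDEXED)"
)
# Sample rows invented for illustration.
conn.executemany(
    "INSERT INTO chunks (text, product, version, language) VALUES (?, ?, ?, ?)",
    [
        ("Configure an IPsec VPN tunnel between two sites", "NGFW", "7.x", "en"),
        ("Forward events to an external collector", "SIEM", "7.x", "en"),
        ("IPsec tunnel troubleshooting checklist", "NGFW", "6.x", "en"),
    ],
)


def search(conn, query, product=None, limit=5):
    """Return (text, product, version) rows ranked by BM25, best match first."""
    sql = "SELECT text, product, version FROM chunks WHERE chunks MATCH ?"
    params = [query]
    if product is not None:
        # Metadata filter narrows results to one product.
        sql += " AND product = ?"
        params.append(product)
    # bm25() returns lower values for better matches, so ascending order is correct.
    sql += " ORDER BY bm25(chunks) LIMIT ?"
    params.append(limit)
    return conn.execute(sql, params).fetchall()


hits = search(conn, "ipsec tunnel", product="NGFW")
```

Storing product, version, and language next to each chunk is what lets a query be narrowed without a second lookup table.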
Quick Start
python -m venv .venv
.\.venv\Scripts\python -m pip install -e .[dev]
.\.venv\Scripts\python -m usergate_mcp.update
.\.venv\Scripts\python -m usergate_mcp.server
usergate_mcp.update is the recommended corpus command. On a fresh checkout it downloads public documentation files into a local generated corpus and builds the index. On an existing checkout it compares the newly discovered source set with the previous local state, downloads new or changed files, refreshes reports, and rebuilds the index.
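The comparison step on an existing checkout might look like the sketch below. It assumes each source URL maps to some fingerprint such as a content hash or a last-modified value. The function name, the fingerprint idea, and the handling of removed URLs are all hypothetical and not taken from the project.

```python
def plan_update(previous, discovered):
    """Compare URL -> fingerprint maps and return sorted URLs for each action."""
    # URLs that did not exist in the previous local state.
    new = sorted(url for url in discovered if url not in previous)
    # URLs present in both states whose fingerprint differs.
    changed = sorted(
        url for url in discovered
        if url in previous and discovered[url] != previous[url]
    )
    # Assumption: URLs that vanished upstream are reported rather than silently dropped.
    removed = sorted(url for url in previous if url not in discovered)
    return {"new": new, "changed": changed, "removed": removed}


previous = {"a": "1", "b": "2", "c": "3"}
discovered = {"a": "1", "b": "9", "d": "4"}
plan = plan_update(previous, discovered)
```

Only the `new` and `changed` groups would need a download. Unchanged files can be reused from the local corpus.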
Individual pipeline stages can also be run directly:
.\.venv\Scripts\python -m usergate_mcp.discovery
.\.venv\Scripts\python -m usergate_mcp.download
.\.venv\Scripts\python -m usergate_mcp.audit
.\.venv\Scripts\python -m usergate_mcp.index
Useful update modes:
.\.venv\Scripts\python -m usergate_mcp.update --check-only
.\.venv\Scripts\python -m usergate_mcp.update --force-redownload
.\.venv\Scripts\python -m usergate_mcp.update --skip-index
Common CLI commands:
.\.venv\Scripts\python -m usergate_mcp.cli status
.\.venv\Scripts\python -m usergate_mcp.cli search "ipsec vpn tunnel" --product NGFW
.\.venv\Scripts\python -m usergate_mcp.cli sections "ssl inspection certificate" --product NGFW
.\.venv\Scripts\python -m usergate_mcp.cli workflow vpn --scenario "ipsec tunnel" --version 7.x
Installed script aliases include usergate-status, usergate-search, usergate-sections, usergate-doc, usergate-section, usergate-catalog, and usergate-workflow.
MCP Setup
See docs/codex-mcp-setup.md for a Codex stdio MCP configuration example and smoke-check commands. See docs/mcp-client-setup.md for Codex, Claude Code, Claude Desktop, Cursor, and VS Code configuration snippets.
The server is intended to run from a local checkout with a generated corpus and index. The repository does not ship the downloaded documentation corpus.
MCP Tool Surface
Core retrieval tools:
- get_project_status()
- list_index_facets()
- search_docs(query, product?, version?, language?, doc_type?, limit?)
- get_document(doc_id_or_url, max_chars?)
- search_doc_sections(query, product?, version?, language?, doc_type?, limit?)
- get_doc_section(chunk_id, context_chunks?)
- list_documents_catalog(product?, doc_type?, language?, limit?)
Operational workflow tools:
- prepare_upgrade_plan(product, target_version, current_version?, language?)
- prepare_vpn_debug_pack(scenario, version?, language?)
- prepare_auth_debug_pack(auth_type, version?, language?)
- prepare_change_research(task, product?, version?, language?)
- prepare_routing_debug_pack(scenario, version?, language?)
- prepare_web_policy_pack(scenario, version?, language?)
- prepare_captive_portal_pack(scenario, version?, language?)
- prepare_backup_ha_pack(product, scenario, version?, language?)
- prepare_cert_lifecycle_pack(scenario, version?, language?)
- prepare_ops_platform_pack(product, scenario, version?, language?)
Workflow tools return the queries used, a focused checklist, and deduplicated local documentation sources with snippets, headings, locators, chunk IDs, and document IDs.
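For orientation, MCP clients invoke these tools through JSON-RPC 2.0 `tools/call` requests. The sketch below builds such a request for `search_docs` using the argument names from the signature listed above. The helper name `build_tool_call` and the argument values are illustrative. A real client also performs the MCP initialization handshake first, and an MCP SDK normally handles both steps for you.

```python
import json


def build_tool_call(request_id, name, arguments):
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


# Optional arguments such as version, language, and doc_type are simply omitted.
message = build_tool_call(
    1,
    "search_docs",
    {"query": "ipsec vpn tunnel", "product": "NGFW", "limit": 5},
)
```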
MCP resources expose compact orientation data:
- usergate://status
- usergate://facets
- usergate://products
- usergate://manuals
- usergate://reports/update
- usergate://reports/audit
- usergate://reports/index
- usergate://documents/{doc_id}
Prompt templates include guide_usergate_docs_usage, admin_lookup, change_plan, upgrade_prompt, vpn_debug_prompt, routing_debug_prompt, web_policy_prompt, certificate_prompt, and ops_platform_prompt.
Retrieval Behavior
Search ranking is tuned for administration use cases:
- articles, manuals, and hardware documents are preferred by default over less relevant glossary or release-note results;
- query intent can boost release notes, glossary entries, or manuals when the wording asks for them;
- query expansion bridges common English/Russian terminology gaps such as collector, certificate, routing, backup, restore, and troubleshooting;
- results are diversified by document so one long manual does not dominate the result set;
- section-aware retrieval can cite narrow chunks and manual pages instead of only full-document matches;
- product-aware scoring reduces cross-product noise for Management Center, SIEM, and Log Analyzer workflows.
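Two of the behaviors above, query expansion and per-document diversification, can be illustrated with a small sketch. The synonym table, function names, and the `(doc_id, score)` hit shape are hypothetical. The project's real expansion rules and ranking logic are not shown in this README and may work differently.

```python
# Hypothetical English -> Russian synonym table; the project's real table may differ.
EXPANSIONS = {
    "certificate": ["сертификат"],
    "routing": ["маршрутизация"],
    "backup": ["резервное копирование"],
}


def expand_query(query):
    """Return the original terms plus any known cross-language synonyms."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        for synonym in EXPANSIONS.get(term, []):
            if synonym not in expanded:
                expanded.append(synonym)
    return expanded


def diversify(hits, per_doc=1):
    """Keep at most per_doc hits per document, best score first.

    hits are (doc_id, score) pairs where a higher score is better.
    """
    kept = []
    seen = {}
    for doc_id, score in sorted(hits, key=lambda hit: hit[1], reverse=True):
        if seen.get(doc_id, 0) < per_doc:
            kept.append((doc_id, score))
            seen[doc_id] = seen.get(doc_id, 0) + 1
    return kept


terms = expand_query("Certificate routing")
top = diversify([("manual-a", 0.9), ("manual-a", 0.8), ("article-b", 0.5)])
```

Capping hits per document trades a little raw relevance for breadth, which suits lookups where the answer may sit in a short article rather than the longest manual.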
Development
Install development dependencies and run the test suite:
.\.venv\Scripts\python -m pip install -e .[dev]
.\.venv\Scripts\python -m pytest -q
Some workflow tests expect a built local index at data/index/usergate_docs.sqlite3. Run python -m usergate_mcp.sync first in a fresh checkout.
Index-dependent tests are marked as integration and are skipped when the local SQLite index is absent. To run only those checks after building the corpus:
.\.venv\Scripts\python -m pytest -q -m integration
Run retrieval regression checks:
.\.venv\Scripts\python -m usergate_mcp.evaluation
The default suite contains 15 golden admin scenarios across NGFW, SWG, SIEM, Management Center, Log Analyzer, WAF, and UserGate Client. See docs/retrieval-evaluation.md for case format and query-expansion guidance.
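As a rough illustration of what a golden-query check does, the sketch below flags queries whose top results contain no hit from the expected product. The `GoldenCase` fields, the `run_cases` helper, and the result dictionary shape are assumptions made for this example. The actual case format is the one described in docs/retrieval-evaluation.md.

```python
from dataclasses import dataclass


@dataclass
class GoldenCase:
    """Hypothetical case shape, not the project's real format."""
    query: str
    expected_product: str
    top_k: int = 3


def run_cases(cases, search):
    """Return the queries whose top_k results lack the expected product.

    search is any callable that maps a query string to a ranked list of dicts.
    """
    failures = []
    for case in cases:
        results = search(case.query)[: case.top_k]
        if not any(hit.get("product") == case.expected_product for hit in results):
            failures.append(case.query)
    return failures


def fake_search(query):
    """Stand-in for the real index so the sketch runs without a corpus."""
    if "vpn" in query:
        return [{"product": "NGFW"}, {"product": "SIEM"}]
    return [{"product": "SIEM"}]


cases = [GoldenCase("ipsec vpn tunnel", "NGFW"), GoldenCase("log retention", "Log Analyzer")]
failed = run_cases(cases, fake_search)
```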
Run workflow-pack regression checks:
.\.venv\Scripts\python -m usergate_mcp.workflow_evaluation
The workflow suite checks task-oriented packs for checklist presence, source counts, product-appropriate sources, and deduplication. See docs/workflow-evaluation.md.
Inspect the generated SQLite index without an MCP client:
.\.venv\Scripts\python -m usergate_mcp.inspect
See docs/human-inspection.md for SQLite and Datasette inspection examples.
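If you only want a quick look at the generated database from Python, a schema-agnostic sketch like the one below lists every table with its row count. It makes no assumption about the project's table names. The helper `table_row_counts` is hypothetical and is not part of `usergate_mcp.inspect`.

```python
import sqlite3
from pathlib import Path


def table_row_counts(db_path):
    """Map each table name in the database to its row count."""
    # Open read-only so a wrong path fails instead of creating an empty file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            # Quote the identifier because table names cannot be bound as parameters.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
        return counts
    finally:
        conn.close()


# Index path taken from the Development notes; skipped when the corpus is not built.
INDEX_PATH = Path("data/index/usergate_docs.sqlite3")
if INDEX_PATH.exists():
    for name, count in table_row_counts(INDEX_PATH).items():
        print(f"{name}: {count}")
```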
Repository Hygiene
Generated files are excluded from Git:
- data/raw/
- data/reports/
- data/index/
- data/manifests/*.json
- .venv/
- Python cache and test cache directories
Only source code, tests, documentation, and placeholder .gitkeep files are intended to be committed.
Contributing
See CONTRIBUTING.md for development setup, test guidance, and repository hygiene rules.
Security
See SECURITY.md for vulnerability reporting guidance and security scope.
License
The project code is released under the MIT License. See NOTICE for trademark, affiliation, and generated-corpus notes.
Source Scope
The current public source of truth is https://docs.usergate.com/ because it provides:
- an official sitemap;
- a public PDF documentation catalog;
- canonical documentation pages, which the public support URLs observed so far either redirect to or mirror.
Official linked attachments from support.usergate.com and static.usergate.com are retained when referenced by in-scope documentation pages.
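The scope rule described here can be expressed as a small URL filter. The sketch below is an interpretation under stated assumptions: the host names come from this section, while the function name, the `referrer` parameter, and the exact matching logic are hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse

PRIMARY_HOST = "docs.usergate.com"
ATTACHMENT_HOSTS = {"support.usergate.com", "static.usergate.com"}


def is_in_scope(url: str, referrer: Optional[str] = None) -> bool:
    """Accept primary-host pages, plus attachments linked from primary-host pages."""
    host = (urlparse(url).hostname or "").lower()
    if host == PRIMARY_HOST:
        return True
    if host in ATTACHMENT_HOSTS and referrer is not None:
        # Attachments only count when an in-scope documentation page references them.
        return (urlparse(referrer).hostname or "").lower() == PRIMARY_HOST
    return False
```

For example, a PDF on static.usergate.com would be kept when it is linked from a docs.usergate.com page and skipped when it is encountered on its own.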
Limitations
- The project indexes public documentation only.
- Authenticated customer materials and private support artifacts are out of scope.
- Image-only PDF content is skipped unless an OCR path is added later.
- Sync results depend on availability and structure of the upstream public documentation site.