# PTXprint MCP

Stateless content-addressed build system for Bible typesetting via PTXprint, exposed as an MCP server, plus a governance knowledge base for the AI agents that drive it.
## What this repo is

This repo serves two coupled purposes:

1. **An MCP server implementation** — Cloudflare Worker + Container + Durable Objects + R2, exposing four tools (`submit_typeset`, `get_job_status`, `cancel_job`, `get_upload_url`) that an AI agent uses to drive PTXprint headlessly.
2. **A governance knowledge base** — markdown documents under `canon/` that teach AI agents how to construct typesetting payloads, interpret results, and handle the long tail of PTXprint operational concerns.

The two are kept in one repo because they are tightly coupled: a tool-surface change requires a governance update, and vice versa. Splitting them creates drift; co-locating them keeps the contract honest.
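To make the contract concrete, here is a sketch of what a `submit_typeset` payload could look like under the rules described above: config files travel inline as text, while binary inputs are referenced by URL and pinned with a sha256 digest. All field names here are hypothetical illustrations; the v1.2 spec is the authoritative schema.

```python
# Hypothetical submit_typeset payload. Field names are illustrative only;
# see canon/specs/ for the authoritative v1.2 schema.
payload = {
    "config": {
        # Config files are carried inline as text.
        "ptxprint.cfg": "[project]\nid = XYZ\n",
    },
    "sources": [
        {
            # USFM sources, fonts, and figures are referenced by URL;
            # the build verifies each digest after download, which keeps
            # jobs reproducible and content-addressable.
            "path": "43JHNXYZ.SFM",
            "url": "https://example.org/uploads/43JHNXYZ.SFM",
            "sha256": "0" * 64,  # placeholder digest
        }
    ],
}

# Every referenced input must carry a full sha256 hex digest,
# or content-addressed caching cannot work.
assert all(len(s["sha256"]) == 64 for s in payload["sources"])
```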
## Project status

Pre-implementation. The v1.2 specification is drafted and under review. No Worker, Container, or deployment code exists yet. The first autonomous coding run is planned for hackathon week starting 2026-04-28.

- Read the spec: `canon/specs/ptxprint-mcp-v1.2-spec.md`
- Read the architecture overview: `ARCHITECTURE.md`
## What PTXprint is

PTXprint is a typesetting tool maintained by SIL that produces print-ready PDFs from Paratext Bible translation projects. It wraps a XeTeX macro engine in a Python/GTK GUI, with an extensive configuration surface (~400 settings across ~25 sections, plus stylesheets, picture lists, paragraph adjustments, and more).

This MCP server exposes PTXprint's headless run path (no GUI required) as a job queue, so AI agents can drive it conversationally on behalf of translation teams.

Upstream: [`sillsdev/ptx2pdf`](https://github.com/sillsdev/ptx2pdf).
## Architecture in one paragraph

The agent constructs a payload describing one typesetting job (config files inline as text; USFM sources, fonts, and figures referenced by URL with sha256 verification) and calls `submit_typeset`. A Cloudflare Worker validates the payload, computes its sha256, and either returns the cached output URL (if the same payload has been seen before) or dispatches to a Cloudflare Container that materializes a scratch directory, runs PTXprint, and uploads the resulting PDF to R2 at a content-addressed path. Job state is held in Durable Objects and polled via `get_job_status`. PTXprint itself is treated as a pure function from inputs to PDF — re-running an unchanged build is free.
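The "pure function" framing implies a deterministic cache key: two submissions with identical content must hash identically regardless of how the payload's fields happen to be ordered. A minimal sketch of how such a key could be derived (the real Worker may canonicalize differently):

```python
import hashlib
import json

def job_key(payload: dict) -> str:
    """Derive a content-addressed key for one typeset job.

    Serializing with sorted keys and no insignificant whitespace makes
    the key independent of dict ordering, so an unchanged build always
    maps to the same R2 path and can be served straight from cache.
    """
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Same content, different insertion order: the keys must match.
a = job_key({"project": "XYZ", "book": "JHN"})
b = job_key({"book": "JHN", "project": "XYZ"})
assert a == b
```

Note this only covers the payload envelope; the URL-referenced inputs are pinned by their own sha256 digests, so they participate in the key through the digests embedded in the payload.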
## Quick start

**For agent operators:** once deployed, point your MCP-aware agent at the server URL and load the canon repo via oddkit MCP for governance retrieval. The agent's reasoning loop becomes: search canon → construct payload → submit job → poll → handle result.

**For developers:** implementation is forthcoming. The spec in `canon/specs/` is the build target; deployment uses Cloudflare's `wrangler` CLI.
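On the agent side, the submit → poll → handle-result portion of that loop could be sketched as follows. The tool names come from this README; the `mcp.call(...)` client interface and the `state`/`output_url` result fields are assumptions for illustration, not the spec's authoritative names.

```python
import time

def run_typeset(mcp, payload, poll_interval=5.0, timeout=600.0):
    """Drive one typeset job through the four-tool surface.

    `mcp.call(tool, args)` stands in for whatever MCP client the agent
    uses; the result field names here are assumptions, not the spec's.
    """
    job = mcp.call("submit_typeset", payload)
    if job.get("cached"):
        return job["output_url"]  # identical payload seen before: free
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = mcp.call("get_job_status", {"job_id": job["job_id"]})
        if status["state"] == "done":
            return status["output_url"]
        if status["state"] == "failed":
            raise RuntimeError(status.get("error", "typeset failed"))
        time.sleep(poll_interval)
    mcp.call("cancel_job", {"job_id": job["job_id"]})
    raise TimeoutError("typeset job did not finish in time")

class FakeMCP:
    """Toy stand-in server: reports 'running' once, then 'done'."""
    def __init__(self):
        self.polls = 0
    def call(self, tool, args):
        if tool == "submit_typeset":
            return {"cached": False, "job_id": "j1"}
        self.polls += 1
        state = "done" if self.polls >= 2 else "running"
        return {"state": state, "output_url": "https://r2.example/out.pdf"}

assert run_typeset(FakeMCP(), {}, poll_interval=0.0) == "https://r2.example/out.pdf"
```

Cancelling on timeout matters: a content-addressed system happily dedups resubmissions, but an abandoned Container run still burns compute until something tells it to stop.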
## Repository layout

```
ptxprint-mcp/
├── README.md          (this file)
├── LICENSE            (MIT)
├── ARCHITECTURE.md    (architecture overview)
├── CONTRIBUTING.md    (how to contribute)
├── canon/             (oddkit-readable knowledge base)
│   ├── README.md      (canon directory index + frontmatter conventions)
│   ├── specs/         (versioned MCP server specifications)
│   ├── governance/    (agent-facing operational knowledge)
│   ├── handoffs/      (cross-session work transfer documents)
│   └── encodings/     (DOLCHEO+H session encodings)
└── (src/, wrangler.toml — added when implementation begins)
```
## License

MIT © 2026 klappy
## Acknowledgements

### Upstream

PTXprint is developed by SIL Global. This project wraps and extends that work; it does not modify PTXprint itself. The PTXprint MASTER SLIDES deck (438 slides, authored by Martin Hosken, Mark Penny, and David Gardner) and the upstream `sillsdev/ptx2pdf` repository were primary canon sources.
### Methodology and tooling
This project was scoped, designed, and bootstrapped using klappy's oddkit — an MCP-served knowledge base and discipline layer for Outcomes-Driven Development (ODD). ODD treats exploration, planning, and execution as distinct epistemic modes with different rules: planning front-loads ambiguity (questions are the work product); execution does not (questions should be answered before code is written). The architectural conventions applied here — vodka architecture, KISS, DRY canon, mode discipline, verification-requires-fresh-context — come from the broader klappy canon.
### Discovery process

The pre-implementation phase ran across five conversational sessions, processing multiple sources before any spec was committed:

- **Planning meeting transcripts** with PTXprint's SME (Martin Hosken) and the project operator. Each transcript was encoded as DOLCHEO+H artifacts — Decisions, Observations, Learnings, Constraints, Handoffs, plus Open questions — so context survives across sessions and remains agent-searchable. The five session encodings live under `canon/encodings/`.
- **The PTXprint MASTER SLIDES deck** (438 slides), processed via Epistemic Surface Extraction (ESE), a lens-based pass over artifacts too large to ingest in one shot. The resulting `surface.json` and `surface.md` artifacts seeded the agent-facing operational knowledge base.
- **The `sillsdev/ptx2pdf` source repository** — the same ESE method applied to a code repository, surfacing entry points, project shape, configuration model, failure modes, and deployment footprint.
- **The first-pass MCP server PoC and its specification** — analyzed against vodka-architecture principles to identify domain opinions that had drifted into server code, driving the v1.0 → v1.1 → v1.2 simplification (17 tools → 7 → 4) documented in `canon/encodings/transcript-encoded-session-5.md`.
- **A PDF extraction of the deck → operator-authored governance document** — the most concrete agent-facing canon material drafted to date, currently being aligned to v1.2 per `canon/handoffs/governance-update-handoff.md`.
- **The operator's ~1000 real PTXprint configurations** (private corpus) — used to validate that the proposed payload schema covers real-world variation across many translation projects.
oddkit provided the discipline throughout: anchoring elapsed time across long sessions; retrieving and applying canon at every mode transition; orienting at context shifts; pressure-testing decisions before they hardened into specs; validating completion claims against required artifacts before any "done" signal; encoding durable session ledgers so each new session inherited the prior one's reasoning rather than rebuilding it.
### Going forward
oddkit will continue at runtime. The AI agent driving this MCP server will load oddkit MCP separately and set its knowledge_base_url to this repository, treating canon/ as the project's knowledge base. The agent's reasoning loop becomes: search canon → understand → construct payload → submit job → poll → handle result. Each future autonomous coding run that builds, extends, or maintains this MCP server follows the same discipline — anchoring time, declaring mode, retrieving canon, preflighting before artifact work, validating against required outputs, and encoding decisions for the next session to inherit. Co-locating the code and the governance KB in this single repo prevents drift between what the system does and what the agent thinks it does.