mainframe-mcp

A Model Context Protocol (MCP) plugin for Claude Code that gives an AI assistant safe, structured access to an IBM z/OS mainframe — read screens, browse datasets, edit code, submit jobs, and run regression tests, all through natural-language instructions in the Claude Code terminal.

Target user: A developer with a Claude Code terminal, a Python workstation, and access to a TN3270-reachable mainframe.

Project status — v1.0.0

All five build phases are code complete with comprehensive test coverage. Live verification against the reference host (IBM ADCD z/OS 1.10 at 147.93.154.32:23, userid ADCDC) is in the state below.

| Phase | Code | Unit/offline tests | Live verification |
|---|---|---|---|
| Phase 1: Read-only TN3270 MCP server | ✅ | included in 260-test suite | ✅ live-verified |
| Phase 2: Dataset reads (ISPF + FTP transports) | ✅ | included | ✅ live-verified |
| Phase 3: Write operations + three-tier autonomy | ✅ | included | ✅ live-verified 8/8 acceptance (tag phase-3-live-verified) |
| Phase 4: Job submission + monitoring | ✅ | 40 unit tests | 🟡 1/12 live-verified (gate refusal). 11 remaining cases deferred; see §14.1 |
| Phase 5: Testing automation | ✅ | 51 unit tests + 23/23 offline sweep | 🟡 1 live-runbook case deferred; see §14.1 |

260 unit tests passing. Three real safety bugs found and fixed during Phase 3 live verification: rc-None tolerance (would have falsely reported write success on JES2-purged spool), bare-word "ABEND" false-positive (would have frozen the plugin when a member name happened to be "ABEND"), and the JES2/RACF jobname-prefix rule (would have silently rejected every internally-generated job). All three are pinned by regression tests.

Deferred items are environmental, not code blockers. The remaining 11 Phase 4 cases and the 1 Phase 5 live runbook case require an uninterrupted ADCD login window that the public reference host has not consistently provided. The plugin code paths under those cases are pinned by 91 unit/offline tests; the deferred work is purely a host-availability question. See §14.1 for the saturation observation and §14.9 for the bootstrap-residue quirk discovered during the partial Phase 4 sweep.

Future work is tracked in §14.10 "Acknowledged future work" below.

How to use this document (Claude Code, read this first)

This README is both the human-facing project description and the authoritative build specification for Claude Code. When the user asks you to "build Phase N" or "continue the implementation," do the following:

Re-read the relevant phase section in full before writing any code.
Honor the Safety Model section unconditionally. It is not aspirational — every write tool you create must enforce these rules in code, not just in docstrings.
Follow the file structure under "Project Structure" exactly. New files go where this document says they go.
Match the tool signatures in the "Tool Surface Specification" section. Tool names, parameters, return types, and module assignments are fixed.
At the end of each phase, run the acceptance criteria checklist. Do not move to the next phase until the current one passes.
When in doubt, ask the user. Do not invent new tools, change the safety model, or skip phases without explicit confirmation.

If any instruction in this document conflicts with a verbal user request, surface the conflict and ask for clarification rather than silently choosing one.

Project goals
Target environment
Tech stack
Architecture
Safety model
Project structure
Configuration
Implementation phases
Tool surface specification
Coding conventions
Testing and verification
Distribution
Appendices

1. Project goals

What this plugin does

When fully built, mainframe-mcp lets a Claude Code user accomplish mainframe development tasks by talking to Claude in plain English:

"Log in to the mainframe and show me what's on the screen."
"List the COBOL members in ADCDB.SOURCE.COBOL."
"Read PAY001 and explain what it does."
"There's an S0C7 in PAY001. Find the bug and fix it." (with confirmation step)
"Submit the compile JCL and tell me when it finishes."
"Run the regression test suite against the payroll transaction."

Six capabilities, built in order

Read — screens, PDS members, datasets, job output. (Phase 1–2)
Write — edit existing members, create new ones, with Git versioning. (Phase 3)
Audit — analyze code for issues, propose fixes via three-tier autonomy. (Phase 3, refined later)
Submit & monitor — JCL submission, job status, SYSOUT retrieval. (Phase 4)
Navigate — keyboard-only ISPF/CICS navigation through Claude's instruction. (Phase 1+)
Test — automated regression testing of screen flows and batch jobs. (Phase 5)

Non-goals

Real-time multi-user collaboration. Single-user, single-session.
Production deployment. This is a learning/portfolio project against ADCD.
Offensive security tooling. No brute force, no fuzzing, no protected-field tampering.
Bypassing mainframe security. Every action is taken as the logged-in user with their normal RACF permissions.

2. Target environment

Mainframe

| Property | Value |
|----------|-------|
| Type | IBM ADCD (Application Developer's Controlled Distribution) |
| z/OS version | 1.10 |
| Host | 147.93.154.32 (public learning instance) |
| TN3270 port | 23 (plain, no TLS available) |
| Authentication | Plain RACF userid/password |
| Default test user | ADCDB / TEST (publicly documented defaults) |
| z/OSMF available? | No (z/OS 1.10 predates z/OSMF) |
| File transfer | Plain FTP (z/OS FTP server on standard port 21) |

Workstation

| Property | Value |
|----------|-------|
| OS | Windows 11 (primary), should also work on Linux/macOS |
| Python | 3.11.9 (confirmed installed) |
| Working directory | C:\mainframe-mcp |
| Virtual environment | .venv inside the working directory |
| 3270 client | wc3270 (Windows distribution of x3270); provides s3270.exe |
| AI client | Claude Code (Anthropic CLI) |

Why these choices

ADCD over real corporate mainframe: no compliance constraints, free to experiment.
Plain FTP over Zowe REST: z/OS 1.10 has no z/OSMF, so Zowe's REST mode is unavailable. FTP is built into z/OS and works against any version.
p3270 over py3270: py3270 has not been updated in over a year and is effectively unmaintained. p3270 is actively maintained, has a cleaner API, and wraps the same s3270 binary.

3. Tech stack

Exact versions

Python              >= 3.11.0
mcp                 >= 1.27.0      # Official MCP SDK with FastMCP
p3270               latest          # TN3270 wrapper around s3270
keyring             latest          # OS credential store
pyyaml              latest          # Config file parsing
# Standard library: ftplib, sqlite3, subprocess, pathlib, logging

System requirements

s3270.exe (from wc3270) on PATH
git on PATH
Node.js + @anthropic-ai/claude-code for the AI client

Why each dependency

| Dependency | Purpose | Alternatives considered |
|------------|---------|--------------------------|
| mcp | MCP protocol implementation | fastmcp standalone (rejected: extra dependency with no benefit for this scope) |
| p3270 | TN3270 client | py3270 (unmaintained), robotframework-mainframe3270 (kept as Phase 5 option for full test scripts) |
| keyring | Credential storage | .env file (rejected: secrets in plaintext) |
| pyyaml | Config | TOML (acceptable alternative, YAML chosen for familiarity) |
| ftplib (stdlib) | File transfer to z/OS | Zowe CLI (not available without z/OSMF) |
| sqlite3 (stdlib) | Audit log | JSON lines (rejected: harder to query) |

4. Architecture

Diagram

┌─────────────────────┐
│   Claude Code CLI   │
│   (your terminal)   │
└──────────┬──────────┘
           │ stdio / JSON-RPC (MCP protocol)
           │
   ┌───────┴────────┐
   │                │
┌──┴──────────┐  ┌──┴───────────┐  ┌──────────────┐
│ read MCP    │  │ write MCP    │  │ test MCP     │
│ (always on) │  │ (opt-in)     │  │ (opt-in)     │
└──┬──────────┘  └──┬───────────┘  └──┬───────────┘
   │                │                  │
   │                │                  │
   ├────────────────┴──────────────────┤
   │                                   │
   ▼                                   ▼
┌──────────────────┐         ┌──────────────────┐
│  s3270 process   │         │ FTP client       │
│  (TN3270 stream) │         │ (file transfer)  │
└────────┬─────────┘         └────────┬─────────┘
         │ port 23                    │ port 21
         │                            │
         └────────────┬───────────────┘
                      ▼
         ┌────────────────────────┐
         │  ADCD z/OS 1.10        │
         │  147.93.154.32         │
         │  (TSO, ISPF, JES, PDS) │
         └────────────────────────┘

Three MCP servers, not one

The plugin is implemented as three separate MCP server processes, each registered independently with Claude Code:

mainframe-read — always loaded. Read-only tools: screens, datasets, job output, navigation (keystrokes that don't write data).
mainframe-write — loaded only when MAINFRAME_MODE=WRITE is set. Modifies PDS members, submits jobs, deletes things. All operations gated by safety rules.
mainframe-test — loaded only when MAINFRAME_MODE=TEST is set. Captures baselines, runs test scripts, compares results.

Why three instead of one:

The write tools literally do not exist in the read server, so they cannot be called by accident.
Different sessions can run different modes (read-only by default, write only when intentional).
Smaller per-server tool surfaces are easier for the AI to use correctly.

Session model

Each server process maintains one persistent TN3270 session to the mainframe. Sessions are:

Opened on first tool call (lazy connect) or explicitly via connect().
Health-checked before every tool call via a fast ping.
Auto-reconnected on broken pipe (handles mainframe-side timeouts).
Closed cleanly on server shutdown via atexit.
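
The lifecycle above can be sketched as a small wrapper. This is a minimal illustration, not the project's MainframeSession: the `ManagedSession` class and the `opener` factory (anything returning an object with `ping()` and `close()`) are assumptions made for the example.

```python
import atexit


class ManagedSession:
    """Keeps at most one live connection and replaces it when a health check fails."""

    def __init__(self, opener):
        # `opener` is a hypothetical factory returning an object with ping() and close().
        self._opener = opener
        self._conn = None
        atexit.register(self.close)  # clean shutdown when the server process exits

    def ensure_connected(self):
        """Call before every tool: lazy connect, fast ping, reconnect on failure."""
        if self._conn is not None:
            try:
                self._conn.ping()
                return self._conn
            except ConnectionError:  # BrokenPipeError is a subclass of ConnectionError
                self._conn = None
        self._conn = self._opener()
        return self._conn

    def close(self):
        """Close the current connection, if any, and forget it."""
        if self._conn is not None:
            try:
                self._conn.close()
            finally:
                self._conn = None
```

Nothing is opened until the first `ensure_connected()` call, which is what makes the connect lazy; a failed ping simply drops the stale connection and opens a fresh one.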

Code-writing pattern (important)

When Claude edits a member, the flow is:

1. read_member() pulls PDS member → workspace/COBOL/PAY001.cbl
2. Claude reads the local file, proposes changes
3. edit_member() writes new content → workspace/COBOL/PAY001.cbl
4. Auto-stage in git: `git add workspace/COBOL/PAY001.cbl`
5. Show diff to user; require confirmation for sensitive datasets
6. Upload to PDS via FTP: STORE workspace/COBOL/PAY001.cbl → ADCDB.SOURCE.COBOL(PAY001)
7. Commit: `git commit -m "Claude: <description>"`

From the user's perspective, the code appears in their ISPF emulator after a refresh (F5). No copy-paste. Git is the safety net — every change is versioned locally before it touches the mainframe.

5. Safety model

These rules are non-negotiable. They are enforced in code, not just in documentation. Tools that violate these rules must refuse to execute even if the AI passes "correct"-looking arguments.

Three-tier autonomy

Every potentially-modifying action is classified into one of three tiers:

| Tier | Behavior | Examples |
|------|----------|----------|
| Auto-apply | Tool executes, logs the action, returns result. | Whitespace fixes, comment additions, lint fixes, JCL syntax corrections. |
| Confirm-then-apply | Tool returns a diff and waits. Requires the user to issue an explicit confirmation tool call. | Logic changes, new validation rules, JCL parameter changes. |
| Suggest-only | Tool writes a proposal file to proposals/ and returns a summary. No mainframe write happens. User applies manually if desired. | Anything touching financial calculations, security exits, regulated/audit code. |

Implementation rule: The tier of each operation is determined by a deterministic function classify_change(target, change_type, diff) in safety/classifier.py, NOT by the AI's self-reported confidence.
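
A minimal sketch of such a deterministic classifier, following the initial rules listed under the Phase 3 deliverables (suggest-only patterns first, whitespace-only or comment-only changes as auto, everything else confirm). Several things here are assumptions for illustration: the `Tier` enum values, the example `SUGGEST_ONLY_PATTERNS`, passing old and new text instead of a diff object, and the fixed-format COBOL comment test. The JCL-syntax-fix rule is left out.

```python
from enum import Enum
from fnmatch import fnmatchcase


class Tier(Enum):
    AUTO = "auto-apply"
    CONFIRM = "confirm-then-apply"
    SUGGEST = "suggest-only"


# Hypothetical patterns; the real list would come from safety.suggest_only_patterns.
SUGGEST_ONLY_PATTERNS = ["*.SECURITY.*", "*.AUDIT.*"]


def _is_comment(line: str) -> bool:
    # Fixed-format COBOL marks a comment with '*' or '/' in column 7.
    return len(line) >= 7 and line[6] in "*/"


def _normalized(text: str) -> list[str]:
    """Drop blank and comment lines, then collapse runs of whitespace."""
    return [" ".join(line.split()) for line in text.splitlines()
            if line.strip() and not _is_comment(line)]


def classify_change(target: str, change_type: str, old_text: str, new_text: str) -> Tier:
    # change_type is accepted for signature parity but is not trusted for tiering here.
    dataset = target.upper()
    if any(fnmatchcase(dataset, pattern) for pattern in SUGGEST_ONLY_PATTERNS):
        return Tier.SUGGEST
    if _normalized(old_text) == _normalized(new_text):
        return Tier.AUTO  # only whitespace or comments changed
    return Tier.CONFIRM
```

The tier depends only on the target name and the text before and after, so the same inputs always yield the same tier regardless of what the AI reports. Collapsing whitespace would also hide spacing changes inside string literals; a stricter variant would need to treat literals separately.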

Hard rules (always enforced)

Read-only by default. The write server only loads when MAINFRAME_MODE=WRITE is explicitly set AND the --allow-writes flag is passed at startup.
Dataset scope is enforced. Any write tool calls permissions.check_write_allowed(dataset) as its first line. If the dataset matches any pattern in scope.forbidden_patterns or doesn't match any in scope.allowed_datasets, the tool raises PermissionDenied and refuses.
Sensitive patterns are read-only forever. Patterns like SYS1.*, *.LINKLIB, *.PROD.* (configurable) cannot be written to even in write mode.
Snapshot before write. Before any modification of a PDS member, the current content is downloaded to workspace/<dataset>/<member> and committed to git. If git commit fails, the write does not proceed.
Abend stops automation. If check_for_errors() detects an abend on the current screen, the next 5 tool calls auto-refuse with a message asking the user to investigate. (Prevents runaway loops on broken sessions.)
Audit log every tool call. Every invocation of any tool — read or write — writes a row to logs/audit.sqlite with timestamp, userid, tool name, arguments (passwords redacted), and outcome.
Rate limits. Default 60 tool calls per minute, configurable. Excess calls block until the window clears.
No credential exposure. Passwords are read from OS keychain. They never appear in tool arguments, return values, log output, or screen captures.
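
The dataset-scope rule can be sketched as follows. This is a hedged illustration: it assumes shell-style glob matching via the standard library's `fnmatch`, which the document does not specify, and it hard-codes the pattern lists from config.example.yaml rather than loading them.

```python
from fnmatch import fnmatchcase


class PermissionDenied(Exception):
    """Raised when a write targets a dataset outside the configured scope."""


# Values copied from config.example.yaml; a real implementation would load them.
ALLOWED = ["ADCDB.*", "STUDENT.*"]
FORBIDDEN = ["SYS1.*", "SYS2.*", "*.LINKLIB", "*.LPALIB", "ADCD.*"]
READ_ONLY = ["*.PROD.*"]


def check_write_allowed(dataset, allowed=ALLOWED, forbidden=FORBIDDEN, read_only=READ_ONLY):
    """Return None if the write may proceed, otherwise raise PermissionDenied."""
    # Strip a member suffix such as (PAY001) and normalise case before matching.
    name = dataset.split("(", 1)[0].strip().upper()
    for pattern in list(forbidden) + list(read_only):
        if fnmatchcase(name, pattern):
            raise PermissionDenied(f"{name} matches protected pattern {pattern}")
    if not any(fnmatchcase(name, pattern) for pattern in allowed):
        raise PermissionDenied(f"{name} is not in any allowed scope")
    return None
```

Deny patterns are checked before the allow list, so a dataset such as ADCDB.PROD.COBOL is refused even though it also matches ADCDB.*.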

Escalation protocol

When a tool would land in confirm-then-apply tier, it returns a structured response Claude shows to the user:

PROPOSED CHANGE: edit_member ADCDB.SOURCE.COBOL(PAY001)
TIER: confirm-then-apply
REASON: Modifies logic inside PROCEDURE DIVISION (not whitespace/comments only).

DIFF:
- 0010     MOVE EMP-ID TO WS-LOOKUP-KEY
+ 0010     IF EMP-ID IS NUMERIC
+ 0011         MOVE EMP-ID TO WS-LOOKUP-KEY
+ 0012     ELSE
+ 0013         DISPLAY 'INVALID EMP-ID'
+ 0014         GOBACK
+ 0015     END-IF

CONFIRM by calling: confirm_change(token="<token>")
REJECT by ignoring or calling: reject_change(token="<token>")

The token is single-use and expires in 5 minutes.
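
One way to get single-use, time-limited tokens is an in-memory map that is popped on redemption. The sketch below is illustrative only: the `ConfirmationTokens` class, its method names, and the injectable `clock` are assumptions, not the contents of safety/tokens.py.

```python
import secrets
import time


class ConfirmationTokens:
    """In-memory store of pending changes keyed by a short random token."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested without sleeping
        self._pending = {}

    def issue(self, change):
        """Register a pending change and return the token that confirms it."""
        token = secrets.token_urlsafe(8)
        self._pending[token] = (self._clock() + self._ttl, change)
        return token

    def redeem(self, token):
        """Return the pending change for a valid token; each token works once."""
        entry = self._pending.pop(token, None)  # pop makes the token single-use
        if entry is None:
            raise KeyError("unknown or already used token")
        expires_at, change = entry
        if self._clock() > expires_at:
            raise TimeoutError("token expired")
        return change
```

Because the entry is removed before the expiry check, an expired token is also consumed and cannot be retried.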

What this means for the AI assistant

Claude should:

Always identify the dataset being targeted before any write.
Surface diffs to the user before confirming any change.
Stop and ask the user when uncertain about a fix's impact.
Never claim a change was applied if a tool returned PermissionDenied or RequiresConfirmation.

6. Project structure

mainframe-mcp/
├── README.md                       # This file (the build spec)
├── LICENSE                         # AGPL-3.0
├── pyproject.toml                  # Package metadata + dependencies
├── requirements.txt                # Pinned deps for reproducibility
├── .gitignore                      # Excludes .venv, logs/, secrets, workspace/
├── .env.example                    # Template env file for users
├── config.example.yaml             # Template config
│
├── src/mainframe_mcp/
│   ├── __init__.py
│   ├── config.py                   # YAML + env var loader, keyring access
│   ├── audit.py                    # SQLite audit log (append-only)
│   │
│   ├── core/
│   │   ├── __init__.py
│   │   ├── session.py              # Persistent TN3270 session wrapper around p3270
│   │   ├── ftp_client.py           # FTP wrapper for dataset transfer
│   │   ├── screen_parser.py        # Row-numbered formatting, signature detection
│   │   └── exceptions.py           # PermissionDenied, RequiresConfirmation, etc.
│   │
│   ├── safety/
│   │   ├── __init__.py
│   │   ├── permissions.py          # check_write_allowed(), scope rules
│   │   ├── classifier.py           # classify_change() — three-tier logic
│   │   ├── abend.py                # check_for_errors(), abend code list
│   │   └── rate_limiter.py         # Token-bucket rate limiting
│   │
│   ├── servers/
│   │   ├── __init__.py
│   │   ├── read_server.py          # mainframe-read MCP server
│   │   ├── write_server.py         # mainframe-write MCP server
│   │   └── test_server.py          # mainframe-test MCP server
│   │
│   └── tools/
│       ├── __init__.py
│       ├── screen_tools.py         # get_screen, find_text, get_text_at, ...
│       ├── nav_tools.py            # send_enter, send_pf, wait_for_text, ...
│       ├── dataset_tools.py        # list_datasets, read_member, ...
│       ├── job_tools.py            # submit_jcl, check_job, fetch_sysout, ...
│       ├── write_tools.py          # edit_member, create_member, ...
│       └── test_tools.py           # capture_baseline, run_test, ...
│
├── workspace/                      # Local git repo of downloaded PDS members
│   └── .gitkeep
│
├── proposals/                      # Suggest-only tier writes proposals here
│   └── .gitkeep
│
├── logs/                           # SQLite audit log + debug logs
│   └── .gitkeep
│
├── tests/                          # pytest suite
│   ├── __init__.py
│   ├── conftest.py
│   ├── test_config.py
│   ├── test_session.py             # uses recorded screens, no live mainframe
│   ├── test_permissions.py
│   └── fixtures/
│       └── screens/
│
└── docs/
    ├── INSTALL.md                  # User-facing install guide
    ├── USAGE.md                    # Example Claude Code prompts
    └── DEVELOPMENT.md              # How to extend / contribute

7. Configuration

Sources of truth, in priority order

Command-line flags (highest priority)
Environment variables
config.yaml in working directory
Built-in defaults (lowest priority)
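
The priority order can be expressed with `collections.ChainMap`, where the first mapping that defines a key wins. This is a sketch under simplifying assumptions: settings are flattened to single-level keys, the `ENV_KEYS` mapping and the `mode` flag are hypothetical, and the YAML file is assumed to be already parsed into `file_values`.

```python
from collections import ChainMap

DEFAULTS = {"host": "147.93.154.32", "port": 23, "mode": "READ"}

# Environment variable name -> flat setting key (flat keys are a simplification).
ENV_KEYS = {"MAINFRAME_HOST": "host", "MAINFRAME_MODE": "mode"}


def env_overrides(environ):
    """Pick out only the environment variables that are actually set."""
    return {key: environ[var] for var, key in ENV_KEYS.items() if var in environ}


def resolve_settings(cli_flags, environ, file_values, defaults=DEFAULTS):
    """First mapping that defines a key wins: flags, env, config.yaml, defaults."""
    set_flags = {k: v for k, v in cli_flags.items() if v is not None}
    return dict(ChainMap(set_flags, env_overrides(environ), file_values, defaults))
```

Flags left at `None` are dropped first so that an unset flag does not mask a value from a lower-priority source.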

`config.example.yaml`

mainframe:
  host: 147.93.154.32
  port: 23
  tls: false
  model: "3279-2"
  code_page: cp037                 # ADCD default
  s3270_path: "C:\\Program Files\\wc3270\\"

session:
  connect_timeout: 30              # seconds
  keystroke_delay: 0.3             # seconds between sends
  screen_change_timeout: 10        # default wait timeout

scope:
  # Allowed dataset patterns (write tools refuse anything not matching)
  allowed_datasets:
    - "ADCDB.*"
    - "STUDENT.*"
  # Forbidden patterns (never written, even if otherwise allowed)
  forbidden_patterns:
    - "SYS1.*"
    - "SYS2.*"
    - "*.LINKLIB"
    - "*.LPALIB"
    - "ADCD.*"                     # ADCD's own system datasets
  # Read-only-mandatory patterns (subset of forbidden, more explicit)
  read_only_patterns:
    - "*.PROD.*"

safety:
  default_mode: READ               # READ | WRITE | TEST
  require_confirmation_token_minutes: 5
  abend_lockout_calls: 5           # auto-refuse N calls after abend
  audit_log_path: "./logs/audit.sqlite"
  audit_retention_days: 30

rate_limits:
  calls_per_minute: 60
  burst: 10

git:
  workspace_path: "./workspace"
  auto_commit: true
  commit_author_name: "mainframe-mcp"
  commit_author_email: "noreply@local"

logging:
  level: INFO                      # DEBUG | INFO | WARNING | ERROR
  path: "./logs/mainframe-mcp.log"

Environment variables

| Variable | Purpose | Required |
|----------|---------|----------|
| MAINFRAME_HOST | Override mainframe.host | No |
| MAINFRAME_USERID | RACF userid for this session | Yes |
| MAINFRAME_MODE | READ \| WRITE \| TEST | No (defaults to READ) |
| MAINFRAME_CONFIG | Path to config.yaml | No (defaults to ./config.yaml) |
| MAINFRAME_DEBUG | 0 \| 1 | No |

Secret storage

The RACF password is stored in the OS keychain via the keyring library:

import getpass
import os
import keyring
keyring.set_password("mainframe-mcp", os.environ["MAINFRAME_USERID"], getpass.getpass("RACF password: "))

Set it once with a small helper script (scripts/set_password.py); never type it in code, prompts, or config files.

7a. Known environmental requirements (host-side)

The plugin runs against IBM z/OS hosts in general, but certain JES2 / TSO / RACF defaults are not universal. The items below were discovered during live verification on the ADCD z/OS 1.10 instance at 147.93.154.32:23 and apply to any host with similar JES2 / RACF configuration. Plugin code handles them; this section documents them so future operators (and future maintainers) know what to expect.

7a.1 JOB-card jobname must start with the submitter's userid

ADCD's RACF JES2 rule rejects any job whose JOB-card jobname does not begin with the userid running the submit, with:

IKJ56328I JOB <jobid> REJECTED - JOBNAME MUST BE YOUR USERID
                                 OR MUST START WITH YOUR USERID

The rejection happens at JCL conversion time, after JES2 has already assigned a job id, so the symptom in logs is a submitted job that reaches NOT FOUND status almost immediately and never produces a step-execution spool. Code handling:

For internally-generated jobnames (make_jobname() used by the IEBGENER / IEBUPDTE write transport and by Phase 3/4 verify bootstrap), the userid is passed in and prefixed onto the jobname. See core/jcl_writer.py and tests/test_jcl_writer.py userid cases.
For user-supplied JCL members (Phase 4 submit_jcl(pds_member)), the JCL author is responsible for the inner JOB-card jobname. Use a jobname that starts with your userid (e.g. //ADCDCH JOB ... for userid ADCDC). The plugin cannot edit user JCL on submit.

7a.1a TSO STATUS/OUTPUT/CANCEL require jobname = userid + 1 char

A stricter, related rule surfaced during Rung-1 live verification (bug #8). Even when a jobname starts with the userid, TSO STATUS / OUTPUT / CANCEL on ADCD only operate on a job whose jobname is the userid plus exactly ONE character. A longer suffix (e.g. ADCDC + 2 chars = ADCDCHW, or ADCDC + 3 chars = ADCDC123) is accepted by JES2 at submit (IKJ56250I ... SUBMITTED), but those TSO commands then reject it with the same IKJ56328I ... JOBNAME MUST BE YOUR USERID OR MUST START WITH YOUR USERID message, so job monitoring (check_job / fetch_sysout / cancel_job) silently fails to find a job that actually ran.

Why it was masked until Rung-1: write verification uses LISTDS (member existence), not TSO OUTPUT — so the write path (create_member / edit_member) never depended on a queryable jobname. Only the Phase 4 monitoring path, exercised end-to-end for the first time in Rung-1, hits the TSO STATUS/OUTPUT restriction. JES2's aggressive spool purge (§7a.2) further hid it in earlier partial runs by making OUTPUT return nothing regardless.

Fix: make_jobname(userid) now emits userid + exactly 1 char (36-value 0-9A-Z suffix), total ≤ 8 — e.g. ADCDC → ADCDCH. The userid portion is truncated to 7 for long userids so userid+1 never exceeds the 8-char slot. User-supplied JCL members must follow the same rule to be monitorable. Pinned by test_jcl_writer.py userid+1char cases.
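
The naming rule can be sketched in a few lines. How the real make_jobname() picks the suffix character is not stated here, so the `sequence` parameter below is a hypothetical stand-in that simply indexes into the 36-character alphabet.

```python
import string

# 0-9 followed by A-Z: the 36 possible suffix characters.
_SUFFIX_ALPHABET = string.digits + string.ascii_uppercase


def make_jobname(userid: str, sequence: int = 0) -> str:
    """Return userid + exactly one suffix character, never longer than 8 characters."""
    prefix = userid.strip().upper()[:7]  # leave room for the one-character suffix
    if not prefix:
        raise ValueError("userid must not be empty")
    return prefix + _SUFFIX_ALPHABET[sequence % len(_SUFFIX_ALPHABET)]
```

With this alphabet, index 17 is the letter H, so `make_jobname("ADCDC", 17)` yields ADCDCH.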

7a.2 Aggressive JES2 spool purge on z/OS 1.10

ADCD's JES2 purges job spool quickly after job termination, often before the plugin can issue its OUTPUT jobid retrieval command, even when the request immediately follows the ON OUTPUT QUEUE state. The plugin handles this with a two-layer verification contract (see §14.8 "Write verification semantics"): host-state read-back via member_exists() is the ground truth for write operations, and the JES2 RC is treated as a best-effort diagnostic only. Operators should not be alarmed by messages such as "note: JES spool was unreadable; success confirmed by read-back probe"; that is the verified-by-host-state success path.

7a.3 Long-lived TSO sessions and ADCD's session table

An unclean disconnect leaves a TSO session dangling. ADCD reclaims dangling sessions on its default TSO timeout (typically 15-30 min) or via the operator console command F TSO,USER=<userid>,LOGOFF. The plugin's force_cleanup() tool tries an in-band recovery first; if multiple orphans accumulate (e.g. during an iterative debug session), the session table can saturate and new logons return IKJ56425I LOGON rejected, UserId <userid> already logged on. In that case, wait for the timeout or use the console MODIFY command; there is no other in-band escape.

8. Implementation phases

Each phase is independently shippable. Do not start a phase until the prior phase's acceptance criteria pass.

Phase 0 — Environment verification (already done)

Confirmed:

Python 3.11.9 installed
wc3270 / s3270.exe to be installed and verified by user
Network reach to 147.93.154.32:23 confirmed via VISTA emulator
Claude Code installed and signed in

Acceptance criteria: Smoke test script (provided separately) prints the ADCD welcome screen to stdout when run.

Phase 1 — Read-only TN3270 MCP server

Goal: Claude Code can log in to the mainframe, read screens, navigate ISPF, and report what it sees.

Deliverables:

Project skeleton. Create directory structure under src/mainframe_mcp/ exactly as specified in section 6. Empty __init__.py files where needed.
pyproject.toml with declared dependencies.
config.py — load YAML, layer env vars, expose typed config object.

audit.py — SQLite-backed append-only log with schema:

CREATE TABLE audit (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  timestamp TEXT NOT NULL,
  userid TEXT NOT NULL,
  tool_name TEXT NOT NULL,
  arguments_json TEXT NOT NULL,   -- passwords redacted
  outcome TEXT NOT NULL,           -- 'ok' | 'error' | 'refused'
  details TEXT
);

core/session.py — MainframeSession class wrapping p3270.P3270Client:
- connect() — opens s3270 subprocess, connects to host.
- login(userid) — fetches password from keyring, sends VTAM logon sequence, waits for TSO READY.
- ensure_connected() — health-check + auto-reconnect.
- disconnect() — clean shutdown.
- get_raw_screen() — current screen as 24x80 text.
- send(text) — type into current field.
- send_aid(key) — send ENTER/PFn/PAn/CLEAR.
- wait_for_text(pattern, timeout) — poll until match or timeout.
- wait_for_change(timeout) — poll until screen differs.
core/screen_parser.py — format_screen() returns row-numbered text matching hack3270's {i+1:2}| {line} format; identify_screen() checks for known signatures and returns a screen ID.
safety/abend.py — list of abend codes (S0C7, S0C4, ASRA, AICA, AEY7, APCT, ASRB, AEXL, DFHAC2, ABEND, plus generic Sxxx and Uxxxx patterns). Function detect_abend(screen_text) -> Optional[str].
safety/rate_limiter.py — token-bucket implementation, blocking variant.
tools/screen_tools.py — get_screen, find_text, get_text_at, analyze_screen_fields, check_for_errors.
tools/nav_tools.py — send_enter, send_pf (1-24), send_pa (1-3), send_clear, send_keys, wait_for_text, wait_for_screen_change.
servers/read_server.py — FastMCP("mainframe-read", instructions=...). Register all read + nav tools. Define _ensure_connected() helper that wraps every tool. Include connect, login, disconnect, status as session-management tools.
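
As an illustration of the blocking token-bucket deliverable above, here is a minimal sketch. It assumes that `calls_per_minute` sets the refill rate and `burst` sets the bucket capacity, which is one plausible reading of the rate_limits config and not a statement of how safety/rate_limiter.py works; the injectable `clock` and `sleep` exist only to make the example testable.

```python
import time


class TokenBucket:
    """Blocking token bucket: refills continuously, holds at most `burst` tokens."""

    def __init__(self, calls_per_minute=60, burst=10, clock=time.monotonic, sleep=time.sleep):
        self._rate = calls_per_minute / 60.0   # tokens added per second
        self._capacity = float(burst)
        self._tokens = float(burst)            # start with a full bucket
        self._clock = clock
        self._sleep = sleep
        self._last = clock()

    def _refill(self):
        now = self._clock()
        self._tokens = min(self._capacity, self._tokens + (now - self._last) * self._rate)
        self._last = now

    def acquire(self):
        """Block until one token is available; return the seconds spent waiting."""
        waited = 0.0
        self._refill()
        while self._tokens < 1.0:
            delay = (1.0 - self._tokens) / self._rate
            self._sleep(delay)
            waited += delay
            self._refill()
        self._tokens -= 1.0
        return waited
```

A tool wrapper would call `acquire()` before doing any work, so a burst of calls drains the bucket and later calls wait for it to refill.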

Acceptance criteria:

claude mcp add mainframe-read -- python -m mainframe_mcp.servers.read_server succeeds.
In Claude Code, asking "connect to the mainframe as ADCDB and tell me what's on the screen" results in Claude calling connect, login, then get_screen, and returning a sensible description of the ADCD welcome banner.
Asking "navigate to ISPF option 3.4" results in Claude sending ISPF, ENTER, 3.4, ENTER, and reporting the dataset list panel.
Audit log logs/audit.sqlite contains rows for every tool call with no plaintext passwords.
Disconnecting and reconnecting works without restarting the MCP server.
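
The audit-log criterion above can be exercised offline. The sketch below reuses the audit table schema from the Phase 1 deliverables; the `redact()` and `record()` helpers and the list of sensitive key fragments are assumptions for illustration, not the actual audit.py API.

```python
import json
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS audit (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  timestamp TEXT NOT NULL,
  userid TEXT NOT NULL,
  tool_name TEXT NOT NULL,
  arguments_json TEXT NOT NULL,
  outcome TEXT NOT NULL,
  details TEXT
)
"""

# Hypothetical list of key fragments treated as secrets.
SENSITIVE = ("password", "passwd", "secret")


def redact(arguments):
    """Replace the value of any argument whose name looks like a secret."""
    return {key: "***" if any(s in key.lower() for s in SENSITIVE) else value
            for key, value in arguments.items()}


def record(conn, userid, tool_name, arguments, outcome, details=None):
    """Append one audit row; outcome is 'ok', 'error', or 'refused'."""
    conn.execute(
        "INSERT INTO audit (timestamp, userid, tool_name, arguments_json, outcome, details) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), userid, tool_name,
         json.dumps(redact(arguments)), outcome, details),
    )
    conn.commit()
```

Redaction happens before serialisation, so a secret never reaches the arguments_json column even if a caller passes one by mistake.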

Phase 2 — Dataset reads (dual transport: ISPF scrape + FTP)

Goal: Browse PDS contents and read members. Two independent transports are implemented because most public ADCD instances only expose TN3270 (port 23) — the FTP port is firewalled and unreachable. The active transport is chosen by config.dataset.transport.

| Transport | When to use | Trade-offs |
|-----------|-------------|------------|
| ispf (default) | Any reachable host; works against the public ADCD instance | Slower, scrapes ISPF 3.4 and View panels via TN3270, brittle to ISPF panel changes |
| ftp | Local Hercules / corporate z/OS where port 21 is open | Fast, structured responses, but requires reachable z/OS FTP server |

If transport: ftp is configured, the FTP control port is unreachable, and dataset.fallback_to_ispf: true (default), the dispatcher logs a warning and falls back to the ISPF scrape transport for the remainder of the process. Set fallback_to_ispf: false to hard-fail instead.
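
The fallback policy can be sketched as a probe plus a small decision function. `is_ftp_reachable(host, port, timeout)` is named in the deliverables below, but its body here is an assumption (a plain TCP connect), and `resolve_transport()` is a hypothetical helper; a real dispatcher would call it once and cache the result for the life of the process.

```python
import logging
import socket

log = logging.getLogger("mainframe_mcp.dataset_transport")


def is_ftp_reachable(host, port=21, timeout=3.0):
    """Return True if a TCP connection to the FTP control port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def resolve_transport(configured, host, fallback_to_ispf=True, probe=is_ftp_reachable):
    """Pick the transport to use, applying the FTP-unreachable fallback policy."""
    if configured != "ftp":
        return configured
    if probe(host, 21, 3.0):
        return "ftp"
    if fallback_to_ispf:
        log.warning("FTP port 21 on %s is unreachable; falling back to ISPF scrape", host)
        return "ispf"
    raise ConnectionError(f"FTP port 21 on {host} is unreachable and fallback is disabled")
```

Passing the probe in as a parameter keeps the policy testable without any network access.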

Deliverables:

core/ftp_client.py — MainframeFTP class:
- connect() — uses same host as TN3270, port 21, same RACF credentials.
- list_datasets(pattern) / list_members(pds) / read_member(pds_member) / get_dataset_info(name).
- Module-level is_ftp_reachable(host, port, timeout) for the dispatcher's probe.
core/ispf_dataset.py — IspfDatasetAdapter exposing the same surface via ISPF 3.4 navigation and View-panel scraping. Reuses the existing MainframeSession.
core/dataset_transport.py — DatasetDispatcher resolves the active backend on first use, performs the FTP reachability probe, and applies the fallback policy.
tools/dataset_tools.py — list_datasets, list_members, read_member, get_dataset_info. All read-only; no scope restrictions on reads.
Register tools in servers/read_server.py.

Acceptance criteria:

With transport: ispf (default) against the live ADCD host:
- "List datasets matching ADCDC.*" returns at least the user's own datasets.
- "List the members of ADCD.Z110S.PROCLIB" returns a member list.
- "Read PROCLIB(ISPFPROC) and explain it" returns the JCL text.
- Reading a non-existent member returns a clean error string that starts with "ERROR: member not found:", not a crash.
With transport: ftp against a host with port 21 open, the same prompts work via FTP.
With transport: ftp against a host with port 21 closed and fallback_to_ispf: true, the dispatcher logs a warning and the same prompts still succeed via the ISPF path.

Phase 3 — Write operations with Git + FTP + three-tier autonomy

Status: ✅ Code complete and live-verified against ADCD z/OS 1.10 at 147.93.154.32:23 with userid ADCDC (commit-tagged phase-3-live-verified). 239 unit tests pass. 8/8 live acceptance checks pass — see PHASE3_READINESS.md for the completion record and the per-case results.

Goal: Claude can edit code, with full safety scaffolding.

Deliverables:

safety/permissions.py — check_write_allowed(dataset) -> None | raise PermissionDenied. Matches against scope.allowed_datasets, scope.forbidden_patterns, scope.read_only_patterns. Tested exhaustively.
safety/classifier.py — classify_change(target, change_type, diff) -> Tier. Returns one of AUTO, CONFIRM, SUGGEST. Initial rules:
- SUGGEST if target matches any pattern in safety.suggest_only_patterns (configurable list including security exits, audit logging code).
- AUTO if diff is whitespace-only OR comment-only OR JCL syntax fix.
- Everything else → CONFIRM.
Git integration utilities — wrap subprocess.run("git ...") calls in core/git_helper.py. Functions: init_workspace_repo(), stage_file(), commit(message), diff(file), snapshot_before_write(dataset, member).
Confirmation token system — safety/tokens.py. Generate short tokens, store in-memory with 5-minute TTL, single-use.
tools/write_tools.py — edit_member, create_member, delete_member, confirm_change, reject_change, propose_change (suggest-only entry point).
servers/write_server.py — separate FastMCP("mainframe-write") server. Refuses to start unless MAINFRAME_MODE=WRITE AND --allow-writes flag are both present. Loads ALL read tools (shared) PLUS write tools.

Acceptance criteria:

✅ Read-mode session: asking Claude to edit anything results in "I don't have write tools available; please restart in write mode." (gate refusal verified at server start AND live: write_server exits 2 with REFUSED when MAINFRAME_MODE=READ).
✅ Write-mode session: editing a member in allow-listed scope works end-to-end (read → diff → confirm → JCL submit → host read-back verify → git commit), member updated on mainframe. Verified live against ADCDC.MCPTEST.SOURCE(NEWMEM1) on ADCD.
✅ Editing a member in SYS1.* is refused with PermissionDenied regardless of confirmation. Verified live (SYS1.PARMLIB(IEFSSN00) rejected before any host contact).
✅ Whitespace-only edits skip confirmation (auto-tier). Verified live.
✅ Logic edits return a diff + token; calling confirm_change(token) applies it. Verified live.
✅ Workspace git history shows one commit per applied change. Verified live (6 Claude commits in workspace after C/D/E run).

Phase 3 added a third write transport, jcl, that submits inline JCL via TSO SUBMIT * for hosts where FTP port 21 is firewalled. The default is now dataset.transport: jcl; ftp is available when port 21 is reachable. The plugin also supports both standard IBM ISPF/PDF (ADCD z/OS) and Rocket RFE (TK4-/TK5 MVS 3.8j) panel variants via signature-based detection — see core/ispf_dataset.py _DSLIST_TITLE_MARKERS / _DSLIST_FIELD_LABELS.

Phase 4 — Job submission and monitoring

Goal: Compile and run code.

Deliverables:

tools/job_tools.py — submit_jcl(pds_member), check_job(job_id), fetch_sysout(job_id, ddname="ALL"), list_my_jobs(), cancel_job(job_id, confirm_token) (write-mode only, confirm tier).
JCL submission is WRITE-tier because it consumes mainframe resources, but checking job status and reading SYSOUT is READ-tier.
Job IDs are tracked in a session-local list so Claude can refer to "the last job."

Status: Code complete, 260 unit tests passing. Live verification partial: 1/12 acceptance cases verified (READ-mode gate refusal); the remaining 11 cases are deferred pending ADCD session-table availability. See §14.1 for the saturation observation that drove the deferral.

Acceptance criteria:

✅ Submitting an ALLOCATE JCL succeeds and returns a job ID. (Code verified via 40 test_job_runner / test_job_tools unit cases; one live submit succeeded in the partial Phase 4 sweep.)
⏸ Checking status returns "RUNNING" then "OUTPUT" then return code. (Code: JobRunner.status returns raw TSO STATUS labels per §14.6; unit-tested. Live deferred.)
⏸ Fetching SYSOUT of a failed compile shows the error messages. (Code: fetch_sysout returns spool body + RC header; unit-tested. Live deferred.)
⏸ An abend in SYSOUT triggers detect_abend and is highlighted. (Code: _apply_change + fetch_sysout trip abend_state.trip() on abend codes; the bare-word path now requires a structural context marker so screen-scrollback occurrences of "ABEND" (e.g. a job whose jobname happens to be ABEND) don't false-trip. 12 new guard tests in test_abend.py. Live deferred.)

Phase 5 — Testing automation

Status: ✅ Code complete and offline-verified. scripts/phase5_verify.py passes 23/23 acceptance checks (offline / mock-session). Live ISPF flow capture deferred to a combined Phase 4+5 live sweep when ADCD session table is available.

Goal: Regression-test mainframe applications.

Deliverables:

tools/test_tools.py (5 tools, all READ-tier with respect to mainframe state — record/replay does not write to the host):
- capture_baseline(test_name, description="", ignore_patterns="") — appends one frame (last action + current screen) to the named baseline. Optional comma-separated regex ignore_patterns are merged into the baseline on every call (deduped, invalid regex silently skipped).
- run_test(test_name) — replays each recorded action and diffs the live screen against the recorded screen, honoring the baseline's ignore_patterns. Returns PASS: or FAIL: with a per-frame row diff.
- compare_screens(actual, baseline_name) — ad-hoc diff against the LAST frame of a stored baseline.
- list_baselines(), delete_baseline(name, confirm_token) — delete is two-call CONFIRM-tier (irreversible).
Baselines stored as YAML in tests/baselines/<test_name>.yaml (one file per test, human-diffable in git). Schema includes optional top-level ignore_patterns: [<regex>, ...] for masking.
servers/test_server.py — separate FastMCP("mainframe-test") server, loaded only when MAINFRAME_MODE=TEST. Audit log entries from this server are tagged so test-mode activity is separable from production read/write traffic.

Diff algorithm: line-by-line screen comparison after right-strip. When the baseline carries ignore_patterns, each regex match is replaced with same-length spaces in BOTH screens before the comparison — so e.g. a time-of-day field that drifts between runs doesn't surface as a diff. The returned diff tuples carry the original (unmasked) row text so operators see exactly what changed.
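
The masking diff described above can be sketched in a few lines. This is an illustration under assumptions, not the code in tools/test_tools.py: the function name diff_screens, the (row, baseline_text, actual_text) tuple shape, and the second right-strip after masking are choices made for the example.

```python
import re
from typing import Iterable


def _compile_valid(patterns: Iterable[str]) -> list[re.Pattern[str]]:
    """Compile patterns, silently skipping any invalid regex."""
    compiled = []
    for raw in patterns:
        try:
            compiled.append(re.compile(raw))
        except re.error:
            continue
    return compiled


def _mask(row: str, patterns: list[re.Pattern[str]]) -> str:
    """Replace each match with same-length spaces, then right-strip."""
    for pattern in patterns:
        row = pattern.sub(lambda m: " " * len(m.group(0)), row)
    return row.rstrip()


def diff_screens(actual: str, baseline: str,
                 ignore_patterns: Iterable[str] = ()) -> list[tuple[int, str, str]]:
    """Return (row_number, baseline_row, actual_row) for every differing row.

    Rows are compared after masking, but the tuples carry the unmasked text.
    """
    patterns = _compile_valid(ignore_patterns)
    actual_rows = [r.rstrip() for r in actual.splitlines()]
    baseline_rows = [r.rstrip() for r in baseline.splitlines()]
    diffs = []
    for index in range(max(len(actual_rows), len(baseline_rows))):
        a = actual_rows[index] if index < len(actual_rows) else ""
        b = baseline_rows[index] if index < len(baseline_rows) else ""
        if _mask(a, patterns) != _mask(b, patterns):
            diffs.append((index + 1, b, a))  # 1-based row numbers
    return diffs
```

With a pattern such as `\d\d:\d\d:\d\d`, a clock field that changes between runs produces no diff, while a change anywhere else on the same row still does.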

Acceptance criteria:

✅ Recording a 3-screen ISPF navigation flow creates a baseline file. (verified offline in scripts/phase5_verify.py case A.)
✅ Replaying it against the same target produces PASS:. (case B.)
✅ Replaying against an intentionally-changed screen produces a clear diff and FAIL:. (case C, with the changed token surfacing in the row diff output.)
✅ ignore_patterns mask a time-of-day field so an expected between-run drift is excluded; real divergence still surfaces. (case D — bonus criterion added per Phase 5 environmental note.)

9. Tool surface specification

Naming convention

All tools use snake_case. Group prefixes are NOT used in the function name (FastMCP exposes functions by their Python name). Grouping is done via module organization and clear docstrings.

Read server tools (always available)

Session management

def connect() -> str
def login(userid: str) -> str
def disconnect() -> str
def status() -> str
def reconnect() -> str

Screen reading

def get_screen() -> str
def find_text(pattern: str) -> str
def get_text_at(row: int, col: int, length: int = 80) -> str
def analyze_screen_fields() -> str
def identify_screen() -> str            # returns known screen_id or "UNKNOWN"
def check_for_errors() -> str           # abend detection

Navigation

def send_enter() -> str
def send_pf(number: int) -> str         # 1..24
def send_pa(number: int) -> str         # 1..3
def send_clear() -> str
def send_keys(text: str) -> str
def wait_for_text(pattern: str, timeout: float = 10.0) -> str
def wait_for_screen_change(timeout: float = 10.0) -> str

Datasets (read only)

def list_datasets(pattern: str) -> str
def list_members(pds: str) -> str
def read_member(pds_member: str) -> str
def get_dataset_info(dataset: str) -> str

Jobs (status only)

def list_my_jobs() -> str
def check_job(job_id: str) -> str
def fetch_sysout(job_id: str, ddname: str = "ALL") -> str

Write server tools (write mode only)

All of the read tools, PLUS:

def edit_member(pds_member: str, new_content: str, description: str) -> str
def create_member(pds_member: str, content: str, description: str) -> str
def delete_member(pds_member: str, confirm_token: str) -> str
def submit_jcl(pds_member: str) -> str
def cancel_job(job_id: str, confirm_token: str) -> str

def confirm_change(token: str) -> str
def reject_change(token: str) -> str
def list_pending_changes() -> str
def propose_change(target: str, description: str, proposed_diff: str) -> str

Test server tools (test mode only)

All read tools, PLUS:

def capture_baseline(test_name: str, description: str = "", ignore_patterns: str = "") -> str
def run_test(test_name: str) -> str
def compare_screens(actual: str, baseline_name: str) -> str
def list_baselines() -> str
def delete_baseline(test_name: str, confirm_token: str) -> str

Return value conventions

All tools return str. Even when returning structured data, format it as readable text. FastMCP exposes the string to the AI.
Action tools that modify state return the resulting screen (formatted, row-numbered). Example: send_enter() returns the new screen, not "ok."
Errors return a string starting with ERROR: followed by a short explanation, NOT a raised exception (raised exceptions are auto-converted by FastMCP but lose context).
Refusals return REFUSED: prefix with reason. Examples: REFUSED: dataset SYS1.PARMLIB matches forbidden pattern, REFUSED: write tools not loaded in READ mode.
Confirmation prompts return CONFIRM: prefix followed by diff + token instructions.
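
Because every tool returns a plain string, a caller (or a test) can recover the outcome from the prefix alone. The helper below is a hypothetical sketch of that convention; Outcome and classify_result are not part of the plugin's tool surface.

```python
from enum import Enum


class Outcome(Enum):
    OK = "OK"
    ERROR = "ERROR"
    REFUSED = "REFUSED"
    CONFIRM = "CONFIRM"


# Prefixes taken from the return value conventions above.
_PREFIXES = {
    "ERROR:": Outcome.ERROR,
    "REFUSED:": Outcome.REFUSED,
    "CONFIRM:": Outcome.CONFIRM,
}


def classify_result(text: str) -> tuple[Outcome, str]:
    """Split a tool result into (outcome, body); unprefixed text counts as OK."""
    for prefix, outcome in _PREFIXES.items():
        if text.startswith(prefix):
            return outcome, text[len(prefix):].lstrip()
    return Outcome.OK, text
```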

10. Coding conventions

General

Python 3.11+ features OK. Use match/case, type aliases, dataclass(slots=True).
Type hints on every function. FastMCP uses them to generate tool schemas.
Docstrings on every tool. First line is a short description (becomes the tool description in MCP). Following lines explain arguments and return value. Claude reads these — write them for an AI audience.
No bare except:. Catch specific exceptions; let real bugs surface.
Logging, not print. Use the logging module configured in mainframe_mcp/__init__.py.

Tool function pattern

Every tool follows this skeleton:

@mcp.tool()
def some_action(arg1: str, arg2: int = 0) -> str:
    """One-line description for the AI.

    Longer explanation if needed. Mention edge cases the AI should know about.

    Args:
        arg1: What this is.
        arg2: What this is, default 0.
    """
    try:
        rate_limiter.acquire()
        session = ensure_connected()
        audit.log_call("some_action", {"arg1": arg1, "arg2": arg2})

        # ... actual work ...

        audit.log_outcome("ok")
        return format_result(result)

    except PermissionDenied as e:
        audit.log_outcome("refused", str(e))
        return f"REFUSED: {e}"
    except Exception as e:
        audit.log_outcome("error", str(e))
        logger.exception("some_action failed")
        return f"ERROR: {e}"

Imports

Standard library imports first, then third-party, then local. Each group alphabetized.
Use relative imports within the package: from ..core.session import MainframeSession.

File size

Aim for ≤ 400 lines per file.
If a tools module grows past that, split by feature group (e.g., nav_tools.py → nav_aid_tools.py + nav_wait_tools.py).

11. Testing and verification

Unit tests (pytest)

Located in tests/.
Mock the MainframeSession for most tests using recorded screen captures.
Permission tests (test_permissions.py) must be exhaustive — every pattern category gets tests for matching and non-matching cases.
Run with pytest -v from project root.
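
An exhaustive permission test usually takes the form of a table of matching and non-matching names per pattern category. The sketch below shows that shape with a stand-in matcher: the pattern lists, is_write_allowed, and the use of fnmatch are assumptions for illustration and do not reproduce the plugin's scope engine. The test function is written so that pytest collects it as-is; it could equally be converted to pytest.mark.parametrize.

```python
from fnmatch import fnmatchcase

# Hypothetical scope rules; the real ones come from config.yaml.
FORBIDDEN = ("SYS1.*", "ADCD.*")
ALLOWED = ("ADCDC.MCPTEST.*",)


def is_write_allowed(dataset: str) -> bool:
    """Forbidden patterns win; otherwise the name must match an allowed pattern."""
    name = dataset.upper()
    if any(fnmatchcase(name, pattern) for pattern in FORBIDDEN):
        return False
    return any(fnmatchcase(name, pattern) for pattern in ALLOWED)


# (dataset, expected) pairs: each category has a match and a near miss.
CASES = [
    ("SYS1.PARMLIB", False),          # forbidden, exact category
    ("sys1.parmlib", False),          # forbidden, lower-case input
    ("SYS1X.PARMLIB", False),         # near miss of forbidden, but not allowed
    ("ADCD.Z110.PROCLIB", False),     # forbidden system prefix
    ("ADCDC.MCPTEST.SOURCE", True),   # allowed scope
    ("ADCDC.OTHER.SOURCE", False),    # same userid, outside allowed scope
]


def test_permission_cases() -> None:
    for dataset, expected in CASES:
        assert is_write_allowed(dataset) is expected, dataset
```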

Integration tests

Live tests that hit the real ADCD mainframe. Marked with @pytest.mark.live.
Skipped by default; run with pytest -m live.
Use a dedicated test dataset prefix (ADCDC.MCPTEST.*) so tests don't conflict with hand-driven work.

Manual acceptance per phase

Each phase has its acceptance criteria listed in section 8. Run through them with Claude Code before declaring the phase done. Document any deviations in docs/CHANGELOG.md.

Recommended verification flow for Claude Code

After implementing a phase:

Run unit tests: pytest -v
Register MCP server with Claude Code (if not already): claude mcp add ...
Restart Claude Code session.
Run through the natural-language acceptance prompts from section 8.
Inspect audit log to confirm clean records.

12. Distribution

GitHub repository

Public repo: github.com/<your-username>/mainframe-mcp
License: GNU AGPL-3.0 (Affero General Public License v3). Copyright (C) 2026 Sagar Kanithi kanithisagar@gmail.com. Strong copyleft — any use, modification, or network deployment must release Corresponding Source under the same license. See LICENSE.
Branch protection: optional for a personal project, recommended for any shared use.

What users need to install

Documented in docs/INSTALL.md. Summary:

Clone the repo.
Install Python 3.11+ and wc3270 (Windows) or x3270 (Linux/Mac).
Create venv and install requirements.
Copy config.example.yaml → config.yaml, edit host/scope.
Set userid env var and password in keyring.
Register MCP servers with Claude Code.

Configurability for other mainframes

The plugin is designed so that other users can target their own mainframe by changing config.yaml:

Different host/port → just change those values.
Different code page (e.g., cp1140, the euro-sign variant of cp037) → change mainframe.code_page.
Different scope rules → edit scope.allowed_datasets and friends.
TLS-enabled mainframe → set mainframe.tls: true and mainframe.port: 992.

Defaults in config.example.yaml point at ADCD because that's the assumed development target. Production users override.

Optional: list in the MCP server registry

After the plugin is stable, optionally submit a PR to github.com/modelcontextprotocol/servers to list it in the community registry. Not required.

13. Appendices

A. Abend code reference

| Code | Meaning | Typical cause |
|------|---------|---------------|
| S0C1 | Operation exception | Invalid instruction (uninitialized branch target) |
| S0C4 | Protection exception | Out-of-bounds memory access |
| S0C7 | Data exception | Bad numeric data (non-numeric in numeric field) |
| S322 | Time limit exceeded | Job ran too long |
| S806 | Module not found | Missing load module in STEPLIB |
| S913 | Security violation | RACF denial |
| U4038 | Language Environment | COBOL runtime error |
| ASRA | CICS abend (program check) | Same family as S0C* |
| AICA | CICS abend (transaction timeout) | Loop or wait too long |
| AEY7 | CICS abend (no authorization) | Resource not authorized |
| APCT | CICS abend (program not found) | PROGRAM not in PPT |
| DFHAC2206 | CICS message | Transaction abnormally terminated |

B. ADCD-specific notes

Default datasets begin with ADCD.* and SYS1.*. These are system datasets — do not write.
User-allocated datasets typically begin with the userid (ADCDB.* for user ADCDB).
ISPF is started by typing ISPF at the TSO READY prompt.
TSO LOGON is automatic upon TN3270 connection if LOGON is the application.
The ADCD welcome banner shows documented default credentials — these are not secrets.

C. Common ISPF panels and their signatures

| Screen ID | Signature (row, text) | Description |
|-----------|----------------------|-------------|
| TSO_READY | (24, "READY") | TSO command prompt |
| ISPF_PRIMARY | (1, "ISPF Primary Option Menu") | ISPF main menu |
| ISPF_DSLIST | (1, "Data Set List Utility") | Option 3.4 dataset list |
| ISPF_EDIT | (1, "EDIT") | Editing a member |
| ISPF_BROWSE | (1, "BROWSE") | Browsing a member |
| SDSF_HOME | (1, "Display Filter View Print Options") | SDSF main panel |

Add more as discovered during Phase 1 work; store in safety/screens.py.
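
A signature table like the one above maps directly onto a lookup function. The sketch below assumes rows are 1-based and that a signature matches when its text appears anywhere in the given row; the real matching rules in safety/screens.py may differ.

```python
# (screen_id, row, text) in priority order; rows are 1-based (assumption).
SIGNATURES: list[tuple[str, int, str]] = [
    ("TSO_READY", 24, "READY"),
    ("ISPF_PRIMARY", 1, "ISPF Primary Option Menu"),
    ("ISPF_DSLIST", 1, "Data Set List Utility"),
    ("SDSF_HOME", 1, "Display Filter View Print Options"),
    ("ISPF_EDIT", 1, "EDIT"),      # short markers last, so longer titles win
    ("ISPF_BROWSE", 1, "BROWSE"),
]


def identify_screen(rows: list[str]) -> str:
    """Return the first screen_id whose signature text is on its row, else 'UNKNOWN'."""
    for screen_id, row, text in SIGNATURES:
        if row <= len(rows) and text in rows[row - 1]:
            return screen_id
    return "UNKNOWN"
```

Ordering the short markers ("EDIT", "BROWSE") after the longer titles keeps a generic substring from shadowing a more specific panel.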

D. Useful references

ADCD documentation: http://dtsc.dfw.ibm.com/adcd.html
x3270 / wc3270 documentation: https://x3270.org
MCP specification: https://modelcontextprotocol.io
z/OS FTP user's guide (IBM Knowledge Center)
p3270 library: https://github.com/mstiri/p3270
hack3270 (reference for screen-handling patterns, NOT offensive tools): https://github.com/gglessner/hack3270

E. Glossary

| Term | Meaning |
|------|---------|
| ADCD | Application Developer's Controlled Distribution (IBM's z/OS for developer learning) |
| AID | Attention Identifier (any key that sends data to host: ENTER, PFn, PAn, CLEAR) |
| CICS | Customer Information Control System (online transaction processing) |
| EBCDIC | Extended Binary Coded Decimal Interchange Code (mainframe character encoding) |
| ISPF | Interactive System Productivity Facility (mainframe TUI/IDE) |
| JCL | Job Control Language (batch job specification) |
| JES | Job Entry Subsystem (batch scheduler) |
| MCP | Model Context Protocol (the standard this plugin implements) |
| PDS | Partitioned Data Set (mainframe "folder" containing members) |
| RACF | Resource Access Control Facility (z/OS security manager) |
| SDSF | System Display and Search Facility (job spool viewer) |
| TSO | Time Sharing Option (interactive z/OS user environment) |
| TN3270 | Telnet 3270 (the wire protocol for IBM terminals) |
| z/OSMF | z/OS Management Facility (REST API for z/OS; not available on z/OS 1.10) |

14. Known deviations from this spec

Items where the running code intentionally diverges from this build specification. Every entry was either explicitly user-approved during implementation or is a strict superset of the spec behaviour.

14.1 Live verification status

The target host is ADCD z/OS 1.10 at 147.93.154.32:23. Phases 1, 2, and 3 are live-verified against this host as userid ADCDC. Phase 3 acceptance pass (8/8) is captured by the git tag phase-3-live-verified. 260 unit tests pass. Phase 5 acceptance sweep passes 23/23 offline via scripts/phase5_verify.py (no host required for record/replay logic).

Phase 4 is code-complete; 1/12 live acceptance cases verified (READ-mode gate refusal). Remaining 11 cases require an unblocked ADCDC login window. Deferred because ADCD's TSO session table saturated during iterative live-verification attempts — see §14.1.1.

14.1.1 Saturation pattern observed during Phase 4 verify

Each dirty disconnect during iterative debugging leaves a TSO session in a dangling state. ADCD reclaims dangling sessions on its default TSO timeout (15 to 30 min). When several debug iterations happen in quick succession (e.g. fixing a script bug, re-running, fixing another, re-running again), orphaned sessions are created faster than the timeout reclaims them. Once the session table for the userid is saturated, every new LOGON returns IKJ56425I LOGON rejected, UserId <userid> already logged on and the in-band force_cleanup() + Reconnect=S path can't keep up.

Mitigation: space verification runs across hours, not minutes. Clean disconnect at the end of each run. If saturation occurs, only two paths clear it: wait for the host timeout, or operator-console F TSO,USER=<userid>,LOGOFF. The plugin code is correct in all paths observed; the saturation is purely a host-side state problem.

Phase 4 live verify will be re-attempted in a single combined sweep with Phase 5 once the host has at least 4 hours of uninterrupted quiet time. Until then, the 1/12 + offline coverage stands.

14.2 Three write transports (FTP + JCL + ISPF) instead of FTP-only

The §4 architecture diagram shows a single FTP path for writes. The running code adds a second write path (jcl) and a read-only ispf transport, giving three dataset.transport values:

dataset.transport: jcl (current default) — submits inline IEBGENER / IEBUPDTE JCL via TSO SUBMIT *. Used when port 21 is firewalled. Implemented in core/jcl_writer.py and routed by core/dataset_transport.py.
dataset.transport: ftp — the spec-default path. Still implemented and tested via unit fixtures. Switch back with one config line when an FTP-reachable host is in play.
dataset.transport: ispf — read-only TN3270 panel scraping. Reads only; writes via this transport raise NotImplementedError.

The default was flipped to jcl because every host we attempted during build had FTP port 21 blocked. Restore the spec default by setting dataset.transport: ftp in config.example.yaml once an FTP-reachable target is verified.

14.3 ISPF panel-flavor support

The spec assumes IBM ISPF/PDF (ADCD z/OS 1.10) panel chrome. The running code additionally detects Rocket RFE chrome (used by the TK4-/TK5 MVS 3.8j distribution) via signature: see _DSLIST_TITLE_MARKERS and _DSLIST_FIELD_LABELS in core/ispf_dataset.py. Default ADCD code path is unchanged; RFE support is purely additive.

Likewise core/session.py _drive_to_tso_logon_panel accepts either the full TSO/E LOGON panel (z/OS) or a single-line ENTER CURRENT PASSWORD FOR <userid> prompt (base TSO, MVS 3.8j) as a password-ready state.

These were added during troubleshooting against a TK5 instance that turned out to be the wrong target. They cost nothing to keep and let the plugin work against MVS 3.8j (TK4-/TK5) hosts as well as z/OS.

14.4 Extra modules not listed in §6

| Path | Purpose |
|---|---|
| core/dataset_transport.py | FTP/ISPF/JCL dispatcher with reachability probe + fallback |
| core/ispf_dataset.py | ISPF panel-based dataset reader (used by ispf/jcl transports) |
| core/jcl_writer.py | TSO SUBMIT * + IEBGENER/IEBUPDTE + STATUS/OUTPUT/CANCEL primitives |
| core/job_runner.py | Phase 4 orchestration: history dequeue + abend marking |
| safety/abend_state.py | Per-process abend lockout counter (split from abend.py) |
| tools/session_tools.py | connect / login / disconnect / status / reconnect / force_cleanup |
| runtime.py | Process-wide singleton holding session, audit, tokens, jobs, git |

All implementation detail. None change tool behaviour or safety guarantees relative to the spec.

14.5 Extra tool not listed in §9

force_cleanup() is registered on both read and write servers. Drives the robust-disconnect protocol to release a dangling host session and, on failure, surfaces a one-line operator instruction for the Hercules console MODIFY command. Added after a third reproduction of IKJ56425I LOGON REJECTED IN USE during build.

14.6 Job status label naming

Phase 4 acceptance text says status returns "RUNNING" then "OUTPUT". The running code returns the raw TSO STATUS command labels (EXECUTING, ON OUTPUT QUEUE, ON INPUT QUEUE, ON HOLD QUEUE, NOT FOUND, UNKNOWN). Functionally identical states; label strings follow the host wire format rather than the README's prose. If README-exact labels are required, alias them in JclWriter.status().
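
If README-exact labels are wanted, the alias can be a small lookup applied to the raw label. The sketch below is one possible shape and is not in the codebase: normalize_status is a hypothetical helper, and only the two labels the acceptance text names are mapped. Every other raw label passes through unchanged.

```python
# Raw TSO STATUS label -> README prose label. Only the two labels named in the
# Phase 4 acceptance text are aliased; this mapping is an assumption.
_STATUS_ALIASES = {
    "EXECUTING": "RUNNING",
    "ON OUTPUT QUEUE": "OUTPUT",
}


def normalize_status(raw_label: str) -> str:
    """Collapse spacing and case, then apply the alias if one exists."""
    label = " ".join(raw_label.upper().split())
    return _STATUS_ALIASES.get(label, label)
```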

14.7 Extra root-level docs

PHASE3_READINESS.md — Phase 3 completion record (acceptance status, fixed bugs, live-verification result).
LIVE_VERIFY_PORTABILITY.md — host-swap guide (code page, scope patterns, env vars).
config.test-case9.yaml — TTL=1min config for the token-expiry acceptance case.

All explicitly user-requested during build. Consider moving under docs/ to match the §6 layout in a future cleanup pass.

14.8 Write verification semantics (host state is ground truth)

The §5 safety model requires every applied write to leave an audit trail proving the host state changed. The implementation enforces this via a read-back probe after submit, not by trusting JES2 return codes. Design rationale:

ADCD z/OS 1.10 + JES2 aggressively purges job spool between job completion and the plugin's OUTPUT retrieval command. The plugin frequently observes status=NOT FOUND with no parseable RC, even though the submitted utility step ran to completion.
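Neither obvious rule works. Requiring "RC=0 in spool" before reporting success would make a purged-spool job report failure even though the write landed. Treating a missing RC as success would report writes that never reached the host (the rc-None bug fixed during Phase 3 live verification). Neither is the contract callers expect.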
The fix: after every write submit, member_exists() (a TSO LISTDS / panel probe) checks the host directly. Member present after a create/edit → ✅ OK. Member absent after a delete → ✅ OK. Anything else → ❌ ERROR + git reset --hard to the pre-write SHA.

State table for the post-submit verify path:

| JES outcome | Host probe | Tool returns |
|---|---|---|
| RC = 0 | member present | OK: applied |
| RC = 0 | member absent | ERROR: WriteVerificationFailed + git rollback |
| RC ≠ 0 | not consulted | ERROR: RC=<n> (short-circuit) |
| Abend (S0C7, U4038, …) | not consulted | ERROR: ABENDED <code> + lockout trip |
| RC unparseable (spool purged) | member present | OK: applied + note: JES spool was unreadable; success confirmed by read-back probe |
| RC unparseable (spool purged) | member absent | ERROR: WriteVerificationFailed + git rollback + JES detail: <message> |
| RC unparseable | probe raises | ERROR: WriteVerificationFailed: verification probe raised: <err> + git rollback |

Implemented in tools/write_tools.py:_apply_change / _apply_delete / confirm_change. Pinned by tests/test_write_verification.py (11 cases including all four spool/probe permutations). Live-verified on ADCD: all three write cases (C/D/E) in the Phase 3 acceptance sweep took the RC unparseable + member present → OK branch because ADCD's JES2 purges before the OUTPUT retrieval. Reporting was correct in every case.
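
The state table can be read as a short decision function. The sketch below restates it under assumptions and is not the code in _apply_change: JesOutcome and verify_write are hypothetical names, side effects (git rollback, lockout trip) are left out, and a probe that raises is treated the same way whether or not an RC was parsed, which the table only specifies for the unparseable case.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class JesOutcome:
    rc: Optional[int] = None      # None means the RC could not be parsed (spool purged)
    abend: Optional[str] = None   # e.g. "S0C7"
    detail: str = ""              # raw JES message, if any


def verify_write(jes: JesOutcome, probe: Callable[[], bool],
                 expect_present: bool = True) -> str:
    """Map (JES outcome, host probe) to the tool's return string."""
    if jes.abend is not None:
        return f"ERROR: ABENDED {jes.abend}"          # probe not consulted
    if jes.rc is not None and jes.rc != 0:
        return f"ERROR: RC={jes.rc}"                  # short-circuit
    try:
        present = probe()                             # host state is ground truth
    except Exception as err:
        return f"ERROR: WriteVerificationFailed: verification probe raised: {err}"
    if present != expect_present:
        suffix = f" (JES detail: {jes.detail})" if jes.rc is None and jes.detail else ""
        return "ERROR: WriteVerificationFailed" + suffix
    if jes.rc is None:
        return ("OK: applied (note: JES spool was unreadable; "
                "success confirmed by read-back probe)")
    return "OK: applied"
```

Passing expect_present=False covers the delete case, where success means the member is absent after the job runs.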

This is the safety-critical contract: the plugin never reports a write succeeded without positive evidence the host state changed. Equally important, it never reports failure when the host state did change — that would force a retry that could double-apply the write.

14.9 Bootstrap residue (Phase 4 verify)

A second pattern observed during Phase 4 partial live verify: an interrupted verification run can leave ADCDC.MCPTEST.SOURCE in an inconsistent state — the dataset exists in the catalog but its expected member set is empty (or partial). The Phase 4 verify script treats "PDS exists" as "skip allocation, populate via IEBUPDTE PARM=NEW", which then races against the JES2-purge cycle and yields POST-SUBMIT VERIFY FAIL — missing: [...] even though the IEBUPDTE job itself may have completed successfully.

Symptoms:

bootstrap_pds reports "PDS already exists; skipping allocation."
bootstrap_members IEBUPDTE submit returns rc=None abend=None (spool purged before retrieval).
member_exists() ground-truth check shows all expected members absent.

Plugin behaviour is correct: the read-back verify refuses to report success without positive evidence, exactly as §14.8 requires. The fault is in the verify script's bootstrap path — cleanup_pds is fire-and-forget (no RC verification), and the IEBUPDTE submit doesn't tolerate "PDS exists but empty".

Mitigation (not implemented in v1.0.0 — tracked in §14.10):

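Run-ID-suffixed PDS names (e.g. ADCDC.MCPTEST.SOURCE.R260527; each qualifier is limited to 8 characters, so a 9-character suffix such as R20260527 would be rejected) so each verify run is hermetic.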
cleanup_pds should verify the dataset is actually scratched before returning.
Bootstrap should detect "PDS exists but empty" and switch to IEBUPDTE PARM=MOD / REPL or scratch-and-reallocate before populating.
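
A run-scoped dataset name has to respect z/OS naming limits: each qualifier is 1 to 8 characters and the full name is at most 44. The sketch below builds a compact run qualifier from the day of year and the time (R + DDD + HHMM) and validates the result. The function name run_scoped_dsn and the qualifier layout are assumptions; this mitigation is not implemented in v1.0.0.

```python
import re
from datetime import datetime

# First character alphabetic or national (#, $, @); up to 7 more characters
# that are alphanumeric, national, or a hyphen.
_QUALIFIER = re.compile(r"^[A-Z#$@][A-Z0-9#$@-]{0,7}$")


def run_scoped_dsn(base: str, when: datetime) -> str:
    """Append a run qualifier such as R1471530 (day 147, 15:30) to base."""
    run_id = "R" + when.strftime("%j%H%M")  # 1 + 3 + 4 = 8 characters
    dsn = f"{base.upper()}.{run_id}"
    if len(dsn) > 44:
        raise ValueError(f"dataset name longer than 44 characters: {dsn}")
    for qualifier in dsn.split("."):
        if not _QUALIFIER.match(qualifier):
            raise ValueError(f"invalid qualifier {qualifier!r} in {dsn}")
    return dsn
```

The day-of-year form repeats annually and has one-minute resolution, so it suits short-lived verify datasets that are scratched after each run.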

For now: manually delete ADCDC.MCPTEST.SOURCE (via wc3270 ISPF 3.4, or by a console operator) between Phase 4 verify runs, or wait for ADCD to reclaim it on its own.

14.10 Acknowledged future work

Tracked items not in v1.0.0 scope; none are correctness blockers, all are quality-of-life or completeness improvements for the live verification surface.

| Item | Why |
|---|---|
| Phase 4 live verify completion (11 remaining cases) | Requires uninterrupted ADCD login window. Plugin code is unit-tested for every case. |
| Phase 5 live runbook ISPF banner case | Same: requires live host. The mechanism is offline-verified 23/23. |
| cleanup_pds RC verification | Currently fire-and-forget. Should poll the catalog post-submit to confirm scratch happened, mirroring the write-tools verify-via-read-back pattern. |
| Run-ID-suffixed bootstrap PDS names | Eliminates §14.9 residue entirely. ADCDC.MCPTEST.R<timestamp>.SOURCE per run. |
| Idempotent bootstrap (PDS-exists-empty case) | Detect partial state, use IEBUPDTE PARM=MOD with ./ REPL for existing names. |
| Reconnect=S precise field positioning | The current 10-tab approach is fragile across panel variants. moveTo(row, col) against discovered field coordinates would be more robust. |
| README §14.6 status-label normalization | Optional alias to map raw TSO STATUS labels to the README's prose ("RUNNING"/"OUTPUT") if README-exact wording is required. |

These are tracked here rather than as GitHub issues because this project is shipping as a personal/internal release and the issue tracker isn't in active use. Subsequent maintainers should promote these to issues if iteration continues.

Implementation kickoff checklist

Before Claude Code writes its first line, verify:

[ ] Phase 0 environment setup is complete and smoke test passes.
[ ] You've read this README in full once.
[ ] You understand the three-tier autonomy and read-only-by-default model.
[ ] You know the project name is mainframe-mcp and the target is ADCD z/OS 1.10.
[ ] You're working in C:\mainframe-mcp with .venv activated.
[ ] Git is initialized; first commit is this README + .gitignore + LICENSE.

When all six are checked, begin Phase 1.

End of build specification.
