Agent production engineer setup guide

How To Set Up An Agent Production Engineer

A complete operating document for building an automated production engineer: Linear-first triage, bounded evidence specialists, durable automation, approval gates, redacted operational memory, and careful production DB read access.

Status: Proposed
Date: 2026-07-01
Scope: Setup + operations
Primary rule: Evidence before code
Linear-first, Read-only specialists, Approval gates, Operational memory, No sensitive data in git
This page is the setup and operating manual. It explains what to install, what permissions to grant, which MCPs and tools the agent reads, how the runner should persist state, how specialists operate, and what must be true before the agent can implement a production fix.

Decision In One Page

Consolidate production engineering work into one authoritative agent suite centered on .agents/skills/prode-triage/SKILL.md. The suite replaces the split prod-agent* skills with a single evidence-first process, bounded specialists, a durable runner contract, redaction controls, and an operational-memory layer.

Problem: Production tickets were easy for agents to approach from the wrong layer: code first, one evidence source, stale memory, duplicated artifacts, or incomplete handoffs.
Decision: Make prode-triage the coordinator and bind it to specialist routing, a durable runner contract, PRODE Wiki memory, redaction controls, and Linear-first progress.
Invariant: Current production evidence is required before implementation. Operational memory can route investigation, but it cannot prove the current incident.
Owner Model: The coordinator owns the ticket and final decision; specialists own bounded evidence; the runner owns durable automation; humans own production-write approval and PR merge.
Acceptance: A reviewer can reproduce the evidence path from Linear, specialist outputs, evidence bundle, PR handoff, and post-merge monitor without relying on hidden chat context.

What the suite does

Fetches Linear and live production evidence first, routes specialists, builds an evidence bundle, classifies, implements only when confidence and approvals allow, opens a PR, updates Linear, monitors after merge, and writes candidate learnings.

What the suite must not do

Treat old wiki memory as proof, read broad source code before evidence, mutate production without approval, leak customer or financial data, duplicate Linear/PR artifacts, or merge PRs by itself.

Evidence gate

Linear and live evidence precede source investigation, fix planning, and branch work.

Tool gate

The coordinator inventories MCPs, connectors, CLIs, and skills before routing specialists.

Approval gate

Production writes, PR merges, and unsafe operational actions require fresh human approval.

Memory gate

Wiki pages guide query selection and negative controls, but never count as proof.

Setup Contract

An Agent Production Engineer is not just a prompt. It is a controlled operating environment with a coordinator skill, read-only evidence tools, durable state, redaction controls, and explicit human approval gates. If any part is missing, the agent should degrade to investigation or block instead of pretending it can safely fix production.

Entry point: .agents/skills/prode-triage/SKILL.md is the normative workflow. Repo entrypoints such as AGENTS.md and CLAUDE.md should point to it rather than duplicating rules.
Secrets: Use secure local or platform secret storage for LINEAR_API_KEY, GITHUB_TOKEN, SENTRY_AUTH_TOKEN, AWS credentials, and production read-replica credentials. Never write secret values to prompts, docs, artifacts, git config, or issue text.
Tool map: The coordinator inventories Linear, Sentry, AWS/CloudWatch, GitHub, DB read-replica, vendor, support, release, browser, emulator, and code-review tools before routing specialists.
Runner state: Automated operation requires a durable store for leases, phase transitions, idempotency namespaces, artifact pointers, PR URLs, approval records, and terminal outcomes.
Human gates: Production writes, deploy changes, merges, Done/Cancelled Linear transitions, risky data boundaries, and broad telemetry suppression require explicit fresh approval.
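
The Secrets rule above can be checked at preflight without ever touching a secret value. The following is a minimal sketch, not the suite's actual precheck: the function names and report shape are hypothetical, and only the three variable names come from this document (AWS and read-replica credential names are deployment-specific and left out).

```python
import os

# Names taken from the setup contract. AWS and read-replica credential names
# are deployment-specific, so this sketch deliberately omits them.
REQUIRED_SECRET_NAMES = ("LINEAR_API_KEY", "GITHUB_TOKEN", "SENTRY_AUTH_TOKEN")


def missing_secrets(env=None, required=REQUIRED_SECRET_NAMES):
    """Return the names (never the values) of secrets that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


def precheck_report(env=None):
    """Build a report that is safe to persist because it contains names only."""
    missing = missing_secrets(env)
    return {"secrets_ok": not missing, "missing_secret_names": missing}
```

Because the report carries only variable names, it can be written to a run artifact or a Linear comment without violating the rule that secret values never reach prompts, docs, or issue text.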

Minimum viable agent

  • Can fetch Linear tickets and comments.
  • Can read linked Sentry and production evidence.
  • Can write Linear progress and open PRs.
  • Cannot write production or merge PRs.

Minimum viable runner

  • Has ticket leases and idempotency keys.
  • Persists every phase transition.
  • Stores redacted artifact pointers.
  • Resumes instead of duplicating work.

Minimum viable safety

  • Production DB is read-only by credential.
  • Specialists run with read-only tools.
  • Redaction lint checks wiki paths.
  • Approvals expire and are action-specific.
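
One way to read "approvals expire and are action-specific" is as a record that must match the exact action and the current time. The sketch below assumes a hypothetical Approval record and action labels such as "merge_pr"; the real approval schema lives in the runner reference and may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Approval:
    action: str            # exact action approved, e.g. "merge_pr" (hypothetical label)
    approver: str
    approved_at: datetime
    expires_at: datetime


def approval_covers(approval, action, now):
    """True only if the approval names this exact action and has not expired."""
    if approval is None:
        return False
    return approval.action == action and approval.approved_at <= now < approval.expires_at
```

With this shape, an approval to merge a PR cannot be reused for a production write, and an old approval stops counting once its expiry passes.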

Non-negotiable setup rule

If a required evidence tool is unavailable, the run records the missing tool and the resulting confidence impact. It must not silently substitute a weaker source and continue as though the same evidence standard had been met.

Canonical Files To Install

The setup should keep the operating contract in git so humans and agents read the same rules. Local chat context is not an acceptable source of truth for production engineering.

File | Purpose | Setup requirement
.agents/skills/prode-triage/SKILL.md | Coordinator workflow, triggers, Linear requirements, evidence gates, and handoff templates. | Must be the only normative top-level workflow.
.agents/skills/prode-triage/references/specialists.md | Specialist role definitions, tool boundaries, input bundle, and output shape. | Must be read whenever a ticket needs source-specific investigation.
.agents/skills/prode-triage/references/automation-runner.md | Durable runner contract: leases, state machine, idempotency, audit log, approvals, and post-merge monitor. | Required before any automated execution.
AGENTS.md / CLAUDE.md | Repo-level routing so production tickets invoke the PRODE workflow. | Should point to the skill instead of copying sections that can drift.
docs/prode-wiki/ | Reviewed operational memory: failure modes, query recipes, and negative controls. | Must distinguish memory from current evidence.
docs/learnings/inbox.md | Candidate learnings from completed runs. | Automation writes candidates here, not directly into durable wiki pages.
scripts/prode-wiki-redaction-lint.sh | Prevents common sensitive artifact formats and risky content from entering the wiki. | Must run in CI for wiki and learning paths.

Architecture

The suite is deliberately split into one coordinator skill, specialist references, durable automation rules, and a separate operational-memory ADR.

Diagram 1: Suite Components
Inputs:
  • Linear: ticket + progress log
  • Live Evidence: Sentry, AWS, DB, vendor; current incident proof
  • PRODE Wiki: memory and recipes; routing only
Coordinator contract (prode-triage, the authoritative coordinator skill):
  1. Fetch ticket and inventory tools
  2. Read wiki as routing context
  3. Route specialists and classify
Outputs and guardrails:
  • Evidence Bundle: proof, gaps, confidence; classification source
  • Automation Runner: leases, retries, artifacts; durable execution wrapper
  • Redaction Controls: lint, .gitignore, review; safe persisted outputs
Edge labels: fetch first, current proof, memory, produces, may run under
Correctness check

prode-triage is shown as a coordinator boundary, not as a node fanning out to peers. Inputs flow into it; controlled artifacts flow out.

Important boundary

The dashed wiki arrow is deliberate: operational memory can influence routing, but only the evidence bundle can support classification.

Coordinator

Owns the ticket, Linear updates, classification, implementation, PR, handoff, and final decision.

Specialists

Return bounded evidence. They do not mutate Linear, GitHub, AWS, Sentry state, files, or production data.

Runner

Makes automation retry-safe through leases, idempotency keys, artifacts, and explicit terminal states.

Evidence-First Workflow

The most important design choice is the ordering. Agents do not start from code. They start from the ticket and production evidence, then consult memory, then route specialists.

Diagram 2: Ticket To Handoff
Main path: Linear (fetch ticket) -> Tools (MCP inventory) -> Wiki (routing memory) -> Specialists (live evidence) -> Bundle (proof + gaps) -> Classify (70% gate) -> Code Path (read then fix) -> Verify (tests + review) -> PR + Linear (handoff)
Stop / Exit: BLOCKED_NEEDS_INFO, APPROVAL_REQUIRED, NO_CODE_CHANGE, NOISE_NO_FIX, VENDOR_OR_INFRA, CANNOT_AUTO_FIX, ESCALATED_INCIDENT, SUPERSEDED / FAILED
Notes:
  • Memory can change specialist routing.
  • Implementation starts only after proof, confidence, and approvals.
  • Missing tools lower confidence or block automation instead of silently weakening evidence.
Correction made

The workflow now includes tool/MCP inventory before wiki and specialist routing, matching the suite contract.

Stop paths are first-class

The red Stop / Exit lane names the non-happy exits so automation does not collapse every ticket into "fix and PR".

Fetch

Read Linear and linked production evidence. No source-code investigation yet.

Route

Consult PRODE Wiki as memory, then route the minimum required specialists.

Prove

Build a live evidence bundle with classification, confidence, and gaps.

Deliver

Implement, verify, review, open PR, update Linear, monitor after merge.

Ticket runbook

For each PRODE ticket, in order:

  1. Fetch Linear.
  2. Inventory tools.
  3. Consult PRODE Wiki.
  4. Route specialists.
  5. Persist specialist outputs.
  6. Classify with confidence.
  7. Update Linear.
  8. Only then read code, create a worktree, implement, verify, review, open a PR, write the handoff, await human merge, monitor, and propose learning.

Specialist Routing

Specialists are bounded roles. They gather evidence from one source or one path, return structured findings, and stop. The coordinator owns the ticket.

Specialist | Phase | Use when | Boundary
Sentry | Before code | Sentry issue, crash, trace tags, production exception. | Read-only Sentry.
AWS / CloudWatch | Before code | 5xxs, ECS/RDS/ELB/Batch, log correlation, infra symptoms. | Read-only AWS allowlist.
Vendor / Integration | Before code | Webhook, provider, payment, KYC, banking, broker, notification failures. | No provider-side mutation.
Product / Support | Before code | Screenshots, support reports, user-visible symptoms, expected behaviour. | Ticket-linked context only.
Release / Deploy | Before code | Deploy spikes, app versions, feature flags, Sentry releases. | Read-only release metadata.
DB Read-Replica | Before code | Production state must be confirmed to classify or reproduce. | Configured read-only SELECT only.
Reproduction | After evidence source known | Behaviour can be safely reproduced outside production writes. | Local, staging, emulator, browser, or read-only API paths.
Code Path | After evidence bundle | Root cause needs source-code grounding before implementation. | Source reads only; no edits.
Review | After diff | PR readiness needs independent challenge. | Diff review only unless explicitly delegated.

How specialists know which tools to read

Specialists do not discover tools broadly. The coordinator first inventories available MCP tools, connectors, CLIs, and skills, records the chosen tool map in the run artifact, then passes each specialist only the read-only tools relevant to its evidence source.

Linear: ticket snapshot, labels, comments, attachments, status.
Sentry: issue details, representative events, tags, releases, breadcrumbs.
AWS: STS identity, CloudWatch logs, ECS/RDS/ELB/Batch read-only evidence.
Vendor: provider status, webhook IDs, idempotency, retries, correlation IDs.
Release: deploy windows, Sentry releases, GitHub commits, feature flags.
DB: configured read-only SQL path, narrow SELECT queries only.
Reproduction: local, staging, emulator, browser, or read-only API paths.
Code path: local source reads only after the evidence bundle exists.
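
The inventory-then-route step described above can be sketched as a function that records both the selected routes and the missing tools, so a gap is visible in the run artifact rather than silently worked around. The tool identifiers and the REQUIRED_TOOLS mapping below are hypothetical; canonical MCP and tool names per host environment are still an open question in this document.

```python
# Hypothetical mapping from specialist to the read-only tools it needs.
# Real tool names depend on the host environment.
REQUIRED_TOOLS = {
    "sentry": {"sentry"},
    "aws": {"aws"},
    "db": {"db_read_replica"},
    "release": {"sentry", "github"},
}


def build_tool_map(available, specialists):
    """Give each specialist only its own tools, and record what is missing."""
    available = set(available)
    routes, missing = {}, {}
    for specialist in specialists:
        needed = REQUIRED_TOOLS[specialist]
        absent = sorted(needed - available)
        if absent:
            missing[specialist] = absent   # recorded, never silently substituted
        else:
            routes[specialist] = sorted(needed)
    return {"routes": routes, "missing": missing}
```

A specialist that appears under "missing" is not routed; the coordinator then decides whether the gap lowers confidence or blocks the run.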

Automation Runner

The runner is the durable control plane. It prevents duplicate comments, branches, PRs, and half-finished hidden-context work.

Diagram 3: Runner State Bands
Preflight: QUEUED -> PRECHECK -> FETCH_LINEAR -> READ_PRODE_WIKI
Evidence: ROUTE_SPECIALISTS -> WAIT_FOR_EVIDENCE -> CLASSIFY
Delivery: CREATE_WORKTREE -> IMPLEMENT -> VERIFY -> REVIEW -> OPEN_PR -> LINEAR_HANDOFF
Human + Monitor: AWAIT_HUMAN_REVIEW -> POST_MERGE_MONITOR -> WRITE_CANDIDATE_LEARNING -> RUN_COMPLETE
Terminal: blocked, no-code, vendor/infra, escalated, superseded, failed
State grouping

The bands compress the full runner enum into operating phases; the canonical state names remain in the runner reference.
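
As an illustration only, the banded states can be modelled as a linear happy path plus a set of stop outcomes. The state names are the ones shown in the diagrams; the rule that any non-terminal state may exit to any stop outcome is an assumption of this sketch, and the canonical enum and transitions remain in the runner reference.

```python
HAPPY_PATH = [
    "QUEUED", "PRECHECK", "FETCH_LINEAR", "READ_PRODE_WIKI",
    "ROUTE_SPECIALISTS", "WAIT_FOR_EVIDENCE", "CLASSIFY",
    "CREATE_WORKTREE", "IMPLEMENT", "VERIFY", "REVIEW", "OPEN_PR", "LINEAR_HANDOFF",
    "AWAIT_HUMAN_REVIEW", "POST_MERGE_MONITOR", "WRITE_CANDIDATE_LEARNING", "RUN_COMPLETE",
]

# Stop / exit names as listed in the ticket-to-handoff diagram.
STOP_OUTCOMES = {
    "BLOCKED_NEEDS_INFO", "APPROVAL_REQUIRED", "NO_CODE_CHANGE", "NOISE_NO_FIX",
    "VENDOR_OR_INFRA", "CANNOT_AUTO_FIX", "ESCALATED_INCIDENT", "SUPERSEDED", "FAILED",
}


def next_state(current):
    """Return the next happy-path state; unknown states raise ValueError."""
    idx = HAPPY_PATH.index(current)
    if idx == len(HAPPY_PATH) - 1:
        raise ValueError("RUN_COMPLETE has no successor")
    return HAPPY_PATH[idx + 1]


def can_transition(current, target):
    """Allow one step forward, or an exit to a stop outcome (sketch assumption)."""
    if current not in HAPPY_PATH or current == "RUN_COMPLETE":
        return False
    if target in STOP_OUTCOMES:
        return True
    return HAPPY_PATH[HAPPY_PATH.index(current) + 1] == target
```

Rejecting skipped steps, for example CLASSIFY straight to OPEN_PR, is what keeps a resumed run from jumping past verification or review.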

Human boundary

Automation may await review and monitor after merge, but merge and production-write approvals remain human-owned gates.

Durable by design

  • Ticket lease
  • Idempotency namespace
  • Persisted phase state
  • Audit log and artifact pointers
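
A minimal sketch of how a ticket lease and idempotency keys might fit together is shown below. It uses an in-memory class as a stand-in for the durable store; the class, the key format, and the 300-second lease are all hypothetical choices for illustration.

```python
import hashlib


def idempotency_key(ticket_id, phase, action):
    """Derive a stable key so a retried step maps to the same side effect."""
    raw = f"{ticket_id}:{phase}:{action}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:16]


class InMemoryRunStore:
    """Stand-in for the durable store; a real runner would persist this state."""

    def __init__(self):
        self.leases = {}    # ticket_id -> (owner, expires_at)
        self.effects = {}   # idempotency key -> artifact pointer

    def acquire_lease(self, ticket_id, owner, now, ttl_seconds=300):
        holder = self.leases.get(ticket_id)
        if holder and holder[0] != owner and holder[1] > now:
            return False    # another live run owns the ticket
        self.leases[ticket_id] = (owner, now + ttl_seconds)
        return True

    def run_once(self, key, side_effect):
        """Run side_effect at most once per key; retries get the stored pointer."""
        if key not in self.effects:
            self.effects[key] = side_effect()
        return self.effects[key]
```

On a retry, run_once returns the pointer recorded the first time, which is how a resumed run avoids opening a second PR or posting a duplicate Linear comment.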

Stops instead of guessing

  • Confidence below 70%
  • Evidence missing
  • Approval required
  • Specialists materially conflict

Evidence Bundle Contract

The evidence bundle is the handoff between investigation and implementation. The agent cannot implement until this bundle exists, confidence is at least 70%, and required approvals are resolved.

Bundle field | Required content | Why it matters
Linear snapshot | Title, labels, status, comments, attachments, reporter context, linked resources. | Prevents the agent from solving a stale or partial version of the ticket.
Tool map | Available MCPs, connectors, CLIs, skills, blocked tools, and selected routes. | Shows whether the run had the right evidence sources.
Operational memory | Wiki pages read, skipped, stale, or absent; query recipes and negative controls reused. | Explains routing decisions without converting memory into proof.
Specialist outputs | Required/skipped status, sources, findings, correlation IDs, confidence, gaps, next pivot. | Makes investigation reproducible by a reviewer.
Classification | Allowed classification, severity, confidence, root-cause hypothesis, and evidence gaps. | Determines whether the run may fix, block, no-code, vendor-route, or escalate.
Approval state | N/A, pending, approved, approver, timestamp, exact action, expiry. | Prevents implicit approval from leaking across risky actions.

Implementation may start when

The bundle exists, required specialists have returned or been explicitly skipped with reason, confidence is at least 70%, approvals are satisfied, and the fix scope is narrow enough for a PR.

Implementation must stop when

Evidence is missing, tool access is blocked, specialists conflict materially, confidence is below 70%, production mutation is required, or the next action belongs to a human, vendor, or incident process.
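
The start and stop conditions above can be expressed as a single gate that returns the reasons implementation is blocked. This is a sketch under stated assumptions: BundleSummary and its field names are hypothetical, confidence is represented as a fraction between 0 and 1, and the "fix scope is narrow enough" judgement is left to the coordinator rather than encoded here.

```python
from dataclasses import dataclass, field

CONFIDENCE_THRESHOLD = 0.70  # the 70% gate from this document


@dataclass
class BundleSummary:
    confidence: float                                            # 0.0 to 1.0
    unresolved_specialists: list = field(default_factory=list)   # required, neither returned nor skipped with reason
    blocked_tools: list = field(default_factory=list)
    specialists_conflict: bool = False
    approvals_satisfied: bool = True
    needs_production_mutation: bool = False


def implementation_blockers(bundle):
    """Return every reason implementation must not start; empty means go."""
    if bundle is None:
        return ["evidence bundle missing"]
    blockers = []
    if bundle.unresolved_specialists:
        blockers.append("required specialists unresolved")
    if bundle.blocked_tools:
        blockers.append("tool access blocked")
    if bundle.specialists_conflict:
        blockers.append("specialists conflict materially")
    if bundle.confidence < CONFIDENCE_THRESHOLD:
        blockers.append("confidence below 70%")
    if not bundle.approvals_satisfied:
        blockers.append("approval required")
    if bundle.needs_production_mutation:
        blockers.append("production mutation required")
    return blockers


def may_implement(bundle):
    return not implementation_blockers(bundle)
```

Returning the full list of blockers, rather than a bare boolean, gives the Linear update something concrete to report when the run stops.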

PRODE Wiki: Memory, Not Evidence

ADR-001 introduces a Karpathy-style LLM-readable wiki for operational memory. It helps route the investigation, but live systems remain canonical.

Diagram 4: Evidence Versus Memory
Operational Memory (wiki pages, prior incidents, recipes): suggests routes and controls.
Live Evidence (Linear, Sentry, AWS, DB, vendor): proves or disproves the ticket.
Evidence Bundle: classification source; current facts only.
Decision: fix, no-code, block, vendor/infra, escalate.
Edge labels: routing only (from Operational Memory), proof (from Live Evidence)
Memory lane

Wiki pages can supply known failure modes, query recipes, and negative controls, but they stay outside classification evidence.

Evidence lane

Current Linear, Sentry, AWS, DB, vendor, release, support, and reproduction facts feed the bundle and decision.

Hard boundary

The wiki can say "this resembles an old request-aborted pattern; check these tags and negative controls." It cannot say "this ticket is request-aborted noise because a wiki page says similar tickets were noise."
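
The hard boundary can be made mechanical by tagging every finding with its source and counting only live sources as proof. The Finding type and the source labels below are hypothetical names for this sketch; the list of live sources mirrors the evidence lane described above.

```python
from dataclasses import dataclass

# Sources that count as current evidence, per the evidence lane above.
LIVE_SOURCES = {"linear", "sentry", "aws", "db", "vendor", "release", "support", "reproduction"}


@dataclass(frozen=True)
class Finding:
    source: str    # e.g. "sentry" or "wiki" (hypothetical labels)
    summary: str


def split_findings(findings):
    """Separate proof (live sources) from routing hints (everything else)."""
    proof = [f for f in findings if f.source in LIVE_SOURCES]
    routing_hints = [f for f in findings if f.source not in LIVE_SOURCES]
    return proof, routing_hints


def can_classify(findings):
    """Classification needs at least one live finding; wiki hints never suffice."""
    proof, _ = split_findings(findings)
    return len(proof) > 0
```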

Safety And Redaction

The suite is intentionally conservative because Linear, PRs, run artifacts, and wiki pages are all potential leak surfaces.

Allowed in handoffs

  • Request IDs
  • Sentry issue IDs
  • Releases and versions
  • Log groups
  • Redacted source names

Never include

  • Secrets, tokens, cookies, auth headers
  • Customer contact details
  • Payment identifiers and KYC fields
  • Copied DB rows
  • Full request, response, or vendor payloads
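
A redaction check over handoff text might look like the sketch below. It is not the contents of scripts/prode-wiki-redaction-lint.sh, and the three patterns are illustrative assumptions only; which customer, account, and KYC identifier patterns to block is still listed as an open question.

```python
import re

# Illustrative patterns only; real deployments need platform-specific rules.
PATTERNS = {
    "auth_header": re.compile(r"(?i)\b(authorization|cookie)\s*:"),
    "bearer_token": re.compile(r"(?i)\bbearer\s+[a-z0-9._~+/=-]{8,}"),
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
}


def redaction_findings(text):
    """Return (line_number, pattern_name) pairs without echoing the matched value."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

The findings deliberately omit the matched text, so the lint output itself does not become another leak surface.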

Controls added

The design includes a wiki-specific .gitignore, the docs/learnings/inbox.md learning path, scripts/prode-wiki-redaction-lint.sh, and the prode-wiki-redaction GitHub Actions workflow.

Tool And Permission Model

Production engineering agents should be powerful at reading and conservative at mutating. Tool access is granted by role, phase, and evidence source rather than by convenience.

Capability | Coordinator | Specialists | Automation runner
Linear read/write | Reads and writes progress/handoff. | Read supplied context only; no writes. | Writes idempotent run-owned sections and comments.
Sentry | Reads and records evidence; mutations require approval. | Sentry specialist reads issue/events/tags only. | Never resolves or ignores issues without approval.
AWS / CloudWatch | Read-only evidence with verified account and region. | AWS specialist uses read allowlist only. | Blocks or requests approval for writes.
Production DB | Read-only narrow checks when needed. | DB specialist uses configured SELECT-only path. | Blocks non-SELECT, locks, migrations, backfills, and side effects.
GitHub / PR | Creates and updates PR after evidence gate. | No branch, commit, push, or PR mutation. | Creates or resumes one run-owned branch/PR.
Merge / deploy | Never without explicit human instruction. | Blocked. | Blocked unless an explicit approval record exists; default is await human review.

Production DB Read Access

Production DB read access is part of the workflow, but it is not a blank cheque. The DB specialist confirms narrow production state only when needed to classify, reproduce, or explain a backend symptom.

Connection

Use a physically read-only role when available. If it is not known whether the configured role is read-only, state that gap before querying.

Query shape

SELECT only, narrow predicates, explicit LIMIT, no SELECT *.

Output

Prefer counts, statuses, timestamps, and aggregate checks. Do not copy rows into handoffs.

Stop condition

If a write, lock, migration, backfill, side-effecting function, FOR UPDATE, or production mutation seems necessary, stop and request explicit human approval through the coordinator.
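
The query-shape and stop-condition rules can be approximated by a keyword-level pre-check before any SQL reaches the replica. This is a conservative sketch, not a SQL parser: it does not detect side-effecting functions or qualified wildcards such as t.*, it may reject safe queries whose string literals contain a flagged word, and it is no substitute for a credential that is read-only at the database level. The function name and the table names in the usage example are hypothetical.

```python
import re

# Keywords that signal writes, locks, or DDL. FOR UPDATE is caught via "update".
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|merge|alter|drop|create|truncate|grant|lock|call|copy)\b",
    re.IGNORECASE,
)


def check_query(sql):
    """Return the reasons a query is rejected; an empty list means it passes."""
    stripped = sql.strip().rstrip(";").strip()
    problems = []
    if ";" in stripped:
        problems.append("multiple statements")
    if not re.match(r"(?i)select\b", stripped):
        problems.append("not a SELECT")
    if re.search(r"(?i)select\s+\*", stripped):
        problems.append("SELECT * is not allowed")
    if FORBIDDEN.search(stripped):
        problems.append("write, lock, or DDL keyword present")
    if not re.search(r"(?i)\bwhere\b", stripped):
        problems.append("missing WHERE predicate")
    if not re.search(r"(?i)\blimit\s+\d+\b", stripped):
        problems.append("missing explicit LIMIT")
    return problems
```

For example, an aggregate check such as SELECT count(*) FROM payments WHERE status = 'failed' LIMIT 1 passes, while the same query with FOR UPDATE appended is rejected and should be routed to the coordinator for human approval.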

Rollout Plan

Phase 1: Canonical suite
Keep prode-triage authoritative, retain specialist and runner references, update repo entrypoints, and remove or wrap old prod-agent* skills.
Phase 2: Safety substrate
Add the learning inbox, wiki .gitignore, redaction lint script, and GitHub Actions workflow.
Phase 3: Specialist expansion
Keep Sentry/AWS first-class and add vendor, support, release, DB, reproduction, code-path, and review specialists.
Phase 4: Automation runner
Implement durable state, leases, idempotency, evidence bundles, approval gates, post-merge monitoring, and candidate-learning writes.
Phase 5: PRODE Wiki
Seed query recipes and failure modes, register active pages in index.md, and keep promotion behind review.
Phase 6: Hardening
Confirm physically read-only DB credentials, add platform-specific redaction patterns, and define where full evidence bundles live outside git.

Acceptance Checklist

Treat the Agent Production Engineer as ready only when these checks can be demonstrated in a dry run or low-risk ticket. A pretty diagram is not enough; the setup must survive retries, missing tools, and review.

Triggering: PRODE tickets and sentry-issue labels reliably invoke the skill.
Tool inventory: The run records available and missing MCPs/connectors/CLIs before specialists run.
Evidence gate: Source reads and implementation do not begin before Linear and required live evidence.
Specialists: Sentry, AWS, vendor, support, release, DB, reproduction, code-path, and review roles have bounded outputs.
DB safety: Production SQL uses read-only credentials and narrow SELECT queries only.
Idempotency: Retrying a run does not duplicate Linear comments, branches, PRs, or specialist artifacts.
Approval gates: Risky actions pause with exact approval records and expiry.
Redaction: Handoffs and wiki paths reject secrets, direct identifiers, raw rows, and full payloads.
Review: The PR explains evidence, fix, tests, residual risk, and does not claim more than the bundle proves.
Monitoring: Post-merge monitoring is read-only and records resolved, still recurring, or inconclusive.

Open Questions

  1. Is production DB access physically constrained to read-only credentials in every runner environment?
  2. Which exact customer/account/entity/KYC identifier patterns should redaction lint block?
  3. Where should full evidence bundles live when they are too sensitive for git but need durable retention?
  4. Who owns the review cadence for active PRODE Wiki pages?
  5. Which CI status should be required before merging PRs that touch wiki or learning paths?
  6. Which MCP/tool names should be canonical in each host environment?

Source Documents

Read the full ADRs when you want the complete detail: