How to Develop SKILL.md: A Production Guide for Engineering Teams

12 Mar 2026

Most AI coding agents start every session with zero memory of your project. No context on your stack, no awareness of your conventions, no understanding of the design decisions your team made six months ago.

Every prompt you write has to carry that weight - and when you're doing it across a team of ten engineers, across dozens of tasks per day, the cost adds up fast.

This is the problem SKILL.md was built to solve.

A SKILL.md file is a structured markdown document that packages reusable instructions for AI coding agents. You write it once. The agent - Claude Code, Roo Code, OpenAI Codex, Cursor - reads it automatically, applies it when relevant, and stops asking you to explain things it should already know.

This guide is for senior engineers, AI engineers, and technical decision-makers who are ready to move beyond one-off prompts and build the kind of AI agent infrastructure that actually works under production conditions. By the end, you'll know how SKILL.md files are structured, how to write ones that hold up in real projects, and how to deploy them across an engineering organisation.

Section 1: What is SKILL.md and Why Does It Exist?

Here is the core problem: documentation is written for humans, not agents. It is spread across multiple pages, assumes varying levels of technical knowledge, and is meant to be browsed - not processed all at once. AI agents work differently. They can handle complex technical content just fine, but they cannot load your entire docs site into context and still produce reliable output.

SKILL.md sits between those two realities. It is the distilled, agent-optimised version of what the agent needs to always know when working on a specific task.

A skill is a folder containing a SKILL.md file as its entrypoint, plus optional supporting files - templates, scripts, reference documents, example outputs. The SKILL.md itself has two distinct parts:

YAML frontmatter - metadata the agent uses to decide when to activate the skill. The name field becomes the slash command (/skill-name). The description field is used for automatic matching: when your request semantically matches the description, the agent loads and applies the skill without you having to invoke it explicitly.

Markdown body - the actual instructions the agent follows once the skill is active.

.claude/skills/
└── pr-review/
    ├── SKILL.md          ← Required entrypoint
    ├── checklist.md      ← Optional reference file
    └── examples/
        └── good-review.md ← Optional example output
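
Putting the two parts together, a minimal SKILL.md for the pr-review folder above might look like the sketch below. This is illustrative only; a full production version of this skill appears in Section 3.

```markdown
---
name: pr-review
description: >
  Review Node.js pull requests for correctness, tests, and security.
  Use when the user says "review this PR" or shares a diff. Do NOT
  use for infrastructure-as-code or frontend-only changes.
---

# PR Review

## Steps
1. Read the diff and the PR description.
2. Check correctness, test coverage, and security (see checklist.md).

## Output
A summary, a numbered issue list, and a verdict.
```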

Skills are cross-tool. The Agent Skills open standard is supported by Claude Code, Roo Code, OpenAI Codex, and Cursor, among others. A well-written SKILL.md generally works across tools with minimal modification, which matters when your team uses more than one agent.

Section 2: SKILL.md Architecture - What Belongs Where

Understanding the anatomy of a production-ready SKILL.md prevents the most common mistakes teams make. Here is the full structure with annotations:

---
name: api-design-review         # becomes /api-design-review command
description: >
  Review REST API design for conventions, security headers, rate
  limiting, versioning, and input validation. Use when reviewing
  API route definitions, controller files, or when the user asks
  "review my API", "check this endpoint", or "audit these routes".
  Do NOT use for frontend code, database schemas, or CLI scripts.
---

The description is the most important field you will write. It is what the agent matches your request against to decide whether to load the skill, so treat it as a trigger rule, not a summary. Three things make a good description:

Positive triggers - specific phrases and task types that should activate the skill. Concrete examples of what the user might say.

Negative scope - explicit statements of what the skill does NOT cover. This prevents the agent from over-triggering and applying the skill in the wrong context.

Task framing - a clear statement of what the skill produces, so the agent understands the goal before it reads the body.
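
To make the contrast concrete, here is an illustrative pairing of a vague description against one that encodes all three elements (the skill and its wording are hypothetical):

```yaml
# Too vague - will over-trigger, never trigger, or both:
description: Helps with database work.

# Specific - positive triggers, negative scope, task framing:
description: >
  Audit database migration files for destructive operations (DROP,
  irreversible ALTERs), missing rollback scripts, and index creation
  on live tables. Produces a pass/fail report per migration. Use when
  reviewing migration files or when the user says "check this
  migration". Do NOT use for application queries or ORM model changes.
```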

After the frontmatter, the body follows this structure:

## Inputs
What the agent should expect to receive.

## Steps
The ordered instructions the agent follows.

## Outputs
The exact format of what the agent should produce.

## Gotchas
Mistakes the agent makes repeatedly without explicit guidance.

## Escalation Rules
When the agent should stop and involve a human.

Each section has a specific job. Inputs and Outputs create consistency across runs. Steps encode your team's process. Gotchas are the highest-leverage section - they capture the delta between what the agent does by default and what your team actually needs. Escalation rules are what separates a safe production skill from a liability.

Section 3: Writing Your First Production SKILL.md - Node.js PR Review

Here is a complete, production-ready SKILL.md for a Node.js pull request review workflow. This is the type of skill MTechZilla deploys at the start of every AI-assisted development engagement.

---
name: pr-review
description: >
  Full pull request review for Node.js codebases. Covers correctness,
  test coverage, security, performance, and code style. Use when
  reviewing a PR, auditing a diff, or when the user says "review this
  PR", "check my diff", or "audit these changes". Do NOT use for
  infrastructure-as-code, database migration files, or frontend-only
  React components (use the frontend-review skill instead).
---

# PR Review - Node.js

## Inputs
- File paths or diff of changed files
- Target branch (main, staging, feature)
- PR description or issue link (optional but preferred)

## Review Steps

### 1. Correctness
- Verify logic matches the stated intent of the PR description
- Check error paths: null inputs, empty arrays, missing env variables
- Look for race conditions in async flows (unhandled Promise.all failures,
  missing await, concurrent writes to shared state)
- Confirm no breaking changes to exported interfaces without a version bump

### 2. Test Coverage
- Unit tests exist for every changed function
- Happy path and at least one error/edge case covered per function
- No tests removed without a documented reason
- Integration tests updated if API contracts changed

### 3. Security
- No secrets, tokens, or credentials hardcoded or in comments
- User input validated before it enters business logic (Zod, Joi, or equivalent)
- No SQL built via string concatenation - parameterised queries only
- No new npm packages with known CVEs (run: npm audit)
- Authentication/authorisation middleware present on all new routes

### 4. Performance
- No N+1 database query patterns introduced
- Pagination applied to any query that could return unbounded results
- No blocking operations (fs.readFileSync, heavy CPU loops) in request handlers
- New indexes added for any new query patterns on high-volume tables

### 5. Code Style
- Follows existing naming conventions in the file (camelCase for JS vars,
  PascalCase for classes)
- Functions under 40 lines; complex logic extracted with a descriptive name
- No commented-out code committed
- Error messages are descriptive and loggable (include context, not just "Error")

## Output Format

Respond in this exact structure:

**Summary**: [2-3 sentences describing what this PR does and its overall quality]

**Issues**:
1. [CRITICAL] Description - file:line - Recommended fix
2. [MAJOR] Description - file:line - Recommended fix
3. [MINOR] Description - file:line - Recommended fix

**Suggestions** (optional improvements, not blockers):
- Item

**Verdict**: APPROVE / REQUEST CHANGES / NEEDS DISCUSSION

## Gotchas

- Never recommend removing existing middleware without confirming it is safe
  to remove - it may be referenced elsewhere
- Do not suggest architectural refactors unless the PR explicitly introduces
  a design problem. Scope feedback to what changed
- Flag SQL injection and auth bypass vulnerabilities as [CRITICAL] regardless
  of how "unlikely" they seem in context
- Never approve a PR that adds a hardcoded secret, even a "placeholder" one

## Escalation Rules

- If a [CRITICAL] security issue is found: add a note that this PR should not
  merge until a human security review is completed
- If the PR modifies authentication, payment processing, or PII handling:
  recommend a compliance review before merging to production
- If the PR description is missing and the intent is unclear: ask 2-3
  clarifying questions before reviewing

This skill encodes several months of review patterns. Every rule in the Gotchas section exists because an agent made that mistake on a real project.

Section 4: Skill Invocation - Auto vs. Manual

Skills can be invoked two ways, and choosing the right one for each skill affects how useful it is in practice.

Automatic invocation happens when the agent's task semantically matches the skill's description. The agent loads and applies the skill without any slash command. This works well for skills with a narrow, clearly defined trigger - a PR review skill that fires when you say "review this PR", an API audit skill that fires when you paste route definitions.

The risk is over-triggering. An imprecise description causes the skill to activate on unrelated tasks, injecting irrelevant instructions into the agent's context. This is why the "Do NOT use for..." clause in your description is not optional: it is the primary guard against that failure mode.

Manual invocation uses a slash command and works well for skills that should only run when explicitly requested - a changelog generator, a database migration script template, a security audit. You invoke it directly:

/pr-review
/generate-changelog
/security-audit

For most production teams, the pattern that works is: broad, high-frequency tasks use automatic invocation with tight descriptions; specialised, less frequent tasks use manual invocation.
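
One practical habit before rollout is to test each description against sample requests it should and should not handle. Real agents match semantically, so the keyword-overlap heuristic sketched below (all names and thresholds are illustrative) is only a crude proxy, but it quickly surfaces descriptions too vague to overlap with anything:

```python
def rough_trigger_check(description, sample_requests, min_overlap=2):
    """Crude keyword-overlap proxy for semantic trigger matching.

    Returns the sample requests that share at least `min_overlap`
    meaningful words with the skill description. This is NOT how
    agents actually match (they match semantically); it is only a
    smoke test for obviously vague descriptions.
    """
    stop = {"the", "a", "an", "my", "this", "for", "to", "or", "and", "when"}
    desc_words = {w.strip('.,?"').lower() for w in description.split()} - stop
    hits = []
    for request in sample_requests:
        words = {w.strip('.,?"').lower() for w in request.split()} - stop
        if len(words & desc_words) >= min_overlap:
            hits.append(request)
    return hits
```

If a description fails to match the requests it is supposed to handle even at this crude level, it is almost certainly too vague for semantic matching as well.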

Section 5: Real-World Use Cases - How MTechZilla Deploys Skills in Production

Case Study 1: SaaS Platform - Consistent Code Reviews Across a Distributed Team

A SaaS startup came to MTechZilla with an inconsistent code review process. Senior engineers were thorough; junior engineers missed security checks. AI-assisted reviews were producing different outputs depending on who wrote the prompt.

The solution was a suite of three skills deployed to the team's Claude Code setup:

  • pr-review - the Node.js PR review skill shown above

  • migration-safety - a skill that checks database migration files for destructive operations, missing rollback strategies, and index creation on live tables

  • api-contract-check - a skill that catches breaking changes to REST API responses before they reach the frontend team

Within the first two weeks, the team caught a missing await in a payment processing function (flagged [CRITICAL] by the pr-review skill), three migrations that lacked rollback scripts, and one endpoint that removed a field the mobile app was still consuming.

The key architectural decision: all three skills lived in .claude/skills/ at the monorepo root, so they were available to every engineer without any individual setup.

Case Study 2: Founder Ops - Turning Repetitive Workflows into Reusable Skills

An engineering-led startup founder was spending 2–3 hours per week triaging inbound partnership emails, writing outreach responses, and updating their CRM. This is exactly the workflow that benefits from an ops-focused skill.

---
name: inbound-triage
description: >
  Classify, score, and draft a response for an inbound partnership or
  sales enquiry. Use when processing emails, form submissions, or
  messages from potential partners or clients. Do NOT use for
  investor communications or legal correspondence.
---

# Inbound Triage

## Steps
1. Classify the enquiry: sales / partnership / press / support / spam
2. Extract: company name, contact name, budget signals, urgency signals
3. Score lead quality: high / medium / low against ICP criteria below
4. Draft a response using brand voice guidelines below
5. Produce a CRM summary row

## ICP Criteria
- B2B SaaS or tech startup
- 10–200 employees, engineering team of 3+
- Budget signals: mentions of funding round, scaling, team size

## Brand Voice
- Direct and confident, not salesy
- Short paragraphs, no buzzwords
- Every response ends with a specific next step

## Output
Produce three labelled blocks:
1. Classification & Score
2. Draft Response (ready to send, minor edits expected)
3. CRM Row: Name | Company | Type | Score | Next Action

## Gotchas
- Do not invent details about the sender that aren't in the source text
- If budget signals are absent, default score to medium, not low
- Draft response must not make commitments about pricing or timelines

The result: what previously took the founder 15 minutes per enquiry now takes under two minutes to review and send.

Case Study 3: Policy & Compliance - SME Internal Documentation

A fintech SME needed to maintain a library of internal InfoSec policies aligned to ISO 27001. Policy drafting was slow, inconsistent between authors, and always slightly out of date.

---
name: policy-draft
description: >
  Draft or update an internal InfoSec or compliance policy. Use when
  asked to write, update, or review a policy, SOP, or procedure
  document. Do NOT use for contracts, legal agreements, or
  customer-facing terms of service.
---

# Policy Drafting

## Required Structure
Every policy output must include these sections in order:
1. Purpose
2. Scope (who and what it applies to)
3. Policy Statement (the core rules)
4. Procedures (numbered steps for compliance)
5. RACI (Responsible, Accountable, Consulted, Informed)
6. Change Log (version, date, author, summary)

## Quality Standards
- Reading age ≤ 12 (short sentences, active voice in procedure steps)
- Every "must" has a stated consequence or rationale
- External standards cited with version (ISO 27001:2022, not just ISO 27001)

## Escalation Rules
- Do not make definitive legal claims - flag to Legal for review
- If the policy touches PII, financial data, or audit trails: add a
  Compliance Review Required note before the Change Log
- Flag any rule that appears to contradict an existing policy in context

Section 6: Production Readiness Checklist

Before deploying a skill to your team, run it through this checklist. Skills that fail multiple checks should be revised before rollout.

Structure

  • [ ] YAML frontmatter includes name and description

  • [ ] Description contains explicit positive triggers (what should activate the skill)

  • [ ] Description contains explicit negative scope (Do NOT use for...)

  • [ ] Inputs section declares what the agent should expect

  • [ ] Outputs section specifies the exact format the agent should produce

  • [ ] All instructions are written in imperative form ("Do X", not "You should X")

Quality

  • [ ] Gotchas section present with at least 3 entries based on real observed failures

  • [ ] Escalation rules defined for high-stakes outputs

  • [ ] Output format is specific enough that two runs of the same task produce comparable results

  • [ ] Skill has been tested against 5+ real tasks before team rollout

Security and Compliance

  • [ ] No hardcoded secrets, tokens, internal URLs, or client-specific data in the skill file

  • [ ] PII handling instructions included if the skill touches personal data

  • [ ] Escalation to human review defined for critical or regulated outputs

  • [ ] Skill version tracked in a comment or change log inside the file

Team Governance

  • [ ] Skill lives in version control (.claude/skills/ checked into the repo)

  • [ ] PR review required to change a skill file (treat it like changing shared config)

  • [ ] A named owner is responsible for maintaining the skill

  • [ ] Skills are included in the quarterly tech debt review cycle
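
Several of the structural checks above can be automated. The sketch below is a minimal, heuristic linter in Python; the function name and section heuristics are illustrative, and a real implementation should parse the frontmatter with a proper YAML library.

```python
import re

def lint_skill(text):
    """Return a list of checklist violations for a SKILL.md file's contents.

    Heuristic sketch only: checks frontmatter presence, the name and
    description fields, a negative-scope clause, and key body sections.
    """
    issues = []
    m = re.match(r"^---\n(.*?)\n---\n", text, re.DOTALL)
    if not m:
        return ["missing YAML frontmatter"]
    front = m.group(1)
    if not re.search(r"^name:", front, re.MULTILINE):
        issues.append("frontmatter missing 'name'")
    if not re.search(r"^description:", front, re.MULTILINE):
        issues.append("frontmatter missing 'description'")
    elif "do not use" not in front.lower():
        issues.append("description has no negative scope ('Do NOT use for...')")
    body = text[m.end():]
    for section in ("Inputs", "Output", "Gotchas"):
        if not re.search(rf"^#+\s*.*{section}", body, re.MULTILINE):
            issues.append(f"body missing a '{section}' section")
    return issues
```

Running a check like this in CI against every file under .claude/skills/ means a malformed skill fails the same PR review that changes it.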

Section 7: Anti-Patterns That Kill Skills in Production

The Everything Skill. One SKILL.md covering PR reviews, documentation generation, database migrations, security audits, and changelog writing. Skills should be narrow and composable. One trigger, one type of output. When a skill does too much, the agent loads it for the wrong tasks and produces inconsistent results. Split broad skills into focused ones that can be used together.

A Vague Description. Writing "description: helps with code quality" guarantees the skill either never triggers or triggers constantly in the wrong context. The description is a semantic matching rule, not a marketing tagline. Write it with the specificity you'd use to brief a contractor: what exactly does this do, when exactly should it run, and what is explicitly out of scope.

No Gotchas Section. This is the highest-ROI section of any skill and the most commonly skipped. Without it, the agent repeats the same mistakes in every session. The gotchas section is where you capture the gap between what a generic AI response looks like and what your team's standards actually require. Start with three entries minimum. Add more after every session where the agent does something unexpected.

Skills That Never Get Updated. A SKILL.md written against your v1 stack becomes misleading by the time you've migrated to a new framework, changed your database ORM, or adopted new security standards. Skills rot. Add them to your quarterly tech review alongside dependencies and documentation. If a skill is wrong, it is worse than no skill at all - it gives the agent false confidence.

Skipping Escalation Rules. Especially critical for skills that touch security reviews, compliance documentation, financial workflows, or any output that goes to customers or regulators. Every skill that can produce a consequential output should know when to stop and ask a human. Define those boundaries explicitly, not as an afterthought.

Build It With MTechZilla

Moving from ad-hoc AI prompting to a production skills infrastructure takes more than knowing the file format. It takes a clear picture of which workflows in your organisation are actually worth systematising, the experience to write skill descriptions that don't over-trigger or under-trigger under real conditions, and the operational discipline to keep skills current as your stack and processes evolve.

At MTechZilla, this is exactly the kind of infrastructure we build for startups and growing engineering teams as part of our AI Assistant as a Service offering. We work with your team to map high-value workflows to skills, write and test the initial skill library, integrate it into your existing development process, and set up the governance model that keeps it reliable over time.

If your team is moving from AI POCs to production deployments and wants to build something that holds up under real load - get in touch.