TASK ──▶ PLAN ──▶ [approve]
                  │
                  ▼
            implement ≤3
                  │
                  ▼
   QA ◀── verify ◀── run
                  │
                  ▼
              RELEASE

CALL 7.HARNESS · PLAYBOOK

[ CONTROL PLANE ]

A PRACTICAL PLAYBOOK FOR BUILDERS

THE HARNESS

Most people use AI coding tools like smarter autocomplete. That works for tiny tasks. It breaks the moment work becomes multi-step, stateful, or spread across frontend, backend, tests, docs, and release notes. The shift is not a better model — it is wrapping the model in a repeatable system of instructions, tools, files, and verification. That system is the harness.

FREQ 7.00 · HUMANS STEER · AGENTS EXECUTE · THE REPO IS THE CONTROL PLANE

CH 000|FREQ 000.01|FAILURE MODE

WHY MOST PEOPLE USE AGENTS BADLY

NOT THE MODEL

A builder opens an AI coding tool, pastes a feature request, watches it edit twelve files, then spends an hour cleaning up the mess. The problem is usually not the model. The agent was given no structure, no memory outside the prompt, no stopping conditions, and no way to verify whether the result was correct. So the right question is no longer "what prompt should I use" — it is "what environment should I build so the agent can work safely, incrementally, and verifiably."

CH 001|FREQ 001.01|ANATOMY

WHAT A HARNESS ACTUALLY IS

FIVE PARTS · NO LAB BUDGET

PART 1

WORKING DIRECTORY

A place the agent reads and modifies — docs/ holds the task, plan, QA, and release notes as files, not chat scrollback.

PART 2

DURABLE CONTRACT

.agent/AGENTS.md — global rules, roles, guardrails, and stopping conditions that live in the repo, not in a prompt.

PART 3

SKILL LIBRARY

.agent/SKILLS/ — small procedural playbooks for recurring work. Reduce prompting, standardize output, support many product workflows.

PART 4

VERIFICATION LAYER

Tests, QA steps, and output artifacts. The agent has a real way to check whether the result is actually correct.

PART 5

STOP-AND-REVIEW LOOP

Never more than three plan steps before the human reviews. You stay in control; the agent stays incremental.

MINIMUM LAYOUT

.agent/
  AGENTS.md           # global rules, roles, guardrails, stopping conditions
  SKILLS/             # reusable playbooks for recurring workflows
docs/
  TASK.md             # the original task or spec
  PLAN.md             # the plan, written before any code
  QA.md               # verification notes, bugs, screenshots, test results
  RELEASE.md          # release summary, rollout + upgrade guidance
tests/                # your safety net
CHANGELOG.md          # running log of meaningful changes
README.md             # human-facing setup + context
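
One way to stand up this layout is a small scaffold script. The sketch below is an assumption, not part of the playbook: the `scaffold` helper, the placeholder file contents, and the `status: draft` starting line in PLAN.md are all hypothetical. It creates only what is missing and never overwrites an existing file.

```python
from pathlib import Path

# Hypothetical starter contents; the paths mirror the minimum layout above.
LAYOUT = {
    ".agent/AGENTS.md": "# Global rules, roles, guardrails, stopping conditions\n",
    "docs/TASK.md": "# Task\n",
    "docs/PLAN.md": "status: draft\n",  # assumed status line, flipped to approved by a human
    "docs/QA.md": "# QA\n",
    "docs/RELEASE.md": "# Release\n",
    "CHANGELOG.md": "# Changelog\n",
    "README.md": "# Project\n",
}
DIRS = [".agent/SKILLS", "tests"]


def scaffold(root):
    """Create the harness layout under root; return the files that were newly written."""
    root = Path(root)
    for rel_dir in DIRS:
        (root / rel_dir).mkdir(parents=True, exist_ok=True)
    created = []
    for rel_path, text in LAYOUT.items():
        path = root / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():  # never clobber work that is already in the repo
            path.write_text(text, encoding="utf-8")
            created.append(rel_path)
    return created
```

Calling `scaffold(".")` in an existing repo adds only the missing pieces, so it is safe to run more than once.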

CH 002|FREQ 002.01|THE LOOP

THE CORE LOOP

EVERY PRODUCT TYPE · SAME LOOP

Write the task         →  docs/TASK.md
Convert to a plan      →  docs/PLAN.md
Review & approve       →  status: approved
Implement a chunk      →  ≤ 3 steps, then stop
Run verification       →  tests · browser · fixtures
Write the QA report    →  docs/QA.md
Prepare release notes  →  docs/RELEASE.md

A good TASK.md is simple and concrete. The Planner converts it into a file-based plan; the human edits it; only after approval does the Implementer write code.

TASK
Add email and password login to the product.

Requirements:
- Keep existing OAuth login working.
- Add frontend form validation.
- Add backend auth endpoint.
- Add tests for successful and failed login.
- Do not change the billing flow.
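
The approval gate can be enforced mechanically rather than by convention. A minimal sketch, assuming the plan records its state on a line of the form `status: approved` somewhere in docs/PLAN.md (the exact format and the `plan_is_approved` helper are assumptions):

```python
import re


def plan_is_approved(plan_text):
    """Return True only when the first status line in the plan reads 'status: approved'."""
    match = re.search(
        r"^status:\s*(\S+)\s*$", plan_text, flags=re.MULTILINE | re.IGNORECASE
    )
    return bool(match) and match.group(1).lower() == "approved"
```

A wrapper script could refuse to start the Implementer whenever this returns False, so an unreviewed plan never reaches code.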

CH 003|FREQ 003.01|ADAPTATION

ONE HARNESS, FOUR PRODUCT TYPES

SKELETON CONSTANT · MEANING SHIFTS

The skeleton stays the same. What changes is the meaning of implementation and verification. The harness is not one magic prompt — it is a repeatable operating model that adapts to the product surface.

TYPE A

SAAS WEB APP

MAIN CONCERN: Full-stack feature delivery (routes, state, auth)

VERIFICATION: Browser checks · tests · regression coverage

KEY SKILLS: Planning · feature impl · browser QA · regression guard

TYPE B

CLI TOOL

MAIN CONCERN: Commands, flags, output, errors

VERIFICATION: Fixtures · snapshots · exit codes

KEY SKILLS: Command planning · impl · fixture validation
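
For a CLI, fixture validation can be as small as running a command and comparing its exit code and stdout to recorded values. The sketch below is illustrative only: `run_cli` and `check_fixture` are hypothetical helpers, and the Python interpreter stands in for the real binary.

```python
import subprocess
import sys

PYTHON = sys.executable  # stand-in for the real CLI binary in this sketch


def run_cli(args, stdin_text=""):
    """Run a command and capture what a fixture compares: exit code, stdout, stderr."""
    result = subprocess.run(
        args, input=stdin_text, capture_output=True, text=True, timeout=30
    )
    return result.returncode, result.stdout, result.stderr


def check_fixture(args, expected_code, expected_stdout):
    """Return a list of mismatches; an empty list means the fixture passed."""
    code, out, _err = run_cli(args)
    problems = []
    if code != expected_code:
        problems.append(f"exit code: expected {expected_code}, got {code}")
    if out != expected_stdout:
        problems.append(f"stdout: expected {expected_stdout!r}, got {out!r}")
    return problems
```

The returned mismatch list is plain text, so it can be pasted straight into docs/QA.md.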

TYPE C

MCP SERVER

MAIN CONCERN: Tool schemas, predictable interfaces, examples

VERIFICATION: Schema validation · example calls

KEY SKILLS: Tool design · schema validation · usage generation
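
Schema validation for a tool interface means checking example calls against the declared input schema. The sketch below hand-rolls a small subset of JSON Schema (required fields and primitive types) to show the idea; a real project would more likely use a full validator library. The `validate_args` helper is hypothetical, and rejecting undeclared fields is an assumption that is stricter than JSON Schema's default.

```python
# Maps JSON Schema primitive type names to Python types (subset only).
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate_args(schema, args):
    """Check an example call's arguments against a tool's input schema; return error strings."""
    errors = []
    properties = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in args:
            errors.append(f"missing required field: {name}")
    for name, value in args.items():
        spec = properties.get(name)
        if spec is None:
            errors.append(f"unexpected field: {name}")  # stricter than the JSON Schema default
            continue
        type_name = spec.get("type")
        expected = JSON_TYPES.get(type_name)
        if expected is None:
            continue  # unknown or absent type: this sketch does not check it
        # bool is a subclass of int in Python, so reject it explicitly for numeric fields
        is_bool_as_number = isinstance(value, bool) and type_name in ("integer", "number")
        if is_bool_as_number or not isinstance(value, expected):
            errors.append(f"{name}: expected {type_name}, got {type(value).__name__}")
    return errors
```

Running every documented example call through a check like this keeps the examples and the schema from drifting apart.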

TYPE D

MOBILE APP

MAIN CONCERN: Screens, navigation, state transitions

VERIFICATION: Flow QA · state handling · permissions

KEY SKILLS: Screen planning · flow impl · state guard

CH 004|FREQ 004.01|OPERATE

HOW TO RUN THIS THIS WEEK

COPY-PASTE CONTROL LOOP

1 · PLAN     Read docs/TASK.md. Use the correct planning skill. Write a
             detailed plan to docs/PLAN.md. Do not edit code yet.
2 · APPROVE  Edit the plan, narrow scope, mark status: approved.
3 · BUILD    Use the implementation skill. Execute up to 3 of the next
             approved steps. Update tests + CHANGELOG. Stop and summarize.
4 · VERIFY   Use the verification skill. Write results to docs/QA.md.
             Highlight broken states, regressions, missing coverage.
5 · SHIP     Use the release skill. Read plan, QA, changelog.
             Write release + rollout notes to docs/RELEASE.md.
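
The three-step cap in BUILD is easy to check in code. A minimal sketch, assuming plan steps are written as Markdown checkboxes (`- [ ]` pending, `- [x]` done); the playbook does not prescribe that format, and `next_chunk` is a hypothetical helper:

```python
MAX_STEPS_PER_CHUNK = 3  # the playbook's limit before a human review


def next_chunk(plan_text, limit=MAX_STEPS_PER_CHUNK):
    """Return up to `limit` pending steps, in the order they appear in the plan."""
    pending = []
    for line in plan_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("- [ ]"):
            pending.append(stripped[len("- [ ]"):].strip())
    return pending[:limit]
```

Feeding only this slice into the BUILD prompt keeps the agent from running ahead of the review.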

THE REPO IS THE CONTROL PLANE

The future of builder workflows is probably not one giant autonomous agent that does everything perfectly. It is a compact harness where humans steer, agents execute, and the repo itself becomes the control plane. A useful agent workflow does not start with a new foundation model. It starts with a folder, a few Markdown files, a clean loop, and the discipline to make the system repeatable.
