PaperChaseLabs
  TASK ──▶ PLAN ──▶ [approve]
                  │
                  ▼
            implement ≤3
                  │
                  ▼
   QA ◀── verify ◀── run
                  │
                  ▼
              RELEASE
CALL 7.HARNESS · PLAYBOOK
[ CONTROL PLANE ]
A PRACTICAL PLAYBOOK FOR BUILDERS

THE HARNESS

Most people use AI coding tools like smarter autocomplete. That works for tiny tasks. It breaks the moment work becomes multi-step, stateful, or spread across frontend, backend, tests, docs, and release notes. The shift is not a better model — it is wrapping the model in a repeatable system of instructions, tools, files, and verification. That system is the harness.

FREQ 7.00 · HUMANS STEER · AGENTS EXECUTE · THE REPO IS THE CONTROL PLANE
CH 000|FREQ 000.01|FAILURE MODE

WHY MOST PEOPLE USE AGENTS BADLY

NOT THE MODEL

A builder opens an AI coding tool, pastes a feature request, watches it edit twelve files, then spends an hour cleaning up the mess. The problem is usually not the model. The agent was given no structure, no memory outside the prompt, no stopping conditions, and no way to verify whether the result was correct. So the right question is no longer "what prompt should I use" — it is "what environment should I build so the agent can work safely, incrementally, and verifiably."

CH 001|FREQ 001.01|ANATOMY

WHAT A HARNESS ACTUALLY IS

FIVE PARTS · NO LAB BUDGET
PART 1

WORKING DIRECTORY

A place the agent reads and modifies — docs/ holds the task, plan, QA, and release notes as files, not chat scrollback.

PART 2

DURABLE CONTRACT

.agent/AGENTS.md — global rules, roles, guardrails, and stopping conditions that live in the repo, not in a prompt.

PART 3

SKILL LIBRARY

.agent/SKILLS/ — small procedural playbooks for recurring work. Reduce prompting, standardize output, support many product workflows.

PART 4

VERIFICATION LAYER

Tests, QA steps, and output artifacts. The agent has a real way to check whether the result is actually correct.

PART 5

STOP-AND-REVIEW LOOP

Never more than three plan steps before the human reviews. You stay in control; the agent stays incremental.
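The three-step limit can be enforced mechanically rather than by memory. The sketch below assumes plan steps are written in docs/PLAN.md as Markdown checkboxes ("- [ ]" for pending, "- [x]" for done); that format and the name next_chunk are illustrative assumptions, not something the playbook prescribes.

```python
import re

MAX_STEPS_PER_CHUNK = 3  # the review limit: never more than three plan steps

# Assumed (hypothetical) step format: "- [ ] pending step" / "- [x] finished step".
STEP_RE = re.compile(r"^\s*[-*] \[( |x|X)\]\s+(.+?)\s*$")


def next_chunk(plan_text: str, limit: int = MAX_STEPS_PER_CHUNK) -> list[str]:
    """Return at most `limit` pending steps, in the order they appear in the plan."""
    pending = []
    for line in plan_text.splitlines():
        match = STEP_RE.match(line)
        if match and match.group(1) == " ":  # a blank checkbox means not done yet
            pending.append(match.group(2))
    return pending[:limit]
```

Handing the agent only the output of next_chunk, instead of the whole plan, is one way to make the stop condition structural: the agent cannot run ahead of steps it was never given.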

MINIMUM LAYOUT
.agent/
  AGENTS.md      # global rules, roles, guardrails, stopping conditions
  SKILLS/        # reusable playbooks for recurring workflows
docs/
  TASK.md        # the original task or spec
  PLAN.md        # the plan, written before any code
  QA.md          # verification notes, bugs, screenshots, test results
  RELEASE.md     # release summary, rollout + upgrade guidance
tests/           # your safety net
CHANGELOG.md     # running log of meaningful changes
README.md        # human-facing setup + context
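The layout above is small enough to create by hand, but a script makes it repeatable across repos. This is a minimal sketch: the paths come from the layout, while the function name scaffold and the choice to create empty placeholder files are assumptions.

```python
from pathlib import Path

# Directories and files taken from the minimum layout.
HARNESS_DIRS = [".agent/SKILLS", "docs", "tests"]
HARNESS_FILES = [
    ".agent/AGENTS.md",
    "docs/TASK.md",
    "docs/PLAN.md",
    "docs/QA.md",
    "docs/RELEASE.md",
    "CHANGELOG.md",
    "README.md",
]


def scaffold(root: Path) -> list[Path]:
    """Create missing harness paths under `root`; existing files are left untouched."""
    for rel in HARNESS_DIRS:
        (root / rel).mkdir(parents=True, exist_ok=True)
    created = []
    for rel in HARNESS_FILES:
        path = root / rel
        if not path.exists():
            path.touch()  # empty placeholder for the human or agent to fill in
            created.append(path)
    return created
```

Calling scaffold(Path(".")) in an existing repo should be safe to repeat, since it only adds what is missing and returns the list of files it created.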
CH 002|FREQ 002.01|THE LOOP

THE CORE LOOP

EVERY PRODUCT TYPE · SAME LOOP
1  Write the task          → docs/TASK.md
2  Convert to a plan       → docs/PLAN.md
3  Review & approve        → status: approved
4  Implement a chunk       → ≤ 3 steps, then stop
5  Run verification        → tests · browser · fixtures
6  Write the QA report     → docs/QA.md
7  Prepare release notes   → docs/RELEASE.md
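Steps 5 and 6 of the loop (run verification, then write the QA report) can be tied together so that every run leaves a record in the repo. The sketch below is one possible shape; the entry format written to docs/QA.md and the name run_and_record are assumptions.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def run_and_record(cmd: list[str], qa_path: Path) -> bool:
    """Run `cmd`, append a PASS/FAIL entry with its output to `qa_path`, and return True on exit code 0."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    passed = result.returncode == 0
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    # Hypothetical entry format: one Markdown section per verification run.
    entry = (
        f"## {stamp} {'PASS' if passed else 'FAIL'}\n"
        f"command: {' '.join(cmd)}\n"
        f"exit code: {result.returncode}\n\n"
        f"{(result.stdout + result.stderr).strip()}\n\n"
    )
    qa_path.parent.mkdir(parents=True, exist_ok=True)
    with qa_path.open("a", encoding="utf-8") as handle:
        handle.write(entry)
    return passed
```

A typical call might pass your project's own test command, for example run_and_record(["python", "-m", "unittest", "discover", "-s", "tests"], Path("docs/QA.md")). Appending rather than overwriting keeps the history of failed runs visible to the reviewer.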

A good TASK.md is simple and concrete. The Planner converts it into a file-based plan; the human edits it; only after approval does the Implementer write code.

TASK
Add email and password login to the product.

Requirements:
- Keep existing OAuth login working.
- Add frontend form validation.
- Add backend auth endpoint.
- Add tests for successful and failed login.
- Do not change the billing flow.
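The rule that the Implementer writes code only after approval can be checked by a script before any build step starts. This sketch assumes the plan carries a line of the form "status: approved" somewhere in docs/PLAN.md; the exact line format and the names plan_status and require_approved are assumptions.

```python
import re

# Assumed convention: a standalone line such as "status: draft" or "status: approved".
STATUS_RE = re.compile(r"^[ \t]*status:[ \t]*(\S+)[ \t]*$", re.IGNORECASE | re.MULTILINE)


def plan_status(plan_text: str) -> "str | None":
    """Return the lowercased status value from the plan, or None if no status line exists."""
    match = STATUS_RE.search(plan_text)
    return match.group(1).lower() if match else None


def require_approved(plan_text: str) -> None:
    """Raise PermissionError unless a human has marked the plan approved."""
    status = plan_status(plan_text)
    if status != "approved":
        raise PermissionError(
            f"plan status is {status!r}; implementation is blocked until it is marked approved"
        )
```

Treating a missing status line the same as an unapproved plan is a deliberate choice here: the gate fails closed, so a plan nobody reviewed cannot slip through.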
CH 003|FREQ 003.01|ADAPTATION

ONE HARNESS, FOUR PRODUCT TYPES

SKELETON CONSTANT · MEANING SHIFTS

The skeleton stays the same. What changes is the meaning of implementation and verification. The harness is not one magic prompt — it is a repeatable operating model that adapts to the product surface.

TYPE A

SAAS WEB APP

MAIN CONCERN: Full-stack feature delivery (routes, state, auth)
VERIFICATION: Browser checks · tests · regression coverage
KEY SKILLS: Planning · feature impl · browser QA · regression guard
TYPE B

CLI TOOL

MAIN CONCERN: Commands, flags, output, errors
VERIFICATION: Fixtures · snapshots · exit codes
KEY SKILLS: Command planning · impl · fixture validation
TYPE C

MCP SERVER

MAIN CONCERN: Tool schemas, predictable interfaces, examples
VERIFICATION: Schema validation · example calls
KEY SKILLS: Tool design · schema validation · usage generation
TYPE D

MOBILE APP

MAIN CONCERN: Screens, navigation, state transitions
VERIFICATION: Flow QA · state handling · permissions
KEY SKILLS: Screen planning · flow impl · state guard
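To make "the meaning of verification shifts" concrete, here is what the check could look like for the CLI tool case (Type B), where verification means fixtures, snapshots, and exit codes. It is a sketch under assumptions: the Fixture class, its field names, and check_fixture are hypothetical, and a real project might use a snapshot-testing library instead.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class Fixture:
    """One expected invocation of the CLI under test (all values are illustrative)."""
    args: list[str]
    expected_exit: int
    expected_stdout: str


def check_fixture(base_cmd: list[str], fixture: Fixture) -> list[str]:
    """Run the command with the fixture's args; return a list of mismatches (empty means pass)."""
    result = subprocess.run([*base_cmd, *fixture.args], capture_output=True, text=True)
    problems = []
    if result.returncode != fixture.expected_exit:
        problems.append(f"exit code {result.returncode}, expected {fixture.expected_exit}")
    if result.stdout != fixture.expected_stdout:
        problems.append(f"stdout {result.stdout!r}, expected {fixture.expected_stdout!r}")
    return problems
```

The same shape carries over to the other product types: only the thing being compared changes (a rendered page, a tool schema, a screen flow), while the habit of returning a list of concrete mismatches for docs/QA.md stays the same.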
CH 004|FREQ 004.01|OPERATE

HOW TO RUN THIS THIS WEEK

COPY-PASTE CONTROL LOOP
1 · PLAN
Read docs/TASK.md. Use the correct planning skill. Write a detailed plan to docs/PLAN.md. Do not edit code yet.

2 · APPROVE
Edit the plan, narrow scope, mark status: approved.

3 · BUILD
Use the implementation skill. Execute the next 3 approved steps. Update tests + CHANGELOG. Stop and summarize.

4 · VERIFY
Use the verification skill. Write results to docs/QA.md. Highlight broken states, regressions, missing coverage.

5 · SHIP
Use the release skill. Read plan, QA, changelog. Write release + rollout notes to docs/RELEASE.md.
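Because each phase of the control loop reads and writes files, the current phase can be inferred from the repo instead of from chat history. The sketch below is a simplification built on assumptions: it expects a "status: approved" line and "- [ ]" checkboxes in docs/PLAN.md, treats an empty QA.md or RELEASE.md as not yet written, and only moves to VERIFY once every step is checked off. The name next_phase is hypothetical.

```python
from pathlib import Path


def _text(path: Path) -> str:
    """Return the file's stripped text, or an empty string if the file is missing."""
    return path.read_text(encoding="utf-8").strip() if path.is_file() else ""


def next_phase(root: Path) -> str:
    """Infer which control-loop phase should run next from the files under `root`/docs."""
    docs = root / "docs"
    plan = _text(docs / "PLAN.md")
    if not plan:
        return "PLAN"      # no plan written yet
    if "status: approved" not in plan.lower():
        return "APPROVE"   # plan exists but no human sign-off
    if "- [ ]" in plan:
        return "BUILD"     # approved steps still pending
    if not _text(docs / "QA.md"):
        return "VERIFY"    # work done, nothing verified
    if not _text(docs / "RELEASE.md"):
        return "SHIP"      # verified, release notes missing
    return "DONE"
```

A stricter variant might re-run VERIFY after every build chunk rather than once at the end; the point of the sketch is only that the repo state, not the conversation, decides what happens next.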

THE REPO IS THE CONTROL PLANE

The future of builder workflows is probably not one giant autonomous agent that does everything perfectly. It is a compact harness where humans steer, agents execute, and the repo itself becomes the control plane. A useful agent workflow does not start with a new foundation model. It starts with a folder, a few Markdown files, a clean loop, and the discipline to make the system repeatable.