TASK ──▶ PLAN ──▶ [approve] ──▶ implement ≤3 ──▶ run ──▶ verify ──▶ QA ──▶ RELEASE
Most people use AI coding tools like smarter autocomplete. That works for tiny tasks. It breaks the moment work becomes multi-step, stateful, or spread across frontend, backend, tests, docs, and release notes. The shift is not a better model; it is wrapping the model in a repeatable system of instructions, tools, files, and verification. That system is the harness.
A builder opens an AI coding tool, pastes a feature request, watches it edit twelve files, then spends an hour cleaning up the mess. The problem is usually not the model. The agent was given no structure, no memory outside the prompt, no stopping conditions, and no way to verify whether the result was correct. So the right question is no longer "what prompt should I use" — it is "what environment should I build so the agent can work safely, incrementally, and verifiably."
Working directory: a place the agent reads and modifies. docs/ holds the task, plan, QA, and release notes as files, not chat scrollback.
.agent/AGENTS.md: global rules, roles, guardrails, and stopping conditions that live in the repo, not in a prompt.
.agent/SKILLS/: small procedural playbooks for recurring work. They reduce prompting, standardize output, and support many product workflows.
Tests, QA steps, and output artifacts give the agent a real way to check whether the result is actually correct.
The agent never implements more than three plan steps before the human reviews. You stay in control; the agent stays incremental.
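The pieces above can be sketched as a small scaffold plus a batch limiter. This is a minimal sketch under assumptions, not a prescribed layout: only docs/, TASK.md, .agent/AGENTS.md, and .agent/SKILLS/ come from the text. Placing TASK.md inside docs/, the file names PLAN.md, QA.md, and RELEASE_NOTES.md, and the helper names scaffold and next_batch are hypothetical and chosen for illustration.

```python
from pathlib import Path

# docs/, .agent/AGENTS.md and .agent/SKILLS/ come from the text above.
# PLAN.md, QA.md and RELEASE_NOTES.md are assumed names for illustration.
DOC_FILES = ("TASK.md", "PLAN.md", "QA.md", "RELEASE_NOTES.md")
MAX_STEPS_PER_REVIEW = 3  # the three-step limit described above


def scaffold(root: Path) -> list[Path]:
    """Create the harness skeleton under root; existing files are left untouched."""
    created = []
    (root / "docs").mkdir(parents=True, exist_ok=True)
    (root / ".agent" / "SKILLS").mkdir(parents=True, exist_ok=True)
    targets = [root / "docs" / name for name in DOC_FILES]
    targets.append(root / ".agent" / "AGENTS.md")
    for path in targets:
        if not path.exists():
            # Seed each file with a one-line heading so it is never empty.
            path.write_text(f"# {path.stem}\n", encoding="utf-8")
            created.append(path)
    return created


def next_batch(steps: list[str], done: int,
               limit: int = MAX_STEPS_PER_REVIEW) -> list[str]:
    """Return the next plan steps to implement, never more than limit."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return steps[done:done + limit]
```

After each batch the caller would stop and hand control back to the human. An empty list from next_batch means the plan is finished.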
A good TASK.md is simple and concrete. The Planner converts it into a file-based plan; the human edits it; only after approval does the Implementer write code.
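One way to enforce the approval gate is to have the Implementer refuse to run until the plan file carries an explicit marker. The sketch below assumes a hypothetical convention in which the human adds a line reading "Status: approved" to the plan. The marker text and the function names are illustrative, not something the workflow above prescribes.

```python
APPROVED_MARKER = "Status: approved"  # hypothetical marker the human adds to the plan


def plan_is_approved(plan_text: str) -> bool:
    """True only if some line of the plan is exactly the approval marker."""
    wanted = APPROVED_MARKER.lower()
    return any(line.strip().lower() == wanted for line in plan_text.splitlines())


def implement(plan_text: str, write_code):
    """Run the Implementer callback only after approval; otherwise refuse."""
    if not plan_is_approved(plan_text):
        raise PermissionError(
            "plan not approved; the human must edit and approve it first"
        )
    return write_code(plan_text)
```

Keeping the gate in the plan file rather than in the chat means the approval is visible in the repo and survives across sessions.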
The skeleton stays the same. What changes is the meaning of implementation and verification. The harness is not one magic prompt — it is a repeatable operating model that adapts to the product surface.
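The idea that the loop is fixed while implementation and verification vary can be sketched by passing both in as parameters. Everything named here is an assumption for illustration: the CHECKS mapping, the surface names, and the commands in it are placeholders, and a real project would substitute its own test and QA commands.

```python
import subprocess
import sys
from typing import Callable

# Hypothetical mapping from product surface to verification commands.
CHECKS: dict[str, list[list[str]]] = {
    "backend": [[sys.executable, "-m", "pytest", "-q"]],
    "docs": [[sys.executable, "-c", "print('docs check placeholder')"]],
}


def verify(commands: list[list[str]]) -> list[tuple[str, bool]]:
    """Run each command and record whether it exited with status 0."""
    results = []
    for cmd in commands:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results.append((" ".join(cmd), proc.returncode == 0))
    return results


def run_step(step: str, implement: Callable[[str], object],
             commands: list[list[str]]) -> bool:
    """Same skeleton for every surface: implement the step, then verify it."""
    implement(step)
    return all(ok for _, ok in verify(commands))
```

Only the implement callback and the command list change between, say, a backend change and a docs change. The surrounding loop stays the same.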
The future of builder workflows is probably not one giant autonomous agent that does everything perfectly. It is a compact harness where humans steer, agents execute, and the repo itself becomes the control plane. A useful agent workflow does not start with a new foundation model. It starts with a folder, a few Markdown files, a clean loop, and the discipline to make the system repeatable.