PatternTechIndividual & TeamVerified

Loop Engineering

The layer above harness engineering: instead of prompting an agent yourself -- prompt, wait, read the diff, repeat -- you build the outer system that prompts it. A goal written to files, a trigger that is not a keystroke, fresh context each iteration, verification the agent cannot bypass, and a defined point where it stops to ask a human.

The Pattern

"I don't prompt Claude anymore. I have loops that are running. They're the ones prompting Claude and figuring out what to do. My job is to write loops." -- Boris Cherny, creator of Claude Code (source)

Loop engineering sits on top of harness engineering and context engineering: a good harness makes a single agent run well and good context keeps the right tokens in front of it -- loop engineering takes that working agent, gives it a clear goal, and iterates toward it after you walk away. Instead of prompting the agent yourself, you build the outer system that prompts it.

A production loop needs five things: a goal written into files that outlive the session, a trigger other than a keystroke (a schedule, a CI event), fresh context on each iteration, verification the agent cannot bypass, and a defined stop condition for handing back to a human (Jazz Tong). Every coding agent already runs an inner cycle -- gather context, act, verify, repeat (Anthropic); loop engineering wraps an outer, goal-seeking loop around that inner one.

The canonical minimal example is Geoffrey Huntley's "Ralph" -- while :; do cat PROMPT.md | claude-code; done -- the goal in a file, looped, one task per pass. "The technique is deterministically bad in an undeterministic world," Huntley notes; it works on eventual consistency and on tuning the loop when it drifts -- the loop, not the prompt, is what you engineer -- not on any single run being right (Ralph; everything is a ralph loop).

Why It Matters

The shift is recent and widely echoed: "you don't prompt agents anymore, you design loops that prompt them" (Akshay Pachaar). And the loop itself is the easy part -- "the loop is six lines, and nobody competes on it; every serious agent framework lands on the same tiny while-loop" -- so the engineering is everything around it: the goal, the context fed each turn, and the verification (Akshay Pachaar). It is enabled by a measurable trend -- the length of task a frontier model can complete unattended has been doubling roughly every seven months, moving another class of work from supervised to autonomous with each step (METR). The proof point everyone now cites: Cursor turned a swarm of GPT-5.2 agents loose to build a working web browser from scratch, which ran for a full week with no human intervention and produced millions of lines across thousands of files (Cursor, via Fortune).

Two engineering constraints decide whether a loop holds up. First, state must live outside the model -- in files, a progress log, and git -- because the context window is a cache that resets, not a memory; this is context engineering applied across iterations. Second, the loop cannot use the generating model as its own quality gate -- models struggle to correct their own reasoning -- so verification has to be external and mechanical: tests, contract checks, and a separate adversarial verifier with a fresh context.

The honest caveat is the obvious one: a loop running unattended is also a loop failing unattended (Akshay Pachaar). A well-built loop still keeps one person at the gate, watching the gauges and tuning the inputs. And it depends on a real spec: once the model stops being the limiting factor, the weakest part of the system becomes the precision of what you asked for.

Sources

Last reviewed: 2026-06-25