Cost Management
FinOps for agents: making agentic spend visible, attributable, and controllable. Agentic workloads are expensive in a specific way -- an agent uses ~4x the tokens of a chat turn, a multi-agent system ~15x -- so the unit that matters is cost per merged pull request, tracked by task class against the human baseline.
The Pattern
"A loop runs the meter whether or not the output ships." -- Jazz Tong (source)
Cost management is the platform discipline of making agentic spend visible, attributable, and controllable: FinOps for agents. Agentic workloads are expensive in a specific way. An agent uses roughly four times the tokens of a chat turn, and a multi-agent system about fifteen times. Input tokens outnumber output tokens by roughly 25 to 1, so per-run cost is driven by how much context accumulates, not by how much the agent writes (Tokenomics).
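To make the input-heavy cost profile concrete, here is a minimal arithmetic sketch. The token counts follow the ~25:1 ratio above; the per-million-token prices are hypothetical placeholders, not any provider's actual rates, and the function name is invented for illustration.

```python
def input_output_cost(input_tokens, output_tokens,
                      input_price_per_mtok, output_price_per_mtok):
    """Return (input_cost, output_cost) in USD for a single agent run."""
    input_cost = input_tokens / 1_000_000 * input_price_per_mtok
    output_cost = output_tokens / 1_000_000 * output_price_per_mtok
    return input_cost, output_cost


# Hypothetical run: 250k input tokens, 10k output tokens (a 25:1 ratio).
# Hypothetical prices: $3 per million input tokens, $15 per million output.
input_cost, output_cost = input_output_cost(250_000, 10_000, 3.00, 15.00)
input_share = input_cost / (input_cost + output_cost)

print(f"input ${input_cost:.2f}, output ${output_cost:.2f}, "
      f"input share {input_share:.0%}")
```

Under these assumed prices, input accounts for about five sixths of the run's cost even though each output token is priced five times higher, which is why trimming accumulated context tends to matter more than shortening what the agent writes.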
The levers live mostly at the model router: prompt caching (cached reads billed at a fraction of the base input price), model tiering (cheap models for triage, expensive ones for hard reasoning), and per-task metering. But cost management is a concern in its own right, separate from routing: the number worth reporting is cost per merged pull request, tracked by task class against the fully loaded human baseline (Jazz Tong).
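A minimal sketch of per-task metering that rolls up to cost per merged pull request by task class. Everything named here is an assumption for illustration: the `Run` record, the tier names, the price table, and the sample runs are hypothetical, and real token counts would come from whatever usage data the router exposes. Spend on runs that never merged still counts toward the class total, since the meter runs whether or not the output ships.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical USD prices per million tokens for two illustrative tiers.
# Cached reads are priced at a fraction of base input (here, one tenth).
PRICES = {
    "small": {"input": 1.00, "cached_input": 0.10, "output": 5.00},
    "large": {"input": 3.00, "cached_input": 0.30, "output": 15.00},
}


@dataclass
class Run:
    task_class: str
    model_tier: str
    input_tokens: int         # uncached input tokens
    cached_input_tokens: int  # input tokens served from the prompt cache
    output_tokens: int
    merged: bool              # did this run end in a merged pull request?


def run_cost(run):
    """USD cost of one run under the hypothetical price table."""
    p = PRICES[run.model_tier]
    total = (run.input_tokens * p["input"]
             + run.cached_input_tokens * p["cached_input"]
             + run.output_tokens * p["output"])
    return total / 1_000_000


def cost_per_merged_pr(runs):
    """Map task class -> total spend / merged PRs (None if nothing merged)."""
    spend = defaultdict(float)
    merged = defaultdict(int)
    for run in runs:
        spend[run.task_class] += run_cost(run)  # unmerged runs still cost money
        if run.merged:
            merged[run.task_class] += 1
    return {cls: (spend[cls] / merged[cls] if merged[cls] else None)
            for cls in spend}


runs = [
    Run("dependency-bump", "small", 100_000, 400_000, 20_000, True),
    Run("dependency-bump", "small", 100_000, 400_000, 20_000, False),
    Run("refactor", "large", 200_000, 1_000_000, 40_000, True),
    Run("flaky-test", "large", 100_000, 0, 10_000, False),
]
report = cost_per_merged_pr(runs)
for task_class, cost in report.items():
    print(task_class, "no merged PRs" if cost is None else f"${cost:.2f}")
```

Returning `None` for a class with spend but no merged pull requests is a design choice in this sketch: it keeps "nothing shipped" visibly distinct from "cheap" instead of dividing by zero or reporting zero.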
Why It Matters
Without metering, agent spend is invisible until the bill arrives, and you cannot tell which task classes pay for themselves. With it, autonomy becomes an economic decision: classes where the loop beats human cost earn more autonomy; classes where it loses get narrowed or moved down a model tier. As fleets grow, cost management is what keeps "let it run" from quietly becoming the largest line in the budget.
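The economic decision described above can be sketched as a simple comparison per task class. The function name, the decision labels, and all dollar figures below are hypothetical; treating a tie, or a class with spend but no merged pull requests, as a loss is an assumption of this sketch rather than something the source prescribes.

```python
def autonomy_decision(agent_cost_per_pr, human_cost_per_pr):
    """Compare agent cost per merged PR with the fully loaded human baseline."""
    if agent_cost_per_pr is None:
        # Assumption: spend that produced no merged PR counts as a loss.
        return "narrow_or_downgrade"
    if agent_cost_per_pr < human_cost_per_pr:
        return "expand_autonomy"
    # Assumption: a tie does not earn more autonomy.
    return "narrow_or_downgrade"


# Hypothetical per-class figures in USD per merged pull request.
agent_costs = {"dependency-bump": 0.48, "refactor": 180.0, "flaky-test": None}
human_baseline = {"dependency-bump": 25.0, "refactor": 120.0, "flaky-test": 60.0}

decisions = {
    task_class: autonomy_decision(agent_costs[task_class], human_baseline[task_class])
    for task_class in agent_costs
}
for task_class, decision in decisions.items():
    print(task_class, decision)
```

In practice "narrow or downgrade" covers two different actions, restricting the task class or moving it down a model tier, and choosing between them would need more signal than cost alone.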
Sources
Last reviewed: 2026-06-24