Pattern · Process · Organizational · AI draft

Calculating ROI

Putting an honest number on whether AI-enabled engineering pays off: net time gained after the cost of review, rework, and tokens, not gross speed-up. The honest answer is contested, because perceived gains and measured gains often diverge.

The Pattern

Here, ROI means the net value of AI-enabled engineering after the cost of review, rework, training, and tokens, not the gross speed-up of the first draft. The unit that travels across teams is outcome per merged, shipped unit of work, tracked by task class against a human baseline, rather than anecdote, vibes, or lines of code generated.
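One way to make the per-unit framing concrete is a small calculator that prices net hours per merged unit for each task class. This is a minimal sketch, not a method from the source: the `TaskClassSample` fields, the flat `hourly_rate`, and all numbers are hypothetical assumptions. Training and governance costs are left out because they are usually tracked per seat rather than per unit of work.

```python
from dataclasses import dataclass


@dataclass
class TaskClassSample:
    """Hypothetical averages for one task class, per merged, shipped unit of work."""
    task_class: str
    baseline_hours: float  # human-only time per merged unit (the baseline)
    ai_hours: float        # author time per merged unit with AI assistance
    review_hours: float    # extra reviewer time attributed to AI output
    rework_hours: float    # follow-up fixes traced back to AI output
    token_cost: float      # usage/token spend per merged unit, in currency


def net_value_per_unit(s: TaskClassSample, hourly_rate: float) -> float:
    """Price the net hours saved (gross saving minus review and rework), then subtract token spend."""
    gross_saved = s.baseline_hours - s.ai_hours
    net_hours = gross_saved - s.review_hours - s.rework_hours
    return net_hours * hourly_rate - s.token_cost


# Illustrative inputs only; none of these numbers come from a study.
samples = [
    TaskClassSample("boilerplate/CRUD", 4.0, 1.5, 0.5, 0.25, 3.0),
    TaskClassSample("legacy bug fix", 6.0, 5.0, 1.5, 0.5, 5.0),
]
HOURLY_RATE = 100.0  # hypothetical loaded cost per engineer-hour

for s in samples:
    print(f"{s.task_class}: {net_value_per_unit(s, HOURLY_RATE):+.2f} per merged unit")
```

With these illustrative inputs the boilerplate class nets 172 per merged unit while the legacy bug-fix class loses 105, the kind of split that a single org-wide number would average away.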

The headline difficulty is that credible measurements disagree, often by task. A controlled METR study found that experienced open-source developers working on familiar codebases were about 19% slower with early-2025 AI tools, even though they reported feeling faster, in part because time spent reviewing and correcting AI output outweighed the gains. A separate randomized study of 96 engineers at Google found that task completion time dropped by roughly 21%. Both can be true: AI helps sharply on some task classes and hurts on others, so a single org-wide "productivity number" is almost always misleading.
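To see why one org-wide number misleads, the sketch below computes the mean fractional time saved per task class and for the pooled set. The task classes and timings are invented for illustration and are not data from the METR or Google studies; the same grouping works for cohorts, such as juniors versus seniors, by putting a cohort label in the first field.

```python
from collections import defaultdict
from statistics import mean

# Invented paired timings: (task_class, baseline_minutes, ai_minutes).
observations = [
    ("greenfield", 120, 80),
    ("greenfield", 90, 60),
    ("familiar-codebase", 60, 75),
    ("familiar-codebase", 100, 120),
]


def speedup(baseline: float, with_ai: float) -> float:
    """Fraction of time saved: positive means faster with AI, negative means slower."""
    return (baseline - with_ai) / baseline


def by_class(obs):
    """Mean speed-up per task class (or per cohort, if the first field is a cohort label)."""
    groups = defaultdict(list)
    for cls, base, ai in obs:
        groups[cls].append(speedup(base, ai))
    return {cls: mean(vals) for cls, vals in groups.items()}


per_class = by_class(observations)
pooled = mean(speedup(base, ai) for _, base, ai in observations)

for cls, value in per_class.items():
    print(f"{cls}: {value:+.1%}")
print(f"pooled: {pooled:+.1%}")
```

Here the pooled mean is roughly +5%, while one class is about 33% faster and the other about 22.5% slower. Note that this averages per-task ratios; dividing total minutes saved by total baseline minutes weights long tasks more heavily and gives a different figure (about +9.5% on this data).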

The honest version also accounts for the full cost stack. Jellyfish, a vendor of engineering-measurement tools, estimates that the true cost of AI coding tools typically lands at 2-3x the subscription fee once you add usage/token fees, training time, an adoption-phase productivity dip, governance, and tool sprawl. It also warns that lines-of-code and velocity/story-point metrics are too easy to game to mean anything. Treat such vendor estimates as directional; the direction itself, that the license is only a fraction of total cost, is consistent across practitioner accounts.
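The cost-stack argument reduces to simple arithmetic: sum every line item and compare the total with the subscription alone. A minimal sketch, assuming a per-seat annual view; the category names mirror the list above, but every figure is a made-up placeholder, not a Jellyfish number.

```python
# Hypothetical annual per-seat costs; every figure is a placeholder, not a benchmark.
cost_stack = {
    "subscription": 500.0,
    "usage_and_tokens": 300.0,
    "training_time": 200.0,   # engineer hours spent learning the tools, priced
    "adoption_dip": 250.0,    # output lost while the team ramps up, priced
    "governance": 100.0,      # policy, security review, audit overhead
    "tool_sprawl": 50.0,      # overlapping tools and integrations
}


def total_cost_multiple(stack: dict[str, float]) -> tuple[float, float]:
    """Return (total cost, total as a multiple of the subscription fee)."""
    total = sum(stack.values())
    return total, total / stack["subscription"]


total, multiple = total_cost_multiple(cost_stack)
license_share = cost_stack["subscription"] / total
print(f"total {total:.0f}, {multiple:.1f}x subscription, license share {license_share:.0%}")
```

With these placeholders the total is 2.8 times the subscription and the license is about 36% of the true cost. In practice the adoption dip and training time are the hardest items to price, because they show up as lost engineering hours rather than invoices.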

Why It Matters

This is genuinely contested, and the gap between perceived and measured gains is the core trap. METR's developers felt faster while measuring slower, and self-reported surveys amplify the same bias. IBM's "Race for ROI" study reports that 66% of EMEA enterprise leaders claim significant AI productivity gains and 92% expect ROI from agentic AI within two years, but those figures are self-reported executive sentiment, not measured delivery, and should be read as mood rather than evidence. ROI claims built on feeling are unreliable in both directions.

What survives scrutiny: measure team-level outcomes rather than individual scores, since individual metrics get gamed almost immediately; use a control group where you can (Jellyfish's own Copilot research compared 133 users with 750 non-users at the same firms to attribute cycle-time changes); pair quantitative delivery metrics with qualitative feedback, so you catch teams that look fast but are burning out untangling AI output; and segment by cohort, because a 30% average gain can hide juniors gaining 70% while seniors get 15% slower. Expect a lag of 2-3 sprints before delivery metrics move at all.
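A control group supports a difference-in-differences estimate: compare how the AI-using team's metric changed with how the non-using team's metric changed over the same period, and drop the ramp-up sprints. This is a generic sketch, not the methodology of the Jellyfish study cited above. `diff_in_diff`, the per-sprint cycle-time series, and the numbers are hypothetical; `lag_sprints` encodes the 2-3 sprint lag as an assumption. The estimate is only meaningful if both teams would have trended in parallel without the tool.

```python
from statistics import mean


def diff_in_diff(treated: dict[int, float], control: dict[int, float],
                 rollout_sprint: int, lag_sprints: int = 3) -> float:
    """Difference-in-differences on team-level, per-sprint metrics (e.g. median cycle time in days).

    Sprints from rollout_sprint up to (but not including) rollout_sprint + lag_sprints
    are dropped as ramp-up lag. A negative result means the metric fell more for the
    treated team than for the control team.
    """
    def before_after(series: dict[int, float]) -> tuple[float, float]:
        before = [v for sprint, v in series.items() if sprint < rollout_sprint]
        after = [v for sprint, v in series.items() if sprint >= rollout_sprint + lag_sprints]
        if not before or not after:
            raise ValueError("need sprints on both sides of the rollout window")
        return mean(before), mean(after)

    t_before, t_after = before_after(treated)
    c_before, c_after = before_after(control)
    return (t_after - t_before) - (c_after - c_before)


# Invented median cycle times (days) per sprint; rollout to the treated team at sprint 4.
treated = {1: 5.0, 2: 5.0, 3: 5.0, 4: 6.0, 5: 5.5, 6: 4.0, 7: 4.0, 8: 4.0}
control = {1: 5.0, 2: 5.0, 3: 5.0, 4: 5.0, 5: 5.0, 6: 4.5, 7: 4.5, 8: 4.5}

effect = diff_in_diff(treated, control, rollout_sprint=4, lag_sprints=2)
print(f"attributed change in cycle time: {effect:+.1f} days")
```

Here the treated team's cycle time fell by 1.0 day against 0.5 day for the control, so about 0.5 day is attributed to the change. The spike in sprints 4 and 5 is excluded as lag, but that adoption dip is a real cost and belongs in the cost stack rather than being discarded.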

The honest caveat is that some of the largest reported gains come from changing how the team works, not from the tool -- moving leverage upstream to specs and verification, as practitioners scaling agents describe -- which makes attribution genuinely hard. The discipline is to measure net outcomes by task class, account for downstream costs, resist a single vanity number, and accept that for some work the honest answer is that AI does not yet pay off. See driving adoption and cost management for the levers that move these numbers.

Sources

Last reviewed: 2026-06-25