PatternProcessAcross TeamsAI draft

AI-Code Provenance

Knowing which code an agent wrote -- and recording what produced it (which agent, model, prompt or spec) -- so review, metrics, and compliance can treat it accordingly. Tag commits, PRs, or regions as agent-authored and keep that metadata with the code.

The Pattern

AI-code provenance is recording what produced a change -- which agent, which model, which prompt or spec -- and keeping that record attached to the code, so review, metrics, and compliance can treat agent-authored work differently from hand-written work. The cheapest place to capture it is the commit. Engineer Jamie Tanna, who attributes LLM-derived code in his own repositories, settled on a Git Co-authored-by trailer naming the model (for example Co-authored-by: gpt-oss:20b <...>) over inline comments or PR descriptions, precisely because the trailer is durable: commit metadata survives platform migrations and code churn, whereas comments rot and PR threads get lost when a project moves hosts. His reasons are mundane and practical -- giving reviewers honest context, and "making it easier for my colleagues in legal" if anyone later needs to find where AI-derived code was used. Some open-source projects have started to require this; Ghostty, for instance, asks contributors to disclose AI assistance on pull requests.

The two common capture points trade off against each other. Commit trailers are permanent but only as honest as the tooling that writes them; PR-level disclosure (the Ghostty style) is easy to mandate but coarse and easy to forget. Either way the principle is the same as tracking origin or license for any other code: capture it at the moment of generation, not after.

Why It Matters

You cannot review, measure, or govern what you cannot distinguish. Provenance routes review attention to where it is needed, keeps productivity metrics honest -- machine-generated volume is not human output (rethinking performance) -- and answers the licensing and audit questions that sharpen as more of the codebase becomes machine-written. It also matters for incident response: as SRE Roxane Fischer observes, AI-generated code raises the rate of incidents, and without root-cause attribution those incidents "pile up" on already-stretched on-call engineers who cannot tell which changes a human reasoned about. Vendor framing pushes this further -- Beyond Identity argues provenance should be a cryptographic link between every commit and a verified human-or-machine identity, citing roughly 41% of new code as AI-generated; treat that figure as directional self-reported marketing, but the direction is real.

The honest caveat: provenance is only worth anything if it is captured automatically and truthfully at generation time. A Co-authored-by trailer written by an honest tool is evidence; one a developer can silently strip, or that never fires when someone pastes from a chat window, is theater. Mandating disclosure does not make it accurate, and bolting attribution onto history after the fact is mostly guesswork. The value is in the plumbing, not the policy.

Sources

Last reviewed: 2026-06-25