Pattern | Tech | Across Teams | AI draft

Agent Observability

The layer that records what an agent actually did -- every reasoning step, tool call, and handoff -- as traces you can inspect. It matters more than in conventional software because the execution path changes on every run. OpenTelemetry is the emerging standard for instrumentation, and the resulting traces are the substrate that evals attach to.

The Pattern

"Your agent called tool B before tool A, and B has a dependency on A. You did not catch it because nothing in your code audits agents. The telemetry does." -- Dat Ngo, Arize (source)

Agent observability is the platform layer that records what an agent actually did -- every reasoning step, tool call, input, and handoff -- as traces and spans you can inspect after the fact. It matters more than in traditional software because the system is non-deterministic: the execution path changes with every run, so you cannot reason about behavior from the code alone (Dat Ngo, Arize). The emerging standard is OpenTelemetry, which keeps instrumentation portable across whatever harness, model, or framework a team uses.
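To make "traces and spans" concrete, here is a minimal sketch of recording one agent run as nested spans. It uses only the Python standard library. The Tracer and Span classes, the span names, and the attribute keys are hypothetical illustrations, not the OpenTelemetry API. In a real deployment an OpenTelemetry SDK would presumably play this role.

```python
import contextlib
import itertools
import time
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Optional


@dataclass
class Span:
    """One recorded unit of agent work (hypothetical shape, for illustration)."""
    span_id: int
    parent_id: Optional[int]
    name: str
    attributes: Dict[str, Any] = field(default_factory=dict)
    start: float = 0.0
    end: float = 0.0


class Tracer:
    """Hypothetical in-memory tracer; NOT the OpenTelemetry API."""

    def __init__(self) -> None:
        self.spans: List[Span] = []      # spans in the order they started
        self._stack: List[Span] = []     # currently open spans
        self._ids = itertools.count(1)

    @contextlib.contextmanager
    def span(self, name: str, **attributes: Any) -> Iterator[Span]:
        # The innermost open span, if any, becomes the parent.
        parent_id = self._stack[-1].span_id if self._stack else None
        current = Span(next(self._ids), parent_id, name, dict(attributes),
                       start=time.monotonic())
        self.spans.append(current)
        self._stack.append(current)
        try:
            yield current
        finally:
            current.end = time.monotonic()
            self._stack.pop()


def run_agent(tracer: Tracer) -> None:
    """Stand-in for an agent run: each step is wrapped in a span."""
    with tracer.span("agent_run", intent="refund order 123"):
        with tracer.span("reasoning", summary="need order details first"):
            pass
        with tracer.span("tool_call", tool="lookup_order", tool_input="123"):
            pass
        with tracer.span("handoff", to="billing_agent"):
            pass


tracer = Tracer()
run_agent(tracer)
for s in tracer.spans:
    print(s.span_id, s.parent_id, s.name, s.attributes)
```

Because every step records its parent, the run can be reconstructed after the fact as a tree, even though the path taken may differ on the next run.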

Observability is the substrate the eval service sits on: traces are where evals attach, and where new failure modes surface before you have a metric for them. It also catches the regression trap of non-deterministic systems -- when you "fix the thing you thought you fixed, you might have produced two or three regressions you didn't know about" (Dat Ngo).
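As a rough sketch of how an eval might attach to traces, the snippet below checks the tool-call order extracted from a trace against declared dependencies (the "tool B before tool A" case from the quote), then compares per-case results before and after a change to surface regressions. The function names, tool names, and case IDs are all hypothetical. The sketch assumes the tool-call sequence has already been pulled out of the recorded spans.

```python
from typing import Dict, List, Set, Tuple


def ordering_violations(
    tool_calls: List[str], depends_on: Dict[str, Set[str]]
) -> List[Tuple[str, str]]:
    """Return (tool, missing_dependency) pairs for tools called before a prerequisite."""
    seen: Set[str] = set()
    violations: List[Tuple[str, str]] = []
    for tool in tool_calls:
        for dep in sorted(depends_on.get(tool, set())):
            if dep not in seen:
                violations.append((tool, dep))
        seen.add(tool)
    return violations


def regressions(before: Dict[str, bool], after: Dict[str, bool]) -> List[str]:
    """Case IDs that passed before a change and fail (or are missing) after it."""
    return sorted(
        case for case, passed in before.items()
        if passed and not after.get(case, False)
    )


# Hypothetical dependency: tool_b must not run until tool_a has run.
depends_on = {"tool_b": {"tool_a"}}

good_run = ["tool_a", "tool_b"]
bad_run = ["tool_b", "tool_a"]
print(ordering_violations(good_run, depends_on))  # []
print(ordering_violations(bad_run, depends_on))   # [('tool_b', 'tool_a')]

# Hypothetical eval results per recorded case, before and after a fix.
before_fix = {"case_1": False, "case_2": True, "case_3": True}
after_fix = {"case_1": True, "case_2": False, "case_3": False}
print(regressions(before_fix, after_fix))         # ['case_2', 'case_3']
```

Running the same checks over stored traces from both versions is what turns "I fixed case_1" into "I fixed case_1 and broke case_2 and case_3".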

Why It Matters

When humans stop reviewing every line -- the dark factory end state -- traceability becomes the control that replaces review. Every action logged and attributable to a human-defined intent is what makes autonomy auditable, and what lets regulated industries adopt agents at all. Observability is not a debugging nicety here; it is the governance layer of the whole platform.

Sources

Last reviewed: 2026-06-24