Measuring what matters in the age of AI agents – O’Reilly

This post first appeared in Mike Amundsen's newsletter, Hints from Our Future Past. Republished here with permission from the author.

We are well past the novelty phase of AI-assisted coding. The new challenge is measurement: how do we know whether all these tools—Copilot, Cursor, Goose, Gemini—are actually making us better?

The team at DX provides one of the first credible attempts to answer that question. Their AI Measurement Framework focuses on three dimensions: utilization, impact, and cost. They connect these to the DX Core 4 metrics: 1) change failure rate, 2) PR throughput, 3) perceived delivery speed, and 4) developer experience. Together, these help companies see how AI changes the dynamics of production systems.
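To make those four metrics concrete, here is a minimal sketch of how a team might roll them up from delivery data. This is not DX's implementation; the field names, survey scales, and weekly granularity are all my own assumptions for illustration:

```python
from dataclasses import dataclass

@dataclass
class Deployment:
    caused_incident: bool  # did this deploy trigger a failure or rollback?

@dataclass
class WeeklySnapshot:
    deployments: list[Deployment]
    merged_prs: int
    engineers: int
    perceived_delivery_speed: float  # survey score, 0-100 (hypothetical scale)
    dev_experience_score: float      # survey score, 0-100 (hypothetical scale)

def core4_style_metrics(week: WeeklySnapshot) -> dict:
    """Toy roll-up of the four Core 4-style metrics; names and scales are illustrative."""
    failures = sum(d.caused_incident for d in week.deployments)
    total = len(week.deployments)
    return {
        "change_failure_rate": failures / total if total else 0.0,
        "pr_throughput_per_engineer": week.merged_prs / week.engineers,
        "perceived_delivery_speed": week.perceived_delivery_speed,
        "developer_experience": week.dev_experience_score,
    }

week = WeeklySnapshot(
    deployments=[Deployment(False), Deployment(True), Deployment(False), Deployment(False)],
    merged_prs=42,
    engineers=6,
    perceived_delivery_speed=71.0,
    dev_experience_score=78.5,
)
print(core4_style_metrics(week))
```

Note that two of the four metrics come from system data and two simply pass through self-reported survey scores, which reflects the framework's broader point that measuring AI's impact mixes telemetry with human perception.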

For example, at Booking.com this meant a 16 percent throughput increase within a few months. At Block, it informed the design of their internal AI agent, Goose. The broader context for this work appears in a detailed analysis in Gergely Orosz's The Pragmatic Engineer, which incorporates research from Laura Tacho, CTO of DX, into how 18 leading tech companies are learning to track the impact of AI on engineering performance.

Agents as extensions

The message running through DX's framework is simple but fundamental: treat coding agents as extensions of teams, not as independent contributors. That idea changes everything. It redefines productivity as a property of hybrid teams (humans and their AI extensions) and measures performance the way we already measure leadership: by how effectively humans direct their "teams" of agents.

It also calls for a rebalancing of our metrics. Gains in AI speed cannot come at the expense of maintainability or clarity. The most mature organizations track time saved and time lost, because each gain in automation creates new complexity elsewhere in the system. When that feedback loop closes, AI stops being a novelty and becomes a living part of the organization's ecology.

Shared understanding

The deeper point here is not about dashboards or KPIs. It’s about how we adapt meaningfully to a world where the boundaries between developer, agent, and system blur.

The DX framework reminds us that metrics are only useful when they reflect shared understanding, not fear or surveillance. Used poorly, measurement becomes control; used wisely, it becomes learning. In that sense, this is not simply a framework for tracking AI adoption. It is a field guide to co-evolution: to designing new interfaces between people and their digital counterparts.

Because in the end, the question is not how fast AI can code. It is whether AI is helping us build human, technological, and organizational systems that can learn, adapt, and remain coherent as they grow.

Key takeaway

Each developer will increasingly serve as the lead of a team of AI agents.
