This is the most misunderstood graph in AI


This was certainly the case with Claude Opus 4.5, the latest version of Anthropic’s most powerful model, which was released in late November. In December, METR announced that Opus 4.5 appeared to be capable of independently completing a task that would take a human about five hours, a vast improvement over what even the exponential trend would have predicted. An Anthropic security researcher tweeted that he would change the direction of his research in light of those results; another company employee simply wrote, “Mom come pick me up, I’m scared.”

Credit: METR.ORG

But the truth is more complex than those dramatic reactions suggest. For one thing, METR’s estimates of the capabilities of specific models come with substantial error bars. As METR clearly stated, the uncertainties inherent in the method made it impossible to know for sure.

“There are a lot of ways that people are reading too much into the graphs,” says Sydney von Arx, a member of METR’s technical staff.

More fundamentally, the METR plot does not measure AI capabilities in general, nor does it claim to. To build the graph, METR primarily tests models on coding tasks, rating the difficulty of each by measuring or estimating how long it takes a human to complete it, a metric that not everyone accepts. Claude Opus 4.5 may be able to complete some tasks that take humans five hours, but that doesn’t mean it’s even close to replacing a human worker.

METR was established to assess the risks posed by frontier AI systems. Although it is best known for the exponential trend plot, it has also worked with AI companies to evaluate their systems in more detail and has published several other independent research projects, including a widely covered July 2025 study suggesting that AI coding assistants may actually slow down software engineers.

But the exponential plot has made METR’s reputation, and the organization appears to have a complicated relationship with the often breathless reception of that graph. In January, Thomas Kwa, one of the lead authors of the paper that introduced it, wrote a blog post responding to some of the criticism and clarifying the graph’s limitations. METR is currently working on a more comprehensive FAQ document. But Kwa is not optimistic that these efforts will meaningfully change the discourse. “I think the publicity machine will take away all the warnings, basically, no matter what we do,” he says.

Still, the METR team thinks the plot has something worthwhile to say about the trajectory of AI progress. “You shouldn’t tie your life to this graph at all,” says von Arx. “But at the same time,” she adds, “I’m sure this trend will continue.”
