Is benchmarking AI capabilities positive EV?

By Charlie_Guthmann @ 2026-02-17T22:30 (+23)

I think this discussion was had on LW a few years ago (and probably sporadically since then). Quickly, here are some parameters off the top of my head.

Pro: 

Cons:

Unsure


MichaelDickens @ 2026-02-18T01:05 (+9)

Know when to sound alarm bells

What is the situation where people coordinate to sound alarm bells over an AI benchmark? I basically don't think people pay attention to benchmarks in the way that matters: a benchmark comes out demonstrating some new potential danger, AI safety people raise concerns about it, and the people with the actual power continue to ignore them.

Thinking of some historical examples:

MichaelDickens @ 2026-02-18T01:09 (+8)

A relevant question I'm not sure about: for people who talk to politicians about AI risk, how useful are benchmarks? I'm not involved in those conversations so I can't really say. My guess is that politicians are more interested in obvious capabilities (e.g. Claude can write good code now) than they are in benchmark performance.

NickLaing @ 2026-02-18T07:46 (+7)

I think it's incredibly hard to determine, but probably negative EV, as has been the history (IMO) of many (perhaps even most?) EA AI safety interventions: well-intended but for the worse in the end. As a side note, I think the history of EA intervention in AI is a great example of just how hard it is to intentionally and positively influence even the mid-term future.

It would only be positive if it actually contributed to some kind of slowdown or policy change, which seems pretty unlikely to me. I don't think an eval result will ever sound an alarm bell; that would only come from some kind of obvious capability shift or warning shot that affected the general public.

More likely, it slightly speeds up AI development, basically like @MichaelDickens said:

1. Adding to the race dynamic with clear targets and competition to be at the 'top'
2. Giving AI companies better tools to measure progress
3. Incentivising investment

MichaelDickens @ 2026-02-18T00:57 (+4)

Increases the speed of AI development

This is the biggest con by far. There are at least two mechanisms by which it could happen: