Is There An AI Safety GiveWell?

By Michaël Trazzi @ 2025-09-05T10:59 (+24)

Are there any public cost-effectiveness analyses of different AI Safety charities? For instance, I'm aware of Larks' AI Alignment Literature Review and Charity Comparison, but it didn't include concrete impact measures, and the series stopped in 2021.

I'm looking for things like "donating $10M to this org would reduce extinction risk from AI by 2035 by 0.01-0.1%" or even "donating $10M to this org would result in X-Y QALYs".

(I understand there are many uncertain variables here that could swing the results a lot, but I think some quantitative estimates, even rough ones, would be quite useful for donors.)

Ideally this would also include a comparison to the most cost-effective charities listed by GiveWell, though I understand this would mean comparing things with very different error bars.
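
To make the ask concrete, here is a minimal sketch of the kind of back-of-the-envelope conversion I have in mind. Every input below (the $10M figure, the risk-reduction range, ignoring future generations) is a placeholder for illustration, not a claim:

```python
# Toy conversion of an assumed extinction-risk reduction into rough
# cost-effectiveness numbers. All inputs are made up for illustration.

donation = 10_000_000            # dollars given to a hypothetical org
risk_reduction = (0.0001, 0.001) # assumed reduction in P(extinction by 2035): 0.01%-0.1%
world_population = 8_000_000_000 # people alive today (ignores future generations)

for p in risk_reduction:
    expected_lives_saved = p * world_population
    cost_per_life = donation / expected_lives_saved
    print(f"risk reduction {p:.2%}: ~{expected_lives_saved:,.0f} expected lives, "
          f"~${cost_per_life:,.0f} per expected life saved")

# For comparison, GiveWell's top charities are often quoted at a few thousand
# dollars per life saved -- but with far narrower error bars than anything above.
```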


MichaelDickens @ 2025-09-05T13:32 (+14)

The simple answer is no, there is no AI safety GiveWell.

For two reasons:

  1. It is not possible to come up with a result like "donating $10M to this org would reduce extinction risk from AI by 2035 by 0.01-0.1%" at anywhere close to GiveWell's level of rigor. (But it sounds like you already knew that.)
  2. It would be possible to come up with extremely rough cost-effectiveness estimates. To make them, you'd have to make up numbers for highly uncertain inputs, readers would widely disagree about what those inputs should be, and changing the inputs would radically change the results (the toy sketch below illustrates how much). Someone could still make that kind of estimate if they wanted to. The only effort like this that I'm aware of is the set of cost-effectiveness models Nuno Sempere created in 2021.
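
As a toy illustration of point 2 (not an actual estimate; every number below is invented), here is how much the answer moves when you vary the made-up inputs across ranges that different readers might find reasonable:

```python
from itertools import product

# Invented ranges for two highly uncertain inputs. Different readers would
# pick very different values, and the result swings by orders of magnitude.
baseline_risk = [0.01, 0.10, 0.50]        # P(AI extinction) without the intervention
fraction_solved = [0.0001, 0.001, 0.01]   # share of the problem $10M at this org buys
donation = 10_000_000

results = []
for risk, frac in product(baseline_risk, fraction_solved):
    risk_reduced = risk * frac                   # absolute reduction in extinction risk
    dollars_per_basis_point = donation / (risk_reduced / 0.0001)
    results.append(dollars_per_basis_point)

print(f"cost per basis point of x-risk reduction: "
      f"${min(results):,.0f} to ${max(results):,.0f}")
# The spread spans several orders of magnitude, driven entirely by which
# made-up inputs you pick -- which is why reasonable people disagree so much.
```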

In spite of the problems with cost-effectiveness estimates in AI safety, I still think they're underrated and that people should put more work into quantifying their beliefs.

MichaelDickens @ 2025-09-05T13:33 (+9)

BTW I am open to commissions to do this sort of work; DM me if you're interested. For example I recently made a back-of-the-envelope model comparing the cost-effectiveness of AnimalHarmBench vs. conventional animal advocacy for improving superintelligent AI's values with respect to animal welfare. That model should give a sense of what kind of result to expect. (Note for example the extremely made-up inputs.)
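
To give a sense of the general shape of that kind of comparison (this is not the actual model; every number and probability below is invented for illustration):

```python
# Hypothetical back-of-the-envelope comparison of two interventions aimed at
# the same outcome. The structure is the point; the inputs are entirely made up.

def effect_per_dollar(cost, p_changes_outcome, value_if_it_works):
    """Expected value of the outcome measure per dollar spent."""
    return p_changes_outcome * value_if_it_works / cost

benchmark = effect_per_dollar(
    cost=300_000,            # made-up cost of building and promoting a benchmark
    p_changes_outcome=0.02,  # made-up chance it meaningfully shifts AI behavior
    value_if_it_works=1.0,   # normalized value of that shift
)
advocacy = effect_per_dollar(
    cost=1_000_000,          # made-up cost of a conventional advocacy campaign
    p_changes_outcome=0.10,
    value_if_it_works=0.3,
)

print(f"benchmark / advocacy cost-effectiveness ratio: {benchmark / advocacy:.1f}x")
```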

Artūrs Kaņepājs @ 2025-09-05T15:48 (+3)

Thanks for the estimate, very helpful! The cost of AnimalHarmBench in particular was roughly 10x lower than you assumed: ~$30k is my quick central estimate, mostly the time cost of the main contributors. This excludes the previous work described in the article, as well as downstream costs. In-house implementation at AI companies could add a lot to the cost, but I don't think that should enter the cost-effectiveness calculation when comparing against other forms of advocacy.

Mo Putera @ 2025-09-05T13:57 (+4)

"Someone could still make that estimate if they wanted to, but nobody has done it."

My impression was Nuno Sempere did a lot of this, e.g. here way back in 2021?

NunoSempere @ 2025-09-05T15:21 (+4)

You might also enjoy https://forum.effectivealtruism.org/s/AbrRsXM2PrCrPShuZ and https://github.com/NunoSempere/SoGive-CSER-evaluation-public

MichaelDickens @ 2025-09-05T14:15 (+2)

My mistake, yes he did do that. I'll edit my answer.

aog @ 2025-09-06T13:55 (+13)

Agreed with the other answers on the reasons why there's no GiveWell for AI safety. But in case it's helpful, I should say that Longview Philanthropy offers advice to donors looking to give >$100K per year to AI safety. Our methodology is a bit different from GiveWell’s, but we do use cost-effectiveness estimates. We investigate funding opportunities across the AI landscape from technical research to field-building to policy in the US, EU, and around the world, trying to find the most impactful opportunities for the marginal donor. We also do active grantmaking, such as our calls for proposals on hardware-enabled mechanisms and digital sentience. More details here. Feel free to reach out to aidan@longview.org or simran@longview.org if you'd like to learn more. 

ClaireZabel @ 2025-09-05T19:12 (+11)

In addition to what Michael said, there are a number of other barriers:

Austin @ 2025-09-07T07:31 (+10)

We've been considering an effort like this on Manifund's side, and will likely publish some (very rudimentary) results soon!

Here are some of my guesses why this hasn't happened already:

Neel Nanda @ 2025-09-06T19:12 (+6)

Agreed with the other comments on why this is doomed. The closest thing to this that I think might make sense is something like: "conditional on the following assumptions/worldview, we estimate that an extra million dollars to this intervention would have the following effect." Anything that doesn't acknowledge that there are enormous fundamental cruxes here is pretty doomed, but there might be something productive about clustering the space of worldviews and talking about what makes sense by the lights of each.
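
A minimal sketch of what that could look like, with hypothetical worldview clusters and invented numbers throughout:

```python
# Hypothetical worldview clusters, each with its own invented assumptions about
# baseline risk and how tractable the intervention is. The output is an estimate
# of extinction-risk reduction per extra $1M, conditional on each worldview.

worldviews = {
    "short timelines, high risk":  {"p_doom": 0.5, "bp_per_million": 1.0},
    "long timelines, medium risk": {"p_doom": 0.1, "bp_per_million": 0.1},
    "skeptical of x-risk":         {"p_doom": 0.01, "bp_per_million": 0.01},
}

extra_funding_millions = 1
for name, w in worldviews.items():
    # basis points of absolute risk reduction bought by the extra $1M, by the
    # lights of this worldview (both numbers are placeholders)
    reduction_bp = w["bp_per_million"] * extra_funding_millions
    print(f"{name}: ~{reduction_bp} bp of x-risk reduction per extra $1M "
          f"(against an assumed baseline P(doom) of {w['p_doom']:.0%})")
```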