Is There An AI Safety GiveWell?

By Michaël Trazzi @ 2025-09-05T10:59 (+24)

Are there any public cost-effectiveness analyses of different AI Safety charities? For instance, I'm aware of Larks' AI Alignment Literature Review and Charity Comparison, but it didn't include concrete impact measures, and the series stopped in 2021.

I'm looking for things like "donating $10M to this org would reduce extinction risk from AI by 2035 by 0.01-0.1%" or even "donating $10M to this org would result in X-Y QALYs".

(I understand there are many uncertain variables here that could swing the results a lot, but I think some quantitative estimates, even rough ones, would be quite useful for donors.)

Ideally this would also include a comparison to the most cost-effective charities listed by GiveWell, though I understand this would mean comparing things with very different error bars.
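
To make the ask concrete, here is a minimal sketch of the kind of back-of-the-envelope conversion I have in mind. Every input below (the $10M figure, the risk-reduction range, ignoring future generations) is a placeholder for illustration, not a claim:

```python
# Toy conversion of an assumed extinction-risk reduction into rough
# cost-effectiveness numbers. All inputs are made up for illustration.

donation = 10_000_000            # dollars given to a hypothetical org
risk_reduction = (0.0001, 0.001) # assumed reduction in P(extinction by 2035): 0.01%-0.1%
world_population = 8_000_000_000 # people alive today (ignores future generations)

for p in risk_reduction:
    expected_lives_saved = p * world_population
    cost_per_life = donation / expected_lives_saved
    print(f"risk reduction {p:.2%}: ~{expected_lives_saved:,.0f} expected lives, "
          f"~${cost_per_life:,.0f} per expected life saved")

# For comparison, GiveWell's top charities are often quoted at a few thousand
# dollars per life saved -- but with far narrower error bars than anything above.
```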


MichaelDickens @ 2025-09-05T13:32 (+14)

The simple answer is no, there is no AI safety GiveWell.

For two reasons:

  1. It is not possible to come up with a result like "donating $10M to this org would reduce extinction risk from AI by 2035 by 0.01-0.1%" at anywhere close to GiveWell's level of rigor. (But it sounds like you already knew that.)
  2. It would be possible to come up with extremely rough cost-effectiveness estimates. To make them, you'd have to make up numbers for highly uncertain inputs, readers would widely disagree about what those inputs should be, and changing the inputs would radically change the results (the toy sketch below illustrates how much). Someone could still make that kind of estimate if they wanted to. The only effort like this that I'm aware of is the set of cost-effectiveness models Nuno Sempere created in 2021.
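
As a toy illustration of point 2 (not an actual estimate; every number below is invented), here is how much the answer moves when you vary the made-up inputs across ranges that different readers might find reasonable:

```python
from itertools import product

# Invented ranges for two highly uncertain inputs. Different readers would
# pick very different values, and the result swings by orders of magnitude.
baseline_risk = [0.01, 0.10, 0.50]        # P(AI extinction) without the intervention
fraction_solved = [0.0001, 0.001, 0.01]   # share of the problem $10M at this org buys
donation = 10_000_000

results = []
for risk, frac in product(baseline_risk, fraction_solved):
    risk_reduced = risk * frac                   # absolute reduction in extinction risk
    dollars_per_basis_point = donation / (risk_reduced / 0.0001)
    results.append(dollars_per_basis_point)

print(f"cost per basis point of x-risk reduction: "
      f"${min(results):,.0f} to ${max(results):,.0f}")
# The spread spans several orders of magnitude, driven entirely by which
# made-up inputs you pick -- which is why reasonable people disagree so much.
```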

In spite of the problems with cost-effectiveness estimates in AI safety, I still think they're underrated and that people should put more work into quantifying their beliefs.

MichaelDickens @ 2025-09-05T13:33 (+9)

BTW I am open to commissions to do this sort of work; DM me if you're interested. For example I recently made a back-of-the-envelope model comparing the cost-effectiveness of AnimalHarmBench vs. conventional animal advocacy for improving superintelligent AI's values with respect to animal welfare. That model should give a sense of what kind of result to expect. (Note for example the extremely made-up inputs.)
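
To give a sense of the general shape of that kind of comparison (this is not the actual model; every number and probability below is invented for illustration):

```python
# Hypothetical back-of-the-envelope comparison of two interventions aimed at
# the same outcome. The structure is the point; the inputs are entirely made up.

def effect_per_dollar(cost, p_changes_outcome, value_if_it_works):
    """Expected value of the outcome measure per dollar spent."""
    return p_changes_outcome * value_if_it_works / cost

benchmark = effect_per_dollar(
    cost=300_000,            # made-up cost of building and promoting a benchmark
    p_changes_outcome=0.02,  # made-up chance it meaningfully shifts AI behavior
    value_if_it_works=1.0,   # normalized value of that shift
)
advocacy = effect_per_dollar(
    cost=1_000_000,          # made-up cost of a conventional advocacy campaign
    p_changes_outcome=0.10,
    value_if_it_works=0.3,
)

print(f"benchmark / advocacy cost-effectiveness ratio: {benchmark / advocacy:.1f}x")
```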

Artūrs Kaņepājs @ 2025-09-05T15:48 (+3)

Thanks for the estimate, very helpful! The cost of AnimalHarmBench in particular was roughly 10x lower than you assumed: ~$30k is my quick central estimate, mostly the time cost of the main contributors. This excludes the previous work described in the article, as well as downstream costs. In-house implementation at AI companies could add a lot to the cost, but I don't think that should enter the cost-effectiveness calculation when comparing against other forms of advocacy.

Mo Putera @ 2025-09-05T13:57 (+4)

"Someone could still make that estimate if they wanted to, but nobody has done it."

My impression was Nuno Sempere did a lot of this, e.g. here way back in 2021?

NunoSempere @ 2025-09-05T15:21 (+4)

You might also enjoy https://forum.effectivealtruism.org/s/AbrRsXM2PrCrPShuZ and https://github.com/NunoSempere/SoGive-CSER-evaluation-public

MichaelDickens @ 2025-09-05T14:15 (+2)

My mistake, yes he did do that. I'll edit my answer.

aog @ 2025-09-06T13:55 (+13)

Agreed with the other answers on the reasons why there's no GiveWell for AI safety. But in case it's helpful, I should say that Longview Philanthropy offers advice to donors looking to give >$100K per year to AI safety. Our methodology is a bit different from GiveWell’s, but we do use cost-effectiveness estimates. We investigate funding opportunities across the AI landscape from technical research to field-building to policy in the US, EU, and around the world, trying to find the most impactful opportunities for the marginal donor. We also do active grantmaking, such as our calls for proposals on hardware-enabled mechanisms and digital sentience. More details here. Feel free to reach out to aidan@longview.org or simran@longview.org if you'd like to learn more. 

ClaireZabel @ 2025-09-05T19:12 (+11)

In addition to what Michael said, there are a number of other barriers:

Austin @ 2025-09-07T07:31 (+10)

We've been considering an effort like this on Manifund's side, and will likely publish some (very rudimentary) results soon!

Here are some of my guesses why this hasn't happened already:

Neel Nanda @ 2025-09-06T19:12 (+6)

Agreed with the other comments on why this is doomed. The closest thing to this that I think might make sense is something like: "conditional on the following assumptions/worldview, we estimate that an extra million dollars to this intervention would have the following effect." Anything that doesn't acknowledge that there are enormous fundamental cruxes here is pretty doomed, but there might be something productive about clustering the space of worldviews and talking about what makes sense by the lights of each.
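
A minimal sketch of what that could look like, with hypothetical worldview clusters and invented numbers throughout:

```python
# Hypothetical worldview clusters, each with its own invented assumptions about
# baseline risk and how tractable the intervention is. The output is an estimate
# of extinction-risk reduction per extra $1M, conditional on each worldview.

worldviews = {
    "short timelines, high risk":  {"p_doom": 0.5, "bp_per_million": 1.0},
    "long timelines, medium risk": {"p_doom": 0.1, "bp_per_million": 0.1},
    "skeptical of x-risk":         {"p_doom": 0.01, "bp_per_million": 0.01},
}

extra_funding_millions = 1
for name, w in worldviews.items():
    # basis points of absolute risk reduction bought by the extra $1M, by the
    # lights of this worldview (both numbers are placeholders)
    reduction_bp = w["bp_per_million"] * extra_funding_millions
    print(f"{name}: ~{reduction_bp} bp of x-risk reduction per extra $1M "
          f"(against an assumed baseline P(doom) of {w['p_doom']:.0%})")
```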