Survey on AI existential risk scenarios

By Sam Clarke, ac, jonasschuett @ 2021-06-08T17:12 (+154)

Cross-posted to LessWrong.

Summary

Motivation

It has been argued that AI could pose an existential risk. The original risk scenarios were described by Nick Bostrom and Eliezer Yudkowsky. More recently, these have been criticised, and a number of alternative scenarios have been proposed. There has been some useful work exploring these alternative scenarios, but much of this is informal. Most pieces are only presented as blog posts, with neither the detail of a book, nor the rigour of a peer-reviewed publication. For further discussion of this dynamic, see work by Ben Garfinkel, Richard Ngo and Tom Adamczewski.

The result is that it is no longer clear which AI risk scenarios experts find most plausible. We think this state of affairs is unsatisfactory for at least two reasons. First, since many of the proposed scenarios seem underdeveloped, there is room for further work analysing them in more detail. But this work is time-consuming, and there is a wide range of scenarios that could be analysed, so knowing which scenarios leading experts find most plausible is useful for prioritising it. Second, since the views of top researchers will influence the views of the broader AI safety and governance community, it is important to make the full spectrum of views more widely available. The survey is intended to be a first step in this direction.

The survey

We asked researchers to estimate the probability of five AI risk scenarios, conditional on an existential catastrophe due to AI having occurred. There was also a catch-all “other scenarios” option.

These were the five scenarios we asked about:

  • The “Superintelligence” scenario (Bostrom, 2014)
  • Part 1 of “What failure looks like” (Christiano, 2019)
  • Part 2 of “What failure looks like” (Christiano, 2019)
  • War (Dafoe, 2018)
  • Misuse (Karnofsky, 2016)

We chose these five scenarios because they have been most prominent in previous discussions of different AI risk scenarios. For more details, including the descriptions of each scenario that we gave to respondents, you can find a copy of the survey at this link.

Key results

There was considerable disagreement among researchers about which risk scenarios are most likely

Researchers are uncertain about which risk scenarios are most likely

The median self-reported confidence level given by respondents was 2, on a seven-point Likert scale running from 0 (lowest confidence) to 6 (highest confidence).

Researchers put substantial credence on “other scenarios”

The “other scenarios” option had the highest median probability, at 20%. Some researchers left free-form comments describing these other scenarios. Most of these scenarios have not been publicly written up, and the rest have been explored in less detail than the five scenarios we asked about.

Key takeaway

Together, these three results suggest that there is a lot of value in exploring the likelihood of different risk scenarios in more detail. This could look like analysing the existing scenarios more rigorously, or writing up scenarios that currently have no public description.

Recent “failure stories” by Andrew Critch and Paul Christiano, which seem to have been well received, also suggest that there is value in exploring different risk scenarios in more detail. Likewise, Rohin Shah has advocated for this kind of work, and AI Impacts has recently compiled a collection of stories to clarify, explore or appreciate possible future AI scenarios.

Caveats

One important caveat concerns the tractability of exploring the likelihood of different AI risk scenarios in more detail. The fact that considerable disagreement persists, despite some attempts to clarify and discuss these issues, could suggest that making progress here is difficult. However, we think there has been relatively little effort devoted to this kind of work so far, and that there is still a lot of low-hanging fruit.

Additionally, there were a number of limitations in the survey design, which are summarised in this document. If we were to run the survey again, we would do many things differently. Whilst we think our main findings stand up to these limitations, we nonetheless advise treating them cautiously, as just one piece of evidence among many about researchers’ views on AI risk.

Other notable results

Most of this community’s discussion about existential risk from AI focuses on scenarios involving one or more powerful, misaligned AI systems that take control of the future. This kind of concern is articulated most prominently in “Superintelligence” and “What failure looks like”, corresponding to three scenarios in our survey (the “Superintelligence” scenario, and parts 1 and 2 of “What failure looks like”). The median respondent’s total (conditional) probability across these three scenarios was 50%, suggesting that this kind of concern remains prevalent, but is far from the only kind of risk that researchers are concerned about today.

69% of respondents reported that they have lowered their probability estimate for the “Superintelligence” scenario (as described above)[7] since the first year they were involved in AI safety/governance. This may be because they now assign relatively higher probabilities to other risk scenarios happening first, and not necessarily because they think that fast takeoff or other premises of the “Superintelligence” scenario are less plausible than they originally did.

Full version

At this time, we are only publishing this abbreviated version of the results. We have a version of the full results that we may publish at a later date. Please contact one of us if you would like access to this, and include a sentence on why the results would be helpful or what you intend to use them for.

Acknowledgements

We would like to thank all researchers who participated in the survey. We are also grateful for valuable comments and feedback from JJ Hepburn, Richard Ngo, Ben Garfinkel, Max Daniel, Rohin Shah, Jess Whittlestone, Rafe Kennedy, Spencer Greenberg, Linda Linsefors, David Manheim, Ross Gruetzemacher, Adam Shimi, Markus Anderljung, Chris McDonald, David Krueger, Paolo Bova, Vael Gates, Michael Aird, Lewis Hammond, Alex Holness-Tofts, Nicholas Goldowsky-Dill, the GovAI team, the AI:FAR group, and anyone else we ought to have mentioned here. This project grew out of the AI Safety Camp (AISC) and the FHI Summer Research Fellowship (SRF). All errors are our own.


  1. We will not look at any responses from now on; this is intended just to show what questions were asked, and in case any readers are interested in thinking through their own responses. ↩︎

  2. AI existential risk scenarios are sometimes called threat models. ↩︎

  3. Bostrom describes many scenarios in the book “Superintelligence”. We think that this scenario is the one that most people remember from the book, but nonetheless, we think it was probably a mistake to refer to this particular scenario by this name. ↩︎

  4. Likewise, the mean responses for the five given scenarios were all between 15% and 18%, and the mean response for “other scenarios” was 25%. ↩︎

  5. Other similar results: 77% of respondents assigned ≤5% (conditional) probability to at least one scenario; 51% of respondents assigned ≤5% (conditional) probability to at least two scenarios. ↩︎

  6. For another way of interpreting this, consider that if respondents were evenly split into six completely “polarised” camps, each of which put 100% probability on one option and 0% on the others, then the mean absolute deviation for each scenario would be ~28% (a quick check of this figure is sketched after these footnotes). ↩︎

  7. As per footnote 3, the particular scenario we are referring to here is not the only scenario described in “Superintelligence”. ↩︎
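As a quick sanity check on the ~28% figure in footnote 6, here is a minimal sketch of that calculation (the six equal, fully polarised camps are the hypothetical from the footnote, not survey data):

```python
# Six equally sized "polarised" camps, each putting 100% on a different one of
# the six options. For any single scenario, one camp answers 1.0 and the other
# five camps answer 0.0.
n_options = 6
responses = [1.0] + [0.0] * (n_options - 1)

mean = sum(responses) / n_options                        # 1/6 ≈ 16.7%
mad = sum(abs(r - mean) for r in responses) / n_options  # mean absolute deviation
print(f"mean absolute deviation ≈ {mad:.1%}")            # ≈ 27.8%, i.e. roughly 28%
```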


RobBensinger @ 2021-06-08T19:28 (+25)

Fascinating results! I really appreciate the level of thought and precision you all put into the survey questions.

Were there any strong correlations between which of the five scenarios respondents considered more likely?

SamClarke @ 2021-06-11T13:56 (+15)

Thanks Rob, interesting question. Here are the correlation coefficients between pairs of scenarios (sorted from max to min):

So it looks like there are only weak correlations between some scenarios.

It's worth bearing in mind that we asked respondents not to give an estimate for any scenarios they'd thought about for less than 1 hour. The correlations could be stronger if we didn't have this requirement.
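For concreteness, here is a minimal sketch of how pairwise correlations like these can be computed from a response matrix; the numbers below are made up for illustration, not the actual survey responses:

```python
import pandas as pd

# Hypothetical responses: one row per respondent, one column per scenario, each
# cell a conditional probability. NaN marks a scenario the respondent skipped
# (e.g. because they had thought about it for less than an hour).
responses = pd.DataFrame({
    "Superintelligence": [0.30, 0.10, None, 0.20],
    "WFLL part 1":       [0.10, 0.25, 0.15, None],
    "WFLL part 2":       [0.10, 0.20, 0.10, 0.15],
    "War":               [0.05, None, 0.20, 0.10],
    "Misuse":            [0.10, 0.10, 0.15, 0.20],
})

# Pairwise Pearson correlations; pandas drops missing values pairwise, so each
# coefficient uses only the respondents who answered both scenarios in that pair.
print(responses.corr(method="pearson").round(2))
```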

MichaelA @ 2021-06-09T06:19 (+15)

Thanks for sharing these interesting findings and write-up! This seems like a quite useful survey to have done, for similar reasons to why I think/hope work like Clarifying some key hypotheses in AI alignment, Crucial questions for longtermists, Database of existential risk estimates, and much of the prior work you cite would be useful.

People could also check out that database for a sense of how these respondents' views compare to some other quantitative estimates that have been publicly made (which is of course only a subset of all relevant views from all relevant people).

I've also now added to my database of existential risk estimates the six estimates indicated by the sentence "If you take the median response for each scenario and compare them, those (conditional) probabilities are fairly similar (between 10% and 12.5% for the five given scenarios, and 20% for “other scenarios”)", in the "Conditional existential-risk-level estimates" section.

jonasschuett @ 2021-06-11T07:55 (+1)

Thanks :)

Maxime_Riche @ 2023-03-27T10:58 (+5)

I am confused by this survey. Taken at face value, it implies that working on improving Cooperation would be only ~2x less impactful than working on hard AI alignment (looking only at the importance of the problem), and that working on partial/naive alignment would be as impactful as working on AI alignment (again, looking only at importance).
Does that make sense?

(I make a bunch of assumptions to come up with these values. The starting point is the likelihood of the 5-6 x-risk scenarios. Then I associate each scenario with a field (AI alignment, naive AI alignment, Cooperation) that would reduce its likelihood. Then I produce the values above, and they stay similar even if I assume a 2-step model where some scenarios happen before others. Google sheet)
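For readers who don't open the sheet, here is a minimal sketch of this kind of back-of-the-envelope calculation. The scenario-to-field mapping and the probabilities are illustrative assumptions (rough medians from the post, with overlaps between scenarios ignored), not the actual figures from the linked Google sheet:

```python
# Illustrative sketch: score each field by summing P(scenario | AI existential
# catastrophe) over the scenarios that the field would primarily reduce.
# Probabilities are rough medians from the post; the 1-1 scenario-to-field
# mapping is a simplifying assumption that ignores overlaps between scenarios.
median_prob = {
    "Superintelligence": 0.125,
    "WFLL part 1": 0.10,
    "WFLL part 2": 0.10,
    "War": 0.10,
    "Misuse": 0.10,
    "Other": 0.20,
}
field_of = {
    "Superintelligence": "hard AI alignment",
    "WFLL part 2": "hard AI alignment",
    "WFLL part 1": "partial/naive alignment",
    "War": "Cooperation",
    "Misuse": "Misuse limitation",
    "Other": "unattributed",
}

importance = {}
for scenario, p in median_prob.items():
    field = field_of[scenario]
    importance[field] = importance.get(field, 0.0) + p

for field, score in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{field}: {score:.0%}")
```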

Sam Clarke @ 2023-03-29T23:29 (+4)

Thanks for your comment!

I doubt that it's reasonable to draw these kinds of implications from the survey results, for a few reasons:

  • respondents were very uncertain
  • there's overlap between the scenarios
  • there's no 1-1 mapping between "fields" and risk scenarios (e.g. I'd strongly bet that improved cooperation of certain kinds would make both catastrophic misalignment and war less likely) (though maybe your model tries to account for this, I didn't look at it)

A broader point: I think making importance comparisons (between interventions) on the level of abstraction of "improving cooperation", "hard AI alignment" and "partial/naive alignment" doesn't make much sense. I expect comparing specific plans/interventions to be much more useful.

Maxime_Riche @ 2023-04-04T09:05 (+7)

Thanks for your response! 

Yet I am still not convinced that my reading doesn't make sense. Here are some comments:

  • "respondents were very uncertain"
    This seems to be both a reason why you might want to diversify your portfolio of interventions for reducing x-risks, and a reason why someone might want to improve such estimates (of P(Nth scenario | x-risk)). But it doesn't seem to be a strong reason to discard the conclusion of the survey (it would be, if we had more reliable information elsewhere).
  • "there's overlap between the scenarios":
    I am unsure, but it seems that the overlaps are not that big overall. In particular, the overlap between {1,2,3} and {4,5} doesn't seem huge. (I also wonder whether these overlaps illustrate that you could reduce x-risks using a broader range of interventions than just "AI alignment" and "AI governance".)
  1. The “Superintelligence” scenario (Bostrom, 2014)
  2. Part 2 of “What failure looks like” (Christiano, 2019)
  3. Part 1 of “What failure looks like” (Christiano, 2019)
  4. War (Dafoe, 2018)
  5. Misuse (Karnofsky, 2016)
  6. Other existential catastrophe scenarios.
  • "no 1-1 mapping between "fields" and risk scenarios"
    Sure, this would benefit from having a more precise model.
  • "Priority comparison of interventions is better than high-level comparisons"
    Right. High-level comparisons are so much cheaper to do that it seems worth staying at that level for now.


The point I am especially curious about is the following:
- Is this survey pointing to the conclusion that the importance of working on "Technical AI alignment", "AI governance", "Cooperative AI" and "Misuse limitation" is in each case within one order of magnitude (OOM) of the others?
By importance here I mean importance as in 80,000 Hours' ITN framework, not the overall priority, which would also include neglectedness, tractability and looking at object-level interventions.

NunoSempere @ 2023-04-04T15:39 (+1)

I'd strongly bet that improved cooperation of certain kinds would make both catastrophic misalignment and war less likely

I dislike the usage of "strongly bet" here, given that a literal bet here seems hard to arrive at. See here: <https://nunosempere.com/blog/2023/03/02/metaphorical-bets> for some background.

Sam Clarke @ 2023-04-05T17:05 (+3)

Thanks for this, I won't use "bet" in this context in the future.

ben.smith @ 2022-09-01T19:05 (+3)

Are there similar surveys of (1) AI researchers in general, (2) technology experts in general, and (3) voters?

The first group is relevant because they are the ones currently building AGI; the second group is relevant because they are the ones who will implement it; the third group is relevant because they set up incentives for policymakers.

I am a social science researcher, and if no work is being done on this, I would consider researching it myself.

Sam Clarke @ 2022-09-21T15:12 (+3)

Re (1), see When Will AI Exceed Human Performance? Evidence from AI Experts (2016) and the 2022 updated version. These surveys don't ask about x-risk scenarios in detail, but do ask about the overall probability of very bad outcomes and other relevant factors.

Re (1) and (3), you might be interested in various bits of research that GovAI has done on the American public and AI researchers.

You also might want to get in touch with Noemi Dreksler, who is working on surveys at GovAI.