AGI Catastrophe and Takeover: Some Reference Class-Based Priors

By zdgroff @ 2023-05-24T19:14 (+95)

This is a linkpost for https://docs.google.com/document/d/1k0FTUSB0yOPVl_nQp9AfcAOxKMM3MwPONw3Mm8imPu8/edit?usp=drive_link

I am grateful to Holly Elmore, Michael Aird, Bruce Tsai, Tamay Besiroglu, Zach Stein-Perlman, Tyler John, and Kit Harris for pointers or feedback on this document.

Executive Summary

Overview

In this document, I collect and describe reference classes for the risk of catastrophe from superhuman artificial general intelligence (AGI). On some accounts, reference classes are the best starting point for forecasts, even though they often feel unintuitive. To my knowledge, nobody has previously attempted this for risks from superhuman AGI. This is to a large degree because superhuman AGI is in a real sense unprecedented. Yet there are some reference classes or at least analogies people have cited to think about the impacts of superhuman AI, such as the impacts of human intelligence, corporations, or, increasingly, the most advanced current AI systems.

My high-level takeaway is that different ways of integrating and interpreting reference classes generate priors on AGI-caused human extinction by 2070 anywhere between 1/10000 and 1/6 (mean of ~0.03%-4%). Reference classes offer a non-speculative case for concern with AGI-related risks. On this account, AGI risk is not a case of Pascal’s mugging, but most reference classes do not support greater-than-even odds of doom. The reference classes I look at generate a prior for AGI control over current human resources anywhere between 5% and 60% (mean of ~16-26%). The latter is a distinctive result of the reference class exercise: on these priors, the expected degree of AGI control over the world exceeds the odds of human extinction by a sizable margin. The extent of existential risk, including permanent disempowerment, should fall somewhere between these two ranges.

This effort is a rough, non-academic exercise and requires a number of subjective judgment calls. At times I play a bit fast and loose with the exact model I am using; the work lacks the ideal level of theoretical grounding. Nonetheless, I think the appropriate prior is likely to look something like what I offer here. I encourage intuitive updates and do not recommend these priors as the final word.

Approach

I collect sets of events (reference classes) in which superhuman AGI-caused extinction or takeover would plausibly, ex ante, be a representative member. Interpreting and aggregating them requires a number of data collection decisions, the most important of which I detail here:

For each reference class, I collect benchmarks for the likelihood of one or two things:
1. Human extinction
2. AI capture of humanity’s available resources.
Many risks and reference classes are properly thought of as annualised risks (e.g., the yearly chance of a major AI-related disaster or of extinction from an asteroid), but some make more sense as risks from a one-time event (e.g., the chance that the creation of a major AI system or a given asteroid hit causes human extinction). For this reason, I aggregate three types of estimates (see the full document for the latter two types):
1. 50-Year Risk (e.g. risk of a major AI disaster in 50 years)
2. 10-Year Risk (e.g. risk of a major AI disaster in 10 years)
3. Risk Per Event (e.g. risk of a major AI disaster per invention)
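Converting an annualised risk into the 10- and 50-year horizons above requires some assumption about how risk accumulates. Below is a minimal sketch. It assumes a constant risk per period that is independent across periods, which is my simplification and not necessarily what the full document does. The same formula turns a per-event risk into a horizon risk if you substitute the number of events for the number of years. The function names and the 0.01% input are hypothetical.

```python
def cumulative_risk(risk_per_period: float, periods: float) -> float:
    """Chance of at least one occurrence over `periods` independent periods
    (years or events), each carrying the same `risk_per_period`."""
    if not 0.0 <= risk_per_period <= 1.0:
        raise ValueError("risk_per_period must be a probability")
    return 1.0 - (1.0 - risk_per_period) ** periods


def implied_per_period_risk(cumulative: float, periods: float) -> float:
    """Inverse of cumulative_risk: the constant per-period risk implied
    by a cumulative risk over `periods`."""
    if not 0.0 <= cumulative <= 1.0:
        raise ValueError("cumulative must be a probability")
    return 1.0 - (1.0 - cumulative) ** (1.0 / periods)


if __name__ == "__main__":
    annual = 0.0001  # hypothetical 0.01% yearly risk
    print(f"10-year risk: {cumulative_risk(annual, 10):.4%}")
    print(f"50-year risk: {cumulative_risk(annual, 50):.4%}")
    # Back out the yearly risk implied by a hypothetical 5% 50-year risk.
    print(f"Implied yearly risk: {implied_per_period_risk(0.05, 50):.4%}")
```

For small risks, the 50-year figure is close to 50 times the annual one. The approximation breaks down as the annual risk grows.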
Given that there are dozens or hundreds of reference classes, I summarise them in a few ways:
1. Minimum and maximum
2. Weighted arithmetic mean (i.e., weighted average)
  1. I “winsorise”, i.e. replace 0 or 1 with the next-most extreme value.
  2. I intuitively downweight some reference classes. For details on weights, see the methodology.
3. Weighted geometric mean
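The summary statistics above can be sketched as follows. This is a minimal illustration under several assumptions: weights are nonnegative, estimates are probabilities, and winsorising is applied before both means. The helper names, estimates and weights are hypothetical; the actual weights are in the methodology.

Winsorising matters most for the geometric mean. A single estimate of exactly 0 would otherwise drag it to 0, and log 0 is undefined. With a single set of weights, the geometric mean can never exceed the arithmetic mean, and both must lie between the minimum and maximum. That gives a quick sanity check on any summary.

```python
import math


def winsorise(values: list[float]) -> list[float]:
    """Replace estimates of exactly 0 or 1 with the next-most extreme value
    strictly between 0 and 1."""
    interior = [v for v in values if 0.0 < v < 1.0]
    if not interior:
        raise ValueError("need at least one estimate strictly between 0 and 1")
    lowest, highest = min(interior), max(interior)
    return [lowest if v <= 0.0 else highest if v >= 1.0 else v for v in values]


def weighted_arithmetic_mean(values: list[float], weights: list[float]) -> float:
    total = sum(weights)
    return sum(w * v for v, w in zip(values, weights)) / total


def weighted_geometric_mean(values: list[float], weights: list[float]) -> float:
    # exp of the weighted mean of logs; requires every value > 0.
    total = sum(weights)
    return math.exp(sum(w * math.log(v) for v, w in zip(values, weights)) / total)


def summarise(values: list[float], weights: list[float]) -> dict[str, float]:
    clean = winsorise(values)
    return {
        "minimum": min(values),  # reported on the raw estimates
        "maximum": max(values),
        "weighted_arithmetic_mean": weighted_arithmetic_mean(clean, weights),
        "weighted_geometric_mean": weighted_geometric_mean(clean, weights),
    }


if __name__ == "__main__":
    # Hypothetical estimates and weights, for illustration only.
    estimates = [0.0, 0.002, 0.05, 0.67]
    weights = [1.0, 1.0, 2.0, 0.5]
    for name, value in summarise(estimates, weights).items():
        print(f"{name}: {value:.3%}")
```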

Findings for Fifty-Year Impacts of Superhuman AI

See the full document and spreadsheet for further details on how I arrive at these figures.

Reference Class Descriptions and Summaries

Color scale: green = most credible and informative, red = least

For each reference class below, I give what I estimate, what’s included, and a summary.

Emergence of Relatively Superintelligent Species/Genus

What share of species go extinct because a newly capable species or genus arises?

What share of pre-existing species’ resources does a newly capable species or genus capture?

Share of species extinct because of newly superintelligent species

- Share of megafauna that went extinct shortly after human arrival

- Projections of eventual excess mammal species extinction rate in the Anthropocene

- Adjustments of the above rates to account for the fact that humans are exceptional

- Effect of invasive mammal species on island bird extinctions

Minimum: 0

Maximum: 67%

Weighted arithmetic mean: 6.69%

Weighted geometric mean: 0.524%

Share of resources controlled by newly superintelligent species

- Share of land modified or used by humans

- Share of Earth’s surface used by humans

- Share of global or animal biomass consisting of or domesticated by humans

- Average population decline across wildlife species in the Anthropocene

- Adjustments of the above rates to account for the fact that humans are exceptional

Minimum: 7.72 x 10^-11

Maximum: 50%

Weighted arithmetic mean: 5.99%

Weighted geometric mean: 0.0141%

Reasons to believe this reference class:

- This is a common analogy in arguments about risk from superhuman AGI.

- Most arguments about superhuman AGI (e.g. convergence theses) are about the idea of intelligence or discontinuous capabilities and thus apply to intelligent species as well.

Reasons not to believe it:

- Biological causes of extinction may differ from AGI-related causes.

- Intelligent species, including humans, may be qualitatively different from superhuman AGI.

Known Human Extinction Risks

What is the chance humanity goes extinct from a plausibly alleged extinction threat?

Estimated odds of human extinction

- Chances of 8 billion deaths from bioterror and biowarfare assuming a power law

- Likelihood of mass extinction from an asteroid

- Likelihood of mass extinction from a supernova

- Likelihood of mass extinction from a gamma ray burst

- Yearly chance of “infinite impact” from the Global Challenges Foundation for various causes

Minimum: 0

Maximum: 0.056%

Weighted arithmetic mean: 0.00539%

Weighted geometric mean: 0.000365%
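The bioterror and biowarfare benchmark extrapolates a power law out to 8 billion deaths. Below is a minimal sketch of that kind of extrapolation. It assumes a Pareto tail above some threshold and treats qualifying events as a Poisson process. Every parameter below (threshold, exponent, event rate) is hypothetical and not taken from the full document.

```python
import math


def pareto_tail(x: float, x_min: float, alpha: float) -> float:
    """P(deaths >= x | deaths >= x_min) under a Pareto tail with exponent alpha."""
    if x_min <= 0 or alpha <= 0:
        raise ValueError("x_min and alpha must be positive")
    return 1.0 if x <= x_min else (x / x_min) ** (-alpha)


def yearly_chance_of_deaths(events_per_year: float, x_min: float, alpha: float,
                            deaths: float = 8e9) -> float:
    """Chance that at least one event in a year kills `deaths` people, if events
    with at least x_min deaths arrive as a Poisson process."""
    expected_hits = events_per_year * pareto_tail(deaths, x_min, alpha)
    return -math.expm1(-expected_hits)  # 1 - exp(-x), accurate for tiny x


if __name__ == "__main__":
    # Hypothetical: 0.5 attacks/year with >= 1,000 deaths, varying tail exponent.
    for alpha in (0.5, 0.7, 1.0):
        p = yearly_chance_of_deaths(0.5, 1e3, alpha)
        print(f"alpha={alpha}: yearly chance {p:.2e}")
```

The result is very sensitive to the assumed exponent. Moving alpha from 0.5 to 1.0 changes the answer by several orders of magnitude.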

Reasons to believe this reference class:

- Since we largely have only intuitive arguments for AGI risk, looking at other “things people argue could cause extinction” offers a natural benchmark.

- Extinction from cause X should be less likely the harder it is for humans to go extinct.

Reasons not to believe it:

- Since AGI is agential, it is likely more damaging than accidental risks.

- AGI might be seen as more speculative than the risks included here, which would imply an even lower risk.

- Observation selection may select for worlds with low natural risks relative to anthropogenic ones.

Power of Social Organisations (Governments and Corporations)

What share of resources are controlled by organised groups of people compared to individuals?

Share of resources controlled by social organisations

- Government or central government spending as share of GDP, US

- Share of humans who are citizens of a nation-state

- Share of people employed by government in OECD countries

- Corporate or government assets as share of global assets

Minimum: 4.7%

Maximum: 50%

Weighted arithmetic mean: 20%

Weighted geometric mean: 29.4%

Reasons to believe this reference class:

- Organised groups of humans are in some sense superintelligent entities relative to individuals.

Reasons not to believe it:

- It is ambiguous how to distinguish what belongs to a collective and what belongs to individuals.

- Social organisations may be less (or more) intelligent than superhuman AGI.

Naïve Posteriors from Previous Technologies

How likely is human extinction from a threatening invention, given prior inventions?

How likely is transformative change from a major invention, given prior major inventions?

Chance of extinction from a threatening invention

- Chance of extinction from a given category of threatening inventions (subjectively defined) given a Beta (0.5, 0.5) prior

- Chance of extinction from a given threatening invention (subjectively defined) given a Beta (0.5, 0.5) prior

In addition, I estimate what rate of extinction would imply a <1% chance of seeing as many threatening inventions as we have seen without any of them causing extinction.

Rate given Beta (0.5, 0.5) prior (<1% likelihood rate in parentheses):

Minimum: 0.61% (5.53%)

Maximum: 6.33% (48.7%)

Weighted arithmetic mean: 3.94% (31.4%)

Weighted geometric mean: 2.78% (33.4%)
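Both numbers above can be sketched in a few lines. This is a minimal illustration under stated assumptions:

- There are n threatening inventions (or categories).
- Each could independently have caused extinction, and none did.
- Under a Beta(0.5, 0.5) prior, the posterior mean rate is (k + 0.5)/(n + 1), with k = 0 observed extinctions.
- The "<1% likelihood" rate is the p that solves (1 - p)^n = 0.01.

The counts used below are hypothetical. The same posterior-mean function applies to the transformation measure with k > 0 transformative inventions.

```python
def jeffreys_posterior_mean(k: int, n: int) -> float:
    """Posterior mean of a per-trial rate after k 'successes' in n trials,
    under a Beta(0.5, 0.5) prior: (k + 0.5) / (n + 1)."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    return (k + 0.5) / (n + 1)


def one_percent_likelihood_rate(n: int, threshold: float = 0.01) -> float:
    """Rate p at which surviving all n trials has probability `threshold`,
    i.e. (1 - p) ** n == threshold; any higher rate makes our record of
    survival less than `threshold` likely."""
    if n <= 0:
        raise ValueError("n must be positive")
    return 1.0 - threshold ** (1.0 / n)


if __name__ == "__main__":
    for n in (10, 25, 50):  # hypothetical counts of threatening inventions
        print(f"n={n}: posterior mean {jeffreys_posterior_mean(0, n):.2%}, "
              f"<1% likelihood rate {one_percent_likelihood_rate(n):.2%}")
```

Surviving more inventions lowers the rate on both measures. With no extinctions observed, though, the posterior mean leans heavily on the choice of prior.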

Chance of transformation from a major invention

- Chance an ex-ante potentially transformative invention (subjectively defined) is actually transformative (subjectively defined) given a Beta (0.5, 0.5) prior

- Chance a historic invention (subjectively defined) is actually transformative (subjectively defined) given a Beta (0.5, 0.5) prior

I also compute <1% likelihood estimates as for the extinction measure.

Rate given Beta (0.5, 0.5) prior (<1% likelihood rate in parentheses):

Minimum: 1.25% (5.1%)

Maximum: 17.8% (53.7%)

Weighted arithmetic mean: 13.67% (41.6%)

Weighted geometric mean: 9.17% (29.8%)

Reasons to believe this reference class:

- It is perhaps the reference class into which superhuman AGI most obviously fits.

Reasons not to believe it:

- The definition of an invention that would have seemed threatening or major is subjective.

- Extinction estimates depend heavily on the prior since we have never observed human extinction.

- Here I am taking “chance of transformation” as another estimate of “share of resources controlled”, but it is quite a different way of thinking about that (in probabilities rather than fixed shares).

Damages from and Power of AI Systems to Date

How likely is it that current AI systems would cause human extinction?

What share of current economic activity can be automated by existing AI technologies?

Likelihood of a current AI system killing 8 billion people

- Frequency of “critical” incidents from AI systems

- Likelihood a critical incident kills 8 billion people based on various distributions. Note: tenuous and poor fit.

Minimum: 0

Maximum: 0.104%

Weighted arithmetic mean: 0.0718%

Weighted geometric mean: 0.139%
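The AI-incident benchmark combines two pieces: how often "critical" incidents occur, and the chance that a single incident kills 8 billion people under some fitted severity distribution. Below is a minimal sketch of why the choice of distribution dominates. It assumes hypothetical parameters for a Pareto and a lognormal severity distribution, neither fitted to real incident data, and treats incidents as a Poisson process.

```python
import math

POPULATION = 8e9


def pareto_survival(x: float, x_min: float, alpha: float) -> float:
    """P(X >= x) for a Pareto distribution with scale x_min and shape alpha."""
    return 1.0 if x <= x_min else (x / x_min) ** (-alpha)


def lognormal_survival(x: float, mu: float, sigma: float) -> float:
    """P(X >= x) for a lognormal whose log has mean mu and s.d. sigma."""
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * math.erfc(z)


def chance_over_horizon(incidents_per_year: float, per_incident: float,
                        years: float = 50.0) -> float:
    """Chance of at least one extinction-level incident over `years`,
    treating incidents as a Poisson process."""
    return -math.expm1(-incidents_per_year * per_incident * years)


if __name__ == "__main__":
    rate = 10.0  # hypothetical critical incidents per year
    fits = {
        "pareto (x_min=1, alpha=1)": pareto_survival(POPULATION, 1.0, 1.0),
        "lognormal (median 10, sigma=3)": lognormal_survival(POPULATION, math.log(10.0), 3.0),
    }
    for name, per_incident in fits.items():
        print(f"{name}: per incident {per_incident:.2e}, "
              f"50-year {chance_over_horizon(rate, per_incident):.2e}")
```

The two fits can agree on typical incidents yet disagree by orders of magnitude at 8 billion deaths. Tail fits to sparse incident data are fragile for exactly this reason.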

Forecasted AI share of the economy

Naïve extrapolations of the following:

- Share of 2017 work tasks that could be automated

- Contribution of automation to GDP

Minimum: 34.3%

Maximum: 69.3%

Weighted arithmetic mean: 30.5%

Weighted geometric mean: 55.8%

Reasons to believe this reference class:

- This is perhaps the second-most natural reference class after the previous-technologies one.

Reasons not to believe it:

- Extinction likelihoods depend on extrapolations and judgment calls that are difficult to defend.

- Economic estimates are currently very naïve and likely unrealistic.

Rates of Product Defects

How often do various consumer products exhibit major defects?

Share of products with a serious defect

- Share of cars or car components subject to recall (with or without risk of death)

- Share of drugs withdrawn from market or with a post-market safety issue

- Share of meat recalled by weight

- U.S. standard for acceptable cancer risk

Minimum: 2.1 x 10^-10

Maximum: 58.5%

Weighted arithmetic mean: 2.29%

Weighted geometric mean: 0.0645%

Reasons to believe this reference class:

- This reference class seems somewhat natural, and data is available.

Reasons not to believe it:

- Determining what counts as catastrophe requires delicate judgment calls.

- These sorts of products and defects are likely quite different from AGI misalignment.

Linch @ 2023-05-30T21:48 (+17)

What do you think of the human subspecies base rate? In some ways, I think of the position of being human with the arrival of the AI species to be more similar to being a non-sapiens human with the arrival of homo sapiens, than e.g. being woolly mammoths or lions when humans arrived. In particular, I think of the relevant ecological "niches" as closer, and we may conjecture that AIs will have more resource competition with humans than humans with elephants.

zdgroff @ 2023-05-31T00:49 (+4)

Yeah, this is an interesting one. I'd basically agree with what you say here. I looked into it and came away thinking (a) it's very unclear what the actual base rate is, but (b) it seems like it probably roughly resembles the general species one I have here. Given (b), I bumped up how much weight I put on the species reference class, but I did not include the human subspecies as a reference class here given (a).

From my exploration, it looked like there had been loose claims about many of them going extinct because of Homo sapiens, but it seemed like this was probably not true in the relevant sense of "extinct" except possibly in the cases of Neanderthals and Homo floresiensis. By relevant sense of "extinct", I mean dying off/ceasing to reproduce rather than interbreeding. This seems to be the best paper on the topic, concluding that climate change drove most of the extinctions: https://www.sciencedirect.com/science/article/pii/S2590332220304760

As that paper says, Homo sapiens may have contributed to the extinction of the Neanderthals. I found suggestions in the case of Homo floresiensis to be pretty rough. So my take was that there was one species in the Homo genus that might have gone extinct because of Homo sapiens out of ~18 or so. That looks pretty similar to the means I take away from species extinctions (0.5-6%), but I felt it was too unclear to put a number on that gave added value.

Linch @ 2023-05-31T05:54 (+4)

Thanks for the reply! I appreciate it and will think further.

This seems to be the best paper on the topic, concluding that climate change drove most of the extinctions: https://www.sciencedirect.com/science/article/pii/S2590332220304760

To confirm, you find the climate change extinction hypotheses very credible here? I know very little about the topic except I vaguely recall that some scholars also advanced climate change as the hypothesis for the megafauna extinctions but these days it's generally considered substantially less credible than human origin.

zdgroff @ 2023-06-04T19:56 (+4)

From what I can tell, the climate change one seems like the one with the most support in the literature. I'm not sure how much the consensus in favor of the human cause of megafauna extinctions (which I buy) generalizes to the extinction of other species in the Homo genus. Most of the Homo extinctions happened much earlier than the megafauna ones. But it could be—I have not given much thought to whether this consensus generalizes.

The other thing is that "extinction" sometimes happened in the sense that the species interbred with the larger population of Homo sapiens, and I would not count that as the relevant sort of extinction here.

Vasco Grilo🔸 @ 2024-09-10T18:11 (+2)

Interesting discussion, Linch and Zach. Relatedly, people may want to check the episode of Dwarkesh Podcast with David Reich.

Greg_Colbourn @ 2023-06-13T07:36 (+3)

Interesting analysis, thanks. I'm a bit wary of it leading to a false sense of security around AGI though. Your "reasons not to believe", such as -

- Biological causes of extinction may differ from AGI-related causes.
- Intelligent species, including humans may be qualitatively different from superhuman AGI.
- Since AGI is agential, it is likely more damaging than accidental risks.

- are overpowering imo. AGI would be unprecedented in that it would threaten the entirety of carbon-based life (e.g. a superintelligent AI might remove the oxygen from the atmosphere to prevent the corrosion of its machinery, or star-lift the Sun for energy).

Greg_Colbourn @ 2024-02-27T07:27 (+2)

[Separating out this paragraph into a new comment as I'm guessing it's what led to the downvotes, and I'd quite like the point of the parent paragraph to stand alone. Not sure if anyone will see this now though.]

I think it's imperative to get the leaders of AGI companies to realise that they are in a suicide race (and that AGI will likely kill them too). The default outcome of AGI is doom. For extinction risk at the 1% level, it seems reasonable (even though it's still 80M lives in expectation) to pull the trigger on AGI for a 99% chance of utopia. This is totally wrong-headed and is arguably contributing massively to current x-risk.

benleo @ 2023-05-24T21:23 (+3)

The reference classes I look at generate a prior for AGI control over current human resources anywhere between 5% and 60% (mean of ~16-26%).

Thanks for this Zach. I found it quite thought provoking, especially the quoted sentence.

Based on your model, AGI controlling human resources is much more likely to occur than extinction. Given that, what events do you think we should be worried about with losing autonomy over resources (and potentially institutions) and are you more concerned about that after this work?

zdgroff @ 2023-05-24T21:53 (+14)

I take 5%-60% as an estimate of how much of human civilization's future value will depend on what AI systems do, but it does not necessarily exclude human autonomy. If humans determine what AI systems do with the resources they acquire and the actions they take, then AI could be extremely important, and humans would still retain autonomy.

I don't think this really left me more or less concerned about losing autonomy over resources. It does feel like this exercise made it starker that there's a large chance of AI reshaping the world beyond human extinction. It's not clear how much of that means the loss of human autonomy. I'm inclined to think in rough, nebulous terms that AI will erode human autonomy over 10% of our future, taking 10% as a sort of midpoint between the extinction likelihood and the degree of AI influence over our future. I think my previous views would have been in that ballpark.

The exercise did lead me to think the importance of AI is higher than I previously did and the likelihood of extinction per se is lower (though my final beliefs place all these probabilities higher than the priors in the report).

Vasco Grilo @ 2024-04-08T16:48 (+2)

Great piece, Zach!

Notably, extinction is narrower than “existential risk,” as I understand it (because it does not include permanent disempowerment or population decreases), while takeover is broader (because it could theoretically increase human potential).

There are many concepts of existential risk, but I like to think about it as the risk of a major reduction in the expected value of the future. Under this definition and an impartial welfarist view[1], human extinction would not necessarily be an existential catastrophe. Humans, as the species on Earth which arguably has the most control over the future, have contributed to the extinction of many species without causing any meaningful existential risk in the process. So, if advanced AI, as the most powerful entity on Earth, were to cause human extinction, I guess existential risk would be negligible on priors?

Some more related thoughts:

- I feel like greater intelligence/rationality/power is correlated (far from perfectly!) with greater ability to contribute to a better world. I think humans are often considered to be the most intelligent/rational/powerful species, and the one with the greatest ability to contribute to a better world. This suggests advanced AI could have an even greater ability to contribute to a better world.
- If humans were to deliberately cause the extinction of a species of great apes (humans are great apes):
  - I assume this would increase the welfare of humans or other species.
  - I suppose there would be an effort to minimise suffering, for example by using contraceptive methods or aiming to kill the individuals as painlessly as possible.
- If advanced AI were to deliberately cause human extinction, I think humans would be pretty likely to suffer very little in the process.
  - For instance, humans could be persuaded to continue having fewer children while being provided with material abundance from a booming AI economy. There would be no need to act against human will given the availability of superhuman persuasion techniques.
  - If AIs were to kill all humans, then I assume they would rely on a method not involving mass suffering, because I am expecting advanced AIs to be more altruistic than humans.

[1] I strongly endorse expected total hedonistic utilitarianism.