AI Could Defeat All Of Us Combined

By Holden Karnofsky @ 2022-06-10T23:25 (+143)

Cross-posted from Cold Takes

I've been working on a new series of posts about the most important century.

Many people have trouble taking this "misaligned AI" possibility seriously. They might see the broad point that AI could be dangerous, but they instinctively imagine that the danger comes from ways humans might misuse it. They find the idea of AI itself going to war with humans to be comical and wild. I'm going to try to make this idea feel more serious and real.

As a first step, this post will emphasize an unoriginal but extremely important point: the kind of AI I've discussed could defeat all of humanity combined, if (for whatever reason) it were pointed toward that goal. By "defeat," I don't mean "subtly manipulate us" or "make us less informed" or something like that - I mean a literal "defeat" in the sense that we could all be killed, enslaved or forcibly contained.

I'm not talking (yet) about whether, or why, AIs might attack human civilization. That's for future posts. For now, I just want to linger on the point that if such an attack happened, it could succeed against the combined forces of the entire world.

Conversely, if you don't believe that AI could defeat all of humanity combined, I expect that we're going to be miscommunicating in pretty much any conversation about AI. The kind of AI I worry about is the kind powerful enough that total civilizational defeat is a real possibility. The reason I currently spend so much time planning around speculative future technologies (instead of working on evidence-backed, cost-effective ways of helping low-income people today - which I did for much of my career, and still think is one of the best things to work on) is because I think the stakes are just that high.

Below:

How AI systems could defeat all of us

There's been a lot of debate over whether AI systems might form their own "motivations" that lead them to seek the disempowerment of humanity. I'll be talking about this in future pieces, but for now I want to put it aside and imagine how things would go if this happened.

So, for what follows, let's proceed from the premise: "For some weird reason, humans consistently design AI systems (with human-like research and planning abilities) that coordinate with each other to try and overthrow humanity." Then what? What follows will necessarily feel wacky to people who find this hard to imagine, but I think it's worth playing along, because I think "we'd be in trouble if this happened" is a very important point.

The "standard" argument: superintelligence and advanced technology

Other treatments of this question have focused on AI systems' potential to become vastly more intelligent than humans, to the point where they have what Nick Bostrom calls "cognitive superpowers."2 Bostrom imagines an AI system that can do things like:

(Wait But Why reasons similarly.3)

I think many readers will already be convinced by arguments like these, and if so you might skip down to the next major section.

But I want to be clear that I don't think the danger relies on the idea of "cognitive superpowers" or "superintelligence" - both of which refer to capabilities vastly beyond those of humans. I think we still have a problem even if we assume that AIs will basically have similar capabilities to humans, and not be fundamentally or drastically more intelligent or capable. I'll cover that next.

How AIs could defeat humans without "superintelligence"

If we assume that AIs will basically have similar capabilities to humans, I think we still need to worry that they could come to out-number and out-resource humans, and could thus have the advantage if they coordinated against us.

Here's a simplified example (some of the simplifications are in this footnote4) based on Ajeya Cotra's "biological anchors" report:

To me, this is most of what we need to know: if there's something with human-like skills, seeking to disempower humanity, with a population in the same ballpark as (or larger than) that of all humans, we've got a civilization-level problem.
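The scale here can be made concrete with the back-of-envelope arithmetic from the footnotes. The constants below are the post's assumptions drawn from the Bio Anchors report (roughly 10^30 FLOP to train a transformative model, roughly 10^14 FLOP/s to run one copy), not established facts:

```python
# Back-of-envelope sketch using the post's Bio Anchors-based assumptions.
TRAIN_FLOP = 1e30        # assumed total compute to train a transformative model
RUN_FLOP_PER_S = 1e14    # assumed compute to run one copy in real time
SECONDS_PER_YEAR = 365 * 24 * 3600

# If the training budget were redirected to inference, how much model
# time would it buy?
model_seconds = TRAIN_FLOP / RUN_FLOP_PER_S       # 1e16 model-seconds
model_years = model_seconds / SECONDS_PER_YEAR    # ~3 * 10^8 model-years

# i.e. several hundred million copies running for about a year each -
# the same ballpark as the ~5 billion human working-age population.
print(f"{model_years:,.0f} model-years of human-level AI")
```

Using the report's central (later-arrival) estimates instead of these soon-side figures would make the numbers larger, not smaller.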

A potential counterpoint is that these AIs would merely be "virtual": if they started causing trouble, humans could ultimately unplug/deactivate the servers they're running on. I do think this fact would make life harder for AIs seeking to disempower humans, but I don't think it ultimately should be cause for much comfort. I think a large population of AIs would likely be able to find some way to achieve security from human shutdown, and go from there to amassing enough resources to overpower human civilization (especially if AIs across the world, including most of the ones humans were trying to use for help, were coordinating).

I spell out what this might look like in an appendix. In brief:

Some quick responses to objections

This has been a brief sketch of how AIs could come to outnumber and out-resource humans. There are lots of details I haven't addressed.

Here are some of the most common objections I hear to the idea that AI could defeat all of us; if I get much demand I can elaborate on some or all of them more in the future.

How can AIs be dangerous without bodies? This is discussed a fair amount in the appendix. In brief:

If lots of different companies and governments have access to AI, won't this create a "balance of power" so that nobody is able to bring down civilization?

Won't we see warning signs of AI takeover and be able to nip it in the bud? I would guess we would see some warning signs, but does that mean we could nip it in the bud? Think about human civil wars and revolutions: there are some warning signs, but also, people go from "not fighting" to "fighting" pretty quickly as they see an opportunity to coordinate with each other and be successful.

Isn't it fine or maybe good if AIs defeat us? They have rights too.

Risks like this don't come along every day

I don't think there are a lot of things that have a serious chance of bringing down human civilization for good.

As argued in The Precipice, most natural disasters (including e.g. asteroid strikes) don't seem to be huge threats, if only because civilization has been around for thousands of years so far - implying that natural civilization-threatening events are rare.

Human civilization is pretty powerful and seems pretty robust, and accordingly, what's really scary to me is the idea of something with the same basic capabilities as humans (making plans, developing its own technology) that can outnumber and out-resource us. There aren't a lot of candidates for that.12

AI is one such candidate, and I think that even before we engage heavily in arguments about whether AIs might seek to defeat humans, we should feel very nervous about the possibility that they could.

What about things like "AI might lead to mass unemployment and unrest" or "AI might exacerbate misinformation and propaganda" or "AI might exacerbate a wide range of other social ills and injustices"13? I think these are real concerns - but to be honest, if they were the biggest concerns, I'd probably still be focused on helping people in low-income countries today rather than trying to prepare for future technologies.

Special thanks to Carl Shulman for discussion on this post.

Appendix: how AIs could avoid shutdown

This appendix goes into detail about how AIs coordinating against humans could amass resources of their own without humans being able to shut down all "misbehaving" AIs.

It's necessarily speculative, and should be taken in the spirit of giving examples of how this might work - for me, the high-level concern is that a huge, coordinating population of AIs with similar capabilities to humans would be a threat to human civilization, and that we shouldn't count on any particular way of stopping it such as shutting down servers.

I'll discuss two different general types of scenarios: (a) Humans create a huge population of AIs; (b) Humans move slowly and don't create many AIs.

How this could work if humans create a huge population of AIs

I think a reasonable default expectation is that humans do most of the work of making AI systems incredibly numerous and powerful (because doing so is profitable), which leads to a vulnerable situation. Something roughly along the lines of:

(Paul Christiano's What Failure Looks Like examines this general flavor of scenario in a bit more detail.)

In this scenario, the AIs face a challenge: if it becomes obvious to everyone that they are trying to defeat humanity, humans could attack or shut down a few concentrated areas where most of the servers are, and hence drastically reduce AIs' numbers. So the AIs need a way of getting one or more "AI headquarters": property they control where they can safely operate servers and factories, do research, make plans and construct robots/drones/other military equipment.

Their goal is ultimately to have enough AIs, robots, etc. to be able to defeat the rest of humanity combined. This might mean constructing overwhelming amounts of military equipment, or thoroughly infiltrating computer systems worldwide to the point where they can disable or control most others' equipment, or researching and deploying extremely powerful weapons (e.g., bioweapons), or a combination.

Here are some ways they could get to that point:

In any of these cases, once one or more "AI headquarters" are established:

What if humans move slowly and don't create many AIs?

The above scenario has humans creating large numbers of AIs, such that the AIs just need to find a way to coordinate and acquire a safe "headquarters" in order to defeat us.

What if humans moved more slowly, intentionally restricting human-level AI to a tiny portion of the available computing resources? Could a small number of AIs pose a risk to humanity?

In this world, we would have what Carl Shulman refers to as "dry tinder everywhere, waiting for sparks." Anyone who can buy or rent a large amount of computing power can create a large number of AIs, which can produce a large amount of money and research, leading to still more AIs.

So a single AI could hack into enough servers17 to make a few copies of itself; recruit a few human allies; and start making money, acquiring more server space, etc. until its human allies are running a huge number of AIs. This could all be done in difficult-to-detect ways (it might e.g. just look like a set of humans renting a bunch of servers to run quantitative finance strategies).
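The compounding loop just described can be sketched as a toy model. Every number below is a hypothetical round figure chosen only to illustrate the dynamic, not an estimate of anything:

```python
# Toy "dry tinder" dynamic: copies earn money, money rents servers,
# servers run more copies. All parameters are hypothetical.
copies = 10                  # assumed starting copies (hacked/rented servers)
earnings_per_copy = 100_000  # assumed $/year one human-level AI can earn
cost_per_copy = 25_000       # assumed $/year of server rental per copy

years = 0
while copies < 1_000_000:
    revenue = copies * earnings_per_copy
    copies = revenue // cost_per_copy  # reinvest everything in more copies
    years += 1

# With these numbers the population quadruples each year, reaching a
# million copies in under a decade.
print(years, copies)
```

The specific growth rate doesn't matter much: any situation where a copy earns more per year than it costs to run compounds the same way.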

So in this world, I think our concern should be any AI that is able to find enough security holes to attain that kind of freedom. Given the current state of cybersecurity, that seems like a big concern.

Footnotes

  1. Assuming you accept other points made in the most important century series, e.g. that AI that can do most of what humans do to advance science and technology could be developed this century. 

  2. See Superintelligence chapter 6. 

  3. See the "Nanotechnology blue box," in particular. 

  4. Simplifications in the example include:
    • The report estimates the amount of computing power it would take to train (create) a transformative AI system, and the amount of computing power it would take to run one. This is a bounding exercise and isn't supposed to be literally predicting that transformative AI will arrive in the form of a single AI system trained in a single massive run, but here I am interpreting the report that way for concreteness and simplicity.
    • As explained in the next footnote, I use the report's figures for transformative AI arriving on the soon side (around 2036). Using its central estimates instead would strengthen my point, but we'd then be talking about a longer time from now; I find it helpful to imagine how things could go in a world where AI comes relatively soon.

  5. I assume that transformative AI ends up costing about 10^14 FLOP/s to run (this is about 1/10 the Bio Anchors central estimate, and well within its error bars) and about 10^30 FLOP to train (this is about 10x the Bio Anchors central estimate for how much will be available in 2036, and corresponds to about the 30th-percentile estimate for how much will be needed based on the "short horizon" anchor). That implies that the 10^30 FLOP needed to train a transformative model could run 10^16 seconds' worth of transformative AI models, or about 300 million years' worth. This figure would be higher if we used Bio Anchors's central assumptions, rather than assumptions consistent with transformative AI being developed on the soon side.

  6. They might also run fewer copies of scaled-up models or more copies of scaled-down ones, but the idea is that the total productivity of all the copies should be at least as high as that of several hundred million copies of a human-ish model.

  7. Intel, Google

  8. Working-age population: about 65% * 7.9 billion ≈ 5 billion.

  9. Humans could rent hardware using money they made from running AIs, or - if AI systems were operating on their own - they could potentially rent hardware themselves via human allies or just via impersonating a customer (you generally don't need to physically show up in order to e.g. rent server time from Amazon Web Services).

  10. (I had a speculative, illustrative possibility here but decided it wasn't in good enough shape even for a footnote. I might add it later.)

  11. I don't go into detail about how AIs might coordinate with each other, but it seems like there are many options, such as by opening their own email accounts and emailing each other.

  12. Alien invasions seem unlikely if only because we have no evidence of one in millions of years.

  13. Here's a recent comment exchange I was in on this topic.

  14. E.g., individual AI systems may occasionally get caught trying to steal, lie or exploit security vulnerabilities, due to various unusual conditions including bugs and errors.

  15. E.g., see this list of high-stakes security breaches and a list of quotes about cybersecurity, both courtesy of Luke Muehlhauser. For some additional not-exactly-rigorous evidence that "cybersecurity is in really bad shape" is seen as relatively uncontroversial by at least one cartoonist, see: https://xkcd.com/2030/

  16. Purchases and contracts could be carried out by human allies, or just by AI systems themselves with humans willing to make deals with them (e.g., an AI system could digitally sign an agreement and wire funds from a bank account, or via cryptocurrency).

  17. See above note about my general assumption that today's cybersecurity has a lot of holes in it.


Vasco Grilo @ 2022-06-17T15:43 (+3)

Won't we see warning signs of AI takeover and be able to nip it in the bud? I would guess we would see some warning signs, but does that mean we could nip it in the bud? Think about human civil wars and revolutions: there are some warning signs, but also, people go from "not fighting" to "fighting" pretty quickly as they see an opportunity to coordinate with each other and be successful.

After seeing warning signs of an incoming AI takeover, I would expect people to go from "not fighting" to "fighting" the AI takeover. As it would be bad for virtually everyone (outside of the apocalyptic residual), there would be an incentive to coordinate against the AIs. This seemingly contrasts with civil wars and revolutions, in which there are many competing interests.

That being said, I do not think the above is a strong objection, because humanity does not have a flawless track record of coordinating to solve global problems.

[Footnote 11] I don't go into detail about how AIs might coordinate with each other, but it seems like there are many options, such as by opening their own email accounts and emailing each other.

I can see "how AIs might coordinate with each other". How about the questions:

Holden Karnofsky @ 2023-03-18T00:42 (+4)

These questions are outside the scope of this post, which is about what would happen if AIs were pointed at defeating humanity.

I don't think there's a clear answer to whether AIs would have a lot of their goals in common, or find it easier to coordinate with each other than with humans, but the probability of each seems at least reasonably high if they are all developed using highly similar processes (making them all likely more similar to each other in many ways than to humans).

Michael_Wiebe @ 2022-06-11T20:34 (+2)

once the first human-level AI system is created, whoever created it could use the same computing power it took to create it in order to run several hundred million copies for about a year each.

How does computing power work here? Is it:

  1. We use a supercomputer to train the AI, then the supercomputer is just sitting there, so we can use it to run models. Or:
  2. We're renting a server to do the training, and then have to rent more servers to run the models.

In (2), we might use up our whole budget on the training, and then not be able to afford to run any models.

Holden Karnofsky @ 2023-03-18T00:40 (+2)

Sorry for chiming in so late! The basic idea here is that if you have 2x the resources it would take to train a transformative model, then you have enough to run a huge number of them.

It's true that the first transformative model might eat all the resources its developer has at the time. But it seems likely that (a) given that they've raised $X to train it as a reasonably speculative project, once it turns out to be transformative there will probably be at least a further $X available to pay for running copies; (b) not too long after, as compute continues to get more efficient, someone will have the 2x the resources needed to train the model.
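To make this concrete with the post's illustrative constants (assumed, not measured: ~10^30 FLOP to train, ~10^14 FLOP/s to run a copy), even a heavily skewed training/inference budget split leaves an enormous AI workforce:

```python
# How many model-years of human-level AI does a given inference budget buy,
# under the post's assumed figures?
TRAIN_FLOP = 1e30      # assumed training compute (footnote figure)
RUN_FLOP_PER_S = 1e14  # assumed per-copy running cost
YEAR = 365 * 24 * 3600

def model_years(inference_flop):
    """Model-years of human-level AI purchasable with this many FLOP."""
    return inference_flop / RUN_FLOP_PER_S / YEAR

# Budget = 2x training cost: the second half buys ~300 million model-years.
matched = model_years(TRAIN_FLOP)
# Even if 99% of the budget went to training, the remaining ~1% still
# buys ~3 million model-years - still a civilization-scale workforce.
skewed = model_years(0.01 * TRAIN_FLOP)
```

So the argument doesn't depend on an exactly even split; any non-trivial fraction of a training-scale budget funds a very large number of copies.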

Michael_Wiebe @ 2022-06-11T20:36 (+2)

Basically, is the computing power for training a fixed cost or a variable cost? If it's a fixed cost, then there's no further cost to using the same computing power to train models.

Charles He @ 2022-06-11T20:44 (+6)

I haven’t read the OP (I haven’t read a full forum post in weeks and I don’t like reading, it’s better to like, close your eyes and try coming up with the entire thing from scratch and see if it matches, using high information tags to compare with, generated with a meta model) but I think this is a reference to the usual training/inference cost differences.

For example, you can run GPT-3 Davinci in a few seconds at trivial cost. But the training cost was millions of dollars and took a long time.

There are further considerations. For example, finding the architecture (stacking more things in Torch, fiddling with parameters, figuring out how to implement the Key Insight, etc.) for finding the first breakthrough model is probably further expensive and hard.
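The GPT-3 example above can be roughed out numerically. The figures here are outside-the-thread assumptions: the publicly reported ~3.14e23 FLOP training compute for GPT-3, and the common ~2 FLOP-per-parameter-per-token approximation for a forward pass:

```python
# Rough training-vs-inference comparison at GPT-3 scale.
TRAIN_FLOP = 3.14e23         # reported GPT-3 training compute (approximate)
PARAMS = 175e9               # GPT-3 parameter count
FLOP_PER_TOKEN = 2 * PARAMS  # standard forward-pass approximation

# Spending the training budget over again on inference buys this many
# generated tokens - roughly a trillion.
tokens = TRAIN_FLOP / FLOP_PER_TOKEN
print(f"{tokens:.1e} tokens")
```

This is the sense in which training is a huge one-time cost while each individual inference is trivially cheap.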

Michael_Wiebe @ 2022-06-11T21:00 (+3)

Let X be the computing power used to train the model. Is the idea that "if you could afford X to train the model, then you can also afford X for running models"? 

Because that doesn't seem obvious. What if you used 99% of your budget on training? Then you'd only be able to afford about 0.01X for running models.

Or is this just an example to show that training costs >> running costs?

aogara @ 2022-06-11T21:47 (+2)

Yes, that's how I understood it as well. If you spend the same amount on inference as you did on training, then you get a hell of a lot of inference. 

I would expect he'd also argue that, because companies are willing to spend tons of money on training, we should also expect them to be willing to spend lots on inference. 

Michael_Wiebe @ 2022-06-11T22:16 (+2)

Do we know the expected cost for training an AGI? Is that within a single company's budget?

aogara @ 2022-06-12T00:38 (+10)

Nearly impossible to answer. This report by OpenPhil gives it a hell of an effort, but could still be wrong by orders of magnitude. Most fundamentally, the amount of compute necessary for AGI might not be related to the amount of compute used by the human brain, because we don’t know how similar our algorithmic efficiency is compared to the brain’s.

https://www.cold-takes.com/forecasting-transformative-ai-the-biological-anchors-method-in-a-nutshell/

Charles He @ 2022-06-11T21:03 (+2)

Yes, the last sentence is exactly correct.

So like the terms of art here are “training” versus “inference”. I don’t have a reference or guide (because the relative size is not something that most people think about versus the absolute size of each individually) but if you google them and scroll through some papers or posts I think you will see some clear examples.

Charles He @ 2022-06-11T21:10 (+2)

Just LARPing here. I don’t really know anything about AI or machine learning.

I guess in some deeper sense you are right and (my simulated version of) what Holden has written is imprecise.

We don’t really see many “continuously” updating models where training continues live with use. So the mundane pattern we see today - where inference, trivially running the model's instructions (often on silicon made specifically for inference), is much cheaper than training - may not apply, for some reason, to the pattern that the out-of-control AI uses.

It’s not impossible that if the system needs to be self improving, it has to provision a large fraction of its training cost, or something, continually.

It’s not really clear what the “shape” of this “relative cost curve” would be, or whether this would be a short period of time - and either way it doesn’t make the AI any less dangerous.

Tristan Williams @ 2023-11-07T17:22 (+1)

What sort of goals might be common to human level AIs emerging in different contexts for different purposes? Why wouldn't these AIs (in this situation where they're developed slowly and in an embedded context) have just as much diversity in goals as humans do? Or is the argument that they, at an incredibly high level, are just going to end up wanting things more similar to other AIs than humans?