We're Not Ready: thoughts on "pausing" and responsible scaling policies

By Holden Karnofsky @ 2023-10-27T15:19 (+150)

Views are my own, not Open Philanthropy’s. I am married to the President of Anthropic and have a financial interest in both Anthropic and OpenAI via my spouse.

Over the last few months, I’ve spent a lot of my time trying to help out with efforts to get responsible scaling policies adopted. In that context, a number of people have said it would be helpful for me to be publicly explicit about whether I’m in favor of an AI pause. This post will give some thoughts on these topics.

I think transformative AI could be soon, and we’re not ready

I have a strong default to thinking that scientific and technological progress is good and that worries will tend to be overblown. However, I think AI is a big exception here because of its potential for unprecedentedly rapid and radical transformation.1

I think sufficiently advanced AI would present enormous risks to the world. I’d put the risk of a world run by misaligned AI (or an outcome broadly similar to that) between 10% and 90% (so: above 10%) if it is developed relatively soon on something like today’s trajectory. And there are a whole host of other issues (e.g.) that could be just as important, if not more so, and that it seems like no one has really begun to get a handle on.

Is that level of AI coming soon, and could the world be “ready” in time? Here I want to flag that timelines to transformative or even catastrophically risky AI are very debatable, and I have tried to focus my work on proposals that make sense even for people who disagree with me on the below points. But my own views are that:

  • There’s a serious (>10%) risk that we’ll see transformative AI2 within a few years.
  • In that case it’s not realistic to have sufficient protective measures for the risks in time.
  • Sufficient protective measures would require huge advances on a number of fronts, including information security that could take years to build up and alignment science breakthroughs that we can’t put a timeline on given the nascent state of the field, so even decades might or might not be enough time to prepare, even given a lot of effort.

If it were all up to me, the world would pause now - but it isn’t, and I’m more uncertain about whether a “partial pause” is good

In a hypothetical world where everyone shared my views about AI risks, there would (after deliberation and soul-searching, and only if these didn’t change my current views) be a global regulation-backed pause on all investment in and work on (a) general3 enhancement of AI capabilities beyond the current state of the art, including by scaling up large language models; (b) building more of the hardware (or parts of the pipeline most useful for more hardware) most useful for large-scale training runs (e.g., H100s); (c) algorithmic innovations that could significantly contribute to (a).

The pause would end when it was clear how to progress some amount further with negligible catastrophic risk, and how to reinstitute the pause before going beyond negligible catastrophic risk. (This means another pause might occur shortly afterward. Overall, I think it’s plausible that the right amount of time to be either paused or in a sequence of small scaleups followed by pauses could be decades or more, though this depends on a lot of things.) This would require a strong, science-backed understanding of AI advances such that we could be assured of quickly detecting early warning signs of any catastrophic-risk-posing AI capabilities we didn’t have sufficient protective measures for.

I didn’t have this view a few years ago. Why now?

All of that said, I think that advocating for a pause now might lead instead to a “partial pause” such as:

It’s much harder for me to say whether these various forms of “partial pause” would be good.

To pick a couple of relatively simple imaginable outcomes and how I’d feel about them:

Overall I don’t have settled views on whether it’d be good for me to prioritize advocating for any particular policy.5 At the same time, if it turns out that there is (or will be) a lot more agreement with my current views than there currently seems to be, I wouldn’t want to be even a small obstacle to big things happening, and there’s a risk that my lack of active advocacy could be confused with opposition to outcomes I actually support.

I feel generally uncertain about how to navigate this situation. For now I am just trying to spell out my views and make it less likely that I’ll get confused for supporting or opposing something I don’t.

Responsible scaling policies (RSPs) seem like a robustly good compromise with people who have different views from mine (with some risks that I think can be managed)

My sense is that people have views all over the map about AI risk, such that it would be hard to build a big coalition around the kind of pause I’d support most.

I’m excited about RSPs partly because it seems like people across that range of views - not just people who agree with my estimates about risks - should support RSPs. This raises the possibility of a much broader consensus around conditional pausing than I think is likely around immediate (unconditional) pausing. And with a broader consensus, I expect an easier time getting well-designed, well-enforced regulation.

I think RSPs represent an opportunity for wide consensus that pausing under certain conditions would be good, and this seems like it would be an extremely valuable thing to establish.

Importantly, agreeing that certain conditions would justify a pause is not the same as agreeing that they’re the only such conditions. Agreeing that a pause needs to be prepared for at all seems like the most valuable step, and the pause conditions can be revised from there.

Another reason I am excited about RSPs: I think optimally risk-reducing regulation would be very hard to get right. (Even the hypothetical, global-agreement-backed pause I describe above would be hugely challenging to design in detail.) When I think something is hard to design, my first instinct is to hope for someone to take a first stab at it (or at least at some parts of it), learn what they can about the shortcomings, and iterate. RSPs present an opportunity to do something along these lines, and that seems much better than focusing all efforts and hopes on regulation that might take a very long time to come.

There is a risk that RSPs will be seen as a measure that is sufficient to contain risks by itself - e.g., that governments may refrain from regulation, or simply enshrine RSPs into regulation, rather than taking more ambitious measures. Some thoughts on this:

Footnotes

  1. The other notable exception I’d make here is biology advances that could facilitate advanced bioweapons, again because of how rapid and radical the destruction potential is. I default to optimism and support for scientific and technological progress outside of these two cases. 

  2. I like this discussion of why improvements on pretty narrow axes for today’s AI systems could lead quickly to broadly capable transformative AI. 

  3. People would still be working on making AI better at various specific things (for example, resisting attempts to jailbreak harmlessness training, or just narrow applications like search and whatnot). It’s hard to draw a bright line here, and I don’t think it could be done perfectly using policy, but in the “if everyone shared my views” construction everyone would be making at least a big effort to avoid finding major breakthroughs that were useful for general enhancement of very broad and hard-to-bound suites of AI capabilities. 

  4. Examples include improved fine-tuning methods and datasets, new plugins and tools for existing models, new elicitation methods in the general tradition of chain-of-thought reasoning, etc. 

  5. I do think that at least someone should be trying it. There’s a lot to be learned from doing this - e.g., about how feasible it is to mobilize the general public - and this could inform expectations about what kinds of “partial victories” are likely.  


Geoffrey Miller @ 2023-10-27T20:52 (+58)

Holden - these are reasonable points. But I have two quibbles.

First, the recent surveys of the general public's attitudes towards AI risk suggest that a strongly enforced global pause would actually get quite a bit of support. It's not outside the public's Overton Window. It might be considered an 'extreme solution' by AI industry insiders and e/acc cultists. But the public seems to understand that it's just fundamentally dangerous to invent Artificial General Intelligence that's as smart as smart humans (and much, much faster), or to invent Artificial Superintelligence. AI experts might patronize the public by claiming they're just reacting to sensationalized Hollywood depictions of AI risk. But I don't care. If the public understands the potential risks, through whatever media they've been exposed to, and if it leads them to support a pause, we might as well capitalize on public sentiment.

Second, I worry that EAs generally have a 'policy fetish', in assuming that the only way to slow down a technological field is through formal, government-sanctioned regulation and 'good policy' solutions. I think this is incorrect, both historically and logically. In this piece on moral stigmatization of AI, I argued that an informal, grass-roots, public moral backlash against the AI industry could accomplish almost everything formal regulation can accomplish, without many of the loopholes and downsides that regulation would face. If the general public realizes that AGI-directed research is just fundamentally stupid and reckless and a huge extinction risk, they can stigmatize AI researchers, funders, suppliers, etc in ways that shut down the industry -- potentially for decades. If that public stigmatization goes global, the AI industry globally could be put on 'pause' for quite a while. Sure, we might delay some potential benefits from some narrow AI applications. But that's a tradeoff most reasonable people would be willing to accept. (For example, if my generation misses out on AI-created longevity treatments, and we die, but our kids survive, without facing AGI-imposed extinction risks, that's fine with me -- and I think it would be OK with most parents.)

I understand that harnessing the power of moral stigmatization to shut down a promising-but-dangerous technology like AI isn't the usual EA style, but at this point, it might be the only practical solution to pausing dangerous AI development.

Greg_Colbourn @ 2023-10-28T18:11 (+8)

Fully agree. A potential taboo on AGI is something that is far too often overlooked by people who worry about pauses not working well (e.g. see also Scott Alexander, Matthew Barnett, Nora Belrose).

Zed Tarar @ 2023-12-01T13:45 (+1)

This is true--it's the same tactic anti-GMO lobbies, the NRA, NIMBYs, and anti-vaxxers have used. The public as a whole doesn't need to be anti-AI, even a vocal minority will be enough to swing elections and ensure an unfavorable regulatory environment. If I had to guess, AI would end up like nuclear fission--not worth the hassle, but with no off-ramp, no way to unring the alarm bell. 

Ryan Greenblatt @ 2023-10-30T03:36 (+4)

First, the recent surveys of the general public's attitudes towards AI risk suggest that a strongly enforced global pause would actually get quite a bit of support. It's not outside the public's Overton Window. It might be considered an 'extreme solution' by AI industry insiders and e/acc cultists. But the public seems to understand that it's just fundamentally dangerous to invent Artificial General Intelligence that's as smart as smart humans (and much, much faster), or to invent Artificial Superintelligence. AI experts might patronize the public by claiming they're just reacting to sensationalized Hollywood depictions of AI risk. But I don't care. If the public understands the potential risks, through whatever media they've been exposed to, and if it leads them to support a pause, we might as well capitalize on public sentiment.

I think the public might support a pause on scaling, but I'm much more skeptical about the sort of hardware-inclusive pause that Holden discusses here:

global regulation-backed pause on all investment in and work on (a) general3 enhancement of AI capabilities beyond the current state of the art, including by scaling up large language models; (b) building more of the hardware (or parts of the pipeline most useful for more hardware) most useful for large-scale training runs (e.g., H100s); (c) algorithmic innovations that could significantly contribute to (a)

A hardware-inclusive pause sufficient for pausing for >10 years would probably effectively dismantle companies like Nvidia and would at least put a serious dent in TSMC. This would involve huge job losses and a large hit to the stock market. I expect people would not support such a pause, which would effectively require dismantling a powerful industry.

It's possible I'm overestimating the extent to which hardware needs to be stopped for such a ban to be robust and an improvement on the status quo.

Nick K. @ 2023-10-30T16:54 (+1)

I'm not an expert, but the economic damage seems to me plausibly a question of implementation details. E.g., if you call for a stop to hardware improvements at the same time as implementing hardware-level compute monitoring, doing that efficiently likely requires developing new technology, which may allow the current companies to maintain their leading position.

Of course, restrictions are going to have some effect, and may plausibly hit Nvidia's valuation, but it is not at all clear that the economic consequences would necessarily be dramatic (the car industry's switch to EVs might be vaguely analogous).

kokotajlod @ 2023-10-29T13:26 (+3)

I think the tech companies -- and in particular the AGI companies -- are already too powerful for such an informal public backlash to slow them down significantly.

Geoffrey Miller @ 2023-10-29T20:27 (+26)

Disagree. Almost every successful moral campaign in history started out as an informal public backlash against some evil or danger.

The AGI companies involve a few thousand people versus 8 billion, a few tens of billions of dollars in funding versus roughly $360 trillion in total global assets, and about 3 key nation-states (US, UK, China) versus 195 nation-states in the world.

Compared to actually powerful industries, AGI companies are very small potatoes. Very few people would miss them if they were set on 'pause'.

kokotajlod @ 2023-10-30T13:40 (+2)

I hope you are right.

Greg_Colbourn @ 2023-10-29T13:30 (+4)

I imagine it going hand in hand with more formal backlashes (i.e. regulation, law, treaties).

Greg_Colbourn @ 2023-10-28T18:26 (+12)

Overall I don’t have settled views on whether it’d be good for me to prioritize advocating for any particular policy.5 At the same time, if it turns out that there is (or will be) a lot more agreement with my current views than there currently seems to be, I wouldn’t want to be even a small obstacle to big things happening, and there’s a risk that my lack of active advocacy could be confused with opposition to outcomes I actually support.

You have a huge amount of clout in determining where $100Ms of OpenPhil money is directed toward AI x-safety. I think you should be much more vocal on this - at least indirectly, via OpenPhil grantmaking. In fact I've been surprised at how quiet you (and OpenPhil) have been since GPT-4 was released!

Greg_Colbourn @ 2023-10-28T18:20 (+4)

  • There’s a serious (>10%) risk that we’ll see transformative AI2 within a few years.
  • In that case it’s not realistic to have sufficient protective measures for the risks in time.
  • Sufficient protective measures would require huge advances on a number of fronts, including information security that could take years to build up and alignment science breakthroughs that we can’t put a timeline on given the nascent state of the field, so even decades might or might not be enough time to prepare, even given a lot of effort.

If it were all up to me, the world would pause now

Reading the first half of this post, I feel that your views are actually very close to my own. It leaves me wondering how much your conflicts of interest - 

I am married to the President of Anthropic and have a financial interest in both Anthropic and OpenAI via my spouse.

- are factoring into why you come down in favour of RSPs (above pausing now) in the end.

Peter Wildeford @ 2023-10-27T16:34 (+3)

I’m guessing stopping scaling by US POTUS executive order is not even legally possible though? So I don’t think we’d have to worry about that.

dan.pandori @ 2023-10-27T17:33 (+9)

Legal or constitutional infeasibility does not always prevent executive orders from being applied (or followed). I feel like the US president declaring a state of emergency related to AI catastrophic risk (and then forcing large AI companies to stop training large models) sounds at least as constitutionally viable as the attempted executive order for student loan forgiveness.

I agree that this seems fairly unlikely to happen in practice though.

Zed Tarar @ 2023-12-01T13:41 (+1)

I think you put it well when you said: 

"Some people think that the kinds of risks I’m worried about are far off, farfetched or ridiculous."

If I made the claim that we had 12 months before all of humanity is wiped out by an asteroid, you'd rightly ask me for evidence. Have I picked up a distant rock in space using telescopes? Some other tangible proof? Or is it a best guess, since, hey, it's technically possible that we could be hit by an asteroid in any given year? Then imagine if I advocated that we spend two percent of global GDP preparing for this event.

That's where the state of AGI fear is--all scenarios depend on wild leaps of faith and successive assumptions that build on each other. 

I've attempted to put this all in one place with this post.