What AI companies can do today to help with the most important century

By Holden Karnofsky @ 2023-02-20T17:40 (+104)

This is a linkpost to https://www.cold-takes.com/what-ai-companies-can-do-today-to-help-with-the-most-important-century/?ref=cold-takes-newsletter#footnotes

I’ve been writing about tangible things we can do today to help the most important century go well. Previously, I wrote about helpful messages to spread and how to help via full-time work.

This piece is about what major AI companies can do (and not do) to be helpful. By “major AI companies,” I mean the sorts of AI companies that are advancing the state of the art, and/or could play a major role in how very powerful AI systems end up getting used.[1]

This piece could be useful to people who work at those companies, or people who are just curious.

Generally, these are not pie-in-the-sky suggestions - I can name[2] more than one AI company that has at least made a serious effort at each of the things I discuss below (beyond what it would do if everyone at the company were singularly focused on making a profit).[3]

I’ll cover:

• Some basics: alignment research, strong security, safety standards
• Avoiding hype and acceleration
• Preparing for difficult decisions ahead
• Succeeding
• Some things I’m less excited about

I previously laid out a summary of how I see the major risks of advanced AI, and four key things I think can help (alignment research; strong security; standards and monitoring; successful, careful AI projects). I won’t repeat that summary now, but it might be helpful for orienting you if you don’t remember the rest of this series too well; click here to read it.

Some basics: alignment research, strong security, safety standards

First off, AI companies can contribute to the “things that can help” I listed above:

The challenge of securing dangerous AI

A key dynamic here: the world is likely to contain both cautious actors (those who take misalignment risk seriously) and incautious actors (those who are focused on deploying AI for their own gain, and aren't thinking much about the dangers to the whole world). Ideally, cautious actors would collectively have more powerful AI systems than incautious actors, so they could take their time doing alignment research and other things to try to make the situation safer for everyone.

But if incautious actors can steal an AI from cautious actors and rush forward to deploy it for their own gain, then the situation looks a lot bleaker. And unfortunately, it could be hard to protect against this outcome.

It's generally extremely difficult to protect data and code against a well-resourced cyberwarfare/espionage effort. An AI’s “weights” (you can think of this sort of like its source code, though not exactly) are potentially very dangerous on their own, and hard to get extreme security for. Achieving enough cybersecurity could require measures, and preparations, well beyond what one would normally aim for in a commercial context.

How standards might be established and become national or international

I previously laid out a possible vision on this front, which I’ll give a slightly modified version of here:

Avoiding hype and acceleration

It seems good for AI companies to avoid unnecessary hype and acceleration of AI.

I’ve argued that we’re not ready for transformative AI, and I generally tend to think that we’d all be better off if the world took longer to develop transformative AI. That’s because:

By default, I generally think: “The fewer flashy demos and breakthrough papers a lab is putting out, the better.” This can involve tricky tradeoffs in practice (since AI companies generally want to be successful at recruiting, fundraising, etc.).

A couple of potential counterarguments, and replies:

First, some people think it's now "too late" to avoid hype and acceleration, given the amount of hype and investment AI is getting at the moment. I disagree. It's easy to forget, in the middle of a media cycle, how quickly people can forget about things and move on to the next story once the bombs stop dropping. And there are plenty of bombs that still haven't dropped (many things AIs still can't do), and the level of investment in AI has tons of room to go up from here.

Second, I’ve sometimes seen arguments that hype is good because it helps society at large understand what’s coming. But unfortunately, as I wrote previously, I'm worried that hype gives people a skewed picture.

I also am generally skeptical that there's much hope of society adapting to risks as they happen, given the explosive pace of change that I expect once we get powerful enough AI systems.

I discuss some more arguments on this point in a footnote.[4]

I don’t think it’s clear-cut that hype and acceleration are bad, but it’s my best guess.

Preparing for difficult decisions ahead

I’ve argued that AI companies might need to do “out-of-the-ordinary” things that don’t go with normal commercial incentives.

Today, AI companies can be building a foundation for being able to do “out-of-the-ordinary” things in the future. A few examples of how they might do so:

Public-benefit-oriented governance. I think typical governance structures could be a problem in the future. For example, a standard corporation could be sued for not deploying AI that poses a risk of global catastrophe - if this means a sacrifice for its bottom line.

I’m excited about AI companies that are investing heavily in setting up governance structures - and investing in executives and board members - capable of making the hard calls well. For example:

It could pay off in lots of ways to make sure the final calls at a company are made by people focused on getting a good outcome for humanity (and legally free to focus this way).

Gaming out the future. I think it’s not too early for AI companies to be discussing how they would handle various high-stakes situations.

Establishing and getting practice with processes for particularly hard decisions. Should the company publish its latest research breakthrough? Should it put out a product that might lead to more hype and acceleration? What safety researchers should get access to its models, and how much access?

AI companies face questions like this pretty regularly today, and I think it’s worth putting processes in place to consider the implications for the world as a whole (not just for the company’s bottom line). This could include assembling advisory boards, internal task forces, etc.

Managing employee and investor expectations. At some point, an AI company might want to make “out of the ordinary” moves that are good for the world but bad for the bottom line. E.g., choosing not to deploy AIs that could be very dangerous or very profitable.

I wouldn’t want to be trying to run a company in this situation with lots of angry employees and investors asking about the value of their equity shares! It’s also important to minimize the risk of employees and/or investors leaking sensitive and potentially dangerous information.

AI companies can prepare for this kind of situation by doing things like:

Internal and external commitments. AI companies can make public and/or internal statements about how they would handle various tough situations, e.g. how they would determine when it’s too dangerous to keep building more powerful models.

I think these commitments should generally be non-binding (it’s hard to predict the future in enough detail to make binding ones). But in a future where maximizing profit conflicts with doing the right thing for humanity, a previously-made commitment could make it more likely that the company does the right thing.

Succeeding

I’ve emphasized how helpful successful, careful AI projects could be. So far, this piece has mostly talked about the “careful” side of things - how to do things that a “normal” AI company (focused only on commercial success) wouldn’t, in order to reduce risks. But it’s also important to succeed at fundraising, recruiting, and generally staying relevant (e.g., capable of building cutting-edge AI systems).

I don’t emphasize this or write about it as much because I think it’s the sort of thing AI companies are likely to be focused on by default, and because I don’t have special insight into how to succeed as an AI company. But it’s important, and it means that AI companies need to walk a sort of tightrope - constantly making tradeoffs between success and caution.

Some things I’m less excited about

I think it’s also worth listing a few things that some AI companies present as important societal-benefit measures, but which I’m a bit more skeptical are crucial for reducing the risks I’ve focused on.

When an AI company presents some decision as being for the benefit of humanity, I often ask myself, “Could this same decision be justified by just wanting to commercialize successfully?”

For example, making AI models “safe” in the sense that they usually behave as users intend (including things like refraining from toxic language, chaotic behavior, etc.) can be important for commercial viability, but isn’t necessarily good enough for the risks I worry about.

Footnotes


  1. Disclosure: my wife works at one such company (Anthropic) and used to work at another (OpenAI), and has equity in both. 

  2. Though I won’t, because I decided I don’t want to get into a thing about whom I did and didn’t link to. Feel free to give real-world examples in the comments! 

  3. Now, AI companies could sometimes be doing “responsible” or “safety-oriented” things in order to get good PR, recruit employees, make existing employees happy, etc. In this sense, the actions could be ultimately profit-motivated. But that would still mean there are enough people who care about reducing AI risk that actions like these have PR benefits, recruiting benefits, etc. That’s a big deal! And it suggests that if concern about AI risks (and understanding of how to reduce them) were more widespread, AI companies might do more good things and fewer dangerous things. 

  4. You could argue that it would be better for the world to develop extremely powerful AI systems sooner, for reasons including:

    • You might be pretty happy with the global balance of power between countries today, and be worried that it’ll get worse in the future. The latter could lead to a situation where the “wrong” government leads the way on transformative AI.
    • You might think that the later we develop transformative AI, the more quickly everything will play out, because there will be more computing resources available in the world. E.g., if we develop extremely powerful systems tomorrow, there would only be so many copies we could run at once, whereas if we develop equally powerful systems in 50 years, it might be a lot easier for lots of people to run lots of copies. (More: Hardware Overhang)

    A key reason I believe it’s best to avoid acceleration at this time is that it seems plausible (at least 10% likely) that transformative AI will be developed extremely soon - as in within 10 years of today. My impression is that many people at major AI companies tend to agree with this. I think this is a very scary possibility, and if this is the case, the arguments I give in the main text seem particularly important (e.g., many key interventions seem to be in a pretty embryonic state, and awareness of key risks seems low).

    A related case one could make for acceleration is “It’s worth accelerating things on the whole to increase the probability that the particular company in question succeeds” (more here: the “competition” frame). I think this is a valid consideration, which is why I talk about tricky tradeoffs in the main text. 

  5. Note that my wife is a former employee of OpenAI, the company I link to there, and she owns equity in the company. 


Geoffrey Miller @ 2023-02-21T04:27 (+45)

Holden - thanks for this thoughtful and constructive piece.

However, I think a crucial strategy is missing here.

If we're serious that AI imposes existential risks on humanity, then the best thing that AI companies can do to help us survive this pivotal century is simple: Shut down their AI research. Do something else.  Act like they care about the fate of their kids and grandkids. 

AI research doesn't need to be shut down forever. Maybe just for the next few centuries, until we better understand the risks and how to manage them.

I simply don't understand why so many EAs are encouraging AI development as if it's too cool to question, too inevitable to challenge, and too incentivized to deter. Almost all of us agree that AI will impose potentially catastrophic risks. We all agree that AI alignment is far from solved, and many of us believe it probably won't be solved in time to save us from recklessly fast AI development.

We probably can't shut down AI research through government regulation or gentle coaxing, given the coordination problems, governance problems, arms races, and corporate incentives. But we could probably do it through promoting new social & ethical norms that impose a heavy moral stigma against AI research, AI researchers, and AI companies. Historically, intense moral stigmatization has been successful at handicapping, delaying, pausing, defunding, marginalizing, and/or shutting down many research fields. And moral stigmatization in the modern social media world can operate even more quickly, powerfully, globally, and effectively. (I'm working on a longer piece about this moral stigmatization strategy for reducing AI X-risk.)

In short: maybe it's time for EA to stop playing nice with the AI industry -- given that the AI industry is not playing safely with humanity's future.

And maybe it's time to call a spade a spade: if AI companies are pursuing AI capabilities at a rate that could end our species, without any credible safeguards that could protect our species, then they're evil. Maybe we should say they're evil,  treat them as evil, and encourage others to do the same, until they stop doing evil. 

Holden Karnofsky @ 2023-03-18T03:50 (+6)

If I saw a path to slowing down or stopping AI development, reliably and worldwide, I think it’d be worth considering.

But I don’t think advising particular AI companies to essentially shut down (or radically change their mission) is a promising step toward that goal.

And I think partial progress toward that goal is worse than none, if it slows down relatively caution-oriented players without slowing down others.

Miguel @ 2023-02-24T12:46 (+4)

Hello Geoffrey,

A deeper problem here is market forces - investment is pouring into the industry, and it's just not going to stop, especially given how fast ChatGPT was adopted (100M users in 2 months). This is a big reason why AI companies will not stop: they have the support of economics to push the boundaries of AI. My hope is that AI safety guidelines are built into the first system that gets adopted by billions of people.

Thank you.

Geoffrey Miller @ 2023-02-24T16:05 (+3)

Miguel -- the market forces are strong, but they can be overridden by moral stigmatization and moral disgust.

If it becomes morally taboo to invest in AI companies, to work in AI research, to promote AI development, or to vote for pro-AI politicians, then AI research will be handicapped. Just as many other areas of research and development have been handicapped by moral taboos over the last century. 

Greed is a strong emotion driving AI investment. But moral disgust can be an even stronger emotion that could reduce AI investment.

Miguel @ 2023-02-24T16:17 (+1)

Greed is one thing - a universal human problem. I would say a big chunk of the industry is greedy, but there are also those who seek to adapt and are just trying to help build it properly. People in alignment research are probably in that category, though I'm not sure what the moral standards are for them.

Weighing in on moral disgust: my analysis is that it's possible to push this concept, but I believe the general public won't gravitate to it - most will choose the technology camp, since those who defend AI will argue that it will "make things easier", which is an easier idea to sell.

CuriousEA @ 2023-02-20T19:39 (+18)

Given ChatGPT and OpenAI, what is your assessment of your and Open Phil's impact on AI safety, in relation to the involvement with OpenAI?

Kei @ 2023-02-21T00:55 (+13)

He talks about it here: https://www.dwarkeshpatel.com/p/holden-karnofsky#details (Ctrl+F OpenAI)

CuriousEA @ 2023-02-21T13:11 (+1)

Thanks.