Big Picture AI Safety: Introduction

By EuanMcLean @ 2024-05-23T11:28 (+32)

tldr: I conducted 17 semi-structured interviews with AI safety experts about their big-picture strategic view of the AI safety landscape: how human-level AI will play out, how things might go wrong, and what the AI safety community should be doing. While many respondents held “traditional” views (e.g. that the main threat is misaligned AI takeover), there was more opposition to these standard views than I expected, and the field seems more split on many important questions than someone outside it might infer.

What do AI safety experts believe about the big picture of AI risk? How might things go wrong, what should we do about it, and how have we done so far? Does everybody in AI safety agree on the fundamentals? Which views are consensus, which are contested, and which are fringe? Maybe we could learn this from the literature (as in the MTAIR project), but many ideas and opinions are not written down anywhere; they exist only in people’s heads and in lunchtime conversations at AI labs and coworking spaces.

I set out to learn what the AI safety community believes about the strategic landscape of AI safety. I conducted 17 semi-structured interviews with a range of AI safety experts. I avoided going into the details of particular technical concepts or philosophical arguments, focussing instead on how such concepts and arguments fit into the big picture of what AI safety is trying to achieve.

This work is similar to the AI Impacts surveys, Vael Gates’ AI Risk Discussions, and Rob Bensinger’s existential risk from AI survey. It differs from those projects in that both my interview approach and my analysis are more qualitative. Part of the hope for this project was that it could capture harder-to-quantify ideas that are too ill-defined or intuition-based to fit the format of previous survey work.

Questions

I asked the participants a standardized list of questions.

These questions changed gradually as the interviews went on (based on feedback from participants), and I didn’t always ask them exactly as I’ve presented them here. I asked participants to answer from their internal model of the world as much as possible and to avoid deferring to the opinions of others (their “inside view”, so to speak).

Participants

These interviews were conducted between March 2023 and February 2024 and represent the participants’ views at the time.

A very brief summary of what people said

What will happen?

Many respondents expected the first human-level AI (HLAI) to be in the same paradigm as current large language models (LLMs) like GPT-4: probably scaled up (made bigger), with some new tweaks and hacks, and with scaffolding like AutoGPT to make it agentic. A smaller number of people predicted that larger breakthroughs will be required before HLAI. The most common story of how AI could cause an existential disaster was unaligned AI takeover, but some explicitly pushed back on the assumptions behind the takeover story. Some took a more structural view of AI risk, emphasizing threats like instability, extreme inequality, gradual human disempowerment, and a collapse of human institutions.

What should we do about it?

When asked how AI safety might prevent disaster, respondents focussed most on technical safety solutions, spreading a safety mindset among AI developers, sensible regulation, and building a science of AI.

The research directions people were most excited about were mechanistic interpretability, black box evaluations, and governance research.

What mistakes have been made?

Participants pointed to a range of mistakes they thought the AI safety movement had made. There was no consensus, and the focus was quite different from person to person. The most common themes included an overreliance on theoretical arguments, insularity, the pushing of fringe views, enabling race dynamics, a lack of independent thinking, misguided advocacy for an AI pause, and the neglect of policy.

Limitations

The interviews had some limitations, including potential selection bias in who agreed to participate and a lack of input from some key organizations.

Subsequent posts

In the following three posts, I present a condensed summary of my findings, describing the main themes that came up for each question:

  1. What will happen? What will human-level AI look like, and how might things go wrong?
  2. What should we do? What should AI safety be trying to achieve and how?
  3. What mistakes has the AI safety movement made?

You don’t need to have read an earlier post to understand a later one, so feel free to zoom straight in on what interests you.

I am very grateful to all of the participants for offering their time to this project. Also thanks to Vael Gates, Siao Si Looi, ChengCheng Tan, Adam Gleave, Quintin Davis, George Anadiotis, Leo Richter, McKenna Fitzgerald, Charlie Griffin and many of the participants for feedback on early drafts.

This work was funded and supported by FAR AI.


WillPearson @ 2024-05-23T22:48 (+2)

Does anyone have recommendations for people I should be following for structural AI risk discussion, and for the possible implications of post-current-deep-learning AGI systems?

Mo Putera @ 2024-05-24T10:52 (+3)

For structural AI risk, maybe start from Allan Dafoe's writings (e.g. this, or this with Remco Zwetsloot) and follow the links to cited authors? Also Sam Clarke (here) and Justin Olive (here).

SummaryBot @ 2024-05-23T15:42 (+1)

Executive summary: Interviews with 17 AI safety experts reveal a diversity of views on key questions about the future of AI and AI safety, with some dissent from standard narratives.

Key points:

  1. Respondents expect the first human-level AI to resemble scaled-up language models with additional capabilities, but some believe major breakthroughs are still needed.
  2. The standard unaligned AI takeover scenario was the most common existential risk story, but some pushed back on its assumptions. Alternative risks included instability, inequality, gradual disempowerment, and institutional collapse.
  3. Key priorities for AI safety included technical solutions, spreading safety mindsets, sensible regulation, and building AI science. Promising research directions were mechanistic interpretability, black box evaluations, and governance.
  4. Perceived mistakes by the AI safety community included overreliance on theoretical arguments, insularity, pushing fringe views, enabling race dynamics, lack of independent thought, misguided advocacy for AI pause, and neglecting policy.
  5. The interviews had some limitations, including potential selection bias and lack of input from certain key organizations.


This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.