Project proposal: Scenario analysis group for AI safety strategy

By Buhl @ 2023-12-18T18:31 (+35)

This is a linkpost to https://rethinkpriorities.org/longtermism-research-notes/project-proposal-scenario-analysis-group

This is a blog post, not a research report, meaning it was produced quickly and is not to Rethink Priorities’ typical standards of substantiveness and careful checking for accuracy.

Summary

This post contains a proposal for a scenario analysis group for AI safety strategy.

The group would identify important scenarios (in the form of changes to the strategic landscape for AI safety and governance that might happen over the next ~5 years), conduct scenario analysis to identify actions to take now to prepare for such scenarios, and share recommendations with organizations aiming to reduce existential risk from AI. (more)
I suspect this kind of analysis could be an underprovided public good that would help ensure that the community has a strategy that’s robust to future changes to the strategic landscape. (more)
I’d be excited for individuals / groups / existing teams in the AI existential risk space to seriously consider working on this proposal, if they feel excited about the area and have an appropriate background. (more)

I originally wrote this proposal as a suggestion for a project that we (the Existential Security Team at Rethink Priorities) might try to get off the ground as part of our incubation work. Overall, I’m moderately excited about it, though I have some reservations and wouldn’t strongly recommend it (see some arguments against here). I’m also quite excited about adjacent alternative versions, including a version targeting governments and a version focusing on either forecasting crises or building up reserve capacity for crises. (more)

If you’re considering working on this proposal or something similar, feel free to reach out to me at {marie at rethinkpriorities dot org} and I can potentially share additional thoughts. I wrote the proposal a while back and haven’t had time to update it much since, so I’ll probably have updated takes.

A few introductory caveats

The proposal is based on limited research (perhaps ~20 hours) by a generalist (me) without a background in AI strategy / scenario analysis / crisis management, and the evidence base is mainly “thinking and talking to people who’ve thought more about this”. So my views of this project area should not be taken as definitive – the post should just be considered a starting point. Also the ideas in this post are mostly not my own and the proposal is a synthesis of project ideas others have shared with me.
The fact that we’re publishing just this project proposal doesn’t mean that we consider it the most promising of the projects we’ve thought of. We just happened to have a relatively finished proposal that seemed worth sharing. (See some of the other projects we’ve thought about here.)
I’m not necessarily proposing setting up a new organization; I think this project could easily be run within an existing organization (and there are likely benefits to doing so).

What’s the impact story?

The community of people focused on addressing catastrophic and existential risks from AI (from now on “the AIXR community”) is in a unique position to influence how well the world navigates AI risk, due to having foresight of (some of) the specific risks to come and potentially having levers to influence key actors.
The strategic landscape with respect to AI risk has changed rapidly in the past few years – with rapidly improving model capabilities and increased public attention after the release of ChatGPT – and will likely continue to do so in the coming years.
Given this, the AIXR community needs to ensure that its plans are robust to potential future strategic changes and that it is prepared to take advantage of the opportunities offered by those changes.
It’s possible to, ahead of time, identify levers, resources, areas of expertise, and key uncertainties that will be important in a wide range of (or in particularly high-priority) potential future scenarios, and doing so will allow investment in these ahead of time.
This kind of analysis is likely to be an undersupplied public good, since it benefits the community as a whole more than it benefits any individual organization, giving individual organizations insufficient incentive to invest in it.
Having a third party team dedicated to this would help ensure that there’s a clear owner for this kind of work and that there’s sufficient coordination between different actors. I’m sure some degree of scenario analysis is already informing thinking in places that have high leverage over the strategy of the AIXR community (e.g., OpenPhil), though I don’t have a great sense of the extent. But I’m not aware of any projects explicitly focused on this (list of projects I’m aware of here).

What would the team do?

The team I have in mind would focus on the following two activities (at least initially):

Scenario analysis: Flesh out a high-priority scenario, analyze what the optimal response from AIXR-focused organizations would look like, and gather information on how these organizations would in fact respond. Compare different scenarios to identify commonly occurring gaps and proactively share recommendations with key actors (e.g., research question x, set up initiative y, support policy proposal z).
Simulation exercises: Create simple simulation exercises and invite either teams from individual organizations or leadership from multiple organizations to participate, with the aim to (a) gather info to use in scenario planning, (b) raise awareness about high-priority scenarios and gaps, (c) stress test teams’ and leaders’ ability to work under pressure, and (d) increase familiarity and trust between different teams.

It may be preferable to only do simulation exercises, at least to begin with, given the substantial time cost of creating simulation exercises and having many people in important roles spend time playing them. But I do think simulation exercises add unique value, especially in terms (c).

Longer-term, as the team gains expertise and trust, I’d be interested to see it experiment with other and more ambitious crisis preparedness measures, such as managing resources reserved for crises (e.g., a regranting pool or a reservist workforce) or setting up inter-org standard operating procedures (SOPs) for various crises (e.g., robust communication channels, high-level coordinated response plans).

Which scenarios would the team focus on?

I imagine the team would conduct its own analysis of which scenarios are highest-impact to analyze, but here are some examples of scenarios that seem high-priority to me:

A warning shot drastically shifts the Overton window of political action (the type of warning shot (e.g. misuse-based cyberattack, alignment scare, rapid job loss) probably matters a lot).
Strong evidence emerges that human-level AGI will be developed within the next year.
USG becomes unable to seriously regulate frontier AI for the foreseeable future (e.g. because of a scandal, a change to an administration hostile to AI safety, or polarization).
A new country (e.g. Saudi Arabia) rapidly advances towards frontier AI capabilities.
Nationalization of or significant government involvement in AI development.
US-China war breaks out.

What could a pilot look like?

I think this project (as most projects) should likely begin with a pilot to gain more information about how valuable this project seems and about the team’s fit for doing this kind of work.

A simple pilot could be to spend ~1 month analyzing and writing a report on one scenario that seems particularly high-priority to the research team. I think this would give the team a decent indication of whether they’re on track to learn useful insights, both based on their own sense and stakeholder feedback. Running a pilot like this would be fairly low-cost (e.g., assuming two people are involved, it would cost 2 months’ FTE in time and ~$16,000 in salaries).

For a rough outline of what the first year of operations could look like, see this doc.

Some arguments against this proposal

People who could run this project really well may well have better options. The best people to run this project would probably have great strategic thinking, knowledge about AI governance and safety, networks, and ideally experience with scenario analysis, e.g. from government. People with these skills would probably have great opportunities to do more direct AI governance work that might be higher-impact.
Maybe it’s not very tractable to discover important, well-justified new insights given that a fair amount of work has already gone into this kind of strategic thinking.
It seems difficult to convene and get input from some key groups, including subject matter experts (who might be important for analyzing scenarios well) and leaders in the AIXR community (whose buy-in is important to the analysis making a difference).
There are some possible downside risks, such as the leak of infohazardous or sensitive information, net negative recommendations getting implemented, or taking away focus from more important / more direct AIXR reduction work or from other similar coordination mechanisms. But I think most versions of this project are fairly low-risk.

Ideal team

Characteristics that it seems especially useful to have on the team (though if you’re interested, it seems worth considering this project even if you only have some of these / to some degree):

Experience with scenario analysis and wargaming (e.g., from government)
Being trusted, well-networked, and considered relatively neutral in the AIXR space
Extensive knowledge of AI safety and governance
Great strategic thinking

Appendix 1: Alternatives I’ve considered and why I ended up favoring this version instead

These are brief descriptions of other versions of this project I considered and a rough indication of why I didn’t end up favoring them over the mainline version described above.

Note that these reasons are obviously very incomplete and that my judgements here are far from confident. The reasons are not meant to indicate that these projects aren’t worth working on – indeed, I’m excited about many of these alternative projects (in some cases ~as excited about the main idea). I included this section anyway for reasoning transparency purposes; feel free to ask for elaboration on any of these projects/reasons in the comments.

Target audience

Government-facing: Advocate for governments (esp. USG) to prepare for AI crises via standard tools like simulation exercises, developing SOPs, and crisis management planning.
1. Main reasons for not pursuing this: Governments might not yet have enough levers to influence AI crises to develop good SOPs + at least in the short term, other policy asks seem higher-priority and advocating for crisis planning might distract from those + as far as I can tell, there’s more existing projects in this space (see below).
2. I’m still quite excited about this, though.
Lab-facing: Advocate for AI labs to make plans to effectively deal with crises such as discovering a dangerous capability or having released a dangerous model.
1. Main reasons for not pursuing this: Overlap with existing AI governance work on best practices for lab self-governance + difficult to influence AI labs.

Main activities

Forecasting: Forecast and monitor specific risks (e.g. monitor compute clusters or other proxies for capabilities) to give early warning in case of a crisis.
1. Main reasons for not pursuing this: Seems hard to specify signs of an AI crisis well enough to be able to monitor them effectively.
2. I’m still quite excited about this, though.
Reserve capacity: Build up a team of “reservists” to be called upon in crises (or potentially just a pool of funding to be regranted in a crisis).
1. Main reasons for not pursuing this: Anecdotally, some projects like this have struggled to get off the ground in the past + you might need to establish a track record to build trust (e.g., via activities like the one in the main proposal) before having enough trust to do this + difficult to establish appropriate crisis triggers + some momentum behind this already (see below).
2. I’m still quite excited about this, though.
Planning: Make playbooks of crisis response options for different crises.
1. Main reasons for not pursuing this: Difficult to predict crises and actions well enough in advance for this to be useful + difficult to influence key actors to use the plans.
Coordination / crisis response infrastructure: Set up coordination mechanisms, crisis decision-making structures, and high-level crisis management plans and secure buy-in from key stakeholders (probably mainly within the AIXR community).
1. Main reasons for not pursuing this: Seems very difficult to get people to sign up to this kind of coordination.
Continuity: Securing that key organizations can continue operating during crises, e.g. having good cybersecurity, robust communication channels, etc.
1. Main reasons for not pursuing this: I didn’t think much about this one.

Other axes of variation

Focus on crises (one-off events that need a rapid response) vs. crunch time (a period before a major event, e.g. the development of human-level AGI) vs. broad strategic landscape changes (e.g. great power conflict).
Focus on individual organizations vs. inter-organization coordination.
Worldview considerations, e.g., AI timelines, main risks considered.

Appendix 2: Existing crisis planning-related projects I’m aware of

Below is a list of existing related projects I’m aware of and very rough descriptions. I’m probably missing some projects (and possibly also important information about some of them)!

Current and upcoming projects:

Transformative Futures Institute – planning to do AI scenario analysis and foresight aimed at governments (but with broader relevance).
Intelligence Rising – conducts AIXR-themed tabletop games for various actors, including governments, with the primary aim of raising awareness about key dynamics and risks.
Dendritic – planning to do early warning forecasting, infrastructure hardening, and surge capacity in the GCBR space.
OpenAI’s Frontier Risk and Preparedness team – seems like they’re aiming to map misuse and accident threat models and come up with preparedness and response plans.
The Long Game Project – resources for running tabletop exercises; I don’t know much about them.
The Treacherous Turn – AI takeover role playing game; I don’t know much about them.
Global Shield – advocating for governments to put in place policies to manage global catastrophic risks.
Nuño Sempere – at the time of writing, Nuño is fundraising to run an emergency response team à la ALERT (see below).
Non-AIXR-focused wargaming and crisis management plans within governments.
Various people in the AIXR community, especially in leadership positions, generally thinking about community strategy (not sure how much of this is going on).

Past projects:

ALERT – pitch to form a team of “reservists” to be activated in a major crisis (any neglected area); founder search happened in 2022, but the project never got off the ground; full funding amount passed on to Dendritic.
CEA community health team – has done some research on potential measures to increase crisis preparedness within the EA ecosystem.

Acknowledgements

This research is a project of Rethink Priorities. It was written by Marie Davidsen Buhl. Thanks to Akash Wasil, Alex Demarsh, Ama Dragomir, Caroline Jeanmaire, Chana Messinger, Daniel Kokotajlo, Gavin Leech, Gregg Jones, Joe O’Brien, JueYan Zhang, Nuño Sempere, Ross Gruetzemacher and Sammy Martin for helpful input and discussions that informed this proposal (though they don’t necessarily endorse or agree with any points made in this post). Thanks to my manager Ben Snodin for providing helpful feedback on this post. If you like our work, please consider subscribing to our newsletter. You can explore our completed public work here.