Project proposal: Scenario analysis group for AI safety strategy

By Buhl @ 2023-12-18T18:31 (+35)

This is a linkpost to https://rethinkpriorities.org/longtermism-research-notes/project-proposal-scenario-analysis-group

This is a blog post, not a research report, meaning it was produced quickly and does not reflect Rethink Priorities’ typical standards of substantiveness and careful checking for accuracy.

Summary

This post contains a proposal for a scenario analysis group for AI safety strategy. 

I originally wrote this proposal as a suggestion for a project that we (the Existential Security Team at Rethink Priorities) might try to get off the ground as part of our incubation work. Overall, I’m moderately excited about it, though I have some reservations and wouldn’t strongly recommend it (see some arguments against here). I’m also quite excited about adjacent alternative versions, including a version targeting governments and a version focusing on either forecasting crises or building up reserve capacity for crises. (more)

If you’re considering working on this proposal or something similar, feel free to reach out to me at {marie at rethinkpriorities dot org} and I can potentially share additional thoughts. I wrote the proposal a while back and haven’t had time to update it much since, so I’ll probably have updated takes.

A few introductory caveats

What’s the impact story?

What would the team do?

The team I have in mind would focus on the following two activities (at least initially):

  1. Scenario analysis: Flesh out a high-priority scenario, analyze what the optimal response from organizations focused on AI existential risk (AIXR) would look like, and gather information on how these organizations would in fact respond. Compare different scenarios to identify commonly occurring gaps and proactively share recommendations with key actors (e.g., investigate research question x, set up initiative y, support policy proposal z).
  2. Simulation exercises: Create simple simulation exercises and invite either teams from individual organizations or leadership from multiple organizations to participate, with the aim of (a) gathering information to use in scenario planning, (b) raising awareness of high-priority scenarios and gaps, (c) stress testing teams’ and leaders’ ability to work under pressure, and (d) increasing familiarity and trust between different teams. 

It may be preferable to only do scenario analysis, at least to begin with, given the substantial time cost of creating simulation exercises and having many people in important roles spend time playing them. But I do think simulation exercises add unique value, especially in terms of (c).

Longer-term, as the team gains expertise and trust, I’d be interested to see it experiment with other and more ambitious crisis preparedness measures, such as managing resources reserved for crises (e.g., a regranting pool or a reservist workforce) or setting up inter-org standard operating procedures (SOPs) for various crises (e.g., robust communication channels, high-level coordinated response plans).

Which scenarios would the team focus on?

I imagine the team would conduct its own analysis of which scenarios are highest-impact to analyze, but here are some examples of scenarios that seem high-priority to me: 

What could a pilot look like?

I think this project (like most projects) should likely begin with a pilot to gain more information about how valuable it seems and about the team’s fit for doing this kind of work.

A simple pilot could be to spend ~1 month analyzing and writing a report on one scenario that seems particularly high-priority to the research team. I think this would give the team a decent indication of whether they’re on track to generate useful insights, based both on their own sense and on stakeholder feedback. Running a pilot like this would be fairly low-cost (e.g., assuming two people are involved, it would cost about two FTE-months of time and ~$16,000 in salaries). 
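As a rough sanity check on that cost figure (my own back-of-the-envelope arithmetic, not from the original proposal, assuming a fully loaded salary cost of roughly $8,000 per person per month):

$$2 \text{ people} \times 1 \text{ month} \times \$8{,}000 \text{ per person-month} = \$16{,}000$$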

For a rough outline of what the first year of operations could look like, see this doc.

Some arguments against this proposal

  1. People who could run this project really well may well have better options. The best people to run this project would probably have great strategic thinking, knowledge about AI governance and safety, networks, and ideally experience with scenario analysis, e.g. from government. People with these skills would probably have great opportunities to do more direct AI governance work that might be higher-impact.
  2. Maybe it’s not very tractable to discover important, well-justified new insights given that a fair amount of work has already gone into this kind of strategic thinking.
  3. It seems difficult to convene and get input from some key groups, including subject matter experts (who might be important for analyzing scenarios well) and leaders in the AIXR community (whose buy-in is important for the analysis to make a difference). 
  4. There are some possible downside risks, such as the leak of infohazardous or sensitive information, net negative recommendations getting implemented, or taking away focus from more important / more direct AIXR reduction work or from other similar coordination mechanisms. But I think most versions of this project are fairly low-risk. 

Ideal team

Characteristics that seem especially useful to have on the team (though if you’re interested, it seems worth considering this project even if you only have some of these, or only to some degree):

Appendix 1: Alternatives I’ve considered and why I ended up favoring this version instead

These are brief descriptions of other versions of this project I considered and a rough indication of why I didn’t end up favoring them over the mainline version described above. 

Note that these reasons are obviously very incomplete and that my judgements here are far from confident. The reasons are not meant to indicate that these projects aren’t worth working on – indeed, I’m excited about many of these alternative projects (in some cases roughly as excited as I am about the main idea). I included this section anyway for reasoning transparency purposes; feel free to ask for elaboration on any of these projects/reasons in the comments.

Target audience

  1. Government-facing: Advocate for governments (esp. USG) to prepare for AI crises via standard tools like simulation exercises, developing SOPs, and crisis management planning. 
    1. Main reasons for not pursuing this: Governments might not yet have enough levers to influence AI crises to develop good SOPs + at least in the short term, other policy asks seem higher-priority and advocating for crisis planning might distract from those + as far as I can tell, there are more existing projects in this space (see below). 
    2. I’m still quite excited about this, though.
  2. Lab-facing: Advocate for AI labs to make plans to effectively deal with crises such as discovering a dangerous capability or having released a dangerous model. 
    1. Main reasons for not pursuing this: Overlap with existing AI governance work on best practices for lab self-governance + difficult to influence AI labs.

Main activities

  1. Forecasting: Forecast and monitor specific risks (e.g. monitor compute clusters or other proxies for capabilities) to give early warning in case of a crisis.
    1. Main reasons for not pursuing this: Seems hard to specify signs of an AI crisis well enough to be able to monitor them effectively. 
    2. I’m still quite excited about this, though.
  2. Reserve capacity: Build up a team of “reservists” to be called upon in crises (or potentially just a pool of funding to be regranted in a crisis). 
    1. Main reasons for not pursuing this: Anecdotally, some projects like this have struggled to get off the ground in the past + you might need to establish a track record to build trust (e.g., via activities like the one in the main proposal) before having enough trust to do this + difficult to establish appropriate crisis triggers + some momentum behind this already (see below). 
    2. I’m still quite excited about this, though.
  3. Planning: Make playbooks of crisis response options for different crises.
    1. Main reasons for not pursuing this: Difficult to predict crises and actions well enough in advance for this to be useful + difficult to influence key actors to use the plans.
  4. Coordination / crisis response infrastructure: Set up coordination mechanisms, crisis decision-making structures, and high-level crisis management plans and secure buy-in from key stakeholders (probably mainly within the AIXR community).
    1. Main reasons for not pursuing this: Seems very difficult to get people to sign up to this kind of coordination. 
  5. Continuity: Ensuring that key organizations can continue operating during crises, e.g. by having good cybersecurity, robust communication channels, etc.
    1. Main reasons for not pursuing this: I didn’t think much about this one.

Other axes of variation

Appendix 2: Existing crisis planning-related projects I’m aware of

Below is a list of existing related projects I’m aware of, with very rough descriptions. I’m probably missing some projects (and possibly also important information about some of them)!

Current and upcoming projects:

Past projects:

Acknowledgements

This research is a project of Rethink Priorities. It was written by Marie Davidsen Buhl. Thanks to Akash Wasil, Alex Demarsh, Ama Dragomir, Caroline Jeanmaire, Chana Messinger, Daniel Kokotajlo, Gavin Leech, Gregg Jones, Joe O’Brien, JueYan Zhang, Nuño Sempere, Ross Gruetzemacher and Sammy Martin for helpful input and discussions that informed this proposal (though they don’t necessarily endorse or agree with any points made in this post). Thanks to my manager Ben Snodin for providing helpful feedback on this post. If you like our work, please consider subscribing to our newsletter. You can explore our completed public work here.