GovAI: Towards best practices in AGI safety and governance: A survey of expert opinion

By Zach Stein-Perlman @ 2023-05-15T01:42 (+68)

Lizka @ 2023-05-26T11:03 (+4)

I'm surprised at how much agreement there is about the top ideas. The following ideas all got >70% "strongly agree" and at most 3% "strongly disagree" (note that not everyone answered every question, although most of these 14 items did get all 51 responses):

  1. Pre-deployment risk assessment. AGI labs should take extensive measures to identify, analyze, and evaluate risks from powerful models before deploying them.
  2. Dangerous capability evaluations. AGI labs should run evaluations to assess their models’ dangerous capabilities (e.g. misuse potential, ability to manipulate, and power-seeking behavior).
  3. Third-party model audits. AGI labs should commission third-party model audits before deploying powerful models.
  4. Safety restrictions. AGI labs should establish appropriate safety restrictions for powerful models after deployment (e.g. restrictions on who can use the model, how they can use the model, and whether the model can access the internet).
  5. Red teaming. AGI labs should commission external red teams before deploying powerful models.
  6. Monitor systems and their uses. AGI labs should closely monitor deployed systems, including how they are used and what impact they have on society.
  7. Alignment techniques. AGI labs should implement state-of-the-art safety and alignment techniques.
  8. Security incident response plan. AGI labs should have a plan for how they respond to security incidents (e.g. cyberattacks).
  9. Post-deployment evaluations. AGI labs should continually evaluate models for dangerous capabilities after deployment, taking into account new information about the model’s capabilities and how it is being used.
  10. Report safety incidents. AGI labs should report accidents and near misses to appropriate state actors and other AGI labs (e.g. via an AI incident database).
  11. Safety vs capabilities. A significant fraction of employees of AGI labs should work on enhancing model safety and alignment rather than capabilities.
  12. Internal review before publication. Before publishing research, AGI labs should conduct an internal review to assess potential harms.
  13. Pre-training risk assessment. AGI labs should conduct a risk assessment before training powerful models.
  14. Emergency response plan. AGI labs should have and practice implementing an emergency response plan. This might include switching off systems, overriding their outputs, or restricting access.

The ideas that had the most disagreement seem to be: [the quoted statements didn't carry over in this crosspost]

 (Ideas copied from here — thanks!)

Lizka @ 2023-05-26T10:41 (+4)

In case someone finds it interesting, Jonas Schuett (one of the authors) shared a thread about this: https://twitter.com/jonasschuett/status/1658025266541654019?s=20 

He says that the thread is to discuss the survey's: [quoted content not preserved in this crosspost]

Also, there's a nice graphic from the paper in the thread.

Lizka @ 2023-05-26T11:10 (+2)

I also find the following chart interesting (although I think none of the differences is statistically significant), particularly the fact that "pausing training of dangerous models" and "security standards" got more agreement from people who aren't in AGI labs, and that, at a glance, [the statements highlighted in the chart, which didn't carry over in this crosspost] got more agreement from people who are in labs (in general, apparently "experts from AGI labs had higher average agreement with statements than respondents from academia or civil society").

Note that 43.9% of respondents (22 people?) are from AGI labs.
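(As a rough consistency check, assuming all 51 respondents reported an affiliation: $0.439 \times 51 \approx 22.4$, so 43.9% would indeed correspond to about 22 people.)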