Thoughts on the AI Safety Summit company policy requests and responses

By So8res @ 2023-10-31T23:54 (+42)

This is a crosspost from LessWrong; the full post body can be read there.

vaniver @ 2023-11-01T00:37 (+12)

(I'm Matthew Gray)

Inflection is a late addition to the list, so Matt and I won’t be reviewing their AI Safety Policy here.

My sense from reading Inflection's response now is that they say the right things about red teaming and security and so on, but I am pretty worried about their basic plan; they don't seem to be grappling at all with the risks specific to their approach. Quoting from them in two different sections:

Inflection’s mission is to build a personal artificial intelligence (AI) for everyone. That means an AI that is a trusted partner: an advisor, companion, teacher, coach, and assistant rolled into one.

Internally, Inflection believes that personal AIs can serve as empathetic companions that help people grow intellectually and emotionally over a period of years or even decades. **Doing this well requires an understanding of the opportunities and risks that is grounded in long-standing research in the fields of psychology and sociology.** We are presently building our internal research team on these issues, and will be releasing our research on these topics as we enter 2024.

I think AIs thinking specifically about human psychology--and how to convince people to change their thoughts and behaviors--are very dual use (i.e. can be used for both positive and negative ends) and at high risk for evading oversight and going rogue. The potential for deceptive alignment seems quite high, and if Inflection is planning on doing any research on those risks or mitigation efforts specific to that, it doesn't seem to have shown up in their response.

I don't think this type of AI is very useful for closing the acute risk window, and so probably shouldn't be made until much later.

SummaryBot @ 2023-11-01T12:40 (+1)

Executive summary: The post provides thoughts on AI safety policies requested from AI labs by the UK government. It argues the policies are inadequate but some labs like Anthropic and OpenAI are relatively better. It suggests alternative priorities like compute limits, risk assessments, and contingency planning.

Key points:

  1. The UK government's policy categories seem reasonable but miss key issues like independent risk assessments and contingency planning.
  2. Current AI systems pose unacceptable risks, and progress should halt until those risks are addressed; still, requesting these policies at least prompts labs to acknowledge the risks.
  3. Anthropic and OpenAI's policies seem best, taking risks more seriously. DeepMind's is much worse. Meta's is far worse.
  4. Governments should also institute compute limits, monitor chips, halt chip progress, require risk assessments, and develop contingency plans.
  5. Independent risk assessments from actuaries could help determine which labs can continue operating.
  6. If risks appear unaddressable before wide availability, governments need a plan for that scenario now.


This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.

Oliver Sourbut @ 2023-11-01T08:49 (+1)

Another high(er?) priority for governments: