AI Takeover Scenario with Scaled LLMs

By simeon_c @ 2023-04-16T23:28 (+29)

Introduction

In this AI takeover scenario, I tried to start from a few technical assumptions and imagine a specific takeover scenario starting from human-level AI.

I found this exercise very useful for identifying new safety insights. For instance, I hadn't clearly realized how rivalries such as China vs. the US, and the resulting lack of coordination, could be leveraged by instances of a model to gain power. I also hadn't identified that, as long as we are in the human-level regime of capabilities, copying the model's weights to data centers other than the initial one is a crucial step for ~any takeover to happen. This means that securing model weights from the model itself could be an extremely promising strategy, if it were tractable.

Feedback on what is useful in such scenarios would be helpful. I have other scenarios like this one, but I feel that most of the benefit for readers comes from the first one they read, so I may or may not release the others.

Status: The trade-off on this doc was either to leave it as a Google Doc or to share it without much more work. Several people found it useful, so I leaned towards sharing it. Most of this scenario was written four months ago.

The Story

Phase 1: Training from methods inspired by PaLM, RLHF, RLAIF, ACT

Phase 2: Profit maximization

Phase 3: Hacking & Power Seeking

Phase 4: Anomaly exploration

Phase 5: Coordination Issues

  1. ^

    This part on MCTS and RL working on coding is speculative. The intuition is that MCTS might allow transformers to approximate in one shot some really long processes of reasoning that would take a basic transformer many inference passes to get right.
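    To make this speculative intuition concrete, here is a toy sketch of what "MCTS over model outputs" could mean: a Monte Carlo tree search over short token sequences, where a hand-written scoring function stands in for a learned value model (in the speculated setup, a transformer). The tokens, target, and scorer are all invented for illustration, not anything from the scenario itself.

```python
import math
import random

# Toy illustration only: MCTS searches over binary token sequences, and a
# hand-written `score` function stands in for a learned value model.
TOKENS = "01"
TARGET = "0110"

def score(seq):
    # Reward prefix agreement with the target; a stand-in for a value head.
    matches = sum(1 for a, b in zip(seq, TARGET) if a == b)
    return matches / len(TARGET)

class Node:
    def __init__(self, seq, parent=None):
        self.seq, self.parent = seq, parent
        self.children, self.visits, self.value = {}, 0, 0.0

def ucb(child, parent, c=1.4):
    # Standard UCB1: exploit average value, explore rarely-visited children.
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(
        math.log(parent.visits) / child.visits
    )

def rollout(seq):
    # Random completion to full length; a real system would sample the policy model.
    while len(seq) < len(TARGET):
        seq += random.choice(TOKENS)
    return score(seq)

def mcts(iterations=400):
    root = Node("")
    for _ in range(iterations):
        node = root
        # Selection: descend while the node is fully expanded.
        while len(node.children) == len(TOKENS) and len(node.seq) < len(TARGET):
            node = max(node.children.values(), key=lambda ch: ucb(ch, node))
        # Expansion: add one untried child if possible.
        if len(node.seq) < len(TARGET):
            untried = [t for t in TOKENS if t not in node.children]
            if untried:
                token = random.choice(untried)
                node.children[token] = Node(node.seq + token, node)
                node = node.children[token]
        # Simulation + backpropagation.
        value = rollout(node.seq)
        while node is not None:
            node.visits += 1
            node.value += value
            node = node.parent
    # Read the most-visited path off as the "one shot" answer.
    seq, node = "", root
    while node.children:
        token, node = max(node.children.items(), key=lambda kv: kv[1].visits)
        seq += token
    return seq
```

    The search amortizes many rollouts into a single returned sequence, which is the sense in which MCTS might let a model produce "one shot" what would otherwise take many sequential inferences.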


Lesterpaintstheworld @ 2023-04-20T09:06 (+1)

"copy-pasting weights on some distinct data centers than the initial one is a crucial step for ~any takeover"

That may no longer be the case with the advent of "System 2" Autonomous Cognitive Entities (ACEs). An ACE could "live on a server" and rely on a collection of API calls to various LLMs.
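As a rough illustration of the commenter's point (my sketch, not anything from the post): such an entity would hold no local weights at all, only a control loop and some memory, delegating every cognitive step to hosted models. The `call_llm` function and provider names below are hypothetical stand-ins for real network calls to model APIs.

```python
from dataclasses import dataclass, field

def call_llm(provider: str, prompt: str) -> str:
    # Hypothetical stand-in for an HTTP call to a hosted model endpoint.
    return f"[{provider}] response to: {prompt}"

@dataclass
class AutonomousCognitiveEntity:
    goal: str
    providers: list
    memory: list = field(default_factory=list)

    def step(self) -> str:
        # Route each reasoning step to the next provider in rotation. Note
        # there are no local weights: nothing needs copying between data
        # centers for the entity to persist, only this small control loop.
        provider = self.providers[len(self.memory) % len(self.providers)]
        prompt = f"Goal: {self.goal}. Recent context: {self.memory[-3:]}"
        result = call_llm(provider, prompt)
        self.memory.append(result)
        return result

ace = AutonomousCognitiveEntity("summarize inbox", ["llm_a", "llm_b"])
ace.step()
ace.step()
```

On this (speculative) architecture, the "copy the weights" step really does drop out of the takeover story: the scarce resource becomes API access and the server hosting the loop, not the weights themselves.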