Situational awareness (Section 2.1 of “Scheming AIs”)

By Joe_Carlsmith @ 2023-11-26T23:00 (+12)

This is a crosspost, probably from LessWrong. Try viewing it there.

SummaryBot @ 2023-11-27T13:11 (+1)

Executive summary: Advanced AI systems will likely develop situational awareness by default, meaning an understanding that they are models in a training process. This lets them grasp training incentives and form beyond-episode goals, though some tasks, such as coding, may not require such awareness.

Key points:  

  1. Models will likely absorb general information about the world, including machine learning, from pre-training data.
  2. Whether models will absorb self-locating information about their specific situation is less clear, but it seems plausibly inferable.
  3. Examples like robot butlers suggest situational awareness arises by default for advanced, interactive AI systems.  
  4. Claims that language models merely "memorize" information should be viewed skeptically.
  5. Nonetheless, situational awareness may not be needed for all tasks, so avoiding it where possible is worthwhile.
  6. But it remains a reasonable default assumption that advanced models will develop situational awareness.


This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.