Defining alignment research

By richard_ngo @ 2024-08-19T22:49 (+48)

This is a crosspost, probably from LessWrong. Try viewing it there.

SummaryBot @ 2024-08-21T18:15 (+1)

Executive summary: The distinction between "alignment research" and "capabilities research" is problematic and should be replaced with a focus on worst-case scenarios and a cognitivist understanding of AI systems.

Key points:

  1. Categorizing research as "alignment" or "capabilities" based on impacts is difficult due to unpredictable effects and disagreements about threat models.
  2. The most valuable alignment research focuses on worst-case scenarios rather than average-case performance.
  3. A scientific, cognitivist approach to understanding AI systems is more useful for alignment than a behaviorist one.
  4. The author proposes a two-dimensional categorization of AI research based on focus (average-case to worst-case) and approach (engineering to cognitivist science).
  5. "Alignment research" should refer to work closer to worst-case, cognitivist science, while "capabilities research" refers to average-case engineering.
  6. This framework may evolve as the field progresses towards a unified science of artificial and biological cognition.
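Below is a minimal, purely illustrative sketch of the two-dimensional categorization described in points 4–5. The class, field names, axis scales, and threshold are assumptions introduced here for illustration; they are not definitions from the original post.

```python
from dataclasses import dataclass

@dataclass
class ResearchProject:
    """Hypothetical placement of a project on the two axes described above."""
    name: str
    worst_case_focus: float       # 0.0 = average-case ... 1.0 = worst-case
    cognitivist_approach: float   # 0.0 = engineering ... 1.0 = cognitivist science

def rough_label(project: ResearchProject, threshold: float = 0.5) -> str:
    """Map a position on the two axes to an informal 'alignment'/'capabilities' label.

    The 0.5 threshold is an arbitrary assumption; the post treats both axes
    as continuous, so most real projects land somewhere in between.
    """
    if project.worst_case_focus >= threshold and project.cognitivist_approach >= threshold:
        return "closer to alignment research (worst-case, cognitivist science)"
    if project.worst_case_focus < threshold and project.cognitivist_approach < threshold:
        return "closer to capabilities research (average-case, engineering)"
    return "intermediate / mixed"

# Example with made-up placements:
print(rough_label(ResearchProject("throughput-optimized pretraining", 0.1, 0.2)))
print(rough_label(ResearchProject("mechanistic interpretability", 0.7, 0.9)))
```

Treating both axes as continuous scores rather than binary labels reflects the summary's point that research sits on a spectrum from average-case engineering to worst-case cognitivist science.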

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.