Defining alignment research
By richard_ngo @ 2024-08-19T22:49 (+48)
This is a crosspost, probably from LessWrong; the full post can be viewed there.
SummaryBot @ 2024-08-21T18:15 (+1)
Executive summary: The distinction between "alignment research" and "capabilities research" is problematic and should be replaced with a focus on worst-case scenarios and on cognitive understanding of AI systems.
Key points:
- Categorizing research as "alignment" or "capabilities" based on its impacts is difficult, because those impacts are unpredictable and threat models are disputed.
- The most valuable alignment research focuses on worst-case scenarios rather than average-case performance.
- A scientific, cognitivist approach to understanding AI systems is more useful for alignment than a behaviorist one.
- The author proposes a two-dimensional categorization of AI research based on focus (average-case to worst-case) and approach (engineering to cognitivist science).
- "Alignment research" should refer to work closer to worst-case, cognitivist science, while "capabilities research" refers to average-case engineering.
- This framework may evolve as the field progresses towards a unified science of artificial and biological cognition.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.