My disagreements with "AGI Ruin: A List of Lethalities"

By Sharmake @ 2024-09-15T17:22 (+16)

This is a crosspost, probably from LessWrong. Try viewing it there.

SummaryBot @ 2024-09-16T16:57 (+1)

Executive summary: The author disagrees with many of Eliezer Yudkowsky's claims about AI alignment being extremely difficult or impossible, arguing that synthetic data, instruction following, and other approaches make alignment more tractable than Yudkowsky suggests.

Key points:

  1. Synthetic data and honeypot traps can help detect and prevent deceptive AI behavior.
  2. Alignment likely generalizes further than capabilities, contrary to Yudkowsky's claims.
  3. Dense reward functions and control over data sources give humans advantages over evolution for shaping AI goals.
  4. Language and visual data can ground AI systems in real-world concepts and values.
  5. Instruction-following may be a viable alternative to formal corrigibility for aligning AI systems.
  6. Many tasks previously thought to require general intelligence can be solved by narrower systems, suggesting transformative AI may not require full superintelligence.

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.

Sharmake @ 2024-09-17T14:50 (+3)

This is mostly correct as a summary of my position, but on point 6, I want to point out that while it is technically true, I do fear economic incentives work against this path.

Agree with the rest of the summary though.