[linkpost] Christiano on agreement/disagreement with Yudkowsky's "List of Lethalities"

By Owen Cotton-Barratt @ 2022-06-19T22:47 (+130)

This is a linkpost to https://www.lesswrong.com/posts/CoZhXrhpQxpy9xw9y/where-i-agree-and-disagree-with-eliezer

Eliezer Yudkowsky's recent post on AGI ruin seems to have sparked a good amount of thinking and discussion (e.g. in the comments there, on LessWrong, and in this post from today). I'm glad this is happening. This is a linkpost for Paul Christiano's response, which I think is worth reading for anyone following the discussion.

Personally I like Christiano's response a good bit. More than anything else I've read in the vicinity, I find myself nodding along and thinking "yeah, that's a good articulation of my feelings on this". I think it's interesting that it lacks the rhetorical oomph of Yudkowsky's post. On the whole I'm into the rhetorical oomph for pulling attention onto a cluster of topics which I think are super important, although I feel a bit sad that the piece which is most attention-pulling doesn't seem to be the one which is most truth-tracking.


Shadbolt @ 2022-06-23T18:32 (+18)

After reading this linkpost, I’ve updated toward thinking that there’s more agreement between Yudkowsky and Christiano than I had thought. In summary, they seem to agree on the following:

- AI systems are being developed that could deliberately and irreversibly disempower humanity. These systems could exist soon, and there won’t necessarily be a "fire alarm."

- Many of the projects intended to help with AI alignment aren’t making progress on the key difficulties, and don’t really address the “real” problem of reducing the risk of catastrophic outcomes.

- AI risk seems especially hard to communicate, because people want to hear either that everything is fine or that the world is ending (see need for closure / ambiguity aversion). The truth is much more confusing, and human minds have a tough time dealing with it.

These points of agreement might seem trivial, but at least Yudkowsky and Christiano are, in my opinion, speaking the same language, i.e., not talking past each other as they appeared to be doing in the ‘08 Hanson-Yudkowsky AI Foom debate.