Compiling resources comparing AI misuse, misalignment, and incompetence risk and tractability

By Peter4444 @ 2022-05-05T16:16 (+3)

Status: haven't spent long thinking about this

I think it would be useful to have a rough idea of what experts think about the relative AI x-risk from misalignment (AI doesn't try to do what we want), misuse (AI does what someone - say, a hostile or incompetent actor - wants), or incompetence (AI tries to do what we want, but fails - e.g. because it's unreliable or doesn't understand humans well), and also of how tractable work on each of these would be at reducing x-risk. These categories are from Paul Christiano's 'Current work in AI alignment' talk. Obviously, these are extremely difficult questions and we won't get highly robust estimates, but I think something is better than nothing here.

I think (but not with high confidence) that incompetence seems much less risky than misalignment or misuse:

But I find it hard to know where to start when comparing misalignment and misuse. My first attempt was to find existing work. I found

It's quite possible I'm missing resources, so this is a call for resources, which could eventually be compiled into a document.

I'm not sure this is the most important set of resources to compile. But besides being of interest to me, I think it could be useful for anyone deciding between working in AI governance and alignment, for whom personal fit considerations didn't dominate.


rohinmshah @ 2022-05-05T18:11 (+5)

I think this mostly hasn't been done, but here's one survey that finds large disagreement.

Peter4444 @ 2022-05-05T18:30 (+1)

Thanks. Yes, that was the survey I mentioned.