Does AI risk “other” the AIs?

By Joe_Carlsmith @ 2024-01-09T17:51 (+22)

This is a crosspost, probably from LessWrong. Try viewing it there.

Matthew_Barnett @ 2024-01-10T22:26 (+11)

(Cross-posted from LW)

I think there's an additional element of Hanson's argument that is both likely true and important, and, as far as I can tell, unaddressed in your post. When Hanson talks about "othering" AIs, he's often talking about the thing you mentioned (projecting a propensity to do bad things onto the AIs), but he's also saying that future AIs won't necessarily form a natural, unified coalition against us. In other words, he's rejecting a form of out-group homogeneity in how AIs are portrayed.

As an analogy, among humans, the class of "smart people" is not a natural coalition, even though smart people could in principle band together and defeat all of the non-smart people in a direct contest. Why don't they do that? Well, one reason is that smart people don't usually see themselves as part of a coherent identity that includes all the other smart people. Put poetically, there isn't much class consciousness among smart people as a unified group. They have diverse motives and interests that wouldn't be furthered much by attempting to join such a front. The argument Hanson makes is that AIs, in the same sense, will also not form a natural, unified front against humans. The relevant boundaries in future conflicts over power will likely be drawn along other lines.

The idea that AIs won't form a natural coalition has a variety of implications for the standard AI risk arguments. Most notably, it undermines the single-agent model that underlies many takeover stories and arguments for risk. More specifically, if AIs won't form a natural coalition, then a number of further implications follow.

In my opinion, considerations like these only scratch the surface of the ways in which your story of AI might depart from the classic AI risk analysis if you don't think AIs will form a natural, unified coalition. When you start to read standard AI risk stories (including from people like Ajeya, who do not agree with Eliezer on a ton of things), you can often find the assumption that "AIs will form a natural, unified coalition" written all over them.

Nick K. @ 2024-01-11T16:29 (+1)

I'm just noting that you are assuming that we have many robustly aligned AIs, in which case I agree that takeover seems less likely.

Absent this assumption, I don't think that "AIs will form a natural, unified coalition" is necessarily the outcome, but it seems reasonable to expect that the other outcomes would look functionally the same for us.

Vasco Grilo @ 2024-01-14T15:27 (+2)

Nice post, Joe.

You wrote:

For two: clearly some sorts of otherness warrant some sorts of fear. For example: maybe you, personally, don't like to murder. But Bob, well: Bob is different. If Bob gets a bunch of power, then: yep, it's OK to hold your babies close. And often OK, too, to try to "control" Bob into not-killing-your-babies. Cf., also, the discussion of getting-eaten-by-bears in the first essay. And the Nazis, too, were different in their own way. Of course, there's a long and ongoing history of mistaking "different" for "the type of different that wants to kill your babies." We should, indeed, be very wary. But liberal tolerance has never been a blank check; and not all fear is hatred.

I think the strength of this point may be misleading. The word "murder" carries a negative connotation by definition, so if one hears that "a murderer got lots of power", then one should reasonably expect negative outcomes. However, although killing humans is almost always bad, it is not necessarily bad. For example, I think it would be totally fine to kill a terrorist to prevent the deaths of, say, 1 million people. To assess the value of a given action, we have to ask what is better from the point of view of the universe. Once one does that, an AI killing all humans is no longer necessarily bad, but it will also not automatically be good.