Ryan Greenblatt's Quick takes

By Ryan Greenblatt @ 2024-02-03T17:55 (+3)

Ryan Greenblatt @ 2024-02-03T17:55 (+12)

Reducing the probability that AI takeover involves violent conflict seems leveraged for reducing near-term harm

Often in discussions of AI x-safety, people seem to assume that misaligned AI takeover will result in extinction. However, I think AI takeover is reasonably likely not to cause extinction, due to the misaligned AI(s) effectively putting a small amount of weight on the preferences of currently alive humans. Some reasons for this are discussed here. Of course, misaligned AI takeover still seems existentially bad and probably eliminates a high fraction of future value from a longtermist perspective.

(In this post when I use the term “misaligned AI takeover”, I mean misaligned AIs acquiring most of the influence and power over the future. This could include “takeover” via entirely legal means, e.g., misaligned AIs being granted some notion of personhood and property rights and then becoming extremely wealthy.)

However, even if AIs effectively put a bit of weight on the preferences of current humans, it's possible that large numbers of humans die due to violent conflict between a misaligned AI faction (likely including some humans) and existing human power structures. In particular, it might be that killing large numbers of humans (possibly as collateral damage) makes it easier for the misaligned AI faction to take over. By large numbers of deaths, I mean hundreds of millions dead or more, possibly billions.

But it's somewhat unclear whether violent conflict will be the best route to power for misaligned AIs, and this also might be possible to influence. See also here for more discussion.

So while one approach to avoid violent AI takeover is to just avoid AI takeover, it might also be possible to just reduce the probability that AI takeover involves violent conflict. That said, the direct effects of interventions to reduce the probability of violence don't clearly matter from an x-risk/longtermist perspective (which might explain why there hasn't historically been much effort here).

(However, I think trying to establish contracts and deals with AIs could be pretty good from a longtermist perspective in the case where AIs don't have fully linear returns to resources. Also, generally reducing conflict seems maybe slightly good from a longtermist perspective.)

So how could we avoid violent conflict conditional on misaligned AI takeover? There are a few hopes:

I'm pretty unsure about which approaches here look best, or whether any of them are tractable at all. (It's possible that some prior work targeted at reducing conflict from the perspective of S-risk could be somewhat applicable.)

Separately, this requires that the AI puts at least a bit of weight on the preferences of current humans (and isn't spiteful), but this seems like a mostly separate angle and it seems like there aren't many interventions here which aren't covered by current alignment efforts. Also, I think this is reasonably likely by default due to reasons discussed in the linked comment above. (The remaining interventions which aren’t covered by current alignment efforts might relate to decision theory (and acausal trade or simulation considerations), informing the AI about moral uncertainty, and ensuring the misaligned AI faction is importantly dependent on humans.)

Returning to the topic of reducing violence given a small weight on the preferences of current humans, I'm currently most excited about approaches which involve making negotiation between humans and AIs more likely to happen and more likely to succeed (without sacrificing the long-run potential of humanity).

A key difficulty here is that an AI might have a first-mover advantage: getting in a powerful first strike without tipping its hand might be extremely useful for the AI. See here for more discussion (also linked above). Thus, negotiation might look relatively bad to the AI from this perspective.

We could try to have a negotiation process which is kept secret from the rest of the world, or we could try to have preexisting commitments under which we'd yield large fractions of control to AIs (effectively proxy conflicts).

More weakly, just making negotiation seem like a possibility at all might be quite useful.

I’m unlikely to spend much if any time working on this topic, but I think this topic probably deserves further investigation.

Ryan Greenblatt @ 2024-02-09T19:00 (+2)

After thinking about this somewhat more, I don't really have any good proposals, so this seems less promising than I was expecting.

harfe @ 2024-02-03T23:37 (+1)

In the context of a misaligned AI takeover, negotiating and making contracts with a misaligned AI in order to allow it to take over does not seem useful to me at all.

A misaligned AI that is in power could simply decide to walk back any promises and ignore any contracts it agreed to. Humans couldn't do anything about it because they would have lost all power at that point.

Ryan Greenblatt @ 2024-02-04T00:59 (+2)

Some objections:

- Building better contract-enforcement ability might be doable. (Though it's pretty tricky both to enforce that humans do things and to force AIs to do things.)
- Negotiation could involve tit-for-tat arrangements (see the toy sketch after this list).
- Unconditional surrenders can reduce the need for violence if we ensure there is some process for demonstrating that the AI would have succeeded at takeover with very high probability. (Note that I'm assuming the AI puts some small amount of weight on the preferences of currently alive humans, as I discuss in the parent.)
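To make the tit-for-tat point slightly more concrete, here's a minimal toy sketch (purely illustrative; the "cooperate"/"defect" moves and the payoff numbers are hypothetical stand-ins, not a claim about what an actual human–AI arrangement would look like). Each side starts by cooperating and then mirrors the counterparty's previous move, so ongoing cooperation is rewarded and defection is only exploited once:

```python
# Toy illustration of a tit-for-tat arrangement in an iterated exchange.
# The payoff matrix and the two-move framing are hypothetical, chosen only
# to show the mechanism: cooperate first, then mirror the other side's
# previous move.

PAYOFFS = {  # (my_move, their_move) -> my payoff; standard prisoner's-dilemma ordering
    ("C", "C"): 3,
    ("C", "D"): 0,
    ("D", "C"): 5,
    ("D", "D"): 1,
}

def tit_for_tat(opponent_moves: list[str]) -> str:
    """Cooperate on the first round, then copy the counterparty's last move."""
    return "C" if not opponent_moves else opponent_moves[-1]

def play(rounds: int, other_strategy) -> tuple[int, int]:
    """Run an iterated game between tit-for-tat and another strategy."""
    tft_score = other_score = 0
    moves_seen_by_tft: list[str] = []    # the other player's past moves
    moves_seen_by_other: list[str] = []  # tit-for-tat's past moves
    for _ in range(rounds):
        tft_move = tit_for_tat(moves_seen_by_tft)
        other_move = other_strategy(moves_seen_by_other)
        tft_score += PAYOFFS[(tft_move, other_move)]
        other_score += PAYOFFS[(other_move, tft_move)]
        moves_seen_by_tft.append(other_move)
        moves_seen_by_other.append(tft_move)
    return tft_score, other_score

print(play(10, lambda past: "D"))  # vs. always-defect  -> (9, 14): exploited only once
print(play(10, lambda past: "C"))  # vs. always-cooperate -> (30, 30): sustained cooperation
```

The point is just the structure: arrangements conditioned on the counterparty's past behavior can make continued cooperation the better ongoing strategy for both sides, even without a strong external enforcement mechanism.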