Possible Divergence in AGI Risk Tolerance between Selfish and Altruistic agents

By Brad West🔸 @ 2023-09-09T00:22 (+11)

TLDR: There is a conflict of interest between the currently living generations and the entirety of future generations regarding AGI: the prospective upside benefits of AGI (such as extreme longevity and happiness) may be worth a much higher risk of death to a living individual than would be acceptable if one integrated the interests of all future generations (a slight acceleration of AGI's possible benefits is almost never worth a risk of extinction).

It strikes me that there may be a conflict of interest between the interests of a living set of generations of humans and the entirety of future generations of humans regarding the cost-benefit analysis of ushering in AGI. For the purposes of this post, I will refer to the set of values reflecting the self-interest of an individual agent as “selfish” and the set of values reflecting the interests of all future generations in addition to the current one as “altruistic.”

Let us consider a world in which the creation of AGI entails X-risks (risks of extinction) that decrease over time with the development of alignment capabilities. In this world, in the event that an X-risk does not materialize, AGI would also likely be able to cure aging and allow for lives of joy beyond imagination for the current generation and all future generations.

So, consider a situation in which one could activate AGI today with an X-risk of 50%, or release it in 70 years with an X-risk of 1%. To a 55-year-old agent, the coin-flip may make selfish sense: a 50% chance of living millennia in bliss and a 50% chance of dying, versus a near-certainty of living only 15-50 more years (the 70-year delay would likely outlast their natural lifespan). Conversely, from an altruistic perspective, such a coin flip would be abhorrent: it risks quadrillions upon quadrillions of blissful future lives for the benefit of oneself and the relatively small cohort who could benefit from the acceleration.
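The divergence can be made concrete with a back-of-the-envelope expected-value calculation. The sketch below uses purely hypothetical numbers (the blissful lifespan, remaining natural lifespan, and future-life count are illustrative assumptions, not claims from this post):

```python
# Illustrative expected-value sketch of the selfish vs. altruistic gamble.
# All numeric inputs are assumptions chosen for illustration only.

def expected_value(p_survive, payoff_if_survive, payoff_if_extinct=0):
    """Expected value over a binary survive/extinction outcome."""
    return p_survive * payoff_if_survive + (1 - p_survive) * payoff_if_extinct

# Selfish 55-year-old: activate now (50% X-risk) vs. wait 70 years (1% X-risk).
bliss_years = 5_000      # assumed blissful life-years if AGI succeeds
normal_years = 30        # assumed remaining natural lifespan if we wait

selfish_now = expected_value(0.5, bliss_years)   # 0.5 * 5000 = 2500 years
selfish_wait = normal_years                      # agent likely dies before release

# Altruistic view: count all future blissful lives (a stand-in magnitude).
future_lives = 10**15    # assumed "quadrillions" of future lives
altruistic_now = expected_value(0.50, future_lives)
altruistic_wait = expected_value(0.99, future_lives)

print(selfish_now > selfish_wait)        # the selfish gamble looks attractive
print(altruistic_wait > altruistic_now)  # waiting dominates altruistically
```

The precise numbers matter little: so long as the blissful payoff dwarfs the remaining natural lifespan, the selfish calculus tolerates enormous X-risk, while the altruistic calculus is dominated by the survival probability itself.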

Of course, the addition of S-risks (risks of creating worlds of immense suffering) could change the selfish equation, as a selfish agent may view a significant risk of subjecting themselves to hell as unacceptable. But agents may believe themselves capable of identifying an emerging S-risk and of committing suicide, translating a personal S-risk into a personal X-risk.

I think this potential for a huge divergence in risk tolerance between egoists and utilitarians (and others who care about future generations) is worth noting as we consider policies for Artificial General Intelligence. We probably want to ensure that the institutions involved appropriately factor in the interests of future generations, because gambling with our own lives, given the possible benefits, may make selfish sense to some.