Are we confident that superintelligent artificial intelligence disempowering humans would be bad?

By Vasco Grilo @ 2023-06-10T09:24 (+24)

I think it is almost always assumed that superintelligent artificial intelligence (SAI) disempowering humans would be bad, but are we confident about that? Is this an under-discussed crucial consideration?

Most people (including me) would prefer the extinction of a random species to that of humans. I suppose this is mostly due to a desire for self-preservation, but it can also be justified on altruistic grounds if humans have a greater ability to shape the future for the better. However, a priori, would it be reasonable to assume that more intelligent agents would do better than humans, at least under moral realism? If not, can one be confident that humans would do better than other species?

From the point of view of the universe, I believe one should strive to align SAI with impartial value, not human value. It is unclear to me how much these differ, but one should beware of surprising and suspicious convergence.

In any case, I do not think this shift in focus means humanity should accelerate AI progress (as proposed by effective accelerationism?). Intuitively, aligning SAI with impartial value is a harder problem, and therefore needs even more time to be solved.


quinn @ 2023-06-10T14:48 (+8)

Historically the field has been focused on impartiality/cosmopolitanism/pluralism, and there's a rough consensus that "human value is a decent bet at proxying impartial value, with respect to the distribution of possible values" that follows from the fragility/complexity of value. I.e., many of us suspect that embracing a random draw from the distribution over utility functions leads to worse performance at impartiality than human values.

I do recommend "try to characterize a good successor criterion" (as per Paul's framing of the problem) as an exercise; I found thinking through it very rewarding. I've definitely taken seriously thought processes like "stop being tribalist, your values aren't better than paperclips", so I feel like I'm thinking clearly about the class of mistakes the latest crop of accelerationists may be making.

Even though we're abstracting over any particular specification language that expresses values and so on, I suspect that a view of this based on descriptive complexity is robust to moral uncertainty, at the very least to perturbations in the choice of utilitarianism flavor.

Vasco Grilo @ 2023-06-10T16:47 (+2)

Thanks, quinn!

Historically the field has been focused on impartiality/cosmopolitanism/pluralism, and there's a rough consensus that "human value is a decent bet at proxying impartial value

Could you provide evidence for these?

there's a rough consensus that "human value is a decent bet at proxying impartial value, with respect to the distribution of possible values" that follows from the fragility/complexity of value

I am not familiar with the posts you link to, but it looks like they focus on human values (emphasis mine):

Complexity of value is the thesis that human values have high Kolmogorov complexity; that our preferences, the things we care about, cannot be summed by a few simple rules, or compressed. Fragility of value is the thesis that losing even a small part of the rules that make up our values could lead to results that most of us would now consider as unacceptable (just like dialing nine out of ten phone digits correctly does not connect you to a person 90% similar to your friend).

Most people would consider a universe filled with paperclips pretty bad, but I think it can plausibly be good as long as the superintelligent AI is having a good time turning everything into paperclips.

I do recommend "try to characterize a good successor criterion" (as per Paul's framing of the problem) as an exercise; I found thinking through it very rewarding. I've definitely taken seriously thought processes like "stop being tribalist, your values aren't better than paperclips", so I feel like I'm thinking clearly about the class of mistakes the latest crop of accelerationists may be making.

Thanks for sharing! Paul concludes that:

Even if we knew how to build an unaligned AI that is probably a good successor, I still think we should strongly prefer to build aligned AGI. The basic reason is option value: if we build an aligned AGI, we keep all of our options open, and can spend more time thinking before making any irreversible decision.

Why would an unaligned AI have lower option value than an aligned one? I agree with Paul that:

Overall, I think the question “which AIs are good successors?” is both neglected and time-sensitive, and is my best guess for the highest impact question in moral philosophy right now [25th May 2018].

quinn @ 2023-06-12T15:38 (+4)

cosmopolitanism: I have a roundup of links here. I think your concerns are best discussed in the Arbital article on generalized cosmopolitanism: 

"We are cosmopolitans! We also grew up reading science fiction about aliens that turned out to have their own perspectives, and AIs willing to extend a hand in friendship but being mistreated by carbon chaunivists! We'd be fine with a weird and wonderful intergalactic civilization full of non-organic beings appreciating their own daily life in ways we wouldn't understand. But paperclip maximizers don't do that! We predict that if you got to see the use a paperclip maximizer would make of the cosmic endowment, if you really understood what was going on inside that universe, you'd be as horrified as we are. You and I have a difference of empirical predictions about the consequences of running a paperclip maximizer, not a values difference about how far to widen the circle of concern."

Re the fragility/complexity link: 

I am not familiar with the posts you link to, but it looks like they focus on human values (emphasis mine):

My view after reading is that "human" is a shorthand that isn't doubled down on throughout. Fun theory especially characterizes a sense of what's at stake when we have values that can at least entertain the idea of pluralism (as opposed to values that don't hesitate to create extreme lock-in scenarios), and "human value" is sort of a first-order approximate proxy for a detailed understanding of that.

Most people would consider a universe filled with paperclips pretty bad, but I think it can plausibly be good as long as the superintelligent AI is having a good time turning everything into paperclips.

This is a niche and extreme flavor of utilitarianism and I wouldn't expect its conclusions to be robust to moral uncertainty.

But it's a nice question that identifies cruxes in metaethics. 

Why would an unaligned AI have lower option value than an aligned one?

I think this is just a combination of taking seriously lock-in actions that we can't undo, along with forecasts about how aggressively a random draw from values can be expected to lock in.

Vasco Grilo @ 2023-06-12T18:03 (+2)

Thanks!

I think this is just a combination of taking seriously lock-in actions that we can't undo, along with forecasts about how aggressively a random draw from values can be expected to lock in.

I have just remembered that the post AGI and lock-in is relevant to this discussion.

quinn @ 2023-06-10T19:48 (+4)

Busy, will come back over the next few days. 

rime @ 2023-06-10T18:50 (+7)

Here's my just-so story for how humans evolved impartial altruism by going through several particular steps:

  1. First there was kin selection evolving for particular reasons related to how DNA is passed on. This selects for the precursors to altruism.
  2. With the ability to recognise individual characteristics and a long-term memory allowing you to keep track of them, species can evolve stable pairwise reputations.
  3. This allows reciprocity to evolve on top of kin selection, because reputations allow you to keep track of who's likely to reciprocate vs defect.
  4. More advanced communication allows larger groups to rapidly synchronise reputations. Precursors of this include "eavesdropping", "triadic awareness",[1] all the way up to what we know as "gossip".
  5. This leads to indirect reciprocity. So when you cheat one person, it affects everybody's willingness to trade with you.
  6. There's some kind of inertia to the proxies human brains generalise on. This seems to be a combination of memetic evolution plus specific facts about how brains generalise very fast.
  7. If altruistic reputation is a stable proxy for long enough, the meme stays in social equilibrium even past the point where it benefits individual genetic fitness.
  8. In sum, I think impartial altruism (e.g. EA) is the result of "overgeneralising" the notion of indirect reciprocity, such that you end up wanting to help everybody everywhere.[2] And I'm skeptical a randomly drawn AI will meet the same requirements for that to happen to them. (A toy simulation of the reputation dynamics in steps 2-5 is sketched after the footnotes.)
  1. ^

    "White-faced capuchin monkeys show triadic awareness in their choice of allies":

    "...contestants preferentially solicited prospective coalition partners that (1) were dominant to their opponents, and (2) had better social relationships (higher ratios of affiliative/cooperative interactions to agonistic interactions) with themselves than with their opponents."

    You can get allies by being nice, but not unless you're also dominant.

  2. ^

    For me, it's not primarily about human values. It's about altruistic values. Whatever anything cares about, I care about that in proportion to how much they care about it.
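
The reputation dynamics in steps 2-5 above can be illustrated with a toy simulation of indirect reciprocity, in the spirit of standard image-scoring models with a "standing" rule (refusing to help an agent in bad standing is not itself punished). This is only a minimal sketch, not something rime specified: the agent counts, payoffs, and the "discriminator"/"defector" strategies are assumptions chosen purely for illustration.

```python
import random

# Toy model of indirect reciprocity via public reputations ("image scoring"
# with a "standing" rule). Agents are repeatedly paired as donor/recipient;
# donating costs the donor COST and gives the recipient BENEFIT. Because every
# act is publicly observed (gossip synchronises reputations, steps 4-5),
# cheating one agent lowers your standing with everyone (step 5).

random.seed(0)

N_AGENTS = 50          # illustrative parameters, not taken from the comment above
N_ROUNDS = 20_000
COST, BENEFIT = 1.0, 3.0

# Half the agents help conditionally on reputation; half never help.
strategy = ["discriminator"] * (N_AGENTS // 2) + ["defector"] * (N_AGENTS // 2)
reputation = [0] * N_AGENTS    # public image score, visible to all agents
payoff = [0.0] * N_AGENTS

for _ in range(N_ROUNDS):
    donor, recipient = random.sample(range(N_AGENTS), 2)
    donates = strategy[donor] == "discriminator" and reputation[recipient] >= 0
    if donates:
        payoff[donor] -= COST
        payoff[recipient] += BENEFIT
        reputation[donor] += 1     # helping is publicly rewarded
    elif strategy[donor] == "discriminator":
        pass                       # justified refusal of a bad-standing recipient: no penalty
    else:
        reputation[donor] -= 1     # unprovoked defection is publicly punished

def mean_payoff(kind):
    scores = [p for p, s in zip(payoff, strategy) if s == kind]
    return sum(scores) / len(scores)

print("mean payoff, discriminators:", round(mean_payoff("discriminator"), 1))
print("mean payoff, defectors:     ", round(mean_payoff("defector"), 1))
```

Running this, discriminators should end up with far higher mean payoffs than defectors: because reputations are public, cheating anyone cuts you off from everyone, which is the sense in which "being seen to help" can remain a stable proxy (step 7) before any further generalisation (step 8).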

Vasco Grilo @ 2023-06-10T19:20 (+4)

Thanks for that story!

I'm skeptical a randomly drawn AI will meet the same requirements for that to happen to them.

I think 100% alignment with human values would be better than random values, but superintelligent AI would presumably be trained on human data, so it would be somewhat aligned with human values. I also wonder about the extent to which the values of the superintelligent AI could change, hopefully for the better (as human values have).

rime @ 2023-06-10T19:46 (+3)

I also have specific just-so stories for why human values have changed in the direction of "moral circle expansion" over time, and I'm not optimistic that process will continue indefinitely unless intervened on.

Anyway, these are important questions!

EdoArad @ 2023-06-10T14:39 (+4)

The first couple of paragraphs made me think that you are considering the claim "SAI will be good by default", rather than "our goal should be to install good values in our SAI rather than to focus on aligning it with human values (which may be less good)". Is this a good reformulation of the question?

Vasco Grilo @ 2023-06-10T14:49 (+2)

Thanks for the comment, Edo!

I guess the rise of humans has been good for the universe, but disempowered other animal species. So my intuition points towards SAI being good for the universe, even if it disempowers humans by default. However, I do not really know.

dr_s @ 2023-11-23T19:11 (+3)

good for the universe

Good in what sense? I don't really buy the moral realist perspective - I don't see where this "real morality" could possibly come from. But on top of that, I think we all agree that disempowerment, genocide, slavery, etc. are bad; we also frown upon our own disempowerment of non-human animals. So there are two options:

  1. either the "real morality" is completely different and alien from the morality we humans currently tend to aspire to as an ideal (at least in this corner of the world and of the internet), or
  2. a SAI that would disempower us despite being vastly smarter and more capable, and having no need to, would be pretty evil.

If it's 1, then I'm even more curious to know what this real morality is, why I should care, or why the SAI would understand it while we seem to be drifting further away from it. And if it's 2, then obviously unleashing an evil SAI on the universe is a horrible thing to do, not just for us, but for everyone else in our future light cone. Either way, I don't see a path to "maybe we should let SAI disempower us because it's the greater good". Any sufficiently nice SAI would understand well enough that we don't want to be disempowered.

Vasco Grilo @ 2024-01-14T16:02 (+2)

From Nate Soares' post Cosmopolitan values don't come free:

Short version: if the future is filled with weird artificial and/or alien minds having their own sort of fun in weird ways that I might struggle to understand with my puny meat-brain, then I'd consider that a win. When I say that I expect AI to destroy everything we value, I'm not saying that the future is only bright if humans-in-particular are doing human-specific things. I'm saying that I expect AIs to make the future bleak and desolate, and lacking in fun or wonder of any sort[1].

Vasco Grilo @ 2023-06-19T09:38 (+2)

Paul Christiano discussed the value of unaligned AI on The 80,000 Hours Podcast (in October 2018). Pretty interesting!