Value fragility and AI takeover
By Joe_Carlsmith @ 2024-08-05T21:28 (+38)
This is a crosspost, probably from LessWrong. Try viewing it there.
Owen Cotton-Barratt @ 2024-08-07T10:14 (+2)
I agree that people often have more intuitions about fragility in unipolar cases vs in multipolar cases. I'm not sure that I fully understand what's going on here, but my current guesses are mostly in the vicinity of your "something good comes from people rubbing up against each other" point, and are something like:
- To some extent, the intuitions about the multipolar case provide evidence against value fragility, and people are failing to fully propagate that evidence through
- People have the intuition about value fragility more when they think of specifying values as specifying precise goals (which is what they tend to picture when thinking of unipolar value specification), rather than as specifying the process that will lead to eventual outcomes in an emergent way (which is what they tend to picture when thinking of the multipolar case)
- A thing that people are often worried about is "mind X has an idea of what values are correct and these get enacted" (for X being an AI or often a human), and they notice that this situation is fragile -- if X's values are a bit off, then there's no external correcting force
- A lot of the values people actually have are things that could loosely be called "prosocial"; in multipolar scenarios there are pressures to be prosocial (because these are reinforced by the game theory of everyone wanting other people to be prosocial, and so rewarding behaviour which is prosocial and penalising behaviour which isn't)
- Multipolarity also allows for a diversity of viewpoints, which helps to avoid possible traps where a single mind fails to consider or care about something important
- On this perspective, it's not that value is extremely fragile, just that it's kind of fragile -- and so it helps a bunch to get more bites of the apple
Matthew_Barnett @ 2024-08-05T22:15 (+2)
> In general, to me it seems quite fruitful to examine in more detail whether, in fact, multipolarity of various kinds might alleviate concerns about value fragility. And to those who have the intuition that it would (especially in cases, like Multipolar value fragility, where agent A’s exact values aren’t had by any of agents 1-n), I’d be curious to hear the case spelled out in more detail.
Here's a case that I roughly believe: multipolarity makes it more likely that one's own values will be represented, because it gives each agent the opportunity to literally live in the world and act to bring about the outcomes they personally want.
This case is simple enough, and it's consistent with the ordinary multipolarity the world already experiences. Consider an entirely selfish person. Now, divide the world into two groups: the selfish person (which we call Group A) and the rest of the world (which we call Group B).
Group A and Group B have very different values, even "upon reflection". Group B is also millions or billions of times more powerful than Group A (as it comprises the entire world minus the selfish individual). Therefore, on a naive analysis, you might expect Group B to "take over the world" and then implement its values without any regard whatsoever for Group A. Indeed, because of the vast power differential, it would be "very easy" for Group B to achieve this world takeover. And such an outcome would indeed be very bad according to Group A's values.
Of course, this naive analysis is flawed, because the real world is multipolar in an important respect: usually, Group B will let Group A (the individual) have some autonomy, and let them receive a tiny fraction of the world's resources, rather than murdering Group A and taking all their stuff. They will do this because of laws, moral norms, and respect for their fellow humans. This multipolarity therefore sidesteps all the issues with value fragility, and allows Group A to achieve a pretty good outcome according to their values.
This is also my primary hope with misaligned AI. Even if misaligned AIs are collectively millions or billions of times more powerful than humans (or aligned AIs), I would hope they would still allow humans or aligned AIs to have some autonomy, leave us alone, and let us receive a large enough fraction of resources that we can enjoy an OK outcome, according to our values.
SummaryBot @ 2024-08-06T14:51 (+1)
Executive summary: Value fragility between AI systems and humans could make AI takeover incentives worse, but commitments to niceness and power-sharing may help defuse adversarial dynamics.
Key points:
- Value fragility is the idea that slightly different value systems lead to very different outcomes under extreme optimization. The author distinguishes this from the claim that agents with slightly different values will converge to different values on reflection.
- In a multipolar scenario where no single agent has the exact right values, value fragility still implies the future will be low value for any given agent. The author is uncertain whether multipolarity actually alleviates value fragility concerns.
- Value fragility is not necessary for arguing that an AI would try to take over the world if doing so is extremely easy. But it could increase AI takeover incentives in cases where takeover is harder, by making the AI expect a human-run world to be low value.
- Commitments to "niceness" (intrinsically valuing how things go according to another agent's values) and intentional power-sharing could help defuse adversarial dynamics between AIs and humans that stem from value fragility.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.