The Governance Problem and the "Pretty Good" X-Risk

By Zach Stein-Perlman @ 2021-08-28T20:00 (+23)

Crossposted from LessWrong.

No longer endorsed.

Like a child suddenly given awesome strength, I could have pushed too hard, and left the world a broken toy I could never repair. —The Hero of Ages

I hope we create superintelligence, and I hope it does what we want. But this would not suffice to make the future great. For a great future from superintelligence, a third condition is necessary: what we want is great. This post is about what the controllers of a superintelligence will want and how we can improve that.

Epistemic status: confident about the problem, not confident about specific failure modes or solutions.

I. Introduction

If it were available to them, I think current human elites (or almost any other group of humans) would overwhelmingly choose this bargain over the status quo:

Earthly Utopia. Human civilization will last on Earth for ten billion years. During that time, almost every person's life will be better than the best life ever lived before now. Earth will be a place of great learning and discovery, personal excellence and achievement, expansive freedom, extraordinary beauty, wonderful experiences, deep relationships, harmony with nature, meaningful projects, and profound joy. However, we may never use resources outside our solar system.

It may be that more intelligent, thoughtful, and wise versions of ourselves would take such a bargain. But more likely, I think, our enlightened counterparts would regard it as the worst mistake and the greatest catastrophe ever. If an astronomical number of wonderful planets is astronomically better than one, we have a problem:

Our decisionmakers and decisionmaking institutions would take a terrible bargain because they neglect nontraditional sources of value (viz., optimizing the rest of the universe) that happen to be overwhelmingly important (in expectation). This is not surprising — these same processes egregiously fail to appreciate the value or risk of AI. So if we solve alignment and create superintelligence, it will not just take care of the rest. It's not enough to have nice people control the superintelligence. To adapt from The Rocket Alignment Problem:

Superintelligence may be developed by ill-intentioned people. But that's not the failure mode I'm worried about right now. I'm more worried that right now, even nice and generous potential-controllers-of-superintelligence would want it to do non-optimal things. Whether Google or the United States or North Korea is the one to develop superintelligence won't make a major difference to the probability of an existential win from my perspective, because right now potential controllers of superintelligence don't want what's optimal.

Aligned superintelligence is good to the extent that the operator wants good things. Longtermists/EAs/rationalists worry a lot about alignment, but little about aligning the eventual operator's instructions with what is best. I aim to investigate what could make us succeed or fail and explain why we should care a lot about this problem, which I call the governance problem since I expect governance issues to be vital.^[1] If we survive and create powerful aligned AI, I believe it is almost certain that the future will be wonderful by provincial human standards. But most such scenarios are still existential catastrophes.

II. Good Uses of Superintelligence

Some uses of superintelligence would be near-optimal.^[2] Call such uses "great," and call a possible future "great" if it involves great use of superintelligence. Under reasonable normative and empirical assumptions, whether a future is great depends overwhelmingly on how we use most of the matter available to us in the universe in the long run, not on the short-term future or the future of Earth.^[3] I make such assumptions here, although some of my conclusions hold without them.

A final use of superintelligence would be great if and only if we near-perfectly:

Precisely specify the object-level things the superintelligence should do,
Tell it what to optimize for,
Tell it how to choose what to optimize for, or
Etc.^[4]

Fortunately, we need not make a final decision—one that closes off most possible futures—immediately after creating superintelligence. The immediate problem of what to do with superintelligence is not how to optimize the universe but how to decide how to optimize the universe. A use of superintelligence is great if we:

Have the superintelligence do something great as a final use; i.e., do something on the previous list (level 0),
Delegate to a system that chooses and effects a level 0 system, near-certainly and without wasting much time/resources (level 1),
Delegate to a system that delegates to a level 1 system, near-certainly and with little waste (level 2), or
Etc.^[5]

Some prima facie great systems for choosing how to use superintelligence are:

Well-designed long reflection, likely involving intelligence amplification, using the superintelligence to understand what various possible futures would look like across many dimensions, and receiving and evaluating arguments from the superintelligence.
Well-designed indirect normativity.
Any system that would delegate to one of the above systems (near-certainly and with little waste).

But I won't speculate here on the details of great systems. Instead, I'll consider what affects the system that we will end up with.

III. Short-Term Issues

We will presumably be able to choose what precisely to do with aligned superintelligence after we create it (rather than the use being determined before or as we create the superintelligence). But what happens before superintelligence affects what we ultimately choose. Our short-term actions affect:

What kind of agent will control the superintelligence
How the controller will think about how to use it and the options that the controller will see as possible, wise, moral, legitimate, dangerous, reckless, etc.
The formal and informal constraints on the use of superintelligence

For example, a poorly-designed "AI constitution," whether hard (truly binding) or soft (with political and psychological power), would be bad. A well-designed one would be good.

I believe it is very likely that the controller of superintelligence will be a state, a group of states, or a new international organization. In particular, I expect that states will appreciate AI before we create superintelligence and will nationalize or oversee promising projects.

The organizations and ideas with power just before superintelligence will determine how we use it. I expect that at the beginning of a fast-takeoff intelligence explosion, how we use superintelligence will be predictable; it will appear very likely or very unlikely that we will use it well. So while the failures I will discuss manifest after superintelligence, and while the period during and directly after the intelligence explosion may be important, I think whether we succeed on governance will be mostly determined before the intelligence explosion.^[6]

IV. Unipolar Failure Modes

A world is "unipolar" if it is dominated by a single agent (such as an organization, human, or AI), called a singleton. A unipolar world order could arise if multiple powerful organizations (presumably states) unite. More plausibly, an organization could form a singleton if it becomes sufficiently powerful relative to others, such as by creating powerful aligned AI as the result of an intelligence explosion.

Suppose a single aligned superintelligence is much more powerful than the rest of the world, and its controller is a singleton, making decisions for the whole world. What does the controller do with this power? It depends on the controller's values and decisionmaking structure: we should expect different choices depending on the nature of the controller (individual, state, coalition of states, international organization), its decisionmaking structure, and popular ideas about AI and what to do with it. Here are some potential failure modes, with examples or subcases indented (in no particular order):

Flawed benevolence. The singleton tries to do good (with direct or indirect normativity) but chooses poorly, locking in a non-great outcome.
Flawed decisionmaking delegation.
- Decisionmaking (whether hard power or just the ability to make a recommendation that will probably be executed) is delegated to a reasonable-sounding and politically acceptable but uninformed group (e.g., poorly-chosen experts, the UN, a state's legislators, a state's voters, or all adults), and something big is decided without sufficient information (viz., without leveraging AI to make suggestions and comment on proposals, but more broadly deciding something before there's been time/opportunity to improve decisionmakers' normative beliefs).
Flawed power delegation.
- The world order breaks down into a multipolar scenario which then fails (e.g., powerful AI is controlled by a group of states, which decides to give copies of the AI to each state in the group).
- Power is delegated directly to individuals, and a great outcome is impossible due to coordination difficulties.
Other internal coordination failure. The controller is an organization without a specific mandate for using superintelligence and without a centralized, agenty process to decide.
- Formal or informal constraints on use of superintelligence mean that the best politically feasible use is not great.
  - Poor plans were formally locked in or informally became the default, even if the controller dislikes them now (or would dislike them if they had not been locked in).
- Deadlock. Decisionmakers cannot reach a required level of consensus on major issues and the decisionmaking structure responds poorly; the default outcome is not great.
Lack of benevolence. The singleton does not try to do good.
Lack of understanding/appreciation. The controller does not appreciate how superintelligence enables irrevocably locking out great possible futures and does not act with the necessary care.
- Initially, superintelligence is used in ways that are large only by prosaic standards. Perhaps due to popular misunderstanding of AI, superintelligence is treated as a tool for fulfilling object-level preferences rather than for planning or improving our preferences. Eventually, a controller of superintelligence decides to optimize for its current preferences, locking in a narrow set of possible futures, and they are not great.
- To prevent locking in undesired futures, the controller implements a poorly-designed constitution for superintelligence, locking out great possible futures.
Weak safeguards. A discontent faction seizes power, legally or illegally, and uses superintelligence poorly. (This is more relevant the more decentralized and disparate the controller is.)

But attempting to separate these possibilities may be analytically counterproductive by concealing the large-scale reason for concern. Few people want what's optimal, so political forces just don't push in that direction. Most people, interest groups, and policymakers will just have prosaic goals. Political institutions like ours would likely struggle to achieve anything meaningful: it is much easier to imagine decisive political support for something prosaic than for a plan for using our cosmic endowment. A radical (by our standards) plan for how to use our cosmic endowment—which is presumably necessary for a great future^[7]—is not politically feasible. Instead, we may end up with a "pretty good" future: one excellent along prosaic dimensions, like Earthly Utopia, but not great. Extremely democratic uses of AI are prima facie similarly problematic—most people don't want what's best—and also have coordination issues.

I think superintelligence will likely be governed by something like our current political institutions and these institutions may fare poorly. We should not trust the long-term future to decisionmaking systems based in current humans' preferences; we prima facie should delegate this choice to another system. Any candidate system must be both politically feasible and likely to choose well. That is, I think we need something like long reflection to use our cosmic endowment well.^[8] If ideas like long reflection sound radical or just unreasonable immediately before the intelligence explosion, our future is in trouble.

"In the very near future, we are going to lift something to Heaven."^[9] And even if it's aligned with what we want, it might not be aligned with what's good. So what can we do? Some desiderata:

What is popularly perceived as Responsible and Respectable Opinion supports great uses of superintelligence. (Perhaps long reflection is seen as the only legitimate short-term macro-level use of superintelligence.)
The superintelligence's decisionmaking/governance structure is good.
- The controller's decisionmaking process tends to output something great.
  - The controller having a specific mandate for long reflection (or another kind of delegation of power) might help.
  - Intelligence augmentation for relevant humans might help.
- There are strong safeguards against outsiders or factions seizing the superintelligence.
Longtermists/EAs/rationalists are influential. Longtermist individuals and organizations are generally respected authorities on how to use superintelligence and are not associated with ideas that are unpopular or illegible.

V. Multipolar Failure Modes

Despite their titles, this section does not complement section IV. That section was about the risk that an agent can do whatever it wants and it fails to choose well. This section is about additional ways we could fail if nobody has such power after the intelligence explosion.

I am generally more pessimistic about multipolar scenarios (although it really depends on the specifics): roughly, multipolar scenarios include the risk of unipolar failure for each powerful agent. If an omnipotent organization can fail, two semi-omnipotent organizations can each fail in the same way. But multipolar scenarios have their own special failure modes as well:

Conflict. Empowered but unwise humans could act for the sake of relative power, retribution, or enforcing certain norms on others.
Competition. I would expect technologically mature civilizations to be good at coordinating even with those they disagree with, but this may not hold, especially in the short run.
- Cooperation is impossible. Perhaps factions will be unable to prove things about themselves or reliably simulate others and thus be unable to trust one another. Then competition over resources could consume all surplus.
- Agents fail to cooperate even though it is possible.
  - Relevant decisions happen too fast.
  - Humans irrationally decide not to coordinate.
  - AIs are constrained for nonstrategic reasons (due to poor decisions by the humans and institutions running the world) and unable to cooperate.
- Many agents each have power and creating value requires coordination (e.g., not racing to seize a common good, or not optimizing for influence or reproductive fitness) but some don't cooperate.
Catastrophic negotiation failure.
- Agents use brinkmanship, attrition, or precommitment to coerce one another. Due to a mistake in communication or in predicting others' responses, catastrophe ensues.
- At least one agent negotiates in a manner that is flawed even from its subjective vantage. Either directly or because others' behavior assumed that no agents would make that mistake, catastrophe ensues.
Accidents. These could occur in multipolar scenarios due to agents not being totally secure in their power (analogous to our near-misses in erroneous nuclear "retaliation"). Perhaps powerful AI systems will eliminate accidents, or perhaps they will foster uncertainty and instability, promote deception, and lead to stronger hair-trigger dynamics.

VI. Conclusion

The governance problem is the practical problem of getting the controller of superintelligence to use it near-optimally.

Here are some propositions I believe:

If we create aligned superintelligence, how we use it will involve political institutions and processes. Superintelligence will probably be controlled by a state or a group of states. This is more likely the more AI becomes popularly appreciated and the more legibly powerful AI is created before the intelligence explosion.

Aligned superintelligence enables directing the arbitrarily distant future. Consider what an intelligent but not omniscient observer would predict about the future of Earth and Earth-originating systems. Throughout human history, events almost always have had negligible effects on the observer's credences. In the last century, some events had non-negligible effects on the credences through their effects on extinction risk. But because of the possibility of superintelligence, it may soon become possible to lock in narrow classes of possible futures. This could happen intentionally (a singleton could optimize for any preferences) or unintentionally (if we create unaligned powerful AI or fail to coordinate to use aligned powerful AI well).

Accidental governance failure is possible. We could create aligned superintelligence but still end up with an outcome that nobody wants.

"Pretty good" governance failure is possible. We could end up with an outcome that many or most influential people want, but that wiser versions of ourselves would strongly disapprove of. This scenario is plausibly the default outcome of aligned superintelligence: great uses of power are a tiny subset of the possible uses of power, the people/institutions that currently want great outcomes constitute a tiny share of total influence, and neither will those who want non-great outcomes be persuaded nor will those who want great outcomes acquire much influence without us working to increase it.

The governance problem depends (in largely predictable ways) on various factors that we can affect before TAI. These include:

Ideas about what to do with powerful AI (among AI elites, elites in general, and people in general). For example, it would be good if people appreciated the value/responsibility of leaving our options open until we can wisely choose what to do with superintelligence. A popular mandate for more specific plans (long reflection or indirect normativity or something else) may be valuable.
The formal and informal constraints on AI research and development.
The level of international cooperation and competition for powerful AI.
Whether an AI constitution exists, and if so, what it looks like.

To improve our chances of achieving successful governance, we should think about what affects how superintelligence is used and how we can affect those factors, then do it.

Thanks to Daniel Kokotajlo for suggestions.

I am not aware of an existing name for the important problem of getting a superintelligence that does what its operator wants to do what is best. This problem roughly requires wisdom and caution to avoid locking in object-level values prematurely and coordination among people with influence over using superintelligence.

Nick Bostrom defined the "political problem," complementing the control problem, as "how to achieve a situation in which individuals or institutions empowered by such AI use it in ways that promote the common good." To the extent that value is binary, it matters less whether AI promotes the common good on net and more whether AI does astronomical good. To the extent that superintelligence (not previous AI) is all that matters after superintelligence exists, it only matters how we use the superintelligence. I assume Bostrom used this less carving-at-the-joints-y definition for simplicity and to decrease inferential distance for people outside the community; I'm pretty sure that my "governance problem" is closer to how we should be thinking about the problem of using AI well.

Will MacAskill once called some related issues the "second-level alignment problem," but it's not clear what exactly he meant.

Note that, roughly, P(win) = P(aligned powerful AI) * P(great use) = P(survive until powerful AI) * P(powerful AI is aligned) * P(great use). This suggests a decomposition of the problem of achieving an existential win into three subproblems: the survival problem, the alignment problem, and the governance problem. ↩︎
That is, some uses of superintelligence would have near-optimal expected value, where optimal expected value is roughly what we would achieve if we were thoughtful, wise, coordinated, and successful, by our standards. ↩︎
Acausal trade is a conceivable source of value that does not necessarily require our colonizing the universe well. But it is prima facie even more politically challenging. Regardless, the prospect of it and other speculative, potentially radically effective strategies gives us additional reason to increase our collective ability to do unintuitive things with superintelligence. ↩︎
These look similar in practice — rather than just telling the superintelligence to optimize for X, we'd probably have it tell us what optimizing for X would look like first, so we're effectively hearing the object-level way to optimize for X and then telling it to pursue that path. ↩︎
Similarly to note 4, the level number isn't really meaningful; it just matters that there's a chain of delegation that ends in something great. ↩︎
More uncertain scenarios would occur if (1) the controller of superintelligence does not make decisions in a predictable way (e.g., it's a group of states with different goals, or it's an international organization without a clear mandate for using superintelligence) or (2) there is a multipolar outcome of some sort — e.g., if there is slow takeoff (in particular, no threshold-y behavior) or superintelligence is not able to form a singleton. ↩︎
Since almost all of the resources eventually available to us involve colonizing the universe and it's prima facie unlikely that what sounds normal to current humans is optimal. ↩︎
I am sympathetic to long reflection but will not defend it here. I merely use it as a prima facie example of a system that could have the two necessary properties for successful governance: acceptability and great decisionmaking. ↩︎
Scott Alexander's Meditations on Moloch. While superintelligence could kill Moloch dead, Moloch might choose how we use it. That would be ironic. ↩︎
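
The decomposition in note 1 can be written out as a chain of conditional probabilities. This is only a sketch: it assumes each factor is read as conditional on the previous stage succeeding, which the note does not state explicitly, and the event labels S, A, and G are introduced here purely for illustration (S: we survive until powerful AI; A: powerful AI is aligned; G: it is put to great use).

```latex
\begin{align*}
P(\text{win}) &\approx P(S \cap A \cap G) \\
              &= P(S)\, P(A \mid S)\, P(G \mid S \cap A)
\end{align*}
```

Because the factors multiply, P(win) can be no larger than the smallest of them, so a low P(G | S ∩ A) caps the overall probability however well the survival and alignment problems go.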

MaxRa @ 2021-08-30T18:00 (+4)

Really interesting post, thanks! Some random reactions.

"Pretty good" governance failure is possible. We could end up with an outcome that many or most influential people want, but that wiser versions of ourselves would strongly disapprove of. This scenario is plausibly the default outcome of aligned superintelligence: great uses of power are a tiny subset of the possible uses of power, the people/institutions that currently want great outcomes constitute a tiny share of total influence, and neither will those who want non-great outcomes be persuaded nor will those who want great outcomes acquire much influence without us working to increase it.

My first gut reaction is skepticism that this is a likely or stable state. The Earthly utopia scenario will likely not happen given that seemingly most explorations of humanity's future prominently feature its expansion to space. Additionally, I suspect that a large fraction of people who seriously start thinking about the longterm future of humanity fall into the camp that you consider "people/institutions that currently want great outcomes." If this is true, one might suspect that this will become a much stronger faction and an aligned AI will have to consider those ambitions, too?

Robin Hanson speculated that the debate between people who want to use our cosmic endowment and those who want to stay local might be the cultural debate of the future. He calls it becoming grabby vs. non-grabby. He worries that a central government will try to restrict grabby expansion because it would be nearly impossible to keep the growing civilization under its control:

If within a few centuries we have a strong world government managing capitalist competition, overpopulation, value drift, and much more, we might come to notice that these and many other governance solutions to pressing problems are threatened by unrestrained interstellar colonization. Independent colonies able to change such solutions locally could allow population explosions and value drift, as well as capitalist competition that beats out home industries. That is, colony independence suggests unmanaged colony competition. In addition, independent colonies would lower the status of those who control the central government.
So authorities would want to either ban such colonization, or to find ways to keep colonies under tight central control. Yet it seems very hard to keep a tight lid on colonies. The huge distances involved make it hard to require central approval for distant decisions, and distant colonists can’t participate as equals in governance without slowing down the whole process dramatically. Worse, allowing just one sustained failure, of some descendants who get grabby, can negate all the other successes. This single failure problem gets worse the more colonies there are, the further apart they spread, and the more advanced technology gets.

https://www.overcomingbias.com/2021/07/the-coming-cosmic-control-conflict.html

I'm kind of sceptical that the desire to have absolute control would be strong enough to stamp down any expansionary and exploratory ambitions. I suspect that humans and institutions will converge considerably towards the "making most of our endowment" stance. With increasing wealth, we will learn more about how much value we will be able to create and how much more value is possible compared to our prosaic imaginations, so an aligned AI will also work towards helping us achieve those sooner or later.

Zach Stein-Perlman @ 2021-08-30T22:30 (+3)

Thanks for your comments!

My first gut reaction is skepticism that [a "pretty good" scenario] is a likely or stable state.

I certainly agree that Earthly utopia won't happen; I just wrote that to illustrate how prosaic values would be disastrous in some circumstances. But here are some similar things that I think are very possible:

Scenarios where some choices that are excellent by prosaic standards unintentionally make great futures unlikely or impossible.
Scenarios where the choices that would tend to promote great futures are very weird by prosaic standards and fail to achieve the level of consensus necessary for adoption.

In retrospect, I should have thought and written more about failure scenarios instead of just risk factors for those scenarios. I expect to revise this post, and failure scenarios would be an important addition. For now, here's my baseline intuition for a "pretty good" future:

After an intelligence explosion, a state controls aligned superintelligence. Political elites
- are not familiar with ideas like long reflection and indirect normativity,
- do not understand why such ideas are important,
- are constrained from pursuing such goals (perhaps because opposed factions can veto such ideas), or
- do not get to decide what to do with superintelligence because the state's decisionmaking system is bound by prior decisions about how powerful AI should be used (either directly, by forbidding great uses of AI, or indirectly, by giving decisionmaking power to groups unlikely to choose a great future).
So the state initially uses AI in prosaic ways and, roughly speaking, thinks of AI in prosaic ways. I don't have a great model of what happens to our cosmic endowment in this scenario, but since we're at the point where unwise individuals/institutions are empowered, the following all feel possible:
- We optimize for something prosaic
- We lock in a choice that disallows intentionally optimizing for anything
- We enter a stable state in which we do not choose to optimize for anything

I don't have much to say about Hanson right now, but I'll note that a future that involves status-seeking humans making decisions about cosmic-scale policy (for more than a transition period to locking in something great) is probably a failure; success looks more like optimizing the universe.

I suspect that a large fraction of people who seriously start thinking about the longterm future of humanity fall into the camp that you consider "people/institutions that currently want great outcomes"

Historically, sure. But I think that's due to selection: the people who think about the longterm future are mostly rationalist/EA-aligned. I would be very surprised if a similar fraction of a more representative group had the wisdom/humility/whatever to want a great future, much less the background to understand why we even have a "pretty good" future problem.

If this is true, one might suspect that this will become a much stronger faction

I suspect that humans and institutions will converge considerably towards the "making most of our endowment" stance.

This would surprise me. I expect poor discourse (in the US, at least) about how to use powerful AI. In particular, I expect:

The discourse will focus on prosaic issues like privacy and the future of work.
People will assume that the universe-scale future looks like "humans flying around in spaceships" and debate what those humans should do (rather than "superintelligent von Neumann probes optimizing for something" and debate what they should optimize for — much less recognize that we shouldn't be thinking about what they should optimize for; we should delegate that decision to a better system than current human judgment).

(Also, your comment implies that aligned superintelligence will try to optimize for all humans' preferences. I would be surprised if this occurs; I expect aligned superintelligence to try to do what its controller says.)

I would be very excited to have a call to discuss this further. Please PM me if you're interested.

MaxRa @ 2021-08-30T18:13 (+3)

If we create aligned superintelligence, how we use it will involve political institutions and processes. Superintelligence will probably be controlled by a state or a group of states. This is more likely the more AI becomes popularly appreciated and the more legibly powerful AI is created before the intelligence explosion.

It seems really useful to me to understand better how likely it is that states will end up calling the shots. I wonder if there are potential options for big tech to keep sovereignty over AI. I'd suspect a company would prefer staying in control and will consider all options it has available. Just some random initial thoughts:

negotiate about moving headquarters with different countries to get as much autonomy as possible
somehow "diversify" across nations and continents and play them off against each other
construct advanced AI systems such that they only do a few narrow tasks that don't seem obviously urgent to be nationalized (something like Codex 3.0, or personal assistants, …) and internally assembling so much ability that they can resist nationalization?

Zach Stein-Perlman @ 2021-08-30T22:30 (+4)

It seems really useful to me to understand better how likely it is that states will end up calling the shots.

Yes, absolutely. I think this largely depends on the extent to which political elites appreciate AI's importance; I expect that political elites will appreciate AI and take action in a few years, years before an intelligence explosion. I want to read/think/talk about this.

While big tech companies will probably come up with more strategies, I'm skeptical about their ability to not be nationalized or closely supervised by states. In response to your specific suggestions:

I think states are broadly able to seize property in their territory. To secure autonomy, I think a corporation would have to get the government to legally bind itself. I can't imagine the US or China doing this. Perhaps a US corporation could make a deal with another government and move its relevant hardware to that state before the US appreciates AI or before the US has time to respond? That would be quite radical. Given the major national security implications of AI, even such a move might not guarantee autonomy. But I think corporations would probably have to move somehow to maintain autonomy if there was political will and a public mandate for nationalization.
I don't understand. But if the US and China appreciate AI's national security implications, they won't be distracted.
I don't understand "assembling . . . ability," but corporations intentionally making AI feel nonthreatening is interesting. I hadn't thought about this. Hmm. This might be a factor. But there's only so much that making systems feel nonthreatening can do. If political elites appreciate AI, then it won't matter whether currently-deployed AI systems feel nonthreatening: there will be oversight. It's also very possible that the US will have a Sputnik moment for AI and then there's strong pressure for a national AI project independent of the current state of private AI in the US.