Other-centered ethics and Harsanyi's Aggregation Theorem

By Holden Karnofsky @ 2022-02-02T03:21 (+59)

I'm interested in doing as much good as possible. This naturally raises the question of what counts as "good," and how (if at all) we can compare some "goods" to others. Is it better to help one person profoundly, or help a hundred people a little? To avert a death, or improve someone's quality of life? Is there any way to compare reducing the suffering of chickens on factory farms to reducing the suffering of humans?

This may seem like a bottomless pit of a question. There is a vast literature on ethics and ethical axiology, stretching back through millennia. But for me - and, I suspect, many other effective altruists - the space of options gets narrower if we add a basic guiding motivation: I want an approach to ethics that's based as much as possible on the needs and wants of others, rather than on my personal preferences and personal goals. To abbreviate, I want an other-centered ethics.

To elaborate a bit, I want to avoid things like:

This piece will lay out the case that the quest for an other-centered ethics leads naturally to utilitarian-flavored systems with a number of controversial implications. These include a number of views that are common in the effective altruism community but off-putting to many, including:

The arguments in this piece aren't original: they mostly derive from a 1955 argument now known as Harsanyi's Aggregation Theorem. Most of the piece will be examining the theorem's assumptions and implications, and most of what I'll say can be found in one paper or another. This piece differs from existing writings on the topic (as far as I can tell) in the following ways:

I think the idea of other-centered ethics has significant appeal, and it pulls me significantly toward some of the controversial positions associated with utilitarianism and effective altruism. I also think the idea has limitations, and I don't think it's realistic or a good idea to try to live entirely by (or even base one's philanthropy entirely on) the idea of other-centered ethics. In future pieces, I hope to explore what an other-centered ethics looks like, and key intuitions both for and against trying to live by it.

Brief summary of the rest of the piece:

Harsanyi's Aggregation Theorem (HAT)

In my view, the best case for putting heavy weight on utilitarian-flavored ethics - and taking on board some of the uncomfortable views associated with them - revolves around a theorem called Harsanyi's aggregation theorem (hereafter HAT), and related ideas.

HAT is a theorem connecting a particular set of assumptions about ethics (which I will enumerate and discuss below) to a conclusion (essentially that one should endorse an ethics that has a certain kind of utilitarian flavor).3 If you accept the assumptions, the conclusion is mathematically proven, so the big question is whether you accept the assumptions.

Below, I list the assumptions and conclusion at a high level; then link to appendices explaining the logic of how HAT connects the assumptions to the conclusion; then go through the assumptions in more detail. My goal here is to examine the assumptions at a high level and address what is and isn't attractive about them.

My presentation is somewhat impressionistic, and other papers (see previous footnote) would be better resources for maximally precise, technical presentations.

I think the assumptions should be quite appealing for one seeking an "other-centered ethics," though (a) there are other approaches to ethics under which they are much less appealing; (b) they leave many questions open.

My high-level gloss on the assumptions driving HAT

I would roughly say that HAT presents a compelling case for a form of utilitarianism, as long as you accept something reasonably close to the following:

  1. There is a particular set of persons that matters ethically, such that the goal of our ethics is to benefit these persons. (Example sets would be "All humans in the universe" or "all sentient beings").
  2. For each person, we can talk meaningfully about which states of the world are "better for them" than others.
    • This might map to which states of the world they personally prefer, or to which states of the world they would prefer under various idealized conditions, or something else.
    • It doesn't have to have anything to do with pleasure or happiness. We could, if we want, say a state of the world is "better for" person X iff it makes person X happier, but we don't need to, and I don't personally take this approach.
  3. In order to achieve a certain kind of consistency (more below) in our ethical decisions, we should use the expected utility framework to model both (a) the sense (from #2) in which some states of the world are better for person X than others; (b) the sense in which some outcomes are ethically better than others.
    • This assumption is not obviously appealing, but there is a deep literature, better explained elsewhere, that I think makes a strong case for this assumption.
    • The main way it will enter in: say that there are two different benefits a person might gain, benefit A and benefit B. Using the expected utility framework implies that:
      • One of the two benefits is “better” - more of a benefit - than the other, such that it would be ethically better for the person to gain that benefit. (Or the two are exactly equally good.)
      • If one of the benefits is better, then there’s some probability, P, such that a P probability of the better benefit is as good as a 100% probability of the worse benefit. For example, say that having a nice meal would be a benefit, and finding a long-lost family member would be a benefit. Then a nice meal is exactly as good as some chance - 1/1,000, or 1/10,000, or 1/100,000 - of finding a long-lost family member. Equivalently, finding a long-lost family member is “1,000 times (or 10,000 times, or 100,000 times) as good as a nice meal.” (A short sketch after this list illustrates this kind of tradeoff.)
      • Similar dynamics would apply when comparing one cost to another, or a benefit to a cost, though the details are a little different.
  4. If we're choosing between two states of the world, and one state is better for at least one person and worse for no one (in terms of expected utility as defined above), then our ethical system should always prefer that state.
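To make the arithmetic of assumption 3 concrete, here is a minimal sketch in Python, with made-up numbers: if finding a long-lost family member is treated as 10,000 times as good as a nice meal for this person, then a 1/10,000 chance of the former is exactly as good, in expected utility, as a guaranteed nice meal.

```python
# A minimal sketch of the tradeoff in assumption 3. The numbers are made up;
# utilities are on an arbitrary scale, and only the ratios matter here.
U_NICE_MEAL = 1.0          # hypothetical utility of a nice meal
U_FIND_FAMILY = 10_000.0   # hypothetical: 10,000 times as good as a nice meal

def expected_utility(prospects):
    """prospects: a list of (probability, utility) pairs whose probabilities sum to 1."""
    return sum(p * u for p, u in prospects)

# A 1/10,000 chance of finding the long-lost family member (and otherwise nothing)...
gamble = [(1 / 10_000, U_FIND_FAMILY), (1 - 1 / 10_000, 0.0)]
# ...is exactly as good, in expected utility, as a guaranteed nice meal.
certain_meal = [(1.0, U_NICE_MEAL)]

print(expected_utility(gamble), expected_utility(certain_meal))  # both 1.0
```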

(Note that I am not including the "impartiality" assumption or anything about interpersonal utility comparisons. I will discuss this later on. If you don't know what this means, ignore it.)

These assumptions can be formalized in different ways. But broadly speaking, once you embrace all of them, something like HAT can establish the conclusion: the goal of your ethics should be to achieve the state of the world that maximizes the expected value of X, where X is some weighted sum of each ethically significant person's utility ("utility" meaning "how good the state of the world is for that person").

That is, the goal of your ethics should be to maximize the numerical quantity:

X = U_1 * W_1 + U_2 * W_2 + … + U_N * W_N

Where U_1 represents person 1’s utility in a given state of the world (a number representing how good that state of the world is for person 1); W_1 represents the ethical “weight” you give person 1 (how much your ethical system values that person relative to others); and the other U’s and W’s are defined similarly, for each of the N persons.
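To make the conclusion concrete, here is a small Python sketch of what "maximize the expected value of a weighted sum of utilities" amounts to computationally. The utilities and weights are made up for illustration; nothing here tells you what the right weights are.

```python
# Sketch of the HAT conclusion: score each state of the world by
# X = U_1 * W_1 + U_2 * W_2 + ... + U_N * W_N, then maximize the expected value of X.
# All utilities and weights below are made up for illustration.

def weighted_sum(utilities, weights):
    """X for one state of the world: each person's utility times their ethical weight, summed."""
    return sum(u * w for u, w in zip(utilities, weights))

def expected_x(lottery, weights):
    """Expected value of X over a lottery: a list of (probability, utilities) pairs."""
    return sum(p * weighted_sum(us, weights) for p, us in lottery)

weights = [1.0, 1.0, 0.5]  # hypothetical ethical weights for persons 1, 2, 3

# Two candidate actions, each modeled as a lottery over states of the world.
action_a = [(1.0, [10, 20, 30])]                     # a certain outcome
action_b = [(0.5, [5, 5, 5]), (0.5, [40, 40, 40])]   # a 50/50 gamble

print(expected_x(action_a, weights), expected_x(action_b, weights))  # 45.0 56.25
# The ethics sketched here says to take action_b, since it has the higher expected X.
```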

This does not resolve which weight you should use for each person, but it establishes a basic form for your ethics - a weighted sum of utilities - and in my view, it means you're now likely to embrace many of the positions that utilitarianism is known, and criticized, for.

In particular, a corollary of HAT (and hence, derivable from the assumptions above) is the utility legions corollary: a given benefit/harm affecting N persons is N times as good (from an ethical perspective) as a given benefit/harm affecting one person (assuming that the one person, and the benefit they receive, is representative of the N). Hence, if there is a large enough set of persons who benefits (or anti-benefits) from a particular action, that alone can make the action ethically (un)desirable, outweighing all other considerations. I’ll elaborate on this below, in the section on utility legions.

The proof

If you want a short, technical proof of HAT, I recommend Part III of Harsanyi (1955), the paper that introduced it.

I've made a few attempts at a more intuitive walkthrough, though I'm not 100% satisfied with any of them. They are:

I also think it would be OK to simply take my word that the assumptions imply the conclusion, and move on. I don’t think much in particular hinges on understanding the “why” of HAT. And sometimes it’s most fruitful to understand a proof after one has a really clear picture of what assumptions are needed for the proof, and of what the most interesting/nonobvious/controversial implications of the proof are. The rest of the essay is about these topics, so it could make sense to simply read on, and check out the appendices once you have a clearer sense of what’s at stake.

Examining HAT's assumptions

Having given a high-level sketch, I'll now look more closely at each of the four assumptions driving HAT's result.

I think it's worth discussing each assumption in some depth. In technical discussions of HAT, there is often a lot of focus on formalizing each assumption, and most of the assumptions have several candidate formalizations. Here, I'll be using a different emphasis: discussing the high-level intuitions behind each assumption, noting some of the open questions each assumption raises, pointing to literature on these open questions and naming what I consider the most reasonable and common ways of answering them, and discussing how the assumption connects to (or in some cases seems to rely on) the idea of other-centered ethics.

In general, I think the assumptions have enough intuitive appeal that thinking through each of them is likely to make you put significant weight on some form of utilitarianism. However, I don't consider any of the assumptions to be wholly straightforward or unproblematic. I think they all raise some tough issues for further thought and study.

In my view, the assumptions that seem initially most unintuitive and hardest to swallow (#2 and #3 above, the ones comprising expected utility theory) are the ones I'm most inclined to hold onto, while the more innocuous sounding ones (#1 and #4 above) are the ones that my anti-utilitarian side sees as the weakest links.

The population of moral patients (assumption 1)

There is a particular set of persons that matters ethically, such that the goal of our ethics is to benefit these persons. (Example sets would be "All humans in the universe" or "All sentient beings.")

At first glance this assumption seems quite innocuous. It seems hard to get around this assumption if you want an "other-centered" ethics: it's essentially stating that there is a distinct set of "others" that your ethics is focused on.

However, in practice, I think a lot of the biggest debates over utilitarianism concern who exactly is in this "set of persons that matters ethically.”

All of the above points pertain to the "There is a particular set of persons that matters ethically" part of the assumption. "The goal of our ethics is to benefit these persons" has issues as well, mostly because it is entangled with the idea of "other-centered" ethics. I'll discuss this more under assumption 4.

In sum, there are a lot of questions to be answered about the "set of persons that matters ethically," and different utilitarians can take very different views on this. But utilitarians tend to assume that it's possible in principle to arrive at a good answer to "which set of persons matters ethically?", and while I don't think this is clearly right, I think it is a reasonable view and a good starting point.

Utility (assumption 2)

For each person, we can talk meaningfully about which states of the world are "better for them" than others.

I think this assumption is the source of significant confusion, which is why I thought it was worth splitting out from Assumption 3 below.

Early utilitarians proposed to base ethics on "pain and pleasure" (e.g., maximizing pleasure / minimizing pain). This approach is still a live one among utilitarians,6 and utilitarianism is sometimes seen as incompatible with ideas like "we should value things in life other than pain and pleasure."

But the version of HAT I've presented doesn't need to work with the idea of pain or pleasure - it can work with any sense in which one state of the world is "better for" person X than another.

The terms "well-being," "welfare" and "utility" are often used to capture the broad idea of "what is good for a person." To say that situation 1 has greater "utility" for person X than situation 2 is simply to say that it is, broadly speaking, a better situation for person X. Different people have different opinions of the best way to define this, with the main ones being:

The SEP article I linked to gives a good overview of these options, as does this excerpt from Reasons and Persons. But you need not choose any of them, and you could also choose some weighted combination. Whatever you choose as your framework for defining utility, if you take on board the other assumptions, HAT's conclusion will apply for maximizing total weighted utility as you define it.

For the rest of this piece, I will straightforwardly use the idea of "utility" in the following sense: to say that person X's utility is greater in situation 1 than in situation 2 means that situation 1 is better for person X than situation 2, in whichever of the above senses (or whatever other sense) you prefer.

Expected utility (assumption 3)

In order to achieve a certain kind of consistency in our ethical decisions, we should use the expected utility framework to model both (a) the sense (from #2) in which some states of the world are better for person X than others; (b) the sense in which some outcomes are ethically better than others.

In Appendix 3, expected utility mostly makes its appearance via multipliers used to compare two worlds - the idea that any given world can always be described as “___ times as good” as another (where ___ is sometimes less than 1 or negative).8 In Appendix 1, it makes its appearance via the unrealistic assumption that each of the people in the example “cares only about how much money they have, and they care linearly about it.”

The basic idea of expected utility theory is:

A terminology note: throughout this piece, I generally use the term "person-state" or the abbreviation "state" when I want to refer specifically to the state a particular person finds themselves in. I use the terms "states of the world," "world-states" and "worlds" to refer to configurations of person-states (if you know the state of the world, you know which person-state each person is in).

Of the four assumptions, I think this is the one that is hardest to grasp and believe intuitively. However, there is a significant literature defending the merits of the expected utility framework. A common theme in the literature is that any agent that can't be well-modeled using this framework must be susceptible to Dutch Books: hypothetical cases in which they would accept a series of gambles that is guaranteed to leave them worse off.

These arguments have been made in a number of different ways, sometimes with significant variations in how things like "situations" (or "events") are defined. I link to a couple of starting points for relevant literature here.

I want to be clear that expected utility does not seem like a very good way of modeling people's actual behavior and preferences. However, I think it is apt in an ethical context, where it makes sense to focus on idealized behavior:

Overall, I think there is a very strong case for using the expected utility framework in the context of forming an “other-centered ethics,” even if the framework has problems in other contexts.

Pareto efficiency (assumption 4)

If we're choosing between two states of the world, and one state is better for at least one person and worse for no one (in terms of expected utility as defined above), then our ethical system should always prefer that state.

I think this idea sounds quite intuitive, and I think it is in fact a very appealing principle.

However, it's important to note that in taking on this assumption, you're effectively expunging everything from morality other than the individual utility functions. (You could also say assumption #1 does this, but I chose to address the matter here.)

For example, imagine these two situations:

  1. Person A made a solemn promise to Person B.
    • (1a) If they break the promise, Person A will have U(A) = 50, and Person B will have U(B) = 10.
    • (1b) If they keep the promise, Person A will have U(A) = 10, and Person B will have U(B) = 20.
  2. Person A and B happen across some object that is more useful to Person A than to Person B. They need to decide who gets to keep the object.
    • (2a) If it's Person A, then Person A will have U(A) = 49.9, and Person B will have U(B) = 10.
    • (2b) If it's Person B, then Person A will have U(A) = 10.1, and Person B will have U(B) = 20.

Pareto efficiency says that (2b) is better than (1b) (since it is better for Person A, and the same for Person B) and that (2a) is worse than (1a) (same reason). So if you prefer (2a) to (2b), it seems you ought to prefer (1a) to (1b) as well: chaining the comparisons together, (1a) beats (2a) (by Pareto), you prefer (2a) to (2b), and (2b) beats (1b) (by Pareto).

What happened here is that Pareto efficiency essentially ends up forcing us to disregard everything about a situation except for each person's resulting expected utility. Whether the utility results from a broken promise or from a more "neutral" choice can't matter.
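To see how little Pareto efficiency lets us look at, here is a short sketch using the numbers from the example above, with each case reduced to a pair of utilities:

```python
# The four cases from the example, reduced to (U(A), U(B)) pairs -- which is all
# that the Pareto comparison is allowed to see.
cases = {
    "1a": (50.0, 10.0),   # promise broken
    "1b": (10.0, 20.0),   # promise kept
    "2a": (49.9, 10.0),   # object goes to Person A
    "2b": (10.1, 20.0),   # object goes to Person B
}

def pareto_better(x, y):
    """True if state x is at least as good for everyone as state y, and strictly better for someone."""
    return all(a >= b for a, b in zip(x, y)) and any(a > b for a, b in zip(x, y))

print(pareto_better(cases["2b"], cases["1b"]))  # True: (2b) dominates (1b)
print(pareto_better(cases["1a"], cases["2a"]))  # True: (1a) dominates (2a)
# So a preference for (2a) over (2b), chained with these two dominances, commits
# us to preferring (1a) over (1b) -- promise or no promise.
```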

This is pretty much exactly the idea of "other-centered ethics" - we shouldn't be considering things about the situation other than how it affects the people concerned. And because of this, many people will bite the above bullet, saying something like:

If the four cases were really exactly as you say with no further consequences, then I would in fact prefer Case (1a) to Case (1b), just as I prefer Case (2a) to Case (2b). But this is a misleading example, because in real life, breaking a promise has all sorts of negative ripple effects. Keeping a promise usually will not have only the consequences in these hypotheticals - it is usually a good thing for people. But keeping a promise is not intrinsically valuable, so if it did have only these consequences, it would be OK to break the promise.

I don't think this response is obviously wrong, but I don't think it's obviously right either.

I'll discuss these themes more in a later piece. For now, I'll conclude that the "Pareto efficiency" assumption seems like a good one if your goal is other-centered ethics, while noting that it does have some at least potentially uncomfortable-seeming consequences.

Impartiality, weights and interpersonal comparisons

I stated above:

These assumptions can be formalized in different ways. But broadly speaking, once you embrace all of them, something like HAT can establish the conclusion: the goal of your ethics should be to achieve the state of the world that maximizes the expected value of X, where X is some weighted sum of each ethically significant person's utility ("utility" meaning "how good the state of the world is for that person").

This does not resolve which weight you should use for each person, but it establishes a basic form for your ethics - a weighted sum of utilities - and in my view, it means you're now likely to embrace many of the positions that utilitarianism is known, and criticized, for.

The question of what weight to use for each person is quite thorny, though I don't consider it central to most of the points in this piece. Here I'll cover it briefly.

In Appendix 1, I invoke a case where it implicitly/intuitively seems obvious that one should "weigh everyone equally." I talk about a choice between four actions:

In this case, it seems intuitive that all of these actions are equally good. We haven't given any info to distinguish the persons or their valuations of money.

The idea of "weighing everyone equally" is often invoked when discussing HAT (including by Harsanyi himself). It's intuitive, and it leads to the appealingly simple conclusion that actions can be evaluated by their impact on the simple, unweighted sum of everyone's utilities - this is the version of utilitarianism that most people are most familiar with. But there is a (generally widely recognized) problem.9

The problem is that there are no natural/obvious meaningful "units" for a person's utility function.

Imagine that we're asking, "Which is better, the situation where person A has utility 10 and person B has utility 20, or the situation where person A has utility 25 and person B has utility 10?"

If we're trying to "weigh everyone equally" we might say that the second situation is better, since it has higher total utility. But we could just as easily have asked, "Which is better, the situation where person A has utility 10 and person B has utility 200, or the situation where person A has utility 25 and person B has utility 100?" Multiplying all of B’s utilities by 10 doesn’t matter in any particular way, as far as the expected utility framework is concerned. But it leads us to the opposite conclusion.
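Here is the same point as a quick sketch, using the numbers above: rescaling person B's utility function, a transformation the expected utility framework treats as meaningless, flips the "weigh everyone equally" verdict.

```python
# Situation 1: A has utility 10, B has 20.  Situation 2: A has 25, B has 10.
situation_1 = {"A": 10, "B": 20}
situation_2 = {"A": 25, "B": 10}

def unweighted_total(situation):
    return sum(situation.values())

# "Weigh everyone equally" on the original scale: situation 2 wins (35 vs. 30).
print(unweighted_total(situation_1), unweighted_total(situation_2))

def rescale_b(situation, factor=10):
    """Multiply B's utilities by a constant -- an equally valid representation of B's preferences."""
    return {"A": situation["A"], "B": situation["B"] * factor}

# On the rescaled numbers, the same "equal weights" rule says situation 1 wins (210 vs. 125).
print(unweighted_total(rescale_b(situation_1)), unweighted_total(rescale_b(situation_2)))
```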

So how do we decide what utility function to use for each person, in a way that is "fair" and allows us to treat everyone equally? There are a couple of basic approaches here:

Interpersonal comparisons and impartiality. Some utilitarians hold out hope that there are some sort of "standardized utility units" that can provide a fair comparison that is valid when comparing one person to another. There would need to be some utility function that fulfills the requirements of expected utility theory (e.g., whenever the utility of a person in situation 1 is higher than their utility in situation 2, it means situation 1 is better for them than situation 2), while additionally being in units that can be fairly and reasonably compared across persons.

If you assume that this is the case, you can stipulate an "impartiality" assumption in addition to the assumptions listed above, leading to a version of HAT that proves that actions can be evaluated by their impact on the simple, unweighted sum of everyone's utilities. This is the version of HAT covered in Nick Beckstead's "intuitive explanation of Harsanyi's Aggregation Theorem" document.

You might end up effectively weighing some persons less than others using this approach. For example, you might believe that chickens simply tend to have fewer "standardized utility units" across the board. (For example, under hedonism - discussed next - this would mean that they experience both less pleasure and less pain than humans, in general.)

A common way of going for "standardized utility units" is to adopt hedonism, briefly discussed above: "situation 1 is better for person X than situation 2 (that is, it gives X higher utility) if and only if it gives them a more favorable balance of pleasure and pain." The basic idea here is that everything that is morally relevant can, at least in principle, be boiled down to objective, standardized quantities of pain and pleasure; that the entire notion of a situation being "good for" a person can be boiled down to these things as well; and hence, we can simply sum the pain and pleasure in the world to get total utility. Hedonism has issues of its own (see SEP article), but it is a reasonably popular variant of utilitarianism, as utilitarianisms go. (I'm personally not a big fan.)

Another potential approach is to use the "veil of ignorance" construction, introduced by Harsanyi 1953 and later popularized and reinterpreted by Rawls: "imagine you had to decide how to structure society from behind a veil of ignorance. Behind this veil of ignorance, you know all the facts about each person’s circumstances in society—what their income is, how happy they are, how they are affected by social policies, and their preferences and likes. However, what you do not know is which of these people you are. You only know that you have an equal chance of being any of these people. Imagine, now, that you are trying to act in a rational and self-interested way—you are just trying to do whatever is best for yourself. How would you structure society?"11

As Harsanyi (1955) suggests, we could imagine that whenever a person makes decisions based on how they themselves would set up the world under the veil of ignorance, that person is correctly following utilitarianism. That said, it's not clearly coherent to say that the person making the decision could be randomly assigned to "be another person," or that they could reasonably imagine what this would be like in order to decide how to set up the world. It’s particularly unclear how one should think about the probability that one would end up being a chicken, given that chickens are far more numerous than humans, but may be less ethically significant in some other respect.

The veil of ignorance is generally a powerful and appealing thought experiment, but I expect that different people will have different opinions on the extent to which this approach solves the "interpersonal comparisons of utility" problem.

A final approach could be to use variance normalization to create standardized units for people's utilities. I won't go into that here, but I think it's another approach that some will find satisfying and others won't.
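For readers who want a concrete picture, here is one simple way a variance-normalization scheme could be set up. This is my own illustrative sketch with made-up numbers, not a claim about any particular proposal: rescale each person's utilities so they have the same spread across the outcomes under consideration, then sum the rescaled numbers.

```python
import statistics

def normalize(utilities_across_outcomes):
    """Rescale one person's utilities to have a standard deviation of 1 across the outcomes."""
    sd = statistics.pstdev(utilities_across_outcomes)
    return [u / sd for u in utilities_across_outcomes]

# Hypothetical utilities of persons A and B in three candidate outcomes (each on
# that person's own arbitrary scale).
a = [10, 25, 40]
b = [200, 100, 150]

a_norm, b_norm = normalize(a), normalize(b)
totals = [x + y for x, y in zip(a_norm, b_norm)]
print(totals)  # under this scheme, the outcome with the highest total is "best"
```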

Under the latter two approaches above (and sometimes/perhaps the first as well?), some people will want to diverge from equal weighting in order to accomplish things like weighing animals' interests less heavily than humans', potential people's interests less heavily than existing people's, etc. But the default/starting point seems to be counting everyone the same, once you have standardized utility units that you're happy with.

Utilitarianism with "unjustified" weights. Another approach you can take is to accept that there is no truly principled way to make interpersonal utility comparisons - or at least no truly principled way that our current state of knowledge has much to say about - and try to maximize the weighted sum of utilities using weights that:

I personally tend toward this approach. I don't think we have any satisfying standardized units of utility, and I don't think it matters all that much: when we consider the real-world implications of utilitarianism, we are rarely literally writing down utilities. As I'll discuss lower down, most of the controversial conclusions of utilitarianism seem like they would arise from a wide range of possible choices of weights, indeed from basically any set of weights that seems "ballpark reasonable."

Quick recap

I'm going to quickly recap the assumptions that power HAT, and the conclusion it arrives at, now that I've talked them all through in some detail.

Assumptions:

1 - There is a particular set of persons that matters ethically, such that the goal of our ethics is to benefit these persons. (Example sets would be "All humans in the universe" or "all sentient beings.")

There is a lot of ambiguity about how to choose this set of persons. Particularly challenging conundra arise around how to value animals, computer programs, and "potential lives" (persons who may or may not come to exist, depending on our actions). And we get into real trouble (breaking HAT entirely) if the set of persons that matters ethically is infinite. As I'll argue later, I think the idea of "other-centered ethics" has its biggest problems here. On the other hand, there are plenty of times when you're basically sure that some particular set of persons matters ethically (even if you're not sure that there aren't others who matter as well), and you can (mostly) proceed.

2 - For each person, we can talk meaningfully about which states of the world are "better for them" than others.

There are debates to be had about what makes some states of the world "better for" a person than others. You can define "better for" using the idea of pleasure and pain (hedonism, the original version of utilitarianism); using the idea of preferences or idealized preferences (what states of the world people would prefer to be in); or using other criteria, such as considering certain types of lives better than others regardless of how the persons living them feel about the matter. Or you can use a mix of these ideas and others.

3 - We should use the expected utility framework to model both (a) the sense (from #2) in which some states of the world are better for person X than others; (b) the sense in which some outcomes are ethically better than others.

The basic intuition here is that our ethics shouldn't be subject to "Dutch books": it shouldn't be possible to construct a series of choices such that our ethics predictably makes the persons we are trying to benefit worse off. If you accept this, you will probably accept the use of the expected utility framework. The framework has many variants and a large literature behind it.

4 - If we're choosing between two states of the world, and one state is better for at least one person and worse for no one (in terms of expected utility as defined above), then our ethical system should always prefer that state.

This is the "Pareto efficiency" idea that makes sure our ethics is all about what's good for others. It ends up stripping out considerations like "I need to keep my promises" unless they can be connected to benefiting others.

Conclusion: the goal of your ethics should be to achieve the state of the world that maximizes the expected value of X, where X is some weighted sum of each ethically significant person's utility.

The weight you use for each person represents how much your ethical system "values their value": how good it is for the world for them to be better off. One way to think about it is as the value your ethical system places on their getting some particular benefit (such as $5), divided by the value they themselves place on getting that benefit. The weight you use may incorporate considerations like "how much pleasure and pain they experience," "how much they count as a person," and adjustments to make their units of value fairly comparable to other people. I don't think there is any really robust way of coming up with these weights, but most of the controversial conclusions of utilitarianism seem like they would arise from a wide range of possible choices of weights, indeed from basically any set of weights that seems "ballpark reasonable."

HAT and the veil of ignorance

The "veil of ignorance" idea mentioned briefly above can in some sense be used as a shorthand summary of HAT:

When deciding what state of the world it is ethically best to bring about, you should ask yourself which state of the world you would selfishly choose, if you were (a) going to be randomly assigned to be some person under that state of the world, with exactly the benefits and costs they experience in that state of the world; (b) determined to maximize your own expected utility.

This implicitly incorporates all of the assumptions above:

The veil of ignorance idea as stated above implies utilitarianism as surely as HAT does, and if you find it intuitive, you can often use it as a prompt to ask what a utilitarian would do. (That said, the debate between utilitarians and Rawls shows that there can be quite a bit of disagreement about how to interpret the veil of ignorance idea.)
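If you do think you have comparable units, the veil of ignorance boils down to a simple computation: being assigned uniformly at random to one of N persons means your selfish expected utility in a state of the world is just the average of everyone's utilities, which ranks states exactly as an unweighted sum does. A sketch with made-up numbers:

```python
# Veil-of-ignorance sketch (made-up utilities, assuming units comparable across persons):
# behind the veil you have an equal chance of being each person, so your selfish
# expected utility in a state of the world is the average of everyone's utilities.

def value_behind_the_veil(state):
    """Expected utility of being a uniformly random person in this state of the world."""
    return sum(state) / len(state)

state_x = [30, 30, 30]   # everyone does OK
state_y = [10, 40, 50]   # more unequal, but higher total utility

print(value_behind_the_veil(state_x), value_behind_the_veil(state_y))  # 30.0 vs. ~33.3
# A self-interested chooser behind the veil picks state_y -- the same verdict as
# the unweighted-sum version of utilitarianism.
```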

What you commit to when you accept Harsanyi's Aggregation Theorem

Let's say that you are committed to other-centered ethics, and you accept the "conclusion" I've outlined implied by HAT:

The goal of your ethics should be to achieve the state of the world that maximizes the expected value of X, where X is some weighted sum of each ethically significant person's utility.

Practically speaking, what does this mean you have committed to? At first glance, it may seem the answer is "not much":

These points initially appear to make the conclusion of HAT relatively uninteresting/unimportant, as well as uncontroversial. Most defenses of utilitarianism don't lean solely on HAT; they also argue for particular conceptions of utility, impartiality, etc. that can be used to reach more specific conclusions.

However, I believe that accepting HAT's conclusion does - more or less - commit your ethical system to follow some patterns that are not obvious or universal. I believe that these patterns do, in fact, account for most of what is interesting, unusual, and controversial in utilitarianism broadly speaking. In what follows, I will use "utilitarianism" as shorthand simply for "the conclusion of HAT," and I will discuss the key patterns that utilitarianism seems committed to.

  1. Utilitarianism implies consequentialism. When everyone agrees that Action 1 would lead to a better state of the world than Action 2, utilitarianism says to take Action 1. Historically, I think this has been the biggest locus of debate over utilitarianism, and a major source of the things utilitarians can boast about having been ahead of the curve on.
  2. Utilitarianism allows "utility monsters" and "utility legions." A large enough benefit to a single person (utility monster), or a benefit of any size to a sufficiently large set of persons (utility legion), can outweigh all other ethical considerations. Utility monsters seem (as far as I can tell) to be a mostly theoretical source of difficulty, but I think the idea of a "utility legion" - a large set of persons such that the opportunity to benefit them outweighs all other moral considerations - is the root cause of most of what's controversial and interesting about utilitarianism today, at least in the context of effective altruism.

Consequentialism

"Consequentialism" is a philosophy that says actions should be assessed by their consequences. SEP article here.14

Imagine we are deciding between two actions, Action 1 and Action 2. We have no doubt that Action 1 will result in a better end-state for the world as a whole, but some people are hesitating to take Action 1 because (a) it violates some rule (such as "don't exchange money for sacred thing X" or "don't do things you committed not to do"); (b) it would be a break with tradition, would challenge existing authorities, etc.; (c) it inspires a "disgust" reaction (e.g.); (d) something else. Any consequentialist philosophy will always say to take Action 1.15

Consequentialism is a broader idea than utilitarianism. For example, you might think that the best state of the world is one in which there is as much equality (in a deep sense of "equality of goodness of life") as possible, and you might therefore choose a state of the world in which everyone has an OK life over a state of the world in which some people have great lives and some have even better lives. That's not the state of the world that is best for any of the individual people; preferring it breaks Pareto efficiency (Assumption 4 above) and can't be consistent with utilitarianism. But you could be a consequentialist and have this view.

That said, utilitarianism and consequentialism are pretty tightly linked. For example, here are some excerpts from the SEP article on the history of utilitarianism:

The Classical Utilitarians, Bentham and Mill, were concerned with legal and social reform. If anything could be identified as the fundamental motivation behind the development of Classical Utilitarianism it would be the desire to see useless, corrupt laws and social practices changed. Accomplishing this goal required a normative ethical theory employed as a critical tool. What is the truth about what makes an action or a policy a morally good one, or morally right? But developing the theory itself was also influenced by strong views about what was wrong in their society. The conviction that, for example, some laws are bad resulted in analysis of why they were bad. And, for Jeremy Bentham, what made them bad was their lack of utility, their tendency to lead to unhappiness and misery without any compensating happiness. If a law or an action doesn't do any good, then it isn't any good ...

This cut against the view that there are some actions that by their very nature are just wrong, regardless of their effects. Some may be wrong because they are ‘unnatural’ — and, again, Bentham would dismiss this as a legitimate criterion ... This is interesting in moral philosophy — as it is far removed from the Kantian approach to moral evaluation as well as from natural law approaches. It is also interesting in terms of political philosophy and social policy. On Bentham's view the law is not monolithic and immutable. ...

My impression is that while consequentialism is still debated, the world has moved a lot in a consequentialist direction since Bentham's time, and that Bentham is generally seen as having been vindicated on many important moral issues. As such, with the above passage in mind, it seems to me that a good portion of historical debates over utilitarianism have actually been about consequentialism.

Consequentialism is still not an uncontroversial approach to ethics, and I don't consider myself 100% consequentialist. But it's a broadly appealing approach to ethics, and I put a lot of weight on consequentialism. I also generally feel that utilitarianism is the best version of consequentialism, mostly due to HAT.

Utility monsters and utility legions

As far as I can tell, the original discussion of "utility monsters" is a very brief bit from Anarchy, State and Utopia:

Utilitarian theory is embarrassed by the possibility of utility monsters who get enormously greater gains in utility from any sacrifice of others than these others lose. For, unacceptably, the theory seems to require that we all be sacrificed in the monster’s maw, in order to increase total utility.

Here I'm also going to introduce the idea of a "utility legion." Where a utility monster is a single person capable of enough personal benefit (or cost) to swamp all other ethical considerations, a utility legion is a set of persons that collectively has the same property - not because any one person is capable of unusual amounts of utility, but because the persons in the "legion" are so numerous.

I believe that both utility monsters and utility legions have the following noteworthy properties:

1 - They cause utilitarianism to make clearly, radically different prescriptions from other approaches to ethics.

A utility monster or utility legion might be constituted such that they reliably and linearly benefit from some particular benefit, and therefore ethics could effectively be reduced to supplying them with that benefit. For example, if a utility monster or legion reliably and linearly benefits from pizza, ethics could be reduced to supplying them with pizza. This would be a radically unfamiliar picture of ethics, almost no matter what we replaced "pizza" with.

An especially discomfiting aspect is the breaking of reciprocity.

As I'll discuss below, real-life utility legions do in fact seem to have the sort of properties discussed above. They tend to have "simple enough" utility functions - and to present low enough odds of reciprocating - to make the moral situation feel quite unfamiliar. Part of the reason I like the term "utility legion" is its association with homogeneity and discipline: what makes utility-legion-dominated morality unfamiliar is partly the fact that there are so many persons who benefit in a seemingly reliable, linear way from particular actions.

2 - Utilitarianism requires that they be possible. I explain the math behind this in a footnote.17

Once you've committed to maximize a weighted sum of utilities, you've committed to the idea that utility monsters and utility legions are possible, as long as the monster has enough utility or the legion has enough persons in it.

In real life, there is going to be some limit on just how much utility/benefit a utility monster can realize, and on just how many persons a utility legion can have. But if you're trying to follow good-faith other-centered ethics, I think there also tend to be limits on how low you can set a person's "ethical weight" (the weight applied to their utility function), assuming you've decided they matter at all. With both limits in mind, I think utilitarians are generally going to feel a significant pull toward allowing the welfare of a utility legion to outweigh all other considerations.
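The arithmetic behind this pull is simple: once each member of the legion has some nonzero ethical weight and gains some nonzero benefit, the legion's contribution to the weighted sum grows linearly with its size, so some finite size always suffices to exceed any fixed quantity on the other side of the ledger. A sketch with made-up numbers:

```python
# With a weighted sum of utilities, the legion's side of the ledger grows linearly
# in its size. All of the numbers below are made up for illustration.
WEIGHT_PER_MEMBER = 0.01               # hypothetical floor on how little each member can count
BENEFIT_PER_MEMBER = 1.0               # hypothetical small benefit to each member
COMPETING_CONSIDERATION = 1_000_000.0  # hypothetical value of everything else at stake

def legion_value(n_members):
    """The legion's total contribution to the weighted sum of utilities."""
    return n_members * WEIGHT_PER_MEMBER * BENEFIT_PER_MEMBER

# Keep growing the legion until its benefit outweighs the competing consideration.
n = 1
while legion_value(n) <= COMPETING_CONSIDERATION:
    n *= 2
print(n, legion_value(n))  # some finite legion size is always enough
```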

The next section will give examples.

Examples of utility legions

As far as I can tell, the "utility monster" is ~exclusively an abstract theoretical objection to utilitarianism. I don't know of controversial real-world utilitarian positions that revolve around being willing to sacrifice the good of the many for the good of a few.18

Utility legions are another story. It seems to me that the biggest real-world debates between utilitarians (particularly effective altruists) and others tend to revolve around utility legions: sets of persons who can benefit from our actions in relatively straightforward and linear ways, with large enough populations that their interests could be argued to dominate ethical considerations.

In particular:

The scope of utilitarianism

The previous section talks about utility legions' coming to "dominate" ethical considerations. Does this mean that if we accept utilitarianism, we need to be absolutist, such that every ethical rule (even e.g. "Keep your promises") has to be justified in terms of its benefit for the largest utility legion?

It doesn't. We can believe that the "other-centered" approach of utilitarianism is the only acceptable one for some sorts of decisions, while believing that other ethical systems carry serious weight in other sorts of decisions.

An example of a view along these lines would run as follows (I have a lot of sympathy with this view, though not total sympathy):

Conclusion: Thin Utilitarianism

There are many versions of utilitarianism, and many questions (e.g.) that this piece hasn't gone far into. But in my view, the most interesting and controversial real-world aspect of utilitarianism - the way it handles "utility legions" - does not require a fully fleshed out ethical theory, and can be supported based on simple assumptions consonant with the idea of other-centered ethics.

While I do have opinions on what fleshed-out version of utilitarianism is best, I don't feel very compelled by any of them in particular. I feel far more compelled by what I call thin utilitarianism, a minimal set of commitments that I would characterize as follows:22

Thanks to Nick Beckstead, Luke Muehlhauser and Ajeya Cotra for detailed comments on earlier drafts.

Appendix 1: a hypothetical example of HAT

The logic of HAT is very abstract, as you can see from my attempt in Appendix 3 to make it intuitive without getting bogged down in mathematical abstractions. I'd love to be able to give a “real-world example” that illustrates the basic principles at play, but it's hard to do so. Here's an attempt that I think should convey the "basic feel" of how HAT works, although it is dramatically oversimplified (which includes adding in some assumptions/structure that isn't strictly included in the HAT assumptions above).

Imagine that there are four people: A, B, C, and D. Unusually (and unrealistically), each of these people cares only about how much money they have, and they care linearly about it. That means:

Let's also take on board a blanket "we're in a vacuum" type assumption: your choice won't have any important effects in terms of incentives, future dynamics, etc.

Now you are deciding between two actions, Action 1 and Action 2. If you take Action 1, it will result in the following distribution of wealth:

If you take Action 2, it will result instead in:

Which action is it ethically better to take?

You might naively think that Action 1 looks more appealing, since it is more egalitarian. However:

Consider Action 1.5, which results in a randomly selected person getting $6000, and each of the others getting $1000. So it has a 25% chance of having the same effect as Action 2, a 25% chance of having a similar effect but benefiting person A instead, etc.

Now consider:

If you accept this claim about Action 1.5, should you accept it about Action 2 as well? I would basically say yes, unless for some reason your ethics values Person D less than (at least) one of the other persons.

There are a couple of ways of thinking about this:

If you accept this reasoning that Action 2 is better, or at least more "other-centered," than Action 1, similar logic can be used to argue for the superiority of any action that leads to higher total wealth (across the four people) than its alternative.
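To spell out the expected-value arithmetic (using the example's stipulation that each person cares only, and linearly, about their wealth), here is a short sketch of the prospect each of the four people faces under Action 1.5:

```python
# Under the example's stipulation that each person cares only -- and linearly --
# about how much money they have, we can treat a person's utility as simply their
# expected dollars.

def expected_wealth(prospects):
    """prospects: a list of (probability, dollars) pairs."""
    return sum(p * d for p, d in prospects)

# Action 1.5: one randomly selected person (out of four) gets $6,000; everyone else
# gets $1,000. From each person's own point of view, that is the same prospect:
action_1_5_for_each_person = [(0.25, 6000), (0.75, 1000)]
print(expected_wealth(action_1_5_for_each_person))  # 2250.0

# Action 2 produces the same total wealth as Action 1.5, just with the $6,000 going
# to a particular person rather than a random one. Under linear valuation, each step
# of the argument in the text comes down to comparing expected dollars like this.
```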

How this example is unrealistic

I believe the above example is substantially "powered" by this bit:

Unusually (and unrealistically), each of these people cares only about how much money they have, and they care linearly about it. That means:

Let's also take on board a blanket "we're in a vacuum" type assumption: your choice won't have any important effects in terms of incentives, future dynamics, etc.

This set of assumptions is a simplification of assumptions #2 and #3 above.

As an aside, the above example also implicitly contains assumption #1 (since it focuses on a particular, fixed set of four people) and assumption #4 (this is just the part where we assume that any action preferred by everyone affected is the right, or at least "other-centered," action). But I think in the context of the example, those assumptions seem pretty innocuous, whereas the above bit seems unrealistic and could be said to diverge significantly from real life. I'll talk about why each of the above assumptions is unrealistic in order.

Linear valuation of wealth. First, in real life, people generally don't value wealth linearly. Their valuation might be closer to "logarithmic": going from $1000 to $2000 in wealth might be a similar change (from a quality of life perspective) to going from $10,000 to $20,000 or $100,000 to $200,000.

If I had wanted to make the example more realistic and intuitive from this perspective, I might have dropped the statement that people care "linearly" about wealth, and changed the action consequences as follows:

Now you are deciding between two actions, Action 1 and Action 2. If you take Action 1, it will result in the following distribution of wealth:

If you take Action 2, it will result instead in:

If you felt unconvinced of the superiority of Action 2 before, and feel more convinced in this case, then you're probably not thinking abstractly enough - that is, you're probably importing a lot of intuitions from real life that are supposed to be assumed away in the example.

HAT itself doesn't talk about money. It talks about "utility" or "welfare," which are intended to abstractly capture what is "good for" the affected persons. From a "utility" or "welfare" perspective, the advantage of Action 2 actually looks smaller in the example given directly above than in the original example I gave. This is because the original example stipulated that people care linearly and exclusively about their wealth, whereas the above example takes place in the real world.

So while the original example I gave isn't realistic, I don't think the lack of realism is a huge problem here. It's just that HAT relies on reasoning about some abstract notion of "how good a situation is for a person" that can be treated linearly, and that notion might not translate easily or intuitively to things we're used to thinking about, such as wealth.

One-dimensional values. The example stipulates that the people involved care only about their wealth level. That's a very unusual condition in the real world. I think a lot of what feels odd about the example above is that Action 2 feels unfair and/or inegalitarian, and in real life, at least some of the people involved would care a lot about that! It's hard to get your head around the idea of Person A not caring at all about the fact that Person D has 6x as much wealth as them, for example; and it's relatively easy to imagine a general rule of "maximize total wealth" being exploited and leading to unintended consequences.

Again, HAT itself looks to abstract this sort of thing away through its use of expected utility theory. We should maximize not wealth, but some abstract quantity of "utility" or "welfare" that incorporates things like fairness. And we should maximize it in the long run across time, not just for the immediate action we're taking.

Moving from "maximizing the wealth of persons who care linearly and exclusively about wealth" to "maximizing the 'utility' of persons who care about lots of things in complicated ways" makes HAT much harder to think intuitively about, and harder to apply to specific scenarios. (It also makes it more general.)

But the basic dynamics of the example are the dynamics of HAT: once you perform this abstraction, the principle "take any action that everyone involved would prefer you take" can be used to support the principle "maximize the total utility/welfare/goodness (weighted by how much your ethical system values each person) in a situation."

"Take any action that everyone involved would prefer you take" sounds like a relatively innocuous, limited principle, but HAT connects it to "maximize total (weighted) utility," which is a very substantive one and the key principle of utilitarianism. In theory, it tells you what action to take in every conceivable situation.

Appendix 2: the utility legions corollary

This appendix tries to give an intuitive walkthrough for how the assumptions listed above can be used to support the utility legions corollary:

A given benefit/harm affecting N persons is N times as good (from an ethical perspective) as a given benefit/harm affecting one person (assuming that the one person, and the benefit they receive, is representative of the N). Hence, if there is a large enough set of persons who benefits (or anti-benefits) from a particular action, that alone can make the action ethically (un)desirable, outweighing all other considerations.

I will focus on demonstrating the second part: that a large enough set of beneficiaries can outweigh all other considerations. The first part (the "N times as good" claim) requires more mathematical precision, and if you want support for it you are probably better off with the normal proof of HAT or the walkthrough given in Appendix 3.

Imagine two persons: A and B. Person A is part of a "utility legion" - for example, they might be:

Person B is any other person, not included in the "utility legion."

We start from the premise that Person A's interests matter ethically. Because of this, if we are contemplating an action with some noticeable benefit to person A, there is some - even if very small - amount of harm (or benefit missed out on) for B that could be outweighed by the benefit to person A.

By "outweighed," I mean that it could be actively good (ethically) to take an action that results in some (potentially small) amount of harm to B, and the noticeable benefit to A, rather than sticking with the status quo.

Now imagine a large harm to person B. In line with Assumption 3, there is some number K such that "a 1/K probability of the large harm" is better for Person B than "a 100% chance of the smaller harm."

The 1/K chance of the large harm is better for person B than definitely suffering the small harm. Therefore, if we are only choosing between a world where "person A gets a small benefit, and person B gets a smaller harm" and a world where "person A gets a small benefit, and person B gets a 1/K chance of a large harm," the latter is ethically better, by Assumption 4. And since we've already established that the former world is ethically better than the "status quo" (a world with neither the benefit nor the harm), this establishes that the latter world is also ethically better than the status quo.

In other words, the small benefit to person A outweighs a 1/K chance of the large harm to person B.

Now let's imagine a "utility squadron": some subset of the "utility legion" with K persons like Person A, that can benefit in the same way, with the same ethical weight. From an ethical perspective, giving the small benefit to a random person in this "utility squadron" is as good as the small benefit given to person A, and hence can outweigh the 1/K chance of a large harm to person B.

Now imagine a 1/K chance of giving the small benefit to everyone in the "utility squadron." Each member of the squadron benefits just as much from this gamble as from a 100% chance of the small benefit going to one randomly selected member: either way, each member has a 1/K chance of receiving the benefit.23

So by Assumption 4, the 1/K chance of a small benefit to everyone in the utility squadron is just as good, from an ethical perspective, as giving the small benefit to a random person in the utility squadron. Therefore, it too can outweigh a 1/K chance of a large harm to person B.

Since a 1/K chance of a small benefit to everyone in the utility squadron can outweigh the 1/K chance of a large harm to B, we can cancel out the 1/K multipliers and just say that a small benefit to everyone in the utility squadron can outweigh the large harm to B.

As long as the "utility legion" is large enough, every single person outside the "utility legion" can have any of their interests outweighed by a small benefit to a "squadron" (subset of the utility legion). Hence, for a large enough utility legion, giving the small benefit to the entire legion can outweigh any consequences for the rest of the world.
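Here is a numeric sketch of the chain of comparisons above, with made-up numbers throughout (each person's utility is on their own scale, and the ethical weights and K are chosen arbitrarily), just to show that the bookkeeping goes through:

```python
# Numeric sketch of the chain above. All numbers are made up; utilities are on each
# person's own scale, with ethical weights w_A (per squadron member) and w_B.
w_A, w_B = 1.0, 1.0
small_benefit_A = 1.0    # the "noticeable benefit" to a legion member
small_harm_B = -0.5      # a harm to B small enough to be outweighed (the starting premise)
large_harm_B = -400.0    # a large harm to B
K = 1000                 # chosen so a 1/K chance of the large harm beats the certain small harm for B

# Step 1 (premise): the benefit to A outweighs the small harm to B, relative to the status quo.
assert w_A * small_benefit_A + w_B * small_harm_B > 0

# Step 2 (assumption 3): for B, a 1/K chance of the large harm is better than the certain small harm.
assert (1 / K) * large_harm_B > small_harm_B

# Step 3 (assumption 4): so "benefit to A plus a 1/K chance of the large harm to B"
# also beats the status quo, in expected ethical value.
assert w_A * small_benefit_A + w_B * (1 / K) * large_harm_B > 0

# Steps 4-5: a 1/K chance of the benefit going to all K squadron members is worth
# K * (1/K) = 1 expected benefit, so after cancelling the 1/K factors, a certain
# small benefit to all K members outweighs the certain large harm to B.
assert K * w_A * small_benefit_A + w_B * large_harm_B > 0
print("All steps check out with these numbers.")
```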

In practice, when it comes to debates about global poverty, animals on factory farms, etc., most people tend to accept the early parts of this argument - up to "the small benefit to person A can outweigh a 1/K chance of the large harm to person B." For example, preventing a case of malaria overseas has some value; there is some cost to our own society that is worth it, and some small probability of a large harm we would accept in order to prevent the case of malaria.

But as we move to a world in which there are an overwhelming number of malaria cases that we have the power to prevent, many people want to stop letting the benefits keep piling up straightforwardly (linearly). They want to say something like "Sure, preventing a case of malaria is worth $5, but preventing a million of them isn't worth more money than I have."

What HAT tells us is that we can't do this sort of thing if we want to hold onto its assumptions. If we want to use expected utility theory to be consistent in the tradeoffs we're making, and if we want to respect the "Pareto efficiency" principle, then we essentially need to treat a million prevented cases of malaria as being exactly a million times as good as one prevented case of malaria. And that opens the door to letting the prevention of malaria become our dominant ethical consideration.

Appendix 3: A walkthrough of the logic of Harsanyi's Aggregation Theorem

I think the basic conceptual logic of HAT should be understandable (though I wouldn't say "easily understandable") without a technical background. Below I will give my attempt at a relatively intuitive explanation.

It won't be the most rigorous presentation available (there are other papers on that topic). I will try to avoid leaning on the expected utility framework as a shared premise, and will instead highlight the specific assumptions and moves that utilize it.

Note that unlike Nick Beckstead's "intuitive explanation of Harsanyi's Aggregation Theorem" document, I will be focused on the version of HAT that does not assume we have a sensible way of making interpersonal utility comparisons (if you don't know what this means, you can just read on).

This ended up with a lot of steps, and I haven't yet managed to boil it down more and make it more digestible. (Thanks to Ajeya Cotra for help in making it somewhat better, though.)

| World or lottery | Alice gets | Bob gets | Comparison 1 | Comparison 2 |
| --- | --- | --- | --- | --- |
| DefaultWorld | 10 apples | 10 pears | | |
| AltWorld | 10 bananas | 10 oranges | | |
| AliceBenefitWorld | 10 apples and a lemon | 10 pears | W_Alice times as "ethically good", overall, as AliceBenefitWorld. (W_Alice = 1) | |
| BobBenefitWorld | 10 apples | 10 pears and a lemon | W_Bob times as "ethically good", overall, as AliceBenefitWorld | |
| AliceSplitWorld | 10 bananas | 10 pears | U_Alice times as good for Alice as AliceBenefitWorld. | U_Alice * W_Alice times as ethically good, overall, as DefaultWorld. |
| BobSplitWorld | 10 apples | 10 oranges | U_Bob times as good for Bob as BobBenefitWorld. | U_Bob * W_Bob times as ethically good, overall, as DefaultWorld. |
| ½ chance of AliceSplitWorld; ½ chance of BobSplitWorld | | | | "½ * U_Alice * W_Alice + ½ * U_Bob * W_Bob" times as good as DefaultWorld |
| ½ chance of AltWorld; ½ chance of DefaultWorld | | | | Same as previous row |

We could add a third person, Carol, to the population. Maybe Carol would get 10 kiwis in DefaultWorld and 10 peaches in AltWorld. The table above would then gain CarolBenefitWorld and CarolSplitWorld rows, the lotteries would use 1/3 probabilities instead of ½, and the final comparison would become "1/3 * U_Alice * W_Alice + 1/3 * U_Bob * W_Bob + 1/3 * U_Carol * W_Carol" times as good as AliceBenefitWorld.

Similarly, we can extend this reasoning to any number of persons in the population, 1 through N. AltWorld will always be U_1 * W_1 + U_2 * W_2 + … + U_N * W_N times as good as the benchmark world in which Person 1 gets the extra benefit (AliceBenefitWorld in the example above; Benefitworld_1 in the general notation below).
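Here's a minimal sketch of that weighted sum in Python, with made-up values for the U's and W's; it isn't part of the walkthrough itself, just an illustration of the formula.

```python
# U[k]: how good AltWorld is for person k, in multiples of their benchmark benefit
#       (with DefaultWorld as the zero point). W[k]: person k's ethical weight.
U = {"Alice": 2.0, "Bob": 0.5, "Carol": 1.5}   # assumed utilities
W = {"Alice": 1.0, "Bob": 1.0, "Carol": 1.0}   # assumed (equal) ethical weights
N = len(U)

lottery_value = sum((1 / N) * U[k] * W[k] for k in U)   # "1/N chance of each SplitWorld"
altworld_value = sum(U[k] * W[k] for k in U)            # "100% chance of AltWorld"

print(lottery_value)   # ~1.33: the lottery is ~1.33x as good as the benchmark (AliceBenefitWorld)
print(altworld_value)  # 4.0 = N * lottery_value
```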

Below, I go into a lot more detail on each piece of this, and consider it for the general case of N persons. The following table may be useful as a reference point to keep coming back to as I go (it is similar to the table above, but more general):

| World | Characterization | Comparison 1 | Comparison 2 |
| --- | --- | --- | --- |
| Defaultworld | The "status quo" we'll be comparing to an alternative world brought about by some action. For example, "Alice has 10 apples and Bob has 10 pears." | | |
| Altworld | The alternative world we're comparing to Defaultworld. For example, "Alice has 10 oranges and Bob has 10 bananas." | | |
| Benefitworld_1 | Same as Defaultworld, except that Person 1 receives an additional benefit (e.g., "Alice has ten apples and a lemon; Bob has ten pears"). | W_1 times as "ethically good", overall, as Benefitworld_1 (W_1 = 1) | |
| Benefitworld_2 | Same as Defaultworld, except that Person 2 receives an additional benefit (e.g., "Alice has ten apples; Bob has ten pears and a lemon"). | W_2 times as "ethically good", overall, as Benefitworld_1 | |
| Benefitworld_N | Same as Defaultworld, except that Person N receives an additional benefit. | W_N times as "ethically good", overall, as Benefitworld_1 | |
| Splitworld_1 | Same as Defaultworld for everyone except Person 1. Same as Altworld for Person 1. (Example: "Alice has 10 oranges, as in Altworld; Bob has 10 pears, as in Defaultworld.") | U_1 times as good for Person 1 as Benefitworld_1 | U_1*W_1 times as ethically good, overall, as Benefitworld_1 |
| Splitworld_2 | Same as Defaultworld for everyone except Person 2. Same as Altworld for Person 2. (Example: "Alice has 10 apples, as in Defaultworld; Bob has 10 bananas, as in Altworld.") | U_2 times as good for Person 2 as Benefitworld_2 | U_2*W_2 times as ethically good, overall, as Benefitworld_1 |
| Splitworld_N | Same as Defaultworld for everyone except Person N. Same as Altworld for Person N. | U_N times as good for Person N as Benefitworld_N | U_N*W_N times as ethically good, overall, as Benefitworld_1 |
| Lottery/gamble in final step of the walkthrough | 1/N chance of Splitworld_1; 1/N chance of Splitworld_2; ... ; 1/N chance of Splitworld_N | | "(1/N)*U_1*W_1 + (1/N)*U_2*W_2 + ... + (1/N)*U_N*W_N" times as good as Benefitworld_1 |
| Lottery/gamble in final step of the walkthrough | 1/N chance of Altworld; otherwise Defaultworld | | Same as previous row |

The basic setup

There is a set of N persons that matters ethically (Assumption 1). We will denote them as Person 1, Person 2, Person 3, ... , Person N. They don't need to be humans; they can be anyone/anything such that we think they matter ethically. For most of this explanation, we'll be focused on Person 1 and Person 2, since it's relatively straightforward to generalize from 2 persons to N persons.

By default all N persons are in a particular "state of the world" - let's call it Defaultworld. We're contemplating an action that would cause the state of the world to be different, resulting in Altworld.24 We're going to try to come up with a system for deciding whether this action, resulting in Altworld, is a good thing or a bad thing.

Benefitworld_1 is a world that is just like Defaultworld, except that Person 1 gets some benefit.

Benefitworld_2 is exactly the same, except it's Person 2 rather than Person 1 getting the benefit. Benefitworld_3, Benefitworld_4, ... Benefitworld_N are the same idea.

Splitworld_1 is a world that is just like Defaultworld for everyone except Person 1, and just like Altworld for Person 1. In the simplified example, this is the one where Alice has 10 oranges (as in Altworld) while Bob has 10 pears (as in Defaultworld).

Splitworld_2 is the same idea, but with Person 2 in the role of the person who is in an Altworld-like state. Splitworld_3, Splitworld_4, ... , Splitworld_N work similarly.

Comparing two worlds with a multiplier

Splitworld_1 vs. Benefitworld_1. We're going to examine how to ethically compare Splitworld_1 and Benefitworld_1.

First let's consider: which is better for Person 1, Benefitworld_1 or Splitworld_1? (For example, is it better for Alice to have ten apples and a lemon, or to have ten oranges?)

One of the consequences of the expected utility framework (Assumption 3) is that this question has an answer: either Benefitworld_1 is better for Person 1, or Splitworld_1 is better for Person 1, or they're exactly equally good.

Let's imagine that Splitworld_1 is better for Person 1 than Benefitworld_1. In this case, by Assumption 3, there exists some probability, P_1, such that:

• "A P_1 chance of Splitworld_1; otherwise Defaultworld" is exactly as good for Person 1 as a 100% chance of Benefitworld_1.
• Any chance of Splitworld_1 higher than P_1 (otherwise Defaultworld) is better for Person 1 than a 100% chance of Benefitworld_1.
• Any chance of Splitworld_1 lower than P_1 (otherwise Defaultworld) is worse for Person 1 than a 100% chance of Benefitworld_1.

I'll describe this sort of situation as "Splitworld_1 is U_1 times as good as Benefitworld_1 for Person 1," where U_1 = 1/P_1.

For example, if we say that "Splitworld_1 is 1000x as good as Benefitworld_1," we're saying that "a 1/1000 chance of Splitworld_1; otherwise Defaultworld" would be exactly as good for Person 1 as "a 100% chance of Benefitworld_1." And "a 2/1000 chance of Splitworld_1, otherwise Defaultworld" would be better for Person 1 than Benefitworld_1 (because 2/1000 > 1/1000), and "a 1/2000 chance of Splitworld_1, otherwise Defaultworld" would be worse for Person 1 than Benefitworld_1 (because 1/2000 < 1/1000).
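Here's the same arithmetic as a minimal sketch in Python, reusing the 1000x example (nothing here goes beyond the definition just given):

```python
def utility_multiplier(indifference_probability):
    # If a P_1 chance of Splitworld_1 (otherwise Defaultworld) is exactly as good for Person 1
    # as a 100% chance of Benefitworld_1, then U_1 = 1 / P_1.
    return 1.0 / indifference_probability

P_1 = 1 / 1000
U_1 = utility_multiplier(P_1)
print(U_1)             # 1000.0 -> "Splitworld_1 is 1000x as good as Benefitworld_1" for Person 1
print(2 / 1000 > P_1)  # True: a 2/1000 chance is better for Person 1 than Benefitworld_1
print(1 / 2000 < P_1)  # True: a 1/2000 chance is worse for Person 1 than Benefitworld_1
```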

Now for a nonobvious point: if Splitworld_1 is U_1 times as good as Benefitworld_1 for Person 1, it is also U_1 times as ethically good, overall, as Benefitworld_1.

This is because of Assumption 4:

• "A P_1 chance of Splitworld_1; otherwise Defaultworld" is exactly as good for Person 1 as Benefitworld_1, and the two are exactly equally good for everyone else (no one else is affected either way) - so our ethical system should consider them exactly as ethically good, overall.
• A higher chance of Splitworld_1 is better for Person 1 and worse for no one - so our ethical system should consider it ethically better, overall, than Benefitworld_1.
• A lower chance of Splitworld_1 is worse for Person 1 and better for no one - so our ethical system should consider Benefitworld_1 ethically better, overall.

If you compare these three bullet points to the three bullet points above, you'll see that the statements about what is ethically good precisely mirror the statements about what is best for Person 1. Hence, Splitworld_1 is U_1 times as good as Benefitworld_1 both for Person 1 and ethically, overall.

Some notes on shortcuts I'll be taking for the rest of the appendix - if you find your eyes glazing over, just move on to the next section:

First, for the rest of the appendix, to save space, I will leave out bullet points such as the second two bullet points directly above. When I compare two different probabilistic situations, A and B, and say A is "exactly as good" as B or that A and B are "equally good," it will be the case that a small change in the key probability within A would make A better or worse than B. This is important because of the way Assumption 4 is written: I will often be saying that our ethical system should consider two situations equally good, when what I mean is that it should consider one better if the key probability shifts a bit. I could elaborate on this if needed for specific cases, but writing it out every time would get tedious.

Second: I've only established "if Splitworld_1 is U_1 times as good as Benefitworld_1 for Person 1, it is also U_1 times as ethically good, overall, as Benefitworld_1" for the case where Splitworld_1 is better (for Person 1) than both Defaultworld and Benefitworld_1. There are also possible cases where Splitworld_1 is worse than both Defaultworld and Benefitworld_1, and where Splitworld_1 is in between (that is, better than Defaultworld but worse than Benefitworld_1). Here I'm going to take a bit of a shortcut and simply assert that in these cases as well, it makes sense to use the "Splitworld_1 is U_1 times as good as Benefitworld_1" language I'm using.

Bottom line - for any two worlds other than Defaultworld, it will always be possible to say that one of them is "U_1 times as good as the other," where U_1 > 1 implies it is better and U_1 < 1 implies it is worse, and a negative U_1 implies it is bad relative to Defaultworld. Regardless of which one is the case, U_1 will have all the key properties that I talk about throughout this appendix, such as what I showed just above: "if Splitworld_1 is U_1 times as good as Benefitworld_1 for Person 1, it is also U_1 times as ethically good, overall, as Benefitworld_1."

I could elaborate this fully, but it would be tedious to do so.

Here U_1 represents Person 1’s “utility” in Splitworld_1 - how good the world is for them, normalized against some consistent “comparison point” benefit, in this case the benefit from Benefitworld_1.

Ethical weights

Which is more ethically good, Benefitworld_1 or Benefitworld_2? For example, is it better for Alice to have ten apples and a lemon while Bob has only ten pears, or is it better for Alice to have only ten apples while Bob gets ten pears and a lemon?

For purposes of this exercise, we'll essentially say the answer is whatever you want - as long as it is a single, consistent answer.

Whatever you decide, though, we can introduce the same "Benefitworld_2 is ___ times as good as Benefitworld_1" type language as above: we'll say that Benefitworld_2 is W_2 times as ethically good, overall, as Benefitworld_1 (defined via lotteries in the same way, with Defaultworld as the baseline). Similarly, Benefitworld_3 is W_3 times as ethically good as Benefitworld_1, and so on; W_1 equals 1 by definition.

W_2 represents the "ethical weight" of Person 2 with respect to Person 1 - how much your ethical system values the particular benefit we're talking about (e.g., a lemon) for Person 2 vs. Person 1.

Now we can show that Splitworld_2 is U_2*W_2 times as ethically good as Benefitworld_1. As a reminder, U_2 represents how much better Splitworld_2 is than Benefitworld_2 for Person 2, and W_2 is the "ethical weight" of Person 2 just discussed. The reasoning mirrors the previous section: Splitworld_2 is U_2 times as ethically good as Benefitworld_2 (by the same argument we made for Person 1), Benefitworld_2 is W_2 times as ethically good as Benefitworld_1, and - with Defaultworld as the common baseline - the two ratios multiply.
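Here's a minimal numerical sketch of why the ratios multiply, with assumed values for U_2 and W_2 (just an illustration, not part of the argument itself):

```python
# With Defaultworld as the zero point, "X is r times as good as Y" behaves like value(X) = r * value(Y).
value_benefitworld_1 = 1.0   # the benchmark, by construction
W_2 = 3.0                    # assumed: Benefitworld_2 is 3x as ethically good as Benefitworld_1
U_2 = 10.0                   # assumed: Splitworld_2 is 10x as good as Benefitworld_2 for Person 2 (and so overall)

value_benefitworld_2 = W_2 * value_benefitworld_1
value_splitworld_2 = U_2 * value_benefitworld_2
print(value_splitworld_2 / value_benefitworld_1)  # 30.0, i.e. U_2 * W_2
```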

As an aside, an alternate approach to HAT is to insist that each of the “benefits” is standardized - for example, that we declare all benefits to be reducible to some quantity of pleasure, and we define Benefitworld_1 as the world in which Person 1 gets a particular, standardized amount of pleasure (and similar for Benefitworld_2, etc.) One can then skip the part about “ethical weights,” and use the impartiality idea to say that everyone gets the same weight.

Final step

Consider the following two possibilities:

  1. A 1/N chance of Altworld; otherwise Defaultworld.
  2. A 1/N chance of Splitworld_1; a 1/N chance of Splitworld_2; ... ; a 1/N chance of Splitworld_N.

We can value the 2nd possibility (the lottery over Splitworlds) using the finding from the previous section: each Splitworld_k is U_k*W_k times as ethically good as Benefitworld_1, so a lottery giving a 1/N chance of each Splitworld is "(1/N)*U_1*W_1 + (1/N)*U_2*W_2 + ... + (1/N)*U_N*W_N" times as good as Benefitworld_1.

Furthermore, we can say that the two possibilities listed at the top of this section are equally good for all of the N persons, since in each case, that person has a 1/N chance of getting a benefit equivalent to being in Altworld, and will otherwise have an equivalent situation to being in Defaultworld. So by Assumption 4, the two possibilities listed at the top of this section should be equally ethically good, as well.

So now we've established that "A 1/N chance of Altworld, otherwise Defaultworld" is 1/N*(U_1*W_1 + U_2 * W_2 + ... + U_N * W_N) times as good as Benefitworld_1.

That in turn implies that a 100% chance of Altworld is simply (U_1 * W_1 + U_2 * W_2 + ... + U_N * W_N) times as good as Benefitworld_1. (All I did here was remove the 1/N from each.)

We can use this logic to get a valuation of any hypothetical world (other than Defaultworld). When one world has a greater value of (U_1 * W_1 + U_2 * W_2 + ... + U_N * W_N) than another, it will be "more times as good as Benefitworld_1" by our ethical system, which means it is better.
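Here's a minimal sketch of the resulting decision rule in Python, with made-up utilities and weights for three persons; it's an illustration of the comparison, not a claim about any particular real-world case.

```python
# Minimal sketch (assumed numbers) of the resulting decision rule: rank candidate
# worlds by the weighted sum of utilities, all measured against the same benchmark.

def ethical_value(utilities, weights):
    return sum(u * w for u, w in zip(utilities, weights))

weights = [1.0, 1.0, 1.0]        # assumed ethical weights for Persons 1..3
world_A = [2.0, 0.5, -1.0]       # assumed utilities of one candidate world
world_B = [1.0, 1.0, 1.0]        # assumed utilities of another

print(ethical_value(world_A, weights))  # 1.5
print(ethical_value(world_B, weights))  # 3.0 -> this ethical system prefers world B
```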

A high-level summary would be:

• For each person k, use Assumption 3 to find the probability P_k such that a P_k chance of Splitworld_k (otherwise Defaultworld) is exactly as good for them as the benchmark benefit in Benefitworld_k; their utility for Altworld is U_k = 1/P_k.
• Choose ethical weights W_k by deciding how ethically good Benefitworld_k is compared to Benefitworld_1 (with W_1 = 1).
• Use Assumption 4 to turn "equally good for each person" comparisons into "equally ethically good overall" comparisons.
• Putting these together, Altworld is (U_1*W_1 + U_2*W_2 + ... + U_N*W_N) times as good as Benefitworld_1 - so the ethical system ends up ranking worlds by a weighted sum of individual utilities.

Notes


  1. I use this term to denote a broad conception of beings. “Persons” could include animals, algorithms, etc. 

  2. Nitpick-type point: one could have lexicographic orderings such that a single benefit of type A outweighs any number of benefits of type B. As such, this claim only holds for specific interpretations of the word "significant" in statements like "perhaps you place significant value on one person getting a bednet." I am more specific about the conditions giving rise to utility legions in Appendix 2

  3. There are a number of variations on HAT. The version I'll be discussing here is based on Harsanyi (1955), the "original" version. (See part III of that paper for a compact, technical explanation and proof.) This is different from the version in Nick Beckstead's "intuitive explanation" of the theorem. The version I'll be discussing is harder to explain and intuit, but I chose it because it requires fewer assumptions, and I think that's worth it. There are a number of other papers examining, re-explaining and reinterpreting the theorem, its assumptions, and variants - for example, Hammond (1992), Mongin (1994), Broome (1987). There are also pieces on HAT in the effective altruism community, including this LessWrong post and Nick Beckstead's "intuitive explanation" of the theorem

  4. For relatively brief discussions of these issues, see Stanford Encyclopedia of Philosophy on the Repugnant Conclusion, or Population Axiology. I plan to write more on this topic in the future. 

  5. That statement is a slight simplification. For example, "discounted utilitarianism" arguably has infinite persons who matter ethically, but their moral significance diminishes as they get "further" from us along some dimension. 

  6. E.g., utilitarianism.net largely has this focus. 

  7. One could also have a hybrid view such as "situation 1 is better for person X if it contains more of certain objective properties and X appreciates or enjoys or is aware of those properties." 

  8. In this context, "World A is ___ times as good as world B" statements also need to make implicit reference to a third "baseline" world. This is spelled out more in the appendix. 

  9. More on this topic here: Comparing Utilities 

  10. I mean, you might reasonably guess that one team has a better offense and the other has a better defense, but you can't say which one performed better overall. (And even the offense/defense statement isn't safe - it might be that one team, or its opponent, simply played at a more time-consuming pace.) 

  11. Quote is from https://www.utilitarianism.net/introduction-to-utilitarianism#the-veil-of-ignorance 

  12. For example:

    • You can set your ethical system up so that your own interests are weighed millions of times, billions of times, etc. more heavily than others' interests. You can do similarly for the interests of your friends and family. This violates the spirit of other-centered ethics, but it doesn't violate the letter of the version of HAT I've used.
    • You can give negligible weight to animals, to people overseas, etc.
    • Actually, if you're comparing any two states of the world, you could call either one "better" from an ethical point of view, and still be consistent with maximizing some weighted sum of utilities, as long as there's at least one person that is better off in the state of the world you pick.
    • Although over time, if you apply your framework to many ethical decisions, you will become more constrained in your ability to make them consistently using a single set of weights.
     
  13. For example:

    • Some people might claim that everyone benefits equally from $1. By HAT, this would imply that we should maximize some weighted sum of everyone's wealth; intuitively, it would suggest that we should simply maximize the equal-weighted sum of everyone's wealth, which means maximizing total world wealth.
    • Some people might claim that people benefit more from $1 when they have less to begin with. This could lead to the conclusion that money should be redistributed from the wealthy to the less wealthy; see the sketch after this list. (In practice, most utilitarians, including myself, think this.)
    • It also seems that most persons have a direct interest in what state other persons are in. For example, it might be that people suffer a direct cost when others become more wealthy. On balance, then, it could even be bad to simply generate free wealth and give it to someone (since this might make someone else worse off via inequality.)
      • This consideration makes real-world comparisons of total utility pretty hairy, and seems to prevent many otherwise safe-seeming statements like "Surely it would be good to help Persons 1, 2, and 3 while directly hurting no one."
      • It means that a utilitarian could systematically prefer equality of wealth, equality of good experiences, etc. They can't value equality of utility - they are stuck with a weighted sum. But utility is a very abstract concept, and equality of anything else could matter for the weighted sum of utility.
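As referenced in the second bullet above, here's a minimal sketch of the redistribution point, assuming a logarithmic utility of wealth (an illustrative assumption only, not something this piece commits to):

```python
import math

def utility(wealth):
    # Assumed for illustration: utility of wealth is logarithmic, so a marginal dollar
    # is worth roughly 1/wealth "utils" - more to the poor, less to the rich.
    return math.log(wealth)

rich, poor, transfer = 1_000_000.0, 1_000.0, 100.0

before = utility(rich) + utility(poor)
after = utility(rich - transfer) + utility(poor + transfer)
print(after > before)  # True: an equal-weighted sum of these utilities favors the transfer
```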
     
  14. I use "consequentialism" here rather than "welfarist consequentialism" because I think it is a better-known and more intuitive term. The downside is that there are some very broad versions of consequentialism that count all sorts of things as "consequences," including violations of moral rules. I mean "consequentialism" to exclude this sort of thing and to basically mean welfarist consequentialism. 

  15. This isn't meant to be a full definition of consequentialism, i.e., consequentialism can also be applied to cases where it's not clear what will lead to the best outcome. 

  16. Though often an exchange with a "long-run, repeated-game, non-transactional" feel. 

  17. Say we have this weighted sum, where the W's are weights and the U's are utilities:

    W_1*U_1 + W_2*U_2 + ... + W_N*U_N + W_M*U_M + K*W_L*U_L

    The W_M*U_M term corresponds to the "utility monster", and the K*W_L*U_L term corresponds to the "utility legion," while the rest of the sum corresponds to the "rest of the world."

    Once you have filled in all the variables except for U_M, there is some value for U_M that makes the overall weighted sum come out to as big a number as you want. Hence, for any two situations, adding a high enough value of U_M to one of them makes that situation come out better, regardless of everything else that is going on.

    Similarly, once you have filled in all the variables except for K, there is some value of K that makes the overall weighted sum come out to as big a number as you want.  
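    To spell out the algebra, here's a minimal sketch; it assumes U_M and K can be made arbitrarily large (the assumption Jake Nebel's comment below presses on):

```python
def solve_utility_monster(target, rest_of_sum, W_M):
    # The U_M that makes W_1*U_1 + ... + W_M*U_M + K*W_L*U_L reach `target`,
    # holding everything else ("rest_of_sum") fixed.
    return (target - rest_of_sum) / W_M

def solve_utility_legion(target, rest_of_sum, W_L, U_L):
    # The legion size K that makes the same sum reach `target`.
    return (target - rest_of_sum) / (W_L * U_L)

rest = -1_000_000.0   # assumed: the rest of the weighted sum is very negative
print(solve_utility_monster(1.0, rest, W_M=1.0))             # 1000001.0
print(solve_utility_legion(1.0, rest, W_L=1.0, U_L=0.01))    # 100000100.0
```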

  18. Unless you count valuing humans much more than other animals and insects? But this isn't a distinctive or controversial utilitarian position. 

  19. To be clear, I think worrying about suffering of present-day algorithms is obscure and not terribly serious. I think future, more sophisticated algorithms could present more serious considerations. 

  20. "As a rough approximation, let us say the Virgo Supercluster contains 10^13 stars. One estimate of the computing power extractable from a star and with an associated planet-sized computational structure, using advanced molecular nanotechnology[2], is 10^42 operations per second.[3] A typical estimate of the human brain’s processing power is roughly 10^17 operations per second or less.[4] Not much more seems to be needed to simulate the relevant parts of the environment in sufficient detail to enable the simulated minds to have experiences indistinguishable from typical current human experiences.[5] Given these estimates, it follows that the potential for approximately 10^38 human lives is lost every century that colonization of our local supercluster is delayed; or equivalently, about 10^29 potential human lives per second." 

  21. One perspective I've heard is that utilitarianism is the best available theory for determining axiology (which states of the world are better, from an ethical perspective, than others), but that axiology is only part of ethics (which can also incorporate rules about what actions to take that aren't based on what states of the world they bring about). I don't find this division very helpful, because many of the non-utilitarian ethical pulls I feel are hard to divide this way: I often can't tell when an intuition I have is about axiology vs. ethics, and my attempts to think about pure axiology feel very distant from my all-things-considered ethical views. However, others may find this division useful, and it arguably leads to a similar conclusion re: being a utilitarian when making supererogatory, ethics-driven decisions. 

  22. As implied by the italicization, this passage does not actually describe my personal views. It describes a viewpoint that has significant pull for me, but that I also have significant reservations about. This piece has focused on the case in favor; I hope to write about my reservations in the future. 

  23. If a random person in the utility squadron gets the small benefit, that means each person in the squadron has a 1/K chance of getting the small benefit. If there's a 1/K chance of everyone in the squadron getting the small benefit, then each person in the squadron has a 1/K chance of getting the small benefit. They're the same. 

  24. I'm deliberately not getting into a lot of detail about what a "state of the world" means - is it a full specification of all the world's relevant properties at every point in time, is it some limited set of information about the world, or something else? There are different ways to conceive of "world-states" leading to different spins on the case for using the expected utility framework (e.g.


deanspears @ 2022-02-02T15:24 (+20)

I'm very glad to see this, and I just want to add that there is a recent burst in the literature that is deepening or expanding the Harsanyi framework.  There are a lot of powerful arguments for aggregation and separability, and it turns out that aggregation in one dimension generates aggregation in others.  Broome's Weighing Goods is an accessible(ish) place to start.

Fleurbaey (2009): https://www.sciencedirect.com/science/article/abs/pii/S0165176509002894

McCarthy et al (2020): https://www.sciencedirect.com/science/article/pii/S0304406820300045?via%3Dihub

A paper of mine with Zuber, about risk and variable population together: https://drive.google.com/file/d/1xwxAkZlOeqc4iMeXNhBFipB6bTmJR6UN/view

As you can see, this literature is pretty technical for now.  But I am optimistic that in 10 years it will be the case both that the experts much better understand these arguments and that they are more widely known and appreciated.

Johan Gustafsson is also working in this space.  This link isn't about Harsanyi-style arguments, but is another nice path up the mountain: https://johanegustafsson.net/papers/utilitarianism-without-moral-aggregation.pdf 

Holden Karnofsky @ 2022-03-31T22:30 (+3)

Thanks, this is appreciated!

MichaelStJules @ 2022-06-25T18:13 (+10)

It's worth noting that while the rationality axioms on which Harsanyi's theorem depends are typically justified by Dutch book arguments, money pumps or the sure-thing principle, expected utility maximization with an unbounded utility function (e.g. risk-neutral total utilitarianism) and infinitely many possible outcomes is actually vulnerable to Dutch books and money pumps and violates the sure-thing principle. See, e.g. Paul Christiano's comment with St. Petersburg lotteries. To avoid the issue, you can commit to sticking with the ex ante better option, even though you know you'd later want to break the commitment. But such commitments can be used to avoid other Dutch books, too, e.g. you can commit to never completing the last step of a Dutch book, or anticipate Dutch books and try to avoid them on a more ad hoc basis.
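To illustrate the St. Petersburg structure being referred to, here's a minimal numerical sketch assuming an unbounded (here, simply linear) utility of payoffs; it shows the divergence that drives these problems, not the Dutch book itself:

```python
def truncated_st_petersburg_eu(n_rounds):
    # Outcome k (k = 1..n) has probability 2**-k and pays utility 2**k.
    return sum((0.5 ** k) * (2 ** k) for k in range(1, n_rounds + 1))

for n in [10, 100, 1000]:
    print(n, truncated_st_petersburg_eu(n))  # 10.0, 100.0, 1000.0 -> grows without bound
```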

The continuity axiom is also intuitive and I'd probably accept continuity over ranges of individual utilities at least, anyway, but I wouldn't take it to be a requirement of rationality, and without it, maximin/leximin/Rawls's difference principle and lexical thresholds aren't ruled out.

Jake Nebel @ 2022-08-10T22:48 (+8)

Hi Holden, thanks for this very nice overview of Harsanyi's theorem!  

One view that is worth mentioning on the topic of interpersonal comparisons is John Broome's idea (in Weighing Goods) that the conclusion of the theorem itself tells us how to make interpersonal comparisons (though it presupposes that such comparisons can be made). Harsanyi's premises imply that the social/ethical preference relation can be represented by the sum of individual utilities, given a suitable choice of utility function for each person. Broome's view is that this provides the basis for making interpersonal comparisons of well-being: 

As we form a judgement about whether a particular benefit to one person counts for more or less than some other particular benefit to someone else, we are at the same time determining whether it is a greater or lesser benefit. To say one benefit 'counts for more' than another means it is better to bring about this benefit rather than the other. So this is an ethical judgement. Ethical judgements help to determine our metric of good, then. (220). 

And: "the quantity of people's good acquires its meaning in such a way that the total of people's good is equal to general good" (222).  

I don't think Broome is right (see my Aggregation Without Interpersonal Comparisons of Well-Being), but the view is worth considering if you aren't satisfied with the other possibilities. I tend to prefer the view that there is some independent way of making interpersonal comparisons. 

On another note: I think the argument for the existence of utility monsters and legions (note 17) requires something beyond Harsanyi's premises (e.g., that utilities are unbounded). Otherwise I don't see why "Once you have filled in all the variables except for U_M [or U_K], there is some value for U_M [or U_K] that makes the overall weighted sum come out to as big a number as you want." Sorry if I'm missing something! 

Kenny Easwaran @ 2022-02-03T18:54 (+5)

I haven't gone through this whole post, but I generally like what I have seen.

I do want to advertise a recent paper I published on infinite ethics, suggesting that there are useful aggregative rules that can't be represented by an overall numerical value, and yet take into account both the quantity of persons experiencing some good or bad and the probability of such outcomes: https://academic.oup.com/aristotelian/article-abstract/121/3/299/6367834

The resulting value scale is only a partial ordering, but I think it gets intuitive cases right, and is at least provably consistent, even if not complete. (I suspect that for infinite situations, we can't get completeness in any interesting way without using the Axiom of Choice, and I think anything that needs the Axiom of Choice can't give us any reason for why it rather than some alternative is the right one.)

UriKatz @ 2022-10-13T12:39 (+2)

It seems to me that no amount of arguments in support of individual assumptions, or a set of assumptions taken together, can make their repugnant conclusions more correct or palatable. It is as if Frege's response to Russell's paradox were to write a book exalting the virtues of set theory. Utility monsters and utility legions show us that there is a problem either with human rationality or human moral intuitions. If not them, then the repugnant conclusion does for sure, and it is an outcome of the same assumptions and same reasoning. Personally, I refuse to bite the bullet here, which is why I am hesitant to call myself a utilitarian. If I had to bet, I would say the problem lies with assumption 2. People cannot be reduced to numbers, either when trying to describe their behavior or trying to guide it. Appealing to an "ideal" doesn't help, because the ideal is actually a deformed version. An ideal human might have no knowledge gaps, no bias, no calculation errors, etc., but why would their well-being be reducible to a function?

(note that I do not dispute that from these assumptions Harsanyi’s Aggregation Theorem can be proven)

MichaelStJules @ 2022-07-08T00:41 (+2)

Pareto efficiency (assumption 4)

If we're choosing between two states of the world, and one state is better for at least one person and worse for no one (in terms of expected utility as defined above), then our ethical system should always prefer that state.

It's also worth mentioning that this Pareto efficiency assumption, applied to expected utilities over uncertain options and not just actual utilities over deterministic options, rules out (terminal) other-centered preferences for utility levels to be distributed more equally (or less equally) ex post, as well as ex post prioritarian and ex post egalitarian social welfare functions.

You would be indifferent between these two options over the utility levels of Alice and Bob (well, if you sweeten either option by an arbitrarily small amount, you should prefer the sweetened one; continuity gives you indifference without sweetening):

  1. 50% chance of 1 for Alice and 0 for Bob, and 50% chance of 0 for Alice and 1 for Bob.
  2. 50% chance of 1 for each and 50% chance of 0 for each.

But an ex post egalitarian might prefer 2, since the utilities are more equal in each definite outcome.
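To make this concrete, here's a minimal sketch that scores the two options by expected individual utility and by one illustrative ex post egalitarian rule (expected minimum utility; the argument doesn't depend on this particular rule):

```python
options = {
    "1 (anti-correlated)": [((1, 0), 0.5), ((0, 1), 0.5)],   # ((Alice, Bob) utilities, probability)
    "2 (correlated)":      [((1, 1), 0.5), ((0, 0), 0.5)],
}

for name, lottery in options.items():
    exp_alice = sum(p * a for (a, b), p in lottery)
    exp_bob = sum(p * b for (a, b), p in lottery)
    exp_min = sum(p * min(a, b) for (a, b), p in lottery)   # illustrative ex post egalitarian score
    print(name, exp_alice, exp_bob, exp_min)

# Both options give Alice and Bob expected utility 0.5, so Pareto-over-expected-utilities
# reasoning is indifferent; the expected-minimum rule scores option 2 higher (0.5 vs 0.0).
```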

Between the two following options, an ex post prioritarian or ex post egalitarian might strictly prefer the second, as well, even though the expected utilities are the same (treating the numbers as the same utilities used for the expected utilities):

  1. 50% chance of -1 for Alice and a 50% chance of 1 for Alice.
  2. 100% chance of 0 for Alice.

 

HAT's assumptions together also rule out preferences for or against ex ante equality, and so ex ante prioritarianism and ex ante egalitarianism, i.e., you should be indifferent between the following two options, even though the first seems more fair ex ante:

  1. 50% chance of 1 for Alice and 0 for Bob, and 50% chance of 0 for Alice and 1 for Bob.
  2. 100% chance of 1 for Alice and 0 for Bob.

Jaime Sevilla @ 2022-03-29T19:14 (+2)

Since I first read your piece on future-proof ethics, my views have evolved from "Not knowing about HAT" to "HAT is probably wrong/confused, even if I can't find any dubious assumptions" and finally "HAT is probably correct, even though I do not quite understand all the consequences". 

I would probably not have engaged with HAT if not for this post, and now I consider it close in importance to VNM's and Cox's theorems in terms of informing my worldview.

I particularly found the veil of ignorance framing very useful to help me understand and accept HAT.

I'll probably be coming back to this post and mulling over it to understand better what the heck I have committed to.

richard_ngo @ 2022-02-26T01:44 (+2)

Situation 1 is better for person X than situation 2 (that is, it gives X higher utility) if and only if person X prefers that situation.

I think you need "prefers that situation for themselves". Otherwise, imagine person X who is a utilitarian - they'll always prefer a better world, but most ways of making the world better don't "benefit X".

Then, unfortunately, we run into the problem that we're unable to define what it means to prefer something "for yourself", because we can no longer use (even idealised) choices between different options.

Holden Karnofsky @ 2022-03-31T22:30 (+2)

Good point, thanks! Edited.

UriKatz @ 2022-10-13T12:35 (+1)

the quest for an other-centered ethics leads naturally to utilitarian-flavored systems with a number of controversial implications.

This seems incorrect. Rather, it is your 4 assumptions that “lead naturally” to utilitarianism. It would not be hard for a deontologist to be other-focused simply by emphasizing the a-priori normative duties that are directed towards others (I am thinking here of Kant’s duties matrix: perfect / imperfect & towards self / towards others). The argument can even be made, and often is, that the duties that one has towards one’s self are meant to allow one to benefit others (i.e. skill development). If by other-focused you mean abstracting from one’s personal preferences, values, culture and so forth, deontology might be the better choice, since its use of a-priori reasoning places it behind the veil of ignorance by default.