Other-centered ethics and Harsanyi's Aggregation Theorem

By Holden Karnofsky @ 2022-02-02T03:21 (+59)

I'm interested in doing as much good as possible. This naturally raises the question of what counts as "good," and how (if at all) we can compare some "goods" to others. Is it better to help one person profoundly, or help a hundred people a little? To avert a death, or improve someone's quality of life? Is there any way to compare reducing the suffering of chickens on factory farms to reducing the suffering of humans?

This may seem like a bottomless pit of a question. There is a vast literature on ethics and ethical axiology, stretching back through millennia. But for me - and, I suspect, many other effective altruists - the space of options gets narrower if we add a basic guiding motivation: I want an approach to ethics that's based as much as possible on the needs and wants of others, rather than on my personal preferences and personal goals. To abbreviate, I want an other-centered ethics.

To elaborate a bit, I want to avoid things like:

This piece will lay out the case that the quest for an other-centered ethics leads naturally to utilitarian-flavored systems with a number of controversial implications. These include a number of views that are common in the effective altruism community but off-putting to many, including:

The arguments in this piece aren't original: they mostly derive from a 1955 argument now known as Harsanyi's Aggregation Theorem. Most of the piece will be examining the theorem's assumptions and implications, and most of what I'll say can be found in one paper or another. This piece differs from existing writings on the topic (as far as I can tell) in the following ways:

I think the idea of other-centered ethics has significant appeal, and it pulls me significantly toward some of the controversial positions associated with utilitarianism and effective altruism. I also think the idea has limitations, and I don't think it's realistic or a good idea to try to live entirely by (or even base one's philanthropy entirely on) the idea of other-centered ethics. In future pieces, I hope to explore what an other-centered ethics looks like, and key intuitions both for and against trying to live by it.

Brief summary of the rest of the piece:

Harsanyi's Aggregation Theorem (HAT)

In my view, the best case for putting heavy weight on utilitarian-flavored ethics - and taking on board some of the uncomfortable views associated with them - revolves around a theorem called Harsanyi's aggregation theorem (hereafter HAT), and related ideas.

HAT is a theorem connecting a particular set of assumptions about ethics (which I will enumerate and discuss below) to a conclusion (essentially that one should endorse an ethics that has a certain kind of utilitarian flavor).3 If you accept the assumptions, the conclusion is mathematically proven, so the big question is whether you accept the assumptions.

Below, I list the assumptions and conclusion at a high level; then link to appendices explaining the logic of how HAT connects the assumptions to the conclusion; then go through the assumptions in more detail. My goal here is to examine the assumptions at a high level and address what is and isn't attractive about them.

My presentation is somewhat impressionistic, and other papers (see previous footnote) would be better resources for maximally precise, technical presentations.

I think the assumptions should be quite appealing for one seeking an "other-centered ethics," though (a) there are other approaches to ethics under which they are much less appealing; (b) they leave many questions open.

My high-level gloss on the assumptions driving HAT

I would roughly say that HAT presents a compelling case for a form of utilitarianism, as long as you accept something reasonably close to the following:

  1. There is a particular set of persons that matters ethically, such that the goal of our ethics is to benefit these persons. (Example sets would be "All humans in the universe" or "all sentient beings").
  2. For each person, we can talk meaningfully about which states of the world are "better for them" than others.
    • This might map to which states of the world they personally prefer, or to which states of the world they would prefer under various idealized conditions, or something else.
    • It doesn't have to have anything to do with pleasure or happiness. We could, if we want, say a state of the world is "better for" person X iff it makes person X happier, but we don't need to, and I don't personally take this approach.
  3. In order to achieve a certain kind of consistency (more below) in our ethical decisions, we should use the expected utility framework to model both (a) the sense (from #2) in which some states of the world are better for person X than others; (b) the sense in which some outcomes are ethically better than others.
    • This assumption is not obviously appealing, but there is a deep literature, better explained elsewhere, that I think makes a strong case for this assumption.
    • The main way it will enter in: say that there are two different benefits a person might gain, benefit A and benefit B. Using the expected utility framework implies that:
      • One of the two benefits is “better” - more of a benefit - than the other, such that it would be ethically better for the person to gain that benefit. (Or the two are exactly equally good.)
      • If one of the benefits is better, then there’s some probability, P, such that a P probability of the better benefit is as good as a 100% probability of the worse benefit. For example, say that having a nice meal would be a benefit, and finding a long-lost family member would be a benefit. Then a nice meal is exactly as good as some chance - 1/1,000, or 1/10,000, or 1/100,000 - of finding a long-lost family member. Equivalently, finding a long-lost family member is “1,000 times (or 10,000 times, or 100,000 times) as good as a nice meal.” (A short sketch after this list illustrates this kind of tradeoff.)
      • Similar dynamics would apply when comparing one cost to another, or a benefit to a cost, though the details are a little different.
  4. If we're choosing between two states of the world, and one state is better for at least one person and worse for no one (in terms of expected utility as defined above), then our ethical system should always prefer that state.
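To make the arithmetic of assumption 3 concrete, here is a minimal sketch in Python, with made-up numbers: if finding a long-lost family member is treated as 10,000 times as good as a nice meal for this person, then a 1/10,000 chance of the former is exactly as good, in expected utility, as a guaranteed nice meal.

```python
# A minimal sketch of the tradeoff in assumption 3. The numbers are made up;
# utilities are on an arbitrary scale, and only the ratios matter here.
U_NICE_MEAL = 1.0          # hypothetical utility of a nice meal
U_FIND_FAMILY = 10_000.0   # hypothetical: 10,000 times as good as a nice meal

def expected_utility(prospects):
    """prospects: a list of (probability, utility) pairs whose probabilities sum to 1."""
    return sum(p * u for p, u in prospects)

# A 1/10,000 chance of finding the long-lost family member (and otherwise nothing)...
gamble = [(1 / 10_000, U_FIND_FAMILY), (1 - 1 / 10_000, 0.0)]
# ...is exactly as good, in expected utility, as a guaranteed nice meal.
certain_meal = [(1.0, U_NICE_MEAL)]

print(expected_utility(gamble), expected_utility(certain_meal))  # both 1.0
```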

(Note that I am not including the "impartiality" assumption or anything about interpersonal utility comparisons. I will discuss this later on. If you don't know what this means, ignore it.)

These assumptions can be formalized in different ways. But broadly speaking, once you embrace all of them, something like HAT can establish the conclusion: the goal of your ethics should be to achieve the state of the world that maximizes the expected value of X, where X is some weighted sum of each ethically significant person's utility ("utility" meaning "how good the state of the world is for that person").

That is, the goal of your ethics should be to maximize the numerical quantity:

X = U_1 * W_1 + U_2 * W_2 + … + U_N * W_N

Where U_1 represents person 1’s utility in a given state of the world (a number representing how good that state of the world is for person 1); W_1 represents the ethical “weight” you give person 1 (how much your ethical system values that person relative to others); and the other U’s and W’s are defined similarly, for each of the N persons.
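To make the conclusion concrete, here is a small Python sketch of what "maximize the expected value of a weighted sum of utilities" amounts to computationally. The utilities and weights are made up for illustration; nothing here tells you what the right weights are.

```python
# Sketch of the HAT conclusion: score each state of the world by
# X = U_1 * W_1 + U_2 * W_2 + ... + U_N * W_N, then maximize the expected value of X.
# All utilities and weights below are made up for illustration.

def weighted_sum(utilities, weights):
    """X for one state of the world: each person's utility times their ethical weight, summed."""
    return sum(u * w for u, w in zip(utilities, weights))

def expected_x(lottery, weights):
    """Expected value of X over a lottery: a list of (probability, utilities) pairs."""
    return sum(p * weighted_sum(us, weights) for p, us in lottery)

weights = [1.0, 1.0, 0.5]  # hypothetical ethical weights for persons 1, 2, 3

# Two candidate actions, each modeled as a lottery over states of the world.
action_a = [(1.0, [10, 20, 30])]                     # a certain outcome
action_b = [(0.5, [5, 5, 5]), (0.5, [40, 40, 40])]   # a 50/50 gamble

print(expected_x(action_a, weights), expected_x(action_b, weights))  # 45.0 56.25
# The ethics sketched here says to take action_b, since it has the higher expected X.
```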

This does not resolve which weight you should use for each person, but it establishes a basic form for your ethics - a weighted sum of utilities - and in my view, it means you're now likely to embrace many of the positions that utilitarianism is known, and criticized, for.

In particular, a corollary of HAT (and hence, derivable from the assumptions above) is the utility legions corollary: a given benefit/harm affecting N persons is N times as good (from an ethical perspective) as a given benefit/harm affecting one person (assuming that the one person, and the benefit they receive, is representative of the N). Hence, if there is a large enough set of persons who benefits (or anti-benefits) from a particular action, that alone can make the action ethically (un)desirable, outweighing all other considerations. I’ll elaborate on this below, in the section on utility legions.

The proof

If you want a short, technical proof of HAT, I recommend Part III of Harsanyi (1955), the paper that introduced it.

I've made a few attempts at a more intuitive walkthrough, though I'm not 100% satisfied with any of them. They are:

I also think it would be OK to simply take my word that the assumptions imply the conclusion, and move on. I don’t think much in particular hinges on understanding the “why” of HAT. And sometimes it’s most fruitful to understand a proof after one has a really clear picture of what assumptions are needed for the proof, and of what the most interesting/nonobvious/controversial implications of the proof are. The rest of the essay is about these topics, so it could make sense to simply read on, and check out the appendices once you have a clearer sense of what’s at stake.

Examining HAT's assumptions

Having given a high-level sketch, I'll now look more closely at each of the four assumptions driving HAT's result.

I think it's worth discussing each assumption in some depth. In technical discussions of HAT, there is often a lot of focus on formalizing each assumption, and most of the assumptions have several candidate formalizations. Here, I'll be using a different emphasis: discussing the high-level intuitions behind each assumption, noting some of the open questions each assumption raises, pointing to literature on these open questions and naming what I consider the most reasonable and common ways of answering them, and discussing how the assumption connects to (or in some cases seems to rely on) the idea of other-centered ethics.

In general, I think the assumptions have enough intuitive appeal that thinking through each of them is likely to make you put significant weight on some form of utilitarianism. However, I don't consider any of the assumptions to be wholly straightforward or unproblematic. I think they all raise some tough issues for further thought and study.

In my view, the assumptions that seem initially most unintuitive and hardest to swallow (#2 and #3 above, the ones comprising expected utility theory) are the ones I'm most inclined to hold onto, while the more innocuous sounding ones (#1 and #4 above) are the ones that my anti-utilitarian side sees as the weakest links.

The population of moral patients (assumption 1)

There is a particular set of persons that matters ethically, such that the goal of our ethics is to benefit these persons. (Example sets would be "All humans in the universe" or "All sentient beings.")

At first glance this assumption seems quite innocuous. It seems hard to get around this assumption if you want an "other-centered" ethics: it's essentially stating that there is a distinct set of "others" that your ethics is focused on.

However, in practice, I think a lot of the biggest debates over utilitarianism concern who exactly is in this "set of persons that matters ethically.”

All of the above points pertain to the "There is a particular set of persons that matters ethically" part of the assumption. "The goal of our ethics is to benefit these persons" has issues as well, mostly because it is entangled with the idea of "other-centered" ethics. I'll discuss this more under assumption 4.

In sum, there are a lot of questions to be answered about the "set of persons that matters ethically," and different utilitarians can take very different views on this. But utilitarians tend to assume that it's possible in principle to arrive at a good answer to "which set of persons matters ethically?", and while I don't think this is clearly right, I think it is a reasonable view and a good starting point.

Utility (assumption 2)

For each person, we can talk meaningfully about which states of the world are "better for them" than others.

I think this assumption is the source of significant confusion, which is why I thought it was worth splitting out from Assumption 3 below.

Early utilitarians proposed to base ethics on "pain and pleasure" (e.g., maximizing pleasure / minimizing pain). This approach is still a live one among utilitarians,6 and utilitarianism is sometimes seen as incompatible with ideas like "we should value things in life other than pain and pleasure."

But the version of HAT I've presented doesn't need to work with the idea of pain or pleasure - it can work with any sense in which one state of the world is "better for" person X than another.

The terms "well-being," "welfare" and "utility" are often used to capture the broad idea of "what is good for a person." To say that situation 1 has greater "utility" for person X than situation 2 is simply to say that it is, broadly speaking, a better situation for person X. Different people have different opinions of the best way to define this, with the main ones being:

The SEP article I linked to gives a good overview of these options, as does this excerpt from Reasons and Persons. But you need not choose any of them, and you could also choose some weighted combination. Whatever you choose as your framework for defining utility, if you take on board the other assumptions, HAT's conclusion will apply for maximizing total weighted utility as you define it.

For the rest of this piece, I will straightforwardly use the idea of "utility" in the following sense: to say that person X's utility is greater in situation 1 than in situation 2 means that situation 1 is better for person X than situation 2, in whichever of the above senses (or whatever other sense) you prefer.

Expected utility (assumption 3)

In order to achieve a certain kind of consistency in our ethical decisions, we should use the expected utility framework to model both (a) the sense (from #2) in which some states of the world are better for person X than others; (b) the sense in which some outcomes are ethically better than others.

In Appendix 3, expected utility mostly makes its appearance via multipliers used to compare two worlds - the idea that any given world can always be described as “___ times as good” as another (where ___ is sometimes less than 1 or negative).8 In Appendix 1, it makes its appearance via the unrealistic assumption that each of the people in the example “cares only about how much money they have, and they care linearly about it.”

The basic idea of expected utility theory is:

A terminology note: throughout this piece, I generally use the term "person-state" or the abbreviation "state" when I want to refer specifically to the state a particular person finds themselves in. I use the terms "states of the world," "world-states" and "worlds" to refer to configurations of person-states (if you know the state of the world, you know which person-state each person is in).

Of the four assumptions, I think this is the one that is hardest to grasp and believe intuitively. However, there is a significant literature defending the merits of the expected utility framework. A common theme in the literature is that any agent that can't be well-modeled using this framework must be susceptible to Dutch Books: hypothetical cases in which they would accept a series of gambles that is guaranteed to leave them worse off.

These arguments have been made in a number of different ways, sometimes with significant variations in how things like "situations" (or "events") are defined. I link to a couple of starting points for relevant literature here.

I want to be clear that expected utility does not seem like a very good way of modeling people's actual behavior and preferences. However, I think it is apt in an ethical context, where it makes sense to focus on idealized behavior:

Overall, I think there is a very strong case for using the expected utility framework in the context of forming an “other-centered ethics,” even if the framework has problems in other contexts.

Pareto efficiency (assumption 4)

If we're choosing between two states of the world, and one state is better for at least one person and worse for no one (in terms of expected utility as defined above), then our ethical system should always prefer that state.

I think this idea sounds quite intuitive, and I think it is in fact a very appealing principle.

However, it's important to note that in taking on this assumption, you're effectively expunging everything from morality other than the individual utility functions. (You could also say assumption #1 does this, but I chose to address the matter here.)

For example, imagine these two situations:

  1. Person A made a solemn promise to Person B.
    • (1a) If they break the promise, Person A will have U(A) = 50, and Person B will have U(B) = 10.
    • (1b) If they keep the promise, Person A will have U(A) = 10, and Person B will have U(B) = 20.
  2. Person A and B happen across some object that is more useful to Person A than to Person B. They need to decide who gets to keep the object.
    • (2a) If it's Person A, then Person A will have U(A) = 49.9, and Person B will have U(B) = 10.
    • (2b) If it's Person B, then Person A will have U(A) = 10.1, and Person B will have U(B) = 20.

Pareto efficiency says that (2b) is better than (1b) (since it is better for Person A, and the same for Person B) and that (2a) is worse than (1a) (same reason). So if you prefer (2a) to (2b), it seems you ought to prefer (1a) to (1b) as well: chaining the comparisons together, (1a) beats (2a) (by Pareto), you prefer (2a) to (2b), and (2b) beats (1b) (by Pareto).

What happened here is that Pareto efficiency essentially ends up forcing us to disregard everything about a situation except for each person's resulting expected utility. Whether the utility results from a broken promise or from a more "neutral" choice can't matter.
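To see how little Pareto efficiency lets us look at, here is a short sketch using the numbers from the example above, with each case reduced to a pair of utilities:

```python
# The four cases from the example, reduced to (U(A), U(B)) pairs -- which is all
# that the Pareto comparison is allowed to see.
cases = {
    "1a": (50.0, 10.0),   # promise broken
    "1b": (10.0, 20.0),   # promise kept
    "2a": (49.9, 10.0),   # object goes to Person A
    "2b": (10.1, 20.0),   # object goes to Person B
}

def pareto_better(x, y):
    """True if state x is at least as good for everyone as state y, and strictly better for someone."""
    return all(a >= b for a, b in zip(x, y)) and any(a > b for a, b in zip(x, y))

print(pareto_better(cases["2b"], cases["1b"]))  # True: (2b) dominates (1b)
print(pareto_better(cases["1a"], cases["2a"]))  # True: (1a) dominates (2a)
# So a preference for (2a) over (2b), chained with these two dominances, commits
# us to preferring (1a) over (1b) -- promise or no promise.
```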

This is pretty much exactly the idea of "other-centered ethics" - we shouldn't be considering things about the situation other than how it affects the people concerned. And because of this, many people will bite the above bullet, saying something like:

If the four cases were really exactly as you say with no further consequences, then I would in fact prefer Case (1a) to Case (1b), just as I prefer Case (2a) to Case (2b). But this is a misleading example, because in real life, breaking a promise has all sorts of negative ripple effects. Keeping a promise usually will not have only the consequences in these hypotheticals - it is usually a good thing for people. But keeping a promise is not intrinsically valuable, so if it did have only these consequences, it would be OK to break the promise.

I don't think this response is obviously wrong, but I don't think it's obviously right either.

I'll discuss these themes more in a later piece. For now, I'll conclude that the "Pareto efficiency" assumption seems like a good one if your goal is other-centered ethics, while noting that it does have some at least potentially uncomfortable-seeming consequences.

Impartiality, weights and interpersonal comparisons

I stated above:

These assumptions can be formalized in different ways. But broadly speaking, once you embrace all of them, something like HAT can establish the conclusion: the goal of your ethics should be to achieve the state of the world that maximizes the expected value of X, where X is some weighted sum of each ethically significant person's utility ("utility" meaning "how good the state of the world is for that person").

This does not resolve which weight you should use for each person, but it establishes a basic form for your ethics - a weighted sum of utilities - and in my view, it means you're now likely to embrace many of the positions that utilitarianism is known, and criticized, for.

The question of what weight to use for each person is quite thorny, though I don't consider it central to most of the points in this piece. Here I'll cover it briefly.

In Appendix 1, I invoke a case where it implicitly/intuitively seems obvious that one should "weigh everyone equally." I talk about a choice between four actions:

In this case, it seems intuitive that all of these actions are equally good. We haven't given any info to distinguish the persons or their valuations of money.

The idea of "weighing everyone equally" is often invoked when discussing HAT (including by Harsanyi himself). It's intuitive, and it leads to the appealingly simple conclusion that actions can be evaluated by their impact on the simple, unweighted sum of everyone's utilities - this is the version of utilitarianism that most people are most familiar with. But there is a (generally widely recognized) problem.9

The problem is that there are no natural/obvious meaningful "units" for a person's utility function.

Imagine that we're asking, "Which is better, the situation where person A has utility 10 and person B has utility 20, or the situation where person A has utility 25 and person B has utility 10?"

If we're trying to "weigh everyone equally" we might say that the second situation is better, since it has higher total utility. But we could just as easily have asked, "Which is better, the situation where person A has utility 10 and person B has utility 200, or the situation where person A has utility 25 and person B has utility 100?" Multiplying all of B’s utilities by 10 doesn’t matter in any particular way, as far as the expected utility framework is concerned. But it leads us to the opposite conclusion.
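Here is the same point as a quick sketch, using the numbers above: rescaling person B's utility function, a transformation the expected utility framework treats as meaningless, flips the "weigh everyone equally" verdict.

```python
# Situation 1: A has utility 10, B has 20.  Situation 2: A has 25, B has 10.
situation_1 = {"A": 10, "B": 20}
situation_2 = {"A": 25, "B": 10}

def unweighted_total(situation):
    return sum(situation.values())

# "Weigh everyone equally" on the original scale: situation 2 wins (35 vs. 30).
print(unweighted_total(situation_1), unweighted_total(situation_2))

def rescale_b(situation, factor=10):
    """Multiply B's utilities by a constant -- an equally valid representation of B's preferences."""
    return {"A": situation["A"], "B": situation["B"] * factor}

# On the rescaled numbers, the same "equal weights" rule says situation 1 wins (210 vs. 125).
print(unweighted_total(rescale_b(situation_1)), unweighted_total(rescale_b(situation_2)))
```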

So how do we decide what utility function to use for each person, in a way that is "fair" and allows us to treat everyone equally? There are a couple of basic approaches here:

Interpersonal comparisons and impartiality. Some utilitarians hold out hope that there are some sort of "standardized utility units" that can provide a fair comparison that is valid when comparing one person to another. There would need to be some utility function that fulfills the requirements of expected utility theory (e.g., whenever the utility of a person in situation 1 is higher than their utility in situation 2, it means situation 1 is better for them than situation 2), while additionally being in units that can be fairly and reasonably compared across persons.

If you assume that this is the case, you can stipulate an "impartiality" assumption in addition to the assumptions listed above, leading to a version of HAT that proves that actions can be evaluated by their impact on the simple, unweighted sum of everyone's utilities. This is the version of HAT covered in Nick Beckstead's "intuitive explanation of Harsanyi's Aggregation Theorem" document.

You might end up effectively weighing some persons less than others using this approach. For example, you might believe that chickens simply tend to have fewer "standardized utility units" across the board. (For example, under hedonism - discussed next - this would mean that they experience both less pleasure and less pain than humans, in general.)

A common way of going for "standardized utility units" is to adopt hedonism, briefly discussed above: "situation 1 is better for person X than situation 2 (that is, it gives X higher utility) if and only if it gives them a more favorable balance of pleasure and pain." The basic idea here is that everything that is morally relevant can, at least in principle, be boiled down to objective, standardized quantities of pain and pleasure; that the entire notion of a situation being "good for" a person can be boiled down to these things as well; and hence, we can simply sum the pain and pleasure in the world to get total utility. Hedonism has issues of its own (see SEP article), but it is a reasonably popular variant of utilitarianism, as utilitarianisms go. (I'm personally not a big fan.)

Another potential approach is to use the "veil of ignorance" construction, introduced by Harsanyi 1953 and later popularized and reinterpreted by Rawls: "imagine you had to decide how to structure society from behind a veil of ignorance. Behind this veil of ignorance, you know all the facts about each person’s circumstances in society—what their income is, how happy they are, how they are affected by social policies, and their preferences and likes. However, what you do not know is which of these people you are. You only know that you have an equal chance of being any of these people. Imagine, now, that you are trying to act in a rational and self-interested way—you are just trying to do whatever is best for yourself. How would you structure society?"11

As Harsanyi (1955) suggests, we could imagine that whenever a person makes decisions based on how they themselves would set up the world under the veil of ignorance, that person is correctly following utilitarianism. That said, it's not clearly coherent to say that the person making the decision could be randomly assigned to "be another person," or that they could reasonably imagine what this would be like in order to decide how to set up the world. It’s particularly unclear how one should think about the probability that one would end up being a chicken, given that chickens are far more numerous than humans, but may be less ethically significant in some other respect.

The veil of ignorance is generally a powerful and appealing thought experiment, but I expect that different people will have different opinions on the extent to which this approach solves the "interpersonal comparisons of utility" problem.

A final approach could be to use variance normalization to create standardized units for people's utilities. I won't go into that here, but I think it's another approach that some will find satisfying and others won't.
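For readers who want a concrete picture, here is one simple way a variance-normalization scheme could be set up. This is my own illustrative sketch with made-up numbers, not a claim about any particular proposal: rescale each person's utilities so they have the same spread across the outcomes under consideration, then sum the rescaled numbers.

```python
import statistics

def normalize(utilities_across_outcomes):
    """Rescale one person's utilities to have a standard deviation of 1 across the outcomes."""
    sd = statistics.pstdev(utilities_across_outcomes)
    return [u / sd for u in utilities_across_outcomes]

# Hypothetical utilities of persons A and B in three candidate outcomes (each on
# that person's own arbitrary scale).
a = [10, 25, 40]
b = [200, 100, 150]

a_norm, b_norm = normalize(a), normalize(b)
totals = [x + y for x, y in zip(a_norm, b_norm)]
print(totals)  # under this scheme, the outcome with the highest total is "best"
```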

Under the latter two approaches above (and sometimes/perhaps the first as well?), some people will want to diverge from equal weighting in order to accomplish things like weighing animals' interests less heavily than humans', potential people's interests less heavily than existing people's, etc. But the default/starting point seems to be counting everyone the same, once you have standardized utility units that you're happy with.

Utilitarianism with "unjustified" weights. Another approach you can take is to accept that there is no truly principled way to make interpersonal utility comparisons - or at least no truly principled way that our current state of knowledge has much to say about - and try to maximize the weighted sum of utilities using weights that:

I personally tend toward this approach. I don't think we have any satisfying standardized units of utility, and I don't think it matters all that much: when we consider the real-world implications of utilitarianism, we are rarely literally writing down utilities. As I'll discuss lower down, most of the controversial conclusions of utilitarianism seem like they would arise from a wide range of possible choices of weights, indeed from basically any set of weights that seems "ballpark reasonable."

Quick recap

I'm going to quickly recap the assumptions that power HAT, and the conclusion it arrives at, now that I've talked them all through in some detail.

Assumptions:

1 - There is a particular set of persons that matters ethically, such that the goal of our ethics is to benefit these persons. (Example sets would be "All humans in the universe" or "all sentient beings.")

There is a lot of ambiguity about how to choose this set of persons. Particularly challenging conundra arise around how to value animals, computer programs, and "potential lives" (persons who may or may not come to exist, depending on our actions). And we get into real trouble (breaking HAT entirely) if the set of persons that matters ethically is infinite. As I'll argue later, I think the idea of "other-centered ethics" has its biggest problems here. On the other hand, there are plenty of times when you're basically sure that some particular set of persons matters ethically (even if you're not sure that there aren't others who matter as well), and you can (mostly) proceed.

2 - For each person, we can talk meaningfully about which states of the world are "better for them" than others.

There are debates to be had about what makes some states of the world "better for" a person than others. You can define "better for" using the idea of pleasure and pain (hedonism, the original version of utilitarianism); using the idea of preferences or idealized preferences (what states of the world people would prefer to be in); or using other criteria, such as considering certain types of lives better than others regardless of how the persons living them feel about the matter. Or you can use a mix of these ideas and others.

3 - We should use the expected utility framework to model both (a) the sense (from #2) in which some states of the world are better for person X than others; (b) the sense in which some outcomes are ethically better than others.

The basic intuition here is that our ethics shouldn't be subject to "Dutch books": it shouldn't be possible to construct a series of choices such that our ethics predictably makes the persons we are trying to benefit worse off. If you accept this, you will probably accept the use of the expected utility framework. The framework has many variants and a large literature behind it.

4 - If we're choosing between two states of the world, and one state is better for at least one person and worse for no one (in terms of expected utility as defined above), then our ethical system should always prefer that state.

This is the "Pareto efficiency" idea that makes sure our ethics is all about what's good for others. It ends up stripping out considerations like "I need to keep my promises" unless they can be connected to benefiting others.

Conclusion: the goal of your ethics should be to achieve the state of the world that maximizes the expected value of X, where X is some weighted sum of each ethically significant person's utility.

The weight you use for each person represents how much your ethical system "values their value": how good it is for the world for them to be better off. One way to think about it is as the value your ethical system places on their getting some particular benefit (such as $5), divided by the value they themselves place on getting that benefit. The weight you use may incorporate considerations like "how much pleasure and pain they experience," "how much they count as a person," and adjustments to make their units of value fairly comparable to other people. I don't think there is any really robust way of coming up with these weights, but most of the controversial conclusions of utilitarianism seem like they would arise from a wide range of possible choices of weights, indeed from basically any set of weights that seems "ballpark reasonable."

HAT and the veil of ignorance

The "veil of ignorance" idea mentioned briefly above can in some sense be used as a shorthand summary of HAT:

When deciding what state of the world it is ethically best to bring about, you should ask yourself which state of the world you would selfishly choose, if you were (a) going to be randomly assigned to be some person under that state of the world, with exactly the benefits and costs they experience in that state of the world; (b) determined to maximize your own expected utility.

This implicitly incorporates all of the assumptions above:

The veil of ignorance idea as stated above implies utilitarianism as surely as HAT does, and if you find it intuitive, you can often use it as a prompt to ask what a utilitarian would do. (That said, the debate between utilitarians and Rawls shows that there can be quite a bit of disagreement about how to interpret the veil of ignorance idea.)
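If you do think you have comparable units, the veil of ignorance boils down to a simple computation: being assigned uniformly at random to one of N persons means your selfish expected utility in a state of the world is just the average of everyone's utilities, which ranks states exactly as an unweighted sum does. A sketch with made-up numbers:

```python
# Veil-of-ignorance sketch (made-up utilities, assuming units comparable across persons):
# behind the veil you have an equal chance of being each person, so your selfish
# expected utility in a state of the world is the average of everyone's utilities.

def value_behind_the_veil(state):
    """Expected utility of being a uniformly random person in this state of the world."""
    return sum(state) / len(state)

state_x = [30, 30, 30]   # everyone does OK
state_y = [10, 40, 50]   # more unequal, but higher total utility

print(value_behind_the_veil(state_x), value_behind_the_veil(state_y))  # 30.0 vs. ~33.3
# A self-interested chooser behind the veil picks state_y -- the same verdict as
# the unweighted-sum version of utilitarianism.
```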

What you commit to when you accept Harsanyi's Aggregation Theorem

Let's say that you are committed to other-centered ethics, and you accept the "conclusion" I've outlined implied by HAT:

The goal of your ethics should be to achieve the state of the world that maximizes the expected value of X, where X is some weighted sum of each ethically significant person's utility.

Practically speaking, what does this mean you have committed to? At first glance, it may seem the answer is "not much":

These points initially appear to make the conclusion of HAT relatively uninteresting/unimportant, as well as uncontroversial. Most defenses of utilitarianism don't lean solely on HAT; they also argue for particular conceptions of utility, impartiality, etc. that can be used to reach more specific conclusions.

However, I believe that accepting HAT's conclusion does - more or less - commit your ethical system to follow some patterns that are not obvious or universal. I believe that these patterns do, in fact, account for most of what is interesting, unusual, and controversial in utilitarianism broadly speaking. In what follows, I will use "utilitarianism" as shorthand simply for "the conclusion of HAT," and I will discuss the key patterns that utilitarianism seems committed to.

  1. Utilitarianism implies consequentialism. When everyone agrees that Action 1 would lead to a better state of the world than Action 2, utilitarianism says to take Action 1. Historically, I think this has been the biggest locus of debate over utilitarianism, and a major source of the things utilitarians can boast about having been ahead of the curve on.
  2. Utilitarianism allows "utility monsters" and "utility legions." A large enough benefit to a single person (utility monster), or a benefit of any size to a sufficiently large set of persons (utility legion), can outweigh all other ethical considerations. Utility monsters seem (as far as I can tell) to be a mostly theoretical source of difficulty, but I think the idea of a "utility legion" - a large set of persons such that the opportunity to benefit them outweighs all other moral considerations - is the root cause of most of what's controversial and interesting about utilitarianism today, at least in the context of effective altruism.

Consequentialism

"Consequentialism" is a philosophy that says actions should be assessed by their consequences. SEP article here.14

Imagine we are deciding between two actions, Action 1 and Action 2. We have no doubt that Action 1 will result in a better end-state for the world as a whole, but some people are hesitating to take Action 1 because (a) it violates some rule (such as "don't exchange money for sacred thing X" or "don't do things you committed not to do"); (b) it would be a break with tradition, would challenge existing authorities, etc.; (c) it inspires a "disgust" reaction (e.g.); (d) something else. Any consequentialist philosophy will always say to take Action 1.15

Consequentialism is a broader idea than utilitarianism. For example, you might think that the best state of the world is one in which there is as much equality (in a deep sense of "equality of goodness of life") as possible, and you might therefore choose a state of the world in which everyone has an OK life over a state of the world in which some people have great lives and some have even better lives. That's not the state of the world that is best for any of the individual people; preferring it breaks Pareto efficiency (Assumption 4 above) and can't be consistent with utilitarianism. But you could be a consequentialist and have this view.

That said, utilitarianism and consequentialism are pretty tightly linked. For example, here are some excerpts from the SEP article on the history of utilitarianism:

The Classical Utilitarians, Bentham and Mill, were concerned with legal and social reform. If anything could be identified as the fundamental motivation behind the development of Classical Utilitarianism it would be the desire to see useless, corrupt laws and social practices changed. Accomplishing this goal required a normative ethical theory employed as a critical tool. What is the truth about what makes an action or a policy a morally good one, or morally right? But developing the theory itself was also influenced by strong views about what was wrong in their society. The conviction that, for example, some laws are bad resulted in analysis of why they were bad. And, for Jeremy Bentham, what made them bad was their lack of utility, their tendency to lead to unhappiness and misery without any compensating happiness. If a law or an action doesn't do any good, then it isn't any good ...

This cut against the view that there are some actions that by their very nature are just wrong, regardless of their effects. Some may be wrong because they are ‘unnatural’ — and, again, Bentham would dismiss this as a legitimate criterion ... This is interesting in moral philosophy — as it is far removed from the Kantian approach to moral evaluation as well as from natural law approaches. It is also interesting in terms of political philosophy and social policy. On Bentham's view the law is not monolithic and immutable. ...

My impression is that while consequentialism is still debated, the world has moved a lot in a consequentialist direction since Bentham's time, and that Bentham is generally seen as having been vindicated on many important moral issues. As such, with the above passage in mind, it seems to me that a good portion of historical debates over utilitarianism have actually been about consequentialism.

Consequentialism is still not an uncontroversial approach to ethics, and I don't consider myself 100% consequentialist. But it's a broadly appealing approach to ethics, and I put a lot of weight on consequentialism. I also generally feel that utilitarianism is the best version of consequentialism, mostly due to HAT.

Utility monsters and utility legions

As far as I can tell, the original discussion of "utility monsters" is a very brief bit from Anarchy, State and Utopia:

Utilitarian theory is embarrassed by the possibility of utility monsters who get enormously greater gains in utility from any sacrifice of others than these others lose. For, unacceptably, the theory seems to require that we all be sacrificed in the monster’s maw, in order to increase total utility.

Here I'm also going to introduce the idea of a "utility legion." Where a utility monster is a single person capable of enough personal benefit (or cost) to swamp all other ethical considerations, a utility legion is a set of persons that collectively has the same property - not because any one person is capable of unusual amounts of utility, but because the persons in the "legion" are so numerous.

I believe that both utility monsters and utility legions have the following noteworthy properties:

1 - They cause utilitarianism to make clearly, radically different prescriptions from other approaches to ethics.

A utility monster or utility legion might be constituted such that they reliably and linearly benefit from some particular benefit, and therefore ethics could effectively be reduced to supplying them with that benefit. For example, if a utility monster or legion reliably and linearly benefits from pizza, ethics could be reduced to supplying them with pizza. This would be a radically unfamiliar picture of ethics, almost no matter what we replaced "pizza" with.

An especially discomfiting aspect is the breaking of reciprocity.

As I'll discuss below, real-life utility legions do in fact seem to have the sort of properties discussed above. They tend to have "simple enough" utility functions - and to present low enough odds of reciprocating - to make the moral situation feel quite unfamiliar. Part of the reason I like the term "utility legion" is its association with homogeneity and discipline: what makes utility-legion-dominated morality unfamiliar is partly the fact that there are so many persons who benefit in a seemingly reliable, linear way from particular actions.

2 - Utilitarianism requires that they be possible. I explain the math behind this in a footnote.17

Once you've committed to maximize a weighted sum of utilities, you've committed to the idea that utility monsters and utility legions are possible, as long as the monster has enough utility or the legion has enough persons in it.

In real life, there is going to be some limit on just how much utility/benefit a utility monster can realize, and on just how many persons a utility legion can have. But if you're trying to follow good-faith other-centered ethics, I think there also tend to be limits on how low you can set a person's "ethical weight" (the weight applied to their utility function), assuming you've decided they matter at all. With both limits in mind, I think utilitarians are generally going to feel a significant pull toward allowing the welfare of a utility legion to outweigh all other considerations.
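The arithmetic behind this pull is simple: once each member of the legion has some nonzero ethical weight and gains some nonzero benefit, the legion's contribution to the weighted sum grows linearly with its size, so some finite size always suffices to exceed any fixed quantity on the other side of the ledger. A sketch with made-up numbers:

```python
# With a weighted sum of utilities, the legion's side of the ledger grows linearly
# in its size. All of the numbers below are made up for illustration.
WEIGHT_PER_MEMBER = 0.01               # hypothetical floor on how little each member can count
BENEFIT_PER_MEMBER = 1.0               # hypothetical small benefit to each member
COMPETING_CONSIDERATION = 1_000_000.0  # hypothetical value of everything else at stake

def legion_value(n_members):
    """The legion's total contribution to the weighted sum of utilities."""
    return n_members * WEIGHT_PER_MEMBER * BENEFIT_PER_MEMBER

# Keep growing the legion until its benefit outweighs the competing consideration.
n = 1
while legion_value(n) <= COMPETING_CONSIDERATION:
    n *= 2
print(n, legion_value(n))  # some finite legion size is always enough
```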

The next section will give examples.

Examples of utility legions

As far as I can tell, the "utility monster" is ~exclusively an abstract theoretical objection to utilitarianism. I don't know of controversial real-world utilitarian positions that revolve around being willing to sacrifice the good of the many for the good of a few.18

Utility legions are another story. It seems to me that the biggest real-world debates between utilitarians (particularly effective altruists) and others tend to revolve around utility legions: sets of persons who can benefit from our actions in relatively straightforward and linear ways, with large enough populations that their interests could be argued to dominate ethical considerations.

In particular:

The scope of utilitarianism

The previous section talks about utility legions' coming to "dominate" ethical considerations. Does this mean that if we accept utilitarianism, we need to be absolutist, such that every ethical rule (even e.g. "Keep your promises") has to be justified in terms of its benefit for the largest utility legion?

It doesn't. We can believe that the "other-centered" approach of utilitarianism is the only acceptable one for some sorts of decisions, while believing that other ethical systems carry serious weight in other sorts of decisions.

An example of a view along these lines would run as follows (I have a lot of sympathy with this view, though not total sympathy):

Conclusion: Thin Utilitarianism

There are many versions of utilitarianism, and many questions (e.g.) that this piece hasn't gone far into. But in my view, the most interesting and controversial real-world aspect of utilitarianism - the way it handles "utility legions" - does not require a fully fleshed out ethical theory, and can be supported based on simple assumptions consonant with the idea of other-centered ethics.

While I do have opinions on what fleshed-out version of utilitarianism is best, I don't feel very compelled by any of them in particular. I feel far more compelled by what I call thin utilitarianism, a minimal set of commitments that I would characterize as follows:22

Thanks to Nick Beckstead, Luke Muehlhauser and Ajeya Cotra for detailed comments on earlier drafts.

Appendix 1: a hypothetical example of HAT

The logic of HAT is very abstract, as you can see from my attempt in Appendix 3 to make it intuitive without getting bogged down in mathematical abstractions. I'd love to be able to give a “real-world example” that illustrates the basic principles at play, but it's hard to do so. Here's an attempt that I think should convey the "basic feel" of how HAT works, although it is dramatically oversimplified (which includes adding in some assumptions/structure that isn't strictly included in the HAT assumptions above).

Imagine that there are four people: A, B, C, and D. Unusually (and unrealistically), each of these people cares only about how much money they have, and they care linearly about it. That means:

Let's also take on board a blanket "we're in a vacuum" type assumption: your choice won't have any important effects in terms of incentives, future dynamics, etc.

Now you are deciding between two actions, Action 1 and Action 2. If you take Action 1, it will result in the following distribution of wealth:

If you take Action 2, it will result instead in:

Which action is it ethically better to take?

You might naively think that Action 1 looks more appealing, since it is more egalitarian. However:

Consider Action 1.5, which results in a randomly selected person getting $6000, and each of the others getting $1000. So it has a 25% chance of having the same effect as Action 2, a 25% chance of having a similar effect but benefiting person A instead, etc.

Now consider:

If you accept this claim about Action 1.5, should you accept it about Action 2 as well? I would basically say yes, unless for some reason your ethics values Person D less than (at least) one of the other persons.

There are a couple of ways of thinking about this:

If you accept this reasoning that Action 2 is better, or at least more "other-centered," than Action 1, similar logic can be used to argue for the superiority of any action that leads to higher total wealth (across the four people) than its alternative.
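To spell out the expected-value arithmetic (using the example's stipulation that each person cares only, and linearly, about their wealth), here is a short sketch of the prospect each of the four people faces under Action 1.5:

```python
# Under the example's stipulation that each person cares only -- and linearly --
# about how much money they have, we can treat a person's utility as simply their
# expected dollars.

def expected_wealth(prospects):
    """prospects: a list of (probability, dollars) pairs."""
    return sum(p * d for p, d in prospects)

# Action 1.5: one randomly selected person (out of four) gets $6,000; everyone else
# gets $1,000. From each person's own point of view, that is the same prospect:
action_1_5_for_each_person = [(0.25, 6000), (0.75, 1000)]
print(expected_wealth(action_1_5_for_each_person))  # 2250.0

# Action 2 produces the same total wealth as Action 1.5, just with the $6,000 going
# to a particular person rather than a random one. Under linear valuation, each step
# of the argument in the text comes down to comparing expected dollars like this.
```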

How this example is unrealistic

I believe the above example is substantially "powered" by this bit:

Unusually (and unrealistically), each of these people cares only about how much money they have, and they care linearly about it. That means:

Let's also take on board a blanket "we're in a vacuum" type assumption: your choice won't have any important effects in terms of incentives, future dynamics, etc.

This set of assumptions is a simplification of assumptions #2 and #3 above.

As an aside, the above example also implicitly contains assumption #1 (since it focuses on a particular, fixed set of four people) and assumption #4 (this is just the part where we assume that any action preferred by everyone affected is the right, or at least "other-centered," action). But I think in the context of the example, those assumptions seem pretty innocuous, whereas the above bit seems unrealistic and could be said to diverge significantly from real life. I'll talk about why each of the above assumptions is unrealistic in order.

Linear valuation of wealth. First, in real life, people generally don't value wealth linearly. Their valuation might be closer to "logarithmic": going from $1000 to $2000 in wealth might be a similar change (from a quality of life perspective) to going from $10,000 to $20,000 or $100,000 to $200,000.

If I had wanted to make the example more realistic and intuitive from this perspective, I might have dropped the statement that people care "linearly" about wealth, and changed the action consequences as follows:

Now you are deciding between two actions, Action 1 and Action 2. If you take Action 1, it will result in the following distribution of wealth:

If you take Action 2, it will result instead in:

If you felt unconvinced of the superiority of Action 2 before, and feel more convinced in this case, then you're probably not thinking abstractly enough - that is, you're probably importing a lot of intuitions from real life that are supposed to be assumed away in the example.

HAT itself doesn't talk about money. It talks about "utility" or "welfare," which are intended to abstractly capture what is "good for" the affected persons. From a "utility" or "welfare" perspective, the advantage of Action 2 actually looks smaller in the example given directly above than in the original example I gave. This is because the original example stipulated that people care linearly and exclusively about their wealth, whereas the above example takes place in the real world.

So while the original example I gave isn't realistic, I don't think the lack of realism is a huge problem here. It's just that HAT relies on reasoning about some abstract notion of "how good a situation is for a person" that can be treated linearly, and that notion might not translate easily or intuitively to things we're used to thinking about, such as wealth.

One-dimensional values. The example stipulates that the people involved care only about their wealth level. That's a very unusual condition in the real world. I think a lot of what feels odd about the example above is that Action 2 feels unfair and/or inegalitarian, and in real life, at least some of the people involved would care a lot about that! It's hard to get your head around the idea of Person A not caring at all about the fact that Person D has 6x as much wealth as them, for example; and it's relatively easy to imagine a general rule of "maximize total wealth" being exploited and leading to unintended consequences.

Again, HAT itself looks to abstract this sort of thing away through its use of expected utility theory. We should maximize not wealth, but some abstract quantity of "utility" or "welfare" that incorporates things like fairness. And we should maximize it in the long run across time, not just for the immediate action we're taking.

Moving from "maximizing the wealth of persons who care linearly and exclusively about wealth" to "maximizing the 'utility' of persons who care about lots of things in complicated ways" makes HAT much harder to think intuitively about, and harder to apply to specific scenarios. (It also makes it more general.)

But the basic dynamics of the example are the dynamics of HAT: once you perform this abstraction, the principle "take any action that everyone involved would prefer you take" can be used to support the principle "maximize the total utility/welfare/goodness (weighted by how much your ethical system values each person) in a situation."

"Take any action that everyone involved would prefer you take" sounds like a relatively innocuous, limited principle, but HAT connects it to "maximize total (weighted) utility," which is a very substantive one and the key principle of utilitarianism. In theory, it tells you what action to take in every conceivable situation.

Appendix 2: the utility legions corollary

This appendix tries to give an intuitive walkthrough for how the assumptions listed above can be used to support the utility legions corollary:

A given benefit/harm affecting N persons is N times as good (from an ethical perspective) as a given benefit/harm affecting one person (assuming that the one person, and the benefit they receive, is representative of the N). Hence, if there is a large enough set of persons who benefits (or anti-benefits) from a particular action, that alone can make the action ethically (un)desirable, outweighing all other considerations.

I will focus on demonstrating the second part: that a large enough set of beneficiaries can outweigh all other considerations. The first part (the "N times as good" claim) requires more mathematical precision, and if you want support for it you are probably better off with the normal proof of HAT or the walkthrough given in Appendix 3.

Imagine two persons: A and B. Person A is part of a "utility legion" - for example, they might be:

Person B is any other person, not included in the "utility legion."

We start from the premise that Person A's interests matter ethically. Because of this, if we are contemplating an action with some noticeable benefit to person A, there is some - even if very small - amount of harm (or benefit missed out on) for B that could be outweighed by the benefit to person A.

By "outweighed," I mean that it could be actively good (ethically) to take an action that results in some (potentially small) amount of harm to B, and the noticeable benefit to A, rather than sticking with the status quo.

Now imagine a large harm to person B. In line with Assumption 3, there is some number K such that "a 1/K probability of the large harm" is better for Person B than "a 100% chance of the smaller harm."

The 1/K chance of the large harm is better for person B than definitely suffering the small harm. Therefore, if we are only choosing between a world where "person A gets a small benefit, and person B gets a smaller harm" and a world where "person A gets a small benefit, and person B gets a 1/K chance of a large harm," the latter is ethically better, by Assumption 4. And since we've already established that the former world is ethically better than the "status quo" (a world with neither the benefit nor the harm), this establishes that the latter world is also ethically better than the status quo.

In other words, the small benefit to person A outweighs a 1/K chance of the large harm to person B.

Now let's imagine a "utility squadron": some subset of the "utility legion" with K persons like Person A, that can benefit in the same way, with the same ethical weight. From an ethical perspective, giving the small benefit to a random person in this "utility squadron" is as good as the small benefit given to person A, and hence can outweigh the 1/K chance of a large harm to person B.

Now imagine a 1/K chance of giving the small benefit to everyone in the "utility squadron." Each member of the squadron benefits just as much from this gamble as from a 100% chance of the small benefit going to one randomly selected member: either way, each member has a 1/K chance of receiving the benefit.23

So by Assumption 4, the 1/K chance of a small benefit to everyone in the utility squadron is just as good, from an ethical perspective, as giving the small benefit to a random person in the utility squadron. Therefore, it too can outweigh a 1/K chance of a large harm to person B.

Since a 1/K chance of a small benefit to everyone in the utility squadron can outweigh the 1/K chance of a large harm to B, we can cancel out the 1/K multipliers and just say that a small benefit to everyone in the utility squadron can outweigh the large harm to B.

As long as the "utility legion" is large enough, every single person outside the "utility legion" can have any of their interests outweighed by a small benefit to a "squadron" (subset of the utility legion). Hence, for a large enough utility legion, giving the small benefit to the entire legion can outweigh any consequences for the rest of the world.
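Here is a numeric sketch of the chain of comparisons above, with made-up numbers throughout (each person's utility is on their own scale, and the ethical weights and K are chosen arbitrarily), just to show that the bookkeeping goes through:

```python
# Numeric sketch of the chain above. All numbers are made up; utilities are on each
# person's own scale, with ethical weights w_A (per squadron member) and w_B.
w_A, w_B = 1.0, 1.0
small_benefit_A = 1.0    # the "noticeable benefit" to a legion member
small_harm_B = -0.5      # a harm to B small enough to be outweighed (the starting premise)
large_harm_B = -400.0    # a large harm to B
K = 1000                 # chosen so a 1/K chance of the large harm beats the certain small harm for B

# Step 1 (premise): the benefit to A outweighs the small harm to B, relative to the status quo.
assert w_A * small_benefit_A + w_B * small_harm_B > 0

# Step 2 (assumption 3): for B, a 1/K chance of the large harm is better than the certain small harm.
assert (1 / K) * large_harm_B > small_harm_B

# Step 3 (assumption 4): so "benefit to A plus a 1/K chance of the large harm to B"
# also beats the status quo, in expected ethical value.
assert w_A * small_benefit_A + w_B * (1 / K) * large_harm_B > 0

# Steps 4-5: a 1/K chance of the benefit going to all K squadron members is worth
# K * (1/K) = 1 expected benefit, so after cancelling the 1/K factors, a certain
# small benefit to all K members outweighs the certain large harm to B.
assert K * w_A * small_benefit_A + w_B * large_harm_B > 0
print("All steps check out with these numbers.")
```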

In practice, when it comes to debates about global poverty, animals on factory farms, etc., most people tend to accept the early parts of this argument - up to "the small benefit to person A can outweigh a 1/K chance of the large harm to person B." For example, preventing a case of malaria overseas has some value; there is some cost to our own society that is worth it, and some small probability of a large harm we would accept in order to prevent the case of malaria.

But as we move to a world in which there are an overwhelming number of malaria cases that we have the power to prevent, many people want to stop letting the benefits keep piling up straightforwardly (linearly). They want to say something like "Sure, preventing a case of malaria is worth $5, but preventing a million of them isn't worth more money than I have."

What HAT tells us is that we can't do this sort of thing if we want to hold onto its assumptions. If we want to use expected utility theory to be consistent in the tradeoffs we're making, and if we want to respect the "Pareto efficiency" principle, then we essentially need to treat a million prevented cases of malaria as being exactly a million times as good as one prevented case of malaria. And that opens the door to letting the prevention of malaria become our dominant ethical consideration.

Appendix 3: A walkthrough of the logic of Harsanyi's Aggregation Theorem

I think the basic conceptual logic of HAT should be understandable (though I wouldn't say "easily understandable") without a technical background. Below I will give my attempt at a relatively intuitive explanation.

It won't be the most rigorous presentation available (there are other papers on that topic). I will try to avoid leaning on the expected utility framework as a shared premise, and will instead highlight the specific assumptions and moves that utilize it.

Note that unlike Nick Beckstead's "intuitive explanation of Harsanyi's Aggregation Theorem" document, I will be focused on the version of HAT that does not assume we have a sensible way of making interpersonal utility comparisons (if you don't know what this means, you can just read on).

This ended up with a lot of steps, and I haven't yet managed to boil it down more and make it more digestible. (Thanks to Ajeya Cotra for help in making it somewhat better, though.)

| World or lottery | Alice gets | Bob gets | Comparison 1 | Comparison 2 |
| --- | --- | --- | --- | --- |
| DefaultWorld | 10 apples | 10 pears | | |
| AltWorld | 10 bananas | 10 oranges | | |
| AliceBenefitWorld | 10 apples and a lemon | 10 pears | W_Alice times as "ethically good", overall, as AliceBenefitWorld. (W_Alice = 1) | |
| BobBenefitWorld | 10 apples | 10 pears and a lemon | W_Bob times as "ethically good", overall, as AliceBenefitWorld | |
| AliceSplitWorld | 10 bananas | 10 pears | U_Alice times as good for Alice as AliceBenefitWorld. | U_Alice * W_Alice times as ethically good, overall, as DefaultWorld. |
| BobSplitWorld | 10 apples | 10 oranges | U_Bob times as good for Bob as BobBenefitWorld. | U_Bob * W_Bob times as ethically good, overall, as DefaultWorld. |
| ½ chance of AliceSplitWorld; ½ chance of BobSplitWorld | | | | "½ * U_Alice * W_Alice + ½ * U_Bob * W_Bob" times as good as DefaultWorld |
| ½ chance of AltWorld; ½ chance of DefaultWorld | | | | Same as previous row |

We could add a third person, Carol, to the population. Maybe Carol would get 10 kiwis in DefaultWorld and 10 peaches in AltWorld. The table above would then gain CarolBenefitWorld and CarolSplitWorld rows, the lotteries would use 1/3 probabilities instead of ½, and the final comparison would become "1/3 * U_Alice * W_Alice + 1/3 * U_Bob * W_Bob + 1/3 * U_Carol * W_Carol" times as good as AliceBenefitWorld.

Similarly, we can extend this reasoning to any number of persons in the population, 1 through N. AltWorld will always be U_1 * W_1 + U_2 * W_2 + … + U_N * W_N times as good as the benchmark world in which Person 1 gets the extra benefit (AliceBenefitWorld in the example above; Benefitworld_1 in the general notation below).
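Here's a minimal sketch of that weighted sum in Python, with made-up values for the U's and W's; it isn't part of the walkthrough itself, just an illustration of the formula.

```python
# U[k]: how good AltWorld is for person k, in multiples of their benchmark benefit
#       (with DefaultWorld as the zero point). W[k]: person k's ethical weight.
U = {"Alice": 2.0, "Bob": 0.5, "Carol": 1.5}   # assumed utilities
W = {"Alice": 1.0, "Bob": 1.0, "Carol": 1.0}   # assumed (equal) ethical weights
N = len(U)

lottery_value = sum((1 / N) * U[k] * W[k] for k in U)   # "1/N chance of each SplitWorld"
altworld_value = sum(U[k] * W[k] for k in U)            # "100% chance of AltWorld"

print(lottery_value)   # ~1.33: the lottery is ~1.33x as good as the benchmark (AliceBenefitWorld)
print(altworld_value)  # 4.0 = N * lottery_value
```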

Below, I go into a lot more detail on each piece of this, and consider it for the general case of N persons. The following table may be useful as a reference point to keep coming back to as I go (it is similar to the table above, but more general):

| World | Characterization | Comparison 1 | Comparison 2 |
| --- | --- | --- | --- |
| Defaultworld | The "status quo" we'll be comparing to an alternative world brought about by some action. For example, "Alice has 10 apples and Bob has 10 pears." | | |
| Altworld | The alternative world we're comparing to Defaultworld. For example, "Alice has 10 oranges and Bob has 10 bananas." | | |
| Benefitworld_1 | Same as Defaultworld, except that Person 1 receives an additional benefit (e.g., "Alice has ten apples and a lemon; Bob has ten pears"). | W_1 times as "ethically good", overall, as Benefitworld_1 (W_1 = 1) | |
| Benefitworld_2 | Same as Defaultworld, except that Person 2 receives an additional benefit (e.g., "Alice has ten apples; Bob has ten pears and a lemon"). | W_2 times as "ethically good", overall, as Benefitworld_1 | |
| Benefitworld_N | Same as Defaultworld, except that Person N receives an additional benefit. | W_N times as "ethically good", overall, as Benefitworld_1 | |
| Splitworld_1 | Same as Defaultworld for everyone except Person 1. Same as Altworld for Person 1. (Example: "Alice has 10 oranges, as in Altworld; Bob has 10 pears, as in Defaultworld.") | U_1 times as good for Person 1 as Benefitworld_1 | U_1*W_1 times as ethically good, overall, as Benefitworld_1 |
| Splitworld_2 | Same as Defaultworld for everyone except Person 2. Same as Altworld for Person 2. (Example: "Alice has 10 apples, as in Defaultworld; Bob has 10 bananas, as in Altworld.") | U_2 times as good for Person 2 as Benefitworld_2 | U_2*W_2 times as ethically good, overall, as Benefitworld_1 |
| Splitworld_N | Same as Defaultworld for everyone except Person N. Same as Altworld for Person N. | U_N times as good for Person N as Benefitworld_N | U_N*W_N times as ethically good, overall, as Benefitworld_1 |
| Lottery/gamble in final step of the walkthrough | 1/N chance of Splitworld_1; 1/N chance of Splitworld_2; ... ; 1/N chance of Splitworld_N | | "(1/N)*U_1*W_1 + (1/N)*U_2*W_2 + ... + (1/N)*U_N*W_N" times as good as Benefitworld_1 |
| Lottery/gamble in final step of the walkthrough | 1/N chance of Altworld; otherwise Defaultworld | | Same as previous row |

The basic setup

There is a set of N persons that matters ethically (Assumption 1). We will denote them as Person 1, Person 2, Person 3, ... , Person N. They don't need to be humans; they can be anyone/anything such that we think they matter ethically. For most of this explanation, we'll be focused on Person 1 and Person 2, since it's relatively straightforward to generalize from 2 persons to N persons.

By default all N persons are in a particular "state of the world" - let's call it Defaultworld. We're contemplating an action that would cause the state of the world to be different, resulting in Altworld.24 We're going to try to come up with a system for deciding whether this action, resulting in Altworld, is a good thing or a bad thing.

Benefitworld_1 is a world that is just like Defaultworld, except that Person 1 gets some benefit.

Benefitworld_2 is exactly the same, except it's Person 2 rather than Person 1 getting the benefit. Benefitworld_3, Benefitworld_4, ... Benefitworld_N are the same idea.

Splitworld_1 is a world that is just like Defaultworld for everyone except Person 1, and just like Altworld for Person 1. In the simplified example, this is the one where Alice has 10 oranges (as in Altworld) while Bob has 10 pears (as in Defaultworld).

Splitworld_2 is the same idea, but with Person 2 in the role of the person who is in an Altworld-like state. Splitworld_3, Splitworld_4, ... , Splitworld_N work similarly.

Comparing two worlds with a multiplier

Splitworld_1 vs. Benefitworld_1. We're going to examine how to ethically compare Splitworld_1 and Benefitworld_1.

First let's consider: which is better for Person 1, Benefitworld_1 or Splitworld_1? (For example, is it better for Alice to have ten apples and a lemon, or to have ten oranges?)

One of the consequences of the expected utility framework (Assumption 3) is that this question has an answer: either Benefitworld_1 is better for Person 1, or Splitworld_1 is better for Person 1, or they're exactly equally good.

Let's imagine that Splitworld_1 is better for Person 1 than Benefitworld_1. In this case, by Assumption 3, there exists some probability, P_1, such that:

• "A P_1 chance of Splitworld_1; otherwise Defaultworld" is exactly as good for Person 1 as a 100% chance of Benefitworld_1.
• Any chance of Splitworld_1 higher than P_1 (otherwise Defaultworld) is better for Person 1 than a 100% chance of Benefitworld_1.
• Any chance of Splitworld_1 lower than P_1 (otherwise Defaultworld) is worse for Person 1 than a 100% chance of Benefitworld_1.

I'll describe this sort of situation as "Splitworld_1 is U_1 times as good as Benefitworld_1 for Person 1," where U_1 = 1/P_1.

For example, if we say that "Splitworld_1 is 1000x as good as Benefitworld_1," we're saying that "a 1/1000 chance of Splitworld_1; otherwise Defaultworld" would be exactly as good for Person 1 as "a 100% chance of Benefitworld_1." And "a 2/1000 chance of Splitworld_1, otherwise Defaultworld" would be better for Person 1 than Benefitworld_1 (because 2/1000 > 1/1000), and "a 1/2000 chance of Splitworld_1, otherwise Defaultworld" would be worse for Person 1 than Benefitworld_1 (because 1/2000 < 1/1000).
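Here's the same arithmetic as a minimal sketch in Python, reusing the 1000x example (nothing here goes beyond the definition just given):

```python
def utility_multiplier(indifference_probability):
    # If a P_1 chance of Splitworld_1 (otherwise Defaultworld) is exactly as good for Person 1
    # as a 100% chance of Benefitworld_1, then U_1 = 1 / P_1.
    return 1.0 / indifference_probability

P_1 = 1 / 1000
U_1 = utility_multiplier(P_1)
print(U_1)             # 1000.0 -> "Splitworld_1 is 1000x as good as Benefitworld_1" for Person 1
print(2 / 1000 > P_1)  # True: a 2/1000 chance is better for Person 1 than Benefitworld_1
print(1 / 2000 < P_1)  # True: a 1/2000 chance is worse for Person 1 than Benefitworld_1
```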

Now for a nonobvious point: if Splitworld_1 is U_1 times as good as Benefitworld_1 for Person 1, it is also U_1 times as ethically good, overall, as Benefitworld_1.

This is because of Assumption 4:

• "A P_1 chance of Splitworld_1; otherwise Defaultworld" is exactly as good for Person 1 as Benefitworld_1, and the two are exactly equally good for everyone else (no one else is affected either way) - so our ethical system should consider them exactly as ethically good, overall.
• A higher chance of Splitworld_1 is better for Person 1 and worse for no one - so our ethical system should consider it ethically better, overall, than Benefitworld_1.
• A lower chance of Splitworld_1 is worse for Person 1 and better for no one - so our ethical system should consider Benefitworld_1 ethically better, overall.

If you compare these three bullet points to the three bullet points above, you'll see that the statements about what is ethically good precisely mirror the statements about what is best for Person 1. Hence, Splitworld_1 is U_1 times as good as Benefitworld_1 both for Person 1 and ethically, overall.

Some notes on shortcuts I'll be taking for the rest of the appendix - if you find your eyes glazing over, just move on to the next section:

First, for the rest of the appendix, to save space, I will leave out bullet points such as the second two bullet points directly above. When I compare two different probabilistic situations, A and B, and say A is "exactly as good" as B or that A and B are "equally good," it will be the case that a small change in the key probability within A would make A better or worse than B. This is important because of the way Assumption 4 is written: I will often be saying that our ethical system should consider two situations equally good, when what I mean is that it should consider one better if the key probability shifts a bit. I could elaborate on this if needed for specific cases, but writing it out every time would get tedious.

Second: I've only established "if Splitworld_1 is U_1 times as good as Benefitworld_1 for Person 1, it is also U_1 times as ethically good, overall, as Benefitworld_1" for the case where Splitworld_1 is better (for Person 1) than both Defaultworld and Benefitworld_1. There are also possible cases where Splitworld_1 is worse than both Defaultworld and Benefitworld_1, and where Splitworld_1 is in between (that is, better than Defaultworld but worse than Benefitworld_1). Here I'm going to take a bit of a shortcut and simply assert that in these cases as well, it makes sense to use the "Splitworld_1 is U_1 times as good as Benefitworld_1" language I'm using.

Bottom line - for any two worlds other than Defaultworld, it will always be possible to say that one of them is "U_1 times as good as the other," where U_1 > 1 implies it is better and U_1 < 1 implies it is worse, and a negative U_1 implies it is bad relative to Defaultworld. Regardless of which one is the case, U_1 will have all the key properties that I talk about throughout this appendix, such as what I showed just above: "if Splitworld_1 is U_1 times as good as Benefitworld_1 for Person 1, it is also U_1 times as ethically good, overall, as Benefitworld_1."

I could elaborate this fully, but it would be tedious to do so.

Here U_1 represents Person 1’s “utility” in Splitworld_1 - how good the world is for them, normalized against some consistent “comparison point” benefit, in this case the benefit from Benefitworld_1.

Ethical weights

Which is more ethically good, Benefitworld_1 or Benefitworld_2? For example, is it better for Alice to have ten apples and a lemon while Bob has only ten pears, or is it better for Alice to have only ten apples while Bob gets ten pears and a lemon?

For purposes of this exercise, we'll essentially say the answer is whatever you want - as long as it is a single, consistent answer.

Whatever you decide, though, we can introduce the same "Benefitworld_2 is ___ times as good as Benefitworld_1" type language as above: we'll say that Benefitworld_2 is W_2 times as ethically good, overall, as Benefitworld_1 (defined via lotteries in the same way, with Defaultworld as the baseline). Similarly, Benefitworld_3 is W_3 times as ethically good as Benefitworld_1, and so on; W_1 equals 1 by definition.

W_2 represents the "ethical weight" of Person 2 with respect to Person 1 - how much your ethical system values the particular benefit we're talking about (e.g., a lemon) for Person 2 vs. Person 1.

Now we can show that Splitworld_2 is U_2*W_2 times as ethically good as Benefitworld_1. As a reminder, U_2 represents how much better Splitworld_2 is than Benefitworld_2 for Person 2, and W_2 is the "ethical weight" of Person 2 just discussed. The reasoning mirrors the previous section: Splitworld_2 is U_2 times as ethically good as Benefitworld_2 (by the same argument we made for Person 1), Benefitworld_2 is W_2 times as ethically good as Benefitworld_1, and - with Defaultworld as the common baseline - the two ratios multiply.
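Here's a minimal numerical sketch of why the ratios multiply, with assumed values for U_2 and W_2 (just an illustration, not part of the argument itself):

```python
# With Defaultworld as the zero point, "X is r times as good as Y" behaves like value(X) = r * value(Y).
value_benefitworld_1 = 1.0   # the benchmark, by construction
W_2 = 3.0                    # assumed: Benefitworld_2 is 3x as ethically good as Benefitworld_1
U_2 = 10.0                   # assumed: Splitworld_2 is 10x as good as Benefitworld_2 for Person 2 (and so overall)

value_benefitworld_2 = W_2 * value_benefitworld_1
value_splitworld_2 = U_2 * value_benefitworld_2
print(value_splitworld_2 / value_benefitworld_1)  # 30.0, i.e. U_2 * W_2
```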

As an aside, an alternate approach to HAT is to insist that each of the “benefits” is standardized - for example, that we declare all benefits to be reducible to some quantity of pleasure, and we define Benefitworld_1 as the world in which Person 1 gets a particular, standardized amount of pleasure (and similar for Benefitworld_2, etc.) One can then skip the part about “ethical weights,” and use the impartiality idea to say that everyone gets the same weight.

Final step

Consider the following two possibilities:

  1. A 1/N chance of Altworld; otherwise Defaultworld.
  2. A 1/N chance of Splitworld_1; a 1/N chance of Splitworld_2; ... ; a 1/N chance of Splitworld_N.

We can value the 2nd possibility (the lottery over Splitworlds) using the finding from the previous section: each Splitworld_k is U_k*W_k times as ethically good as Benefitworld_1, so a lottery giving a 1/N chance of each Splitworld is "(1/N)*U_1*W_1 + (1/N)*U_2*W_2 + ... + (1/N)*U_N*W_N" times as good as Benefitworld_1.

Furthermore, we can say that the two possibilities listed at the top of this section are equally good for all of the N persons, since in each case, that person has a 1/N chance of getting a benefit equivalent to being in Altworld, and will otherwise have an equivalent situation to being in Defaultworld. So by Assumption 4, the two possibilities listed at the top of this section should be equally ethically good, as well.

So now we've established that "A 1/N chance of Altworld, otherwise Defaultworld" is 1/N*(U_1*W_1 + U_2 * W_2 + ... + U_N * W_N) times as good as Benefitworld_1.

That in turn implies that a 100% chance of Altworld is simply (U_1 * W_1 + U_2 * W_2 + ... + U_N * W_N) times as good as Benefitworld_1. (All I did here was remove the 1/N from each.)

We can use this logic to get a valuation of any hypothetical world (other than Defaultworld). When one world has a greater value of (U_1 * W_1 + U_2 * W_2 + ... + U_N * W_N) than another, it will be "more times as good as Benefitworld_1" by our ethical system, which means it is better.
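Here's a minimal sketch of the resulting decision rule in Python, with made-up utilities and weights for three persons; it's an illustration of the comparison, not a claim about any particular real-world case.

```python
# Minimal sketch (assumed numbers) of the resulting decision rule: rank candidate
# worlds by the weighted sum of utilities, all measured against the same benchmark.

def ethical_value(utilities, weights):
    return sum(u * w for u, w in zip(utilities, weights))

weights = [1.0, 1.0, 1.0]        # assumed ethical weights for Persons 1..3
world_A = [2.0, 0.5, -1.0]       # assumed utilities of one candidate world
world_B = [1.0, 1.0, 1.0]        # assumed utilities of another

print(ethical_value(world_A, weights))  # 1.5
print(ethical_value(world_B, weights))  # 3.0 -> this ethical system prefers world B
```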

A high-level summary would be:

• For each person k, use Assumption 3 to find the probability P_k such that a P_k chance of Splitworld_k (otherwise Defaultworld) is exactly as good for them as the benchmark benefit in Benefitworld_k; their utility for Altworld is U_k = 1/P_k.
• Choose ethical weights W_k by deciding how ethically good Benefitworld_k is compared to Benefitworld_1 (with W_1 = 1).
• Use Assumption 4 to turn "equally good for each person" comparisons into "equally ethically good overall" comparisons.
• Putting these together, Altworld is (U_1*W_1 + U_2*W_2 + ... + U_N*W_N) times as good as Benefitworld_1 - so the ethical system ends up ranking worlds by a weighted sum of individual utilities.

Notes


  1. I use this term to denote a broad conception of beings. “Persons” could include animals, algorithms, etc. 

  2. Nitpick-type point: one could have lexicographic orderings such that a single benefit of type A outweighs any number of benefits of type B. As such, this claim only holds for specific interpretations of the word "significant" in statements like "perhaps you place significant value on one person getting a bednet." I am more specific about the conditions giving rise to utility legions in Appendix 2

  3. There are a number of variations on HAT. The version I'll be discussing here is based on Harsanyi (1955), the "original" version. (See part III of that paper for a compact, technical explanation and proof.) This is different from the version in Nick Beckstead's "intuitive explanation" of the theorem. The version I'll be discussing is harder to explain and intuit, but I chose it because it requires fewer assumptions, and I think that's worth it. There are a number of other papers examining, re-explaining and reinterpreting the theorem, its assumptions, and variants - for example, Hammond (1992), Mongin (1994), Broome (1987). There are also pieces on HAT in the effective altruism community, including this LessWrong post and Nick Beckstead's "intuitive explanation" of the theorem

  4. For relatively brief discussions of these issues, see Stanford Encyclopedia of Philosophy on the Repugnant Conclusion, or Population Axiology. I plan to write more on this topic in the future. 

  5. That statement is a slight simplification. For example, "discounted utilitarianism" arguably has infinite persons who matter ethically, but their moral significance diminishes as they get "further" from us along some dimension. 

  6. E.g., utilitarianism.net largely has this focus. 

  7. One could also have a hybrid view such as "situation 1 is better for person X if it contains more of certain objective properties and X appreciates or enjoys or is aware of those properties." 

  8. In this context, "World A is ___ times as good as world B" statements also need to make implicit reference to a third "baseline" world. This is spelled out more in the appendix. 

  9. More on this topic here: Comparing Utilities 

  10. I mean, you might reasonably guess that one team has a better offense and the other has a better defense, but you can't say which one performed better overall. (And even the offense/defense statement isn't safe - it might be that one team, or its opponent, simply played at a more time-consuming pace.) 

  11. Quote is from https://www.utilitarianism.net/introduction-to-utilitarianism#the-veil-of-ignorance 

  12. For example:

    • You can set your ethical system up so that your own interests are weighed millions of times, billions of times, etc. more heavily than others' interests. You can do similarly for the interests of your friends and family. This violates the spirit of other-centered ethics, but it doesn't violate the letter of the version of HAT I've used.
    • You can give negligible weight to animals, to people overseas, etc.
    • Actually, if you're comparing any two states of the world, you could call either one "better" from an ethical point of view, and still be consistent with maximizing some weighted sum of utilities, as long as there's at least one person that is better off in the state of the world you pick.
    • Although over time, if you apply your framework to many ethical decisions, you will become more constrained in your ability to make them consistently using a single set of weights.
     
  13. For example:

    • Some people might claim that everyone benefits equally from $1. By HAT, this would imply that we should maximize some weighted sum of everyone's wealth; intuitively, it would suggest that we should simply maximize the equal-weighted sum of everyone's wealth, which means maximizing total world wealth.
    • Some people might claim that people benefit more from $1 when they have less to begin with. This could lead to the conclusion that money should be redistributed from the wealthy to the less wealthy; see the sketch after this list. (In practice, most utilitarians, including myself, think this.)
    • It also seems that most persons have a direct interest in what state other persons are in. For example, it might be that people suffer a direct cost when others become more wealthy. On balance, then, it could even be bad to simply generate free wealth and give it to someone (since this might make someone else worse off via inequality.)
      • This consideration makes real-world comparisons of total utility pretty hairy, and seems to prevent many otherwise safe-seeming statements like "Surely it would be good to help Persons 1, 2, and 3 while directly hurting no one."
      • It means that a utilitarian could systematically prefer equality of wealth, equality of good experiences, etc. They can't value equality of utility - they are stuck with a weighted sum. But utility is a very abstract concept, and equality of anything else could matter for the weighted sum of utility.
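As referenced in the second bullet above, here's a minimal sketch of the redistribution point, assuming a logarithmic utility of wealth (an illustrative assumption only, not something this piece commits to):

```python
import math

def utility(wealth):
    # Assumed for illustration: utility of wealth is logarithmic, so a marginal dollar
    # is worth roughly 1/wealth "utils" - more to the poor, less to the rich.
    return math.log(wealth)

rich, poor, transfer = 1_000_000.0, 1_000.0, 100.0

before = utility(rich) + utility(poor)
after = utility(rich - transfer) + utility(poor + transfer)
print(after > before)  # True: an equal-weighted sum of these utilities favors the transfer
```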
     
  14. I use "consequentialism" here rather than "welfarist consequentialism" because I think it is a better-known and more intuitive term. The downside is that there are some very broad versions of consequentialism that count all sorts of things as "consequences," including violations of moral rules. I mean "consequentialism" to exclude this sort of thing and to basically mean welfarist consequentialism. 

  15. This isn't meant to be a full definition of consequentialism, i.e., consequentialism can also be applied to cases where it's not clear what will lead to the best outcome. 

  16. Though often an exchange with a "long-run, repeated-game, non-transactional" feel. 

  17. Say we have this weighted sum, where the W's are weights and the U's are utilities:

    W_1*U_1 + W_2*U_2 + ... + W_N*U_N + W_M*U_M + K*W_L*U_L

    The W_M*U_M term corresponds to the "utility monster", and the K*W_L*U_L term corresponds to the "utility legion," while the rest of the sum corresponds to the "rest of the world."

    Once you have filled in all the variables except for U_M, there is some value for U_M that makes the overall weighted sum come out to as big a number as you want. Hence, for any two situations, adding a high enough value of U_M to one of them makes that situation come out better, regardless of everything else that is going on.

    Similarly, once you have filled in all the variables except for K, there is some value of K that makes the overall weighted sum come out to as big a number as you want.  
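    To spell out the algebra, here's a minimal sketch; it assumes U_M and K can be made arbitrarily large (the assumption Jake Nebel's comment below presses on):

```python
def solve_utility_monster(target, rest_of_sum, W_M):
    # The U_M that makes W_1*U_1 + ... + W_M*U_M + K*W_L*U_L reach `target`,
    # holding everything else ("rest_of_sum") fixed.
    return (target - rest_of_sum) / W_M

def solve_utility_legion(target, rest_of_sum, W_L, U_L):
    # The legion size K that makes the same sum reach `target`.
    return (target - rest_of_sum) / (W_L * U_L)

rest = -1_000_000.0   # assumed: the rest of the weighted sum is very negative
print(solve_utility_monster(1.0, rest, W_M=1.0))             # 1000001.0
print(solve_utility_legion(1.0, rest, W_L=1.0, U_L=0.01))    # 100000100.0
```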

  18. Unless you count valuing humans much more than other animals and insects? But this isn't a distinctive or controversial utilitarian position. 

  19. To be clear, I think worrying about suffering of present-day algorithms is obscure and not terribly serious. I think future, more sophisticated algorithms could present more serious considerations. 

  20. "As a rough approximation, let us say the Virgo Supercluster contains 10^13 stars. One estimate of the computing power extractable from a star and with an associated planet-sized computational structure, using advanced molecular nanotechnology[2], is 10^42 operations per second.[3] A typical estimate of the human brain’s processing power is roughly 10^17 operations per second or less.[4] Not much more seems to be needed to simulate the relevant parts of the environment in sufficient detail to enable the simulated minds to have experiences indistinguishable from typical current human experiences.[5] Given these estimates, it follows that the potential for approximately 10^38 human lives is lost every century that colonization of our local supercluster is delayed; or equivalently, about 10^29 potential human lives per second." 

  21. One perspective I've heard is that utilitarianism is the best available theory for determining axiology (which states of the world are better, from an ethical perspective, than others), but that axiology is only part of ethics (which can also incorporate rules about what actions to take that aren't based on what states of the world they bring about). I don't find this division very helpful, because many of the non-utilitarian ethical pulls I feel are hard to divide this way: I often can't tell when an intuition I have is about axiology vs. ethics, and my attempts to think about pure axiology feel very distant from my all-things-considered ethical views. However, others may find this division useful, and it arguably leads to a similar conclusion re: being a utilitarian when making supererogatory, ethics-driven decisions. 

  22. As implied by the italicization, this passage does not actually describe my personal views. It describes a viewpoint that has significant pull for me, but that I also have significant reservations about. This piece has focused on the case in favor; I hope to write about my reservations in the future. 

  23. If a random person in the utility squadron gets the small benefit, that means each person in the squadron has a 1/K chance of getting the small benefit. If there's a 1/K chance of everyone in the squadron getting the small benefit, then each person in the squadron has a 1/K chance of getting the small benefit. They're the same. 

  24. I'm deliberately not getting into a lot of detail about what a "state of the world" means - is it a full specification of all the world's relevant properties at every point in time, is it some limited set of information about the world, or something else? There are different ways to conceive of "world-states" leading to different spins on the case for using the expected utility framework (e.g.


deanspears @ 2022-02-02T15:24 (+20)

I'm very glad to see this, and I just want to add that there is a recent burst in the literature that is deepening or expanding the Harsanyi framework.  There are a lot of powerful arguments for aggregation and separability, and it turns out that aggregation in one dimension generates aggregation in others.  Broome's Weighing Goods is an accessible(ish) place to start.

Fleurbaey (2009): https://www.sciencedirect.com/science/article/abs/pii/S0165176509002894

McCarthy et al (2020): https://www.sciencedirect.com/science/article/pii/S0304406820300045?via%3Dihub

A paper of mine with Zuber, about risk and variable population together: https://drive.google.com/file/d/1xwxAkZlOeqc4iMeXNhBFipB6bTmJR6UN/view

As you can see, this literature is pretty technical for now.  But I am optimistic that in 10 years it will be the case both that the experts much better understand these arguments and that they are more widely known and appreciated.

Johan Gustafsson is also working in this space.  This link isn't about Harsanyi-style arguments, but is another nice path up the mountain: https://johanegustafsson.net/papers/utilitarianism-without-moral-aggregation.pdf 

Holden Karnofsky @ 2022-03-31T22:30 (+3)

Thanks, this is appreciated!

MichaelStJules @ 2022-06-25T18:13 (+10)

It's worth noting that while the rationality axioms on which Harsanyi's theorem depends are typically justified by Dutch book arguments, money pumps or the sure-thing principle, expected utility maximization with an unbounded utility function (e.g. risk-neutral total utilitarianism) and infinitely many possible outcomes is actually vulnerable to Dutch books and money pumps and violates the sure-thing principle. See, e.g. Paul Christiano's comment with St. Petersburg lotteries. To avoid the issue, you can commit to sticking with the ex ante better option, even though you know you'd later want to break the commitment. But such commitments can be used to avoid other Dutch books, too, e.g. you can commit to never completing the last step of a Dutch book, or anticipate Dutch books and try to avoid them on a more ad hoc basis.
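To illustrate the St. Petersburg structure being referred to, here's a minimal numerical sketch assuming an unbounded (here, simply linear) utility of payoffs; it shows the divergence that drives these problems, not the Dutch book itself:

```python
def truncated_st_petersburg_eu(n_rounds):
    # Outcome k (k = 1..n) has probability 2**-k and pays utility 2**k.
    return sum((0.5 ** k) * (2 ** k) for k in range(1, n_rounds + 1))

for n in [10, 100, 1000]:
    print(n, truncated_st_petersburg_eu(n))  # 10.0, 100.0, 1000.0 -> grows without bound
```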

The continuity axiom is also intuitive and I'd probably accept continuity over ranges of individual utilities at least, anyway, but I wouldn't take it to be a requirement of rationality, and without it, maximin/leximin/Rawls's difference principle and lexical thresholds aren't ruled out.

Jake Nebel @ 2022-08-10T22:48 (+8)

Hi Holden, thanks for this very nice overview of Harsanyi's theorem!  

One view that is worth mentioning on the topic of interpersonal comparisons is John Broome's idea (in Weighing Goods) that the conclusion of the theorem itself tells us how to make interpersonal comparisons (though it presupposes that such comparisons can be made). Harsanyi's premises imply that the social/ethical preference relation can be represented by the sum of individual utilities, given a suitable choice of utility function for each person. Broome's view is that this provides the basis for making interpersonal comparisons of well-being: 

As we form a judgement about whether a particular benefit to one person counts for more or less than some other particular benefit to someone else, we are at the same time determining whether it is a greater or lesser benefit. To say one benefit 'counts for more' than another means it is better to bring about this benefit rather than the other. So this is an ethical judgement. Ethical judgements help to determine our metric of good, then. (220). 

And: "the quantity of people's good acquires its meaning in such a way that the total of people's good is equal to general good" (222).  

I don't think Broome is right (see my Aggregation Without Interpersonal Comparisons of Well-Being), but the view is worth considering if you aren't satisfied with the other possibilities. I tend to prefer the view that there is some independent way of making interpersonal comparisons. 

On another note: I think the argument for the existence of utility monsters and legions (note 17) requires something beyond Harsanyi's premises (e.g., that utilities are unbounded). Otherwise I don't see why "Once you have filled in all the variables except for U_M [or U_K], there is some value for U_M [or U_K] that makes the overall weighted sum come out to as big a number as you want." Sorry if I'm missing something! 

Kenny Easwaran @ 2022-02-03T18:54 (+5)

I haven't gone through this whole post, but I generally like what I have seen.

I do want to advertise a recent paper I published on infinite ethics, suggesting that there are useful aggregative rules that can't be represented by an overall numerical value, and yet take into account both the quantity of persons experiencing some good or bad and the probability of such outcomes: https://academic.oup.com/aristotelian/article-abstract/121/3/299/6367834

The resulting value scale is only a partial ordering, but I think it gets intuitive cases right, and is at least provably consistent, even if not complete. (I suspect that for infinite situations, we can't get completeness in any interesting way without using the Axiom of Choice, and I think anything that needs the Axiom of Choice can't give us any reason for why it rather than some alternative is the right one.)

UriKatz @ 2022-10-13T12:39 (+2)

It seems to me that no amount of arguments in support of individual assumptions, or a set of assumptions taken together, can make their repugnant conclusions more correct or palatable. It is as if Frege's response to Russell's paradox were to write a book exalting the virtues of set theory. Utility monsters and utility legions show us that there is a problem either with human rationality or human moral intuitions. If not them, then the repugnant conclusion does for sure, and it is an outcome of the same assumptions and same reasoning. Personally, I refuse to bite the bullet here, which is why I am hesitant to call myself a utilitarian. If I had to bet, I would say the problem lies with assumption 2. People cannot be reduced to numbers, either when trying to describe their behavior or trying to guide it. Appealing to an "ideal" doesn't help, because the ideal is actually a deformed version. An ideal human might have no knowledge gaps, no bias, no calculation errors, etc., but why would their well-being be reducible to a function?

(note that I do not dispute that from these assumptions Harsanyi’s Aggregation Theorem can be proven)

MichaelStJules @ 2022-07-08T00:41 (+2)

Pareto efficiency (assumption 4)

If we're choosing between two states of the world, and one state is better for at least one person and worse for no one (in terms of expected utility as defined above), then our ethical system should always prefer that state.

It's also worth mentioning that this Pareto efficiency assumption, applied to expected utilities over uncertain options and not just actual utilities over deterministic options, rules out (terminal) other-centered preferences for utility levels to be distributed more equally (or less equally) ex post, as well as ex post prioritarian and ex post egalitarian social welfare functions.

You would be indifferent between these two options over the utility levels of Alice and Bob (well, if you sweeten either option by an arbitrarily small amount, you should prefer the sweetened one; continuity gives you indifference without sweetening):

  1. 50% chance of 1 for Alice and 0 for Bob, and 50% chance of 0 for Alice and 1 for Bob.
  2. 50% chance of 1 for each and 50% chance of 0 for each.

But an ex post egalitarian might prefer 2, since the utilities are more equal in each definite outcome.
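To make this concrete, here's a minimal sketch that scores the two options by expected individual utility and by one illustrative ex post egalitarian rule (expected minimum utility; the argument doesn't depend on this particular rule):

```python
options = {
    "1 (anti-correlated)": [((1, 0), 0.5), ((0, 1), 0.5)],   # ((Alice, Bob) utilities, probability)
    "2 (correlated)":      [((1, 1), 0.5), ((0, 0), 0.5)],
}

for name, lottery in options.items():
    exp_alice = sum(p * a for (a, b), p in lottery)
    exp_bob = sum(p * b for (a, b), p in lottery)
    exp_min = sum(p * min(a, b) for (a, b), p in lottery)   # illustrative ex post egalitarian score
    print(name, exp_alice, exp_bob, exp_min)

# Both options give Alice and Bob expected utility 0.5, so Pareto-over-expected-utilities
# reasoning is indifferent; the expected-minimum rule scores option 2 higher (0.5 vs 0.0).
```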

Between the two following options, an ex post prioritarian or ex post egalitarian might strictly prefer the second, as well, even though the expected utilities are the same (treating the numbers as the same utilities used for the expected utilities):

  1. 50% chance of -1 for Alice and a 50% chance of 1 for Alice.
  2. 100% chance of 0 for Alice.

 

HAT's assumptions together also rule out preferences for or against ex ante equality, and so ex ante prioritarianism and ex ante egalitarianism, i.e., you should be indifferent between the following two options, even though the first seems more fair ex ante:

  1. 50% chance of 1 for Alice and 0 for Bob, and 50% chance of 0 for Alice and 1 for Bob.
  2. 100% chance of 1 for Alice and 0 for Bob.

Jaime Sevilla @ 2022-03-29T19:14 (+2)

Since I first read your piece on future-proof ethics, my views have evolved from "Not knowing about HAT" to "HAT is probably wrong/confused, even if I can't find any dubious assumptions" and finally "HAT is probably correct, even though I do not quite understand all the consequences". 

I would probably not have engaged with HAT if not for this post, and now I consider it close in importance to VNM's and Cox's theorems in terms of informing my worldview.

I particularly found the veil of ignorance framing very useful to help me understand and accept HAT.

I'll probably be coming back to this post and mulling over it to understand better what the heck I have committed to.

richard_ngo @ 2022-02-26T01:44 (+2)

Situation 1 is better for person X than situation 2 (that is, it gives X higher utility) if and only if person X prefers that situation.

I think you need "prefers that situation for themselves". Otherwise, imagine person X who is a utilitarian - they'll always prefer a better world, but most ways of making the world better don't "benefit X".

Then, unfortunately, we run into the problem that we're unable to define what it means to prefer something "for yourself", because we can no longer use (even idealised) choices between different options.

Holden Karnofsky @ 2022-03-31T22:30 (+2)

Good point, thanks! Edited.

UriKatz @ 2022-10-13T12:35 (+1)

the quest for an other-centered ethics leads naturally to utilitarian-flavored systems with a number of controversial implications.

This seems incorrect. Rather, it is your 4 assumptions that “lead naturally” to utilitarianism. It would not be hard for a deontologist to be other-focused simply by emphasizing the a-priori normative duties that are directed towards others (I am thinking here of Kant’s duties matrix: perfect / imperfect & towards self / towards others). The argument can even be made, and often is, that the duties that one has towards one’s self are meant to allow one to benefit others (i.e. skill development). If by other-focused you mean abstracting from one’s personal preferences, values, culture and so forth, deontology might be the better choice, since its use of a-priori reasoning places it behind the veil of ignorance by default.