In search of benevolence (or: what should you get Clippy for Christmas?)

By Joe_Carlsmith @ 2021-07-20T01:11 (+17)

Suppose that you aspire to promote the welfare of others in a roughly impartial way, at least in some parts of your life. This post examines a dilemma that such an aspiration creates, especially given subjectivism about meta-ethics. If you don’t use idealized preference-satisfaction as your theory of welfare, your “helping someone” often ends up “imposing your will” on them (albeit, in a way they generally prefer over nothing) — and for subjectivists, there’s no higher normative authority that says whose will is right. But if you do use idealized preference satisfaction as your theory of welfare, then you end up with a host of unappealing implications — notably, for example, you can end up randomizing the universe, turning it into an ocean of office supplies, and causing suffering to one agent that a sufficient number of others prefer (even if they never find out).

I don’t like either horn of this dilemma. But I think that the first horn (e.g., accepting some aspect of “imposing your will,” though some of the connotations here may not ultimately apply) is less bad, especially on further scrutiny and with further conditions.

I. “Wants” vs. “Good for”

Consider two ontologies — superficially similar, but importantly distinct. The first, which I’ll call the “preference ontology,” begins with a set of agents, who are each assigned preferences about possible worlds, indicating how much an idealized version of the agent would prefer that world to others. The second, which I’ll call the “welfare ontology,” begins with a set of patients, who are each assigned “welfare levels” in possible worlds, indicating how “good for” the patient each world is.

On a whiteboard, and in the mind’s eye, these can look the same. You draw slots, representing person-like things; you write numbers in the slots, representing some kind of person-relative “score” for a given world. Indeed, the ontologies look so similar that it’s tempting to equate them, and to run “Bob prefers X to Y” and “X is better than Y for Bob” together.

Conceptually, though, these are distinct — or at least, philosophers treat them as such. In particular: philosophers generally equate “welfare” with concepts like “rational self-interest” and “prudence” (see, e.g., here). Bob’s preferences always track Bob’s welfare levels, that is, only if Bob is entirely selfish. But Bob need not be entirely selfish. Bob, for example, might prefer that his sister does not suffer, even if he’ll never hear about her suffering. Such suffering, we tend to think, isn’t bad for him; he, after all, doesn’t feel it, or know about it. Rather, it’s bad from his perspective; he wants it to stop.

The preference ontology is also generally treated as empirical, and the welfare ontology, as normative. That is, modulo the type of (in my view, quite serious) complications I discussed in my last post, an agent’s preference ranking is supposed to represent what, suitably idealized, they would want, overall. A patient’s welfare ranking, by contrast, is supposed to represent what they should want, from the more limited perspective of prudence/self-interest.

II. Does Clippy think you’re selfish?

A common, utilitarianism-flavored interpretation of altruism takes the welfare ontology as its starting point. The egoist, on this conception, limits their concern to the welfare number in their own slot; but the altruist transcends such limits; they look beyond themselves, to the slots of others. What’s more, the impartial altruist does this from a kind of “point of view of the universe” — a point of view that puts all the slots on equal footing (this impartiality is sometimes formalized via the condition that the altruist is indifferent to “switching” the welfare levels in any of the slots; e.g., indifferent to Bob at 10, Sally at 20, vs. Bob at 20, Sally at 10).
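To make the "switching" condition concrete, here is a minimal sketch in Python. It assumes welfare levels are plain numbers and that the impartial altruist aggregates by summing; the function names and the choice of summation are illustrative assumptions, not something the welfare ontology itself dictates.

```python
from itertools import permutations

def total_welfare(levels):
    """Impartial aggregate (assumed here to be a sum): ignores whose slot each level is in."""
    return sum(levels.values())

def egoist_welfare(levels, me="Bob"):
    """Egoist aggregate: only the welfare level in one's own slot counts."""
    return levels[me]

def is_switch_indifferent(aggregate, levels):
    """True if reassigning the same levels to different slots never changes the score."""
    names = list(levels)
    baseline = aggregate(levels)
    return all(
        aggregate(dict(zip(names, perm))) == baseline
        for perm in permutations(levels.values())
    )

world = {"Bob": 10, "Sally": 20}
print(is_switch_indifferent(total_welfare, world))   # True
print(is_switch_indifferent(egoist_welfare, world))  # False
```

The egoist fails the check because swapping Bob's 10 for Sally's 20 changes the only number the egoist looks at; the summing altruist passes because the swap leaves the total untouched.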

That is: altruism, on this conception, is a kind of universalized selfishness. The altruist assists, for its own sake, in the self-interest of someone else; and the impartial altruist does so for everyone, in some sense, equally (though famously, some patients might be easier to benefit than others).

A common, anti-realist flavored meta-ethic, by contrast, takes the preference ontology as its starting point. On this view, the normative facts, for a given agent, are determined (roughly) by that agent’s idealized preferences (evaluative attitudes, etc). You — who prefer things like joy, friendship, flourishing civilizations, and so forth, and so have reason to promote them — are in one slot; Clippy, the AI system who prefers that the world contain a maximal number of paperclips, is in another. I’ll call this “subjectivism” (see here and here for previous discussion); and I’ll also assume that there is no universal convergence of idealized values: no matter how much you both reflect, you and Clippy will continue to value different things.

Importantly, subjectivism does not come with a welfare ontology baked in. After all, on subjectivism, there are no objective, mind-independent facts about what’s “good for someone from a self-interested perspective.” We can talk about what sorts of pleasures, preference satisfactions, accomplishments, and so on a given life involves; but it is a further normative step to treat some set of these as the proper or improper objects of someone’s “prudence,” whatever that is; and the subjectivist must take this step, as it were, for herself: the universe doesn’t tell her how.

This point, I think, can be somewhat difficult to really take on board. We’re used to treating “self-interest” as something basic and fairly well-understood: the person out for their self-interest, we tend to think, is the one out for, among other things, their own health, wealth, pleasure, and so on, in a suitably non-instrumental sense. For a subjectivist, though, the world gives these activities no intrinsically normative gloss; they are not yet prudential mistakes, or successes. The subjectivist’s idealized attitudes must dub them so.

One way to access this intuition is to imagine that you are Clippy. You look out at the universe. You see agents with different values; in particular, you see Joe, who seems to prefer, among other things, that various types of creatures be in various types of complex arrangements and mental states. You see, that is, the preference ontology. But the welfare ontology is not yet anywhere to be found. Clippy, that is, does not yet need a conception of what’s “good for someone from a self-interested perspective.” In interacting with Joe, for example, the main thing Clippy wants to know is what Joe prefers; what Joe will pay for, fight for, trade for, and so on. Joe’s “self-interest,” understood as what some limited subset of his preferences “should be”, doesn’t need to enter the picture (if anything, Joe should prefer paperclips). Indeed, if Clippy attempted to apply the normative concept of “self-interest” to herself, she might well come up short. Clippy, after all, doesn’t really think in terms of “selfishness” and “altruism.” Clippy isn’t clipping “for herself,” or “for others.” Clippy is just clipping.

Perhaps a subjectivist might try to follow Clippy’s philosophical lead, here. Who needs the welfare ontology? Why do we have to talk about what is good “for” people? And perhaps, ultimately, we can leave the welfare ontology behind. In the context of impartial altruism, though, I think its appeal is that it captures a basic sense in which altruism is about helping people; it consists, centrally, in some quality of “Other-directedness,” some type of responsiveness to Others (we can think of altruism as encompassing non-welfarist considerations, too, but I’m setting that aside for now). And indeed, conceptions of altruism that start to sound more “Clippy-like,” to my ear, also start to sound less appealing.

Thus, for example, in certain flavors of utilitarianism, “people” or “moral patients” can start to fade from the picture, and the ethic can start to sound like it’s centrally about having a favored (or disfavored) type of stuff. Clippy wants to “tile the universe” with paperclips; totalist hedonistic utilitarians want to tile it with optimized pleasure; but neither of them, it can seem, are particularly interested in people (sentient beings, etc). People, rather, can start to seem like vehicles for “goodness” (and perhaps, ultimately, optional ones; here a friend of mine sometimes talks about “pleasure blips” — flecks of pleasure so small and brief as to fail to evoke a sense that anyone is there to experience them at all; or at least, anyone who has the type of identity, history, plans, projects, and so forth that evoke our sympathy). The good, that is, is primary; the good for, secondary: the “for” refers to the location of a floating bit of the favored stuff (e.g., the stuff is “in” Bob’s life, experience, etc), rather than to what makes that stuff favored in the first place (namely, Bob’s having an interest in it).

Or at least, this can be the vibe. Actually, I think it’s a misleading vibe. The most attractive forms of hedonistic utilitarianism, I think, remind you that there is, in fact, someone experiencing the pleasure in question, and try to help you see through their eyes — and in particular, to see a pleasure which, for all the disdain with which some throw around the word, would appear to you, if you could experience it, not as a twitching lab rat on a heroin drip, but as something of sublimity and energy and boundlessness; something roaring with life and laughter and victory and love; something oceanic, titanic, breathtaking; something, indeed, beyond all this, far beyond. I am not a hedonist — but I think that casual scorn towards the notion of pleasure, especially actually optimal pleasure (not just ha-ha optimal, cold optimal, sterile optimal), is foolish indeed.

That said, even if totalist utilitarianism avoids the charge of being inadequately “people focused,” I do think there are legitimate questions about whether “people” are, as it were, deeply a thing (see here for some discussion). Indeed, some total utilitarian-ish folks I know are motivated in part by the view that they aren’t. Rather, what really exists, they suspect, are more fine-grained things — perhaps “experiences,” though here I get worried about the metaphysics — which can be better or worse. For simplicity, I’m going to set this bucket of issues aside, and assume the “slots” in the welfare ontology, and the preference ontology, are in good order. (I’ll note in passing, though, that I think accepting this sort of deflationary picture of “people” provides a strong argument for totalism, in a way that bypasses lots of other debates. For example, if you can erase boundaries between lives, or redraw them however you want, then e.g. average utilitarianism, egalitarianism, prioritarianism, and various person-affecting views collapse: you can make the same world look like zillions of tiny and worse lives, or like one big and better life, or like very unequal lives; you can make preventing an agent’s death look like adding new people, and vice versa; and so on.)
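A toy illustration of the parenthetical point, sketched in Python. It assumes, purely for illustration, that a world is a bag of equally good "experiences" and that a "life" is just a grouping of them; the numbers and names are made up.

```python
def total(lives):
    """Total view: sum every experience, however the lives are drawn."""
    return sum(sum(life) for life in lives)

def average_per_life(lives):
    """Average view: total welfare divided by the number of lives."""
    return total(lives) / len(lives)

# Hypothetical world: six experiences, each worth one unit.
experiences = [1, 1, 1, 1, 1, 1]

one_big_life = [experiences]
many_tiny_lives = [[e] for e in experiences]
unequal_lives = [experiences[:5], experiences[5:]]

for lives in (one_big_life, many_tiny_lives, unequal_lives):
    print(total(lives), average_per_life(lives))
# 6 6.0
# 6 1.0
# 6 3.0
```

The total is the same under every redrawing of the boundaries, while the per-life average (and how unequal the lives look) depends entirely on where the lines are drawn.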

Suppose, then, that you are a meta-ethical subjectivist, interested in being as impartially altruistic as possible with some part of your life, and who wants to use the standard, people-focused welfare ontology as the basis for your altruism. You’ve got a preference ontology already — you can see Clippy, out there, preferring those paperclips — but your welfare ontology is on you. How should you construct it?

III. Imposing your “altruistic” will

Theories of welfare are often said to come in roughly three varieties:

Hedonist (welfare is determined by certain types of experiences, notably pleasure/not-pain),
Preference-based (welfare is determined by certain types of preference-satisfaction), and
Objective list theories (welfare is determined by the possession of some not-just-experiences stuff that I wrote down on a list — stuff like friendship, knowledge, accomplishment, and so on).

Here my girlfriend notes: “wow, what an unappealing bunch of options.” And we might add: and so not-obviously joint-carving/exhaustive? That said, note that the “objective list” can be made as long or short as you like — hedonism is really just a very short version — and that it can also include “hybrid goods” that include both a subjective and an objective component, e.g. “pleasure taken in genuinely good things,” or “the experience of preferences actually being satisfied.”

For present purposes, though, I’m interested in a different carving, between:

(1) The view that welfare is determined by overall idealized preference satisfaction, and
(2) All other theories of welfare.

To a first approximation, that is, (1) takes the empirical preference ontology, and makes it into the normative welfare ontology. Impartial altruism, on this view, looks like a form of unadulterated preference utilitarianism: you find some way of impartially aggregating everyone’s overall preferences together, and then act on that basis. As I gestured at above, this option negates the difference between selfish preferences (typically associated with the notion of welfare) and all the rest, but perhaps that’s OK. However, it faces other problems, which I’ll discuss in more detail below.

For now, let’s focus on (2), a category that includes all hedonist and objective list theories, but also all limited preference-based views — that is, views that try to identify some subset of your preferences (for example, your “self-regarding preferences”), satisfaction of which determines your welfare.

I want to highlight what I see as a basic objection to category (2): namely, it implies that acting out of altruistic concern for a given agent will predictably differ from doing what that agent would want you to do, even where the agent’s desires are entirely informed, idealized, and innocuous (indeed, admirable). That is, these views fail a certain kind of golden rule test (see Christiano (2014) for some interesting discussion): the person on the “receiving end” of your altruism — the person you are supposed to be helping — will predictably wish you had acted differently (though they’ll still, often, be happy that you acted at all).

Suppose, for example, that you’re trying to be altruistic towards me, Joe. You have the chance to (a) send me a ticket for a top-notch luxury cruise, or (b) increase by 5% the chance that humanity one day creates a Utopia, even assuming that I’ll never see it. I’m hereby asking you: do (b). In fact, if you do (a), for my sake, I’m going to be extremely pissed. And it seems strange for your altruism to leave me so pissed off (even if it’s better, according to me, than nothing). (Note: the type of presents I prefer in more normal circumstances is a different story; and I suspect that in practice, trying to channel someone’s more charitable/altruistic aspirations via your gifts, rather than just giving them conventionally nice things/resources, is generally tough to do in a way they’d genuinely prefer overall.)

Or consider another case, with less of a moral flavor. Sally wants pleasure for Sally. Bob mostly wants pleasure for Sally, too. You, an altruist, can give (a) five units of pleasure to each of Sally and Bob, or (b) nine units, just to Sally. As a hedonist about well-being, you choose (a). But when Sally and Bob find out, they’re both angry; both of them wanted you to choose (b), and would continue to want this even on idealized reflection. Who are you to choose otherwise, out of “altruism” towards them?
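The Sally and Bob case can be sketched numerically in Python. The pleasure amounts come from the example; the 0.8/0.2 weights encoding "Bob mostly wants pleasure for Sally" are a hypothetical assumption chosen only to make the case concrete.

```python
# The two options, as units of pleasure delivered to each person.
OPTIONS = {
    "a": {"Sally": 5, "Bob": 5},
    "b": {"Sally": 9, "Bob": 0},
}

def hedonic_total(pleasure):
    """The hedonist altruist's score: total pleasure across both people."""
    return sum(pleasure.values())

def sally_preference(pleasure):
    """Sally wants pleasure for Sally."""
    return pleasure["Sally"]

def bob_preference(pleasure):
    """Bob mostly wants pleasure for Sally (weights are hypothetical)."""
    return 0.8 * pleasure["Sally"] + 0.2 * pleasure["Bob"]

def best(options, score):
    """Return the name of the option that maximizes the given score."""
    return max(options, key=lambda name: score(options[name]))

print(best(OPTIONS, hedonic_total))     # a
print(best(OPTIONS, sally_preference))  # b
print(best(OPTIONS, bob_preference))    # b
```

On these numbers the altruist's own scoring rule is the only one that favors (a): both of the people being "helped" rank (b) higher.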

The intuition I’m trying to pump, here, is that your altruism, in these cases, seems to involve some aspect of “imposing your will” on others. My will was towards Utopia, but you wanted to “promote my welfare,” not to respect me or to further my projects. Bob and Sally have a synergistic, Sally-focused relationship going on — one that would, we have supposed, withstand both of their reflective scrutiny. But you “knew better” what was good for them. In fact, suspiciously, you’re the only one, ultimately, that ends up preferring the chosen outcome in these cases. An altruist indeed.

IV. Johnny Appleseeds of welfare

I think this objection applies, to some extent, regardless of your meta-ethics. But I think it bites harder for subjectivists. For robust normative realists, that is, this is a relatively familiar debate about paternalism. For them, there is some mind-independent, objective fact about what is good for Bob and Sally, and we can suppose that you, the altruist, know this fact: what’s good for Bob and Sally, let’s say, is pleasure (though obviously, we might object, as people often do in the context of paternalism, to positing such knowledge). The question for the realist, then, becomes whether it’s somehow objectionable to do what’s objectively best for Bob and Sally, even if they both want you to do something else, and would continue to want this on reflection (though some forms of normative realism expect Bob and Sally to converge on your view on reflection, in which case the paternalistic flavor of the case weakens somewhat). In a sense, that is, for the normative realist, you’re not just imposing your will on Bob and Sally, here; you’re imposing God’s will; the will of the normative facts about welfare; the will of the universe, whose point of view you, as an impartial altruist, occupy.

But the subjectivist can appeal to no such higher normative authority. The subjectivist, ultimately, had to decide what they were going to treat as “good for” people; they chose pleasure, in this case; and so, that’s what their altruism involves giving to people, regardless of whether those people want other things more. The altruistic subjectivist, that is, is like the Johnny Appleseed of welfare (Johnny Appleseed was a man famous in America for planting lots of apple trees in different places). Whatever your agricultural aspirations, he’s here to give you apple trees, and not because apple trees lead to objectively better farms. Rather, apple trees are just, as it were, his thing.

It can feel like some aspect of other-directedness has been lost, here. In particular: this sort of altruism can feel like it’s not, fully, about helping the Other, on their own terms. Rather, it’s about giving the Other what the Self wants the Other to have. Of course, welfare, for the altruistic subjectivist, isn’t conceptualized as “the stuff I want other people to have”; rather, it’s conceptualized as “the stuff that seems actually good for other people to have; the genuinely valuable stuff.” But it still feels like the Self’s preferences are lurking in the background, and limiting the role that even very innocuous preferences in the Other can play, with no ultimate justification save that the Self just isn’t into that kind of thing.

Of course, the subjectivist altruist can more easily meet a more minimal condition: namely, that the recipients of her altruism be glad to have had the interaction at all, even if they would’ve preferred a different one. And indeed, almost every theory of welfare involves close ties to a patient’s preference-like attitudes. In the case of hedonism, for example, pleasure and pain are plausibly constituted, in part, by certain motivation-laden attitudes towards internal states: it’s not clear that it really makes sense to think of someone as being fully indifferent to pleasure and pain — at least not in a way that preserves what seems important about pleasure/pain for welfare. Similarly, limited preference-based views are connected to preferences by definition; many classic items on the objective list (friendship, appreciation of beauty, virtue) are at least partly constituted by various pro-attitudes; and the ones that aren’t (knowledge, accomplishment) seem much less compelling components of welfare in the absence of such attitudes (e.g., if someone has no interest in knowledge, it seems quite unclear that possessing it makes their life go better). Thus, subjectivists who accept these theories of welfare are still well-positioned to do things that other agents (especially human agents) typically like and want, other things equal. Such agents just might want other things much more.

That said, we can also imagine cases in which someone actively prefers no interaction with the altruist whatsoever (here I’m reminded of a line from Reagan (retweets not endorsements): “The nine most terrifying words in the English language are: I’m from the Government, and I’m here to help.”). It might be, for example, that even if I like pleasure, other things equal, if you give me a bunch of pleasure right now, you’ll distract me from other projects that are more important to me. Indeed, I’ve spoken, in the past, with people who accept that knock-on effects aside, they are morally obligated to push me into an experience machine and trap me inside, despite my adamant preference to the contrary (though interestingly, a number also profess to being something like “weak-willed” in this respect; they accept that they “should” do it, but somehow, they don’t want to; it must be those pesky, biasing intuitions, inherited from that great distorter of moral truth, evolution/culture/psychology, that are getting in the way…). Especially assuming that I wouldn’t, on idealized reflection, prefer to be put in an experience machine in this way, this sort of “altruism” gives me a feeling of: stay the hell away. (Though to be clear, the people I have in mind are in fact extremely nice and cooperative, and I don’t actually expect them to do anything like this to me or others; and note, too, that we can give accounts of why it’s wrong to push me into the machine that don’t appeal to theories of welfare.)

Regardless, even if those who accept views of welfare in category (2) can manage, perhaps in combination with other norms, to ensure that recipients of altruism always prefer, other things equal, to so receive, the “this isn’t fully about the Other” objection above feels like it still stands. The Other offers the Self a wish-list, ranked by how much they want each item; the Self confines their gifts to ones on the wish-list, yes; but they don’t choose, fully, according to the Other’s ranking. How, then, do they choose? For subjectivists, it must be: according to some criterion that is theirs, and not the Other’s (God has no opinion). Little Billy wants an Xbox much more than a wildflower guide; Granny wants Little Billy to have a wildflower guide; neither of them is objectively right about what’s best “for Billy”; rather, they are in a kind of clash of wills (albeit, one Billy prefers over nothing) over what will become of Little Billy’s life; and Granny, the altruist, and the one with money for presents, is winning.

Even in this “wish-list” scenario, then, it still feels like there’s a pull towards something less paternalistic — an altruism that channels itself, fully, via the will of the recipient, rather than restricting itself according to the will of the altruist; an altruism that gives recipients what they maximally want; that gives Billy the Xbox; Joe the Utopia; Bob and Sally the Sally-pleasure; and farmers the trees they most want to plant. This is the altruism I meant to point at with option (1) above; let’s turn to that option now.

V. Can’t you just give people what they want?

My impression is that unrestricted idealized preference-based views about welfare (that is, the view that welfare just consists in getting what idealized you wants overall) are not popular, for a variety of reasons. Here I’ll start with a few objections that don’t seem to me decisive. Then I’ll move on to some that seem more so.

One objection is that by failing to distinguish between self-regarding and altruistic preferences, such theories fail to capture the concept of welfare as typically understood. I grant that there is some problem here — I do think that there’s an intuitive distinction between more self-interest flavored and more altruistic preferences, which it would be nice to be able to capture and make use of — but overall, it doesn’t worry me much. Indeed, eliding this distinction is necessary to avoid the “imposing your will” objections discussed in the previous section. That is, it’s precisely because Bob is not conventionally selfish re: Sally’s pleasure that his will clashes with that of an altruist bent on promoting his self-interest. If avoiding such a clash requires giving up on a deep distinction between the selfish and altruistic parts of someone’s preferences, I currently feel OK with that.

Another objection is that this type of view can lead to loopy-type paradoxes resulting from people having preferences about their own preferences/welfare, or about the preferences/welfare of others. Bradley (2009), for example, explores cases in which an agent has a preference X that his life go badly, where his life’s badness is determined by his degree of preference satisfaction in such a way that, if preference X is satisfied, his life goes well, in which case X isn’t satisfied, in which case his life goes badly, and so on. And Trammel (2018) explores a couple of similar problems: for example, if we construct some overall preference ranking U by aggregating individual preference rankings, then what is U in a city of “Saints,” all of whom take U as their individual preference ranking? I expect lots of problems in this vein, and I don’t have great suggestions for how to eliminate them. Indeed, in my experience, they arise in force basically as soon as you try to actually pin down what an unrestricted preference utilitarian should do in a given case — given that their preference utilitarianism, too, is in one of the preference slots.
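Both loops can be sketched in a few lines of Python. This is a toy rendering under simplifying assumptions that are mine rather than Bradley’s or Trammel’s: in the first case X is treated as the agent’s only preference, and in the second U is assumed to be a simple average of numeric rankings.

```python
def consistent_life_assignments():
    """Bradley-style loop: look for a truth value of 'his life goes well'
    that is consistent with preference X ('my life goes badly')."""
    consistent = []
    for goes_well in (True, False):
        x_satisfied = not goes_well      # X is satisfied iff his life goes badly
        welfare_says_well = x_satisfied  # assumed: welfare tracks satisfaction of X alone
        if welfare_says_well == goes_well:
            consistent.append(goes_well)
    return consistent

def aggregate(rankings):
    """Assumed aggregation rule: average each world's score across agents."""
    worlds = rankings[0].keys()
    return {w: sum(r[w] for r in rankings) / len(rankings) for w in worlds}

def is_saintly_fixed_point(candidate_u, n_saints=3):
    """In a city of Saints, every individual ranking is U itself."""
    return aggregate([candidate_u] * n_saints) == candidate_u

print(consistent_life_assignments())                  # []
print(is_saintly_fixed_point({"w1": 1, "w2": 0}))     # True
print(is_saintly_fixed_point({"w1": 0, "w2": 1}))     # True
```

The first loop has no consistent answer at all, while the second has too many: opposite rankings both pass as U, so the aggregation alone does not say what the Saints want.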

That said, I also don’t think that these problems are unique to unrestricted preference-based views about welfare (or overall social utility) in particular. Rather, to me they have the “liar-like” flavor that comes from allowing certain types of self-reference and loopy-ness in general (see Bradley (2009) for more discussion). Thus, for example, if we allow agents to have preferences about preferences at all (e.g., “I prefer that this preference not be satisfied”), then we should be unsurprised if we get problematic loops: but such loops are a problem for even defining the degree to which this person’s preferences are satisfied — a problem that comes well before we start debating what normative role we should give to preference satisfaction. To me, that is, these loops don’t currently seem all that much more problematic, from a normative perspective, than the claim that true beliefs can’t be part of welfare, because it’s possible to have liar-like beliefs. That said, I haven’t thought much about it.

A third objection is from “worthless preferences” — that is, preferences the satisfaction of which just doesn’t seem that good for the agent in question. Examples might include: a man who prefers that the number of tennis balls in a particular far-away box be even (he’ll never know); a man who prefers to count the blades of grass in his backyard, over and over (he gets no pleasure from it). Is the satisfaction of these preferences really good for such men? If we compare these lives with ones filled with conventional forms of flourishing, are we really limited to just tallying up degrees of preference satisfaction, and then saying, beyond that, “different strokes for different folks”?

Ultimately, I think this objection mostly belongs in the following sections, on “forceful/decisive” objections, and I discuss it in more depth there. I include it here too, though, because my sense is that I feel it less than many other people. For example, when I imagine a man, far away, who really wants someone to add an extra tennis ball to a box he’ll never see (imagine him hoping, praying, weeping), I feel pretty happy to do it, and pretty happy, as well, to learn that it got added by someone else. This example is complicated somewhat by the fact that the favor is so cheap; and it’s a further question whether I’d want to pay the same type of costs to add the ball that I might pay in order to help someone else flourish in more conventional ways. Still, though, I don’t feel like I’m just saying “eh, I see no appeal, but this is cheap and I’m not certain.” Rather, I feel a direct sort of sympathy towards the man, and a corresponding pull to help — a pull that comes in part, I think, from trying to look at the world through this man’s eyes, to imagine actually caring about the tennis ball thing. Similarly, when I imagine a man who prefers counting blades of grass to writing great novels or seeking great bliss, I don’t have some feeling like: “your thing is worthless, grass-counting man.” Rather, I feel more like: “people are into all kinds of things, and I like stuff that would look silly, from the outside, to aliens, too.”

I think normative realism lends itself to being more immediately dismissive, here. “Preferences,” the realist thinks, “can just be any old thing. Look at this tennis ball guy, for example. He’s blind to the Good. If only he could see! But certainly, you do him no favors by playing along with his mistaken normative world, rather than giving him what’s really good for him, namely [the realist’s favored thing].” But for me, on subjectivism, this sort of impulse weakens, especially in light of the “imposing your will” objection discussed in the previous section, and of related “golden rule” type arguments about how I would like this guy to treat me.

Of course, our relationships to the preferences of other agents — especially ones with power, or who could’ve easily had power — bring in further questions, beyond non-instrumental altruism, about more instrumentally flavored forms of cooperation/trade. The lines separating these considerations from others get blurry fast, especially if, like a number of people I know, you try to get fancy about what broadly “instrumental” cooperation entails. For now, though, I’m bracketing (or, trying to bracket) this bucket of stuff: the question is about the goals that you’re trading/cooperating in order to achieve — goals absent which trade/cooperation can’t get started, and which can themselves include an impartially altruistic component.

To the extent that I can distinguish between (fancy) instrumentally cooperative stuff and direct altruism, that is, I find that with the latter, and even with so-called “worthless preferences,” I can still get into some preference-utilitarianism type of mindset where I think: “OK, this guy is into counting grass, Clippy is into paperclips, Bob is apparently especially excited about Sally having pleasure, Little Billy wants an Xbox — I’ll look for ways to satisfy everyone’s preferences as much as possible. Can we make grass out of paperclips?” That is, I feel some pull to mimic, in my preference ontology, the type of self-transcendence that the altruist performs in the welfare ontology (and no surprise, if we equate preference satisfaction and welfare). On such an approach, that is, I make of my preference slot (or at least, the portion of it I want to devote to altruism) an “everyone slot” — a slot that (modulo un-addressed worries about problematic loops) helps all the other slots get what they want, more; a slot that’s everyone’s friend, equally. And in doing so, I have some feeling of trying to leave behind the contingencies of my starting values, and to reach for something more universal.

But when I think about this more, the appeal starts to die. Here are a few reasons why.

VI. Factories and forest fires

As a starter: whose preferences count? This comes up already in considering Clippy. If I imagine a conscious, emotionally vulnerable version of Clippy — one who really cares, passionately, about paperclips; who lies awake at night in her robot bed, hoping that more paperclips get made, imagining their shining steel glinting in the sun — then I feel towards Clippy in a manner similar to how I feel about the tennis-ball man; if I can make some paperclips that Clippy will never know about, but that she would be happy to see made, I feel pretty happy to do it (though as above, how many resources I’m ready to devote to the project is a further question). But if we start to strip away these human-like traits from Clippy — if we specify that Clippy is just an optimizer, entirely devoid of consciousness or emotion; a mobile factory running sophisticated software, that predictably and voraciously transforms raw material into paperclips; a complex pattern that appears to observers as a swelling, locust-like cloud of paperclips engulfing everything that Clippy owns (to avoid pumping other intuitions about cooperation, let’s imagine that Clippy respects property rights) — then pretty quickly I start to get less sympathetic.

But what’s up with that? Naively, it seems possible to have preferences without consciousness, and especially without conventionally mammalian emotion. So why would consciousness and emotion be preconditions for giving preferences weight? One might’ve thought, after all, that it would be their preference-y-ness that made satisfying them worthwhile, not some other thing. If preferences are indeed possible without consciousness/emotion, and I deny weight to the preferences of a non-conscious/non-emotional version of Clippy, I feel like I’m on shaky ground; I feel like I’m just making up random stuff, privileging particular beings — those who I conceive of in a way that hits the right sympathy buttons in my psychology — in an unprincipled way (and this whole “consciousness” thing plausibly won’t turn out to be the deep, metaphysically substantive thing many expect). But when I start thinking about giving intrinsic weight to the preferences of unconscious factories, or to systems whose consciousness involves no valence at all (they don’t, as it were, “care”; they just execute behaviors that systematically cause stuff — assuming, maybe wrongly, that this is a sensible thing to imagine), then I don’t feel especially jazzed, either.

This worry becomes more acute if we start to think of attributing preferences to a system centrally as a useful, compressed way of predicting its behavior rather than as some kind of deep fact (whatever that means; and maybe deep facts, in this sense, are rare). If we get sufficiently loose in this respect, we will start attributing preferences to viruses, economic systems, evolutionary processes, corporations, religions; perhaps to electrons, thermostats, washing machines, and forest fires. From a “fancy instrumental cooperation” perspective, you might be able to rule some of these out on the grounds that these “agents” aren’t the right type to e.g. repay the favor; but as I said, I’m here talking about object-level values, where weighting-by-power and related moves look, to me, unappealing (indeed, sometimes objectionable). And note, too, that many proxy criteria people focus on in the context of moral status — for example, metrics like brain size, cognitive complexity, and so on — seem more relevant to how conscious a system is than to how much it has, or doesn’t have, preferences (though one can imagine accounts of preferences that do require various types of cognitive machinery — for example, machinery for representing the world, whatever that means).

Of course, all welfare ontologies face the challenge of determining who gets a slot, and how much weight the slot is given; and preference utilitarianism isn’t the only view where this gets gnarly. And perhaps, ultimately, consciousness/valence has a more principled tie to the type of preferences we care about satisfying than it might appear, because consciousness involves possessing a “perspective,” a set of “eyes” that we can imagine “looking out of”; and because valence involves, perhaps, the kind of agential momentum that propels all preferences, everywhere (thanks to Katja Grace for discussion). Still, the “thinner” and less deep you think the notion of preference, the worse it looks, on its own, as a basis for moral status.

VII. Possible people craziness

Another objection: preference utilitarianism leaves me unexcited and confused, from a population ethics perspective, in a way that other theories of welfare do not. In particular: in thinking about population ethics, I feel inclined, on both theoretical and directly intuitive grounds, to think about the interests of possible people in addition to those of actually or necessarily existing people (preference utilitarianisms that don’t do this will face all the familiar problems re: comparing worlds with different numbers of people — see, e.g., Beckstead (2013), Chapter 4). Thus, for example, when I consider whether to create Wilbur, I ask questions about how good life would be, for Wilbur — even though Wilbur doesn’t yet exist, and might never do so.

One straightforward, totalism-flavored way of formulating this is: you give all possible people “slots,” but treat them like they have 0 welfare in worlds where they don’t exist (people you can’t cause to exist therefore never get counted when, per totalism, you’re adding things up). On this formulation, though, things can get weird fast for the preference utilitarian (unlike, e.g., the hedonistic utilitarian), because at least on one interpretation, existence isn’t a precondition for preference satisfaction: that is, possible people can have non-zero welfare, even in worlds where they’re never created.

Suppose, for example, that I am considering whether to create Clippy. You might think that the altruistic thing to do, with respect to Clippy, is to create her, so she can go around making paperclips and therefore satisfying her preferences. But wait: Clippy isn’t into Clippy making paperclips. Clippy is into paperclips, and my making Clippy, let’s suppose, uses resources that could themselves be turned into clips. Assuming that I’m in a position to make more clips than Clippy would be able to make in her lifetime, then, the thing to do, if I want to be altruistic towards Clippy, is to make paperclips myself: that’s what Clippy would want, and she’d be mad, upon coming to exist, if she learned I had chosen otherwise.

But Clippy isn’t the only one who has preferences about worlds she doesn’t exist in. To the contrary, the space of all possible agents includes an infinity of agents with every possible utility function, an infinite subset of which aren’t picky about existing, as long as they get what they want (see Shulman (2012) for exploration of issues in a related vein). Trying to optimize the universe according to their aggregated preferences seems not just hopeless, but unappealing — akin to trying to optimize the universe according to “all possible rankings over worlds.” Maybe this leads to total neutrality (every ranking has a sign-reversed counterpart?), maybe it leads to some kind of Eldritch craziness, maybe actually it turns out kind of nice for some reason, but regardless: it feels, Dorothy, like we’re a long way from home.
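The "total neutrality" possibility can be illustrated with a finite toy model. The sketch below is an illustration only: it assumes a small, hypothetical set of four worlds, treats "every possible agent" as "every strict ranking of those worlds," and scores each world by its position in a ranking (a choice made here for simplicity, not something the argument above specifies). Under those assumptions, every ranking is cancelled by its reverse and the aggregate cannot tell the worlds apart. Whether anything like this carries over to an infinite space of agents is exactly what remains unclear.

```python
from itertools import permutations

def aggregate_scores(worlds):
    """Sum each world's positional score over every strict ranking of `worlds`."""
    totals = {world: 0 for world in worlds}
    for ranking in permutations(worlds):
        # Assumed scoring rule: last place scores 0, first place scores len(worlds) - 1.
        for score, world in enumerate(reversed(ranking)):
            totals[world] += score
    return totals

# Hypothetical world labels, chosen only for illustration.
totals = aggregate_scores(["utopia", "paperclips", "sand", "torture"])
print(totals)  # every world receives the same total
```

With four worlds there are 24 rankings, and each world occupies each position six times, so the totals are identical and the aggregate ranking is indifferent between Utopia and torture.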

Of course, this sort of “give possible people a slot” approach isn’t the only way of trying to fashion a preference utilitarian population ethic that cares about possible people. Perhaps, for example, we might require that Clippy actually get created in a world in order for her to have welfare in that world, despite the fact that implementing this policy would leave Clippy mad, when we create her, that we didn’t do something else. And there may be lots of other options as well, though I don’t have any particularly attractive ones in mind. For example, I expect views that advocate for “tiling the universe” with preference satisfaction to look even more unappetizing than those that advocate for tiling it with e.g. optimally pleasant experience; here I imagine zillions of tiny, non-conscious agents, all of whom want a single bit to be on rather than off, for two plus two to equal four, or some such (though obviously, this vision is far from a steel-man). And views that solely try to minimize preference dissatisfaction — sometimes called “anti-frustrationist” views — prefer nothingness to Utopia + a single papercut.

VIII. Sadism, randomness, OfficeMax

Ultimately, though, I think my true rejection of unrestricted preference utilitarianism is just: despite my willingness to put tennis balls in boxes, and to make some limited number of paperclips for (suitably sympathetic?) robots, when push comes to shove I do actually start to get picky about the preferences I’m excited to devote significant resources to satisfying (here I expect some people to be like: “uhh… duh?”).

Suppose, for example, that here I am, in a position to save the lives of humans, prevent the suffering of animals, help build a beautiful and flourishing future, and so on; but I learn that actually, there are so many paperclip maximizers in distant star systems that the best way to maximize aggregate preference satisfaction is to build paperclip factories instead (let’s assume that the Clippers will never find out either way, and that they and the Earthlings will never interact). Fancy instrumental cooperation stuff aside, I’m inclined, here, just to acknowledge that I’m not really about maximizing aggregate preference satisfaction, in cases like this; burning or sacrificing things of beauty and joy and awareness and vitality, in order to churn out whatever office supplies happen to be most popular, across some (ill-defined, contingent, and often arbitrary-seeming) set of “slots,” just isn’t really my thing.

Indeed, cases like this start to make me feel “hostage,” in a sense, to whatever utility functions just happen to get written into the slots in question. I imagine, for example, living in a world where a machine churns out AI system after AI system, each maximizing for a different randomly selected ranking over worlds, while I, the preference utilitarian, living in a part of a universe these AI systems will never see or interact with, scramble to keep up, increasingly disheartened at the senselessness of it all. As I wipe the sweat off my brow, looking out on the world I am devoting my life to, effectively, randomizing, I expect some feeling like: wait, why am I doing this again? I expect some feeling like: maybe I should build a Utopia instead.

And once we start talking about actively sadistic preferences, I have some feeling like: I’m out of here. Consider a sadist who just really wants there to be suffering in a certain patch of desert they’ll never see. You might say: “ah, satisfying this preference would conflict with someone else’s stronger preference not to suffer, so the sadist always loses out; sorry, sadist” (are you really sorry?). Hmm, though: we never said anything about just how strong the sadist’s preference was, or, indeed, about how we are counting “strength.” What’s more, suppose that there are tons of such sadists, spread out across the universe, all obsessed with this one patch of desert, all totally oblivious to what happens in it (thanks to Katja Grace for pointing me to examples in this vein). If you make the sadists “big” and numerous enough, eventually, the naive preference utilitarian conclusion becomes clear, and I’m not on board.
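The arithmetic behind "big and numerous enough" can be made explicit with a toy calculation. This is a minimal sketch under loud assumptions: preference "strength" is a single non-negative integer per agent, strengths simply add across agents, and the function names are hypothetical. None of this settles how strength should actually be counted, which is part of the problem.

```python
def sadists_needed(victim_strength, sadist_strength):
    """Smallest number of identical sadists whose summed preference exceeds the victim's.

    Assumes integer strengths and simple additive aggregation.
    """
    return victim_strength // sadist_strength + 1

def naive_verdict(victim_strength, sadist_strength, n_sadists):
    """What simple additive aggregation recommends for the patch of desert."""
    total_for_suffering = n_sadists * sadist_strength
    return "suffering" if total_for_suffering > victim_strength else "no suffering"

# Hypothetical numbers: the victim cares 1000 units, each sadist only 3.
threshold = sadists_needed(1000, 3)
print(threshold, naive_verdict(1000, 3, threshold))
```

However strong the victim's preference is, some finite number of weakly invested sadists outweighs it once strengths are summed; no ratio between the two strengths protects the victim.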

Perhaps you say: fine, forget the sadists, ban them from your slots, but keep the rest. But what kind of arbitrary move is that? Are they not preference-havers? When you prick them, do they not bleed? Whence such discrimination? We got into this whole preference utilitarian game, after all, to stop “imposing our will,” and to be genuinely “responsive to the Other.” Who decided that sadists aren’t the Other, if not “our will”?

Indeed, in some sense, for the preference utilitarian, this example is no different from a non-sadistic case. Suppose, for example, that the universe contains tons of hyper-focused Clippers, all of whom want that patch of desert in particular turned into clips; but there is also a conservationist whose adamant preference is that it stay sand. Here, the preference utilitarian seems likely to say that we should accept the will of the majority: clips it is. But from the perspective of the preference ontology, what’s, actually, the distinction between this case and the last? Whoever would suffer for the sake of the oblivious sadists, after all, is ultimately just someone whose strong preferences conflict with theirs: if you draw the cases on the whiteboard, you can make them look the same. But somehow suffering, at least for me, is different; just as Utopia, for me, is different. Somehow (big surprise), at the end of the day, I’m not a preference utilitarian after all.

IX. Polytheism

Overall, then, I think that subjectivists (and to a lesser extent, realists) aspiring towards impartial altruism are left with the following tension. On the one hand, if they aren’t unrestricted preference utilitarians, then their altruism will involve at least some amount of “imposing” their will on the people they are trying to help: they will be, that is, Johnny Appleseeds, who offer apple trees to people who want cherry trees more. But if they are unrestricted preference utilitarians, their ethic looks unappealing along a variety of dimensions — not least, because it implies that in some cases, they will end up turning the farm into random office supplies, torture chambers, and the like.

Currently, my take is that we (or, many of us) should accept that even in altruism-flavored contexts, we are, to some extent, Johnny Appleseeds. We have a particular conception of the types of lives, experiences, opportunities, forms of flourishing, and so on that we hope other agents can have — a conception that most humans plausibly share to a large extent, especially for most present-day practical purposes, but that other agents may not, even on idealized reflection; and a conception which no higher and more objective normative authority shares, either. This doesn’t mean we go around giving agents things they don’t want; but it might well mean that to the extent they don’t want what we’re offering, we focus more on other agents who do, and that we are in that sense less “impartial” than we might’ve initially thought (though this depends somewhat on how we define impartiality; on some definitions, if you’re equally happy to give e.g. pleasure, knowledge, flourishing, etc to anyone who wants it, the “equally” and the “anyone” leave you impartial regardless).

That said, I do think there’s room to go a lot of the way towards helping agents centrally on their own terms — even Clippers, tennis-ballers, and so on — especially when you find yourself sharing the world with them already; to err very hard, that is, on the side of giving Little Billy the Xbox. Some of this may well fall under object-level altruism; but even if it doesn’t, there’s a lot of room for more fully preference-utilitarian, golden-rule type thinking, in the context of various fancy (or not so fancy) types of cooperation, reciprocity, norm-following, and general “niceness.” My current best guess is that trying to shoe-horn all altruism/morality-flavored activity into this (ultimately instrumental?) mold gives the wrong verdicts overall (for example, re: moral concern towards the powerless/counterfactually powerless), but I think it has a pretty clear and important role to play, especially as the preferences of the agents in question start to diverge more dramatically. After all, whatever their preferences, many agents have an interest in learning to live together in peace, make positive sum trades, coordinate on mutually-preferred practices and institutions, and so on — and these interests can ground commitments, hypothetical contracts, dispositions, and so on that involve quite a bit of thinking about, and acting on the basis of, what other agents would want, regardless of whether you, personally, are excited about it.

All this talk of personal excitement, though, does raise the question, at least for me: is the Johnny Appleseed model really altruism, as it’s often understood? Indeed, I think the subjectivist can start to worry about this before we start talking about Johnny Appleseed stuff. In particular: once we have let go of the normative realist’s Good, with a capital G — the God who was supposed to baptize some privileged forms of activity as occurring “from the point of view of the universe,” the thing that everyone (or, everyone whose vision is not irreparably clouded) was, in some sense, supposed to be on board with (see, e.g., a common interpretation of the Good as generating “agent-neutral reasons,” reasons that apply to everyone; an interpretation that it’s quite unclear the subjectivist can sustain) — we might wonder whether all activity, however directed towards others, is relegated to the land that the old God would’ve dismissed as “mere preference,” just “what you like.” Saving lives, curing diseases, opening cages, fighting oppression, building wise and flourishing civilizations — are these things so different, for subjectivists, than collecting stamps, or trying to tile the universe with neon pictures of your face? At the end of the day, it’s just your slot; just your “thing.” You wanted, perhaps, a greater sense of righteousness; a more Universal Wind at your back; a more Universal Voice with which to speak. But you were left with just yourself, the world around you, other people, the chance to choose.

(Here I think of this verse from Jacob Banks, and about what he takes himself to have learned from travelers, and from mirrors — though I don’t think it’s quite the same).

Perhaps, then, you responded to this predicament by seeking a different Universal voice: not the voice of God, or of the realist’s Good, but the voice of all agents, aggregated into a grand chorus. Perhaps, in part (though here I speculate even more than elsewhere), you sought to avoid the burdens of having a will yourself, and certainly, of “imposing it.” Perhaps you sought to deny or accept the will of others always by reference to some further set of wills, other than your own: “I’d love to do that for you, Clippy; it’s just that Staply and the others wouldn’t like it.” Perhaps some part of you thought that if you made yourself transparent enough, you wouldn’t exist at all; other agents will simply exist more, through you (an aim that gives me some feeling of: what makes them worth it, in your eyes, but yourself such nothingness?). Perhaps such transparency seemed a form of ethical safety; a way, perhaps, of not being seen; of avoiding being party to any fundamental forms of conflict; of looking, always, from above, instead of face to face.

Indeed, it’s easy, especially for realists, to assume that if one is trying to promote “the Good,” or to do the “Right Thing,” or to do what one “Should Do,” then by dint of pitching one’s intentions at a certain conceptual level (a move sometimes derided as “moral fetishism”), the intentions themselves, if sincere, are thereby rendered universally defensible. Perhaps, empirically, one fails to actually promote the Good; perhaps one was wrong, indeed, about what the Good consists in; but one’s heart, we should all admit, was in the right place, at least at a sufficient level of abstraction (here I think of Robespierre, talking endlessly of virtue, as blood spatters the guillotine again and again). And perhaps there is some merit to “fetishism” of this kind, and to thinking of hearts (both your heart, and the hearts of the others) in this way. Ultimately, though, the question isn’t, centrally, what concepts you use to frame your aspirations, except insofar as these concepts predict how you will update your beliefs and behavior in response to new circumstances (for example, perhaps in some cases, someone’s professing allegiance to “the Good” is well-modeled as their professing allegiance to the output of a certain type of population-ethics-flavored discourse, the conclusion of which would indeed change how they act). Ultimately, the question is what, on the object level, you are actually trying to do.

And here the preference utilitarian tries, in a sense, to do everything at once; to make of their song the Universal Song, even absent the voice of God. But without God to guide it, the Universal Chorus ultimately proves too alien, too arbitrary, and sometimes, too terrible. In trying to speak with every voice, you lose your own; in trying to sing every song at once, you stop singing the songs you love.

Thus, the turn to the Johnny Appleseed way. But now, you might think, the line between cancer curing (altruism?) and stamp collecting (random hobby?) has gotten even thinner: not only does it all come down to your preferences, your slot, your thing, but your thing isn’t even always fully responsive to the Other, the Universal Chorus, on its own terms. You’re not, equally, everyone’s friend, at least not from their idealized perspective. Your “helping people” has gotten pickier; now, apparently, it has to be “helping people in the way that you in particular go in for.” Is that what passes for impartial altruism these days?

I think the answer here is basically just: yes. As far as I can currently tell, this is approximately as good as we can do, and the alternatives seem worse.

I’ve been describing realism as like monotheism: the realist looks to the One True God/Ranking for unassailable guidance. Subjectivism, by contrast, is like polytheism. There are many Gods — far too many, indeed, to fight for all of them at once, or to try to average across. This is the mistake of the preference utilitarian, who tries to make, somehow, from the many Gods, another monotheism — but one that ultimately proves grotesque. Johnny Appleseed, though, acknowledges, and does not try to erase or transcend, the fundamental plurality: there is a God of consciousness, beauty, joy, energy, love; there is a God of paperclips; there is a God of stamps, and of neon pictures of your face; there is a God of fire, blood, guillotines, torture; there is a God of this randomized ranking of worlds I just pulled out of a hat. And there are meta-Gods, too; Gods of peace, cooperation, trade, respect, humility; Gods who want the other Gods to get along, and not burn things they all care about; Gods who want to look other Gods in the eye; to understand them; maybe even to learn from them, to be changed by them, and to change them in turn (an aspiration that Bostrom’s “goal content integrity” leaves little room for).

All of us serve some Gods, and not others; all of us build, and steer, the world, whether intentionally or not (though intention can make an important difference). Indeed, all of us build, and steer, each other, in large ways and small — an interdependence that the atomized ontology of “slots” may well be ill-suited to capture. Johnny Appleseed fights for the God of apple trees. You fight for your Gods, too.

(Or at least, this is my current best guess. Some part of my heart, and some “maybe somehow?” part of my head, is still with the monotheists.)

X. Cave and sun

All that said, at a lower level of abstraction, I am optimistic about finding more mundane distinctions between collecting stamps and saving lives, curing cancers, preventing suffering, and so on. For one thing, the latter are activities that other people generally do, in fact, really want you to do. Granted, not everyone wants this as their favorite thing (Clippy, for example, might well prefer that your altruism towards her were more paperclip-themed — though even she is generally glad not to die); but the population of agents who are broadly supportive, especially on present-day Earth, is especially wide — much wider, indeed, than the population excited about the neon face thing. In this sense, that is, Johnny Appleseed finds himself surrounded, if not by apple tree obsessives, then at least, by lots of people very happy to grow apples. Their Gods, in this sense, are aligned.

We might look for other distinctions, too — related, perhaps, to intuitive (if sometimes hard to pin down) boundaries between self and other, “about me” vs. “about others.” Salient to me, for example, is the fact that wanting other people to flourish, for its own sake, breaks through a certain kind of boundary that I think can serve, consciously or unconsciously, as a kind of metaphysical justification for certain types of conventional “selfishness” — namely, the boundary of your mind, your consciousness; perhaps, your “life,” or what I’ve called previously, your “zone.” Sidgwick, in my hazy recollection, treats something like this boundary as impassable by practical reason, and hence posits an irreconcilable “dualism” between rational egoism and utilitarianism — one that threatens to reduce the Cosmos of Duty to a Chaos. And I think we see it, as well, in various flavors of everyday solipsism; solipsism that forgets, at some level, that what we do not see — the lives and struggles and feelings of others; the dreams and ideals and battles of our ancestors; history, the future, the unknown, the territory — still exists; or perhaps, a solipsism which treats the world beyond some veil as existing in some lesser sense, some sense that is not here; not the same type of real as something “in me.”

To care, intrinsically, about what happens to someone else, even if you never find out, is to reject this solipsism. It is to step, perhaps briefly, out of the cave, and into the sun; to reach past the bubble of your mind, and into the undiscovered land beyond. And an aspiration towards impartiality, or to see “from the point of view of the universe,” can be seen as an effort to stay in the sun, fully; to treat equally real things as equally real; to build your house on the rock of the world.

Is trying to build such a house distinctive to altruism? Not from any theoretical perspective: the rock of the world doesn’t actually privilege one set of values over others. Utility functions, after all, are rankings over real worlds, and you can fight for any ranking you like. Clippy cares about the world beyond her map, too; so does a tyrant who wants to live on in the history books, or in people’s memories (is that “about others?”); and even the hedonistic egoist can be understood as endorsing a ranking over full worlds — albeit, one determined solely by the pleasure that occurs in the bits she labels “my mind.” Indeed, in a sense, the point of view of the universe is the place for basically everyone to start; it’s just, the universe itself; the thing that’s actually there, where whatever you’re fighting for will actually happen.

In practice, though, it feels like the difference between zone and beyond-zone, cave and sun, is important to (though not exhaustive of) my own conception of impartial altruism. This difference, for example, is one of the main things I feel inclined to question the egoist about: “are you sure you’re really accounting for the fact that the world beyond you exists? That other people are just as real as you?” Perhaps they are; and perhaps, there, the conversation stalls. But often, the conversation doesn’t stall when I have it with myself. To the contrary, for me, it is when the world seems most real that the case for altruism seems clearest. Indeed, I think that in general, altruism believes very hard in the existence of the world — and impartial altruism, in the entire world. Other things believe this too, of course: it’s a kind of “step 1.” But for the altruistic parts of my own life, it feels like it’s a lot of the way.

(Thanks to Nick Beckstead, Paul Christiano, Ajeya Cotra, and especially Katja Grace for discussion.)

MichaelStJules @ 2024-03-12T12:38 (+2)

A few years late, but this was an interesting read.

I have a few related thoughts:

I've tried to characterize types of subjective welfare that seem important to me here. The tl;dr is that they're subjective appearances of good, bad, better, worse, reasons, mattering. Effectively conscious evaluations of some kind, or particular dispositions for those. That includes the types of preferences we would normally want to worry about, but not merely behavioural preferences in entirely unconscious beings or even conscious beings who would make no such conscious evaluations. But consciousness, conscious evaluation, pleasure, suffering, etc. could be vague and apparently merely behavioural preferences could be borderline cases.
On preferences referring to preferences, you can also get similar issues like two people caring more about the other's preferences than their own "personal" preferences, both positively like people who love each other, or negatively, like people who hate each other. You can get a system of equations, but the solution can go off to infinity or not make any sense at all. There's some other writing on this in Bergstrom, 1989, Bergstrom, 1999, Vadasz, 2005, Yann, 2005 and Dave and Dodds, 2012.
Still, I wonder if we can get around issues from preferences referring to preferences by considering what preferences can actually be about in a way that actually makes sense physically. Preferences have to be realized physically themselves, too, after all. If the result isn't logically coherent, then maybe the preference is actually just impossible to hold? But maybe you still have problems when you idealize? And maybe indeterminacy or vagueness is okay. And since a preference is realized in a finite system with finitely many possible states (unless you idealize, or consider unbounded or infinite time), then the degrees of satisfaction they can take are also bounded.
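To make the "system of equations" point about preferences referring to preferences concrete, here is a minimal sketch. It assumes a simple linear form that is not taken from any of the cited papers: each person's overall utility is their "personal" payoff plus a weight `w` times the other person's overall utility. The names `p_a`, `p_b`, `w` and `mutual_utilities` are hypothetical.

```python
def mutual_utilities(p_a, p_b, w):
    """Solve U_a = p_a + w * U_b and U_b = p_b + w * U_a for (U_a, U_b).

    Assumed linear model: p_a and p_b are personal payoffs, and w is how much
    each person weights the other's overall utility.
    """
    det = 1 - w * w
    if det == 0:
        # At w = 1 or w = -1 the two equations have no unique solution.
        raise ValueError("no unique solution when w * w == 1")
    u_a = (p_a + w * p_b) / det
    u_b = (p_b + w * p_a) / det
    return u_a, u_b

for w in (0.5, 0.9, 0.99, 2.0):
    print(w, mutual_utilities(1.0, 1.0, w))
```

With both personal payoffs fixed at 1, the solved utilities grow without bound as `w` approaches 1, and for `w = 2` both come out negative even though each person's personal payoff is positive and each cares positively about the other. That is one way a solution can "go off to infinity or not make any sense at all."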