Alienation and meta-ethics (or: is it possible you should maximize helium?)

By Joe_Carlsmith @ 2021-01-15T07:06 (+19)

(Cross-posted from Hands and Cities)

In a previous post, I tried to gesture at the possibility of a certain kind of wholeheartedness in ethical life. In this post, I want to examine a question about meta-ethics that seems to me important to this wholeheartedness: namely, whether you can be completely alienated from what you should do. By this I mean: is it possible for what you should do to hold no appeal whatsoever, even given arbitrarily deep and full understanding of reality and of the consequences of your actions? I discuss how three different meta-ethical views approach this question, and why the differences matter for wholeheartedness.

(Note: this post plays fast and loose with a number of philosophical issues that it would take a lot of extra work to tease apart and pin down precisely; my aim is mostly just to gesture at a general picture that I think is important, not to set it out rigorously).

(Second note: the types of “shoulds” and “normative facts” I have in mind here are the ones of all-things-considered practical deliberation – deliberation that encompasses both moral and non-moral considerations. Thus, the questions of whether I should eat this cupcake, go back to grad school, clip my toenails, etc., are all normative questions in this sense, but not necessarily moral ones).

I.

On some versions of normative realism, the answer to the question above is yes: it might turn out that what you should do, according to the true normativity written into the fabric of reality, is fully and forever alien from anything you or any other human cares about. It might be, for example, that there is an over-riding moral imperative to maximize the amount of helium gas, or to paint your mother a specific shade of pastel blue, or to eat a prime number of bricks every other Tuesday. Future philosophers might discover this fact (though given their current reliance on intuitions closely tied to actual patterns of human concern, their prospects for finding the truth look slim); but their discovery, even fully comprehended, wouldn’t move them to action. Rather, the human species as a whole would be afflicted with a kind of totalized akrasia: “I know I should eat this brick; I’ve seen the proof, it’s airtight; but no matter how much I reflect and imagine, I just can’t get motivated.”

Or perhaps the better analogy is species-wide sociopathy, an irredeemable indifference to what really matters, that summum bonum, helium. “How can you just sit there, letting these precious opportunities to make helium slip away?!” morally-attuned aliens might say, aghast at our wasteful games, our loves and joys. But we would persist, numb to (but not ignorant of) the gravity of our wrongs.

Or maybe we wouldn’t persist. Maybe we (or some of us) would start dutifully making helium, motivated by the bare fact that doing so is right, good, what we should do, even though entirely un-motivated, at least directly, by what this turns out to consist in. Perhaps we would try to find things we like about helium, or to raise our children to like helium, or, failing that, to rewire our brains. It’s a hard scenario to imagine, partly because it’s so unclear what sort of discovery could prove helium the summum bonum, without also at least helping us to see why making helium might be at all appealing (my girlfriend suggests imagining that the discovery came via some sort of convoluted argument about anthropics — “if we endorse the Self-Sampling Assumption, and the Great Filter is behind us, then it must be the case that in the infinite universe of mathematical structures, all of the logically impossible civilizations valued…” — but even the surprises and confusions of anthropics don’t seem up to the task).

Indeed, it’s tempting to imagine the dutiful helium makers saying something like “Well, I can’t for the life of me see it myself, but apparently making helium is the thing. Weird, right? Yeah, I thought so too. But so it goes: let’s get on with it.” That is, it’s tempting to imagine them thinking of themselves as not understanding something about helium — something they have to, as it were, “defer to normative reality” about. A feature of the set-up, though, is supposed to be that this isn’t the case. They understand everything there is to understand; they have seen through to the core of reality, and reality, it turns out, is really into helium. But they, unfortunately, aren’t.

Maybe the set-up is confused in this respect. In particular, it assumes that there’s an available notion of “understanding” such that you can understand fully not just that you should do X, but why you should do X, and still not be at all motivated to do X (though note that in certain cases, on many realist views, there won’t be a further “why” — it will just be a brute fact that X is good, obligatory, reason-providing, and so forth). That is, in this case, motivation is orthogonal to normative understanding. Normative facts are like facts about which minor league batters have the best batting average: whether you know/understand them is one thing; whether you care, entirely another. Call this “externalist normative realism.”

II.

Not all normative realism need take this form. To the contrary: the most intuitive and naive form of normative realism, I think, treats normative facts, understood fully, as necessarily compelling to our motivational capacities. That is, on this view, if helium is ultimately the highest good, then adequate understanding of reality ought to allow us to see what’s good about it in a manner that makes it actually seem good; the morally-attuned aliens ought, with enough patience and non-coercive effort, to be able to bring us around, such that later, in deciding whether to make more helium, or to help an old dog trapped in a ditch, we’ll be able to return, internally, to the rational source of our allegiance to helium — to that perception of helium’s value that we had when we first really saw what the aliens were talking about — and then to choose with emotional clarity. On this conception, that is, it’s not an open question whether Plato’s cave-dwellers, upon emerging to stand in the raw and shining light of the Good, will turn back with an indifferent shrug. Not, at least, if their eyes were really open: love and true-seeing are too intimately linked. Call this “internalist normative realism.”

This is the type of normative realism that really calls to me at an intuitive level; and I expect that it’s the type of thing that many people hope for out of a realist meta-ethic (though it’s not, I think, the thing they are likely to ultimately get, at least not from analytic meta-ethics). It’s also, I think, the one most continuous with a realist’s desire to treat normative facts as similar in metaphysical status to mathematical and scientific facts, while preserving the intimate connection between normative understanding and motivation. For just as we expect creatures capable of understanding the universe to converge on similar mathematical and scientific facts, so too does this form of realism tend (at least in its most naive form) to expect them to converge both in their understanding of the normative facts, and in the patterns of care and concern that flow from this understanding. In the great intergalactic meeting of mature and intelligent civilizations, on this view, we will all be able to share stories about how we started out valuing X, Y, or Z old thing, for whatever random evolutionary reason, but that gradually, as our wisdom and enlightenment expanded, we started to discern the contours of the great Way, the deep compellingness of which became clearer and clearer as the haze of ignorance and folly fell away, until we all arrived, here, in cosmic harmony, all with our helium-making machines, having seen the Truth and followed it.

(I’m planning to talk more about expectations of normative convergence in a future post. In particular, I think realism of this broad type is in tension with Nick Bostrom’s Orthogonality Thesis, at least in the form most relevant to thinking about AI risk — but that the Orthogonality Thesis is probably correct. Note, though, that it’s also possible to expect a great deal of convergence in both normative understanding and patterns of concern even on various anti-realist views).

A general problem with internalist normative realism, though, is that it’s really hard to make sense of. In particular, I think, it’s hard to see what sort of epistemic access we could have to the normative facts, and what sorts of facts they could be, such that suitably comprehensive epistemic contact with them necessarily reshapes our motivations, via a kind of rational causation, to conform with normative reality. Exactly what makes this picture hard to make out is a bit tricky to pin down, but I think the general problem is something to do with Humean intuitions to the effect that facts about the universe — of which normative facts are, on this form of realism, a species — are by themselves motivationally inert; they only motivate if you already have some pre-existing motivation to which they connect. But on internalist realism, these facts can somehow intervene, via your rational understanding, to reconfigure your motivational system entirely, regardless of what concerns you started with. Internalist realists have a variety of replies available to this, but I think something in this vicinity remains an important objection, and I tend to think that the best way to be this type of realist is to basically just admit that we lack even a rudimentary picture of how this kind of rational communion with the normative facts works, and how it fits with our empirical picture of human psychology and cognition.

III.

Enter our third meta-ethical view, which I’ll call “internalist anti-realism” (though note that there are lots of other forms of anti-realism). On this view, facts about what you should do are in some sense dependent on your existing patterns of care and concern, rather than being independent facts about reality itself (exactly how to tease apart realist and anti-realist views in this respect is a bit tricky — but again, I’m playing fast and loose here). The versions of anti-realism that I take most seriously here usually appeal to some kind of epistemic idealization procedure — for example, they appeal to what you would care about, or what you would want yourself to do (the counterfactual here can get complicated), if you had arbitrarily full and vivid understanding of reality, the consequences of your actions, etc. Indeed, this sort of idealization procedure closely resembles the type of idealized understanding that the internalist realist also expects to result in correct normative understanding and motivations. That is, on both views, people with ideal understanding of reality end up with the right patterns of concern, and in this sense, both can capture an intuitive connection between understanding and motivation. On internalist anti-realism, though, this connection arises because the right patterns of concern just are whichever ones you would end up with given ideal understanding; whereas on internalist realism, the correct patterns of concern are fixed independently of the concerns you started out with, and ideal understanding necessarily leads you to the True Way.
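To make the structural difference between the two internalist views a bit more explicit, here is one way to schematize it (my own rough shorthand, not standard notation; "Should(A, x)" and "Ideal(A)" are just labels for "A should do x" and "A under the idealization procedure"):

```latex
% A rough schema, not standard notation: read Ideal(A) as agent A after the
% idealization procedure (arbitrarily full and vivid understanding of reality).

% Internalist anti-realism: what A should do is constituted by idealized concern.
\[
  \mathrm{Should}(A, x) \;\overset{\mathrm{def}}{\iff}\; \mathrm{Ideal}(A)\ \text{wants}\ A\ \text{to do}\ x
\]

% Internalist realism: Should(A, x) is fixed independently of A's starting concerns,
% but ideal understanding is guaranteed to bring the motivations along.
\[
  \mathrm{Should}(A, x) \;\Longrightarrow\; \mathrm{Ideal}(A)\ \text{wants}\ A\ \text{to do}\ x
\]
```

The first biconditional is definitional: on internalist anti-realism there is no further fact about what you should do over and above what ideal-you would want. The second conditional is a substantive guarantee about how independently fixed normative facts interact with idealized understanding, and that guarantee is exactly the part that is hard to make sense of.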

A central objection to internalist anti-realism, at least in its naive forms, is that it ends up endorsing claims in the vicinity of “an ideally coherent paperclip maximizing AI system should maximize paperclips, even if it means killing Bob” or “an ideally coherent pain maximizer should maximize pain.” There are various more complicated moves you can make to try to dodge these sorts of implications — for example, you can try for theories of normative language that make it true in our mouths that “the paperclipper shouldn’t kill Bob,” but true in the paperclipper’s mouth that “I should kill Bob” — and of course, you can endorse (at least certain types of) human attempts to restrain/punish/shame/deter the paperclipper (or to try to deceive it about meta-ethics). But in a basic sense — which persists, for me, upon engagement with these dodges — a certain kind of realist hope that Clippy can be revealed as mistaken about its basic reasons for action has been lost. You can fight with Clippy; you can trade with Clippy; but it won’t be the case that in the eyes of the universe (and not just in your own eyes), Clippy cares about the wrong things, and you care about the right things. For this and other reasons, various people resist this type of anti-realism.

IV.

Simplifying somewhat, then, we can think of the three different views I’ve discussed via the following framework (though note that there are many more problems with these views than the ones I’ve discussed):

| View | Does what you should do depend on your current values? | Will ideal understanding always bring your values into harmony with what you should do? | What should a paperclip maximizer do? | What would the paperclip maximizer want to do on ideal reflection? | Problems discussed |
|---|---|---|---|---|---|
| 1. Externalist realism | No | No | Maximize flourishing (or whatever, the thing the true normativity says) | Maximize paperclips | Loss of intuitive connection between understanding and motivation |
| 2. Internalist realism | No | Yes | Maximize flourishing (or whatever) | Maximize flourishing (or whatever) | Hard to tell a plausible epistemic story about how normative facts exert this type of influence on your values |
| 3. Internalist anti-realism | Yes | Yes | Maximize paperclips | Maximize paperclips | Ends up saying that Clippy should clip, or e.g. making claims like "I should clip" true in Clippy's mouth |

It’s tempting, in thinking about meta-ethics, to focus primarily on the distinction between realist and anti-realist views — that is, to draw the important line between (1) and (2) on the one hand vs. (3) on the other. But I find that in my own life, it’s actually the line between (1) vs. (2) and (3) that matters most. This is because (2) and (3) both ground a certain kind of wholeheartedness in simultaneously pursuing the projects of trying to see the world truly, trying to do what you should do, and trying to protect and promote what the wisest version of yourself would care about most deeply. That is, on (2) or (3), what you should ultimately do, and what the wisest version of yourself would want you to do, are in harmony. Wise-you and normativity are always on the same side.

On (1), by contrast, this isn’t so. You might wake up, at the end of all your inquiry, having understood the truth as fully as possible, and still find yourself fundamentally alienated from the values that should govern your action. You might do what you should do, but you won’t ever deeply see why it’s worth doing; the values at stake will always seem external, outside, other; you’ll never be able to endorse and act on them with full sincerity.

V.

Here’s a (strange) example that might illustrate the difference. Suppose that after you die, you find yourself in Hell, being held in a cage by demon guards. These guards inform you that you’ve done great wrongs, that it’s God’s Will that you endure this confinement for some unspecified and possibly infinite period of time, and His Will, also, that you don’t resist or try to escape.

You can’t, from your current perspective, really see that this is a just punishment. But on (2) and (3), you know that whatever it is you should do — whether it’s to endure the punishment, or to try to escape — you would endorse it wholeheartedly if you really understood the situation. Perhaps full understanding would reveal that God’s Will is in fact truly just and right, that you should be punished in this way, and that wise-you would want you to cooperate humbly with your captors. Or perhaps wise-you would understand that in fact, you and God are in a fundamentally adversarial relationship, and that what you should do is resist Him, subvert His plans, try to free the other captors, etc. Either way, you can consider your options wholeheartedly, confident that your wisest self is on the side of making the right choice.

What’s ruled out by (2) and (3), but not by (1), is the possibility that (a) you should endure the punishment without trying to resist or escape, and (b) you’d never be able to comprehend why, even in the limits of wisdom and clarity. That God’s Will is right and true, but what makes it right and true will remain fundamentally and forever alien from you, the sinner, regardless of the understanding you achieve.

Of course, your meta-ethical convictions may prove cold comfort as the days in the cell wear on (though I actually think it would make a big difference to me, if I found myself in Hell, if I expected to endorse my being there on reflection). But I actually think that the difference between (1) vs. (2) and (3) is important in much more mundane cases. I’ll leave that, though, for another post.

There are also important questions about the possibility of being alienated from the wisest version of yourself, even if that version and normativity are in harmony. I’m generally optimistic, though, about finding routes to harmony, cooperation, and loyalty between you and wise-you, despite the difference in wisdom — since the two of you are ultimately, I think, on the same side (and indeed, the same person). But I’ll leave that for another post as well.


alexrattee @ 2021-01-17T22:19 (+3)

This was one of my favourite EA Forum posts in a long time, thanks for sharing it!

Externalist realism is my preferred theory. Though I think we'd probably need something like God to exist in order for humans to have epistemic access to any normative facts. I've spent a bit of time reflecting on the "they understand everything there is to understand; they have seen through to the core of reality, and reality, it turns out, is really into helium. But they, unfortunately, aren’t." style case. Of course it'd be really painful, but I think the appropriate response would be to understand the issue as one of human motivational brokenness. Something has gone wrong in my wiring which means my motivations are not properly functioning, as they are out of kilter with what there is all-things-considered most reason to do, namely promote helium. That doesn't mean that I'm to blame for this mismatch. But I'd hope that I'd then push this acknowledgement of my motivational brokenness into a course of action to see if I can get my motivations to fall in line with the normative truth.

On the hell case (which feels personally relevant as an active Christian) I think I'd take a lot of solace during my internment that this is just what there is all-things-considered most reason to happen. If my dispositions/motivations fail to fall in line, then as above they are failing to properly function and I think/hope that acknowledging this would take some of the edge off the dissonance of not being able to understand why this is a just punishment.

Joe_Carlsmith @ 2021-01-20T08:33 (+2)

Glad to hear it :)

Re: "my motivational system is broken, I'll try to fix it" as the thing to say as an externalist realist: I think this makes sense as a response. The main thing that seems weird to me is the idea that you're fundamentally "cut off" from seeing what's good about helium, even though there's nothing you don't understand about reality. But it's a weird case to imagine, and the relevant notions of "cut off" and "understanding" are tricky.

richard_ngo @ 2021-01-15T21:25 (+2)

Interesting post. I interpret the considerations you outline as forming a pretty good argument against moral realism. Partly that's because I think that there's a stronger approach to internalist anti-realism than the one you suggest. In particular, I interpret a statement that "X should do Y" as something like: "the moral theory which I endorse would tell X to do Y" (as discussed further here). And by "moral theory" I mean something like: a subset of my preferences which has certain properties, such as being concerned with how to treat others, not being specific to details of my life, being universalisable, etc. (Although the specific properties aren't that important; it's more of a familial resemblance.)

So I certainly wouldn't say that Clippy should clip. And even if Clippy says that, I don't need to agree that it's true even from Clippy's perspective. Firstly because Clippy doesn't endorse any moral theory; and secondly because endorsements aren't truth-apt. On this position, the confusion comes from saying "X should do Y" without treating it as a shorthand for "X should do Y according to Z; hooray for Z".

From the perspective of morality-as-coordination-technology, this makes a lot of sense: there's no point having moral discussions with an entity that doesn't endorse any moral theory! (Or proto-theory, or set of inchoate intuitions; these are acceptable substitutes.) But within a community of people who have sufficiently similar moral views, we still have the ability to say "X is wrong" in a way that everyone agrees with.

Joe_Carlsmith @ 2021-01-16T09:58 (+2)

Thanks for reading. Re: your version of anti-realism: is "I should create flourishing (or whatever your endorsed theory says)" in your mouth/from your perspective true, or not truth-apt? 

To me Clippy's having or not having a moral theory doesn't seem very central. E.g., we can imagine versions in which Clippy (or some other human agent) is quite moralizing, non-specific, universal, etc about clipping, maximizing pain, or whatever.

richard_ngo @ 2021-01-16T12:09 (+2)

It's not truth-apt. It has a truth-apt component (that my moral theory endorses creating flourishing). But it also has a non-truth-apt component, namely "hooray my moral theory". I think this gets you a lot of the benefits of cognitivism, while also distinguishing moral talk from standard truth-apt claims about my or other people's preferences (which seems important, because agreeing that "Clippy was right when it said it should clip" feels very different from agreeing that "Clippy wants to clip").

I can see how this was confusing in the original comment; sorry about that.

I think the intuition that Clippy's position is very different from ours starts to weaken if Clippy has a moral theory. For example, at that point we might be able to reason with Clippy and say things like "well, would you want to be in pain?", etc. It may even (optimistically) be the case that properties like non-specificity and universality are strong enough that any rational agent which strongly subscribes to them will end up with a reasonable moral system. But you're right that it's somewhat non-central, in that the main thrust of my argument doesn't depend on it.