On expected utility, part 1: Skyscrapers and madmen

By Joe_Carlsmith @ 2022-03-16T21:54 (+22)

(Cross-posted from Hands and Cities)

    Summary:

Suppose that you’re trying to do something. Maybe: get a job, or pick a restaurant, or raise money to pay medical bills.

Some people think that unless you’re messing up in silly ways, you should be acting “as if” you’re maximizing expected utility – i.e., assigning consistent, real-numbered probabilities and utilities to the possible outcomes of your actions, and picking the action with the highest expected utility (the sum, across the action’s possible outcomes, of each outcome’s utility multiplied by its probability).

But in combination with plausible ethical views, expected utility maximization (EUM) can lead to a focus on lower-probability, higher-stakes events — a focus that can be emotionally difficult. For example, faced with a chance to save someone’s life for certain, it directs you to choose a 1% chance of saving 1000 lives instead – even though this choice will probably benefit no one. And EUM says to do this even for one-shot, or few-shot, choices – for example, choices about your career.

Why do this? The “quick argument” – namely, that EUM maximizes actual utility, given repeated choices and independent trials – isn’t enough for few-shot cases. And neither are appeals to “collective action” across EUM-ish people with your values.

Rather, I think the strongest arguments for EUM come from a cluster of related theorems, which say something like: your choices conform to certain attractive ideals of rationality if and only if you act like an EUM-er. But the theorems themselves often go unexplained. Informal discussions typically list some set of axioms, and then state the theorem, but they leave the proof, and the basic dynamics underlying it, as a black box.

Early on in my exposure to such theorems, I found this frustrating. If I was going to make big decisions (especially one-shot decisions) using EUM as a guiding ideal, I wanted to understand its rationale more deeply. But the full proofs are often lengthy and difficult.

This series of essays is an attempt to write the type of thing I would’ve loved to read, at that point in my life.

Exactly how much support these theorems give to EUM as a normative ideal is a further question, which I don’t try to tackle comprehensively (though collectively, I find them pretty compelling). And there are lots of further issues in this vicinity I don’t discuss. Regardless of whether you embrace EUM, though, I think engaging with these theorems at a level deeper than “apparently the math says blah” can result in a more visceral and clear-headed relationship with the moves and vibes that structure EUM-ish thinking. I’ve found such engagement useful, and I hope some readers will too.

Thanks to Katja Grace, Cate Hall, Petra Kosonen, Ketan Ramakrishnan, and especially to John Wentworth for discussion.

I. Maximal skyscraper

What does an expected utility maximizer (EUM-er) do? Three things:

  1. Assigns probabilities to the possible outcomes of their actions (call this “probabilism”).
  2. Assigns real-numbered “utilities” to these outcomes, representing something like their value/preferred-ness (call this “utility-ism”).
  3. Chooses the action that yields the highest expected utility – i.e., the sum, across the action’s possible outcomes, of each outcome’s utility multiplied by its probability.

Thus, suppose you can save (A) one life for certain, or (B) a thousand lives with a 1% chance. And suppose you value each life equally, such that saving a thousand lives is a thousand times better than saving one. EUM then says to choose B, because it saves ten lives in expectation, whereas A saves only one.
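
(If it helps, here’s a minimal sketch of that calculation in code – my own toy illustration, with “utility” just set to the number of lives saved:)

```python
# Minimal sketch of the A vs. B choice: an action is a list of
# (probability, utility) outcome pairs, with utility = lives saved.
def expected_utility(action):
    return sum(p * u for p, u in action)

A = [(1.0, 1)]                  # save one life for certain
B = [(0.01, 1000), (0.99, 0)]   # 1% chance of saving a thousand lives

print(expected_utility(A))  # 1.0
print(expected_utility(B))  # 10.0, so EUM says: choose B
```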

We can think of EUM via what I call a “skyscraper” model. Probabilities, recall, behave like the area of a space, like a 1×1 square (see here and here for more). Thus, in the choice above:

Utilities then behave like an extra dimension.

So each action gets a “city-scape” – that is, a probability square as the ground, and “skyscrapers” sprouting up from the different regions, with bases corresponding to the probability of the outcome in that region, and heights corresponding to that outcome’s utility. And EUM says to choose the action with the largest total volume of skyscraper. It’s a pro-housing vibe.

Thus, A is a very short skyscraper taking up the whole city. B is mostly an empty lot, but it’s got a very tall and thin skyscraper sitting in the corner. And this skyscraper is sufficiently tall that B actually has more total housing overall (the drawing above does not capture this well). So EUM chooses B.

(At times in this series, I’ll also use 2d visualizations, with probability on the horizontal axis, and utility on the vertical axis, where the expected utility is then the area of the 2d “housing.” This is often simpler and better – but I like the way the 3d picture can combine with visualizing probability as 2d, which I find generally useful in other contexts.)

II. Only one shot

OK, but: what’s supposed to be cool about this? In particular: it seems like B, here, probably does nothing. If you choose B over A, you predictably lose. At least a thousand people die, and you could’ve saved one of them for certain (throughout this series, I’ll use the term “losing” broadly, to mean “ending up in a dis-preferred state”). And if EUM results in predictably losing, why do it?

One argument is: EUM does well in the long run. In particular: it maximizes actual utility, if you make the same choice enough times. For example, if you choose B over A a zillion times (and the 1%, in B, is e.g. a new dice-roll each time), then you’re ~guaranteed to save ~10x the lives, relative to choosing A over B instead.

This is the “quick argument.” But it’s also the “not good enough” argument. For one thing: sometimes failures are correlated. Suppose the 1% comes from saving 1000 people if the n-th through n+6th digits of pi are all odd (really, this would be ~0.8%), and in fact they’re not, but you never get to find out whether you’re saving people or not. Then you can choose B over and over, and never save anyone.
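
(To make both points concrete, here’s a rough toy simulation of my own: with a fresh, independent 1% draw each time, repeatedly choosing B saves roughly ten times as many lives as repeatedly choosing A; with a single correlated draw that settles every choice at once, choosing B usually saves no one at all.)

```python
import random

random.seed(0)
N = 100_000  # number of repeated A-vs-B choices

# Independent trials: the 1% is a fresh draw each time.
b_independent = sum(1000 for _ in range(N) if random.random() < 0.01)

# Perfectly correlated trials: one draw (e.g., a fixed fact about
# the digits of pi) settles every choice at once.
b_correlated = N * 1000 if random.random() < 0.01 else 0

a_total = N * 1  # choosing A every time saves one life per choice

print(a_total)        # 100000
print(b_independent)  # roughly 1,000,000 (~10x A)
print(b_correlated)   # usually 0
```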

But also: suppose that you’re only choosing between A and B once. Suppose, indeed, that choosing between A and B is literally the only thing you will ever do. What then? EUM fans still say: B. But why? The quick argument is silent.

What’s more, this silence matters in real life. Consider careers. I’m in favor of applying EUM-ish reasoning to career choice. For example, I think it can be worth spending your whole career trying to prevent some form of existential catastrophe, even if the overall risk of that type of catastrophe is low, and all your work will probably end up irrelevant. The stakes are that large.

But careers are especially bad candidates for “if you repeat it enough times, you’ll eventually come out ahead.” You can only change careers so many times, and feedback about whether you’re in an “everything I’m doing is irrelevant” world doesn’t always come readily. And beyond this, you’ve only got one life – one single chance to do something in this world. You’ve got to make it count. But if you choose via EUM, then sometimes, it probably won’t.

Perhaps you say: “yes, but if everyone with your values does EUM, then you get to repeat the choice across people, rather than across time.” But this isn’t enough, either.

Pretty clearly, some other argument is needed.

III. Against “apparently the scary math says so?”

The type of argument I like best appeals to a cluster of related theorems pointing at EUM’s equivalence to satisfying various attractive constraints on rationality (often formulated as “axioms”). But I don’t like the way this argument is often left as a black box of scary math.

In particular, when I was first learning about EUM, it felt like people would often mention the theorems, without explaining how or why they work. Sometimes, they would get as far as listing and debating some set of axioms: but then they’d just move from discussing the axioms to stating the theorem, while skipping the proof. Indeed, it remains unclear to me how many fans of EUM have ever actually engaged with the proofs in question.

Perhaps you say: “But, if the proofs are valid, do you need to understand them?” And indeed, not necessarily. But I wanted to. In particular: it felt like EUM was asking a lot of me. It was asking me, for example, to predictably lose – to predictably let people die, to predictably waste my time, for the sake of … something. “Rationality?” Rationality is not an end in itself. What, then? If I was going to make few-shot, high-stakes, predictably-losing life choices on EUM-ish grounds, I wanted to get it “in my gut.” And I hoped that being able to see the flow of logic, from premises to conclusion, would help.

What’s more, absent deeper understanding, I felt some worry about internal coercion. If I left the theorems as black boxes, it felt easy to round them off to: “apparently the scary math I’ll never understand says that I have to do EUM, otherwise I’m silly and bad.” And this didn’t feel like a healthy or sustainable set-up – especially if EUM was going to ask me to do lots of emotionally unrewarding things. Indeed, I wonder how many people currently relate to EUM this way – and about the costs of doing so.

Plus, there was this stuff about fanaticism (i.e., cases where EUM leads to obsession with tiny probabilities of very high stakes outcomes). It felt like people liked EUM until they didn’t. Some probabilities were “pascalian,” and some weren’t. 1%, I guess, wasn’t (I do think this is right). Why not? How do you tell the difference? Hand-wave, hand-wave, we don’t know. (I, at least, still don’t know. My current best guess is something about bounded utility functions, which you maybe want anyway, but I haven’t worked it out, and I don’t expect to like it.) Bit of a red flag? Maybe not a time to just sit pretty with the definitely-fine math (see also: infinite ethics). And maybe understanding it better could shed light (indeed: the proof of the VNM theorem I’ll present explicitly assumes bounded utility functions, which don’t lead to fanaticism – but it’s easy to not know that “blah proof assumes bounded utility functions,” if you’ve never actually looked at the proof in question).
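
(For a rough feel of why boundedness matters here, consider a toy comparison of my own – not anything from the proofs to come: with an unbounded utility that’s just linear in lives saved, a big enough payoff beats saving one life for certain no matter how tiny its probability; with a utility bounded above, sufficiently tiny probabilities can never win.)

```python
import math

def linear_u(lives):   # unbounded toy utility: just lives saved
    return lives

def bounded_u(lives):  # toy utility bounded above by 1 (arbitrary choice of scale)
    return 1 - math.exp(-lives / 1000)

p = 1e-7  # probability of the huge payoff
for payoff in (10**3, 10**6, 10**9, 10**12):
    print(payoff,
          p * linear_u(payoff) > linear_u(1),     # gamble beats the sure life?
          p * bounded_u(payoff) > bounded_u(1))

# With linear utility, a large enough payoff eventually wins (True).
# With the bounded utility, p * bounded_u(payoff) can never exceed p,
# so for small enough p the sure thing always wins (always False).
```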

(Also: some people think these theorems are relevant, or aren’t, to whether we should expect advanced AI systems to kill us all by default – see e.g. Omohundro (2008), Yudkowsky (1, 2), Shah (2018), Ngo (2019), Grace (2021). I’m not going to delve into this topic, but I think it’s another reason to actually understand how the theorems work — and it’s part of what prompts my own interest.)

I’m hoping, here, to write the type of thing I would’ve loved to read, when I was first looking into EUM. I want to acknowledge, though, that the relevant audience might be correspondingly limited. In particular: there’s going to be a bit of (basic) math, but I’m also going to work through it slowly, informally, and with various simplifying assumptions. So I worry I’ll lose the math-phobic as soon as we hit the symbols, and bore/frustrate the math-fluent. So it goes. Hopefully, there are a few readers in my specific demographic, for whom it scratches the right itch.

I also want to acknowledge that I’m not speaking from a place of “I’ve gone through and really understood the original versions of all these proofs, including for infinite cases.” Often, indeed, the original version has been too long/intense for me. Rather, I’ve tried to find, understand, and present some relatively short and comprehensible explanation of the basic thing going on in at least some finite version of the proof. For me, at this point, it’s enough. Those who seek more depth, though, can consult the original sources, which I’ll link to (and for those who want a more comprehensive but still accessible introduction to EUM, I recommend Peterson (2017) – though he doesn’t include various of the theorems I’ll discuss).

IV. Is being an EUM-er trivial?

I want to head off one other objection before we dive in.

No one (sensible) thinks you should go around doing comprehensive, explicit EU calculations for every decision. Nor, indeed, do philosophical fans of EUM necessarily claim that ideally rational agents do this. Rather, they generally claim that ideally rational agents act like EUM-ers. That is, if you’re an ideally rational agent with respect to some set of alternatives A, it’s possible to construct a probability assignment p and a utility function u, such that you make the same choices about A that an EUM-er with u and p would make.

But now perhaps we wonder: is constructing such a p and u trivial?

The worry, then, is that representation is too cheap. I can represent anything as an EUM-er, or as violating EUM, if I try.

I do think there are tricky issues in this vicinity. But for a given physical system S, and a given set of alternatives A, we should take care to distinguish between:

  1. Does it make sense to think of S as having preferences over A, and if so, how do we tell what they are?
  2. For any given set of preferences over A, is it trivial to represent this set of preferences as maximizing expected utility?

(1) is hard, and I won’t tackle it here. But (2) is easy: the answer is no. For example, if your preferences over A are intransitive, they can’t be represented as maximizing expected utility: maybe you prefer puppies to flowers to mud to puppies, but there is no function u to the real numbers such that u(puppies) > u(flowers) > u(mud) > u(puppies). Indeed, this is the type of thing we learn from the theorems I’ll discuss.
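
(Here’s a small sketch of just that ordinal piece, in my own framing and for finite sets of alternatives: a real-valued u that respects strict preferences has to strictly decrease along every “is preferred to” arrow, so any cycle among those arrows rules such a u out.)

```python
# Rough sketch: strict preferences as directed edges (better, worse). A utility
# function u with u(better) > u(worse) for every edge exists iff the edges
# contain no cycle.
def representable_by_utility(prefs):
    graph = {}
    for better, worse in prefs:
        graph.setdefault(better, set()).add(worse)
        graph.setdefault(worse, set())

    visited, on_path = set(), set()

    def has_cycle(node):
        if node in on_path:
            return True
        if node in visited:
            return False
        visited.add(node)
        on_path.add(node)
        found = any(has_cycle(nxt) for nxt in graph[node])
        on_path.discard(node)
        return found

    return not any(has_cycle(node) for node in graph)

print(representable_by_utility(
    [("puppies", "flowers"), ("flowers", "mud")]))                     # True
print(representable_by_utility(
    [("puppies", "flowers"), ("flowers", "mud"), ("mud", "puppies")])) # False
```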

The examples above get their force from not knowing how to answer (1). Presented with a given episode of real-world behavior in a physical system S, they observe that this episode is compatible, in principle, with some set of preferences over some set of alternatives, and some utility function representing those preferences, such that the episode is EU-maximizing (or not). And because they don’t know how to answer (1), they assume these preferences are viable candidates for S’s preferences.

Plausibly, though, (1) has an answer. A rock, for example, doesn’t have preferences. A twitching madman doesn’t, actually, maximally prefer a world where he twitches in exactly X way (intuition pump: if God took him to a realm “beyond the world,” and offered him a chance to create an “I twitch in exactly X way” world by pressing button A, or an “I am sane and healthy” world by pressing button B, he would not, consistently, press button A. Rather, he would twitch until he hit a random button). And presumably, the answer to (1) would allow us to evaluate whether your feelings about fruit are sensitive to their temporal location in some consistent way; or whether you do, in fact, prefer one life saved to a thousand.

But the theorems I’ll discuss aren’t about (1). They assume that you are a preference-haver, and that you are trying to define preferences over some set of alternatives. And they tell you that you’re representable as an EUM-er if and only if you do this in XYZ attractive ways. And doing it in XYZ ways is not trivial at all. You can’t just randomize, or flail around. Rocks will have trouble. Madmen, too.

What’s more, in the real world, excuses that appeal to (1)-ish ambiguities seem like cold comfort – at least when I try to apply them to myself. Suppose I choose A over B above. I can say, if I want, that actually, I am an EUM-er; I just value one life saved more than a thousand. But do I? Or suppose I get money-pumped. I can say, if I want, that I actually just like it when fruit changes hands. But do I?
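
(To make “money-pumped” concrete, here’s a toy version of the standard setup, with made-up numbers: an agent with cyclic preferences over fruit pays a little for each “upgrade” around the cycle, and so can be walked around it indefinitely, ending up where it started but poorer.)

```python
# Toy money pump: the agent strictly prefers apple > banana > cherry > apple,
# and will pay 1 cent to trade up to anything it prefers to what it holds.
cyclic_prefs = {("apple", "banana"), ("banana", "cherry"), ("cherry", "apple")}

def will_pay_to_swap(have, offered):
    return (offered, have) in cyclic_prefs  # trades iff it prefers the offer

holding, cents_paid = "apple", 0
for offered in ["cherry", "banana", "apple"] * 3:  # walk around the cycle 3 times
    if will_pay_to_swap(holding, offered):
        holding, cents_paid = offered, cents_paid + 1

print(holding, cents_paid)  # "apple 9": back where it started, 9 cents poorer
```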

Maybe, if I say these things, I don’t have to label myself “irrational,” or my choice a “mistake.” Maybe I’ve blocked some sort of abstract insult. But if my values aren’t actually like this, am I coming out ahead? This was supposed to be about lives, or fruit. Who cares about abstract insults? That’s not the bullying to worry about. And anyway, if it’s not, actually, me who is EUM-ish, but rather some forced re-interpretation, am I, perhaps, still insulted?

Granted, figuring out your “true values” is tough and ambiguous stuff (see here for trickiness); and figuring out the “true values” of some other physical system, even more so. But even absent some nice theory, we shouldn’t mistake this trickiness for “anything goes,” or confuse it with the (false) claim that all preference relations over alternatives are representable as EU-maximizing.

In the next post, I start diving into substantive arguments in favor of EUM, starting with choices like A vs. B above.


MaxRa @ 2022-03-17T15:18 (+2)

Thanks, I haven't come to grips with this idea yet; looking forward to the next posts! :)

The way I came to think about this is to visualize sets of future worlds that I'm choosing between. For example, imagine I'm choosing between two options that I think result in

A) a 10% chance of saving the lives of 100 people, and

B) a 99% chance of saving the life of 1 random person.

Then I would imagine choosing between 

A) 100[1] worlds, 10 of which have 100 fewer deaths, and 

B) 100 worlds, 99 of which have 1 fewer death. 

Then I'd choose A) because EUM and think something something multiverse, and imagine that there actually exists this big set of post-decision worlds, and that I expect 10% of them to actually be worlds where 100 lives were saved. I should feel sad in whichever of these worlds I end up in, because why would it matter which of those I personally end up in? And I should feel a little extra sad if I end up in a world that didn't save 100 lives, because it's a small update that the fraction of good worlds was smaller than 10%.

  1. ^

    choosing n=100 is theoretically arbitrary, but in practice probably the easiest to think about while still capturing a lot of different worlds. Maybe this relates to the point of Pascal's mugging: when it's not even one world in 100, or 1000, most people should be wary of acting on such speculations, because they won't be able to evaluate the world's likelihood or what such a rare world will concretely look like.