Whose expected value? Making sense of EA epistemology with Ideal Reflection
By JesseClifton @ 2025-12-21T16:58 (+44)
Cross-posted from my Substack.
Summary. I state the “Ideal Reflection” principle, which says that our beliefs should match our expectations of what an ideal version of ourselves would believe. I argue that Ideal Reflection can help rationalize how Effective Altruists use the concept of expected value, and that whether we accept Ideal Reflection has important real-world implications.
Here’s an aspect of Effective Altruist discourse that has confused me for the longest time: EAs will often talk about “the” expected value of an intervention, or otherwise talk in a way that suggests that expected value is an objective feature of the world. But this clashes with the usual Bayesian story, where expected values are a matter of subjective belief.
For example, the 80,000 Hours page on expected value says (my emphasis):
Making explicit estimates of expected value is sometimes useful as a method — such as when you’re comparing two global health interventions — but it’s often better to look for useful rules of thumb and robust arguments, or even use gut intuitions and snap decisions to save time… little of our work at 80,000 Hours involves explicit or quantitative estimates of the impact of different careers. Rather, we focus on finding good proxies for expected value, such as the ‘importance, neglectedness, and tractability’ framework for comparing problems, or the concepts of leverage and career capital.
Not only does this passage talk about expected value as if it is objective, it talks about expected value estimates sometimes being “useful”. “Useful” by what standard? Does “useful” mean… “leads to greater expected value”? If so, how do we know, since it seems like we’re saying we don’t know “the” expected value? If not, what is the standard of usefulness? What is going on?
One way of making sense of this is to bring in an ideal version of ourselves. “The” expected value is the expected value that would be assigned by an agent that has the same evidence as us, but is a vastly more powerful reasoner. Something like a perfect Bayesian who can reason over a ~maximally granular and exhaustive set of hypotheses, and has seen everything we’ve seen. So, “the” expected values are still subjective (they’re the ideal agent’s subjective expectations), in line with standard Bayesianism. At the same time, they’re a kind of objective fact, since we are imagining that there is some objective fact of the matter about what an ideal version of ourselves would believe.
That doesn’t get us far. How can the ideal agent’s expectations, unknown to us, be action-guiding? You could say we should try to “approximate” the ideal. But what notion of “closeness” to the ideal are we talking about? Why is that notion of closeness normatively relevant? And in what sense are we justified in believing that any particular strategy actually gets us “closer” to the ideal agent? (See also Violet Hour on “approximating” the ideal.)
Here’s what I think one should say. We can at least form beliefs about what the ideal agent would think. We can say, for example, “It really seems to me like this intervention is great, but when I think about all the different considerations the ideal agent will have thought of, I’m pretty unsure what they would think.” Then we can require that our expectations should match our best guesses about the ideal agent’s expectations.
We don’t want to assume that people can always come up with numerical expected values, though. So we will just assume that people can make comparative expected value judgements, like “I expect this intervention to be better than that one”. This is in line with how EAs talk about expected value, acknowledging we often can’t come up with all-things-considered numerical values but still making comparative judgements like “The expected sign of so-and-so is positive” (example).[1]
Our principle is then:
Ideal Reflection. Our expected values should match our expectations of the ideal agent’s expected values. More precisely: Let X and Y be unknown quantities (like total welfare under two different interventions), and let E*[X] and E*[Y] be the ideal agent’s (unknown) expectations for these quantities. Then, our expectation for X is greater than our expectation for Y if and only if our expectation for E*[X] is greater than our expectation for E*[Y].[2]
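Stated symbolically (a minimal rendering in standard expectation notation, where E is our expectation operator and E* the ideal agent’s, as in the prose above):

```latex
% Ideal Reflection, rendered symbolically.
% \mathbb{E}[\cdot]     : our expectation operator
% \mathbb{E}^{*}[\cdot] : the ideal agent's expectation operator (unknown to us)
\[
  \mathbb{E}[X] > \mathbb{E}[Y]
  \quad\Longleftrightarrow\quad
  \mathbb{E}\!\left[\mathbb{E}^{*}[X]\right] \;>\; \mathbb{E}\!\left[\mathbb{E}^{*}[Y]\right]
\]
```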
In epistemology, “reflection” principles say that we should defer, in some sense, to other versions of ourselves. The original reflection principle says that, if we know our future self would have some belief, we should adopt it too. The “Awareness Reflection” principle says that we should defer to a version of ourselves that has conceived (is “aware”) of more possibilities than we have (Steele and Stefánsson 2021, chap. 7). Ideal Reflection takes this to its natural conclusion: While we’re deferring to an epistemically superior version of ourselves, how about deferring to the most epistemically superior version?
I’m not the first to consider deferring to some kind of epistemic ideal, of course. After a quick look through the philosophy literature I couldn’t find the exact same formulation, but Christensen’s (2010) Rational Reflection and Spohn’s (2023) Full Reflection are closely related. And in their excellent “Effective Altruism’s Implicit Epistemology”, Violet Hour discusses a very similar potential rationalization of EA practice.
Returning to the exegesis of the 80K quote above: We have to somehow form beliefs about what the ideal agent would believe. Sometimes forming these beliefs can involve doing explicit expected value calculations in simple models (“expected value estimates”), and sometimes it can involve thinking about the kinds of “proxies” the 80K quote mentions. When 80K says that explicit expected value estimates can be “useful”, they are saying that we can sometimes treat them as evidence that informs our all-things-considered beliefs about the ideal agent’s expectations.
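As a toy illustration of that last point (my own sketch, not anything from the post or from 80K): we can model our uncertainty about the ideal agent’s expectation E*[X] as a distribution, and treat an explicit expected value estimate as a noisy signal about it. The normal-normal model and all of the numbers below are illustrative assumptions.

```python
# Toy sketch (illustrative assumptions throughout): treat an explicit EV
# estimate as a noisy signal about the ideal agent's expectation E*[X],
# and update a prior belief about E*[X] accordingly.

def update_belief_about_ideal_expectation(prior_mean, prior_var, estimate, estimate_var):
    """Conjugate normal-normal update of our belief about E*[X]."""
    posterior_var = 1.0 / (1.0 / prior_var + 1.0 / estimate_var)
    posterior_mean = posterior_var * (prior_mean / prior_var + estimate / estimate_var)
    return posterior_mean, posterior_var

# Prior belief about the ideal agent's expectation for intervention X
# (in arbitrary "units of good"), before any explicit modelling.
prior_mean, prior_var = 0.0, 100.0

# An explicit expected value calculation in a simple model says 50 units,
# but we regard such back-of-the-envelope models as noisy proxies for
# what the ideal agent would actually conclude.
estimate, estimate_var = 50.0, 400.0

post_mean, post_var = update_belief_about_ideal_expectation(
    prior_mean, prior_var, estimate, estimate_var
)
print(f"Belief about E*[X] after the estimate: mean={post_mean:.1f}, variance={post_var:.1f}")
# Prints mean=10.0, variance=80.0: the estimate moves us toward 50, but
# substantial uncertainty about the ideal agent's expectation remains.
```

On this picture, the estimate is “useful” exactly insofar as it shifts our best guess about E*[X], not because it settles “the” expected value.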
It’s plausible to me that we should satisfy Ideal Reflection. Even if people aren’t thinking in terms of Ideal Reflection, I think that they probably should be! But so what? We’ve got a clearer way of understanding EA talk about “the” expected values of different interventions. Who cares? What does it matter to what we do in practice?
I think it might matter a lot. Here’s a quick survey of questions Ideal Reflection raises, some of which I’ll likely flesh out in later posts:
- It’s not so easy to make sense of what “the ideal” agent could even mean! Among other things, we would need to specify “the ideal” through some sort of reflection process (e.g., for reflecting on normative principles for choosing priors, updating on self-locating evidence, etc.). And, as Joe Carlsmith argues, it is likely that there is no unique privileged reflection process.
- If the most coherent version of EA epistemology and decision theory is centrally about forming beliefs about an ideal agent, but this concept doesn’t make any sense, then something has to give.
- And even if indeterminacy in “the ideal” is acceptable, it may have important implications, such as making our own expectations more indeterminate and pushing us towards cluelessness.
- If we reject Ideal Reflection, what does that imply? What do Effective Altruists mean by expected value-talk? Is it coherent? And aren’t we still going to be able to imagine some hypothetical agents to whom we want to defer, even if not the “ideal”? Which agents are they, exactly, and what follows for what we should believe?
- Ideal Reflection gives us a clear practical recommendation: When forming beliefs, don’t just ask yourself “What do I think about this?”, ask yourself, “What do I think the ideal agent, who reasons over vastly more hypotheses than we do, may think in entirely different ontologies, etc., would believe about this?” If people did this consistently, I think they might end up way more uncertain about the net effects of interventions.
- In fact, I think this is a good argument that we should have highly indeterminate beliefs and therefore be clueless, for basically the reasons given here.
- If Ideal Reflection is right, it should probably inform how we design AI systems to reason about philosophically thorny topics, which could be relevant to making sure they don’t go off the rails in various ways. For example, Ideal Reflection is closely related to Anthony’s discussion of “(really) open-minded updatelessness” here.
References
Christensen, David. 2010. “Rational Reflection.” Philosophical Perspectives 24 (1): 121–140.
Spohn, Wolfgang. 2023. “A Generalization of the Reflection Principle.” Dialectica 77 (1): 73–96.
Steele, Katie, and H. Orri Stefánsson. 2021. Beyond Uncertainty: Reasoning with Unknown Possibilities. Cambridge: Cambridge University Press.
- ^
Still, it’s a bit mysterious what these comparative judgements are. What makes them comparative expectations (i.e., an appropriate adaptation of the usual notion of “expected value”), rather than comparative medians or modes or whatever? For starters, it seems like comparative expectations should result from some kind of “weighing up” of the plausibility times magnitude of different outcomes, even if crude. But I don’t have a full account to give.
- ^
This looks kind of like the Law of Total Expectation, according to which a perfectly Bayesian agent’s expectations will equal the expected value of their expectations given some future evidence, E[X] = E[E[X | Future Evidence]]. But it isn’t the same thing! Here, we are uncertain about what we would believe if we were to become arbitrarily good reasoners, holding our evidence fixed.
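For contrast, here is the Law of Total Expectation written out (a minimal sketch; the labelling is mine):

```latex
% Law of Total Expectation: a Bayesian's current expectation already equals
% the expectation of their own future, evidence-conditioned expectation.
\[
  \mathbb{E}[X] \;=\; \mathbb{E}\!\left[\,\mathbb{E}[X \mid \text{Future Evidence}]\,\right]
\]
% Ideal Reflection, by contrast, constrains our expectation comparisons by our
% expectations of E^{*}[X] and E^{*}[Y], the ideal agent's expectations formed
% from our *current* evidence; it is a normative requirement, not a theorem
% of probability.
```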
Ben_West🔸 @ 2025-12-23T08:54 (+3)
I thought this was an interesting point, thanks for writing it.
Charlie_Guthmann @ 2025-12-23T04:32 (+2)
Thanks for the post — there is definitely a certain fuzziness at times about value claims in the movement, and I have been critical of similar things at times. Also, ChatGPT edited this, but (nearly) all thoughts are my own; hope that’s ok!
I see a few threads here that are easy to blur:
1) Metaethics (realism vs anti-realism) is mostly orthogonal to Ideal Reflection.
You can be a realist or anti-realist and still endorse (or reject) a norm like “defer to what an idealized version of you would believe, holding evidence fixed.” Ideal Reflection doesn’t have to claim there’s a stance-independent EV “out there”; it can be a procedural claim about which internal standpoint is authoritative (idealized deliberation vs. current snap judgment), and about how to talk when you’re trying to approximate that standpoint. I’m not saying you claimed the opposite exactly, but the language was a bit confusing to me at times.
2) Ideal Reflection is a metanormative framework; EA is basically a practical operationalization of it.
Ideal Reflection by itself is extremely broad. But on its own it doesn’t tell you what you value, and it doesn’t even guarantee that you can map possible world-histories to an ordinal ranking. It might seem less hand-wavy, but its lack of assumptions makes it hard to see what nontrivial claims can follow from it. Once you add enough axioms/structure to make action-guiding comparisons possible (some consequentialist-ish evaluative ranking, plus willingness to act under uncertainty), then you can start building “upward” from reflection to action.
It also seems to me (and is part of what makes EA distinctive) that the EA ecosystem was built by unusually self-reflective people — sometimes to a fault — who tried hard to notice when they were rationalizing, to systematize their uncertainty, and to actually let arguments change their minds.
On that picture, EA is a specific operationalization/instance of Ideal Reflection for agents who (a) accept some ranking over world-states/world-histories, and (b) want scalable, uncertainty-aware guidance about what to do next.
3) But this mainly helps with the “upward” direction; it doesn’t make the “downward” direction easier.
I think of philosophy as stacked layers: at the bottom are the rules of the game; at the top is “what should I do next.” EA (and the surrounding thought infrastructure) clarifies many paths upward once you’ve committed to enough structure to compare outcomes. But it’s not obvious why building effective machinery for action gives us privileged access to bedrock foundations. People have been trying to “go down” for a long time. So in practice a lot of EAs seem to do something like: “axiomatize enough to move, then keep climbing,” with occasional hops between layers when the cracks become salient.
4) At the community level, there’s a coordination story that explains the quasi-objective EV rhetoric and the sensitivity to hidden axioms.
Even among “utilitarians,” the shape of the value function can differ a lot — and the best next action can be extremely sensitive to those details (population ethics, welfare weights across species, s-risk vs x-risk prioritization, etc.). Full transparency about deep disagreements can threaten cohesion, so the community ends up facilitating a kind of moral trade: we coordinate around shared methods and mid-level abstractions, and we get the benefits of specialization and shared infrastructure, even without deep convergence.
It’s true: institutionally, founder effects + decentralization + concentrated resources (in a world with billionaires) create path dependence: once people find a lane and drive — building an org, a research agenda, a funding pipeline — they implicitly assume a set of rules and commit resources accordingly. As the work becomes more specific, certain foundational assumptions become increasingly salient, and it’s easy for implicit axioms to harden and complexify over time. To some extent you can say that is what happened, although on the object level it feels to me like we have picked pretty good stuff to work on. And charitably, when 80k writes fuzzy definitions of the good, it isn’t necessarily that the employees and the org don’t have more specific values; it’s that they think it’s better to leave things at that level of abstraction to build the best coalition right now, and that they are also trying to help you build up from what you have toward making a decision.