4. Why existing approaches to cause prioritization are not robust to unawareness

By Anthony DiGiovanni @ 2025-06-02T08:55 (+35)

We’re finally ready to see why unawareness so deeply undermines action guidance from impartial altruism. Let’s recollect the story thus far:

  1. First: Under unawareness, “just take the expected value” is unmotivated.
  2. Second: Likewise, “do what seems intuitively good and high-leverage, then you’ll at least do better than chance” doesn’t work. There’s too much room for imprecision, given our extremely weak evidence about the mechanisms that dominate our impact, and our track record of finding sign-flipping considerations.
  3. Third: Hence, we turned to UEV, an imprecise model of strategies’ impact under unawareness. But there are two major reasons comparisons between strategies’ UEV will be indeterminate: the severe lack of constraints on how we should model the space of possibilities we’re unaware of, and imprecision due to coarseness.

The EA community has proposed several approaches to “robust” cause prioritization, which we might think avoid these problems. These approaches don’t explicitly use the concept of UEV, and not all the thinkers who proposed them necessarily intended them to be responses to unawareness. But we can reframe them as such. I’ll outline these approaches below, and show why each of them is insufficient to justify comparisons of strategies’ UEV. Where applicable, I’ll also explain why wager arguments don’t support following these approaches. Afterwards, I’ll share some parting thoughts on where to go from here.

Why each of the standard approaches is inadequate

(See Appendix D for more formal statements of these approaches. I defer my response to cluster thinking to Appendix E, since that’s more of a meta-level perspective on prioritization than a direct response to unawareness. In this list, the “contribution” of a hypothesis h to the UEV for some strategy equals {value under h} × {probability of h given that strategy}. Recall that our awareness set is the set of hypotheses we’re aware of.)
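To fix intuitions about this bookkeeping, here’s a minimal sketch with made-up hypotheses, values, and probabilities (none of the numbers, nor the function name, come from the post; collapsing the imprecision into a single interval is also a simplification of the set-of-distributions model):

```python
# Toy illustration (hypothetical numbers): UEV = sum of contributions from hypotheses
# we're aware of, plus a catch-all contribution we can only bound within an interval.

def uev_interval(aware_contributions, catch_all_bounds):
    """aware_contributions: list of (value, probability) pairs, one per hypothesis
    in the awareness set. catch_all_bounds: (low, high) bounds on the catch-all's
    contribution. Returns (lower, upper) bounds on the UEV."""
    base = sum(value * prob for value, prob in aware_contributions)
    low, high = catch_all_bounds
    return (base + low, base + high)

# One strategy looks better than another on the awareness set, but the catch-all is
# unconstrained enough that the resulting UEV intervals overlap heavily.
print(uev_interval([(100, 0.3), (-20, 0.5)], (-500, 500)))  # (-480.0, 520.0)
print(uev_interval([(10, 0.4), (-5, 0.2)], (-500, 500)))    # (-497.0, 503.0)
```

The approaches below can be read as attempts to constrain the catch-all term, or to argue that comparisons go through despite it.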

Table 3. Outline of the standard approaches to strategy comparisons under unawareness.

  1. Symmetry: The contribution of the catch-all to the UEV is equal for all strategies.
    • Reasoning: Since we have no idea about the implications of possibilities we haven’t considered, we have no reason to think they’d push in one direction vs. the other, in expectation. Then by symmetry, they cancel out.
    • Sources: St. Jules and Soares.[1] Cf. Greaves’s (2016) discussion of “simple cluelessness”, and the principle of indifference.
  2. Extrapolation: There’s a positive correlation between (i) the difference between strategies’ UEV and (ii) the difference between the contributions of the awareness set to their UEV.
    • Reasoning: Even if our sample of hypotheses we’re aware of is biased, the biases we know of don’t suggest the difference in UEV should have the opposite sign to the difference in EV on the awareness set. And we have no reason to think the unknown biases push in one direction vs. the other, in expectation.
    • Sources: Cotton-Barratt, and Tomasik on “Modeling unknown unknowns”.
  3. Meta-extrapolation: Take some reference class of past decision problems where a given strategy was used. The strategy’s UEV is proportional to its empirical average EV over the problems in this reference class.
    • Reasoning: If there were cases in the past where (i) certain actors were unaware of key considerations that we’re aware of, but (ii) they nonetheless took actions that look good by our current lights, then emulating these actors will be more robust to unawareness.
    • Sources: Tomasik on “An effective altruist in 1800”, and Beckstead on “Broad approaches and past challenges”.
  4. Simple Heuristics: The simpler the arguments that motivate a strategy, the more strongly the strategy’s UEV correlates with the EV we would have estimated had we not explicitly adjusted for unawareness.
    • Reasoning: More conjunctive arguments have more points of possible failure and hence are less likely to be true, especially under deep uncertainty about the future. Then strategies supported by simple arguments or rules will tend to do better.
    • Sources: Christiano. This approach is sometimes instead motivated by Meta-extrapolation, or by avoiding overfitting (Thorstad and Mogensen 2020).
  5. Focus on Lock-in: The more a strategy aims to prevent a near-term lock-in event, the more strongly the strategy’s UEV correlates with the EV we would have estimated had we not explicitly adjusted for unawareness.
    • Reasoning: Our impacts on near-term lock-in events (e.g., AI x-risk and value lock-in) are easy to forecast without radical surprises. And it’s easy to assess the sign of large-scale persistent changes to the trajectory of the future.
    • Sources: Tarsney (2023), Greaves and MacAskill (2021, Sec. 4), and Karnofsky on “Steering”.
  6. Capacity-Building: The more a strategy aims to increase the influence of future agents with our values, the higher its UEV.
    • Reasoning: Our more empowered (i.e., smarter, wiser, richer) successors will have a larger awareness set. So we should defer to them, by favoring strategies that increase the share of future empowered agents aligned with our values. This includes gaining resources, trying to increase civilizational wisdom, and doing research.
    • Sources: Christiano, Greaves and MacAskill (2021, Sec. 4.4), Tomasik on “Punting to the future”, and Bostrom on “Some partial remedies”.

Symmetry

Key takeaway

We can’t assume the considerations we’re unaware of “cancel out”, because when we discover a new consideration, this assumption no longer holds.

According to this approach, we should think the catch-all affects all strategies’ UEV equally. After all, it appears we have no information either way about the strategies’ likelihood of leading to outcomes in the catch-all.

Response

Upon closer inspection, we do have information about the catch-all, as we saw when we tried to model it. When we update our model of the catch-all on the discovery of a new hypothesis, the symmetry breaks. Specifically:

A common intuition is, “If it really is so unclear how to weigh up these updates about the catch-all, don’t the optimistic and pessimistic updates cancel out in expectation?” But our actual epistemic state is that we have reasons pointing in both directions, which don’t seem precisely balanced. If we treated these reasons as balanced anyway, when we could instead suspend judgment, we’d introduce artificial precision for no particular reason.[2]

(How about an intuition like, “If I can’t tell whether the considerations I’m unaware of would favor A or B, these considerations are irrelevant to my decision-making”? I’ll respond to this below.)

Extrapolation

Key takeaway

We can’t trust that the hypotheses we’re aware of are a representative sample. Although we don’t know the net direction of our biases, this doesn’t justify the very strong assumption that we’re precisely unbiased in expectation.

Although the post-AGI future seems to involve many unfamiliar mechanisms, perhaps we can still get some (however modest) information about an intervention’s net impact based on our awareness set. And if these considerations are a representative sample of the whole set of considerations, it would make sense to extrapolate from the awareness set. (Analogously, when evaluating coarse hypotheses, we could extrapolate from a representative sample of fine-grained scenarios we bring to mind.)

Response

Our analysis of biased sampling showed that our awareness set is likely highly non-representative, so straightforward extrapolation is unjustified. What about the “Reasoning” for Extrapolation listed in Table 3? The claim seems to be that we should regard our net bias as zero ex ante, after adjusting for any (small) biases we know how to adjust for. But here we have the same problem as with Symmetry. If it’s unclear how to weigh up arguments for our biases being optimistic vs. pessimistic, then treating them as canceling out is ad hoc. The direction of our net bias is, rather, indeterminate.

Meta-extrapolation

Key takeaway

We can’t trust that a strategy’s past success under smaller-scale unawareness is representative of how well it would promote the impartial good. The mechanisms that made a strategy work historically could actively mislead us when predicting its success on a radically unfamiliar scale.

The problem with Extrapolation was that strategies that work on the awareness set might not generalize to the catch-all. We can’t directly test how well strategies’ far-future performance generalizes, of course. But what if we use strategies that have successfully generalized under unawareness with respect to more local goals? We could look at historical (or simulated?) cases where people were unaware of considerations highly relevant to their goals (that we’re aware of), and see which strategies did best. From Tomasik:

[I]magine an effective altruist in the year 1800 trying to optimize his positive impact. … What [he] might have guessed correctly would have been the importance of world peace, philosophical reflection, positive-sum social institutions, and wisdom. Promoting those in 1800 may have been close to the best thing this person could have done, and this suggests that these may remain among the best options for us today.

The underlying model here is: We have two domains, (i) a distribution of decision problems from the past and (ii) the problem of impartially improving overall welfare. If the mechanisms governing successful problem-solving under unawareness are relevantly similar between (i) and (ii), then strategies’ relative performance on (i) is evidence about their relative performance on (ii).

Response

I agree that we get some evidence from past performance, as per this model. But that evidence seems quite weak, given the potentially large dissimilarities between the mechanisms determining local vs. cosmic-scale performance. (See footnote for comments on a secondary problem.[3])

Concretely, take Tomasik’s example. We can mechanistically explain why a 19th century EA would’ve improved the next two centuries on Earth by promoting peace and reflection (caveats in footnote[4]). On such a relatively small scale, there’s not much room for the sign of one’s impact to depend on factors as exotic as, say, the trajectory of space colonization by superintelligences or the nature of acausal trade. Obviously, the next two centuries went in some surprising directions. But the dominant mechanisms affecting net human welfare were still socioeconomic forces interacting with human biology (recall my rebuttal of the “superforecasting track records” argument). So on this scale, the consistent benefits of promoting peace and reflection aren’t that surprising. By contrast, we’ve surveyed various reasons why the dominant mechanisms affecting net welfare for all sentient beings might differ from those affecting local welfare.

We might think the above is beside the point, and reason as follows:

Weak evidence is still more informative than no evidence. As long as past success generalizes to some degree to promoting the impartial good, we can wager on Meta-extrapolation. If we’re wrong, every strategy is (close to) equally effective ex ante anyway.

As we saw in the imprecision FAQ, however, this wager fails because of “insensitivity to mild sweetening”: When we compare strategies’ UEV, we need to weigh the evidence from past performance against other sources of evidence — namely, object-level considerations about strategies’ impact on certain variables (e.g., how they affect lock-in events). And, as shown in the third post, our estimates of such impacts should be highly imprecise. So the weak evidence from past performance, added on top of other evidence we have no idea how to weigh up, isn’t a tiebreaker.[5]
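As a stylized illustration of insensitivity to mild sweetening (the numbers and the finite stand-in for the representor are invented for this sketch): a strategy is only determinately better if it does at least as well under every admissible way of weighing the evidence, and a small extra consideration in its favor doesn’t change that when some admissible weighings still heavily favor the alternative.

```python
# Hypothetical sketch: each tuple gives the expected values of strategy X (backed by
# weak past-performance evidence) and strategy Y under one admissible way of weighing
# the evidence. X is determinately better only if it does at least as well under all.

representor = [
    (10.0, -30.0),  # weighing that trusts the past-performance evidence
    (-25.0, 15.0),  # weighing dominated by object-level lock-in considerations
    (2.0, 1.0),     # weighing on which the considerations roughly wash out
]

def determinately_better(pairs):
    return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)

print(determinately_better(representor))            # False: no determinate ranking
sweetened = [(x + 0.5, y) for x, y in representor]   # add a mild sweetening for X
print(determinately_better(sweetened))               # still False
```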

Simple Heuristics   

Key takeaway

The argument that heuristics are robust assumes we can neglect complex effects (i.e., effects beyond the “first order”), either in expectation or absolutely. But under unawareness, we have no reason to think these effects cancel out, and should expect them to matter a lot collectively.

We’ve just seen why heuristics aren’t justified by their past success. Nor are they justified by rejecting detailed world models,[6] nor by the reliability of our intuitions.

The most compelling motivation for this approach, instead, goes like this (Christiano;[7] see his post for examples): Informally, let’s say an effect of a strategy is some event that contributes to the total value of the world resulting from that strategy. E.g., speeding up technological growth might have effects like “humans in the next few years are happier due to greater wealth” and “farmed animals in the next few years suffer more due to more meat consumption”. The effects predicted by arguments with fewer logical steps are more likely, whereas complex effects are less likely or cancel out. So strategies supported by simple reasoning will have higher UEV.

Response

I’d agree that if we want to say how widely a given effect holds across worlds we’re unaware of, logical simplicity is an unusually robust measure. But this doesn’t mean the overall contribution of complex effects to the UEV is small in expectation. More precisely, here’s my understanding of how the argument above works:

The counterargument: In order to say that the UEV is dominated by 1-step effects, we need either (for each n ≥ 2):

  1. a reason to think the n-step effects average out to some small range of values, or
  2. a sufficiently low upper bound on our total credence in all n-step effects.

But all the reasons for severe imprecision due to unawareness that we’ve seen, and the counterargument to symmetric “canceling out”, undermine (1). And we’ve also seen reasons to consider our unawareness vast, suggesting that the n-step effects we’re unaware of are collectively significant. This undermines (2).
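To see the shape of the problem with made-up numbers (the counts, credences, and magnitudes below are assumptions for the sake of the sketch): unless we can bound either the number of n-step effects or our total credence in them, even individually negligible complex effects can collectively swamp the 1-step term.

```python
# Hypothetical numbers only. Suppose the 1-step effects contribute 10 units of value,
# and for each n >= 2 we can articulate roughly 50*n distinct n-step effects, each with
# credence 0.01 and magnitude about 5, whose signs we can't determine in advance.

one_step_contribution = 10.0

def complex_effects_envelope(max_n, count=lambda n: 50 * n, credence=0.01, magnitude=5.0):
    """Upper bound on how much the 2- through max_n-step effects could collectively
    add to or subtract from the total."""
    return sum(count(n) * credence * magnitude for n in range(2, max_n + 1))

print(complex_effects_envelope(5))  # 35.0 -- already several times the 1-step term
```

With numbers anything like these, neither condition (1) nor condition (2) holds, so the 1-step effects don’t dominate the UEV.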

Focus on Lock-in

Key takeaway

Even if we focus on near-term lock-in, we can’t control our impact on these lock-in events precisely enough, nor can we tease apart their relative value when we only picture them coarsely.

This approach is a more general version of the logic critiqued in our case study: The long-term trajectory (or existence) of Earth-originating civilization could soon be locked into an “attractor state”, such as extinction or permanent human disempowerment. Plausibly, then, we could have a large-scale, persistent, and easy-to-evaluate impact by targeting these attractor states. And these states seem foreseeable enough to reliably intervene on. So, trying to push toward better attractor states appears to score well on both outcome robustness and implementation robustness. (Which is why I think Focus on Lock-in is one of the strongest approaches to UEV comparisons, in theory.)

Response

Unfortunately, as illustrated by the case study, we seem to have severe unawareness at the level of both the relative value, and likelihood given our strategy, of different attractor states. To recap, if our candidate intervention is “try to stop AIs from permanently disempowering humans”:

Put differently: There’s a significant gap between “our ‘best guess’ is that reaching a given coarse-grained attractor would be positive” and “our concrete attempts to steer toward rather than away from this attractor will likely succeed and ‘fail gracefully’ (i.e., not land us in a worse-than-default attractor), and our guess at the sign of the intended attractor is robust”. When we account for this gap, we can’t reliably weigh up the effects that dominate the UEV.

This discussion also implies that, once again, we can’t “wager” on Focus on Lock-in. The problem is not that the future seems too chaotic for us to have any systematic impact. Instead, we’re too unaware to say whether, on net, we’re avoiding locking in something worse.

Capacity-Building

Key takeaway

This approach doesn’t help, for reasons analogous to those given for Focus on Lock-in.

Even if the direct interventions currently available to us aren’t robust, what about trying to put our successors in a better position to positively intervene? There seem to be two rough clusters of Capacity-Building (CB) strategies:

  1. High-footprint: Community/movement building; broad values advocacy; gaining significant influence on AGI development, deployment, or governance; aiming to improve civilizational “wisdom” (see, e.g., the conclusion of Carlsmith (2022)).
    • Argument: You should expect the value of the future to correlate with the share of future powerful agents who have your goals. This is based on both historical evidence and compelling intuitions, which don’t seem sensitive to the details of how the future plays out.
  2. Low-footprint: Saving money; selectively making connections with influential actors with very similar priorities; and doing (but only selectively sharing) research.
    • Argument: If your future self has more resources/information, they’ll presumably take actions with higher UEV than whatever you can currently do. Thus, if you can gain more resources/information, holding fixed how much you change other variables, your strategy will have higher UEV than otherwise.

Response

High-footprint. Once more, historical evidence doesn’t seem sufficient to resolve severe imprecision. Perhaps the mechanism “gaining power for our values increases the amount of optimization for our values” works at a very general level, so the disanalogies between the past and future aren’t that relevant? But the problems with Focus on Lock-in recur here, especially for implementation robustness: How much “power” “our values” stably retain into the far future is a very coarse variable (more details in footnote[8]). So, to increase this variable, we must intervene in the right directions on a complex network of mechanisms, some of which lie too far in the future to forecast reliably. Our overall impact on how empowered our values will be, therefore, ends up indeterminate when we account for systematic downsides (e.g., attention hazards, as in the case of AI x-risk movement building). Not to mention off-target effects on lock-in events. For example, even if pushing for the U.S. to win the AI race does empower liberal democratic values, this could be outweighed by increased misalignment risk.[9]

We might have the strong intuition that trying to increase some simple variable tends to increase that variable, not decrease it. Yet we’ve seen compelling reasons to distrust intuitions about our large-scale impact, even when these intuitions are justified in more familiar, local-scale problems.

Low-footprint. I’d agree that low-footprint CB could make our successors discover, or more effectively implement, interventions that are net-positive from their epistemic vantage point. That’s a not-vanishingly-unlikely upside of this strategy. But what matters is whether low-footprint CB is net-positive from our vantage point. How does this upside compare to the downsides from possibly hindering interventions that are positive ex post? (It’s not necessary for such interventions to be positive ex ante, in order for this to be a downside risk, since we’re evaluating CB from our perspective; see Appendix C.) Or other off-target large-scale effects we’re only coarsely aware of (including, again, our effects on lock-in)?

Real-world implementations of “saving money” and “doing research” may have downsides like:

Intuitively, these side effects may seem like hand-wringing nitpicks. But I think this intuition comes from mistakenly privileging intended consequences, and thinking of our future selves as perfectly coherent extensions of our current selves. We’re trying to weigh up speculative upsides and downsides to a degree of precision beyond our reach. In the face of this much epistemic fog, unintended effects could quite easily toggle the large-scale levers on the future in either direction.

Another option: Rejecting (U)EV?

Key takeaway

Suppose that when we choose between strategies, we only consider the effects we can weigh up under unawareness, because (we think) the other effects aren’t decision-relevant. Then, it seems arbitrary how we group together “effects we can weigh up”.

No matter how we slice it, we seem unable to show that the UEV of A dominates that of B, for any given A and B. Is there any other impartial justification we could offer for our choice of strategy?

The arguments in this sequence seem to apply just as well to any alternative to EV that still explicitly aggregates over possible worlds, such as EV with risk aversion or discounting tiny probabilities. This is because the core problem isn’t tiny probabilities of downsides, but the severe imprecision of our evaluations of outcomes.

Here’s another approach. Clifton notes that, even if we want to make choices based purely on the impartial good, perhaps we get action guidance from the following reasoning, which he calls Option 3 (paraphrasing):

I suspend judgment on whether A results in higher expected total welfare than B. But on one hand, A is better than B with respect to some subset of their overall effects (e.g., donating to AMF saves lives in the near term), which gives me a reason in favor of A. On the other hand, I have no clue how to compare A to B with respect to all the other effects (e.g., donating to AMF might increase or decrease x-risk; Mogensen 2020), which therefore don’t give me any reasons in favor of B. So, overall, I have a reason to choose A but no reason to choose B, so I should choose A.[10]

In our setting, “all the other effects” might be the effects we’re only very coarsely aware of (including those in the catch-all).
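To make the grouping worry from the key takeaway concrete, here’s a stylized sketch (the function and the numbers are invented for illustration): an Option-3 verdict depends entirely on which effects we put in the “can be weighed up” bucket.

```python
# Hypothetical sketch: an Option-3 verdict takes as input only the effects we claim we
# can determinately weigh up; the indeterminate remainder contributes no reasons.

def option3_verdict(weighable_effects):
    """weighable_effects: A-minus-B value differences for the bucketed effects."""
    total = sum(weighable_effects)
    return "choose A" if total > 0 else ("choose B" if total < 0 else "no verdict")

# Carving 1: bucket only the near-term, well-understood effects (e.g., lives saved).
print(option3_verdict([+5.0]))   # 'choose A'

# Carving 2: bucket instead a particular far-future effect we claim to be able to sign
# (e.g., making one astronomically good outcome less likely), suspending on the rest.
print(option3_verdict([-3.0]))   # 'choose B'
```

Same pair of strategies, opposite verdicts; without an argument that some decomposition of the effects is privileged, the rule by itself doesn’t settle anything.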

I have some sympathy for Option 3 as far as it goes. Notably, insofar as Option 3 is action-guiding, it seems to support a focus on near-term welfare, which would already suggest significant changes to current EA prioritization. I’m keen for others to carefully analyze Option 3 as a fully-fledged decision theory, but for now, here are my current doubts about it:

Conclusion and taking stock of implications

Stepping back, here’s my assessment of our situation:

So, I’m not aware of any plausible argument that one strategy has better net consequences under unawareness than another. Of course, I’d be excited to see such an argument! Indeed, one practical upshot of this sequence is that the EA project needs more rethinking of its epistemic and decision-theoretic foundations. That said, I think the challenge to impartial altruist action guidance runs very deep.

We might be tempted to say, “If accounting for unawareness implies that no strategy is better than another, we might as well wager on whatever looks best when we ignore unawareness.” This misses the point.

First, as we’ve seen in this post, wager arguments don’t work when the hypotheses you set aside say your expected impact is indeterminate, rather than zero.

Second, my claim is that no strategy is better than another with respect to the impartial good, according to our current understanding of epistemics and decision theory. Even if we suspend judgment on how good or bad our actions are impartially speaking, again, we can turn to other kinds of normative standards to guide our decisions. I hope to say more on this in future writings.

Finally, whatever reason we may have to pursue the strategy that looks best ignoring unawareness, it’s not the same kind of reason that effective altruists are looking for. Ask yourself: Does “this strategy seems good when I assume away my epistemic limitations” have the deep moral urgency that drew you to EA in the first place?

“But what should we do, then?” Well, we still have reason to respect other values we hold dear — those that were never grounded purely in the impartial good in the first place. Integrity, care for those we love, and generally not being a jerk, for starters. Beyond that, my honest answer is: I don’t know. I want a clear alternative path forward as much as the next impact-driven person. And yet, I think this question, too, misses the point. What matters is that, if I’m right, we don’t have reason to favor our default path, and the first step is owning up to that. To riff on Karnofsky’s “call to vigilance”, I propose a call to reflection.

This is a sobering conclusion, but I don’t see any other epistemically plausible way to take the impartial good seriously. I’d love to be wrong.


Appendix D: Formal statements of standard approaches to UEV comparisons

Recalling the notation from Appendix B, here’s some shorthand:

Then:

  1. Symmetry: For any pair of strategies s and s′, the contribution of the catch-all to UEV_p(s) equals its contribution to UEV_p(s′), for all p in our set of probability distributions P.
  2. Extrapolation: For any pair of strategies s and s′, the difference UEV_p(s) − UEV_p(s′) is positively correlated with the difference between the contributions of the awareness set to UEV_p(s) and to UEV_p(s′), for all p in P.
  3. Meta-extrapolation: Consider some reference class of past problems where the decision-maker had significant unawareness, relative to a “local” value function v. Let S_1 and S_2 be two categories of strategies used in these problems, and V_i be the average local value achieved by strategies in S_i. Then if s is relatively similar to S_1 and s′ is relatively similar to S_2, the difference UEV_p(s) − UEV_p(s′) has the same sign as V_1 − V_2, for all p in P.
  4. Simple Heuristics: Let C(s) be some measure of the complexity of the arguments that motivate s. Then for any strategy s, we have that UEV_p(s) correlates more strongly with the EV we would have estimated had we not explicitly adjusted for unawareness, the lower C(s) is (for all p in P).
  5. Focus on Lock-in: Let t_lag(s) be the amount of time between implementation of s and the intended impact on some target variable, and t_persist(s) be the amount of time the impact on the target variable persists. Then for any strategy s, we have that UEV_p(s) correlates more strongly with the EV we would have estimated had we not explicitly adjusted for unawareness, the higher t_persist(s) is and the lower t_lag(s) is (for all p in P).
  6. Capacity-Building: Let U*(s) = E_p[UEV_p′(s′)], where p′ represents the beliefs of some successor agent and s′ is the strategy this agent follows given that we follow strategy s. (So U*(s) is an element of the UEV, with respect to our values and the successor’s beliefs, of whatever the successor will do given what we do.) Then UEV_p(s) is increasing in U*(s) for all p in P,[12] and (we suppose) we can form a reasonable estimate of U*(s). In particular, capacity-building strategies tend to have large and positive estimates of U*(s).

Appendix E: On cluster thinking

One approach to cause prioritization that’s commonly regarded as robust to unknown unknowns is cluster thinking. Very briefly, cluster thinking works like this: Take several different world models, and find the best strategy according to each model. Then, choose your strategy by aggregating the different models’ recommendations, giving more weight to models that seem less likely to be missing key parameters.

Cluster thinking consists of a complex set of claims and framings, and I think you can agree with what I’ll argue next while still endorsing cluster thinking over sequence thinking. So, for the sake of scope, I won’t give my full appraisal here. Instead, I’ll briefly comment on why I don’t buy the following arguments that cluster thinking justifies strategy comparisons under unawareness (see footnotes for supporting quotes):

  1. “We can tell which world models are more or less likely to be massively misspecified, i.e., feature lots of unawareness. So strategies that do better according to less-misspecified models are better than those that don’t. This is true even if, in absolute terms, all our models are bad.”[13]

    • Response: This merely pushes the problem back. Take the least-misspecified model we can think of. How do we compare strategies’ performance under that model? To answer that, it seems we need to look at the arguments for particular approaches to modeling strategy performance under unawareness, which I respond to in the rest of the post.

  2. “World models that recommend doing what has worked well across many situations in the past, and/or following heuristics, aren’t very sensitive to unawareness.”[14]

    • Response: I address these in my responses to Meta-extrapolation and Simple Heuristics. The same problems apply to the claim that cluster thinking itself has worked well in the past.

Appendix F: Toy example of failure of implementation robustness

Recall the attractor states defined in the case study. At a high level: Suppose you know your intervention will move much more probability mass (i) from Rogue to Benevolent, than (ii) from Rogue to Malevolent. And suppose Benevolent is better than Rogue, which is better than Malevolent (so you have outcome robustness), but your estimates of the value of each attractor are highly imprecise. Then, despite these favorable assumptions, the sign of your intervention can still be indeterminate.

More formally, let:

And suppose an intervention shifts 10% of probability mass from Rogue to Benevolent, and 1% from Rogue to Malevolent.

Then, there’s some p in P (representing the worst-case outcome of the intervention) under which the change in UEV is negative.

So, the intervention is not robustly positive.
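To illustrate how this can happen, here is one hypothetical way of filling in the numbers (the value intervals and the 10%/1% bookkeeping below are assumptions for the sake of the sketch, not the post’s own figures):

```python
# Assumed-for-illustration value intervals for the attractor states, chosen so that
# Benevolent > Rogue > Malevolent under every admissible valuation (outcome robustness):
#   Benevolent in [15, 100], Rogue in [-10, 10], Malevolent in [-2000, -50].
# The intervention shifts 10% of probability mass from Rogue to Benevolent and 1%
# from Rogue to Malevolent.

def delta_uev(v_benevolent, v_rogue, v_malevolent):
    return 0.10 * (v_benevolent - v_rogue) + 0.01 * (v_malevolent - v_rogue)

print(delta_uev(100, -10, -50))    # 10.6: positive under an optimistic valuation
print(delta_uev(15, 10, -2000))    # -19.6: negative under a pessimistic valuation
# Since admissible valuations disagree about the sign, the change in UEV is not
# robustly positive, despite the favorable assumptions.
```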

References

Beckstead, Nicholas. 2013. “On the Overwhelming Importance of Shaping the Far Future.” https://doi.org/10.7282/T35M649T

Carlsmith, Joseph. 2022. “A Stranger Priority? Topics at the Outer Reaches of Effective Altruism.” University of Oxford.

Greaves, Hilary. 2016. “Cluelessness.” Proceedings of the Aristotelian Society 116 (3): 311–39.

Greaves, Hilary, and William MacAskill. 2021. “The Case for Strong Longtermism.” Global Priorities Institute Working Paper No. 5-2021, University of Oxford.

Mogensen, Andreas L. 2020. “Maximal Cluelessness.” The Philosophical Quarterly 71 (1): 141–62.

Tarsney, Christian. 2023. “The Epistemic Challenge to Longtermism.” Synthese 201 (6): 1–37.

Thorstad, David, and Andreas Mogensen. 2020. “Heuristics for Clueless Agents: How to Get Away with Ignoring What Matters Most in Ordinary Decision-Making.” Global Priorities Institute Working Paper No. 2-2020, University of Oxford.

  1. ^

     Quote (emphasis mine): “And if I expect that I have absolutely no idea what the black swans will look like but also have no reason to believe black swans will make this event any more or less likely, then even though I won't adjust my credence further, I can still increase the variance of my distribution over my future credence for this event.”

  2. ^

     See also Buhler, section “The good and bad consequences of an action we can’t estimate without judgment calls cancel each other out, such that judgment calls are unnecessary”.

  3. ^

     Assume our only way to evaluate strategies is via Meta-extrapolation. We still need to weigh up the net direction of the evidence from past performance. But how do we choose the reference class of decision problems, especially the goals they’re measured against? It’s very underdetermined what makes goals “relevantly similar”. E.g.:

    • Relative to the goal “increase medium-term human welfare”, Tomasik’s philosophical reflection strategy worked well.
    • Relative to “save people’s immortal souls”, the consequences of modern philosophy were disastrous.

    The former goal matches the naturalism of EA-style impartial altruism, but the latter goal matches its unusually high stakes. Even with a privileged reference class, our sample of past decision problems might be biased, leading to the same ambiguities noted above.

  4. ^

     Technically, we don’t know the counterfactual, and one could argue that these strategies made atrocities in the 1900s worse. See, e.g., the consequences of dictators reflecting on ambitious philosophical ideas like utopian ideologies, or the rise of factory farming thanks to civilizational stability. At any rate, farmed animal suffering is an exception that proves the rule. Once we account for a new large set of moral patients whose welfare depends on different mechanisms, the trend of making things “better” breaks.

  5. ^

More formally: Our set of probability distributions P (in the UEV model) should include some p under which a strategy that succeeded in the past is systematically worse for the impartial good (that is, the UEV is lower) than other strategies.

  6. ^

     Recap: Adherence to a heuristic isn’t in itself an impartial altruistic justification for a strategy. We need some argument for why this heuristic leads to impartially better outcomes, despite severe unawareness.

  7. ^

     Quote: “If any of these propositions is wrong, the argument loses all force, and so they require a relatively detailed picture of the world to be accurate. The argument for the general goodness of economic progress or better information seems to be much more robust, and to apply even if our model of the future is badly wrong.”

  8. ^

     “Power”, for example, consists of factors like:

    • access to advanced AI systems (which domains is it more or less helpful for them to be “advanced” in?);
    • influence over governments and militaries (which kinds of “influence”?);
    • memetic fitness;
    • bargaining leverage, and willingness to use it;
    • capabilities for value lock-in; and
    • protections from existential risk.

    I’m not sure why we should expect that by increasing any one of these factors, we won’t have off-target effects on the other factors that could outweigh the intended upside.

  9. ^

     We might try hedging against the backfire risks we incur now by also empowering our successors to fix them in the future (cf. St. Jules). However, this doesn’t work if some downsides are irreversible by the time our successors know how to fix them, or if we fail to empower successors with our values in the first place. See also this related critique of the “option value” argument for x-risk reduction.

  10. ^

     As Clifton emphasizes, this reasoning is not equivalent to Symmetry. The claim isn’t that the net welfare in “all the other effects” is zero in expectation, only that the indeterminate balance of those effects gives us no reason for/against choosing some option.

  11. ^

We might wonder: “Doesn’t the counterargument to Option 3 apply just as well to this approach of ‘suspend judgment on the impartial good, and act based on other normative criteria’?” It’s out of scope to give a full answer to this. But my current view is, even if the boundaries between effects we are and aren’t “clueless about” in an impartial consequentialist calculus are arbitrary, the boundary between impartial consequentialist reasons for choice and non-impartial-consequentialist reasons for choice is not arbitrary. So it’s reasonable to “bracket” the former and make decisions based on the latter.

  12. ^

     We take the expectation over s’ with respect to our beliefs p, because we’re uncertain what strategy our successor will follow.

  13. ^

     Quote: “Cluster thinking uses several models of the world in parallel … and limits the weight each can carry based on robustness (by which I mean the opposite of Knightian uncertainty: the feeling that a model is robust and unlikely to be missing key parameters).”​

  14. ^

     Quote (emphasis mine): “Correcting for missed parameters and overestimated probabilities will be more likely to cause “regression to normality” (and to the predictions of other “outside views”) than the reverse. … And I believe that the sort of outside views that tend to get more weight in cluster thinking are often good predictors of “unknown unknowns.” For example, obeying common-sense morality (“ends don’t justify the means”) heuristics seems often to lead to unexpected good outcomes…”


Magnus Vinding @ 2025-06-02T14:33 (+13)

Kudos for writing up this series, Anthony! :) It's not easy to face deeply unpleasant and inconvenient (potential) conclusions, let alone to stare them down and write about them.

I have a lot of thoughts on what you write, but for now I'll share just three of them (in separate comments).

Magnus Vinding @ 2025-06-02T15:50 (+10)

Toward the very end, you write:

“But what should we do, then?” Well, we still have reason to respect other values we hold dear — those that were never grounded purely in the impartial good in the first place. Integrity, care for those we love, and generally not being a jerk, for starters. Beyond that, my honest answer is: I don’t know.

You obviously don't exclude the following, but I would strongly hope that — beyond just integrity, care for those we love, and not being a jerk — we can also at a minimum endorse a commitment to reducing overt and gratuitous suffering taking place around us, even if it might not be the single best thing we can do from the perspective of perfect impartiality across all space and time. This value seems to me to be on a similarly strong footing as the other three you mention, and it doesn't seem like it stands or falls with perfect [or otherwise very strong] cosmic impartiality. I suspect you agree with its inclusion, but I feel like it deserves emphasis in its own right.

Relatedly, in response to this:

Ask yourself: Does “this strategy seems good when I assume away my epistemic limitations” have the deep moral urgency that drew you to EA in the first place?

I would say "yes", e.g. if I replace "this strategy" with something like "reducing intense suffering around me seems good [even] when I assume away my epistemic limitations [about long-term cosmic impacts]”. That does at least carry much of the deep moral urgency that motivates me. I mean, just as I can care for those I love without needing to ground it in perfect cosmic impartiality, I can also seek to reduce the suffering of other sentient beings without needing to rely on a maximally impartial perspective.

Anthony DiGiovanni @ 2025-06-09T08:28 (+6)

Thanks for this Magnus, I have complicated thoughts on this point, hence my late reply! To some extent I'll punt this to a forthcoming Substack post, but FWIW:

As you know, relieving suffering is profoundly important to me. I'd very much like a way to make sense of this moral impulse in our situation (and I intend to reflect on how to do so).

But it's very important that the problem isn't that we don't know "the single best thing" to do. It's that if I don't ignore my effects on far-future (etc.) suffering, I have no particular reason to think I'm "relieving suffering" overall. Rather, I'm plausibly increasing or decreasing suffering elsewhere, quite drastically, and I can't say these effects cancel out in expectation. (Maybe you're thinking of "Option 3" in this section? If so, I'm curious where you disagree with my response.)

The reason suffering matters so deeply to me is the nature of suffering itself, regardless of where or when it happens — presumably you'd agree. From that perspective, and given the above, I'm not sure I understand the motivation for your view in your second paragraph. (The reasons to do various parochial things, or respect deontological constraints, aren't like this. They aren't grounded in something like "this thing out there in the world is horrible, and should be prevented wherever/whenever it is [or whoever causes it]".)

Magnus Vinding @ 2025-06-13T12:58 (+2)

Yeah, my basic point was that just as I don't think we need to ground a value like "caring for those we love" in whether it has the best consequences across all time and space, I think the same applies to many other instances of caring for and helping individuals — not just those we love.

For example, if we walk past a complete stranger who is enduring torment and is in need of urgent help, we would rightly take action to help this person, even if we cannot say whether this action reduces total suffering or otherwise improves the world overall. I think that's a reasonable practical stance, and I think the spirit of this stance applies to many ways in which we can and do benefit strangers, not just to rare emergencies.

In other words, I was just trying to say that when it comes to reasonable values aimed at helping others, I don't think it's a case of "it must be grounded in strong impartiality or bust". Descriptively, I don't think that reflects virtually anyone's actual values or revealed preferences, and I don't think it's reasonable from a prescriptive perspective either (e.g. I don't think it's reasonable or defensible to abstain from helping a tormented stranger based on cluelessness about the large-scale consequences).

Anthony DiGiovanni @ 2025-06-14T15:25 (+2)

I've replied to this in a separate Quick Take. :) (Not sure if you'd disagree with any of what I write, but I found it helpful to clarify my position. Thanks for prompting this!)

Jim Buhler @ 2025-06-25T07:28 (+5)

I've just realized that I find your objections to Clifton's Option 3 much less compelling when applied to something like the following scenario I'm making up:

Four miles away from you, there's a terrorist you want dead. A sniper is locked on his position (accounting for gravity). No one has ever hit a target from this distance. The sniper will overwhelmingly likely miss the shot because of factors (other than gravity) affecting the bullet you cannot estimate from that far (the different wind layers, the Earth's rotation, etc.). You're tempted to think "well, there's no harm in trying" except that the terrorist is holding a kid you do not want dead and they cover exactly as much surface area the way they stand. Say your utility for hitting the target is +1 trillion and -1 trillion for hitting the kid, and you are an EV maximizer, and you're the one who has to tell the sniper whether to take the shot or to stand down. If you thought the sniper's shot was even a tiny bit more likely to hit the target than the kid, you would tell her to take it. But you don't think that! (Something like the principle of indifference seemingly doesn't work here.) You're (complexly) clueless and therefore indifferent between shooting and not shooting. But this was before you remembered that Emily, the sniper, told you the other day that her shoulder hurts every time she takes a shot. You care about Emily's shoulder pain much less than you care about where the bullet ends up (say utility -1 if she shoots). But well, doesn't that give you a very good Option-3-like reason to tell her to stand down?

If I take your objections to Option 3 and replace some of the words to make it apply to my above scenario, I intuitively find them almost crazy. Do you have the same feeling? Is there a relevant difference with my scenario that I'm missing? (Or maybe my strong intuition for why we should tell Emily to stand down is actually not because of Option 3 being compelling but something else?)

MichaelStJules @ 2025-06-27T00:53 (+4)

Against option 3, you write:

There are many different ways of carving up the set of “effects” according to the reasoning above, which favor different strategies. For example: I might say that I’m confident that an AMF donation saves lives, and I’m clueless about its long-term effects overall. Yet I could just as well say I’m confident that there’s some nontrivially likely possible world containing an astronomical number of happy lives, which the donation makes less likely via potentially increasing x-risk, and I’m clueless about all the other effects overall. So, at least without an argument that some decomposition of the effects is normatively privileged over others, Option 3 won’t give us much action guidance.

Wouldn't you also say that the donation makes these happy lives more likely on some elements of your representor via potentially decreasing x-risk? So then they're neither made determinately better off nor determinately worse off in expectation, and we can (maybe) ignore them.

Maybe you need some account of transworld identity (or counterparts) to match these lives across possible worlds, though.

Anthony DiGiovanni @ 2025-06-27T13:52 (+2)

Maybe you need some account of transworld identity (or counterparts) to match these lives across possible worlds

That's the concern, yeah. When I said ”some nontrivially likely possible world containing an astronomical number of happy lives”, I should have said these were happy experience-moments, which (1) by definition only exist in the given possible world, and (2) seem to be the things I ultimately morally care about, not transworld persons.[1] Likewise each of the experience-moments of the lives directly saved by the AMF donation only exist in a given possible world.

  1. ^

    (Or spacetime regions.)

Magnus Vinding @ 2025-06-02T15:12 (+3)

You write the following in the first post in the sequence (I comment on it here because it relates closely to similar remarks in this post):

if my arguments hold up, our reason to work on EA causes is undermined.

This claim seems to implicitly assume that perfect impartiality [edit: or very strong forms of impartiality] across all space and time is the only reason or grounding we could have for working on EA causes. But that's hardly the case — there are countless alternative reasons or moral stances that could ground support for EA (or work on EA causes). For example, one could be normatively near-termist, or one could gradually discount the interests of beings depending on how distant they seem to be from our potential to predictably help them (and one could then still aim to be effective and indeed impartial within the parameters of those frameworks).

Of course, one might disagree with views that aren't perfectly impartial, and argue that they're implausible, but that's different from saying that they can't coherently ground work on EA causes. (And indeed, the arguments raised in this series could be viewed as lending some support to such alternative views, or at least give us reason to be more curious about them.)

Anthony DiGiovanni @ 2025-06-02T16:07 (+2)

perfect impartiality across all space and time

 

Sorry this wasn't clear — my arguments throughout this sequence don't just apply to perfect impartiality. I use "impartiality" loosely, in the sense in the first sentence of the intro: "gives moral weight to all consequences, no matter how distant". (I couldn't think of a better, non-clunky term for this.) See also footnote 4 of the first post:

“Nontrivial moral weight to distant consequences” is deliberately vague. I mean to include not only unbounded total utilitarianism, but also various bounded-yet-scope-sensitive value functions (see Karnofsky, section “Holden vs. hardcore utilitarianism”, and Ngo)

I'm happy to grant that normative neartermism is immune to my arguments. When I said "our reason to work on EA causes", I meant "the reason to work on such causes that my target audience actually endorses."

Magnus Vinding @ 2025-06-02T16:31 (+3)

I use "impartiality" loosely, in the sense in the first sentence of the intro: "gives moral weight to all consequences, no matter how distant".

Thanks for clarifying. :)

How about views that gradually discount at the normative level based on temporal distance, like  or so? They would give weight to consequences no matter how distant, and still give non-trivial weight to fairly distant consequences (by ordinary standards), yet the weight would go to zero as the distance grows. If normative neartermism is largely immune to your arguments, might such "medium-termist" views largely withstand them as well?

(FWIW, I think views of that kind might actually be reasonable, or at least deserve some weight, in terms of what one practically cares about and focuses on — in part for the very reasons you raise.)

Anthony DiGiovanni @ 2025-06-08T09:31 (+2)

I think such a view might also be immune to the problem, depends on the details. But I don't see any non-ad hoc motivation for it. Why would sentient beings' interests matter less intrinsically when those beings are more distant or harder to precisely foresee?

(I'm open to the possibility of wagering on the verdicts of this kind of view due to normative uncertainty. But different discount rates might give opposite verdicts. And seems like a subtle question when this wager becomes too Pascalian. Cf. my thoughts here.)

Magnus Vinding @ 2025-06-13T11:50 (+3)

Why would sentient beings' interests matter less intrinsically when those beings are more distant or harder to precisely foresee?

I agree with that sentiment :) But I don't think one would be committed to saying that distant beings' interests matter less intrinsically if one "practically cares/focuses" disproportionally on beings who are in some sense closer to us (e.g. as a kind of mid-level normative principle or stance). The latter view might simply reflect the fact that we inhabit a particular place in time and space, and that we can plausibly better help beings in our vicinity (e.g. the next few thousands of years) compared to those who might exist very far away (e.g. beyond a trillion years from now), without there being any sharp cut-off between our potential to help them.

FWIW, I don't think it's ad hoc or unmotivated. As an extreme example, one might consider a planet with sentient life that theoretically lies just inside our future light cone from time t_now, such that if we travelled out there today at the theoretical maximum speed, then we, or meaningful signals, could reach them just before cosmic expansion makes any further reach impossible. In theory, we could influence them, and in some sense merely wagging a finger right now has a theoretical influence on them. Yet it nevertheless seems to me quite defensible to practically disregard (or near-totally disregard, à la asymptotic discount) these effects given how remote they are (assuming a CDT framework).

Perhaps such a position can be viewed from the lens of an "applicability domain": to a first approximation, the ideal of total impartiality is plausibly "practically morally applicable" on all of Earth and on and somewhat beyond our usual timescales. And we are right to strongly endorse it at this unusually large scale (i.e. unusual relative to prevailing values). But it also seems plausible that its applicability gradually breaks down when we approach extreme values.

Indeed, bracketing off "infinite ethics shenanigans" could be seen as an implicit acknowledgment of such a de-facto breakdown or boundary in the practical scope of impartiality. After all, there is a non-zero probability of an infinite future with sentient life, even if that's not what our current cosmological models suggest (cf. Schwitzgebel's Washout Argument Against Longtermism). Thus, it seems that if we limit infinite outcomes from dominating everything, we have already set some kind of practical boundary (even if it's a practical boundary of asymptotic convergence toward zero across an in-theory infinite scope). If so, it seems that the question is to clarify the nature and scope of that practical boundary, not whether it's there or not.

One might then say that infinite ethics considerations indeed count as an additional, perhaps also devastating challenge to any form of impartial altruism. But in that case, the core objection reduces to a fairly familiar objection about problems with infinities. If we make an alternative case, in which we assume that infinities can be set aside or practically limited, then it seems we have already de facto assumed some practical boundary.

Anthony DiGiovanni @ 2025-06-17T13:20 (+3)

In theory, we could influence them, and in some sense merely wagging a finger right now has a theoretical influence on them. Yet it nevertheless seems to me quite defensible to practically disregard (or near-totally disregard, à la asymptotic discount) these effects given how remote they are

Sorry, I'm having a hard time understanding why you think this is defensible. One view you might be gesturing at is:

  1. If a given effect is not too remote, then we can model actions A and B's causal connections to that effect with relatively high precision — enough to justify the claim that A is more/less likely to result in the effect than B.
  2. If the effect is highly remote, we can't do this. (Or, alternatively, we should treat A and B as precisely equally likely to result in the effect.)
  3. Therefore, we can only systematically make a difference to effects of type (1). So only those effects are practically relevant.

But this reasoning doesn't seem to hold up for the same reasons I've given in my critiques of Option 3 and Symmetry. So I'm not sure what your actual view is yet. Can you please clarify? (Or, if the above is your view, I can try to unpack why my critiques of Option 3 and Symmetry apply just as well here.)

Magnus Vinding @ 2025-06-02T16:16 (+2)

I meant "the reason to work on such causes that my target audience actually endorses."

I suspect there are many people in your target audience who don't exclusively endorse, or strictly need to rely on, the views you critique as their reason to work on EA causes (I guess I'm among them).

Magnus Vinding @ 2025-06-02T14:39 (+2)

The first thought I have is mostly an impression or something that stood out to me: it seems to me like the word choices here sometimes don't quite reflect the point being made or the full range of views being critiqued, arguably including the strongest competing views.

For example, when talking about heuristics that are supposed to be "robust", or strategies we can "reliably intervene on", or whether we can "reliably weigh up" relevant effects, etc, it seems to me that these word choices convey something much stronger than what would necessarily be endorsed by weaker and arguably more defensible types of views that differ from yours.

After all, views different from yours might agree that no heuristics are outright robust and that we can't reliably weigh up the relevant effects, while nevertheless holding something weaker, like that some heuristics are slightly better than others in expectation, or that we in general can do meaningfully (or even just marginally) better than nothing.

Of course, I get that language is always a bit vague, and I'm sure you meant to include a very broad range of views, including very "weak" ones. But at least in terms of how I read it, the language often seemed to invoke a much stronger and more narrow set of views than necessary.

Anthony DiGiovanni @ 2025-06-02T15:10 (+2)

Thanks Magnus — I'm not sure I understand your objection or which specific language choices in the posts seem too strong, yet.

nevertheless holding something weaker, like that some heuristics are slightly better than others in expectation, or that we in general can do meaningfully (or even just marginally) better than nothing

Can you say a bit more about why you think I haven't adequately responded to this perspective, in the sequence? ("Degrees of imprecision", "The 'better than chance' argument", and my response to "Meta-extrapolation" explain why I don't think "X is slightly better than Y in expectation" makes sense here.)

Magnus Vinding @ 2025-06-02T15:18 (+2)

I should probably have made it more clear that this isn't an objection, and maybe not even much of a substantive point, but more just a remark on something that stood out to me while reading, namely that the views critiqued often seemed phrased in much stronger terms than what people with competing views would necessarily agree with.

Some of the examples that stood out were those I included in quotes above.

Anthony DiGiovanni @ 2025-06-04T07:12 (+2)

Some of the examples that stood out were those I included in quotes above

I'm still confused, sorry. E.g., "reliably" doesn't mean "perfectly", and my hope was that the surrounding context was enough to make clear what I mean. I'm not sure which alternative phrasings you'd recommend (or why you think there's a risk of misrepresentation of others' views when I've precisely spelled out, e.g., the six "standard approaches" in question).