Rethinking the Value of Working on AI Safety

By Johan de Kock @ 2025-01-09T14:15 (+43)

Summary

Disclaimer: This post challenges widely held beliefs in EA and may feel unsettling. If you're currently facing personal difficulties, consider reading it at another time. I am sharing this post because I think it might save you many hours, help you pursue your mission as an Effective Altruist more effectively, and because I wish someone had shared this with me before I started working on AI safety.

Introduction

If you are reading this, it is likely that you subscribe to the core principles of EA and that you have spent a decent amount of your time figuring out how to use your career to maximise your expected impact. Figuring out how to do that isn’t easy and it takes agency, an open mind and the courage to think again. Thank you for being willing to do this.

Like me, you likely have ended up concluding that working on AI safety is one of the best ways to do the most good. Your thought process might have been quite similar to mine:

  1. Since the future could be so vast, we ought to put a large amount of resources into ensuring that it goes well. This future should be filled with many happy beings who are free of unnecessary suffering. Also, we could create a radically better state for humanity and other sentient beings if we only have enough time.
  2. If we go extinct in the process, we lose our entire potential and everything the universe could have become.
  3. Hence, we ought to buy ourselves enough time to figure things out and uphold our potential.
  4. As a next step, you likely thought about which cause to work on to contribute to this goal: biosecurity, climate change, AI safety, great power conflict.
  5. In my case, I concluded that a rogue ASI is the most plausible way for humanity to be entirely eradicated. There is a large difference between 99% of humanity dying in a nuclear disaster and 100% dying to a rogue AI: if even 1% or 0.1% survive, our potential isn't destroyed. On top of that, there is still the chance of a suffering risk.
  6. Hence, from an expected impact maximization perspective, working on AI safety, with the aim of reducing existential risk or preventing an s-risk, seems like the single best option.

Over the last two years of working solely on AI safety, and thinking a lot about it, I have updated toward no longer being certain about this at all. In fact, I think there are good reasons to believe that reducing x-risk might be actively harmful.

There are two reasons that I want to focus on to illustrate why the expected value of working on AI safety isn’t obviously good. 

1. Complex Cluelessness

The first major source of doubt, even for those who do believe in the moral salience of preserving humanity, is the sheer complexity of the future. Accurately predicting the downstream consequences of actions that attempt to positively steer the long-term future is notoriously difficult, if not impossible. An initiative that seems beneficial today can have wild, unintended ramifications. If you help ensure humanity’s survival and contribute to reducing existential risk, you may also inadvertently help proliferate factory farming deep into the universe, environmental devastation across multiple planets, or even countless digital simulations that generate enormous suffering. One concrete example is Open Philanthropy's initial decision to support OpenAI with millions of dollars, which may have inadvertently increased the risk of extinction by laying the foundation for a company that is now set to become for-profit and whose moral alignment is unclear. This is just one example showing that it is often very difficult to predict the influence of your actions, even in the relatively short term.

For a more detailed critique of x-risk reduction efforts being positive in expectation, I recommend reading this post. If you want to learn more about complex cluelessness, you can take a look at this page.

2. Moral Uncertainty

The second significant reason for doubting the value of AI safety and interventions aimed at reducing existential risk is moral uncertainty. What if the moral framework justifying the preservation of humanity’s future at any cost is not the correct one? There are many moral frameworks, and it seems unreasonable to categorically dismiss certain perspectives simply because they "don’t resonate."

Even within utilitarianism, the divergence between standard and negative utilitarian perspectives is particularly relevant. Traditional utilitarianism balances the creation of happiness with the reduction of suffering, while negative utilitarianism exclusively prioritizes reducing suffering. This distinction can lead to vastly different conclusions.

Consider this: today, an estimated 100 billion animals are slaughtered every year, the vast majority in factory farms. If these beings experience pain and suffering, it’s plausible that, should humanity colonize space and pursue protein for a "balanced diet," we could perpetuate animal suffering on an astronomical scale.

You might argue that humanity will expand its moral circle or develop cost-effective lab-grown meat. However, even if the chance of this not happening is as low as 10%, and you give some weight to utilitarianism, the expected value of AI safety’s contribution to reducing existential risk still isn’t clear. And it might even be extremely negative.

Here is a calculation conducted by OpenAI o1 to provide an example of how you might arrive at the conclusion that, under both classical and negative utilitarianism, space colonisation could be seen as net negative. This does not even account for other worldviews or the argument that creating happy beings might not outweigh the negative experiences present in the universe (elaboration).
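Since the linked calculation isn't reproduced here, the following is a minimal stand-in sketch with entirely made-up inputs (the populations, welfare units, and the probability that factory farming persists are all hypothetical placeholders); changing any of these contested numbers can flip the classical-utilitarian sign, which is part of the cluelessness point:

```python
# Toy stand-in, not the linked o1 calculation: every number below is a
# hypothetical placeholder chosen only to show the structure of the argument.
humans_per_year = 1e12           # hypothetical future human population
welfare_per_human = 1.0          # arbitrary units of positive welfare

farmed_animals_per_year = 1e13   # hypothetical farmed-animal population if
                                 # factory farming scales with colonisation
suffering_per_animal = 1.0       # arbitrary units of suffering
p_factory_farming = 0.2          # assumed chance that moral-circle expansion
                                 # and lab-grown meat both fail

total_happiness = humans_per_year * welfare_per_human
expected_suffering = (p_factory_farming
                      * farmed_animals_per_year
                      * suffering_per_animal)

classical_ev = total_happiness - expected_suffering  # happiness and suffering both count
negative_ev = -expected_suffering                    # only suffering counts

print(f"classical utilitarian EV: {classical_ev:.2e}")  # negative with these inputs
print(f"negative utilitarian EV:  {negative_ev:.2e}")   # <= 0 while any suffering remains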

How a worldview diversification strategy can address this issue 

To be clear, I am not arguing that AI safety is definitively bad. It might turn out to be the single most valuable pursuit ever. However, I am arguing that we simply don’t know, given moral uncertainty and complex cluelessness. A broad outcome distribution indicates that working on AI safety could bring extremely positive outcomes (e.g., preventing extinction and enabling flourishing) or extremely negative outcomes (e.g., perpetuating astronomical suffering), but we currently lack reliable estimates of how probable each scenario is. 

We’re unsure how large each tail is. We’re also unsure how to weigh outcomes across different moral frameworks—for instance, how to compare a small probability of vast suffering against a larger probability of moderate flourishing. 

The existence of many plausible yet conflicting worldviews and the complexity of future causal chains mean that any “best guess” at the EV is laden with extreme uncertainties.

Using the ITN framework plus personal fit (importance, tractability, neglectedness, and personal fit), this essentially means reconsidering how we evaluate causes to work on. Specifically, when selecting a cause—particularly in the context of longtermist causes—we should place significantly less emphasis on the “importance” domain due to moral uncertainty. Similarly, we should de-emphasize the “tractability” domain because of complex cluelessness.

In light of this uncertainty, a promising strategy to hedge against it is a worldview diversification strategy (WDS). A WDS means allocating resources—time, money, influence—across multiple plausible causes and moral frameworks, proportional to how likely you think each framework might be correct and how large the stakes appear.
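As a rough sketch of what "proportional to how likely each framework is and how large the stakes appear" could look in practice, here is a toy example; the causes, credences, and stakes are invented for illustration and carry no recommendation:

```python
# Toy example of the allocation rule above; the causes, credences, and stakes
# are invented placeholders, not recommendations.
credence = {"AI safety": 0.3, "animal welfare": 0.4, "global health": 0.3}
stakes   = {"AI safety": 10,  "animal welfare": 3,   "global health": 1}

raw = {cause: credence[cause] * stakes[cause] for cause in credence}
total = sum(raw.values())
allocation = {cause: round(value / total, 2) for cause, value in raw.items()}

print(allocation)
# -> {'AI safety': 0.67, 'animal welfare': 0.27, 'global health': 0.07}
```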

WDS seems to maximize expected impact by balancing the risks of over-committing to a single worldview against the benefits of spreading resources across multiple plausible frameworks, thereby ensuring that we don't overlook substantial potential value from worldviews we may be underestimating.

A section in Holden Karnofsky’s post alludes to this quite well:

"When accounting for strong uncertainty and diminishing returns, worldview diversification can maximize expected value even when one worldview looks “better” than the others in expectation. One way of putting this is that if we were choosing between 10 worldviews, and one were 5x as good as the other nine, investing all our resources in that one would – at the relevant margin, due to the “diminishing returns” point – be worse than spreading across the ten." 

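To illustrate the diminishing-returns point in the quote, here is a minimal sketch that assumes a square-root returns curve; the functional form and the numbers are illustrative assumptions, not Karnofsky's model:

```python
import numpy as np

# The numbers mirror the quoted example: ten worldviews, one of which looks
# 5x as good per unit of resources. The square-root returns curve is an
# illustrative assumption standing in for "diminishing returns".
weights = np.array([5.0] + [1.0] * 9)
budget = 10.0

def portfolio_value(allocation):
    # value of spending x on a worldview = weight * sqrt(x)
    return float(np.sum(weights * np.sqrt(allocation)))

# Strategy 1: put every resource into the single best-looking worldview.
all_in = np.zeros(10)
all_in[0] = budget

# Strategy 2: diversify; with sqrt returns the optimal split is proportional
# to the squared weights (from equalising marginal returns across worldviews).
diversified = budget * weights**2 / np.sum(weights**2)

print(f"all-in value:      {portfolio_value(all_in):.2f}")       # ~15.81
print(f"diversified value: {portfolio_value(diversified):.2f}")  # ~18.43
```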
What are the practical implications?

Logistically, there are three main ways to pursue a diversification strategy with your career:

1. Meta career

This involves working on interventions that enhance the overall effectiveness of the EA community or other high-impact initiatives. Examples include global priorities research, EA community building, improving philanthropic decision-making, consulting to boost the productivity of organizations, and more.

2. Earning to give

This means pursuing a job with high-earning potential and donating a significant portion of your income to fund impactful work. By doing so, you can directly support critical projects and individuals working on high-priority issues.

3. Specializing in a specific cause

This option involves, somewhat ironically, committing to a particular cause, such as AI safety. While this doesn’t diversify your individual efforts, you contribute to the broader community-level diversification strategy pursued across the EA movement. This approach combines the advantages of having specialized experts in each area with the broad coverage needed to navigate uncertainty.

Which of these paths should you pursue?

Right from the start, I want to emphasize that this is not intended to be career advice. Instead, it aims to help you explore different possibilities and considerations in your cause prioritization while encouraging you to think through various perspectives.

I highly recommend forming your own view and treating this as supplementary material, since this list is certainly not exhaustive.

Meta Career

While every pathway is unique, the general idea behind a meta career is to work on interventions that increase the resources available for more effective diversification or cause prioritization, either globally or within the EA community.

Pros

Cons

Further Reading

Explore more about specific meta pathways and their pros and cons through these resources:

Specialising in a specific cause (using AI safety as an example[1])

Pros


Cons

When you should consider choosing / continuing this path:

Earning to give

Pros

Cons

When you should consider choosing this path:

An important consideration

No path is guaranteed to have a positive impact, and each has its downsides. There does not seem to be a clear winner among these three options. Much depends on your personal fit and how well you think you could excel in a given pathway.

Having said that, I do want to share one possibility that seems quite plausible to me.

Progress in AI shows no signs of slowing down or hitting a wall. In fact, it’s highly plausible that progress will accelerate with the new paradigm of scaling inference-time compute. In just a few years, we will (in my estimation) see AI systems capable of performing nearly any form of intellectual labor more effectively than humans. Assuming humanity survives and remains in control, real-world results will then likely be achieved more effectively through capital and financial resources than by trying to compete with hyperintelligent AI systems. Please refer to this post for an elaboration on this point.[4] This would be a strong argument in favour of pursuing earning to give.

Final Reflections 

AI safety is often seen as a top priority in the EA community, but its expected value is far from certain due to moral uncertainty and complex cluelessness. While working on AI safety contributes to the community’s broader diversification strategy, other pathways—such as specialising in more neglected causes, meta careers or earning to give—may offer equally or even more promising ways to increase your impact. 

Your personal fit plays a crucial role in determining the best path for you, and given the arguments laid out in this post, I encourage you to weigh it more heavily than you might have done before.

Cause prioritisation isn’t easy, and I bet that I haven’t made it easier with this post. However, being an EA isn’t easy, and my goal wasn’t to provide easy answers but to provoke deeper reflection and encourage a more robust exploration of these ideas. By questioning common assumptions I hope we can collectively navigate the uncertainties and complexities to make better-informed decisions about where to focus our efforts. 

I invite you to critically engage with these reflections and share your thoughts. I would love to explore together with you how we can increase our impact in the face of profound moral uncertainty and cluelessness. 

Thank you for reading! I am very grateful to be part of a community that takes the implications of ideas seriously. 

Invitation to a Discussion Session

Because this is a nuanced topic—one with truly high stakes—I am thinking of organising at least one online discussion where we can explore these ideas together. Anyone is welcome, whether you’re convinced, on the fence, or firmly opposed.

If you would like to be notified about this, please share your email address here: https://forms.gle/hKv6A9jDS8dSiRrk8

Acknowledgements

I am grateful to Jan Wehner, Leon Lang, Josh Thorsteinson and Jelle Donders for providing thoughtful and critical feedback on this post. Their insights were very valuable in challenging my ideas and helping me refine my writing. While their feedback played a significant role in improving this work, it’s important to note that they may not endorse the views expressed here. Any remaining errors or shortcomings are entirely my own.

  1. ^

    A general note for people still choosing their career path:

    If you accept the arguments of moral uncertainty and complex cluelessness, and you recognize that the overall importance of AI safety remains unclear after considering these points, relying solely on the community diversification strategy to justify continuing work in AI safety becomes less compelling. The "importance" domain should carry less weight in your decision-making, while neglectedness and personal fit should be given greater emphasis. Assuming an equal level of personal fit, this could lead you to favor pivoting toward areas that are more neglected or where progress is easier to achieve. Within the EA community, AI safety already receives substantial attention and resources. 

  2. ^

    However, the argument for diversification at a broader community level also applies within AI safety, offering greater assurance that your work contributes to the overall diversification of the AI safety community, even if you can't be sure how good the specific thing you are working on is.

  3. ^

    The development of AGI will likely introduce many crucial new considerations that render current diversification efforts within the EA community less effective. In such scenarios, you could have a greater impact by funding neglected and important work or alternative approaches to a cause.

  4. ^

    A counterargument to this reasoning is that it is highly uncertain whether you would be able to significantly influence society post-AGI, even with substantial capital. Speculating about the state of affairs once AGI is developed is inherently difficult. Furthermore, if we succeed in building aligned ASI, it is unclear to what extent humans would be needed to solve global problems, as ASI might handle these tasks far more effectively.


Jim Buhler @ 2025-01-10T09:26 (+9)

Do you think that AI safety is i) at least a bit good in expectation (but like with a determinate credence barely higher than 50% because of high risk/uncertainty) or ii) you don't have determinate credences and feel clueless/agnostic about this? I feel like your post implicitly keeps jumping back and forth between these two positions, and only (i) could support your conclusions. If we assume (ii), everything falls apart. There's no reason to support a cause X (or the exact opposite of X) to any degree if one is totally clueless about whether it is good.

Thanks for writing this :)

Johan de Kock @ 2025-01-10T15:19 (+5)

One of the reasons I wrote this post was to reflect on excellent comments like yours. Thank you for posting and spotting this inconsistency!

You rightly point out that I jump between i) and ii). The short answer is that, at least for AI safety, I feel clueless or agnostic about whether this cause is positive in expectation. @Mo Putera summarised this nicely in their comment. I am happy to expand on the reasons as to why I think that.

What is your perspective here? If you do have a determinate credence above 50% for AI safety work, how do you arrive at this conclusion? I know you have been also doing some in-depth thinking on the topic of cluelessness.

Next, I want to push back on your claim that if ii) is correct, everything collapses. I agree that this would lead to the conclusion that we are probably entirely clueless about longtermist causes, probably the vast majority of causes in the world. However, it would make me lean toward near-term areas with much shorter causal chains, where there is a smaller margin of error—for example, caring for your family or local animals, which carry a low risk of backfiring.

Although, to be fair, this is unclear as well if one is also clueless about different moral frameworks. For example, helping a young child who fell off their skateboard might seem altruistic but could inadvertently increase their ambition, leading them to become the next Hitler or a power-seeking tech CEO. And to take this to the next level: not taking an action also has downsides (e.g. not addressing the ongoing suffering in the world). Yaay!

If conclusion ii) is correct for all causes, altruism would indeed seem not possible from a consequentialist perspective. I don’t have a counterargument at the moment.

I would love to hear your thoughts on this!

Thank you for engaging :)

Jim Buhler @ 2025-01-10T21:17 (+10)

If you do have a determinate credence above 50% for AI safety work, how do you arrive at this conclusion?

It happens that I do not. But I would if I believed there was evidence robust to unknown unknowns in favor of assuming "AI Safety work" is good, factoring in all the possible consequences from now until the end of time. This would require robust reasons to believe that current AI safety work actually increases rather than decreases safety overall AND that increased safety is actually good all things considered (e.g., that human disempowerment is actually bad overall). (See Guillaume's comment on the distinction). I won't elaborate on what would count as "evidence robust to unknown unknowns" in such a context but this is a topic for a future post/paper, hopefully.

Next, I want to push back on your claim that if ii) is correct, everything collapses. I agree that this would lead to the conclusion that we are probably entirely clueless about longtermist causes, probably the vast majority of causes in the world. However, it would make me lean toward near-term areas with much shorter causal chains, where there is a smaller margin of error—for example, caring for your family or local animals, which carry a low risk of backfiring.

Sorry, I didn't mean to argue against that. I just meant that work you are clueless about (e.g. maybe AI safety work in your case?) shouldn't be given any weight in your diversified portfolio. I didn't mean to make any claim about what I personally think we should or shouldn't be clueless about. The "everything falls apart" was unclear and probably unwarranted.

Johan de Kock @ 2025-01-12T11:54 (+1)

I agree with your reasoning, and the way you’ve articulated it is very compelling to me! It seems that the bar this evidence would need to clear is, quite literally, impossible to reach.

I would even take this further and argue that your chain of reasoning could be applied to most causes (perhaps even all?), which seems valid.

Would you disagree with this?

Your reply also raises a broader question for me: What criteria must an intervention meet for our determinate credence in its expected value being positive to exceed 50%, thereby justifying work on it?

Jim Buhler @ 2025-01-13T09:30 (+3)

I would even take this further and argue that your chain of reasoning could be applied to most causes (perhaps even all?), which seems valid.

Would you disagree with this?

I mean, I didn't actually give any argument for why I don't believe AI safety is good overall (assuming pure longtermism, i.e., taking into account everything from now until the end of time). I just said that I would believe it if there was evidence robust to unknown unknowns. (I haven't argued that there isn't such evidence already, although the burden of proof is very much on the opposite claim tbf). But I think this criterion applies to all causes where unknown unknowns are substantial, and I believe this is all of them as long as we're evaluating them from a pure longtermist perspective, yes. And whether there is any cause that meets this criterion depends on one's values, I think. From a classical utilitarian perspective (and assuming the trade-offs between suffering and pleasure that most longtermists endorse), for example, I think there's very plausibly none that does meet this criterion.

Guillaume Corlouer @ 2025-01-10T12:25 (+6)

Thanks for writing this post! It is important to think about the implications of cluelessness and moral uncertainty for AI safety. To clarify the value of working on AI safety, it helps to decompose the problem into two subquestions:

  1. Is the outcome that we are aiming for robust to cluelessness, and moral uncertainty?
  2. Do we know of an intervention that is robustly good to achieve that outcome? (i.e. an intervention that is at least better than doing nothing to achieve that outcome)

An outcome could be reducing x-risks from AI, which could happen in at least two different ways: value lock-in from a human-aligned AI, or a future controlled by non-human AI. Reducing value lock-in seems robustly good, and I won't argue for that here.

If the outcome we are thinking about is reducing extinction from AI, then the near-term case for reducing extinction from AI seems more robust to cluelessness, and I feel that the post could have emphasised it a bit more. Indeed, reducing the risk of extinction from AI, for all the people alive today and in the next few generations, looks good from a range of moral perspectives (it is at least determinately good for humans) even though it is indeterminate in the long term. But then, one has to compare short-term AI x-risk reduction with other interventions that make things determinately good in the short term from an impartial perspective, like working on animal welfare or reducing extreme poverty.

AI seems high stakes, even though we don't know which way it will go in the short or long term, which might suggest focusing more on capacity building instead of a more direct intervention (I would treat capacity building as a more general category that covers some of the paths you suggested, such as earning to give). This could hold as long as the capacity building (for example, putting ourselves in a position to make things go well w.r.t. AI when we have more information) has a low risk of backfiring (i.e., of making things worse).

If we grant that reducing x-risks from AI seems robustly good, and better than alternative short-term causes (which is a higher bar than "better than doing nothing"), then we still need to figure out interventions that robustly reduce x-risks from AI (i.e. so that we don't make things worse). I already mentioned capacity building with low backfire risk (if we can find such capacity building). Beyond capacity building, it's not completely clear to me that there are robustly good interventions in AI safety, and I think more work is needed to prioritize interventions.

It seems useful to think of one's career as being part of a portfolio, and to work on things where one could plausibly be in a position to do excellent work, as long as the intervention one is working on is determinately better than doing nothing.

Greg_Colbourn @ 2025-01-14T20:03 (+2)

Beyond capacity building; it's not completely clear to me that there are robustly good interventions in AI safety, and I think more work is needed to prioritize interventions.

I think it's pretty clear[1] that stopping further AI development (or Pausing) is a robustly good intervention in AI Safety (reducing AI x-risk).

  1. ^

    But see this post for some detailed reasoning.

Johan de Kock @ 2025-01-12T11:31 (+2)

I enjoyed reading your insightful reply! Thanks for sharing, Guillaume. You don’t make any arguments I strongly disagree with, and you’ve added many thoughtful suggestions with caveats. The distinction you make between the two sub-questions is useful.

I am curious, though, about what makes you view capacity building (CB) in a more positive light compared to other interventions within AI safety. As you point out, CB also has the potential to backfire. I would even argue that the downside risk of CB might be higher than that of other interventions because it increases the number of people taking the issue seriously and taking proactive action—often with limited information.

For example, while I admire many of the people working at PauseAI, I believe there are quite a few worlds in which those initially involved in setting up the group have had a net-negative impact in expectation. Even early on, there were indications that some people were okay with using violence or radical methods to stop AI (something the organizers then banned). However, what happens if these tendencies resurface when “shit hits the fan”? To push back on my own thinking, it still might be a good idea to work on PauseAI due to the community diversification argument within AI safety (see footnote two).

I agree that other forms of CB, such as MATS, seem more robust. But even here, I can always find compelling arguments for why I should be clueless about the expected value. For instance, an increased number of AI safety researchers working on solving an alignment problem that might ultimately be unsolvable could create a false sense of security.

Greg_Colbourn @ 2025-01-14T19:53 (+2)

 However, what happens if these tendencies resurface when “shit hits the fan”?

I don't think this could be pinned on PauseAI, when at no point has PauseAI advocated or condoned violence. Many (basically all?) political campaigns attract radical fringes. Non-violent moderates aren't responsible for them.

Guillaume Corlouer @ 2025-01-12T19:23 (+1)

I am curious, though, about what makes you view capacity building (CB) in a more positive light compared to other interventions within AI safety. As you point out, CB also has the potential to backfire. I would even argue that the downside risk of CB might be higher than that of other interventions because it increases the number of people taking the issue seriously and taking proactive action—often with limited information.

Yeah, just to clarify, CB is not necessarily better than other interventions. However, CB with low backfire risks could be promising. This does not necessarily mean doing community building, since community building could backfire depending on how it is done (for example, if it is done in a very expansive, non-careful way, it could more easily backfire). I think the PauseAI example that you gave is a good example of a potentially non-robust intervention, or at least I would not count it as a low-backfire-risk capacity building intervention.

One of the motivations for CB would be to put ourselves in a better position to pursue some intervention if we end up less clueless. It might be that we don't in fact end up less clueless, and that while we have done CB, there are still no robust interventions that we can pursue after some time. In that case, it would be better to pursue determinately good short-term interventions even after doing CB (but then we have to pay the opportunity cost of the resources spent doing CB rather than directly pursuing the interventions that are good in the short term).

I am still uncertain about low-backfire CB interventions (ones that are better than doing something good directly); perhaps some ways of increasing capital or well-targeted community building could be good examples, but it seems like an open question to me.

Greg_Colbourn @ 2025-01-14T19:35 (+3)

You mention S-risk. I tend to try not to think too much about this, but it needs to be considered in any EV estimate of working on AI Safety. I think appropriately factoring it in could be overwhelming in terms of concluding that preventing ASI from being built is the number 1 priority. Preventing space colonisation x-risk could be achieved by letting ASI drive everything extinct. But how likely is ASI-induced extinction compared to ASI-induced S-risk (ASI simulating, or physically creating, astronomical amounts of unimaginable suffering, on a scale far larger than human space colonisation could ever achieve)?

Jan Wehner🔸 @ 2025-01-09T18:15 (+3)

Great post Johan! It got me thinking more deeply about the value of working on x-risk reduction and how we ought to act under uncertainty. I think people (including me) doing direct work on x-risk reduction would do well to reflect on the possibility of their work having (large) negative effects.

I read the post as making 2 main arguments:
1) The value of working on x-risk reduction is highly uncertain
2) Given uncertainty about the value of cause areas, we should use worldview diversification as a decision strategy

I agree with 1), but am not convinced by 2). WDS might make sense for a large funder like OpenPhil, but not for individuals seeking to maximize their positive impact on the world. Diversification makes sense to reduce downside risk or because of diminishing returns from investing in one option. I think neither of those applies to individuals interested in maximizing the EV of their positive impact on the world. 1) While protecting against downside risks (e.g. losing all your money) makes sense in investing, it is not important if you're only maximizing Expected Value. 2) Each individual's contributions to a cause area won't change things massively, so it seems implausible that there are strong diminishing returns to their contributions.

However, I am in favor of your suggestion of doing worldview diversification at the community level, not at the level of individuals. To an extent, EA already does that by emphasising Neglectedness. Perhaps practically EAs should put more weight on neglectedness, rather than working on the single most important problem they see.

Mo Putera @ 2025-01-10T08:57 (+4)

I think you're not quite engaging with Johan's argument for the necessity of worldview diversification if you assume it's primarily about risk reduction or diminishing returns. My reading of their key point is that we don't just have uncertainty about outcomes (risk), but uncertainty about the moral frameworks by which we evaluate those outcomes (moral uncertainty), combined with deep uncertainty about long-term consequences (complex cluelessness), leading to fundamental uncertainty in our ability to calculate expected value at all (even if we hypothetically want to as EV-maximisers, itself a perilous strategy), and it's these factors that make them think worldview diversification can be the right approach even at the individual level.

Johan de Kock @ 2025-01-11T13:52 (+1)

Mo, thank you for chiming in. Yes, you understood the key point, and you summarised it very well! In my reply to Jan, I expanded on your point about why I think calculating the expected value is not possible for AI safety. Feel free to check it out.

I am curious, though: do you disagree with the idea that a worldview diversification approach at an individual level is the preferred strategy? You understood my point, but how true do you think it is?

Johan de Kock @ 2025-01-11T13:48 (+1)

Hi Jan, I appreciate the kind words and the engagement!

You correctly summarized the two main arguments. I will start my answer by making sure we are on the same page regarding what expected value is.

Here is the formula I am using:

  • EV = p × V + (1 − p) × (−W)
  • p = probability of the good outcome
  • V = magnitude of the good outcome
  • W = magnitude of the bad outcome

As EAs, we are trying to maximize EV.

Given that I believe we are extremely morally uncertain about most causes, here is the problem I have encountered: even if we could reliably estimate the probability p of the good or bad outcome occurring, and even how large V and W are, we still don’t know how to evaluate the overall intervention due to moral uncertainty.

For example, while the Risk-Averse Welfarist Consequentialism worldview would “aim to increase the welfare of individuals, human or otherwise, without taking long-shot bets or risking causing bad outcomes,” the Total Welfarist Consequentialism worldview would aim to maximize the welfare of all individuals, present or future, human or otherwise.

In other words, the two worldviews would interpret the formula (even if we perfectly knew the value of each variable) quite differently.
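As a toy illustration of this, here is one way the same inputs could be scored under the two worldviews; the numbers and the particular risk-averse scoring rule are my own stand-ins, not canonical formalisations of either view:

```python
import math

# Toy illustration only: the numbers and the particular "risk-averse" scoring
# rule are my own stand-ins, not canonical formalisations of either worldview.
p = 0.6    # assumed probability of the good outcome
V = 100.0  # magnitude of the good outcome, in arbitrary welfare units
W = 80.0   # magnitude of the bad outcome, in the same units

# Total Welfarist Consequentialism: plain expected value.
ev_total = p * V + (1 - p) * (-W)

# Risk-Averse Welfarist Consequentialism (toy version): long-shot upside is
# discounted with a concave function, while the downside counts at full weight.
ev_risk_averse = p * math.sqrt(V) + (1 - p) * (-W)

print(f"total welfarist:   {ev_total:+.1f}")        # +28.0 -> looks worth doing
print(f"risk-averse (toy): {ev_risk_averse:+.1f}")  # -26.0 -> looks not worth doing
```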

Which one is true? I don’t know. And this does not even take into account other worldviews, such as Egalitarianism, Nietzscheanism, or Kantianism, and others that exist.

To make matters worse, we are not only morally uncertain, but I am also saying that we can’t reliably estimate the probability of how likely a certain +V or −W is to come into existence through the objectives of AI safety. This is linked to my discussion with Jim about determinate credences (since I didn’t initially understand this concept well, ChatGPT gave me a useful explanation).

I think complex cluelessness leads to the expected value formula breaking down (for AI safety and most other longtermist causes) because we simply don’t know what p is, and I, at least, don’t have a determinate credence higher than 50%.

Even if we were to place a bet on a certain worldview (like Total Welfarist Consequentialism), this wouldn’t solve the problem of complex cluelessness or our inability to determine p in the context of AI safety (in my opinion).

This suggests that this cause shouldn’t be given any weight in my portfolio and implies that even specializing in AI safety on a community level doesn’t make sense. 

Having said that, there are quite likely some causes where we can have a determinate credence way above 50% for p in the EV formula. And in these cases, a worldview diversification strategy seems to be the best option. This requires making a bet on a set or one worldview, though. Otherwise, we can’t interpret the EV formula, and doing good might not be possible.

Here is an (imperfect) example of how this might look and why a WDS could be the best strategy to pursue. The short answer to why WDS is preferred appears to be that, given moral uncertainty, we don't need to choose one out of 10 different worldviews and hope that it is correct; instead, we can diversify across different ones. Hence, this seems to be a much more robust approach to doing good.

What do you make of this?

Greg_Colbourn @ 2025-01-14T19:18 (+2)

Going to push back a bit on the idea that preventing AI x-risk might be net negative as a reason for not trying to do this. Applying the reversal test: given the magnitude of expected values either way, positive or negative, from x-risk reduction, it would be an extreme coincidence if doing nothing was the best action. Why not work to increase x-risk? I appreciate there are other second order effects that may be overwhelming - the bad optics of being viewed as a "supervillain" for one! But this perhaps is a useful intuition pump for why in fact decreasing x-risk is likely to actually be +EV.

Gideon Futerman @ 2025-01-12T00:13 (+2)

Just flagging, it seems pretty strange to have something about career choice in 'Community'

Johan de Kock @ 2025-01-12T00:38 (+1)

Thank you for flagging. I actually tried removing the tag before you mentioned this, since I agree. I tried removing it multiple times already (when I wanted to replace it with a better tag), but it isn't working ...

Just before I posted it, I selected the tag because it stated: "The community topic covers posts about the effective altruism community, as well as applying EA in one's personal life". But the subsequent points didn't match too well, and I didn't think that it would be shown in a separate section.

@Toby Tremlett🔹 or @Sarah Cheng , could you please remove the tag?

Michaels @ 2025-01-10T16:01 (+2)

Effective altruists should diversify their focus beyond AI safety due to uncertainties in long-term outcomes and ethical frameworks. Supporting humanity's survival could unintentionally perpetuate suffering, while differing moral views challenge assumptions about existential risks. A balanced approach, distributing resources across causes, allows for more robust strategies amidst these uncertainties.