X-risk Agnosticism

By Richard Y Chappell🔸 @ 2023-06-08T15:02 (+34)

This is a linkpost to https://rychappell.substack.com/p/x-risk-agnosticism

Tl;dr: Epistemic modesty recommends distributing credence across a diverse range of models or perspectives, some of which rate x-risk very highly. Applying standard methods of prudent decision-making under uncertainty, even x-risk agnostics should take x-risk seriously in practice.

 

While I generally assume that the odds of a global catastrophe (“x-risk”) eventuating this century are pretty low, I’m highly uncertain. I expect many others are in a similar epistemic position. But I also think that this broadly “agnostic”, non-committal position yields some pretty clear and striking practical implications: namely, that we should take x-risk seriously, and want to see a non-trivial (though also, of course, non-maximal) amount of resources dedicated to its further investigation and possible mitigation.

Small yet non-trivial risks of disaster are worth mitigating

If I lived near a nuclear power plant, I would hope that it had safety engineers and protocols in place to reduce the risk of a nuclear accident. For any given nuclear reactor, I trust that it is very unlikely (p<0.001) to suffer a catastrophic failure during my lifetime. But a nuclear disaster would be sufficiently bad that safety investments to reduce these (already low) risks are still worthwhile.

Even if I lived near a nuclear power plant, its risk of catastrophic failure would not be my biggest fear. I’d be much more worried about dying in a car accident, for example. That, too, is a small risk that’s well worth mitigating (e.g., via seatbelts and modern car safety mechanisms). And, perhaps roughly on a par with the risk of dying in a car accident, we might all die in some horrific global catastrophe—such as nuclear war, a bioengineered pandemic, or misaligned AI shenanigans.

I don’t think that any of the latter possibilities are outright likely. But unless you can rule out the risk of disaster to an extreme confidence level (p<0.000001, perhaps), it’s clearly worth investing some societal resources into investigating and reducing global catastrophic risks. The stakes are much higher than those of an isolated nuclear reactor, after all; and if you’re broadly agnostic about the precise probabilities, then it also seems like the odds are plausibly much higher. (That might sound self-contradictory: didn’t we just assume agnosticism about the odds? But read on…)
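Before moving on, a toy expected-value calculation may help make the threshold point concrete. The dollar figures below are placeholders I’ve made up purely for illustration (nothing here turns on them); the point is just that when the stakes are global, even a one-in-a-million probability leaves a non-trivial expected loss:

```python
# Toy expected-value sketch: why even very low catastrophe probabilities can
# justify modest mitigation spending. All numbers are illustrative assumptions.

def expected_loss(prob_catastrophe: float, harm: float) -> float:
    """Expected harm = probability of the catastrophe times its cost."""
    return prob_catastrophe * harm

# Hypothetical valuation of a global catastrophe: $600 trillion
# (a round number several times annual world GDP; purely for illustration).
HARM = 6e14

for p in (1e-6, 1e-4, 1e-2):
    print(f"p = {p:g}: expected loss = ${expected_loss(p, HARM):,.0f}")

# p = 1e-06: expected loss = $600,000,000
# p = 0.0001: expected loss = $60,000,000,000
# p = 0.01: expected loss = $6,000,000,000,000
```

Even at the one-in-a-million threshold, the expected loss on these made-up numbers is in the hundreds of millions of dollars, already enough to justify a serious research and safety budget.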

Higher-order uncertainty

There are lots of different ways that one might try to go about estimating x-risk probabilities. For example, you might opt for a highly optimistic prior based on induction over human history: we’ve never wiped ourselves out before! Or you might take the Fermi paradox to imply that there’s a “great filter” likely to wipe out any intelligent species before it’s capable of spanning the galaxy. Or you might defer to subject-area experts for each candidate risk. Or you might put aside these “outside view” approaches and instead just try to assess the arguments for and against specific candidate risks directly on their merits, to reach your own “inside view” of the matter. And so on.

As a thoroughgoing x-risk agnostic, I’m highly uncertain about which of these approaches is best. So it seems most reasonable to distribute my credence between them, rather than confidently going “all-in” on any one narrow approach (or confidently ruling out any particular approach, for that matter). But there are some reasonable-seeming models on which x-risk is disturbingly high (many AI experts are highly concerned about the risks from misaligned AI, for example; it also seems, commonsensically, that the base rates aren’t great for past hominids who were superseded by more intelligent competitor species!). If I can’t confidently rule out those gloomier approaches as definitely mistaken, then it seems I must give them some non-trivial weight.

As a result, assigning anything in the general vicinity of 0.1% - 5% risk of a global existential catastrophe this century seems broadly reasonable and epistemically modest to me. (Going below 0.1%, or above 5%, feels more extreme to me—though I confess these are all basically made-up numbers, which I’m not in a position to defend in any robust way.) I personally lean towards ~1% risk, which is very “low” by ordinary standards for assessing probabilities (e.g., if we were talking about “chance of rain”), but distressingly high when specifically talking about the risk of literally everyone dying (or otherwise suffering the permanent loss of humanity’s potential).
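For the sake of illustration, here’s roughly how this kind of credence-splitting might cash out numerically. The model weights and per-model risk levels below are invented for the example (as I said, I can’t defend specific numbers robustly); the structural point is that even with most of one’s credence on optimistic models, the gloomier tail keeps the weighted estimate well above negligible levels:

```python
# Toy credence-weighted average over candidate x-risk "models".
# All weights and per-model risk levels are made up for illustration only.

models = {
    # approach: (credence in it, x-risk this century if it's right)
    "optimistic induction over history":    (0.60, 0.0001),
    "deference to risk-specific experts":   (0.25, 0.01),
    "pessimistic inside view (e.g. on AI)": (0.10, 0.05),
    "great-filter-style gloom":             (0.05, 0.20),
}

# Credences should sum to one.
assert abs(sum(w for w, _ in models.values()) - 1.0) < 1e-9

overall = sum(w * risk for w, risk in models.values())
print(f"credence-weighted x-risk: {overall:.1%}")  # ≈ 1.8%
```

Even with 60% of one’s credence on the most optimistic model, the weighted estimate lands near 2%; pushing it below 0.1% would require near-certainty that every gloomy model is wrong.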

This appeal to higher-order or “model” uncertainty also suggests that the basic case for taking x-risk seriously isn’t so fragile or sensitive to particular objections or counterarguments as you might expect. (I’ll discuss an example in the next section.) Given the broad range of possibilities and competing arguments over which I’m distributing my credence here, it’s hard to imagine a “universal acid” objection that would confidently apply across all of them. I could easily see myself being persuaded to reduce my estimate of p(doom) from 1% to 0.1%, for example; but I can’t easily imagine what could convince me that x-risk as a whole is nothing to worry about. Total complacency strikes me as an extraordinarily extreme position.

“Time of Perils” Agnosticism

One of the most interesting objections to specifically longtermist[1] concern about x-risk is the argument that humanity effectively has no astronomical potential: given a constant, non-trivial risk-per-century of extinction, we’re basically guaranteed to wipe ourselves out sooner rather than later anyway. Brian Weatherson has tweeted an argument along these lines, and David Thorstad has a whole series of blog posts (based on a paper) developing the argument at length.

I grant that that’s a possibility. But I also think it’s far from a certainty. So we also need to distribute our credence over possibilities in which per-century risks are variable—and, in particular, scenarios in which we are currently in a uniquely dangerous “time of perils”, after which (if we survive at all) x-risk can be reduced to sustainably low levels.
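A toy calculation shows why this distinction matters so much (the horizon and risk levels below are mine, chosen only to make the structure vivid; they aren’t drawn from Thorstad’s paper):

```python
# Toy comparison of long-run survival under two risk profiles.
# The horizon and per-century risk levels are illustrative assumptions.

def survival(per_century_risks):
    """Probability of surviving every century in the given sequence."""
    p = 1.0
    for r in per_century_risks:
        p *= 1 - r
    return p

CENTURIES = 10_000  # a stand-in for an "astronomical" future

# Constant 1% extinction risk per century: long-run survival is ~zero.
constant_risk = survival([0.01] * CENTURIES)

# "Time of perils": 1% risk for five centuries, then one-in-a-million per
# century thereafter (if we make it through).
time_of_perils = survival([0.01] * 5 + [0.000001] * (CENTURIES - 5))

print(f"constant risk:  {constant_risk:.1e}")   # ~2.2e-44
print(f"time of perils: {time_of_perils:.3f}")  # ~0.942
```

On the constant-risk profile, almost nothing we do now changes the long-run outcome; on the time-of-perils profile, surviving the next few dangerous centuries preserves most of our long-run potential, which is what gives current mitigation efforts their outsized expected value.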

Why give any credence to the “time of perils” hypothesis? It seems to me that there are at least two reasonably plausible ways that the far future (several centuries hence) could be much safer than the near future (assuming that near-future risks are dangerously high):

(1) Humanity might be so widely dispersed across the stars that different (self-perpetuating) pockets are effectively inaccessible to each other. A disaster in one stellar region simply won’t have time to reach the others before they’ve already succeeded in spreading humanity further.

(2) Safely-aligned Transformative AI might asymmetrically improve our safety capabilities relative to our destructive ones. For example, a global “benevolent dictator” AI might use universal surveillance to identify and deal with potential threats far more effectively than today’s human-powered law enforcement. (If this is to be a non-dystopian future, we would of course want a broad array of liberal protections to ensure that this power is not abused.)

Both possibilities are speculative, of course, and I don’t necessarily claim that either is the most likely outcome. (If I had to guess, I’d probably say the most likely outcome is that per-century objective x-risks are so low that none ultimately eventuate even without these specific protections; but again, that’s just one model, and I wouldn’t want to place all my eggs in that basket!) But neither strikes me as outrageously unlikely[2]—the one thing we know about the future is that it will be weird—so I think they each warrant non-trivial credence, and either could support the view that current x-risk mitigation efforts have very high expected value.

Conclusion

My claims here are pretty non-committal. I’m no expert on x-risk, and haven’t looked super closely into the issues. (I haven’t even read The Precipice yet!) So for all I’ve said, it may well be totally reasonable for those better-informed about the first-order issues to have a more extreme credence (in either direction) than what I take “moderate agnosticism” to call for.

Still, for those who share my more limited epistemic position, moderate agnosticism seems pretty reasonable! And I think it’s interesting that this agnosticism, when combined with a prudent approach to decision-making under uncertainty, seems to strongly support taking x-risk seriously over dismissive complacency.

That’s not to defend fanatical views on which 99.9% of global resources should be put towards that goal—on the contrary, I think that commonsense rejection of fanaticism reflects our intuitive sense that it, too, doesn’t sufficiently respect normative and model uncertainty. (We’re rightly wary of putting all our eggs in one basket.) But 0% (or even 0.001%) is surely imprudently low. X-risk agnostics should support further research to improve our collective ability to both identify and (where possible) mitigate potential existential risks.

  1. ^

    But cf. Elliott Thornley & Carl Shulman’s argument that x-risk mitigation passes cost-benefit analysis even when merely considering the interests of already-existing individuals.

  2. ^

    David Thorstad seems to assume that interstellar colonization could not possibly happen within the next two millennia. This strikes me as a massive failure to properly account for model uncertainty. I can’t imagine being so confident about our technological limitations even a few centuries from now, let alone millennia. He also holds the suggestion that superintelligent AI might radically improve safety to be “gag-inducingly counterintuitive”, which again just seems a failure of imagination. You don’t have to find it the most likely possibility in order to appreciate the possibility as worth including in your range of models.


Geoffrey Miller @ 2023-06-08T17:44 (+6)

Richard - this all sounds quite reasonable and prudent, and clearly argued.

I guess a key psychological issue here is that we have a few decades of research showing that people tend to either exaggerate or entirely discount quite low probability events; we're quite bad at thinking rationally about probabilities in the range of 0.1% - 5% (your best guess for likelihood of AI extinction). So, if we want people to take AI X risks seriously, there may be public relations incentives to push our guesses slightly higher. Depending on one's model of public outreach, that could be seen as deceptively manipulative, or as a helpful and honorable 'nudge' to overcome a common cognitive bias.