Worlds where we solve AI alignment on purpose don't look like the world we live in

By MichaelDickens @ 2026-03-20T14:46 (+71)

(Or: Why I don't see how the probability of extinction could be less than 25% on the current trajectory)

Cross-posted from my website.

AI developers are trying to build superintelligent AI. If they succeed, there's a high risk that the AI will kill everyone. The AI companies know this, but they believe they can figure out how to align the AI so that it doesn't kill us.

Maybe we solve the alignment problem before superintelligent AI kills everyone. But if we do, it will happen because we got lucky, not because we as a civilization treated the problem with the gravity it deserves—unless we start taking the alignment problem dramatically more seriously than we currently do.

Think about what it looks like when a hard problem gets solved. Think about the Apollo program: engineers working out minute details; running simulation after simulation; planning for remote possibilities.

Think about what it looks like when a hard problem doesn't get solved. Consider the world's response to COVID.

When I look at civilization's response to the AI alignment problem, I do not see something resembling Apollo. When I visualize what it looks like for civilization to buckle down and make a serious effort to solve alignment, that visualization does not resemble the world we live in.

This is the world we live in:

Is this what taking the alignment problem seriously looks like? Is this what taking extinction risk seriously looks like?

Some people are concerned about AI x-risk, and they have P(doom)s in the 5–25% range. I don't get that. I can't pass an Ideological Turing Test for someone who sees all these problems, but still expects us to avert extinction with >75% probability. I don't understand what would lead one to believe that this is what things look like when we're on track to solving a problem.

There is an argument people sometimes make, to the effect of "the good guys have to do a sloppy job on AI safety, because otherwise the bad guys will beat us to ASI and they'll be even worse." I understand that viewpoint.[3] But that doesn't mean P(doom) is low. You could believe something like: if Anthropic builds ASI, there's a 50% chance we die; if xAI or Meta builds ASI, there's a 75% chance we die; therefore, Anthropic has to build ASI first. I don't know anyone who believes this; everyone who wants to race to build ASI seems to have a P(doom) on the order of 25% or less.

I can imagine a world where AI companies are still racing to build ASI, but they're taking the challenges appropriately seriously and investing accordingly in safety. In that world, I can see P(doom) being less than 25%. But that's not the world we live in.


  1. Cryptanalysis is a good example of what it looks like when flaws are hard to catch. Standard practice when releasing a new cryptographic algorithm is to circulate it among experts and spend 2–5 years trying to break the algorithm before anyone uses it in the real world. ↩︎

  2. I won't say "people don't commit this fallacy", because there is no alternative world where people don't make reasoning errors. But in the good world, we have checks in place to make sure that ultimate decisions aren't made on the basis of those errors. ↩︎

  3. That still doesn't explain the fact that AI companies exhibit an embarrassing level of sloppiness. There are various alignment challenges that AI companies could address without significantly slowing down, but they still act oblivious to them. ↩︎


NickLaing @ 2026-03-21T06:18 (+8)

This is a brilliant summary of the situation. I actually find a straightforward list of bullets like this more compelling and easier to understand than something like Yudkowsky's book.

MichaelDickens @ 2026-03-21T20:52 (+4)

Thanks for the kind words! A major downside of this post is that I made a whole bunch of strong claims and spent minimal time justifying them; I wasn't sure whether people would find that approach interesting or useful. I mostly just wrote this post because I wanted to express myself.

Denkenberger🔸 @ 2026-03-21T20:38 (+4)

Some people are concerned about AI x-risk, and they have P(doom)s in the 5–25% range. I don't get that. I can't pass an Ideological Turing Test for someone who sees all these problems, but still expects us to avert extinction with >75% probability. I don't understand what would lead one to believe that this is what things look like when we're on track to solving a problem.

P(doom) does not necessarily equal extinction. Paul Christiano (in 2023) put P(AI takeover) at 22%, and P(most humans die as a result of takeover) at 11% (with additional probability of most people dying in other ways). But he has much lower probabilities of outright extinction, due to pico-pseudokindness, acausal trade, etc.