I enjoyed most of IABIED

By Buck @ 2025-09-17T04:34 (+73)

I listened to "If Anyone Builds It, Everyone Dies" today.

I think the first two parts of the book are the best available explanation of the basic case for AI misalignment risk for a general audience. I thought the last part was pretty bad, and I'd probably recommend skipping it. The authors fail to address counterarguments that I think are crucial; as a result, I am not persuaded of the book’s thesis, and I think the book neglects crucial aspects of the situation and makes poor recommendations. Even so, I would happily recommend the book to a lay audience, and I hope that more people read it.

I can't give an overall assessment of how well this book will achieve its goals. The point of the book is to be well-received by people who don't know much about AI, and I’m not very good at predicting how laypeople will respond to it. Results so far seem mixed, and reviews from people who are familiar with AI risk are fairly negative. So I’ll just talk about whether I think the book's arguments are reasonable enough that I want them to persuade the target audience, rather than whether I think they’ll actually succeed.

Thanks to several people for helpful and quick comments and discussion, especially Oli Habryka and Malo Bourgon!

Synopsis

Here's a synopsis and some brief thoughts, part-by-part:

In Part 1, they explain what neural nets are and why you might expect powerful AIs to be misaligned. I thought it was very good: a reasonable explanation of basic ML and an IMO excellent exploration of what the evolution analogy suggests about AI goals (though there are a bunch of disanalogies that the authors don’t discuss, and I imagine I’d dislike their discussion of them if they did write it). I agreed with most of this section.
- I thought the exploration of the evolution analogy was great – very clearly stated and thoughtful. I don’t remember if I’ve previously read other versions of this argument that also made all the points here (though there are many important subtleties to the argument that it doesn’t discuss; for example, it almost totally ignores instrumental alignment).
- Overall, I thought this part did a great job of articulating arguments. The text does respond to a bunch of counterarguments, but they mostly felt like really naive and rudimentary counterarguments to me, and I felt like most of the counterarguments that I see in the wild (e.g. from people on Twitter, who are mostly much more informed about AI than the audience of this book) were left unaddressed. I have no idea whether the authors’ prioritization of counterarguments was right for that audience, and I do think it would be handy to have a version of this book somewhat more appropriate for AI twitter people.
Part 2, where they tell a story of AI takeover, is solid; I only have one footnoted quibble^[1].
- In general, they try to tell the story as if the AI company involved is very responsible, but IMO they fail to discuss some countermeasures the company should take (e.g. ones I would take if I were in charge of a ten-person team, assuming the rest of the company was reasonably cooperative with my team). This doesn't hurt the argument very much, because it's easy to instead read it as a story about a real, not-impressively-responsible AI company.
Part 3, where they try to talk about the state of countermeasures and how the risk should be responded to, varied between okay and awful, and overall felt pretty useless to me. If I were recommending the book to someone, I would plausibly recommend that they skip it.
- They give a general discussion of how engineering problems are often hard when you don't have good feedback loops or good understanding of the underlying science, and when the technology involves fast-moving components (e.g. nuclear reactors); this content was fine.
- They very briefly discuss automated AI alignment research as a proposal for mitigating AI risk, but their arguments against it do not respond to the most thoughtful versions of this plan. (In their defense, the most thoughtful versions basically haven't been published, though Ryan Greenblatt is going to publish a detailed version soon. And I think several people have pretty thoughtful versions of this plan that they haven't written up, at least publicly, but do discuss in person.)
- (They also criticize some other naive plans for mitigating AI risk, like “train it to be curious, so that it preserves humanity”; I think their objections to those are fine.)
- They propose that GPU clusters (perhaps as small as 8 GPUs!) be banned or restricted somehow and suggest some other calls to action; I don’t think the ideas here are very good. (I’m told that the online resources go into more detail on their proposals; my concern isn’t that the proposals aren’t detailed enough, but that they aren’t very good interventions to push for.)

I personally really liked the writing throughout (unlike e.g. Shakeel I didn't find the sentences torturous at all). I'm a huge fan of Eliezer's fiction and most of his non-fiction that doesn't talk about AI, so maybe this is unsurprising. I often find it annoying to read things Eliezer and Nate write about AI, but I genuinely enjoyed the experience of listening to the book. (Also, the narrator for the audiobook does a hilarious job of rendering the dialogues and parables.)

My big disagreement

In the text, the authors often state a caveated version of the title, something like "If anyone builds it (with techniques like those available today), everyone dies". But they also frequently state or imply the uncaveated title. I'm quite sympathetic to something like the caveated version of the title^[2]. But I have a huge problem with equivocating between the caveated and uncaveated versions.

There are two possible argument structures that I think you can use to go from the caveated thesis to the uncaveated one, and both rely on steps that are IMO dubious:

Argument structure one:

If anyone built ASI with current techniques in a world that looked like today's, everyone would die.
Tricky hypothesis 1: ASI will in fact be developed in a world that looks very similar to today's (e.g. because sub-ASI AIs will have negligible effect on the world; this could also be because ASI will be developed very soon).
Therefore, everyone will die.

This is the argument that I (perhaps foolishly and incorrectly) understood Eliezer and Nate to be making when I worked with them, and the argument I made when I discussed AI x-risk five years ago, right before I started changing my mind on takeoff speeds.

I think Eliezer and Nate aren’t trying to make this argument—they are agnostic on timelines and they don’t want to argue that sub-ASI AI will be very unimportant for the world. I think they are using what I’ll call “argument structure two”:

If anyone built ASI with current techniques in a world that looked like today's, everyone would die.
The big complication: However, ASI might be built in a world that looks very different from today's: it might be several decades in the future, pretty powerful AI might be available for a while before ASI is developed, and researchers might be way more experienced at getting AIs to do stuff than they currently are.
Tricky hypothesis 2: But the differences between the world of today and the world where ASI will be developed don't matter for the prognosis.
Therefore, everyone will die.

The authors are (unlike me) confident in tricky hypothesis 2. The book says almost nothing about either the big complication or tricky hypothesis 2, and I think that’s a big hole in their argument that a better book would have addressed.^[3]

I think that explicitly mentioning the big complication is pretty important for giving your audience an accurate picture of what you're expecting. Whenever I try to picture the development of ASI, it's really salient in my picture that that world already has much more powerful AI than today’s, and the AI researchers will be much more used to seeing their AIs take unintended actions that have noticeably bad consequences. Even aside from the question of whether it changes the bottom line, it’s a salient-enough part of the picture that it feels weird to neglect discussing it. (See also Lukas's articulation of this complaint.)

And of course, here is the core disagreement that leads me to disagree so much with Eliezer and Nate on both P(AI takeover) and on what we should do to reduce it: I don't agree with tricky hypothesis 2. I think that the trajectory between here and ASI gives a bunch of opportunities for mitigating risk, and most of our effort should be focused on exploiting those opportunities. If you want to read about this, you could check out the back-and-forth my coworkers and I had with some MIRI people here, or the back-and-forth Scott Alexander and Eliezer had here.

(This is less relevant given the authors’ goal for this book, but from my perspective, another downside of not discussing tricky hypothesis 2 is that, aside from being relevant to estimating P(AI takeover), understanding the details of these arguments is crucial if you want to make progress on mitigating these risks.)

If they wanted to argue a weaker claim, I'd be entirely on board. For example, I’d totally get behind:

It is pretty unclear whether we will survive or not. There are various reasons to think we might be able to prevent AI takeover. But none of those reasons are airtight, and many of them require that all AI companies with dangerous models implement safety measures competently, and it's very unclear that that will happen.
We should demand an extremely low probability of extinction from AI developers, because extinction would be really bad. And we are not on track to getting to justified confidence in the safety of powerful AI.

But instead, they propose a much stronger thesis that they IMO fail to justify.

This disagreement leads to my disagreement with their recommendations—relatively incremental interventions seem much more promising to me.

(There’s supplementary content online. I only read some of this content, but it seemed somewhat lower quality than the book itself. I'm not sure how much of that is because the supplementary content is actually worse, and how much of it is because the supplementary content gets more into the details of things—I think that the authors and MIRI staff are very good at making simple conceptual arguments clearly, and are weaker when arguments require attention to detail.)

(I will also parenthetically remark that superintelligence is less central in my picture than theirs. I think that there is substantial risk posed by AIs that are not wildly superintelligent, and it's plausible that humans purposefully or involuntarily cede control to AIs that are less powerful than the wildly superintelligent ones the authors describe in this book. This causes me to disagree in a bunch of places.)

I tentatively support this book

I would like it if more people read this book, I think. The main downsides are:

Some people will be persuaded by the parts of the book that I think are wrong, which will have slightly bad consequences but is not a huge problem, and seems overall better than them never engaging with a serious argument about AI x-risk.
Some people will be turned off by the book, especially the most unreasonable parts of it, and we will have missed the opportunity to have someone more reasonable (according to me) than Eliezer and Nate write a similar book and then do a tour etc. I'm less worried about this after reading the book, because the book was good enough that it's hard for me to imagine someone else writing a much better one.
- Relatedly, success of this book will lead Eliezer and Nate to be more prominent public intellectuals on the topic of AI. I don't know whether this is good or bad. It really depends on who they're displacing.
  - I don't love them as representatives of AI safety to the public for a few reasons. Despite the book being impressively cleaned up compared to Eliezer’s usual writing style, I expect them to be somewhat worse than many alternative spokespeople at being likable and persuasive to mass audiences in unscripted settings. I think their arguments are often unpersuasive to informed audiences (partially because of the flaws in the arguments that I complained about above, and partially because they don’t know much about empirical ML or empirical evidence about alignment and sometimes come across as blowhards to ML researchers). And I disagree with their recommended actions.
  - I think it would be suboptimal if important stakeholders tried to get advice from them (though again, it depends who they’re displacing), because I don't think that they have good recommendations for what people should do.

Despite my complaints, I’m happy to recommend the book, especially with the caveat that I think it's wrong about a bunch of stuff including the thesis. Even given all the flaws, I don't know of a resource for laypeople that’s half as good at explaining what AI is, describing superintelligence, and making the basic case for misalignment risk. After reading the book, it feels like a shocking oversight that no one wrote it earlier.

^[1]
In their story, the company figures out a way to scale the AI in parallel, and then the company suddenly massively increases the parallel scale and the AI starts plotting against them. This seems somewhat implausible—probably the parallel scale would be increased gradually, just for practical reasons. But if that scaling had happened more gradually, the situation probably still wouldn't have gone that well for humanity if the AI company was as incautious as I expect, so whatever. (My objection here is different from what Scott complained about and Eliezer responded to here—I’m not saying it’s hugely unrealistic for parallel scaling to pretty suddenly lead to capabilities improving as rapidly as depicted in the book, I’m saying that if such a parallel scaling technique was developed, it would probably be tested out with incrementally increasing amounts of parallelism, if nothing else just for practical engineering reasons.)
^[2]
My main problems with the caveated version of the title:
- I again think they’re inappropriately reasoning about what happens for arbitrarily intelligent models instead of reasoning about what happens with AIs that are just barely capable enough to count as ASI. Their arguments (that AIs will learn goals that are egregiously misaligned with human goals and then conspire against us) are much stronger for wildly galaxy-brained AIs than for AIs that are barely smart enough to count as superhuman.
- I don't think it's clear that misaligned superintelligent AI would kill everyone as part of taking over; see discussion here. Note that the expected fatalities from getting taken over by wildly superintelligent AI are probably lower than the fatalities from getting taken over by an AI that is barely able to take over, because in the latter case the AI might have to kill us in order to take our stuff despite not wanting to do so.
^[3]
I don't think Eliezer and Nate are capable of writing this better book, because I think their opinions on this topic are pretty poorly thought through.

Joe Rogero @ 2025-09-17T16:01 (+12)

"I felt like most of the counterarguments that I see in the wild (e.g. from people on Twitter, who are mostly much more informed about AI than the audience of this book) were left unaddressed. I have no idea whether the authors’ prioritization of counterarguments was right for that audience, and I do think it would be handy to have a version of this book somewhat more appropriate for AI twitter people."

PSA: The online resources do indeed contain quite a few counter-counterarguments that didn't fit into the book. (Buck probably knows this already, some readers might not.)

David Mathers🔸 @ 2025-09-17T08:31 (+12)

"This disagreement leads to my disagreement with their recommendations—relatively incremental interventions seem much more promising to me."

What's the reasoning here?