The myth of AI “warning shots” as cavalry

By Holly Elmore ⏸️ 🔸 @ 2025-05-28T16:53 (+31)

This is a linkpost to https://hollyelmore.substack.com/p/the-myth-of-ai-warning-shots-as-cavalry

Regulation cannot be written in blood alone.

There’s this fantasy of easy, free support for the AI Safety position coming from what’s commonly called a “warning shot”. The idea is that AI will cause smaller disasters before it causes a really big one, and that when people see this they will realize we’ve been right all along and easily do what we suggest. I can’t count how many times someone (ostensibly from my own side) has said something to me like “we just have to hope for warning shots”. It’s the AI Safety version of “regulation is written in blood”. But that’s not how it works.

Here’s what I think about the myth that warning shots will come to save the day:

1) Awful. I will never hope for a disaster. That’s what I’m trying to prevent. Hoping for disasters to make our job easier is callous, and dwelling on the silver lining of failing in our mission takes us off track.

2) A disaster does not automatically a warning shot make. People have to come prepared with a world model that tells them what the event signifies in order to experience it as a warning shot that kicks them into gear.

3) The way to make warning shots effective if (God forbid) they happen is to work hard at convincing others of the risk and what to do about it based on the evidence we already have— the very thing we should be doing in the absence of warning shots.

If these smaller scale disasters happen, they will only serve as warning shots if we put a lot of work into educating the public to understand what they mean before they happen.

The default outcome of a “warning shot” event is confusion, misattribution, or normalization of the tragedy.

Let’s imagine what one of these macabrely hoped-for “warning shot” scenarios feels like from the inside. Say one of the commonly proposed warning shot scenarios occurs: a misaligned AI causes several thousand deaths. Say the deaths are of ICU patients because the AI in charge of their machines decides that costs and suffering would be minimized if they were dead.

First, we would hear news stories from one hospital at a time. Different hospitals may attribute the deaths to different causes. Indeed, the AI may have used many different means, different leads will present themselves first at different crime scenes, and different lawyers may reach for different explanations to cover their clients’ butts. The first explanation that reaches the national news might be “power outage”. Another hospital may report, based on an IT worker’s best guess, a “bad firmware update”. A spate of news stories may follow about our nation’s ailing health tech infrastructure. The family of one of the deceased may begin to publicly question whether this was the result of sabotage, or murder, because of their relative’s important knowledge or position. By the time the true scale of the incident becomes apparent, many people will already have settled in their minds that it was an accident, or equipment failure, or that most of the victims were collateral damage from a hit. By the time an investigation uncovers the true cause, much of the public will have moved on from the story. The true cause may be suppressed by a powerful AI industry, or it may simply be disbelieved by a skeptical public. Even if the true story is widely accepted, there will not necessarily be a consensus on its implications. Some people may— for whatever contrarian or motivated reason— be willing to bite the bullet and question whether those patients in the ICU should have been kept alive. A national discussion on end-of-life care may ensue. The next time audiences hear of an event like this, possibly caused by AI, they will worry less, because they’ve heard it before.

Not so rousing, huh? Just kind of a confused mess. This is the default scenario for candidate “warning shot” events. In short, the deaths some misguided people are praying for are likely to be for nothing— the blood spilled without any of it getting into the fountain pen— because the connection between the harm and its cause in dangerous AI is not immediately clear.

What makes an effective warning shot

To work, a true warning shot must provide a jolt to the gut, an immediate and clear inner knowing triggered by the pivotal event. I felt this when I first saw ChatGPT hold a conversation.

The Elmore Model of Effective Warning Shots:

A warning shot works when the audience already holds a world model that the event visibly contradicts, so the gut-level surprise immediately updates their beliefs about the risk and points toward action.

ChatGPT was a good warning shot because almost everyone already believed that computers couldn’t talk like that. They were surprised, maybe even scared, at a gut level when they saw a computer doing something they thought it could not. Whether they had articulated it in their minds or not, they had a world model, and they could feel it updating the first time they realized AI could now talk like a human.

I had an ideal warning shot reaction to ChatGPT— it resulted in me quitting my job in a different field and forming PauseAI US months later. I was deeply familiar with thinking in linguistics and computer science about when, or even whether, a machine could achieve human-level natural language. I knew that the training method didn’t involve hard-coding heuristics or universal grammar, something many linguists would have argued would be necessary, which meant that a lot of cognition was probably easily discoverable through scaling alone, and that there were far, far fewer innovation impediments to dangerous superintelligence than I had thought there might be. When I saw a computer talk, I knew— immediately, in my gut— that greater-than-human-level AI was happening in my lifetime, and I could feel myself giving credence to even the most pessimistic AI timelines I had heard in light of this news.

I had spent the better part of my life erecting an elaborate set of dominoes in my mind— I knew exactly how I thought the capabilities of AI related to the safety and health of civilization, what some capabilities would mean about the likelihood of others, how much time was needed for what kind of regulation, who else was (or wasn’t) on this, etc.— all just waiting for the ChatGPT moment to knock the first one over. THIS is a warning shot, and most of the action happened beforehand in my head, not with the external event.

We’ve had several flubbed warning shots already that didn’t work because even the experts couldn’t agree on what was happening as it was happening, or on what it meant.

If even the experts who have discussed these contingencies for years can’t agree on what constitutes a warning shot, how can the public that’s supposed to be rallied by them be expected to?

To prepare the public to receive warning shots, we need to educate them enough to interpret such events.

It’s really hard to give specific predictions for warning shots because of the nature of AI danger. It’s not just one scenario or threat model. The problem is that frontier general AI has flexible intelligence and can plan unanticipated ways of getting what it wants, or otherwise produce unanticipated bad consequences. So I have few specific predictions for warning shots. What I thought was our last best shot (ba dum tss) as of last year, autonomous self-replication, has already been blown past.

This is why I believe we should be educating people with a worldview, not specific scenarios. While concrete scenarios can help people connect the dots as to how, for example, a loss-of-control scenario could work at all, they can easily be misinterpreted as predictions. If someone hears your scenario as your entire threat model of AI danger, they will not understand the true nature of the danger and will not be equipped to recognize warning shots. It would be ideal if all we had to do was tell people to be ready for exactly what will happen, but, since we don’t know that, we need to communicate a worldview that gives them the ability to recognize the diverse scenarios that could result from unsafe AI.

I was ready for my ChatGPT moment because of my worldview. My mind was laid out like dominoes, so that when ChatGPT knocked over the right domino, the entire chain fell. To be in this state of mind requires setting up a lot of dominoes. If you don’t know what you expect to happen and what different results would mean, you may feel surprised or appalled by a warning shot event, but you won’t know what to do with it. It will just be one fallen domino clattering on the ground. When that feeling of surprise or shock is not accompanied swiftly by an update to your world model or an inner call to action, it will probably just pass. And that’s unfortunate because, remember, the default result of a smaller-scale AI disaster is that it’s not clear what happened and people don’t know what it means. If I had learned my laundry list of worldview things only after experiencing ChatGPT, the flash of insight and gut knowing wouldn’t have been there, because ChatGPT wouldn’t have been the crucial last piece of information I needed to know that AI risk is in fact alarmingly high. (The fantasy of an effort-free warning shot is, in this way, an example of the Last Piece Fallacy.)

The public do not have to be experts in AI or AI Safety to receive warning shots properly. A major reason I went with PauseAI as my intervention is that advocating for the position communicates a light, robust, and valid worldview on AI risk relatively quickly to the public. PauseAI’s most basic stance is that the default should be to pause frontier AI capabilities development to keep us from wading any further into a minefield, to give time for technical safety work to be done, and for proper governance to be established. In the face of a warning shot event, this worldview would point to pausing whatever activity might be dangerous to investigate it and get consent (via governance) from those affected. It doesn’t require gears-level understanding of AI, and it works under uncertainty about exactly how dangerous frontier AI is or the type of AI risk or harm (misalignment vs misuse vs societal harms like deepfakes— all indicate that capabilities have outpaced safety and oversight). PauseAI offers many levels of action, of increasing effort level, to take in response to AI danger, but even if people don’t remember those or seek them out, simply holding the PauseAI position helps to push the Overton window and expand our base.

Regulation is never written only in blood

It’s hard in the trenches fighting for AI Safety. People comfort each other by dreaming of the day the cavalry rides to the rescue, led by a warning shot. The cavalry may be coming, or it may not. We can’t count on it. And if that cavalry is on the way, it needs us hard at work doing exactly what we’d be doing with or without it, preparing the way.

The appeal of warning shots is that 1) they bypass a lot of biases and denial— people get a lot of clarity on what they think in those moments. 2) They create common knowledge. If you’re having that reaction, then you know other people probably are too. You don’t have to be the one to bring up the concern— it’s just out there, in the news. 3) They increase salience. People basically agree that the AI industry is untrustworthy and that a Pause would be good, but it’s not their top issue. If there were a disaster, the thinking goes, then it would be easy to get people to act. Convincing them through a lot of hard conversations and advocacy beforehand is just unnecessary work, the cavalry-hoper says, because we will gain so many people for free after the warning shot. But in fact, what we get from warning shots will be directly proportional to the work we do to prepare people to receive them.

Our base are people who have their own understanding of the issue that motivates them to act or hold their position in the Overton window. They are the people who we bring to our side through hard, grind-y conversations, by clipboarding and petitioning, by engaging volunteers, by forging institutional relationships, by enduring the trolls who reply to your op-ed, by being willing to look weird to your family and friends… The allure of warning shots is that they are “high leverage”, so we don’t have to do the difficult work of winning people over because, after the warning shot, they won’t be able to help seeing that we were right, since our position will be popular. But people will only sense that this is the popular position if a lot of people around them actually feel the warning shot in their gut, and for that they must be prepared.

“Regulation is written in blood,” they say, glibly, going back to whatever they were doing before. If it’s true that AI regulation will be written in blood, then it’s on us to work like hell to make sure that blood ends up on the page and not uselessly spilled on the ground. Whether or not there’s blood involved, regulation is never written in blood alone— and our job is always to fight for regulation written in ink. We honor those who may fall the same way we fight to protect them: by doing the work of raising awareness, changing minds, and rallying support to protect the world from dangerous AI.


SummaryBot @ 2025-05-28T21:36 (+3)

Executive summary: This personal reflection argues that AI "warning shots"—minor disasters that supposedly wake the public to AI risk—are unlikely to be effective without substantial prior public education and worldview-building, and warns against the dangerous fantasy that such events will effortlessly catalyze regulation or support for AI safety efforts.

Key points:

  1. Hoping for warning shots is morally troubling and strategically flawed—wishing for disasters is misaligned with AI safety goals, and assumes falsely that such events will reliably provoke productive action.
  2. Warning shots only work if the public already holds a conceptual framework to interpret them as meaningful AI risk signals; without this, confusion and misattribution are the default outcomes.
  3. Historical “missed” warning shots (e.g., ChatGPT, deceptive alignment research, Turing Test surpassing) show that even experts struggle to agree on their significance, undermining their value as rallying events.
  4. The most effective response is proactive worldview-building, not scenario prediction; preparing people to recognize and respond to diverse risks requires ongoing public education and advocacy.
  5. PauseAI is presented as an accessible framework that communicates a basic, actionable AI risk worldview without requiring deep technical knowledge, helping people meaningfully respond even amid uncertainty.
  6. The fantasy of cavalry via warning shots discourages the necessary grind of advocacy, but regulation (even if catalyzed by tragedy) ultimately relies on groundwork laid in advance—not just on crisis moments.

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.

Trevor Buteau @ 2025-05-31T01:26 (+1)

Nice write up, Holly. I agree with a lot of what you wrote. 

I'd add another layer here - in order for a "warning shot" to be effective, it can't just be something that connects a chain of dominoes and "makes sense". If it makes "too much sense" and does nothing to surprise you, little may be "noticed" or "learned". I believe this is sometimes referred to as "Boiling the frog". 

To be effective, a warning shot needs to be "shocking", like a bullet whizzing past your ear - that provokes a response! 

But be careful, because if it's TOO shocking, that can circle back around again and trigger denial, as the brain refuses to accept that something SO shocking is happening ("this can't be real!"). 

So an appropriately-calibrated warning shot is really hard to "get right", and even if one is possible, we shouldn't be counting on it. We probably won't get as lucky as Ozymandias in Watchmen. 

ram @ 2025-05-30T00:31 (+1)

This is a very well thought out post and thank you for posting it. We cannot depend on warning shots and a mindset where we believe people will "wake up" due to warning shots is unproductive and misguided.

I believe to spread awareness, we need more work that does the following:

  1. Continue to package existing demos of warning shots into a story that people can understand, re-shaping their world view on the risks of AI
  2. Communicate these packages to stakeholders
  3. Find more convincing demos of warning shots


Our base are people who have their own understanding of the issue that motivates them to act or hold their position in the Overton window. They are the people who we bring to our side through hard, grind-y conversations, by clipboarding and petitioning, by engaging volunteers, by forging institutional relationships, by enduring the trolls who reply to your op-ed, by being willing to look weird to your family and friends…

This cannot be overstated. Let's put in the work to make this happen, ensuring that our understanding of AI risk is built from first principles so we are resilient to negative feedback!

dsj @ 2025-05-29T03:31 (+1)

Warning shots/accidents are normally discussed in the frame of generating political will, by convincing a previously unpersuaded public or policymakers that AI is unsafe and action must be taken.

I think this is a mistake.

Accidents (which might be relatively small-scale), in AI as in other fields, are useful mainly for generating real-world, non-hypothetical failure cases in all their intricate detail, thereby yielding a model organism which can be studied by engineers (and hopefully reproduced in a controlled manner) to better understand both the circumstances in which such scenarios might arise, and countermeasures to prevent them.

This is analogous to how aircraft accidents are investigated in depth by the NTSB so as to learn how to prevent similar accidents. There’s already political will to make aircraft safe, but there’s only so much that can be done from the ivory tower without real-world experience.

The choices are:

  1. Stop frontier AI development permanently.
  2. Pause until we make it safe.
  3. Muddle through, discovering and fixing problems as they arise.

The printing press and electricity were existentially dangerous technologies, because they enabled everything that came after, including AI. When those technologies were developed, however, the world wasn’t globalized enough, nor were nations powerful enough, for a permanent stop button to have been pressed. By contrast, perhaps a permanent “stop AI” button could be pressed today; however, I don’t see any way of doing so short of entrenching a permanent totalitarian state.

So that leaves pausing until we make it safe, or muddling through.

But I think the aircraft accident analogy works quite well for AI: there’s only so much that safety research can do from the ivory tower without experience of AIs being used in the real world. So I think the “pause until we make it safe” option is illusory.

That leaves muddling through, as we’ve done with every technology before: We discover problems, hopefully at a small scale, and fix or mitigate them as they arise.

There are no guarantees, but I think it’s our best bet.