Why misaligned AGI won’t lead to mass killings (and what actually matters instead)
By Julian Nalenz @ 2025-02-06T13:22 (–3)
This is a linkpost to https://blog.hermesloom.org/p/why-misaligned-agi-wont-lead-to-mass
I still don't understand the concern that misaligned AGI will lead to mass killings.
Even if an AGI wanted, for whatever reason, to kill people: as soon as that happened, the physical force of governments would come into play. The US military, for example, will NEVER accept any force becoming stronger than itself.
So essentially there are three ways such a misaligned, autonomous AI with the intention to kill could act, i.e. three possible strategies:
- "making humans kill each other": Through something like a cult (i.e. like contemporary extremist religions which invent their stupid justifications for killing humans; we have enough blueprints for that), then all humans following these "commands to kill" given by the AI will just be part of an organization deemed as terrorists by the world’s government, and the government will use all its powers to exterminate all these followers.
- "making humans kill themselves": Here the AI would add intense large-scale psychological torture to every aspect of life, to bring the majority of humanity into a state of mind (either very depressed or very euphoric) to trick the majority of the population into believing that they actually want to commit suicide. So like a suicide cult. Protecting against this means building psychological resilience, but that’s more of an education thing (i.e. highly political), related to personal development and not technical at all.
- "killing humans through machines": One example would be that the AI would build its own underground concentration camps or other mass killing facilities. Or that it would build robots that would do the mass killing. But even if it would be able to build an underground robot army or underground killing chambers, first the logistics would raise suspicions (i.e. even if the AI-based concentration camp can be built at all, the population would still need to be deported to these facilities, and at least as far as I know, most people don’t appreciate their loved ones being industrially murdered in gas chambers). The AI simply won't physically be able to assemble the resources to gain more physical power than the US military or, as a matter of fact, most other militaries in the world.
I don't see any other ways. Humans have been pretty damn creative about how to commit genocides, and if any computer started giving commands to kill, the AI would never have more tanks, guns, poisons, or capability to hack and destroy infrastructure than Russia, China, or the US itself.
The only genuine concern I see is that AI should never make political decisions autonomously, i.e. a hypothetical AGI "shouldn't" aim to take complete control of an existing country's military. But even if it did, that would just be another totalitarian government, which is unfortunate, but also not unheard of in world history. From the practical side, i.e. in terms of the lived human experience, it doesn't really matter whether it's a misaligned AGI or Kim Jong-Un torturing the population.
In the end, psychologically it's a mindset thing: either we take the approach of "let's build AI that doesn't kill us", or, from the start, we take the approach of "let's build AI that actually benefits us" (like all the "AI for Humanity" initiatives). It's not as if we first need to solve the killing problem and only once we've fixed that once and for all can we make AI good for humanity as an afterthought. That would be the same fallacy the entire domain of psychology has fallen into, where for many decades it was pathology-focused (i.e. just intent on fixing issues) instead of empowering (i.e. building a mindset so the issues don't arise in the first place), and only positive psychology is finally changing something. So it very much is about optimism instead of pessimism.
I do think that it's not completely pointless to talk about these "alignment" questions. But not to change anything about the AI itself; rather, for the software engineers behind it to finally adopt some sort of morality themselves (i.e. to think about who they want to work for). Before there's any AGI that wants to kill at scale, your evil government of choice will do that by itself, and such governments will make all of us software engineers pretty good job offers. The banality of evil is a thing.
Every misaligned AI will initially need to be built/programmed by a human, just to kick off the mass killing. And that evil human won't give a single damn about all the thoughts, ideas, strategies and rules that the AI alignment folks are establishing. So if AI alignment work will obviously have no effect on anything whatsoever, why bother with it instead of working on ways AI can add value for humanity?
Or am I completely wrong, and am I fundamentally missing a way in which AGI will bring true innovation, and indeed total monopolization, to the global industry of genocide? Tell me, from a very pragmatic, practical point of view: how would it be able to outsmart humans when it comes to genocide?
For more on my opinions, especially why I believe that “safe AI” is a fallacy anyway, see the final sections of my post Debunking the myth of safe AI.
Ian Turner @ 2025-02-06T23:02 (+3)
I would suggest thinking about it this way: Do I need to know what Garry Kasparov's winning move would be in order to know that he would beat me at chess? The answer is "no"; he would definitely win, even if I can't predict exactly how.
As I wrote a couple of years ago, are you able to use your imagination to think of ways that a well-resourced and motivated group of humans could cause human extinction? If so, is there a reason to think that an AI wouldn't be able to execute the same plan?
harfe @ 2025-02-06T13:43 (+3)
the AI won't ever have more [...] capabilities to hack and destroy infrastructure than Russia, China or the US itself.
Having better hacking capability than China seems like a low bar for super-human AGI. The AGI would need to be better at writing and understanding code than a small group of talented humans, and have access to some servers. This sounds easy if you accept the premise of smarter-than-human AGI.
JordanStone @ 2025-02-06T13:43 (+1)
Not an AI safety expert, but I think you might be underestimating the imagination a superintelligent AI may have. Or maybe you're underestimating what a post-AGI intelligence may be capable of?
With the "killing humans through machines" option, a superintelligent AI would probably be smart enough to kill us all without taking the time to build a robot army, which would definitely raise my suspicions! Maybe it would hack nuclear weapons and blow us all up, invent and release an airborne super-toxin, or make a self-replicating nanobot - wouldn't see it coming, over as soon as we realised it wasn't aligned.
And for "making humans kill themselves", it might destabilise Governments with misinformation or breakdown communication channels, leading to global conflicts where we kill each other. To stay on the nuclear weapons route, maybe it tricks the USA's nuclear detection system into thinking that nuclear weapons have been fired, and the USA retaliates causing a nuclear war.
I think the fact that I can use my limited human intelligence to imagine multiple scenarios where AI could kill us all makes me very confident that a superintelligent AI would find additional and more effective methods. The question from my perspective isn't "can AI kill us all?" but "how likely is it that AI can kill us all?", and my answer is I don't know but definitely not 0%. So here's a huge problem that needs to be solved.
Full disclosure, I have no idea how AI alignment will overcome that problem, but I'm very glad they're working on it.
titotal @ 2025-02-06T16:23 (+2)
With the "killing humans through machines" option, a superintelligent AI would probably be smart enough to kill us all without taking the time to build a robot army, which would definitely raise my suspicions! Maybe it would hack nuclear weapons and blow us all up, invent and release an airborne super-toxin, or make a self-replicating nanobot - wouldn't see it coming, over as soon as we realised it wasn't aligned.
Drexlerian-style nanotech is not a threat for the foreseeable future. It is not on the horizon in any meaningful sense, and may in fact be impossible. Intelligence, even superintelligence, is not magic, and cannot just reinvent a better design than DNA from scratch, with no testing or development. If Drexlerian nanotech ever becomes a threat, it will be very obvious.
Also, "hacking nuclear weapons"? Do you understand the actual procedure involved in firing a nuclear weapon?
JordanStone @ 2025-02-06T17:14 (+1)
I accept that I don't know the actual procedure for firing a nuclear weapon. And no one in the West knows what North Korea's nuclear weapons cybersecurity is like, and ChatGPT tells me it's connected to digital networks. So there's definitely some uncertainty, and I wouldn't outright dismiss the possibility that nuclear weapons would be more likely to be hacked if a superintelligence existed. Based on what I know, I'd guess maybe a 10-20% chance that it's possible to hack nuclear weapons.
And I agree that it may be impossible to create Drexlerian-style nanotech. Maybe a 0.5% chance an ASI could do something like that?
But I don't think the debate here is about any particular scenario that I came up with.
I think if I tried really hard I could come up with about 20 scenarios where an artificial superintelligence might be able to destroy humanity (if you really want me to, I can try to list them). And I guess my proposed scenarios would have an average chance of actually working of 1-2%, so maybe around a 10% chance that at least one of them would work.
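As a rough sanity check, and assuming (unrealistically) that the 20 scenarios were independent, each with success probability p, the chance of at least one working out would be

P(at least one works) = 1 − (1 − p)^20 ≈ 18% for p = 1%, and ≈ 33% for p = 2%,

so a 10% overall guess already builds in a discount for the scenarios overlapping rather than being independent.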
But are you saying that the chance of ASI being able to kill us is 0%? In which case every conceivable scenario (including any plan that an ASI could come up with) would have to have a 0% chance of working. I just don't find that plausible; human civilisation isn't that robust. It must be at least a 10% chance that one of those plans could work, right? In which case significant efforts in AI safety to mitigate this risk are warranted.