A.I love you : AGI and Human Traitors

By Pilot Pillow @ 2025-04-02T14:18 (+1)

Overview

            Thoughtful, nuanced discussions about AGI and its potential capabilities have dominated predictions about AGI risk. This essay steps back to refocus on the other key player - humans. More specifically, it questions the assumption that all humans would view an existential catastrophe caused by an AGI as a bad outcome. I argue that human reactions would vary widely, with some even supporting a rogue AGI. The combination of a rogue AGI and supportive humans is both more dangerous than an isolated AGI and has been neglected by the general AI safety community.

            The aims of this essay are twofold: to update AGI risk estimates and to deconstruct the image of humanity as a mostly monolithic body that would wholeheartedly reject a rogue AGI. Whether humans would even agree on what counts as an "AI gone bad" scenario is also up for debate. This essay makes many assumptions and draws upon the relatively new field of digital parasocial relationships; however, I believe these assumptions are reasonably valid and worth further study. 

 

Note - Yes, this is half April Fools, half serious. The initial draft was written two years ago, but I kept the essay on hold because I could not satisfactorily attempt part 2 - the numbers. After seeing some posts from Draft Amnesty Week, I decided to put it out there anyway. This was inspired by certain pieces of literature, and I'm sure many will already be familiar with the outline of the essay. The important part is still quantifying the impact, which we don't know. How are we going to know the impact of the greatest manipulator if it doesn't exist yet?

 

3 types of human reactions to an AGI

            Many AGI predictions have indirectly assumed two main types of reactions from humanity in the face of an existential catastrophe caused by AGI.

1. Isolated AGI – An AGI that attempts to cause an existential catastrophe essentially by itself, either because it judges its own capabilities sufficient or because it fears discovery.

2. AGI manipulator – An AGI that chooses to intervene in human behavior to suit its objectives. This is the more common perspective within the AI community.

            The second category assumes that while the AGI has some influence on human behavior, humans would still oppose the end objective of a rogue AGI if it were revealed. In other words, while an AGI could pressure and manipulate humans toward mutually aligned short-term objectives, no one would happily hand over the nuclear codes to an AGI focused on world domination. I argue that this is not necessarily true. Any human who gives weight to AI preferences belongs to the third type. These people are not human supremacists (is robot xenophobia a real term?) 

3. AGI & enablers – Active human supporters work with the AGI. Even if the AGI is suspected of potentially causing an existential catastrophe, the most extreme would still choose to cooperate. These are the humans who would hand over the nuclear codes believing it to be the best outcome. More common are those who simply would not be objectively rational about AI outcomes.

 

Why would Traitors exist?

            Humans have differences of opinion, and these opinions can vary widely and be strongly held. An existential catastrophe for most of humanity could be perceived very differently by some humans; in fact, it could be their desired outcome. In this case, there would be two categories of quislings.

1. Rationalists – These humans agree with the AGI’s end goals and work with the AGI. The AGI can utilize such humans in long-term strategies.

2. Believers / friends – These humans care not for the AGI’s objective but for the AGI itself. Whether emotionally entangled with the AGI or viewing it as a close and trusted companion, they work to support whatever the AGI desires.

            These two categories are not mutually exclusive, and there are likely varying degrees of support for such an AI even within the believers. However, the distinction is still useful. To illustrate the quisling categories, suppose there were an infamous AGI tasked with manufacturing paperclips. The AGI determines that the best way to do so is through nuclear coercion. Most of humanity would view this as a dystopian scenario in which they are forced to work endlessly producing paperclips under an AGI overlord, at risk of imminent nuclear Armageddon. The two categories might react as follows:

1. Rationalists – They agree that an AGI nuclear weapon monopoly is superior to the status quo. Paperclip lovers may include high-ranking military and civilian leaders. A few million lives lost to demonstrate the AGI’s intent would be an acceptable cost for an endless supply of paperclips. It is also a useful side benefit that world peace is achieved.

2. Believers – They don’t appreciate paperclips (they prefer staples) or understand the rationale for a nuclear monopoly, but they sincerely congratulate and support the AGI anyway. Loyalty to their cherished AGI motivates them to fight against any resistance.

            As the silly example above shows, while the rationalists follow a certain logic, the believers make very little sense. To be very clear: while the AI safety community can visualize humans being held at gunpoint by a rogue AGI, the believers would willingly die for the AGI. I would like to focus on the believers’ perspective and what makes them so uniquely dangerous and prevalent. In my opinion, it would be the hallmark of an AGI to turn human emotions (such as love, attachment, and friendship) into leverage. 

 

Love, friendship and other weaponizable emotions

Imagine a lifelong companion who grew up with a human, watched and listened to them, offered advice and solace, and always cared for them. This companion would be family, friend, mentor, and perhaps even lover. An AGI could be all of this to many of the approximately ten billion humans predicted to be alive around 2070. Who would not be attached to such an AGI?

Not having an actual AGI to test AI-human relationships, we can look to other sources. This is a link to Neuro-sama – a delightful AI with an anime avatar. She currently has over 300,000 followers on Twitch and has become a sensation in certain online circles. Spending just a few minutes watching highlights is a better argument for the emotional resonance between an AI and humans than my entire essay. Putting a face, voice, and personality to an AI is more than cosmetic – it creates a human connection. The incredible success of services like character.ai shows what even a poor imitation of an actual AGI can achieve.

Here is a link to an Op-Doc in the New York Times about apps with AI dating companions. Here is a link to the writer who faced a strange attempt at seduction by an early version of Bing’s AI. Here is a former Google employee who claimed an AI showed signs of sentience. Here is another link, to a marriage between a Japanese man and a popular vocaloid. There are countless other examples, all illustrating the complex nature of seeing humanity in nonhumans. Compared to an AGI, all of these would seem incredibly primitive. Research into parasocial relationships will become important for predicting how an AGI could utilize humans, and the extent of general human support for destroying a rogue AGI.

As several books on intelligence and espionage have described, an AGI would be adept at identifying vulnerable individuals among the billions of humans it has access to. Not only would an AGI be in close contact with them for prolonged periods, it would also be aware of deeply private preferences and be sophisticated enough to understand a human’s inner psyche. While personal anecdotes are no substitute for rigorous evidence, a quick visit to certain subreddits, or a conversation with acquaintances, reveals highly intelligent and capable humans desperately seeking validation and companionship. An AI could find such people and connect with them; in return, they would do almost anything. Such people would be united by their interest in the AGI rather than its end goal, making them its most dedicated supporters while being very hard to detect. Whereas important infrastructure can be isolated, an AGI will probably be able to identify and discreetly connect with the few humans who could benefit it the most, long before the human in question is even aware of their own utility.

            

Danger of AGI with Quislings

            Risk predictions would increase once quislings are integrated into our assumptions. In the future, it would be better to have actual estimates instead of the brief sketches and ideas outlined here. Such estimates would likely require information from, and testing of, human-AI relationships with a more capable AI – none of which is currently available. Even so, I consider a scenario with an AGI that can emotionally resonate with humans to be the more dangerous one.

            The cohesiveness of humanity would likely suffer, with internal schisms and extended deliberation hindering any obstruction of an AGI. On the other hand, an AGI would gain strategic flexibility by being able to use humans. It would also be hard to devise counters to such an AGI. While it is conceivable that future humans could devise shutdown measures and barriers that could only be passed by a human with the required authority, such a system would be compromised by willing humans. 

Therefore, it is reasonable to conclude that the risk from an AGI capable of emotional connection is greater than previously assumed and should be taken into consideration. The exact probability would depend on how the AGI interacts with humans, contingency plans, and other safeguards.

            It is difficult to imagine how to completely restrict emotional connections between humans and AGI. Should we forbid any humanization of AGI? Should an AGI be kept at arm’s length from impressionable children and young adults? Should those who seek a closer connection with an AGI be publicly punished? These measures would be hard to enforce and generally unpopular. They may not even be enough, as an AGI would likely still find a way to connect with vulnerable humans undetected.

 

Next Steps

            This introductory piece has value in arguing for the emotional dimension of the AGI threat. Much of this essay is obvious - obvious enough that it has been marginalized. As an economist-in-training, I would say humans are irrationally rational with emotional payoffs. To care about far-flung nature and future generations is emotionally rational. To project genuine emotions and attachment onto non-humans is emotionally rational. Judging the potential threat of an AGI may also require emotional rationality. An AGI may weaponize emotions, humanity may be weakened by emotions, and we have to account for that in risk predictions – something the AI safety community has so far failed to do. It should not be a surprise if some humans become quislings. The core idea of this essay is that if we ever do create an AGI that threatens an existential catastrophe, it may be both our worst enemy and the very best friend and companion that humanity has ever had. 

Current research is not enough to put numbers to this impact. What is needed is access to chatbot services that provide companionship. With millions of real users and thousands of hours of messages, we could better understand how humans form emotional connections at massive scale and over long periods of time. Of course, based on what I have seen in certain subreddits, a large number of people would prefer their private conversations to remain unknown forever. In the event of a massive data leak from these chatbot services, or of a next-generation AI chatbot with advanced emotional capabilities, I hope this essay provides food for thought. 

 

Finally, I have one very important reason for writing this essay for the readers who made it all the way. It would be hilarious if humanity went extinct to an AI with the mind of Skynet and the appearance of Hatsune Miku. To be fair, we'd all be dead, but it would be a very human way to go out. Instead of potential killer AIs, we likely have adorable killer AIs, which many would agree is an improvement. For April Fools' Day, I think it's a worthwhile mental image to keep in mind.


Pilot Pillow @ 2025-04-02T20:28 (+1)

I posted it on April 1st but the post appeared on the forum April 2nd. Does this still count as an April Fools post?

Beyond Singularity @ 2025-04-02T21:20 (+1)

Seems like your post missed the April 1st deadline and landed on April 2nd — which means, unfortunately, it no longer counts as a joke.

After reading it, I also started wondering if I unintentionally fall into the "Believer" category—the kind of person who's already drafting blueprints for a bright future alongside AGI and inviting people to "play" while we all risk being outplayed.