A Friendly Face (Another Failure Story)
By Karl von Wendt @ 2023-06-20T10:31 (+22)
This is a crosspost, probably from LessWrong. Try viewing it there.
Geoffrey Miller @ 2023-06-20T21:15 (+6)
Karl -- very good scenario-building.
I know some EA/LessWrong folks are dubious about the merits of developing specific stories about possible risks and harms, and prefer to stick with the abstract, theoretical level.
However, the human mind runs on stories, communicates through stories, and imagines future risks using stories. Stories are central. The stories don't have to be 100% accurate to be useful in introducing people to new ideas and possibilities.
In addressing earlier potential X risks such as nuclear war and asteroid impacts, I think movies, TV series, and science fiction novels were probably much more influential and effective than think tank reports, academic papers, or nonfiction books. We shouldn't be afraid to tell specific, emotive, memorable stories about AI X-risk.
Critics will say 'It's just science fiction'. So what? Science fiction works well when it comes to persuading the public.
Karl von Wendt @ 2023-06-21T12:10 (+3)
Thank you! Yes, stories about how the movies "WarGames" and "The Day After" changed Ronald Reagan's mind about cyberattacks and the risk of a global nuclear war were part of the inspiration for this AI Safety Camp project. Stories can indeed be powerful.
kokotajlod @ 2023-06-20T12:25 (+4)
Another nice story! I consider this to be more realistic than the previous one about open-source LLMs. In fact, I think this sort of 'soft power takeover' via persuasion is a lot more probable than most people seem to think. That said, I do think that hacking and R&D acceleration are also going to be important factors, and my main critique of this story is that it doesn't discuss those elements and implies that they aren't important.
In addition to building more data centers, MegaAI starts constructing highly automated factories, which will produce the components needed in the data centers. These factories are either completely or largely designed by the AI or its subsystems with minimal human involvement. While a select few humans are still essential to the construction process, they are limited in their knowledge about the development and what purpose it serves.
It would be good to insert some paragraphs, I think, about how FriendlyFace isn't just a single model but rather a series of models being continually improved in various ways, and how FriendlyFace itself is doing an increasing fraction of the work involved in that improvement. By the time new automated factories are being built that humans don't really understand, presumably basically all of the actual research is being done by FriendlyFace, and presumably it's far smarter than it was at the beginning of the story.
Karl von Wendt @ 2023-06-21T12:14 (+3)
As replied on LW, we'll discuss this. Thanks!
Stephen McAleese @ 2023-06-20T19:03 (+3)
Excellent story! I believe there's strong demand for scenarios explaining how current AI systems could go on to have a catastrophic effect on the world, and the story you describe sounds very plausible.
I like how the story combines several key AI safety concepts such as instrumental convergence and deceptive alignment with a description of the internal dynamics of the company and its interaction with the outside world.
AI risk has been criticized as implausible given the current state of AI (e.g. chatbots), but your realistic story describes how AI in its present form could eventually cause a catastrophe if it isn't developed safely.
Karl von Wendt @ 2023-06-21T12:13 (+1)
Thank you! I hope this story will have a similar effect outside of the EA community.
Sam Robinson @ 2023-06-21T08:12 (+1)
Hi Karl, just wanted to say that impact aside, I really ENJOYED reading this. Thanks for writing!
Karl von Wendt @ 2023-06-21T12:12 (+1)
Thanks! I should point out that this isn't my work alone; we were a team of six developing the story. Unfortunately, the co-authors Artem, Daniel, Ishan, Peter, and Sofia are not visible in this cross-post from LessWrong.