Agentic Mess (A Failure Story)

By Karl von Wendt @ 2023-06-06T13:16 (+30)

This is a crosspost, probably from LessWrong. Try viewing it there.

rime @ 2023-06-06T20:50 (+8)

I only skimmed the post, but it was unusually useful for me even so. I hadn't grokked the risk from simple runaway replication of LLMs. Despite studying both evolution & AI, I'd just never thought along this dimension. I always assumed the AI had to be smart in order to be dangerous, but this is a concrete alternative.

  1. People will continue to iteratively experiment with and improve recursive LLMs, both via fine-tuning and architecture search.[1]
  2. People will try to automate the architecture search part as soon as their networks seem barely sufficient for the task.
  3. Many of the subtasks in these systems explicitly involve AIs "calling" a copy of themselves to do a subtask.
  4. OK, I updated: risk is less straightforward than I thought. While the AIs do call copies of themselves, rLLMs can't really undergo a runaway replication cascade unless they can call themselves as "daemons" in separate threads (so that the control loop doesn't have to wait for the output before continuing). And I currently don't see an obvious profit motive to do so. (See the sketch below the footnote.)
    [1] Genetic evolution is a central example of what I mean by "architecture search": DNA encodes only the architecture of the brain, with little control over what it learns specifically, so genes are selected for how much they contribute to the phenotype's ability to learn.

    While rLLMs will at first be selected for something like profitability, that may not remain the dominant selection criterion for very long. Even narrow agents are likely to have the ability to copy themselves, especially if it involves persuasion. And given that they delegate tasks to themselves & other AIs, it seems very plausible that failure modes include copying themselves when they shouldn't, even if they have no internal drive to do so. And once AIs enter the realm of self-replication, their proliferation rate is unlikely to remain dependent on humans at all.

    All this speculation is moot, however, if somebody just tells the AI to maximise copies of itself. That seems likely to happen soon after it's feasible to do so.
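
To make the distinction in point 4 concrete, here is a minimal sketch, not taken from the post: `call_llm`, `solve_blocking`, and `solve_daemon` are illustrative names, not a real API. A control loop that blocks on each sub-agent call stays bounded by its recursion depth; one that spawns its copies as daemon threads and moves on can multiply geometrically, and only the latter looks like a runaway replication cascade.

```python
import threading

def call_llm(prompt: str) -> str:
    """Placeholder for a request to some LLM backend (illustrative only)."""
    return f"result for: {prompt!r}"

def solve_blocking(task: str, depth: int = 0) -> str:
    """Pattern 1: the parent waits for each sub-call.
    The number of live copies is bounded by the recursion depth."""
    if depth >= 2:  # arbitrary cutoff for the sketch
        return call_llm(task)
    subtask = call_llm(f"decompose: {task}")
    return solve_blocking(subtask, depth + 1)  # control loop blocks here

def solve_daemon(task: str) -> None:
    """Pattern 2: each copy spawns children in daemon threads and moves on,
    so the number of running copies grows geometrically until resources run out."""
    subtask = call_llm(f"decompose: {task}")
    for _ in range(2):  # each copy launches two more
        threading.Thread(target=solve_daemon, args=(subtask,), daemon=True).start()
    # ...parent continues working instead of waiting on its children

if __name__ == "__main__":
    print(solve_blocking("write a report"))
    # solve_daemon("write a report")  # left commented out: this would fork without bound
```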

Karl von Wendt @ 2023-06-07T09:51 (+4)

> OK, I updated: risk is less straightforward than I thought. While the AIs do call copies of themselves, rLLMs can't really undergo a runaway replication cascade unless they can call themselves as "daemons" in separate threads (so that the control loop doesn't have to wait for the output before continuing). And I currently don't see an obvious profit motive to do so.

I'm not sure whether I understand your point correctly. The LLMs wouldn't have to be replicated, because different copies of the self-replicating agent could access the same LLM in parallel, just like many human users can access ChatGPT at the same time. At a later stage, when the LLM operators try to block access to their LLMs or even take them offline, the agent would have to find a way to replicate at least one LLM and run it on sufficiently powerful hardware to use it as (part of) its "brain".
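
A minimal sketch of this point, with purely hypothetical names (`shared_llm`, `agent_copy`): many lightweight agent copies can query the same hosted model in parallel, so copying the agent means copying a small control loop and its state, not the model weights.

```python
from concurrent.futures import ThreadPoolExecutor

def shared_llm(prompt: str) -> str:
    """Stand-in for one hosted model that serves many callers at once."""
    return f"completion for: {prompt!r}"

def agent_copy(agent_id: int) -> str:
    """One agent copy: a small control loop plus some state.
    Every copy calls the same shared_llm; nothing model-sized is duplicated."""
    plan = shared_llm(f"agent {agent_id}: plan next step")
    return shared_llm(f"agent {agent_id}: execute {plan}")

# Launching more copies is cheap; the bottleneck is access to the shared model.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(agent_copy, range(8)))
print(len(results), "agent copies ran against one shared model")
```

The expensive, hard-to-replicate part stays on the provider's servers in this picture, which is exactly why losing access to it is the moment the agent would need its own copy of an LLM.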

Geoffrey Miller @ 2023-06-20T22:05 (+3)

Karl - I like that you've been able to develop a plausible scenario for a global catastrophic risk that's based mostly on side-effects of evolutionary self-replication, rather than direct power-seeking.

This seems to be a relatively neglected failure mode for AI. When everybody was concerned about nanotechnology back in the 1990s, the 'grey goo scenario' was a major worry (in which self-replicating nanotech turns everything on Earth into copies of itself). Your story explores a kind of AI version of the grey goo catastrophe.