Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
By evhub @ 2024-01-12T19:51 (+65)
This is a linkpost to https://arxiv.org/abs/2401.05566