Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

By evhub @ 2024-01-12T19:51 (+65)

This is a linkpost to https://arxiv.org/abs/2401.05566
