Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
By evhub @ 2024-01-12T19:51 (+65)
This is a linkpost to https://arxiv.org/abs/2401.05566
This is a crosspost, probably from LessWrong. Try viewing it there.