Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

By evhub @ 2024-01-12T19:51 (+65)

This is a linkpost to https://arxiv.org/abs/2401.05566

This is a crosspost, probably from LessWrong. Try viewing it there.
