Discovering Language Model Behaviors with Model-Written Evaluations By evhub @ 2022-12-20T20:09 (+25)