evhub
Evan Hubinger (he/him/his) (evanjhub@gmail.com)
I am a research scientist at Anthropic, where I lead the Alignment Stress-Testing team. My posts and comments are my own and do not represent Anthropic's positions, policies, strategies, or opinions.
Previously: MIRI, OpenAI
See: “Why I'm joining Anthropic”
Selected work:
- "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training"
- “Conditioning Predictive Models”
- “How likely is deceptive alignment?”
- “How do we become confident in the safety of a machine learning system?”
- “An overview of 11 proposals for building safe advanced AI”
- “Risks from Learned Optimization”
Posts
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
by evhub @ 2024-01-12 | +65 | 0 comments
The Hubinger lectures on AGI safety: an introductory lecture series
by evhub @ 2023-06-22 | +44 | 0 comments
Discovering Language Model Behaviors with Model-Written Evaluations
by evhub @ 2022-12-20 | +25 | 0 comments
We must be very clear: fraud in the service of effective altruism is...
by evhub @ 2022-11-10 | +712 | 0 comments
FLI AI Alignment podcast: Evan Hubinger on Inner Alignment, Outer Alignment, and...
by evhub @ 2020-07-01 | +13 | 0 comments