Linkpost: Making deals with early schemers

By Buck @ 2025-06-21T16:34 (+20)

Julian Stastny, Olli Järviniemi, and I have written a post on making deals with early schemers that might be of interest to this forum. Here's how I summarized it:

New post! It might make sense to pay misaligned AIs to reveal their misalignment and cooperate with us. AIs that are powerful enough to take over probably won't want to accept this kind of deal. But...

Early schemers--that is, AIs that are so misaligned that they conspire against us, but aren't powerful enough to have a reasonable chance of taking over the world--might not have other good options. So we might be able to work out mutually beneficial deals.

Read the post on LW here: https://alignmentforum.org/posts/psqkwsKrKHCfkhrQx/making-deals-with-early-schemers Or on our blog.

In our post, we discuss various details of this: how you'd structure such trades, what properties of AIs would make them more likely to accept deals, our analysis of an AI's best takeover strategies in the absence of a deal, and what we think we'd gain from such a deal.

Many people have talked about these ideas in general before, but I think we get into a lot of specific details that haven't previously been discussed publicly.

This is relevant to people who are interested in AI takeover risk, and also relevant to people who think we should pay AIs for moral reasons (e.g. Matthew Barnett here).