Reframing the burden of proof: Companies should prove that models are safe (rather than expecting auditors to prove that models are dangerous)

By Akash @ 2023-04-25T18:49 (+35)

This is a crosspost, probably from LessWrong. Try viewing it there.

Greg_Colbourn @ 2023-04-25T20:10 (+7)

Yes, fully agree. The only reason current models are (relatively) safe is that they are weaker than we are. OpenAI brags that:

Our mitigations have significantly improved many of GPT-4’s safety properties compared to GPT-3.5. We’ve decreased the model’s tendency to respond to requests for disallowed content by 82% compared to GPT-3.5, and GPT-4 responds to sensitive requests (e.g., medical advice and self-harm) in accordance with our policies 29% more often.

29%!? That really isn't going to cut it once models are able to overpower us (or produce output that does). All the doom will flow through the tiniest cracks of imperfect alignment. In the limit of superintelligent AI, we need 100% perfect alignment with zero failure modes. And the burden should be entirely on the company releasing a model that has the potential to scale to superintelligence (endangering all 8 billion people on the planet) to prove that it is safe.
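To make the "tiniest cracks" point concrete, here is a minimal sketch of how even a small residual failure rate compounds over many interactions. The per-interaction failure probabilities and interaction counts below are purely hypothetical and assume independence; they are not OpenAI's figures.

```python
# Minimal sketch: if a deployed model mishandles a safety-critical request
# with some small per-interaction probability p, then over n independent
# interactions the chance of at least one failure is 1 - (1 - p)^n.
# The values of p and n below are illustrative assumptions only.

def prob_at_least_one_failure(p: float, n: int) -> float:
    """Probability of at least one failure in n independent interactions."""
    return 1.0 - (1.0 - p) ** n

for p in (0.01, 0.001, 0.0001):          # hypothetical residual failure rates
    for n in (1_000, 1_000_000):         # hypothetical interaction counts
        print(f"p={p:<8} n={n:<9} P(>=1 failure) = {prob_at_least_one_failure(p, n):.4f}")
```

Under these assumptions, even a 0.01% per-interaction failure rate makes at least one failure a near certainty over a million interactions, which is why relative improvements like "29% more often" fall far short of the standard needed once failures can be catastrophic.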

If small residual risks remain, then releasing the model needs to be a globally democratic decision (e.g. by unanimous vote at the UN).