AI companies aren't really using external evaluators
By Zach Stein-Perlman @ 2024-05-26T19:05 (+86)
This is a crosspost, probably from LessWrong. Try viewing it there.
Toby Tremlett @ 2024-05-30T12:41 (+17)
I'm curating this post.
I wrote a draft feature on a Politico piece for the EA newsletter, exploring this same question: are big labs following through on their verbal commitments to share their models with external evaluators? Despite the reporters taking "several months" to speak with experts, the Politico piece didn't have as much useful information as this blog post. I ended up cutting the feature because I couldn't find enough information in the time I had.
I think work like this is really valuable, filling a serious gap in our understanding of AI safety. Thanks for writing this, Zach!
phgubbins @ 2024-06-04T19:05 (+5)
Cross-commenting from LessWrong for future reference:
I had an opportunity to ask an individual from one of the mentioned labs about plans to use external evaluators and they said something along the lines of:
“External evaluators are very slow - we are just far better at eliciting capabilities from our models.”
They had said much the same thing earlier, when I asked whether they'd been surprised by anything people had used deployed LLMs for 'in the wild' so far. Essentially no, not really; if anything, they were a bit underwhelmed.
SummaryBot @ 2024-05-27T13:32 (+1)
Executive summary: Despite their claims, AI companies are not consistently using external evaluators to assess their models for dangerous capabilities before deployment, even though doing so could improve risk assessment and provide public accountability.
Key points:
- External evaluators like METR, UK AISI, and Apollo are not being given sufficient pre-deployment access to models to effectively evaluate risks.
- Evaluators should be given deeper access than end users, including versions without safety filters or fine-tuning, to better assess threats from both deployment and potential leaks.
- The costs and challenges of providing external evaluator access are unclear, but likely surmountable.
- Some AI labs have committed to external "red teaming" but this is not a substitute for comprehensive model evaluation by dedicated experts.
- AI labs should also provide deeper model access to safety researchers to support their work, potentially without fully releasing weights.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.