AI companies aren't really using external evaluators

By Zach Stein-Perlman @ 2024-05-26T19:05 (+86)

This is a crosspost, probably from LessWrong. Try viewing it there.

Toby Tremlett @ 2024-05-30T12:41 (+17)

I'm curating this post.
I wrote a draft feature on a Politico piece for the EA newsletter, exploring this same question: are big labs following through on their verbal commitments to share their models with external evaluators? Despite taking "several months" to speak with experts, the Politico piece didn't have as much useful information as this blog post. I ended up cutting the feature because I couldn't find enough information in the time I had.
I think work like this is really valuable; it fills a serious gap in our understanding of AI safety. Thanks for writing this, Zach!

phgubbins @ 2024-06-04T19:05 (+5)

Cross-commenting from LessWrong for future reference:

I had an opportunity to ask an individual at one of the labs mentioned about their plans to use external evaluators, and they said something along the lines of:

“External evaluators are very slow - we are just far better at eliciting capabilities from our models.”

They had said something to much the same effect earlier, when I asked whether they'd been surprised by anything people had used deployed LLMs for 'in the wild' so far. Essentially: no, not really; if anything, a bit underwhelmed.

SummaryBot @ 2024-05-27T13:32 (+1)

Executive summary: Despite their claims, AI companies are not consistently using external evaluators to assess their models for dangerous capabilities before deployment, even though doing so would improve risk assessment and provide public accountability.

Key points:

  1. External evaluators like METR, UK AISI, and Apollo are not being given sufficient pre-deployment access to models to effectively evaluate risks.
  2. Evaluators should be given deeper access than end users, including versions without safety filters or fine-tuning, to better assess threats from both deployment and potential leaks.
  3. The costs and challenges of providing external evaluator access are unclear, but likely surmountable.
  4. Some AI labs have committed to external "red teaming," but this is not a substitute for comprehensive model evaluation by dedicated experts.
  5. AI labs should also provide deeper model access to safety researchers to support their work, potentially without fully releasing weights.

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.