Anthropic rewrote its RSP

By Zach Stein-Perlman @ 2024-10-15T14:30 (+32)

This is a crosspost, probably from LessWrong. Try viewing it there.

SummaryBot @ 2024-10-16T18:39 (+1)

Executive summary: Anthropic has updated its Responsible Scaling Policy (RSP) with more flexible risk assessment approaches and new capability thresholds, but some changes weaken previous commitments and the policy still lacks strong third-party oversight.

Key points:

  1. New RSP introduces more flexible risk assessment but weakens some previous commitments, like evaluation frequency.
  2. New capability thresholds defined for CBRN weapons, autonomous AI R&D, and model autonomy.
  3. Policy lacks strong third-party auditing for key decisions, relying mainly on the CEO and the Responsible Scaling Officer (RSO).
  4. Some improvements noted, like sharing future safeguard plans and stance on non-disparagement clauses.
  5. Concerns raised about under-elicitation of model capabilities and missed opportunities for stronger third-party evaluations.
  6. Author identifies potential issues with Anthropic's understanding and communication of its own policy.

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.