Unjournal evaluation of "Towards best practices in AGI safety and governance" (Schuett et al., 2023)
By david_reinstein, Lorenzo Pacchiardi @ 2025-06-03T11:18 (+9)
This is a linkpost to https://unjournal.pubpub.org/pub/evalsumagisafety
Do experts agree on what labs should do to make AI safer? Schuett et al.'s (2023) survey ("Towards best practices in AGI safety and governance: A survey of expert opinion") finds a broad consensus.
We organized two evaluations of this paper (link: summary and ratings, click through to the evaluations). Both evaluators rated the paper highly and found this work valuable and meaningful for policy and for discussions of AI safety. They also highlighted important limitations, including sampling bias, the classification of practices, the interpretation of results, and the gap between abstract agreement and real-world implementation, and gave suggestions for improvements and future work.
For example:
Sampling/statement selection bias concerns:
Evaluator 1: "... whether the sample of respondents is disproportionately drawn from individuals already aligned with AGI safety priorities"
Evaluator 2: "...where statements selected from labs practices are agreed on by labs themselves"
Abstract agreement vs. real-world implementation:
Evaluator 1: "items ~capture agreement in principle, rather than ... [the] grappling with... tradeoffs ... real-world governance inevitably entails."
Evaluator 2: "agreement on practices could indicate a host of different things"; these caveats should be incorporated more fully into the stated results.
The evaluation summary highlights other issues that we (as evaluation managers) believe merit further evaluation. We also point to recent work that is related and complementary.
The Unjournal has published 37 evaluation packages (with 20 more in progress), mainly targeting impactful work in quantitative social science and economics across a range of outcomes and cause areas. See our output at https://unjournal.pubpub.org. See the list of over 200 papers with potential for impact (which we have considered or evaluated, or are currently considering or evaluating) here.
david_reinstein @ 2025-08-10T22:07 (+2)
My own take is that this suggests we need more work in this area: follow-up work doing a similar survey, taking sample selection and question design more seriously.
Representative quotes from the evaluations
Sampling bias/selection issues
Potential sampling bias – particularly over-representation of safety-minded respondents
There’s a risk that the sample reflects the views of those already more inclined to endorse stringent safety norms. This is particularly important in light of the sample composition, as over 40% of respondents are affiliated with AGI labs.
Question selection/design, need for sharper questions
As the authors note, this [agreement] may be partly due to the high-level and generally uncontroversial framing of the statements (e.g., “AGI labs should conduct pre-deployment risk assessments”). But in their current form, the items mostly capture agreement in principle, rather than forcing respondents to grapple with the kinds of tradeoffs that real-world governance inevitably entails.
For example, would respondents still support red teaming or third-party audits if they significantly delayed product releases?
[Emphasis added]
the paper states that the selected practices are extracted from (1) current practices at individual AGI labs and (2) planned practices at individual labs, among other sources.
... These results might suggest a selection bias where statements selected from labs practices are agreed on by labs themselves,
[suggestion to] introduce an inclusion/exclusion criterion to provide a better justification as to why some statements are selected.
Overstated claims for consensus?
[Paper] “findings suggest that AGI labs need to improve their risk management practices. In particular, there seems to be room for improvement when it comes to their risk governance.”
While one can agree with such claim, it is difficult to see how this conclusion can be reached from the paper’s results.