Ideas for improving epistemics in AI safety outreach

By mic @ 2023-08-21T19:56 (+31)

This is a linkpost to https://www.lesswrong.com/posts/SDpaZ7MdH5yRnobrZ/ideas-for-improving-epistemics-in-ai-safety-outreach

In 2022 and 2023, there was a growing focus on recruiting talented individuals to work on mitigating potential existential risks from artificial intelligence. For example, we've seen an increase in the number of university clubs, retreats, and workshops dedicated to introducing people to existential risk from AI.

However, these efforts risk fostering an environment with suboptimal epistemics. Given the goal of enabling people to contribute to AI safety, there's an incentive to optimize for recruitment rather than for whether our arguments are sound. Many people working on field building are not domain experts in AI safety or machine learning but are motivated by a belief that AI safety is an important issue. And some participants may come to believe that addressing the risks from AI is important without fully understanding the reasoning behind that belief or engaging with strong counterarguments.

This post briefly examines this issue and suggests some ideas for improving epistemics in outreach efforts.

Note: I first drafted this in December 2022. Since then, concern about AI x-risk has entered mainstream discussion, so AI safety field builders should hopefully be relying less on weird, epistemically poor arguments. Even so, I think epistemics remain worth discussing, especially after a recent post noted poor epistemics in EA community building.

What are some ways that AI safety field building may be epistemically unhealthy?

Why are good epistemics valuable?

For the sake of epistemic rigor, I'll also make a few arguments for why epistemics may be overrated.

Ideas to improve epistemics