Let’s set new AI safety actors up for success

By michel @ 2023-06-26T21:17 (+33)

Summary:

New actors are taking AI risk seriously (thankfully!)  

Tl;dr: There are a lot of powerful new actors taking an interest in AI safety. I’d guess this is net positive, but as a result EA is losing its relative influence on AI safety. 

A year ago, if you wanted to reduce the risk of AI takeover, you probably engaged with EA and adjacent communities: you probably read LessWrong, or the EA Forum, or the Alignment Forum; you probably attended EAG(x)s; you probably talked about your ideas for reducing AI risk with people who identified as EAs. Or at the very least you knew of them. 

That’s changed. Over the past 3 months, powerful actors have taken an interest in catastrophic risks from AI, including existential catastrophes:

EA and adjacent AI safety-focused communities – what I’ll refer to as “last year’s AI alignment community” from now on – are no longer the sole authority on AI risk. If you want to reduce large-scale risks from AI a year from now, it seems increasingly likely that you won’t have to turn to last year’s AI alignment community for answers, and can instead look to other, more ‘traditional’ actors. 

What exactly new actors like governments, journalists, think tanks, academics, and other talent pools do within AI safety is still to be determined – and EA-motivated people who have thought a lot about alignment are uniquely positioned to guide new actors’ efforts to productively reduce x-risk. 

As I’ll expand on below, I think now is an opportune time to shape the non-EA-adjacent portion of work done on AI safety.

These new actors may have a big influence on AI risk

The new actors who seem to be taking AI safety seriously are powerful: they often have credibility, status, or other forms of power that most actors active in the EA sphere don’t have (yet). 

The “new actors” I’m imagining include academics, journalists, industry talent, governments, think tanks, etc. that previously weren’t interested in AI safety but now are. These actors can do things EA actors currently can’t, like:

What should we do in light of the new actors taking an interest in AI safety? 

Last year’s alignment community could respond to these actors taking an interest in AI safety in a variety of ways, which I see as roughly a continuum:

On the margin, I think we should be more actively forming relationships with new actors and setting them up to do x-risk-reducing work, even if it's at the expense of some of our direct work. 

When I thought about ways EA could fail a year ago, I became most worried about the types of failures where the world just moves on without us – the worlds where we’re not in the rooms where it happens (despite writing lots of good forum posts and tweets). I’m worried that’s what could happen soon with AI risk. By not actively trying to position ourselves in existing power structures, we give up seats in the room where it happens to people who I think are less well-intentioned and less clear-eyed about what’s at stake. 

 

I may well be wrong about this ‘more coordination with new actors is possible and worth investing in’ take: 

Concrete calls to action (and vigilance)

Realistically, not everyone in last year's AI alignment community is well-positioned to coordinate with new actors; I think the burden here falls particularly on people who have legible credibility outside EA and adjacent communities (e.g., a PhD in ML or a policy background, working on the alignment team of an AI lab, a position at an AI org with a strong brand, some impressive CV items, etc.).

Some ex-SERI MATS scholar who is all in on theoretical agent foundations work might be best positioned to keep doing just that. But I think everyone can help in the effort of growing the number of people doing useful alignment work (or at least not putting people off!).
 

Here’s my best guess at concrete things people in last year’s alignment community could do:

Closing 

As the rest of the world watches the AI risk discourse unfold, now strikes me as a time to let the virtues of EA show: a clear-eyed focus on working for the betterment of others, a relentless commitment to what is true of the world, and the epistemic humility to know we may well be wrong. To anyone trying to tackle a difficult problem where the fate of humanity may be at stake, we should be valuable allies; we are on the same side.

I’m curious whether others agree with this post's argument, or whether they think there are better frames for orienting to the influx of attention on AI risk. 
 

Thank you to Lara Thurnherr for comments on a draft of this post, and Nicholas Dupuis for the conversation that spurred me to get writing. All views are my own.


ChanaMessinger @ 2023-08-02T11:02 (+4)

I liked this!

I appreciated that for the claim I was most skeptical of: "There’s also the basic intuition that more people with new expertise working on a hard problem just seems better", my skepticism was anticipated and discussed.

For me one of the most important things is:

Patch the gaps that others won’t cover

  • E.g., if more academics start doing prosaic alignment work, then ‘big-if-true’ theoretical work may become more valuable, or high-quality work on digital sentience. 
  • There are probably predictable ‘market failures’ in any discipline – work that isn’t sexy but is still very useful (e.g., organizing events, fixing coordination problems, distilling the same argument into new language, etc.). 
  • Generally track whether top priority work is getting covered (e.g., information security, standards and monitoring) 

This, plus avoiding and calling out safety washing and keeping an eye out for overall wrongheaded activities and motions (for instance, probably a lot of regulation is bad by default and some could actively make things worse), seems like the strongest set of arguments against making big naive shifts just because the field is, in a broad sense, less neglected.

More generally, I think a lot of the details of what kinds of engagement we have with the broader world will matter (and I think in many cases "guide" will be a less accurate description of what's on the table than "be one of the players in the room"). Some engagements might be a lot more impactful than others, but I don't have super fleshed-out views on which yet!