AGI safety career advice

By richard_ngo @ 2023-05-02T07:36 (+210)

This is a crosspost, probably from LessWrong. Try viewing it there.

JAM @ 2023-05-25T14:47 (+11)

Hi Richard! Thanks for this enlightening post; I found it highly informative and expect others will too. To reach Spanish-speaking readers, I have translated it into Spanish, as I'm sure they will also find this information valuable.

 

Thanks again!

richard_ngo @ 2023-05-25T15:29 (+3)

Thanks! I'll update it to include the link.

Erich_Grunewald @ 2023-05-02T15:47 (+10)

I'll piggyback on this (excellent) post to mention that we're working on some of the governance questions mentioned here at Rethink Priorities.

Roman Leventov @ 2023-05-02T19:48 (+6)

Note: this comment is cross-posted on LessWrong.

Classification of AI safety work

Here I proposed a systematic framework for classifying AI safety work. This is a matrix, where one dimension is the system level:

Another dimension is the "time" of consideration:

There would be 6 × 4 = 24 slots in this matrix; almost all of them have something interesting to research and design, and none of them is "too early" to consider.
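The cross-product structure of the matrix can be sketched in a few lines of Python. Note that the original bullet lists for the two dimensions did not survive the crosspost, so the labels below are illustrative placeholders drawn from terms used later in this comment, not the post's actual lists:

```python
from itertools import product

# Illustrative labels only -- the comment's original dimension lists were lost.
system_levels = [
    "monolithic AI system",
    "system of AIs",
    "human + AI group",
    "whole civilisation",
]
times = ["design", "manufacturing", "deployment", "operations", "evolutionary"]

# Each (system level, time) pair is one slot in the classification matrix.
matrix_slots = list(product(system_levels, times))
```

With the post's actual lists (which Roman says yield 6 × 4 = 24 slots), the same construction would enumerate every slot of the framework; the stand-in labels here produce a smaller grid.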

Richard's directions within the framework

Scalable oversight: (monolithic) AI system * manufacturing time

Mechanistic interpretability: (monolithic) AI system * manufacturing time, also design time (e.g., in the context of the research agenda of weaving together theories of cognition and cognitive development, ML, deep learning, and interpretability through the abstraction-grounding stack, interpretability plays the role of empirical/experimental science work)

Alignment theory: Richard phrases this vaguely, but his primary reference to MIRI-style work suggests he means "(monolithic) AI system * design, manufacturing, and operations time".

Evaluations, unrestricted adversarial training: (monolithic) AI system * manufacturing, operations time

Threat modeling: system of AIs (rarely), human + AI group, whole civilisation * deployment time, operations time, evolutionary time

Governance research, policy research: human + AI group, whole civilisation * mostly design and operations time.

Takeaways

To me, it seems almost certain that many current governance institutions and democratic systems will not survive the AI transition of civilisation. Bengio recently hinted at the same conclusion.

Human+AI group design (scale-free: small group, org, society) and civilisational intelligence design must be modernised.

Richard mostly classifies this as "governance research", which carries the connotation that it is a sort of "literary" work rather than science, with which I disagree. There is a ton of cross-disciplinary hard science to be done on group intelligence and civilisational intelligence design: game theory, control theory, resilience theory, linguistics, political economy (rebuilt as a hard science, of course, on the basis of resource theory, bounded rationality, economic game theory, etc.), cooperative reinforcement learning, and more.

I feel that the design of group intelligence and civilisational intelligence is under-appreciated by the AI safety community. Some people do work on this (Eric Drexler, davidad, the cip.org team, ai.objectives.institute, the Digital Gaia team, and the SingularityNET team, though the latter are less concerned about alignment), but far more work is needed in this area.

There is also a place for "literary", strategic research, but I think it should mostly concern deployment time of group and civilisational intelligence designs, i.e., the questions of transition from the current governance systems to the next-generation, computation and AI-assisted systems.

Also, operations-time and evolutionary-time concerns for everything (AI systems, systems of AIs, human+AI groups, civilisation) seem under-appreciated and under-researched: alignment is not a "problem to solve" but an ongoing, manufacturing-time and operations-time process.

Peter S. Park @ 2023-05-02T16:14 (+4)

Thank you so much for your insightful and detailed list of ideas for AGI safety careers, Richard! I really appreciate your excellent post.

I would propose explicitly grouping some of your ideas and additional ones under a third category: “identifying and raising public awareness of AGI’s dangers.” In fact, I think this category may plausibly contain some of the most impactful ideas for reducing catastrophic and existential risks, given that alignment seems potentially difficult to achieve in a reasonable period of time (if ever) and the implementation of governance ideas is bottlenecked by public support.

For a similar argument that I found particularly compelling, please check out Greg Colbourn’s recent post: https://forum.effectivealtruism.org/posts/8YXFaM9yHbhiJTPqp/agi-rising-why-we-are-in-a-new-era-of-acute-risk-and

richard_ngo @ 2023-05-02T16:26 (+13)

I don't actually think the implementation of governance ideas is mainly bottlenecked by public support; I think it's bottlenecked by good concrete proposals. And to the extent that it is bottlenecked by public support, that will change by default as more powerful AI systems are released.

Akash @ 2023-05-02T17:02 (+14)

I don't actually think the implementation of governance ideas is mainly bottlenecked by public support; I think it's bottlenecked by good concrete proposals. And to the extent that it is bottlenecked by public support, that will change by default as more powerful AI systems are released.

I appreciate Richard stating this explicitly. I think this is (and has been) a pretty big crux in the AI governance space right now.

Some folks (like Richard) believe that we're mainly bottlenecked by good concrete proposals. Other folks believe that we have concrete proposals, but we need to raise awareness and political support in order to implement them.

I'd like to see more work going into both of these areas. On the margin, though, I'm currently more excited about efforts to raise awareness [well], acquire political support, and channel that support into achieving useful policies. 

I think this is largely due to (a) my perception that this work is largely neglected, (b) the fact that a few AI governance professionals I trust have also stated that they see this as the higher priority thing at the moment, and (c) worldview beliefs around what kind of regulation is warranted (e.g., being more sympathetic to proposals that require a lot of political will).

richard_ngo @ 2023-05-02T18:26 (+15)

I can see a worldview in which prioritizing raising awareness is more valuable, but I don't see the case for believing "that we have concrete proposals". Or at least, I haven't seen any; could you link them, or explain what you mean by a concrete proposal?

My guess is that you're underestimating how concrete a proposal needs to be before you can actually muster political will behind it. For example, you don't just need "let's force labs to pass evals", you actually need to have solid descriptions of the evals you want them to pass.

I also think that recent events have been strong evidence in favor of my position: we got a huge amount of political will "for free" from AI capabilities advances, and the best we could do with it was to push a deeply flawed "let's all just pause for 6 months" proposal.

Akash @ 2023-05-02T19:37 (+20)

Clarification: I think we're bottlenecked by both, and I'd love to see the proposals become more concrete. 

Nonetheless, I think proposals like "get a federal agency to regulate frontier AI labs the way the FDA/FAA regulate their industries" or even "push for an international treaty that regulates AI the way the IAEA regulates atomic energy" are "concrete enough" to start building political will behind them. Other (more specific) examples include export controls, compute monitoring, licensing for frontier AI models, and others on Luke's list.

I don't think any of these are concrete enough for me to say "here's exactly how the regulatory process should be operationalized", and I'm glad we're trying to get more people to concretize these. 

At the same time, I expect that a lot of the concretization happens after you've developed political will. If the USG really wanted to figure out how to implement compute monitoring, I'm confident they'd be able to figure it out. 

More broadly, my guess is that we might disagree on how concrete a proposal needs to be before you can actually muster political will behind it. Here's a rough attempt at sketching out four possible "levels of concreteness". (First attempt; feel free to point out flaws.)

Level 1, No concreteness: You have a goal but no particular ideas for how to get there. (e.g., "we need to make sure we don't build unaligned AGI")

Level 2, Low concreteness: You have a goal with some vague ideas for how to get there (e.g., "we need to make sure we don't build unaligned AGI, and this should involve evals/compute monitoring, or maybe a domestic ban on AGI projects and a single international project").

Level 3, Medium concreteness: You have a goal with high-level ideas for how to get there. (e.g., "We would like to see licensing requirements for models trained above a certain threshold. Still ironing out whether that threshold should be X FLOP, Y FLOP, or $Z, but we've got some initial research and some models for how this would work.")

Level 4, High concreteness: You have concrete proposals that can be debated. (e.g., "We should require licenses for anything above X FLOP, and we have some drafts of the forms that labs would need to fill out.")

I get the sense that some people feel like we need to be at "medium concreteness" or "high concreteness" before we can start having conversations about implementation. I don't think this is true.

Many laws, executive orders, and regulatory procedures have vague language (often at Level 2 or between Levels 2 and 3). My (loosely-held, mostly based on talking to experts and reading things) sense is that it's quite common for regulators to say "we're going to establish regulations for X, and we're not yet exactly sure what they'll look like. Part of this regulatory agency's job is going to be to figure out exactly how to operationalize XYZ."

I also think that recent events have been strong evidence in favor of my position: we got a huge amount of political will "for free" from AI capabilities advances, and the best we could do with it was to push a deeply flawed "let's all just pause for 6 months" proposal.

I don't think this is clear evidence in favor of the "we are more bottlenecked by concrete proposals" position. My current sense is that we were bottlenecked both by "not having concrete proposals" and by "not having relationships with relevant stakeholders."

I also expect that the process of concretizing these proposals will likely involve a lot of back-and-forth with people (outside the EA/LW/AIS community) who have lots of experience crafting policy proposals. Part of the benefit of "building political will" is "finding people who have more experience turning ideas into concrete proposals."

Peter S. Park @ 2023-05-02T17:26 (+4)

Richard, I hope you turn out to be correct that public support for AI governance ideas will become less of a bottleneck as more powerful AI systems are released!

But I think it is plausible that we should not leave this to chance. Several of the governance ideas you have listed as promising (e.g., global GPU tracking, data center monitoring) are probably infeasible at the moment, to say the least. It is plausible that these ideas will only become globally implementable once a critical mass of people around the world become highly aware of and concerned about AGI dangers.

This means that timing may be an issue. Will the most detrimental of the AGI dangers manifest before meaningful preventative measures are implemented globally? It is plausible that before the necessary critical mass of public support builds up, a catastrophic or even existential outcome may already have occurred. It would then be too late.

The plausibility of this scenario is why I agree with Akash that identifying and raising public awareness of AGI’s dangers is an underrated approach.

Vasco Grilo @ 2023-05-03T13:16 (+2)

Thanks for sharing!

If such a model is a strong success it may shift my credences from, say, 25% to 75% in a given proposition. But that’s only a factor of 3 difference, whereas one plan for how to solve governance could be one or two orders of magnitude more effective than another.

Do you have any thoughts on the value of models to determine the effectiveness of plans to solve governance?

Lizka @ 2023-05-02T18:22 (+2)

I really appreciate specific career advice from people working in relevant jobs and the ideas and considerations outlined here, and am curating the post. (I'm also really interested in the discussion happening here.)

Personal highlights (note that I'm interested in hearing disagreement with these points!): 

Neel Nanda @ 2023-05-02T20:01 (+2)

"You’ll need to get hands-on. The best ML and alignment research engages heavily with neural networks (with only a few exceptions). Even if you’re more theoretically-minded, you should plan to be interacting with models regularly, and gain the relevant coding skills. In particular, I see a lot of junior researchers who want to do “conceptual research”. But you should assume that such research is useless until it cashes out in writing code or proving theorems, and that you’ll need to do the cashing out yourself (with threat modeling being the main exception, since it forces a different type of concreteness). ..."

This seems strongly true to me.

Lizka @ 2023-05-02T20:22 (+3)

Yeah, I agree on priors & some arguments about feedback loops, although note that I don't really have relevant experience. But I remember hearing someone try to defend something like the opposite claim to me in some group setting where I wasn't able to ask the follow-up questions I wanted to ask — so now I don't remember what their main arguments were and don't know if I should change my opinion.

richard_ngo @ 2023-05-02T20:49 (+7)

I expect a bunch of more rationalist-type people disagree with this claim, FWIW. But I also think that they heavily overestimate the value of the types of conceptual research I'm talking about here.

Arthur Conmy @ 2023-05-03T14:11 (+1)

CC https://www.lesswrong.com/posts/fqryrxnvpSr5w2dDJ/touch-reality-as-soon-as-possible-when-doing-machine that expands on "hands-on" experience in alignment. 

I don't know of any writing that directly contradicts these claims. I think https://www.lesswrong.com/s/v55BhXbpJuaExkpcD/p/3pinFH3jerMzAvmza indirectly contradicts these claims as it broadly criticizes most empirical approaches and is more open to conceptual approaches.

Denis @ 2023-06-26T11:33 (+1)

Loved reading this post. As a person considering working in AI safety, this is a great resource that answers many questions, including some I hadn't thought to ask. Thanks so much for writing this!

One question: I am curious to hear anyone's perspective on the following "conflict": 

Point 1: "There is a specific skill of getting things done inside large organizations that most EAs lack (due to lack of corporate experience, plus lack of people-orientedness), but which is particularly useful when pushing for lab governance proposals. If you have it, lab governance work may be a good fit for you."

Point 2: "You need to get hands on" and, related: "Coding skill is a much more important prerequisite, though." 

There may be exceptions, but I would guess (partly based on my own experience) that the kind of people who have a lot of experience getting things done in large organisations typically do not spend much time coding ML models. 

And yet, as I say, I believe both of these are necessary. If I want to influence a major AI / ML company, I will lack credibility in their eyes if I have no experience working with and in large organisations. But I will also lack credibility if I don't have an in-depth understanding of the models and an ability to discuss them specifically rather than just abstractly. 

Specific question: what might the typical learning curve be for the second aspect, to get to the point where I could get hands-on with models? My starting point is having studied FORTRAN in college (!! - yes, that long ago!) and one online course in Python. There may be others with different starting points.

I suppose, ultimately, it still seems likely that it would be quicker even for a total novice at coding to reach some level of meaningful competence than for someone with no experience of organisations to become expert in how decisions are made, how plans are approved or rejected, and how to influence this.

Also are there good online courses anyone would recommend? 

richard_ngo @ 2023-06-26T21:41 (+3)

One question: I am curious to hear anyone's perspective on the following "conflict": 

The former is more important for influencing labs, the latter is more important for doing alignment research.

And yet, as I say, I believe both of these are necessary.

FWIW when I talk about the "specific skill", I'm not talking about having legible experience doing this, I'm talking about actually just being able to do it. In general I think it's less important to optimize for having credibility, and more important to optimize for the skills needed. Same for ML skill—less important for gaining credibility, more important for actually just figuring out what the best plans are.

Also are there good online courses anyone would recommend? 

See the resources listed here.

Denis @ 2023-06-28T17:19 (+1)

Thanks Richard, this is clear now.

And thank you (and others) for sharing the resources link - this indeed looks like a fantastic resource. 

Denis