(Even) More Early-Career EAs Should Try AI Safety Technical Research

By tlevin @ 2022-06-30T21:14 (+86)

[May 2023 edit: I no longer endorse the conclusion of this post and think that most early-career EAs should be trying to contribute to AI governance, either via research, policy work, advocacy, or other stuff, with the exception of people with outlier alignment research talent or very strong personal fits for other things. I do stand by a lot of the reasoning, though, especially of the form "pick the most important thing and only rule it out if you have good reasons to think personal fit differences outweigh the large differences in importance." The main thing that changed was that governance and policy stuff now seems more tractable than I thought, while alignment research seems less tractable.]

AI Safety Technical Research (AISTR) remains an extremely neglected career path, with only 100-200 people doing full-time AISTR.[1] While that number might grow substantially in the next few years as the current generation of EAs in college makes career choices, I believe the number will still be shockingly low in a few years (as an upper bound, less than 1,000).[2]

This is the case despite a large and growing share of EAs agreeing in theory that AI x-risk is the most important problem, and despite years of 80,000 Hours strongly advocating AISTR careers. In my experience, even in settings where this view is a strong consensus (and where quantitative skills are abundant), a lot of people who seem like they could be good at AISTR are not planning to pursue it — and have not given it a serious try. I think this is mostly for misguided reasons, and there are benefits to studying AI safety even if you don’t go into AISTR.

I claim that since AISTR seems like the highest-EV path for ambitious and smart people early in their careers, ~~most~~ the plurality of early-career EAs should at least try this path for a few months (or until they get good firsthand information that they aren’t suited for it). Trying it for a few months might mean reading this starter pack, taking an AI course, doing EA Cambridge’s Technical Alignment Curriculum, and/or trying to write up some distillations of existing research. For more, see "How to pursue a career in technical AI alignment."[3] (Importantly, if at any point in this early exploration you find yourself completely bored and miserable, you can simply quit there!)

The best reason not to do AISTR, if my core claims are right, is that community-building (and maybe massively scalable projects, or policy) could be even higher-impact, depending on your personal fit for those paths. However, I argue that most community-builders and policy professionals should build some ~~fairly deep~~ basic familiarity with the field in order to do their jobs effectively. [Edit: following a comment from Locke_USA, I want to tone down some of the claims relating to policy careers. Locke points out that AI policy research is also extremely neglected; in my own research I have found that there are lots of really important questions that could use attention, including from people without much technical background. Relatedly, they point out that getting deep technical knowledge has opportunity costs in building policy-related knowledge and skills, and while technical credentials like a CS degree could be useful in policy careers, self-studying ML for a few months could actually be negative. I think these are strong points, and I've modified a few claims above and below via strikethrough.[4]]

Epistemic status: I’m pretty confident (c. 90%) that the optimal number of EAs planning on doing AISTR is higher than the status quo, less confident (c. 70%) that >50% of early-career EAs should try it, mostly because the “super-genius” thing discussed below seems plausible.

The core case for trying

The core case is very simple, so I’ll just state it here, note a couple other advantages, and explore the ways it could be wrong or not apply to an individual case:

  1. You might be very good at a lot of different things, but it is hard to know whether you could be very good at a specific thing until you try it. I think the kinds of EAs who might get really good at policy or philosophy stuff tend to be good, model-based thinkers in general, which means they have some chance of being able to contribute to AISTR.
  2. So, you should at least try the thing that, if you were very good at it, you would have the highest EV. At a first approximation, it seems like the differences between the impact of various careers are extremely large, so an impact-maximizing heuristic might mean something like “go down the list of highest-upside-if-true hypotheses about what you could be good at, in order.”
  3. That thing is AISTR. AI seems potentially very powerful, and it seems like (in part because so few people are trying) we haven’t made tons of progress on making sure AI systems do what we want. While other careers in the AI space — policy work, community-building, ops/infrastructure, etc. — can be very highly impactful, that impact is predicated on the technical researchers, at some point, solving the problems, and if a big fraction of our effort is not on the object-level problem, this seems likely to be a misallocation of resources.[5]

These three together should imply that if you think you might be very good at AISTR — even if this chance is small — you should give it a shot.

It also seems like technical-adjacent roles could be very impactful, i.e., people can make valuable contributions without being the ones solving very hard math problems. As John Wentworth writes: “Many technical alignment researchers are bad-to-mediocre at writing up their ideas and results in a form intelligible to other people. And even for those who are reasonably good at it, writing up a good intuitive explanation still takes a lot of work, and that work lengthens the turn-time on publishing new results.” Being technically fluent but not a “super-genius” (see below) would mean you could significantly improve communication around the field about new alignment concepts, which seems incredibly high-EV.

Other reasons to try

If your other career plans involve things related to AI, like AI policy/strategy or EA community-building, having an inside view on AI safety seems really important, and taking some first steps on alignment research seems like a great way to start building one.

AI Policy/Strategy

For people who eventually decide to do AI policy/strategy research, early exploration in AI technical material seems clearly useful, in that it gives you a better sense of how and when different AI capabilities might develop and helps you distinguish useful and “fake-useful” AI safety research, which seems really important for this kind of work. (Holden Karnofsky says “I think the ideal [strategy] researcher would also be highly informed on, and comfortable with, the general state of AI research and AI alignment research, though they need not be as informed on these as for the previous section [about alignment].”)

Even for people who decide to follow a path like “accumulate power in the government/private actors to spend it at critical AI junctures,” it seems very good to develop views about timelines and key inputs; otherwise, I am concerned that they will not be focused on climbing the right ladders or will not know whom to listen to. ~~Spending a few months really getting familiar with the field, and then spending a few hours a week staying up to date, seems sufficient for this purpose.~~ I'm convinced by Locke_USA's comment that for applied policy research or government influence, the optimal allocation of your time is very unlikely to include months of AISTR training, especially outside traditional contexts like CS coursework in college.

Community-Building

First of all, a huge part of a community-builder’s job is identifying the most promising people and accelerating their path to high impact as quickly as possible. In the context of AI alignment (which I claim is the most important cause area in which to build community), this involves the skill of identifying when someone has just said an interesting thought about alignment, which means you probably need to be conversant on some level in alignment research. More generally, a pair of recent forum posts from Emma Williamson and Owen Cotton-Barratt emphasize the benefits of community-builders spending time getting good at object-level things, and AI safety seems like an obvious object-level thing to spend some time on, for the above reasons.

It also seems like community-builders being better informed on AISTR would make us a more appealing place for people who could do very good AISTR work, whom we want to attract. Communities, as a rule, attract people who like doing what people in the community do; if the community primarily talks about EA in fellowship-style settings and strategizes about outreach, it’ll mostly attract people who like to talk about EA. If the community is full of people spending significant amounts of time learning about AI, it will instead attract people who like to do that, which seems like a thing that reduces existential risks from AI.[6] I am, consequently, very bullish on the Harvard AI Safety Team that Alexander Davies started this past spring as a community-building model.

Broader worldview-building benefits

Finally, the path towards trying AISTR yields benefits along the way. Learning some coding is broadly useful, and learning some ML seems like it builds some good intuitions about human cognition. (My evidence for this is just the subjective experience of learning a little bit about ML and it producing interesting thoughts about the human learning process, which was helpful as I wrote curricula and lesson plans for EA fellowships and a formal academic course.) Also, insofar as learning about the technical alignment field builds your sense of what the future might look like (for the reasons articulated in the policy/strategy section), it could just help you make better long-run decisions. Seems useful.

Uncertainties and caveats

Reasons this could be wrong, or at least not apply to an individual person, follow from each of the three core claims:

  1. Some people already know they will not be very good at it. In my understanding, people in the field disagree about the value of different skills, but it seems like you should probably skip AISTR if you have good evidence that your ~~shape rotator~~ quantitative abilities aren’t reasonably strong — e.g., you studied reasonably hard for SAT Math but got <700 (90th percentile among all SAT takers).[7] [Edit: comments from Linch and Lukas have convinced me that "shape rotator" is a counterproductive term here, so I've replaced it with "quantitative."]
    Other traits that should probably lead you to skip the AISTR attempt: you’re already in a position where the opportunity cost would be very high, you have good reason to think you could be exceptionally good at another priority path, or you have good reason to think you are ill-suited for research careers. But I emphasize good reason because I think people’s bar for reaching this conclusion is too low.[8]
  2. a.) Being world-class at a less important thing could be more impactful than being very good at the most important thing. People in the AISTR field also seem to disagree about the relative value of different skill levels. That is, some people think we need a handful of Von Neumann-style “super-geniuses” to bring about paradigm shifts and “mere geniuses” will just get in the way and should work on governance or other issues instead. Others, e.g. at Redwood Research and Anthropic, seem to think that “mere geniuses” doing ML engineering, interpretability research, “AI psychology,” or data labeling are an important part of gaining traction on the problem. If the former group are right, then merely being “very good” at alignment — say, 99th percentile — is still basically useless. And since you probably already know whether you’re a super-genius, based on whether you’re an IMO medalist or something, non-super geniuses would therefore already have good reason to think they won’t be useful (see point 1 above).
    But my sense is that there’s enough probability that Redwood et al are right, or that skills like distillation are important and scarce enough, that “mere geniuses” should give it a try, and I think there are probably “mere geniuses” who don’t know who they are (including for racial/gendered reasons).
    b.) You might produce more total AISTR-equivalents by community-building. The classic “multiplier effect” case: if you spend 5% of your career on community-building, you only need to produce 0.05 “you-equivalents” for this to be worth it. I hope to address this in a later post, but I think it’s actually much harder for committed EAs to produce “you-equivalents” than it seems. First of all, impact is fat-tailed, so even if you counterfactually get, say, three people to “join EA,” this doesn’t mean you’ve found 3 “you-equivalents” or even 0.05. Secondly, you might think of spending a year community-building now as equivalent to giving up the last year of your impact, not the first, so the bar could be a lot higher than 0.05 you-equivalents (see the rough sketch after this list).[9]
  3. AISTR might not be the most important thing. You could think governance is even more important than technical research (e.g. political lock-in is a bigger problem than misaligned AI), that longtermism is wrong, or that biorisk is more pressing. This is well beyond the scope of this post; I will just say that I don’t agree for the reasons listed in “the core case for trying” above.
    Edit: both in light of Locke's comment and to be consistent with my earlier claim about massively scalable projects, I should say that if you could be great at other neglected kinds of research, at influencing key decisions, or at management, this could definitely outweigh the importance/neglectedness of AISTR. I don't want to convince the next great EA policy researcher that they're useless if they can't do AISTR. My intention with this post is mostly to convince people that if you're starting from a prior that career impact is very fat-tailed, it takes good evidence from personal fit (see #1 above) to move to the second- or third-most (etc.) impactful careers. As I argue in #2 of the "core case for trying," based on this fat-tailedness, I think the best career search process starts at the top and proceeds downward with good evidence, but the default process is more likely to look like the socio-emotional thing I describe below, followed by rationalizing this choice.
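As a rough illustration of the back-loading point in 2(b): here is a minimal back-of-the-envelope sketch, reusing the promotion model from footnote 9 and assuming a 20-year career (the parameters are illustrative assumptions, not settled figures).

```python
# Back-of-the-envelope for the "you-equivalents" bar in 2(b).
# Assumptions (illustrative): a 20-year impactful career and ~4 promotions
# that each ~3x your yearly impact, as in the model sketched in footnote 9.

yearly_impact = [3 ** (year // 5) for year in range(20)]  # 1,1,1,1,1,3,...,27,27
lifetime_impact = sum(yearly_impact)                      # = 200

naive_bar = 1 / 20                                     # 5% of career -> 0.05 "you-equivalents"
last_year_share = yearly_impact[-1] / lifetime_impact  # value of the *final* year

print(f"naive break-even bar: {naive_bar:.3f} you-equivalents")
print(f"bar if the forgone year is your last: {last_year_share:.3f} you-equivalents")
```

Under these assumptions the real bar is roughly 0.14 rather than 0.05 you-equivalents per year of community-building; the exact number moves with the parameters, but the back-loading effect survives a wide range of them.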

So, why are so few EAs doing AISTR?

I think EAs overrate how many other EAs are doing AI safety technical research because the few hundred people who are doing AISTR tend to be disproportionately visible in the community (perhaps because being super enmeshed in the EA/longtermist community is a strong factor in deciding to do AISTR). 

This leads EAs who aren’t already doing AISTR to think, “My comparative advantage must not be in AISTR, since there are so many people way smarter than me [which sometimes means ‘people fluent in concepts and vocabulary that I might be able to learn with a couple weeks of focused effort’] who are doing technical research, but at least I can hang in policy and philosophy discussions.” This kind of comparative identity seems to be very powerful in shaping what we think we can or can’t do well. I observe that we prefer to spend more time and focus on things we’re already pretty good at compared to the people around us, probably as a social-status and self-esteem thing. (This is part of a broader observation that career decisions among high-achieving students are primarily identity-driven, social, and emotional, and highly responsive to status incentives.)

But reactive identity formation is a very bad way of making career decisions if your goal is to positively affect the world. Adding 0.01% of quality-adjusted work to AISTR is worth more than adding 0.1% of quality-adjusted policy/philosophy work if AISTR is >10x more valuable. It might mean a somewhat lower place on various totem poles, but I hope our egos can take that hit to reduce the odds that we go extinct.

Thanks to Alexander Davies, Nikola Jurkovic, Mauricio, Rohan Subramani, Drake Thomas, Sarthak Agrawal, Thomas Kwa, and Caleb Parikh for suggestions and comments, though they do not necessarily endorse all the claims herein (and may not have read the final version of the post). All mistakes are my own.

Appendix by Aaron Scher: How To Actually Try AISTR 

Trevor’s note: this is, in Aaron’s words, “Aaron Scher’s opinion, highly non-expert”; I’m including this more as a plausible path than as a proposed default. Another idea I’ve heard from MIRI folks that I like is “write a solution to the alignment problem, right now, with your current state of knowledge, and then get feedback on why it doesn’t work, which will likely direct you to learning the relevant math and statistics and CS, and iterate this until you start having good novel insights.” Finally, you might just try building overall software engineering skills as a first step, which seems like a good career fit for some people anyway. With any of these, your mileage may vary.

Anyway, here’s Aaron’s suggestion:

  1. ^

    This order-of-magnitude estimate aggregates a few sources: “fewer than 100” as of 2017, “around 50” as of 2018, 430 on the EA survey working on AI (though this includes governance people), and a claim that Cambridge’s AI Safety Fundamentals Technical Track had “close to 100 facilitators” (but most of them don’t actually work in AISTR).

  2. ^

    At EAGxBoston, 89 out of ~970 attendees listed “AI Safety Technical Research” as an “Area of Expertise.” But attendees seem to interpret the term “expertise” loosely. The alphabetically first 5 of these included 4 undergraduates (including one whom I know personally and who does not plan to go into AISTR) and 1 PhD student. Naively, if there are 10,000 EAs and EAGxBoston was a representative sample by issue area, this would imply ~700 people interested in AISTR in EA, including undergrads but excluding non-EA researchers, and we should then downweight since AISTR people seem disproportionately likely to go to EAGx conferences (see the “Why Are So Few EAs Doing AISTR?” section) and many of these will probably do something else full-time.

  3. ^

    Thanks to Michael Chen's comment for suggesting this.

  4. ^

    Please do not take this to mean that I don't find any of the other comments persuasive! Several raise good objections to specific claims, but this is the only one that I think actually forces a modification of the core argument of the post.

  5. ^

    Much of this language comes from an excellent comment by Drake Thomas.

  6. ^

    This is largely inspired by Eli Tyre's critiques of EA-community-building-as-Ponzi-scheme.

  7. ^

    Note the "e.g." — I know this is an imperfect proxy, possibly neither precise nor accurate. That is, I imagine some people in the field would say >>90th percentile shape rotator skills are probably necessary. On the other hand, as Neel Nanda pointed out to me, some people might get lower SAT math scores but still be useful AISTRs; maybe you have executive function issues (in which case, see if you have ADHD!), or maybe you’re excellent in some other way such that you could be a great software engineer but aren’t great at visuospatial reasoning. As roon points out, “world-famous programmer of ReactJS Dan Abramov” admits to not being able to rotate a cube in his head. Seems like it would be pretty useful to establish what the lower bounds of math ability are; in the words of many a great scholar, further research is needed.

  8. ^

    For example, I think my own reasoning for going into policy, “I am very interested in policy, am a good writer and talker, and am a US citizen,” was insufficient to skip this stage, but if I had gotten rapidly promoted in a competitive policy job that would have been pretty good. Likewise, an identity like “I’m a people person” is a bad reason not to do research; an experience like working on a long project with weak feedback mechanisms and hating it for those reasons is more convincing.

  9. ^

    My model for policy careers is that delaying by one year costs 10-20% of your lifetime impact (assuming roughly 20-year AI timelines, over which time you get ~4 promotions that ~3x your impact). Maybe AISTR careers are less back-loaded, in which case the number would be lower.

  10. ^

    Note, however, that some people just “bounce off” ELK or other angles on AISTR but find other angles really interesting.


Benjamin_Todd @ 2022-07-02T10:58 (+25)

Great post!

Just a minor thing, but this might be aggressive:

My model for policy careers is that delaying by one year costs 10-20% of your lifetime impact (assuming roughly 20-year AI timelines, over which time you get ~4 promotions that ~3x your impact). Maybe AISTR careers are less back-loaded, in which case the number would be lower.

This model only works if you have 100% probability on AI in 20 years, but for most people it's their median, so there's still a 50% chance it happens in the second half of your career – and that scenario adds a lot of EV because your productivity will be so high at that point, so it's probably not wise to ignore it.

My guess is it's better to model it as a discount rate, where there's an x% chance of AI happening each year. To get a 50% chance of AI in 20 years (and all future efforts being moot), that's something like a 3.5% annual chance. Or to get a 50% chance in 10 years,  that's something like a 6.5% annual chance.

When I try to model it with a 5% annual chance of AI, with your productivity increasing 30% per year to a peak of 100x, I get the cost of a one-year delay to be 6.6% of lifetime impact.

So that's high but significantly lower than 10-20%.

It's worth taking a 1 year delay if you can increase your career capital (i.e. future productivity) by ~7%, or explore and find a path that's 7% better – which might be pretty achievable.
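A rough reconstruction of the two models being compared here, illustrative only: the career length, starting productivity, and the assumption that a delay both shifts the productivity curve and drops the final year are all guesses, not figures from the spreadsheet.

```python
# Illustrative reconstruction of the two delay-cost models in this thread.
# All parameters below are assumptions, not figures from the actual spreadsheet.

def hazard_for_50pct(years: int) -> float:
    """Constant annual chance of transformative AI giving a 50% chance within `years`."""
    return 1 - 0.5 ** (1 / years)

print(hazard_for_50pct(20), hazard_for_50pct(10))  # ~0.034 and ~0.067

# Point-estimate model (the quoted footnote): hard 20-year timeline, ~4 promotions
# that each ~3x your impact, and a one-year delay costs you the (most productive) final year.
impact_by_year = [3 ** (year // 5) for year in range(20)]
print(impact_by_year[-1] / sum(impact_by_year))    # ~0.135, i.e. in the 10-20% range

# Hazard-rate model: 5%/yr chance all future efforts become moot; productivity grows
# 30%/yr up to a 100x cap; assume a 40-year career for illustration.
hazard, growth, cap, career_years = 0.05, 1.3, 100, 40
productivity = [min(growth ** t, cap) for t in range(career_years)]
baseline = sum(p * (1 - hazard) ** (t + 1) for t, p in enumerate(productivity))
delayed = sum(p * (1 - hazard) ** (t + 2) for t, p in enumerate(productivity[:-1]))
print(1 - delayed / baseline)                      # ~0.07 with these assumptions
```

The exact outputs move around with the assumed career length and productivity cap, but the gap between the two accounting methods (a hard timeline that costs you your best year vs. a constant hazard that discounts every year a little) is what drives the 10-20% vs. ~6.6% difference.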

levin @ 2022-07-02T23:49 (+2)

Thanks for this -- the flaw in using the point estimate of 20-year timelines (and of the frequency and value of promotions) in this way occurred to me, and when I tried to model it with Guesstimate I got values that made no sense and gave up. Awesome to see this detailed model and to get your numbers!

That said, I think the flat 5% annual chance is oversimple in a way that could lead to wrong decisions at the margin for the trade-off I have in mind, which is "do AI-related community-building for a year vs. start policy career now." If you think the risk is lower for the next decade or so before rising in the 2030s, which I think is the conventional wisdom, then a uniform 5% rate incorrectly discounts work done between now and the 2030s. This makes AI community-building now, which basically produces AI technical research starting in a few years, look like a worse deal than it is and biases towards starting the policy career.

Benjamin_Todd @ 2022-07-03T12:32 (+3)

Agree there could be issues like that. 

My sheet is set up so you can change the discount rate for the next 10 years. If I switch it to 2% for the next 10yr, and then make it 7% after that, the loss increases to 8% rather than 7%.

I should maybe also flag that you can easily end up with different discount rates for movement building and direct work, and I normally suggest treating them separately. The value of moving forward a year of movement building labour depends on your model of future movement growth – if EA will hit a plateau at some point, the value comes from getting to the plateau sooner; whereas if EA will still be exponentially growing when an xrisk happens, then the value is (very approximately) the EA growth rate in the year where our efforts become moot.

Locke_USA @ 2022-07-03T17:04 (+24)

From a policy perspective, I think some of the claims here are too strong.

This post lays out some good arguments in favor of AISTR work but I don't think it's super informative about the comparative value of AISTR v. other work (such as policy), nor does it convince me that spending months on AISTR-type work is relevant even for policy people who have high opportunity costs and many other things they could (/need to) be learning. As Linch commented: "there just aren't that many people doing longtermist EA work, so basically every problem will look understaffed, relative to the scale of the problem".

Based on my experience in the field for a few years, a gut estimate for people doing relevant, high-quality AI policy work is in the low dozens (not counting people who do more high-level/academic "AI strategy" research, most of which is not designed for policy-relevance). I'm not convinced that on the current margin, for a person who could do both, the right choice is to become the 200th person going into AISTR rather than the 25th person going into applied AI policy.

Specific claims:

For people who eventually decide to do AI policy/strategy research, early exploration in AI technical material seems clearly useful, in that it gives you a better sense of how and when different AI capabilities might develop and helps you distinguish useful and “fake-useful” AI safety research, which seems really important for this kind of work. (Holden Karnofsky says “I think the ideal [strategy] researcher would also be highly informed on, and comfortable with, the general state of AI research and AI alignment research, though they need not be as informed on these as for the previous section [about alignment].”)

Re: the Karnofsky quote, I also think there's a big difference between "strategy" and "policy". If you're doing strategy research to inform e.g. OP's priorities, that's pretty different from doing policy research to inform e.g. the US government's decision-making. This post seems to treat them as interchangeable but there's a pretty big distinction. I myself do policy so I'll focus on that here.

For policy folks, it might be useful to understand but I think in many cases the opportunity costs are too high. My prior (not having thought about this very much) is that maybe 25% of AI policy researchers/practitioners should spend significant time on this (especially if their policy work is related to AI safety R&D and related topics), so that there are people literate in both fields. But overall it's good to have a division of labor. I have not personally encountered a situation in 3+ years of policy work (not on AI safety R&D) where being able to personally distinguish between "useful and fake-useful" AI safety research would have been particularly helpful. And if I did, I would've just reached out to 3-5 people I trust instead of taking months to study the topic myself.

I argue that most ... policy professionals should build some fairly deep familiarity with the field in order to do their jobs effectively

Per the above, I think "most" is too strong. I also think sequencing matters here. I would only encourage policy professionals to take time to develop strong internal models of AISTR once it has proven useful for their policy work, instead of assuming ex ante (before someone starts their policy career) that it will prove useful and so should/could be done beforehand. There are ways to upskill in technical fields while you're already in policy.

Even people who decide to follow a path like “accumulate power in the government/private actors to spend it at critical AI junctures,” it seems very good to develop your views about timelines and key inputs; otherwise, I am concerned that they will not be focused on climbing the right ladders or will not know who to listen to. Spending a few months really getting familiar with the field, and then spending a few hours a week staying up to date, seems sufficient for this purpose.

Again, division of labor and a prudent approach to deference can get you the benefits without these opportunity costs. I also think that in many cases it's simply not realistic to expect successful policy professionals to spend "a few hours a week staying up to date" with an abstract and technical literature. Every week, the list of things I want to read that have some chance of being decision/strategy-relevant is already >3x as long as what I have time for.

While other careers in the AI space — policy work ... — can be very highly impactful, that impact is predicated on the technical researchers, at some point, solving the problems, and if a big fraction of our effort is not on the object-level problem, this seems likely to be a misallocation of resources.

I think this assumes a particular risk model from AI that isn't the only risk model. Unless I am misreading you, this assumes policy success looks like getting more funding for AI technical research. But it could also look like affecting the global distribution of AI capabilities, slowing down/speeding up general AI progress, targeting deliberate threat actors (i.e. "security" rather than "safety"), navigating second-order effects from AI (e.g. destabilizing great power relations or nuclear deterrence) rather than direct threats from misaligned AI, and many other mechanisms. Another reason to focus on policy careers is that you can flexibly pivot between problems depending on how our threat models and prioritization evolve (e.g. doing both AI and bio work at the same time, countering different AI threat models at different times).

levin @ 2022-07-03T18:16 (+3)

Thanks, Locke, this is a series of great points. In particular, the point about even fewer people (~25) doing applied policy work is super important, to the extent that I think I should edit the post to significantly weaken certain claims. Likewise, the points about the relative usefulness of spending time learning technical stuff are well taken, though I think I put more value on technical understanding than you do; for example, while of course policy professionals can ask people they trust, they have to somehow be able to assess the judgment of these people on the object-level thing. Also, while I think the idea of technical people being in short supply and high demand in policy is generally overrated, that seems like it could be an important consideration. Relatedly, it seems maybe easier to do costly fit-tests (like taking a first full time job) in technical research and switch to policy than vice versa. Edit: for the final point about risk models, I definitely don't have state funding for safety research in mind; what I mean is that since I think it's very unlikely that policy permanently stops AGI from being developed, success ultimately depends on the alignment problem being solved. I think there are many things governments and private decision-makers can do to improve the chances this happens before AGI, which is why I'm still planning on pursuing a governance career!

Locke_USA @ 2022-07-03T20:00 (+1)

Thanks!

In particular, the point about even fewer people (~25) doing applied policy work is super important, to the extent that I think I should edit the post to significantly weaken certain claims.

I appreciate you taking this seriously! I do want to emphasize I'm not very confident in the ~25 number, and I think people with more expansive definitions of "policy" would reach higher numbers (e.g. I wouldn't count people at FHI as doing "policy" work even if they do non-technical work, but my sense is that many EAs lump together all non-technical work under headings such as "governance"/"strategy" and implicitly treat this as synonymous with "policy"). To the extent that it feels crux-y to someone whether the true "policy" number is closer to 25 or 50 or 75, it might be worth doing a more thorough inventory. (I would be highly skeptical of any list that claims it's >75, if you limit it to people who do government policy-related and reasonably high-quality work, but I could be wrong.)

while I think the idea of technical people being in short supply and high demand in policy is generally overrated, that seems like it could be an important consideration

I certainly agree that sometimes technical credentials can provide a boost to policy careers. However, that typically involves formal technical credentials (e.g. a CS graduate degree), and "three months of AISTR self-study" won't be of much use as career capital (and may even be a negative if it requires you to have a strange-looking gap on your CV or an affiliation with a weird-looking organization). A technical job taken for fit-testing purposes at a mainstream-looking organization could indeed, in some scenarios, help open doors to/provide a boost for some types of policy jobs. But I don't think that effect is true (or large) often enough for this to really reduce my concerns about opportunity costs for most individuals.

I definitely don't have state funding for safety research in mind; what I mean is that since I think it's very unlikely that policy permanently stops AGI from being developed, success ultimately depends on the alignment problem being solved.

This question is worth a longer investigation at some point, but I don't see many risk scenarios where a technical solution to the AI alignment problem is sufficient to solve AGI-related risk. For accident-related risk models (in the sense of this framework) solving safety problems is necessary. But even when technical solutions are available, you still need all relevant actors to adopt those solutions, and we know from the history of nuclear safety that the gap between availability and adoption can be big — in that case decades. In other words, even if technical AI alignment researchers somehow "solve" the alignment problem, government action may still be necessary to ensure adoption (whether government-affiliated labs or private sector actors are the developers). In which case one could flip your argument: since it's very unlikely AGI-related risks will be addressed solely through technical means, success ultimately depends on having thoughtful people in government working on these problems. In reality, for safety-related risks, both technical and policy solutions are necessary.

And for security-related risk models, i.e. bad actors using powerful AI systems to cause deliberate harm (potentially up to existential risks in certain capability scenarios), technical alignment research is neither a necessary nor a sufficient part of the solution, but policy is at least necessary. (In this scenario, other kinds of technical research may be more important, but my understanding is that most "safety" researchers are not focused on those sorts of problems.)

levin @ 2022-07-03T22:32 (+1)

Edited the post substantially (and, hopefully, transparently, via strikethrough and "edit:" and such) to reflect the parts of this and the previous comment that I agree with.

Regarding this:

I don't see many risk scenarios where a technical solution to the AI alignment problem is sufficient to solve AGI-related risk. For accident-related risk models (in the sense of this framework) solving safety problems is necessary. But even when technical solutions are available, you still need all relevant actors to adopt those solutions, and we know from the history of nuclear safety that the gap between availability and adoption can be big — in that case decades. In other words, even if technical AI alignment researchers somehow "solve" the alignment problem, government action may still be necessary to ensure adoption (whether government-affiliated labs or private sector actors are the developers).

I’ve heard this elsewhere, including at an event for early-career longtermists interested in policy where a very policy-skeptical, MIRI-type figure was giving a Q&A. A student asked: if we solved the alignment problem, wouldn’t we need to enforce its adoption? The MIRI-type figure said something along the lines of:

“Solving the alignment problem” probably means figuring out how to build an aligned AGI. The top labs all want to build an aligned AGI; they just think the odds of the AGIs they’re working on being aligned are much higher than I think they are. But if we have a solution, we can just go to the labs and say, here, this is how you build it in a way that we don’t all die, and I can prove that this makes us not all die. And if you can’t say that, you don’t actually have a solution. And they’re mostly reasonable people who want to build AGI and make a ton of money and not die, so they will take the solution and say, thanks, we’ll do it this way now.

So, was MIRI-type right? Or would we need policy levers to enforce adoption of the solution, even in this model? The post you cite chronicles how long it took for safety advocates to address glaring risks in the nuclear missile system. My initial model says that if the top labs resemble today’s OpenAI and DeepMind, it would be much easier to convince them than the entrenched, securitized bureaucracy described in the post: the incentives are much better aligned, and the cultures are more receptive to suggestions of change. But this does seem like a cruxy question. If MIRI-type is wrong, this would justify a lot of investigation into what those levers would be, and how to prepare governments to develop and pull these levers. If not, this would support more focus on buying time in the first place, as well as on trying to make sure the top firms at the time the alignment problem is solved are receptive.

(E.g., if the MIRI-type model is right, a US lead over China seems really important: if we expect that the solution will come from alignment researchers in Berkeley, maybe it’s more likely that they implement it if they are private, "open"-tech-culture companies, who speak the same language and live in the same milieu and broadly have trusting relationships with the proponent, etc. Or maybe not!)
 

Locke_USA @ 2022-07-04T13:53 (+7)

I put some credence on the MIRI-type view, but we can't simply assume that is how it will go down. What if AGI gets developed in the context of an active international crisis or conflict? Could not a government — the US, China, the UK, etc. — come in and take over the tech, and race to get there first? To the extent that there is some "performance" penalty to a safety implementation, or that implementing safety measures takes time that could be used by an opponent to deploy first, there are going to be contexts where not all safety measures are going to be adopted automatically. You could imagine similar dynamics, though less extreme, in an inter-company or inter-lab race situation, where (depending on the perceived stakes) a government might need to step in to prevent premature deployment-for-profit.

The MIRI-type view bakes in a bunch of assumptions about several dimensions of the strategic situation, including: (1) it's going to be clear to everyone that the AGI system will kill everyone without the safety solution, (2) the safety solution is trusted by everyone and not seen as a potential act of sabotage by an outside actor with its own interest, (3) the external context will allow for reasoned and lengthy conversation about these sorts of decisions. This view makes sense within one scenario in terms of the actors involved, their intentions and perceptions, the broader context, the nature of the tech, etc. It's not an impossible scenario, but to bet all your chips on it in terms of where the community focuses its effort (I've similarly witnessed some MIRI staff's "policy-skepticism") strikes me as naive and irresponsible.

levin @ 2022-07-04T14:18 (+1)

Agreed; it strikes me that I've probably been over-anchoring on this model

Jay Bailey @ 2022-07-04T01:48 (+2)

You can essentially think of it as two separate problems:

Problem 1: Conditional on us having a technical solution to AI alignment, how do we ensure the first AGI built implements it?

Problem 2: Conditional on us having a technical solution to AI alignment, how do we ensure no AGI is ever built that does NOT implement it, or some other equivalent solution?

I feel like you are talking about Problem 1, and Locke is talking about Problem 2. I agree with the MIRI-type that Problem 1 is easy to solve, and the hard part of that problem is having the solution. I do believe the existing labs working on AGI would implement a solution to AI alignment if we had one. That still leaves Problem 2 that needs to be solved - though at least if we're facing Problem 2, we do have an aligned AGI to help with the problem.

levin @ 2022-07-04T09:48 (+1)

Hmm. I don't have strong views on unipolar vs multipolar outcomes, but I think MIRI-type thinks Problem 2 is also easy to solve, due to the last couple clauses of your comment.

mic @ 2022-07-01T03:55 (+18)

Some quick thoughts:

levin @ 2022-07-01T13:01 (+1)

Thanks for these points, especially the last one, which I've now added to the intro section.

AstaK @ 2022-06-30T21:41 (+14)

Awesome post!

I find this bit interesting:

It seems like you should probably skip AISTR if you have good evidence that your shape rotator abilities aren’t reasonably strong — e.g., you studied reasonably hard for SAT Math but got <700 (90th percentile among all SAT takers).[5]

The above gives a rule of thumb for ruling oneself out of AISTR. On the flip side, I'm curious if you or anyone else has thought hard about what measurable attributes could correlate strongly with high AISTR aptitude, and whether AI safety field-builders can use this to recruit new AISTRs.

(Examples of measurable attributes might be math/informatics olympiad achievement, IQ score, chess rating, etc. Though I wouldn't want anyone to anchor to these suggestions I've just thrown out, I've only just spent 2 minutes thinking about this.)

Gavin @ 2022-06-30T21:28 (+11)

"100-200" is a serious undercount. My two-year-old, extremely incomplete count gave 300. And the field is way bigger now.

(In the footnote you hedge this as being a claim about the order of magnitude, but I think even that might be untrue quite soon. But changing the main claim to "on the order of 200 people" would silence my pedant alarm.)

Here's one promising source.

levin @ 2022-06-30T21:48 (+31)

Hmm, interesting. My first draft said "under 1,000" and I got lots of feedback that this was way too high. Taking a look at your count, I think many of these numbers are way too high. For example:

  • FHI AIS is listed at 34, when the entire FHI staff by my count is 59, including lots of philosophers and biosecurity people and counting GovAI (where I work this summer, though my opinions are of course my own), which is definitely not AI safety technical research; the actual AI safety research group is 4.
  • MIRI is listed at 40, when their "research staff" page has 9 people.
  • CSET is listed at 5.8. Who at CSET does alignment technical research? CSET is a national security think-tank that focuses on AI risks, but is not explicitly longtermist, let alone a hub for technical alignment research!
  • CHAI is listed at 41, but their entire staff is 24, including visiting fellows and assistants.

Should I be persuaded by the Google Scholar label "AI Safety"? What percentage of their time do the listed researchers spend on alignment research, on average?

Zach Stein-Perlman @ 2022-06-30T23:05 (+8)

Agreed (I'm not checking your numbers but this all sounds right [MIRI may have more than 9 but less than 40]). Also AI Impacts, where I'm a research intern, currently/recently has 0–0.2 FTEs on technical AI safety, and I don't think we've ever had more than 1, much less the 7  that Gavin's count says.

Owen Cotton-Barratt @ 2022-07-01T21:08 (+5)

Gavin's count says it includes strategy and policy people, for which I think AI Impacts counts. He estimated these accounted for half of the field then. (But I think he should have included that adjustment of 50% when quoting his historical figure, since this post was clearly just about technical work.)

Zach Stein-Perlman @ 2022-07-01T22:08 (+4)

Sure, good points. (But also note that AI Impacts had more like 4 than 7 FTEs in its highest-employment year, I think.)

Gavin @ 2022-07-01T11:38 (+4)

The numbers are stale, and both FHI and MIRI have suffered a bit since. But I'll try and reconstruct the reasoning:

  • FHI: I think I was counting the research scholars, expecting that programme to grow. (It didn't.)
  • MIRI: Connected people telling me that they had a lot more people than the Team page, and I did a manual count. Hubinger didn't blink when someone suggested 20-50 to him.
  • CSET: There definitely were some aligned people at the time, like Ashwin.
  • CHAI: Counted PhD students

 

Yeah, Scholar is a source of leads, not a source of confirmed true-blue people.

Buck @ 2022-07-01T15:16 (+10)

I agree with others that these numbers were way high two years ago and are still way high

Gavin @ 2022-07-03T16:30 (+5)

Happy to defer, though I wish it was deferring to more than one bit of information.

Stefan_Schubert @ 2022-06-30T21:34 (+4)

Do you have a rough estimate of the current size?

Gavin @ 2022-06-30T21:48 (+6)

Not really. I would guess 600, under a definition like "is currently working seriously on at least one alignment project". (And I'm not counting the indirect work which I previously obsessed over.)

Stefan_Schubert @ 2022-06-30T21:54 (+6)

Great, thanks - I appreciate it. I'd love a systematic study akin to the one Seb Farquhar did years back.

https://forum.effectivealtruism.org/posts/Q83ayse5S8CksbT7K/changes-in-funding-in-the-ai-safety-field

levin @ 2022-06-30T22:02 (+3)

With "100-200" I really had FTEs in mind rather than the "working seriously on at least one alignment project" threshold (and maybe I should edit the post to reflect this). What do you think the FTE number is?

Gavin @ 2022-07-01T11:32 (+5)

I wouldn't want my dumb guess to stand with any authority. But: 350?

Linch @ 2022-07-01T20:49 (+9)

if you have good evidence that your shape rotator abilities aren’t reasonably strong — e.g., you studied reasonably hard for SAT Math but got <700 (90th percentile among all SAT takers).[6]

This is really minor, but I think there's a weird conflation of spatial-visual reasoning and mathematical skills in this post (and related memespaces like roon's). This very much does not fit my own anecdotal experiences*, and I don't think this is broadly justified in psychometrics research. 

*FWIW, I've bounced off of AISTR a few times. I usually attribute the difficulty to having broadly high, but not sufficiently high, mathematical ability. Though I'm open to spatial-visual reasoning being the bottleneck (I'm actually very bad at this, like I regularly get lost in places I've lived in for years).

Lukas_Finnveden @ 2022-07-01T23:02 (+3)

I agree. Anecdotally, among people I know, I've found aphantasia to be more common among those who are very mathematically skilled.

(Maybe you could have some hypothesis that aphantasia tracks something slightly different than other variance in visual reasoning. But regardless, it sure seems similar enough that it's a bad idea to emphasize the importance of "shape rotating". Because that will turn off some excellent fits.)

Stephen McAleese @ 2022-07-01T15:19 (+9)

Great post. Although there seems to be quite a lot of interest in AI safety in the EA community, the number of technical AI safety researchers is still surprisingly small. I spent some time estimating the number of researchers and I think the 100-200 number sounds right to me as an order of magnitude estimate.

Given AI safety technical research is so important and neglected, it seems like a high-impact field to get into. The field is still relatively immature and therefore there could be many low-hanging insights to discover.

Marcel D @ 2022-07-01T20:12 (+8)

On this question of "we need supergeniuses" vs. "we just need mere geniuses": does anyone know of good research/analysis on base rates for this? For example, has anyone conducted a survey of a variety of major science and/or engineering problems in the past and evaluated whether/when progress was primarily (and counterfactually?) driven by just a few "super-geniuses" vs. many "mere geniuses"? 

I keep seeing this concept get tossed around and am still very uncertain about this.

Marcel D @ 2022-07-02T15:38 (+4)

I also think it’s worth really looking at why that was the case. This may only be relevant to a narrow slice of problems, but one of my pet theories/hypotheses is that coordination problems (e.g., poor incentive structures in academia) undermine the ability of groups to scale with size for certain research questions. Thus, in some fields it might just be that “super geniuses” were so valuable because “mere geniuses” struggled to coordinate. But if people in EA and AI safety are better at coordinating, this may be less of a problem.

levin @ 2022-07-03T21:58 (+1)

Agreed, these seem like fascinating and useful research directions.

levin @ 2023-05-27T23:49 (+6)

As of May 27, 2023, I no longer endorse this post. I think most early-career EAs should be trying to contribute to AI governance, either via research, policy work, advocacy, or other stuff, with the exception of people with outlier alignment research talent or very strong personal fits for other things.

I do stand by a lot of the reasoning, though, especially of the form "pick the most important thing and only rule it out if you have good reasons to think personal fit differences outweigh the large differences in importance." The main thing that changed was that governance and policy stuff now seems more tractable than I thought, while alignment research seems less tractable.

Edited the top of the post to reflect this.

CEvans @ 2023-07-29T00:42 (+3)

I think this post is interesting, while being quite unsure what my actual take is on the correctness of this updated version. I think I am worried about community epistemics in this world where we encourage people to defer on what the most important thing is.

It seems like there are a bunch of other plausible candidates for where the best marginal value add is even if you buy AI X- risk arguments eg. S risks, animal welfare, digital sentience, space governance etc. I am excited about most young EAs thinking about these issues for themselves.

How much do you weight the outside view consideration here of you suggesting a large shift in the EA community resource allocation, and then changing your mind a year later, which indicates the exact kind of uncertainty which motivates more diverse portfolios?

I think your point of people underrating problem importance relative to personal fit on the current margin seems true though and tangentially, my guess is the overall EA cause portfolio (both for financial and human capital allocation) is too large.

levin @ 2023-07-30T22:00 (+2)

Yep, all sounds right to me re: not deferring too much and thinking through cause prioritization yourself, and then also that the portfolio is too broad, though these are kind of in tension.

To answer your question, I'm not sure I update that much on having changed my mind, since I think if people did listen to me and do AISTR this would have been a better use of time even for a governance career than basically anything besides AI governance work (and of course there's a distribution within each of those categories for how useful a given project is; lots of technical projects would've been more useful than the median governance project).

Peter @ 2022-07-01T21:04 (+5)

This post surprised me because I remember seeing a few comments on the forum recently that expressed sentiments along the lines of "EA is maybe getting too focused on AI" and "too many people are going into AI Safety and we might end up shortstaffed on other useful skills." I don't really remember what they were. I think one mentioned a recent podcast from 80,000 hours. 

But I heard that Redwood's second Alignment-focused Machine Learning Bootcamp got over 800 applicants for 40 spots. So I'm wondering if a surge of technical researchers is imminent? Or would that make it less valuable for someone like me, who has 0 coding or machine learning experience, to consider this path?

Robi Rahman @ 2022-07-03T02:36 (+7)

There are tons of people vaguely considering working on alignment, and not a lot of people actually working on alignment.

levin @ 2022-07-03T15:18 (+5)

Yes! This is basically the whole post condensed into one sentence

Linch @ 2022-07-03T03:14 (+5)

(Maybe obvious point, but) there just aren't that many people doing longtermist EA work, so basically every problem will look understaffed, relative to the scale of the problem.