Instead of technical research, more people should focus on buying time

By Akash @ 2022-11-05T20:43 (+107)

This is a crosspost, probably from LessWrong. Try viewing it there.

Linch @ 2022-11-06T08:47 (+18)

I think I have one intuition that strongly agrees with you. I have another (more quantitative) intuition that strongly disagrees, which roughly goes:

1. There aren't that many alignment researchers. Last estimate I heard was maybe 300 total?
2. Many people are trying to advance AI capabilities. Maybe 30k total?

3. Naively, buying-time interventions are 100x less efficient on average. So your comparative advantage for buying time must be really strong, to the tune of thinking you're 100x better at helping to buy time than at doing technical AGI safety research, for the math to work out (a toy version of this calculation is sketched below).
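
A minimal sketch of that naive calculation. The headcounts are the rough guesses from points 1 and 2, and the 5x comparative-advantage figure is a purely hypothetical value for illustration:

```python
# Toy version of the naive comparison in points 1-3 above.
alignment_researchers = 300        # rough guess from point 1
capabilities_researchers = 30_000  # rough guess from point 2

# Naive penalty: buying time pushes against a crowd ~100x larger than the
# group it directly helps, so it is treated as ~100x less efficient.
penalty = capabilities_researchers / alignment_researchers
print(f"naive penalty factor: {penalty:.0f}x")

# Under this (contested) model, buying time only wins if your comparative
# advantage at it exceeds that penalty factor.
my_relative_skill_at_buying_time = 5   # hypothetical value, not an estimate
print("buy time" if my_relative_skill_at_buying_time > penalty else "do technical research")
```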

I'm probably missing something, but I notice myself being confused. How do I reconcile these two intuitions?

Max Clarke @ 2022-11-06T11:29 (+7)

Probably the number of people actually pushing the frontier of alignment is more like 30, and for capabilities maybe 3000. If the 270 remaining alignment people can influence those 3000 (biiiig if, but) then the odds aren't that bad

Emrik @ 2022-11-06T18:03 (+3)

This is confused, afaict? When comparing the impact of time-buying vs direct work, the probability of success for both activities is negated by the number of people pushing capabilities. So it cancels out, and you don't need to think about the number of people in opposition.

The unique thing about time-buying is that its marginal impact increases with the number of alignment workers,[1] whereas the marginal impact of direct work plausibly decreases with the number of people already working on it (due to fruit depletion and coordination issues).[2]

If there are 300 people doing direct alignment and you're an average worker, you can expect to contribute 0.3% of the direct work that happens per unit time. On the other hand, if you spend your time on time-buying instead, you only need to expect to save 0.3% of a unit of time per unit of time you spend in order to break even.[3] (A toy check of this break-even is sketched after the footnotes.)

  1. ^

    Although the marginal impact of every additional unit of time scales with the number of workers, there are probably still diminishing returns to more people working on time-buying.

  2. ^

    Probably direct work scales according to some weird curve idk, but I'm guessing we're past the peak. Two people doing direct work collaboratively do more good per person than one person. But there are probably steep diminishing marginal returns from economy-of-scale/specialisation, coordination, and motivation in this case.

  3. ^

    Impact is a multiplication of the number of workers $n$, their average rate of work $w$, and the time they have left to work $t$: $I = n \cdot w \cdot t$. And because multiplication is commutative, if you increase one of the variables by a proportion $p$, that is equivalent to increasing any of the other variables by the same proportion: $(pn) \cdot w \cdot t = n \cdot (pw) \cdot t = n \cdot w \cdot (pt)$.
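
A minimal sketch of the break-even arithmetic above, using the footnote's $I = n \cdot w \cdot t$ framing. The 300-researcher figure is the thread's rough guess; everything else is normalised purely for illustration:

```python
# Toy check of the break-even claim: with n workers, saving 1/n of a unit
# of time for everyone matches one unit of your own direct work.
n = 300            # direct alignment researchers (rough guess from the thread)
w = 1.0            # average work per researcher per unit time (normalised)
t = 1.0            # one unit of your own time, spent one way or the other

direct_work = w * t                 # option A: do direct work yourself
f = 1 / n                           # ~0.33%: the claimed break-even fraction
time_bought = n * w * (f * t)       # option B: buy f*t extra time for all n workers

print(direct_work, time_bought)     # both are (approximately) 1.0: the break-even point

# The footnote's commutativity point: scaling any one factor by p scales
# total impact by the same amount, whichever factor you scale.
p = 1.1
assert abs((p * n) * w * t - n * (p * w) * t) < 1e-9
assert abs(n * (p * w) * t - n * w * (p * t)) < 1e-9
```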

Linch @ 2022-11-06T18:33 (+12)

When comparing the impact of time-buying vs direct work, the probability of success for both activities is negated by the number of people pushing capabilities. So it cancels out, and you don't need to think about the number of people in opposition.

Time-buying  (slowing down AGI development) seems more directly opposed to the interests of those pushing capabilities than working on AGI safety. 

If the alignment tax is low (to the tune of an open-source Python package that just lets you do "pip install alignment") I expect all the major AGI labs to be willing to pay it. Maybe they'll even thank you. 

On the other hand, asking people to hold off on building AGI (though I agree there are more and less clever ways to do it in practice) seems to scale poorly, especially with the number of people wanting to do AGI research, and to a lesser degree with the number of people doing AI/ML research in general, or even non-researchers whose livelihoods depend on such advancements. At the very least, I do not expect the effort needed to persuade people to be constant with respect to the number of people with a stake in AGI development.

Emrik @ 2022-11-06T18:56 (+5)

Fair points. On the third hand, the more AGI researchers there are, the more "targets" there are for important arguments to reach, and the higher impact systematic AI governance interventions will have.

At this point, I seem to have lost track of my probabilities somewhere in the branches, let me try to go back and find it...

Good discussion, ty. ^^

David Johnston @ 2022-11-06T20:51 (+17)

Why call it "buying time" instead of "persuading AI researchers"? That seems to be the direct target of efforts here, and the primary benefit seems better conceptualised as "AI researchers act in a way more aligned with what AI safety people think is appropriate" rather than "buying time" which is just one of the possible consequences.

Jan_Kulveit @ 2022-11-07T13:47 (+16)

Crossposting from LW

Here is a sceptical take: anyone who is prone to getting convinced by this post to switch from attempts to do technical AI safety to attempts at "buying time" interventions is pretty likely not a good fit to try any high-powered buying-time interventions.

The whole thing reads a bit like "AI governance" and "AI strategy" reinvented under a different name, seemingly without bothering to understand the current state of those fields.

Figuring out that AI strategy and governance are maybe important, in late 2022, after spending substantial time on AI safety, does not seem to be what I would expect from promising AI strategists. The apparent lack of coordination with people already working in this area does not seem like a promising sign from people who would like to engage with hard coordination problems.

Also, I'm worried about suggestions like 

Concretely, we think that roughly 80% of alignment researchers are working on directly solving alignment. We think that roughly 50% should be working on alignment, while 50% should be reallocated toward buying time.

We also think that roughly 90% of (competent) community-builders are focused on “typical community-building” (designed to get more alignment researchers). We think that roughly 70% should do typical community-building, and 30% should be buying time.

...could be easily counterproductive.

What is and would be really valuable are people who understand both the so-called "technical problem" and the so-called "strategy problem" (secretly, the two have more in common than people think).

What would be not only not valuable but easily harmful is an influx of people who understand neither, yet engage with the strategy domain instead of the technical domain.

Emrik @ 2022-11-06T00:50 (+10)

Multiplier effects: Delaying timelines by 1 year gives the entire alignment community an extra year to solve the problem. 

This is the most and fastest I've updated on a single sentence as far back as I can remember. Probably the most important thing I've ever read on the EA forum. I am deeply gratefwl for learning this, and it's definitely worth Taking Seriously. Hoping to look into it in January unless stuff gets in the way.

(Update: I'm substantially less optimistic about time-buying than I was when I wrote this comment, but I still think it's high priority to look into.)

I have one objection to claim 3a, however: Buying-time interventions are plausibly more heavy-tailed than alignment research in some cases because 1) the bottleneck for buying time is social influence and 2) social influence follows a power law due to preferential attachment. Luckily, the traits that make for top alignment researchers have limited (but not insignificant) overlap with the traits that make for top social influencers. So I think top alignment researchers should still not switch in most cases on the margin.

jskatt @ 2022-11-06T07:34 (+1)

Can you be more specific about "the bottleneck for buying time is social influence"?

Emrik @ 2022-11-06T09:56 (+2)

I basically agree with this breakdown from the post:

Emrik @ 2022-11-06T09:55 (+7)

How do you account for the fact that the impact of a particular contribution to object-level alignment research can compound over time?

  1. Let's say I have a technical alignment idea now that is both hard to learn and very usefwl, such that every recipient of it does alignment research a little more efficiently. But it takes time before that idea disseminates across the community.
    1. At first, only a few people bother to learn it sufficiently to understand that it's valuable. But every person that does so adds to the total strength of the signal that tells the rest of the community that they should prioritise learning this.
    2. Not sure if this is the right framework, but let's say that researchers will only bother learning it if the strength of the signal hits their person-specific threshold for prioritising it.
    3. Researchers are normally distributed (or something) over threshold height, and the strength of the signal starts out below the peak of the distribution.
    4. Then (under some assumptions about the strength of individual signals and the distribution of threshold heights), every learner who adds to the signal will, at first, attract more than one further learner, until the signal passes the peak of the distribution and the idea reaches satiation/fixation in the community (a toy simulation of this dynamic is sketched after this list).
  2. If something like the above model is correct, then the impact of alignment research plausibly goes down over time.
    1. But the same is true of a lot of time-buying work (like outreach). I don't know how to balance this, but I am now a little more skeptical of the relative value of buying time.
  3. Importantly, this is not the same as "outreach". Strong technical alignment ideas are most likely incompatible with almost everyone outside the community, so the idea doesn't increase the number of people working on alignment.
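
A toy simulation of the threshold model in point 1 above. The population size, threshold distribution, and seed size are made-up parameters chosen purely to illustrate the dynamic, not estimates:

```python
import random

random.seed(0)

# Researchers who could learn the idea; each adopts it once the "signal"
# (here: the number of prior adopters) reaches their personal threshold.
N = 300
thresholds = [random.gauss(30, 15) for _ in range(N)]  # normally distributed thresholds

adopters = 5  # a handful of early learners provide the initial signal
while True:
    reached = sum(1 for th in thresholds if th <= adopters)
    if reached <= adopters:          # signal stalled below the remaining thresholds
        break
    adopters = reached               # each wave of learners strengthens the signal

print(f"eventual adopters: {adopters} / {N}")
```
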
Max Clarke @ 2022-11-06T01:04 (+5)

I would push back a little: the main thing is that buying-time interventions obviously have significant sign uncertainty. E.g. in your graph of the median researcher doing "buying time" vs technical alignment, I think there should be very wide error bars at the low end of "buying time", going significantly below 0 within the 95% confidence interval. Technical alignment is much less risky in that respect.

Miranda_Zhang @ 2022-11-06T01:09 (+9)

To clarify, you think that "buying time" might have a negative impact [on timelines/safety]?

Even if you think that, I think I'm pretty uncertain of the impact of technical alignment, if we're talking about all work that is deemed 'technical alignment.' e.g., I'm not sure that on the margin I would prefer an additional alignment researcher (without knowing what they were researching or anything else about them), though I think it's very unlikely that they would have net-negative impact.

So, I think I disagree that (a) "buying time" (excluding weird pivotal acts like trying to shut down labs) might have net negative impact, and thus also that (b) "buying time" has more variance than technical alignment.

edit: Thought about it more and I disagree with my original formulation of the disagreement. I think "buying time" is more likely to be net negative than alignment research, but also that alignment research is usually not very helpful.

Max Clarke @ 2022-11-06T01:20 (+8)

Rob puts it well in his comment as "social coordination". If someone tries "buying time" interventions and fails, then, largely because of social effects, poorly done "buying time" interventions have the potential both to fail at buying time and to preclude further coordination with mainstream ML. So the net effect is negative.

On the other hand, technical alignment does not have this risk.

I agree that technical alignment has the risk of accelerating timelines though.

But if someone tries technical alignment and fails to produce results, that has no impact compared to a counterfactual where they just did web dev or something.

My reference point here is the anecdotal disdain (from Twitter, YouTube, can dm if you want) some in the ML community have for anyone who they perceive to be slowing them down.

Miranda_Zhang @ 2022-11-06T01:27 (+11)

I see!  Yes, I agree that more public "buying time" interventions (e.g. outreach) could be net negative. However, for the average person entering AI safety, I think there are less risky "buying time" interventions that are more useful than technical alignment.

Max Clarke @ 2022-11-06T01:37 (+5)

I think this post should probably be edited to put "focus on low-risk interventions first" in bold in the first sentence, right next to the pictures, because the most careless people (possibly like me...) are the ones who will read that and not the current caveats.

Emrik @ 2022-11-06T18:09 (+2)

You'd be well able to compute the risk on your own, however, if you seriously considered doing any big outreach efforts. I think people should still have a large prior on action for anything that looks promising to them. : )

Max Clarke @ 2022-11-06T01:28 (+2)

An addendum is then:

  1. If buying-time interventions are conjunctive (i.e. one of them can cancel out the effect of the others), while technical alignment is disjunctive, and

  2. if the distribution of people performing both kinds of intervention skews towards the lower end of thoughtfulness/competence (which, imo, we should expect),

then technical alignment is a better recommendation for most people.

In fact, it suggests that the graph in the post should be reversed (but with the axis at the bottom being social competence rather than technical competence).

RobertM @ 2022-11-06T01:07 (+9)

It's not clear to me that the variance of "being a technical researcher" is actually lower than "being a social coordinator".  Historically, quite a lot of capabilities advancements have come out of efforts that were initially intended to be alignment-focused. 

Edited to add: I do think it's probably harder to have a justified inside-view model of whether one's efforts are directionally positive or negative when attempting to "buy time", as opposed to "doing technical research", if one actually makes a real effort in both cases.

Emrik @ 2022-11-06T01:45 (+2)

Would you be able to give tangible examples where alignment research has advanced capabilities? I've no doubt it's happened due to alignment-focused researchers being chatty about their capabilities-related findings, but idk examples.

RobertM @ 2022-11-06T02:05 (+6)

There's obviously substantial disagreement here, but the most recent salient example (and arguably the entire surrounding context of OpenAI as an institution).

Max Clarke @ 2022-11-06T02:03 (+5)

Not sure what Rob is referring to, but there are a fair few examples of orgs' or people's purposes slipping from alignment to capabilities, e.g. OpenAI.

I myself find it surprisingly difficult to focus on ideas that are robustly beneficial to alignment but not to capabilities.

(E.g. I have a bunch of interpretability ideas. But interpretability can only either have no impact on timelines or accelerate them.)

Do you know if any of the alignment orgs have some kind of alignment-research NDA, with a panel that allows alignment-only ideas to be made public but keeps the maybe-capabilities ideas private?

Emrik @ 2022-11-06T02:36 (+4)

Do you mean you find it hard to avoid thinking about capabilities research or hard to avoid sharing it?

It seems reasonable to me that you'd actually want to try to advance the capabilities frontier, to yourself, privately, so you're better able to understand the system you're trying to align, and also so you can better predict what's likely to be dangerous.

Max Clarke @ 2022-11-06T05:50 (+5)

Thinking about

Max Clarke @ 2022-11-06T01:10 (+2)

That's a reasonable point - the way this would reflect in the above graph is then wider uncertainty around technical alignment at the high end of researcher ability

Peter S. Park @ 2022-11-05T21:52 (+3)

Thank you so much for your excellent post on the strategy of buying time, Thomas, Akash, and Olivia! I strongly agree that this strategy is necessary, neglected, and tractable.

For practical ideas of how to achieve this (and a productive debate in the comments section of the risks from low-quality outreach efforts), please see my related earlier forum post: https://forum.effectivealtruism.org/posts/juhMehg89FrLX9pTj/a-grand-strategy-to-recruit-ai-capabilities-researchers-into

Spencer Becker-Kahn @ 2022-11-06T20:15 (+2)

Interesting. Have not read; only skimmed and bookmarked. But one of my initial naive reactions to the framing is that I have more or less always viewed technical safety work that is 'aimed' relatively near term as a form of buying time (I'm thinking of pretty much all work that is aimed at AI via the ML/DL paradigm, e.g. alignment work that explicitly has to do with big transformers etc.).

jskatt @ 2022-11-06T08:44 (+2)
  1. What does it mean to buy "end time"? If an action results in a world with longer timelines, then what does it mean to say that the additional time comes "later"?
  2. What is "serial alignment research"? I'm struggling to distinguish it from "alignment research."
  3. Can you clarify the culture change you want to see? Should we think of buying time as "better" than "(traditional) technical AI safety research and (traditional) community-building"?

Less-important comments:

Emrik @ 2022-11-06T01:35 (+2)

Naively, the main argument (imo) can be summed up as:

If I am capable of doing an average amount of alignment work $\bar{w}$ per unit time, and I have $t$ units of time available before the development of transformative AI, I will have contributed $\bar{w} \cdot t$ work. But if I expect to delay transformative AI by $d$ units of time if I focus on it, everyone will have that additional time to do alignment work, which means my impact is $n \cdot \bar{w} \cdot d$, where $n$ is the number of people doing work. If $n \cdot d > t$, I should be focusing on buying time.[1]

This analysis further favours time-buying if the total amount of work per unit time accelerates, which is plausibly the case if e.g. the alignment community grows over time (a rough numeric sketch of this follows the footnote).

  1. ^

    This assumes time-buying and direct alignment work are independent, whereas I expect doing either will help with the other to some extent.
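
A rough numeric sketch of the acceleration point above: if the alignment community grows, the extra time bought is enjoyed by the larger future community, so it is worth more than a static $n \cdot \bar{w} \cdot d$ estimate. The headcount, growth rate, and timeline below are illustrative assumptions, not estimates:

```python
# Toy comparison: value of one year of delay under a static vs a growing
# alignment community. All numbers are made up for illustration.
n0 = 300        # current researchers (rough guess from the thread)
growth = 0.20   # assumed 20% annual growth of the field
T = 10          # assumed years until transformative AI
d = 1           # years of delay bought

def researcher_years(years: int) -> float:
    """Total researcher-years accumulated over `years`, with exponential growth."""
    return sum(n0 * (1 + growth) ** y for y in range(years))

gain_from_delay = researcher_years(T + d) - researcher_years(T)
print(f"extra researcher-years from {d} year of delay: {gain_from_delay:.0f}")
print(f"static-community estimate (n0 * d): {n0 * d}")
```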

jskatt @ 2022-11-06T07:41 (+3)

Your impact is $n \cdot \bar{w} \cdot d$ if each of the $n$ alignment researchers contributes exactly $\bar{w}$ alignment work per unit of time.

Emrik @ 2022-11-06T09:58 (+2)

The first sentence points out that I am doing an average amount of alignment work, and that amount is $\bar{w}$. I realise this is a little silly, but it makes the heuristic smaller. Updated the comment to $\bar{w}$ instead. Thanks.