AI strategy nearcasting

By Holden Karnofsky @ 2022-08-26T16:25 (+61)

Cross-posted from the Alignment Forum / Less Wrong

This is the first in a series of pieces taking a stab at dealing with a conundrum:

It seems to me that in order to more productively take actions (including making more grants), we need to get more clarity on some crucial questions such as “How serious is the threat of a world run by misaligned AI?” But it’s hard to answer questions like this, when we’re talking about a development (transformative AI) that may take place some indeterminate number of decades from now.

This piece introduces one possible framework for dealing with this conundrum. The framework is AI strategy nearcasting: trying to answer key strategic questions about transformative AI, under the assumption that key events (e.g., the development of transformative AI) will happen in a world that is otherwise relatively similar to today's. One (but not the only) version of this assumption would be “Transformative AI will be developed soon, using methods like what AI labs focus on today.”

The term is inspired by nowcasting. For example, the FiveThirtyEight Now-Cast projects “who would win the election if it were held today,” which is easier than projecting who will win the election when it is actually held. I think imagining transformative AI being developed today is a bit much, but “in a world otherwise relatively similar to today’s” seems worth grappling with.

Some potential benefits of nearcasting, and reservations

A few benefits of nearcasting (all of which are speculative):

As with nowcasting, nearcasting can serve as a jumping-off point. If we have an idea of what the best actions to take would be if transformative AI were developed in a world otherwise similar to today’s, we can then start asking “Are there particular ways in which we expect the future to be different from the nearer term, that should change our picture of which actions would be most helpful?”

Nearcasting can also focus our attention on the scenarios that are not only easiest to imagine concretely, but also arguably “highest-stakes.” Worlds in which transformative AI is developed especially soon are worlds in which the “nearcasting” assumptions are especially likely to hold - and these are also worlds in which we will have especially little time to react to crucial developments as they unfold. They are thus worlds in which it will be especially valuable to have thought matters through in advance. (They are also likely to be worlds in which transformative AI most “takes the world by surprise,” such that the efforts of people paying attention today are most likely to be disproportionately helpful.)

Nearcasting might give us a sort of feedback loop for learning: if we do nearcasting now and in the future, we can see how the conclusions (in terms of which actions would be most helpful today) change over time, and perhaps learn something from this.

A major reservation about nearcasting (as an activity in general, not relative to forecasting) is that people are arguably quite bad at reasoning about future hypothetical scenarios,[1] and most of humanity’s most impressive knowledge to date has arguably relied heavily on empirical observation and experimentation. AI strategy nearcasting seems destined to be vastly inferior to, e.g., good natural sciences work in terms of how much we can trust its conclusions.

I think this reservation is valid as stated, and I expect any nearcast to look pretty silly in at least some respects with the benefit of hindsight. But I still think nearcasting (and, more generally, analyzing hypothetical future scenarios with transformative AI[2]) is probably under-invested in today:

I think part of the challenge of this kind of work is having reasonable judgment about which aspects of a hypothetical scenario are too specific to place big bets on, vs. which aspects represent relatively robust themes that would apply to many possible futures.

The nearcast I’ll be discussing

Starting here, and continuing into future pieces, I’m going to lay out a “nearcast” that shares many (not all) assumptions with this piece by Ajeya Cotra (which I would highly recommend reading in full if you’re interested in the rest of this series), hereafter abbreviated as “Takeover Analysis.”

I will generally use the present tense (“in my scenario, the alignment problem is difficult”) when describing a nearcasting scenario. Since the whole story should be taken in the spirit of speculation, I’m going to give fewer caveats than I ordinarily would: I will often say something like “the alignment problem is difficult” when I mean something like “the alignment problem is difficult in my most easily-accessible picture of how things would go” (not something like “I know with confidence that the alignment problem will be difficult”).

The key properties of the scenario I’ll be considering are:

  1. Magma. For concreteness, we’re focused on a particular AI project or company, which I’ll call Magma (following Takeover Analysis).
  2. Transformative AI is knowably near, but not yet here. Magma’s leadership has good reason to believe that it can develop something like PASTA (a Process for Automating Scientific and Technological Advancement) soon - say, within a year. (I also assume that it hasn’t unwittingly done so yet!)
  3. Human feedback on diverse tasks. The path to transformative AI revolves around what Takeover Analysis calls “human feedback on diverse tasks” (HFDT). You can think of this roughly as if (to oversimplify) we trained an AI by giving it a thumbs up when we approved of what it had done, and a thumbs down when we disapproved, and repeated this enough times that it was able to become very good at behaving in ways that would get a thumbs up, on a broad set of diverse tasks.
  4. Some (but not unlimited) deliberate delay is possible. Magma has some - but not unlimited - ability to delay development and/or deployment in order to take “extra” measures to reduce the risk of misaligned AI.
    1. That is, it’s not the case that “any delay in deploying a given type of AI system will simply mean a competitor does so instead.”
    2. This could be because Magma has significant advantages over its competitors (funding, algorithms, talent, etc.) and/or because its closest competitors have a similar level of caution to Magma.
    3. I’ll be imagining that Magma can reasonably expect to deploy a given AI system 6-24 months later than it would otherwise for purposes of safety, without worrying that a competitor would deploy a similar (less safe) system in that time.
  5. Otherwise, changes compared to today’s world are fairly minimal.
    1. Deep learning has evolved in many ways - for example, there might be new architectures that improve on the Transformer - but the fundamental paradigm is still basically that of models gaining capabilities via “trial-and-error.”[4] More on this in Takeover Analysis.
    2. We still can’t say much about what’s going on inside a given AI model (most of what we know is that it learns from trial and error on tasks, and performs well on the tasks).
    3. AI systems have advanced to the point where there are many new commercial applications, and perhaps substantial economic impact, but we haven’t yet reached the point where AIs can fully automate scientific and technological advancement, where economic growth is “explosive” (greater than 30% per year, which would very roughly be 10x the current rate of global economic growth), or where AI systems can facilitate anything like a decisive strategic advantage.

I note that the combination of “Changes compared to today’s world are fairly minimal” with “Transformative AI is knowably near” implies a fast takeoff approaching: a very rapid transition from a world like today’s to a radically different world. This is (in my view) a key way in which reality is likely to differ from the scenario I’m describing.[5]

I still think this scenario is worth contemplating, for the following reasons:


Footnotes

  1. Though this is something I hear claimed a lot without necessarily much evidence; see my discussion of the track record of futurists here.

  2. E.g., see this piece, this piece, and Age of Em.

  3. For example, I generally feel better about predictions about technological developments (what will be possible at a particular price) than predictions about what products will be widely used, how lifestyles will change, etc.

    Because of this, I think “Extreme technology X will be available, and it seems clear and obvious that the minimum economic consequences of this technology or any technology that can do similar things would be enormous” is in many ways a better form of prediction than “Moderate technology Y will be available, and it will be superior enough to its alternatives that it will be in wide use.” This is some of why I tend to feel better about predictions about transformative AI (which I’d say are going out on a massive limb re: what will be possible, but less of one re: how significant such a thing would be) than about predictions of self-driving cars (which run into lots of questions about how technology will interact with the regulatory and economic environment).

  4. Some readers have objected to the “trial and error” term, because they interpret it as meaning something like: “trying every possible action until something works, with no system for ‘learning’ from mistakes.” This is not how AI systems are generally trained - their training uses stochastic gradient descent, in which each “trial” comes with information about how to adjust an AI system to do better on similar inputs in the future. But I use the “trial and error” term because I think it is intuitive, and because I think the term generally is inclusive of this sort of thing (e.g., I think Wikipedia’s characterization of the term is quite good, and includes cases where each “trial” comes with information about how to do better in the future).

  5. To be clear, I expect that the transition to transformative AI will be much faster than the vast majority of people imagine, but I don’t expect it will be quite as fast as implied here.

  6. To my knowledge, most existing discussion of the alignment problem - see links at the bottom of this section for examples - focuses on abstract discussions of some of the challenges presented by aligning any AI system (things like “It’s hard to formalize the values we most care about”). Takeover Analysis instead takes a nearcasting frame, and walks mechanically through what it might look like to develop powerful AI with today’s methods. It discusses mechanically how this could lead to an AI takeover - as well as why the most obvious methods for diagnosing and preventing this problem seem like they wouldn’t work.


Marcel D @ 2022-08-26T18:55 (+8)

This is because many actions that would be helpful under one theory of how things will play out would be harmful under another (for example, see my discussion of the “caution” frame vs. the “competition” frame).

It seems to me that in order to more productively take actions (including making more grants), we need to get more clarity on some crucial questions such as “How serious is the threat of a world run by misaligned AI?” But it’s hard to answer questions like this, when we’re talking about a development (transformative AI) that may take place some indeterminate number of decades from now.

I agree and have been saying as much for a while. I think the nearcasting approach is interesting and worth exploring, but given how complicated reality is (given the number of potentially relevant variables, uncertainty about those variables' values and interactions, and the inability to just compute insights via empirical research and big data crunching), I still think people should consider that maybe what we need is just a lot of rigorous theoretical legwork, just like how it's often beneficial/necessary to use rigorous empirical methods (e.g., statistical hypothesis testing, data-based high-fidelity engineering simulations[1]) for STEM research rather than just intuiting conclusions based on mental models.

Unless there is some other methodological innovation in theoretical research that makes it as legible and reproducible as the empirical methods we've come to rely on over the past century, I think that this kind of theoretical research should at least demand the kind of (1) assumption explication, (2) argument/rebuttal tracking, and (3) viewpoint/argument generation and aggregation that I described in my research project post-mortem.

But I'd be curious to hear other people's thoughts!

  1. The status of this as an empirical method is certainly disputable, but I think it's not worth quibbling over semantics here/yet.