We might be missing some key feature of AI takeoff; it'll probably seem like "we could've seen this coming"

By Dane Valerie @ 2024-05-16T12:05 (+15)

This is a linkpost to https://www.lesswrong.com/posts/dLwo67p7zBuPsjG5t/we-might-be-missing-some-key-feature-of-ai-takeoff-it-ll

By Lukas Gloor

Predicting the future is hard, so it’s no surprise that we occasionally miss important developments.

However, several times recently, in the contexts of Covid forecasting and AI progress, I noticed that I missed some crucial feature of a development I was interested in getting right, and it felt to me like I could’ve seen it coming if only I had tried a little harder. (Some others probably did better, but I could imagine that I wasn't the only one who got things wrong.)

Maybe this is hindsight bias, but if there’s something to it, I want to distill the nature of the mistake.

First, here are the examples that prompted me to take notice:

Predicting the course of the Covid pandemic:

Predicting AI progress:

Concerning these examples, it seems to me that:

  1. It should’ve been possible either to foresee these developments or at least to highlight the scenario that ended up happening as one that could happen and is explicitly worth paying attention to.
  2. The failure mode at play involves forecasting well on some narrow metrics while not paying attention to changes in the world brought about by the very thing you were forecasting, so that you end up predicting a future that will seem incongruent.

What do I mean by “incongruent”?

I won’t speculate much on how to improve at this in this post since I mainly just wanted to draw attention to the failure mode in question.

Still, if I had to guess, the scenario forecasting that some researchers have recently tried out seems like a promising approach here. To see why, imagine writing a detailed scenario forecast about how Covid affects countries with various policies. Surely, you’re more likely to notice the importance of things like the “control system” if you think things through vividly, in fleshed-out ways, than if you’re primarily reasoning abstractly in terms of doubling times and R0.
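As a toy illustration of the difference (my own sketch, not from the post), here is a minimal comparison between a naive fixed-R0 extrapolation and a crude model in which behaviour responds to case counts. The function names, thresholds, and parameter values are all invented for illustration; the point is only that the feedback term changes the qualitative shape of the forecast.

```python
# Toy sketch (illustrative only): compare a naive fixed-R0 extrapolation with a
# crude model where behaviour responds to case counts (the "control system").
# All parameter values below are made up for illustration.

def naive_extrapolation(initial_cases: float, r0: float, steps: int) -> list[float]:
    """Each generation multiplies cases by a fixed R0 -- no feedback at all."""
    return [initial_cases * r0 ** t for t in range(steps)]

def with_control_system(initial_cases: float, r0: float, steps: int,
                        population: float = 1_000_000,
                        reaction_threshold: float = 5_000,
                        suppressed_r: float = 0.8) -> list[float]:
    """Crude SIR-style loop: once cases per generation pass a threshold,
    people and policymakers react and the effective R drops."""
    susceptible = population - initial_cases
    cases = [initial_cases]
    for _ in range(steps - 1):
        effective_r = suppressed_r if cases[-1] > reaction_threshold else r0
        # New infections, scaled by the share of the population still susceptible.
        new_cases = cases[-1] * effective_r * susceptible / population
        susceptible = max(susceptible - new_cases, 0.0)
        cases.append(new_cases)
    return cases

if __name__ == "__main__":
    steps = 20
    naive = naive_extrapolation(100, r0=2.5, steps=steps)
    reactive = with_control_system(100, r0=2.5, steps=steps)
    # The naive projection soon exceeds the entire population -- an
    # "incongruent" state of affairs a forecaster should notice.
    print(f"naive model, final generation:    {naive[-1]:,.0f}")
    print(f"reactive model, final generation: {reactive[-1]:,.0f}")
```

The naive model grows without bound and eventually predicts more cases than there are people, while the reactive model peaks and then hovers around the reaction threshold. Nothing hinges on the specific numbers; the exercise of writing the scenario out is what surfaces the feedback loop.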

Admittedly, there are sometimes trend-altering developments that are genuinely hard to foresee. Take the “ChatGPT moment”: it seems obvious with hindsight, but ten years ago, many people didn’t necessarily expect that AI capabilities would develop gradually enough for us to get ChatGPT-level capabilities well before the point where AI becomes capable of radically transforming the world. For instance, see Yudkowsky’s post about there being no fire alarm, which seems to have been wrong in at least one key respect (while being right in the sense that, even though many experts changed their minds after ChatGPT, there’s still some debate about whether “short AI timelines are worth taking seriously” counts as a consensus).

So, I’m sympathetic to the view that it would have been very difficult (and perhaps unfairly demanding) for us to have anticipated a “ChatGPT moment” early on in discussions of AI risk, especially for those of us who were previously envisioning AI progress in a significantly different, more “jumpy” paradigm. (Note that progress being gradual before AI becomes “transformative” doesn’t necessarily predict that progress will continue to stay gradual all the way to the end – see the argument here for an alternative.) Accordingly, I’d say it seems like a lot to ask to have explicitly highlighted – in the sense of describing the scenario in sufficient detail to single it out as possible and assigning non-trivial probability mass to it – something like the current LLM paradigm (or its key ingredients, the scaling hypothesis and “making use of internet data for easy training”) before, say, GPT-2 came out. (Not to mention earlier still, e.g., before Go progress signalled the AI community’s re-ignited excitement about deep learning.) Still, surely there must have come a point where it became clearer that a “ChatGPT moment” was likely going to happen. So, while it might be true that it wasn’t always foreseeable that there’d be something like that, somewhere between “after GPT-2” and “after GPT-3” it became foreseeable, at the very least as a possibility.

I'm sure many people did see this coming, but not everyone did, so I'm trying to distill what heuristics we could use to do better in similar future cases.

To summarize, I concede that we (at least those of us without incredibly accurate inside-view models of what’s going to happen) sometimes have to wait for the world to provide updates about how a trend will unfold. Trying to envision important developments before those updates come in is almost guaranteed to leave us with an incomplete and at-least-partly misguided picture. That’s okay; it’s the nature of the situation. Still, we can improve at noticing at the earliest point possible what those updates might be. That is, we can stay on the lookout for signals that the future will go in a different way than our default models suggest, and update our models early on. (Example: “AGI” being significantly easier than "across-the-board superhuman AI" in the LLM paradigm.) Furthermore, within any given trend/scenario that we’re explicitly modeling (like forecasting Covid numbers or forecasting AI capabilities under the assumption of the scaling hypothesis), we should coherence-check our forecasts to ensure that they don’t predict an incongruent state of affairs. By doing so, we can better plan ahead instead of falling into a mostly reactive role.

So, here’s a set of questions we might ask ourselves:


[Content cross-posted with author's permission. The author may not see or respond to comments on this post.]