We might be missing some key feature of AI takeoff; it'll probably seem like "we could've seen this coming"
By Dane Valerie @ 2024-05-16T12:05 (+15)
This is a linkpost to https://www.lesswrong.com/posts/dLwo67p7zBuPsjG5t/we-might-be-missing-some-key-feature-of-ai-takeoff-it-ll
By Lukas Gloor
Predicting the future is hard, so it’s no surprise that we occasionally miss important developments.
However, several times recently, in the contexts of Covid forecasting and AI progress, I noticed that I missed some crucial feature of a development I was interested in getting right, and it felt to me like I could’ve seen it coming if only I had tried a little harder. (Some others probably did better, but I could imagine that I wasn't the only one who got things wrong.)
Maybe this is hindsight bias, but if there’s something to it, I want to distill the nature of the mistake.
First, here are the examples that prompted me to take notice:
Predicting the course of the Covid pandemic:
- I didn’t foresee the contribution from sociological factors (e.g., “people not wanting to get hospitalized” – Zvi called it “the control system”).
- As a result, I overpredicted the difference between countries with a lockdown policy vs. ones without. (Note that this isn’t necessarily an update against the cost-effectiveness of lockdowns, because the update cuts both ways: lockdowns saved fewer lives than I would’ve naively predicted, but their economic costs were also lower than naively predicted, because in the no-lockdown counterfactual people already social-distanced more than expected of their own accord, having read the news about crowded hospitals and knowing close contacts who were sick with the virus.)
Predicting AI progress:
- Not foreseeing that we’d get an Overton window shift in AI risk awareness.
- Many EAs were arguably un(der)prepared for the possibility of a “ChatGPT moment,” where people who previously weren’t paying attention to AI progress got to experience a visceral sense of where AI capabilities are rapidly heading. As a result, it is now significantly easier to make ambitious policy asks to combat AI risks.
- Not foreseeing wide deployment of early-stage “general” AI and the possible irrelevance of AI boxing.
- Early discussions of AI risk used to involve this whole step about whether a superhuman AI system could escape and gain access to the internet. No one (to my knowledge?) highlighted that the future might well go as follows:
“There’ll be gradual progress on increasingly helpful AI tools. Companies will roll these out for profit and connect them to the internet. There’ll be discussions about how these systems will eventually become dangerous, and safety-concerned groups might even set up testing protocols (“safety evals”). Still, it’ll be challenging to build regulatory or political mechanisms around these safety protocols so that, when they sound the alarm at a specific lab that the systems are becoming seriously dangerous, this will successfully trigger a slowdown and change the model release culture from ‘release by default’ to one where new models are air-gapped and where the leading labs implement the strongest forms of information security.”
If we had understood the above possibility earlier, the case for AI risks would have seemed slightly more robust, and (more importantly) we could’ve started sooner with the preparatory work that ensures that safety evals aren’t just handled company-by-company in different ways, but that they are centralized and connected to a trigger for appropriate slowdown measures, industry-wide or worldwide.
Concerning these examples, it seems to me that:
- It should’ve been possible to either foresee these developments or at least highlight the scenario that happened as one that could happen/is explicitly worth paying attention to.
- The failure mode at play involves forecasting well on some narrow metrics while not paying attention to changes in the world brought about by the very thing you were forecasting, and so predicting a future that, taken as a whole, seems incongruent.
What do I mean by “incongruent”?
- A world where hospitals are crowded with people dying from Covid-induced pneumonia and everyone has a contact who’s already got the virus, yet people continue to go to restaurants as normal.
- A world where AI capabilities progressed far enough to get us to something like ChatGPT, but somehow this didn’t cause a stir or wake-up moment for anyone who wasn’t already concerned about AI risk.
- A world where "general" AI is easier than across-the-board superhuman AI, and yet the profit-oriented AI companies don't develop a hard-to-reverse culture of making models broadly available (and giving them access to the internet) as these models are slowly getting more and more capable.
I won’t speculate much in this post on how to improve at this, since I mainly just wanted to draw attention to the failure mode in question.
Still, if I had to guess, the scenario forecasting that some researchers have recently tried out seems like a promising approach here. To see why, imagine writing a detailed scenario forecast about how Covid affects countries with various policies. Surely, you’re more likely to notice the importance of things like the “control system” if you think things through vividly/in fleshed-out ways than if you’re primarily reasoning abstractly in terms of doubling times and R0.
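As a minimal illustration of this point, here is a toy sketch (the model and all parameter values are made up purely for illustration, not taken from the original post): a discrete-time SIR simulation in which the transmission rate drops as prevalence rises, standing in for the behavioral “control system” of people voluntarily distancing once hospitals fill up. Comparing it with the fixed-rate version shows how far a naive forecast can drift from a feedback-aware one.

```python
# Illustrative toy model only: a discrete-time SIR simulation where the
# effective transmission rate shrinks as current infections rise, mimicking
# the behavioral "control system." All parameters are made-up placeholders.

def simulate(days=180, population=1_000_000, beta=0.30, gamma=0.10,
             feedback_strength=0.0):
    """Run a simple SIR model with one-day time steps.

    feedback_strength=0.0 reproduces the naive fixed-beta forecast;
    a positive value damps transmission as prevalence grows.
    """
    s, i, r = population - 100, 100, 0
    peak_infected = 0
    for _ in range(days):
        # Behavioral feedback: the effective contact rate falls as prevalence rises.
        effective_beta = beta / (1 + feedback_strength * (i / population) * 100)
        new_infections = effective_beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        peak_infected = max(peak_infected, i)
    return peak_infected, r  # peak prevalence, cumulative recoveries by the end

naive_peak, naive_total = simulate(feedback_strength=0.0)
control_peak, control_total = simulate(feedback_strength=5.0)
print(f"No feedback:   peak {naive_peak:,.0f}, recovered {naive_total:,.0f}")
print(f"With feedback: peak {control_peak:,.0f}, recovered {control_total:,.0f}")
```

The abstract quantities (doubling time, R0) are identical in both runs at the outset; the only difference is whether the forecast lets the world react to the epidemic it is predicting.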
Admittedly, sometimes there are trend-altering developments that are genuinely hard to foresee. The “ChatGPT moment” seems obvious with hindsight, but ten years ago, many people probably didn’t expect that AI capabilities would develop gradually enough for us to get ChatGPT-level capabilities well before the point where AI becomes capable of radically transforming the world. For instance, see Yudkowsky’s post about there being no fire alarm, which seems to have been wrong in at least one key respect (while being right in the sense that, even though many experts have changed their minds after ChatGPT, there’s still some debate about whether we can call it a consensus that “short AI timelines are worth taking seriously”).
So, I’m sympathetic to the view that it would have been very difficult (and perhaps unfairly demanding) for us to have anticipated a “ChatGPT moment” early on in discussions of AI risk, especially for those of us who were previously envisioning AI progress in a significantly different, more “jumpy” paradigm. (Note that progress being gradual before AI becomes "transformative" doesn't necessarily predict that progress will continue to stay gradual all the way to the end – see the argument here for an alternative.) Accordingly, I'd say that it seems a lot to ask to have explicitly highlighted – in the sense of describing the scenario in sufficient detail to single it out as possible and assigning non-trivial probability mass to it – something like the current LLM paradigm (or its key ingredients, the scaling hypothesis and “making use of internet data for easy training”) before, say, GPT-2 came out. (Not to mention earlier still, e.g., before Go progress signalled the AI community’s re-ignited excitement about deep learning.) Still, surely there must have come a point where it became clearer that a "ChatGPT moment" was likely going to happen. So, while it might be true that this wasn’t always foreseeable, somewhere in between “after GPT-2” and “after GPT-3,” it became foreseeable as a possibility at the very least.
I'm sure many people indeed saw this coming, but not everyone did, so I'm trying to distill what heuristics we could use to do better in similar future cases.
To summarize, I concede that we (at least those of us without incredibly accurate inside-view models of what’s going to happen) sometimes have to wait for the world to provide updates about how a trend will unfold. Trying to envision important developments before those updates come in is almost guaranteed to leave us with an incomplete and at-least-partly misguided picture. That’s okay; it’s the nature of the situation. Still, we can improve at noticing at the earliest point possible what those updates might be. That is, we can stay on the lookout for signals that the future will go in a different way than our default models suggest, and update our models early on. (Example: “AGI” being significantly easier than "across-the-board superhuman AI" in the LLM paradigm.) Furthermore, within any given trend/scenario that we’re explicitly modeling (like forecasting Covid numbers or forecasting AI capabilities under the assumption of the scaling hypothesis), we should coherence-check our forecasts to ensure that they don’t predict an incongruent state of affairs. By doing so, we can better plan ahead instead of falling into a mostly reactive role.
So, here’s a set of questions to maybe ask ourselves:
- How will the social and technological landscape differ from now as we get even closer to transformative AI?
- What might be an incredibly relevant aspect of AI takeoff that I’m currently oblivious to? Particularly, anything that might happen where I will later think, “Ah, why didn’t I see this coming?”
- How might I be envisioning trends in a misguided way? What could be early indicators of “trend changes” where the future will depart from my default way of modeling/envisioning things?
[Content cross-posted with author's permission. The author may not see or respond to comments on this post.]