Forecasting Transformative AI: What Kind of AI?

By Holden Karnofsky @ 2021-08-10T21:38 (+62)

Audio version available at Cold Takes (or search Stitcher, Spotify, Google Podcasts, etc. for "Cold Takes Audio")

This is the first of four posts summarizing hundreds of pages of technical reports focused almost entirely on forecasting one number. It's the single number I'd probably most value having a good estimate for: the year by which transformative AI will be developed.1

By "transformative AI," I mean "AI powerful enough to bring us into a new, qualitatively different future." The Industrial Revolution is the most recent example of a transformative event; others would include the Agricultural Revolution and the emergence of humans.2

This piece is going to focus on exploring a particular kind of AI I believe could be transformative: AI systems that can essentially automate all of the human activities needed to speed up scientific and technological advancement. I will call this sort of technology Process for Automating Scientific and Technological Advancement, or PASTA.3 (I mean PASTA to refer to either a single system or a collection of systems that can collectively do this sort of automation.)

PASTA could resolve the same sort of bottleneck discussed in The Duplicator and This Can't Go On - the scarcity of human minds (or something that plays the same role in innovation).

PASTA could therefore lead to explosive science, culminating in technologies as impactful as digital people. And depending on the details, PASTA systems could have objectives of their own, which could be dangerous for humanity and could matter a great deal for what sort of civilization ends up expanding through the galaxy.

By talking about PASTA, I'm partly trying to get rid of some unnecessary baggage in the debate over "artificial general intelligence." I don't think we need artificial general intelligence in order for this century to be the most important in history. Something narrower - as PASTA might be - would be plenty for that.

To make this idea feel a bit more concrete, the rest of this post will discuss how PASTA might be made and what its impacts could be.

Making PASTA

I'll start with a very brief, simplified characterization of machine learning, which you can skip if you're already familiar with it.

There are essentially two ways to "teach" a computer to do a task:

Traditional programming. In this case, you code up extremely specific, step-by-step instructions for completing the task. For example, the chess-playing program Deep Blue is essentially executing instructions4 along the lines of:

Machine learning. This is essentially "training" an AI to do a task by trial and error, rather than by giving it specific instructions. Today, the most common way of doing this is by using an "artificial neural network" (ANN), which you might think of sort of like a "digital brain" that starts in an empty (or random) state: it hasn't yet been wired to do specific things.

For example, AlphaZero - an AI that has been used to master multiple board games including chess and Go - does something more like this (although it has important elements of "traditional programming" as well, which I'm ignoring for simplicity):

The latter approach is central for a lot of the recent progress in AI. This is especially true for tasks that are hard to “write down all the instructions” for. For example, humans are able to write down some reasonable guidelines for succeeding at chess, but we know very little about how we ourselves classify images (determine whether some image is of a dog, cat, or something else). So machine learning is particularly essential for tasks like classifying images.
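To make the contrast between the two approaches concrete, here is a minimal sketch on a made-up toy task (deciding whether a temperature reading counts as "warm"). The function names, the 20-degree rule, and the example data are all assumptions for illustration; this is nothing like how Deep Blue or AlphaZero work internally, but it shows the difference between writing a rule down and finding one from labeled examples.

```python
# Toy contrast between the two approaches. Everything here is illustrative.

def is_warm_programmed(temp_c):
    """Traditional programming: a human writes the rule down explicitly."""
    return temp_c >= 20


def learn_threshold(examples, candidates):
    """Machine learning (very crudely): try candidate rules and keep the one
    that makes the fewest mistakes on the labeled examples."""
    def mistakes(threshold):
        return sum((temp >= threshold) != label for temp, label in examples)
    return min(candidates, key=mistakes)


# Labeled examples: (temperature in Celsius, did a person call it "warm"?)
examples = [(5, False), (12, False), (18, False), (21, True), (25, True), (30, True)]

# Nobody tells the learner where the cutoff is; it is found from the examples.
learned = learn_threshold(examples, candidates=range(0, 41))


def is_warm_learned(temp_c):
    return temp_c >= learned
```

The learned rule agrees with the examples without anyone having specified the cutoff. For a task like image classification, the "rule" has vastly more adjustable parts than a single threshold, which is why writing it by hand is not feasible.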

Could PASTA be developed via machine learning? One obvious (but unrealistic) way of doing this might be something like this:

This would be wildly impractical, at least compared to how I think things are more likely to play out, but it hopefully gives a starting intuition for what a training process could be trying to accomplish: by providing a signal of "how the AI is doing," it could allow an AI to get good at the goal via trial-and-error and tweaking its internal wiring.
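The idea of "a signal of how the AI is doing" plus "tweaking its internal wiring" can be sketched in a few lines. This is a deliberately crude stand-in: it uses random hill climbing on three numbers, whereas real neural networks are typically trained with gradient-based methods on far more parameters. The `score` function and its hidden `target` are hypothetical, chosen only so the example runs.

```python
import random


def score(weights):
    """Stand-in for 'how the AI is doing'. Higher is better; 0 is perfect."""
    target = [0.5, -1.0, 2.0]  # hypothetical; the learner never sees this directly
    return -sum((w - t) ** 2 for w, t in zip(weights, target))


def train(steps=5000, seed=0):
    rng = random.Random(seed)
    # Start from random "wiring": the system is not yet set up to do anything.
    weights = [rng.uniform(-1, 1) for _ in range(3)]
    best = score(weights)
    for _ in range(steps):
        # Trial: nudge one connection strength by a small random amount.
        candidate = list(weights)
        i = rng.randrange(len(candidate))
        candidate[i] += rng.gauss(0, 0.1)
        # Feedback: keep the tweak only if the score signal improved.
        new = score(candidate)
        if new > best:
            weights, best = candidate, new
    return weights, best


weights, best = train()
```

Note that the loop never inspects why a tweak helped. It only sees the score, which is the sense in which training is trial and error rather than instruction.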

In reality, I'd expect training to be faster and more practical due to things like:

Developing PASTA will almost certainly be far harder and more expensive than developing AlphaZero was. It may require a lot of ingenuity to get around obstacles that exist today (the picture above is surely radically oversimplified, and is there to give basic intuitions). But AI research is simultaneously getting cheaper8 and better-funded. I'll argue in future pieces that the odds of developing PASTA in the coming decades are substantial.

Impacts of PASTA

Explosive scientific and technological advancement

I've previously talked about the idea of a potential explosion in scientific and technological advancement, which could lead to a radically unfamiliar future.

I've emphasized that such an explosion could be caused by a technology that "dramatically increased the number of 'minds' (humans, or digital people, or advanced AIs) pushing forward scientific and technological advancement."

PASTA would fit this bill well, particularly if it were as good as humans (or better) at finding better, cheaper ways to make more PASTA systems. PASTA would have all of the tools for a productivity explosion that I previously laid out for digital people:

Thanks to María Gutiérrez Rojas for this graphic, a variation on similar graphics from The Duplicator and Digital People Would Be An Even Bigger Deal illustrating the dynamics of explosive growth. Here, instead of people having ideas that increase productivity, it's AI algorithms (denoted by neural network icons).

Why doesn't this feedback loop apply to today's computers and AIs? Because today's computers and AIs aren't able to do all of the things required to have new ideas and get themselves copied more efficiently. They play a role in innovation, but innovation is ultimately bottlenecked by humans, whose population is only growing so fast. This is what PASTA would change (it is also what digital people would change).

Additionally: unlike digital copies of humans, PASTA systems might not be attached to their existing identity and personality. A PASTA system might quickly make any edits to its "mind" that made it more effective at pushing science and technology forward. This might (or might not, depending on a lot of details) lead to recursive self-improvement and an "intelligence explosion." But even if this didn't pan out, simply being as good as humans at making more PASTA systems could cause explosive advancement for the same reasons that digital people could.

Misaligned AI: mysterious, potentially dangerous objectives

If PASTA were developed as outlined above, it's possible that we might know extremely little about its inner workings.

AlphaZero - like other modern deep learning systems - is in a sense very poorly understood. We know that it "works." But we don't really know "what it's thinking."

If we want to know why AlphaZero made some particular chess move, we can't look inside its code to find ideas like "Control the center of the board" or "Try not to lose my queen." Most of what we see is just a vast set of numbers, denoting the strengths of connections between different artificial neurons. As with a human brain, we can mostly only guess at what the different parts of the "digital brain" are doing9 (although there are some early attempts to do what one might call "digital neuroscience").

The "designers" of AlphaZero (discussed above) didn't need much of a vision for how its thought processes would work. They mostly just set it up so that it would get a lot of trial and error, and evolve to get a particular result (win the game it’s playing). Humans, too, evolved primarily through trial and error, with selection pressure to get particular results (survival and reproduction - although the selection worked differently).

Like humans, PASTA systems might be good at getting the results they are under pressure to get. But like humans, they might learn along the way to think and do all sorts of other things, and it won't necessarily be obvious to the designers whether this is happening.

Perhaps, due to being optimized for pushing forward scientific and technological advancement, PASTA systems will be in the habit of taking every opportunity to do so. This could mean that they would - given the opportunity - seek to fill the galaxy with long-lasting space settlements devoted to science.

Perhaps PASTA will emerge as some byproduct of another objective. For example, perhaps humans will be trying to train systems to make money or amass power and resources, and setting them up to do scientific and technological advancement will just be part of that. In which case, perhaps PASTA systems will just end up as power-and-resources seekers, and will seek to bring the whole galaxy under their control.

Or perhaps PASTA systems will end up with very weird, "random" objectives. Perhaps some PASTA system will observe that it "succeeds" (gets a positive training signal) whenever it does something that causes it to have direct control over an increased amount of electric power (since this is often a result of advancing technology and/or making money), and it will start directly aiming to increase its supply of electric power as much as possible - with the difference between these two objectives not being noticed until it becomes quite powerful. (Analogy: humans have been under selection pressure to pass their genes on, but many have ended up caring more about power, status, enjoyment, etc. than about genes.)

These are scary possibilities if we are talking about AI systems (or collections of systems) that may be more capable than humans in at least some domains.

If you're interested in more discussion of whether an AI could or would have its own goals, I'd suggest checking out Superintelligence (book), The case for taking AI seriously as a threat to humanity (Vox article), Draft report on existential risk from power-seeking AI (Open Philanthropy analysis) or one of the many other pieces on this topic.10

Conclusion

It's hard to predict what a world with PASTA might look like, but two salient possibilities would be:

The next 3 posts will argue that PASTA is more likely than not to be developed this century.

Footnotes

  1. Of course, the answer could be "A kajillion years from now" or "Never." 

  2. See this section of "Forecasting TAI with Biological Anchors" (Cotra (2020)) for a fuller definition of "transformative AI."

  3. I'm sorry. But I do think the rest of the series will be slightly more fun to read this way. 

  4. The examples here are of course simplified. For example, both Deep Blue and AlphaGo incorporate substantial amounts of "tree search," a traditionally-programmed algorithm that has its own "trial and error" process. 

  5. And they can include simulating long chains of future game states. 

  6. Some AIs could be used to determine whether papers are original contributions based on how they are later cited; others could be used to determine whether papers are original contributions based only on the contents of the paper and on previous literature. The former could be used to train the latter, by providing a "That's correct" or "That's wrong" signal for judgments of originality. Similar methods could be used for training AIs to assess the correctness of papers. 

  7. E.g., https://openai.com/blog/improving-language-model-behavior/  

  8. Due to improvements in hardware and software. 

  9. It's even worse than spaghetti code.

  10. More books: Human Compatible, Life 3.0, and The Alignment Problem.  


gwern @ 2021-08-11T01:51 (+14)

(If anyone asks, say 'PASTA' was designed as an allusion to Strega Nona.)

andzuck @ 2021-08-11T02:46 (+7)

Footnote 9 also on theme

Tristan Williams @ 2023-10-28T13:18 (+1)

Same problem with the footnotes (doesn't show a preview) and actually this one might be bugged (currently can't see any footnotes).