AI is advancing fast

By Vishakha Agrawal, Algon @ 2025-04-23T11:04 (+2)

This is a linkpost to https://aisafety.info/questions/NM37/1:-AI-is-advancing-fast

This is an article in the new intro to AI safety series from AISafety.info, which writes introductory AI safety content. We'd appreciate any feedback.

The most up-to-date version of this article is on our website, along with 300+ other articles on AI existential safety.

AI has existed since the 1950s, but in the 2010s and especially the 2020s, “deep learning” systems have made great strides in capability. The most prominent deep learning systems are “large language models” (LLMs), trained to complete human-written text, which turn out to be able to do a surprisingly wide range of tasks.

This progress has been largely driven by rapid scaling of the resources going into it. A few different factors are growing exponentially:[1]

- The money spent on the largest training runs
- The amount of computing power purchasable per dollar
- The efficiency of the algorithms (software improvements)

The total computing power applied to AI increases exponentially based on the first two points, and once you include software improvement, what you might call the “effective computing power” increases at an even faster exponential rate.

This growth compounds quickly. The computing power used to train the largest models in 2024 was billions of times greater than in 2010 — and that’s before taking into account better software.
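As a rough, illustrative sketch of how this compounding works (the ~4.6x-per-year growth rate below is an assumption in the ballpark of Epoch AI's estimates for frontier training compute, not an exact figure):

```python
# Illustrative only: compound an assumed ~4.6x/year growth in frontier
# training compute from 2010 to 2024. The growth rate is a placeholder,
# not an exact Epoch AI figure.
growth_per_year = 4.6   # assumed yearly multiplier for training compute
years = 2024 - 2010     # 14 years

total_growth = growth_per_year ** years
print(f"Compute growth factor over {years} years: {total_growth:.1e}")
# Roughly 2e9, i.e. billions of times more compute, before counting
# software/algorithmic improvements.
```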

As a result of this scaling, AI has become smarter. Various AI systems can now:

Scaling has also made some AI systems more general. A single language model like GPT-4, which powers ChatGPT, can:

To get a sense of the capabilities of current language models, you can try them for yourself — you may notice that current freely-available systems are much better at reliably answering complex questions than anything that existed two or three years ago.

You can also see AI progress on quantitative benchmarks. AI has rapidly improved on measures of its ability to:

The “horizon length” of AI has also been doubling roughly every seven months.[6] That is, newer AI systems can stay coherent through longer coding tasks, succeeding at new types of tasks that take humans longer and longer to complete.
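To get an intuition for what a seven-month doubling time implies, here is a minimal sketch (the one-hour starting horizon is a made-up placeholder, not a measured value):

```python
# Minimal sketch: how a task "horizon length" grows if it doubles every
# 7 months. The 1-hour starting value is a placeholder, not a measurement.
doubling_time_months = 7
start_horizon_hours = 1.0

for months in (0, 7, 14, 21, 28):
    horizon = start_horizon_hours * 2 ** (months / doubling_time_months)
    print(f"After {months:2d} months: ~{horizon:.0f} hour(s)")
# Prints 1, 2, 4, 8, and 16 hours respectively.
```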

In fact, researchers are struggling to keep up and create new benchmarks. One benchmark, called “Humanity’s Last Exam”[7], is a collection of problems from various fields that are hard even for experts. As of April 2025, AI progress on solving it has been only modest — although we haven’t checked in the past hour.

If progress is so fast, it’s natural to ask: will such trends take AI all the way to the human level?

  1. ^

    These numbers are from Epoch AI (retrieved April 2025, but will probably change in the future).

  6. ^
  7. ^

Yarrow @ 2025-04-23T23:07 (+3)

I don’t have to tell you that scaling inputs like money, compute, labour, and so on isn’t the same as scaling outputs like capabilities or intelligence. So, evidence that inputs have been increasing a lot is not evidence that outputs have been increasing a lot. We should avoid conflating these two things.

I’m actually not convinced AI can drive a car today in any sense that was not also true 5 or 10 years ago. I have followed the self-driving car industry closely; internally, companies have a lot of metrics about safety and performance, but these are closely held and rarely disclosed to the public.

We also have no idea how much human labour is required in operating autonomous vehicle prototypes, e.g., how often a human has to intervene remotely.

Self-driving car companies are extremely secretive about the information that is most interesting for judging technological progress, and at the same time they have strong, aggressive PR and marketing. So, I’m skeptical. Especially since there is a history of companies like Cruise making aggressive, optimistic pronouncements and then abruptly shutting down.

Elon Musk has said full autonomy is one year away every year since 2015. That’s an extreme case, but others in the self-driving car industry have also set timelines and then blown past them.

Algon @ 2025-05-02T15:50 (+1)

Hi!

Thanks for all the comments! Sorry for taking so long to reply.

1) Yes, scaling inputs isn't the same thing as scaling capabilities. That's why the empirical success of the scaling hypothesis is/was so surprising. 

2) Thanks for the point about self-driving cars; we removed that claim from the article on our site. I'm going to investigate this some more before reinstating it. We'll update the article on the EA Forum to reflect the current on-site article. (Though we won't keep doing that.)


A friend independently made a related point to me: he noted that a self-driving ML engineer claimed the reason self-driving taxis are limited to specific cities is that the AI relies on 3D-mesh maps for navigation. I'm kind of confused about why this would be a big limitation, as I thought things like NeRFs made it feasible to quickly generate a high-res voxel map of static shapes, and that most of the difficulty in self-driving was in dealing with edge cases, e.g. a baby deer jumping into the road, rather than in being aware of the static shapes around you. But that's my confusion, and I'd rather not stick my neck out on a claim where I'm not an expert and which experts do contest.

3) Mostly, I was basing these claims off Waymo: their stats, plus anecdotal reports from folks in SF about how well their cars perform. I looked into Waymo's statistics for average miles without human interruption, and at first blush they looked quite impressive (17k miles!). But one niggling worry I had was that they said human feedback wasn't always recorded as an interruption.