AI predictions for 2026

By Ajeya @ 2026-01-20T10:35 (+25)

But first, scoring my predictions for 2025


Note: This post was crossposted from Planned Obsolescence by the Forum team, with the author's permission. The author may not see or respond to comments on this post.


On December 13th, 2024, I registered predictions about what would happen with AI by the end of 2025, using this survey run by Sage. They asked five questions about benchmarks, four about the OpenAI Preparedness Framework risk categories, one about revenues, and one about public salience of AI.

The Sage team has updated the survey with the resolutions, so I went through and looked back at what I said. Overall, I was somewhat too bullish about benchmark scores and much too bearish about AI revenue — the reverse of the conventional wisdom that people underestimate benchmark progress and overestimate real-world utility (something I felt like I'd done in previous years, though I didn't register clear predictions, so it's hard to say).

You can see how I did on benchmark scores in the table below.1 I overestimated progress on pretty much every benchmark other than FrontierMath Tiers 1-3,2 which notoriously jumped from ~2% to ~24% with the announcement of OpenAI’s o3, about a week after I registered my prediction (and less than two months after the benchmark was published).

[Table: benchmark score predictions vs. resolutions]

* Shortly after my prediction, OpenAI claimed o3 got 72%.
** Shortly after my prediction, OpenAI claimed o3 got 24%.
*** Anthropic only reimplemented 39 of the 40 tasks; Opus 4.5 got 32/39 right.

All the other questions are in the table below. I did fine on the OpenAI Preparedness Framework predictions and was somewhat too bullish on public salience. But I completely bombed the annualized revenue question, predicting that it would merely triple from ~$4.5B to ~$17B instead of nearly 7xing (!) to ~$30.5B.

[Table: other predictions and resolutions]

I’m not totally sure what’s up here,3 since I’ve heard from other sources that AI revenue has been roughly 3-4xing year on year. My best guess is that the Sage resolution team pulled lowball reports for the 2024 baseline revenue; one friend who looked into it a bit said that end-of-2024 revenue was ~$6.4B rather than the ~$4.5B Sage reported. If true, this would make my prediction only somewhat too bearish.
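The baseline sensitivity is easy to check with the approximate figures above (the numbers are the post's; the arithmetic below is just an illustration):

```python
# Approximate annualized-revenue figures from the post, in $B
baseline_sage = 4.5   # Sage's end-of-2024 baseline
baseline_alt = 6.4    # alternative end-of-2024 estimate from a friend
predicted = 17.0      # my end-of-2025 prediction
actual = 30.5         # Sage's end-of-2025 resolution

# Implied growth multiples under each baseline
print(actual / baseline_sage)     # ~6.8x against Sage's baseline
print(actual / baseline_alt)      # ~4.8x against the alternative baseline
print(predicted / baseline_sage)  # ~3.8x, the multiple I predicted
```

Against the higher baseline, the gap between a predicted ~3.8x and a realized ~4.8x is much less embarrassing than 3.8x vs. 6.8x.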

Predictions for 2026


All these forecasts are evaluated as of 11:59 pm Pacific time on Dec 31 2026. I’ll start by giving my medians for some measurable quantities with pre-existing trends:

Next, some vibes-y capability predictions. Here are a few tasks that I’m pretty sure AIs can’t do today but probably will be able to do as of Dec 31 2026.4 I’m about 80% on these, so I’m expecting to get four of these five predictions right. Note that some of these will be hard to directly test given the high specified inference budgets, in which case I’ll go with the best judgment of my friends who really understand current AI capabilities.

And here are tasks that I’m pretty sure AIs can’t do today and probably still won’t be able to do as of Dec 31 2026 (again, I expect to get around 4 out of 5 correct).

And lastly, some more extreme milestones I think are less likely still:

Since some people act as if so-called “doomers” have to be committed to the prediction that recursively self-improving superintelligence will definitely kill us all in the next five minutes (and if it doesn’t then the whole concern is fake), I figured I would make it clear for the record that I think most likely nothing too crazy is going to happen by the end of 2026. And probably I’ll even make it another year without the AI-created otome game I’ve been pining for since the days of GPT-3.

But we genuinely might see some truly insane outcomes. And we are utterly unprepared for them. That’s to say nothing of the chance of insane outcomes in 2028 or 2030 or 2032. There’s a lot to do, and not much time.

1

Note that while all the other benchmarks are measured as an accuracy from 0% to 100%, METR's RE-Bench is a set of 7 open-ended ML engineering problems of the form "make this starter code better to improve its runtime / loss / win-rate." Scores are normalized such that the provided starting code scores 0 and a strong reference solution scores 1, meaning you can get a score greater than 1 by improving on the reference solution (and indeed current SOTA AIs do).
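A minimal sketch of that normalization, assuming a simple linear rescaling between the starter and reference scores (my reading of the description above, not METR's actual code; all numbers are made up for illustration):

```python
def normalize(raw, starter, reference):
    """Linearly rescale so the starter code maps to 0 and the reference solution to 1."""
    return (raw - starter) / (reference - starter)

# Hypothetical raw scores on a single task:
print(normalize(5.0, 5.0, 8.0))  # 0.0 -> no improvement over the starter code
print(normalize(8.0, 5.0, 8.0))  # 1.0 -> matches the reference solution
print(normalize(9.5, 5.0, 8.0))  # 1.5 -> beats the reference, hence a score above 1
```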

2

When the benchmark was released it was just called FrontierMath, and that’s how it’s referred to in the Sage survey. The original benchmark had three tiers of difficulty, and Epoch introduced a final harder tier in the summer of 2025.

3

At first I thought I got it so wrong because I misread the question as asking about total annual revenue rather than annualized December revenue (i.e., the monthly revenue in December times 12, which will naturally be a lot higher). I did in fact have that misconception, and it happened that total 2025 revenue was very close to my prediction of $17B. But then I realized that I must have just done the forecast by multiplying the 2024 baseline value (provided by Sage) by whatever multiple I thought was appropriate, and that number was the appropriate apples-to-apples comparison.
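To see why annualized December revenue runs well above total annual revenue when revenue is growing, here is a toy example with made-up numbers (steady 10%/month growth, not any company's actual figures):

```python
# Illustrative monthly revenues, $B, for Jan..Dec with 10% month-on-month growth
monthly = [1.0 * 1.10**i for i in range(12)]

total_year = sum(monthly)          # total annual revenue: sum of all 12 months
annualized_dec = monthly[-1] * 12  # annualized December revenue: Dec run rate x 12

print(round(total_year, 1))        # ~21.4
print(round(annualized_dec, 1))    # ~34.2, much higher than the annual total
```

With any positive growth, December is the biggest month, so twelve copies of it exceed the sum of the actual twelve months.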

4

To make these predictions precise, you’d need to specify the maximum inference budget you allow the AI to use in its attempt. I don’t think specifying an exact budget generally matters too much for these predictions, because often either an AI won’t be able to do a task for any realistic budget, or else if it can do the task it’ll be able to do it very cheaply compared to humans. If there’s ambiguity, I’m imagining the AI getting to use an inference budget comparable to the amount you’d have to pay a human to do the same task.


SummaryBot @ 2026-01-20T11:12 (+2)

Executive summary: The author reviews their 2025 AI predictions, noting mixed calibration, and then offers quantified and qualitative forecasts for 2026 that predict substantial capability and revenue growth, modest increases in public salience, and a small but non-negligible chance of extreme outcomes like automated AI R&D or loss of control.

Key points:

  1. The author reports being somewhat too bullish on benchmark progress and much too bearish on AI revenue in their 2025 predictions, notably underestimating annualized December revenue growth.
  2. For 2026, they predict continued benchmark advances, including a longest-reported METR 50% time horizon of 24 hours and an Epoch ECI score of 169, implying high but slowing progress on established benchmarks.
  3. They forecast combined annualized December revenue of $110 billion for OpenAI, Anthropic, and xAI, alongside modest increases in AI’s public salience and roughly stable net favorability.
  4. The author lists several concrete tasks they expect AIs will likely be able to do by the end of 2026, including competent video game play, complex personal logistics, short-form video generation, visual novel creation, and solving the hardest problem in the 2026 IMO.
  5. They also identify tasks they expect AIs will still not be able to do by the end of 2026, such as top-player-level gameplay in complex new games, organizing large weddings, producing festival-quality long films, generating long high-quality games, or publishing top-tier theoretical research from scratch.
  6. Finally, the author assigns low but non-zero probabilities to extreme milestones by 2026, including near-full automation of AI R&D (10%), top-human-expert-dominating AI (5%), self-sufficient AI (2.5%), and unrecoverable loss of control (0.5%), while emphasizing that humanity remains unprepared even if these outcomes are unlikely.


This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.