Dwarkesh Patel's thoughts on AI progress (Dec 2025)
By Vasco Grilo🔸 @ 2026-02-01T09:28 (+27)
This is a linkpost to https://www.dwarkesh.com/p/thoughts-on-ai-progress-dec-2025
[Subtitle.] Why I'm moderately bearish in the short term, and explosively bullish in the long term
This is a crosspost for Thoughts on AI progress (Dec 2025) by Dwarkesh Patel, which was originally published on Dwarkesh Podcast on 2 December 2025.
What are we scaling?
I'm confused why some people have short timelines and at the same time are bullish on the current scale-up of reinforcement learning atop LLMs. If we're actually close to a human-like learner, this whole approach of training on verifiable outcomes is doomed.
Currently the labs are trying to bake a bunch of skills into these models through "mid-training" - there's an entire supply chain of companies building RL environments which teach the model how to navigate a web browser or use Excel to write financial models. [Relatedly, see Epoch AI's post An FAQ on Reinforcement Learning Environments.]
Either these models will soon learn on the job in a self-directed way - making all this pre-baking pointless - or they won't - which means AGI is not imminent. Humans don't have to go through a special training phase where they need to rehearse every single piece of software they might ever need to use.
Beren Millidge made interesting points about this in a recent blog post:
When we see frontier models improving at various benchmarks we should think not just of increased scale and clever ML research ideas but billions of dollars spent paying PhDs, MDs, and other experts to write questions and provide example answers and reasoning targeting these precise capabilities ... In a way, this is like a large-scale reprise of the expert systems era, where instead of paying experts to directly program their thinking as code, they provide numerous examples of their reasoning and process formalized and tracked, and then we distill this into models through behavioural cloning. This has updated me slightly towards longer AI timelines since the fact that we need such effort to design extremely high quality human trajectories and environments for frontier systems implies that they still lack the critical core of learning that an actual AGI must possess.
You can see this tension most vividly in robotics. In some fundamental sense, robotics is an algorithms problem, not a hardware or data problem - with very little training, humans can learn how to teleoperate current hardware to do useful work. So if we had a human-like learner, robotics would (in large part) be solved. But the fact that we don't have such a learner makes it necessary to go out into a thousand different homes to learn how to pick up dishes or fold laundry.
One counterargument I've heard from the takeoff-within-5-years crew is that we have to do this kludgy RL in service of building a superhuman AI researcher, and then the million copies of automated Ilya can go figure out how to solve robust and efficient learning from experience.
This gives the vibes of that old joke, "We're losing money on every sale, but we'll make it up in volume." Somehow this automated researcher is going to figure out the algorithm for AGI - a problem humans have been banging their head against for the better part of a century - while not having the basic learning capabilities that children have? I find this super implausible.
Besides, even if you think the RLVR scaleup will soon help us automate AI research, the labs' actions suggest otherwise. You don't need to pre-bake the consultant's skills at crafting PowerPoint slides in order to automate Ilya. So clearly the labs' actions hint at a worldview where these models will continue to fare poorly at generalizing and on-the-job learning, thus making it necessary to build in the skills that they hope will be economically valuable.
Another counterargument you could make is that even if the model could learn these skills on the job, it is just so much more efficient to build them up just once during training rather than again and again for each user or company. And look, it makes a lot of sense to just bake in fluency with common tools like browsers and terminals. Indeed one of the key advantages that AGIs will have is this greater capacity to share knowledge across copies. But people are underrating how many company- and context-specific skills are required to do most jobs. And there just isn't currently a robust, efficient way for AIs to pick up those skills.
Human labor is valuable precisely because it's not schleppy to train
I was at a dinner with an AI researcher and a biologist. The biologist said she had long timelines. We asked what she thought AI would struggle with. She said her work has recently involved looking at slides and deciding if a dot is actually a macrophage or just looks like one. The AI researcher said, "Image classification is a textbook deep learning problem - we could easily train for that."
I thought this was a very interesting exchange, because it revealed a key crux between me and the people who expect transformative economic impacts in the next few years. Human workers are valuable precisely because we don't need to build schleppy training loops for every small part of their job. It's not net-productive to build a custom training pipeline to identify what macrophages look like given the way this particular lab prepares slides, then another for the next lab-specific micro-task, and so on. What you actually need is an AI that can learn from semantic feedback or from self-directed experience, and then generalize, the way a human does.
Every day, you have to do a hundred things that require judgment, situational awareness, and skills & context learned on the job. These tasks differ not just across different people, but from one day to the next even for the same person. It is not possible to automate even a single job by just baking in some predefined set of skills, let alone all the jobs.
In fact, I think people are really underestimating how big a deal actual AGI will be because they're just imagining more of this current regime. They're not thinking about billions of human-like intelligences on a server which can copy and merge all their learnings. And to be clear, I expect this (aka actual AGI) in the next decade or two. That's fucking crazy!
Economic diffusion lag is cope for missing capabilities
Sometimes people will say that the reason that AIs aren't more widely deployed across firms and already providing lots of value (outside of coding) is that technology takes a long time to diffuse. I think this is cope. People are using this cope to gloss over the fact that these models just lack the capabilities necessary for broad economic value.
Steven Byrnes has an excellent post on this and many other points:
New technologies take a long time to integrate into the economy? Well ask yourself: how do highly-skilled, experienced, and entrepreneurial immigrant humans manage to integrate into the economy immediately? Once youâve answered that question, note that AGI will be able to do those things too.
If these models were actually like humans on a server, they'd diffuse incredibly quickly. In fact, they'd be so much easier to integrate and onboard than a normal human employee (they could read your entire Slack and Drive in minutes and immediately distill all the skills your other AI employees have). Plus, hiring is very much like a lemons market, where it's hard to tell who the good people are, and hiring someone bad is quite costly. This is a dynamic you wouldn't have to worry about when you just wanna spin up another instance of a vetted AGI model.
For these reasons, I expect it's going to be much, much easier to diffuse AI labor into firms than it is to hire a person. And companies hire lots of people all the time. If the capabilities were actually at AGI level, people would be willing to spend trillions of dollars a year buying tokens (knowledge workers cumulatively earn tens of trillions of dollars of wages a year). The reason that lab revenues are 4 orders of magnitude off right now is that the models are nowhere near as capable as human knowledge workers.
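As a rough back-of-the-envelope sketch of that gap (both figures below are illustrative assumptions, not reported revenue numbers):

```python
import math

# Illustrative assumptions, not reported figures.
knowledge_worker_wages_per_year = 30e12   # assumed: tens of trillions of dollars in aggregate wages
current_lab_token_revenue_per_year = 3e9  # assumed placeholder for current token revenue

# Orders of magnitude separating willingness-to-pay for AGI-level labor
# from what labs currently earn from selling tokens.
gap = math.log10(knowledge_worker_wages_per_year / current_lab_token_revenue_per_year)
print(f"Gap: roughly {gap:.0f} orders of magnitude")  # ~4 with these assumed numbers
```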
Goal post shifting is justified
AI bulls will often criticize AI bears for repeatedly moving the goal posts. This is often fair. AI has made a ton of progress in the last decade, and it's easy to forget that.
But some amount of goal post shifting is justified. If you showed me Gemini 3 in 2020, I would have been certain that it could automate half of knowledge work. We keep solving what we thought were the sufficient bottlenecks to AGI (general understanding, few-shot learning, reasoning), and yet we still don't have AGI (defined as, say, being able to completely automate 95% of knowledge work jobs). What is the rational response?
It's totally reasonable to look at this and say, "Oh actually there's more to intelligence and labor than I previously realized. And while we're really close to (and in many ways have surpassed) what I would have defined as AGI in the past, the fact that model companies are not making trillions in revenue clearly reveals that my previous definition of AGI was too narrow."
I expect this to keep happening into the future. I expect that by 2030 the labs will have made significant progress on my hobby horse of continual learning, and the models will start earning hundreds of billions in revenue, but they won't have automated all knowledge work, and I'll be like, "We've made a lot of progress, but we're not at AGI yet. We also need X, Y, and Z to get to trillions in revenue."
Models keep getting more impressive at the rate the short timelines people predict, but more useful at the rate the long timelines people predict.
RL scaling is laundering the prestige of pretraining scaling
With pretraining, we had this extremely clean and general trend of improvement in loss across multiple orders of magnitude of compute (albeit on a power law, which is as weak as exponential growth is strong). People are trying to launder the prestige of pretraining scaling, which was almost as predictable as a physical law of the universe, to justify bullish projections about RLVR, for which we have no well-fit, publicly known trend. When intrepid researchers do try to piece together the implications from scarce public datapoints, they get quite bearish results. For example, Toby Ord has a great post where he cleverly connects the dots between different o-series benchmark charts, which suggested "we need something like a 1,000,000x scale-up of total RL compute to give a boost similar to a GPT level".
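To see how a number like that can fall out of a handful of datapoints, here is a toy extrapolation with made-up numbers (a sketch of the general reasoning, not Ord's actual fit): assume each 10x of RL compute buys a roughly fixed benchmark gain, and ask how many such steps a GPT-level-sized boost would take.

```python
# Toy extrapolation with made-up numbers (not Toby Ord's actual analysis).
gain_per_10x_rl_compute = 2.0  # assumed: benchmark points gained per 10x of RL compute
gpt_level_boost = 12.0         # assumed: benchmark points separating two "GPT levels"

ten_x_steps = gpt_level_boost / gain_per_10x_rl_compute  # successive 10x scale-ups needed
compute_multiplier = 10 ** ten_x_steps

print(f"{ten_x_steps:.0f} successive 10x steps -> ~{compute_multiplier:,.0f}x total RL compute")
# With these assumptions: 6 steps, i.e. ~1,000,000x - the same ballpark as Ord's estimate.
```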
Comparison to human distribution will make us at first overestimate (and then underestimate) AI
There is huge variance in the amount of value that different humans can add, especially in white-collar work with its O-ring dynamics. The village idiot adds ~0 value to knowledge work, while top AI researchers are worth billions of dollars to Mark Zuckerberg.
AI models at any given snapshot of time, however, are roughly equally capable. Humans have all this variance, whereas AI models don't. Because a disproportionate share of value-add in knowledge work comes from the top-percentile humans, if we try to compare the intelligence of these AI models to the median human, then we will systematically overestimate the value they can generate. But by the same token, when models finally do match top human performance, their impact might be quite explosive.
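A toy simulation of that asymmetry, using an assumed lognormal value-add distribution rather than real labor data: if value-add is heavy-tailed, a worker at the median accounts for a small slice of total value, while the top percentile accounts for a large one.

```python
# Toy illustration with an assumed lognormal value-add distribution (not real labor data).
import numpy as np

rng = np.random.default_rng(0)
value_add = rng.lognormal(mean=0.0, sigma=1.5, size=1_000_000)  # hypothetical heavy-tailed value-add

median = np.median(value_add)
bottom_half_share = value_add[value_add <= median].sum() / value_add.sum()
top_1pct_share = value_add[value_add >= np.quantile(value_add, 0.99)].sum() / value_add.sum()

print(f"Share of total value from the bottom half of workers: {bottom_half_share:.0%}")
print(f"Share of total value from the top 1% of workers:      {top_1pct_share:.0%}")
# With this assumed distribution, the bottom half contributes under 10% of total
# value and the top 1% contributes about a fifth - so "matches the median human"
# implies far less economic value than "matches the best humans".
```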
Broadly deployed intelligence explosion
People have spent a lot of time talking about a software-only singularity (where AI models write the code for a smarter successor system), a software + hardware singularity (where AIs also improve their successor's computing hardware), or variations thereof.
All these scenarios neglect what I think will be the main driver of further improvements atop AGI: continual learning. Again, think about how humans become more capable at anything. It's mostly from experience in the relevant domain.
In conversation, Beren Millidge made the interesting suggestion that the future might look like continual learning agents going out, doing jobs and generating value, and then bringing all their learnings back to the hive-mind model, which does some kind of batch distillation on all these agents. The agents themselves could be quite specialized - containing what Karpathy called "the cognitive core" plus knowledge and skills relevant to the job they're being deployed to do.
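A highly schematic sketch of that pattern (hypothetical classes and method names, not a real system or API): specialized agents gather experience on the job, and a central model periodically folds their learnings back in.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """A deployed, specialized instance: cognitive core + job-specific skills."""
    job: str
    shared_knowledge: list                             # distilled learnings it starts from
    trajectories: list = field(default_factory=list)   # experience gathered on the job

    def work(self, task: str) -> None:
        # Stand-in for the agent acting and learning continually on the job.
        self.trajectories.append(f"[{self.job}] handled: {task}")

@dataclass
class HiveMindModel:
    """Central model that periodically absorbs what all deployed agents learned."""
    distilled_knowledge: list = field(default_factory=list)

    def spawn(self, job: str) -> Agent:
        # New agents start from the cognitive core plus everything distilled so far.
        return Agent(job=job, shared_knowledge=list(self.distilled_knowledge))

    def batch_distill(self, agents: list) -> None:
        # Stand-in for batch distillation: fold agents' experience back into the model.
        for agent in agents:
            self.distilled_knowledge.extend(agent.trajectories)
            agent.trajectories.clear()

hive = HiveMindModel()
agents = [hive.spawn("consultant"), hive.spawn("radiology assistant")]
for agent in agents:
    agent.work("a task specific to its deployment")
hive.batch_distill(agents)
print(len(hive.distilled_knowledge), "learnings folded back into the central model")
```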
"Solving" continual learning won't be a singular one-and-done achievement. Instead, it will feel like solving in-context learning. GPT-3 demonstrated that in-context learning could be very powerful (its ICL capabilities were so remarkable that the title of the GPT-3 paper is "Language Models are Few-Shot Learners"). But of course, we didn't "solve" in-context learning when GPT-3 came out - and indeed there's plenty of progress still to be made, from comprehension to context length. I expect a similar progression with continual learning. Labs will probably release something next year which they call continual learning, and which will in fact count as progress towards continual learning. But human-level continual learning may take another 5 to 10 years of further progress.
This is why I don't expect runaway gains accruing to the first model that cracks continual learning, thus getting more and more widely deployed and capable. If you had fully-solved continual learning drop out of nowhere, then sure, it's "game, set, match", as Satya put it. But that's not what's going to happen. Instead, some lab is going to figure out how to get some initial traction on the problem. Playing around with this feature will make it clear how it was implemented, and the other labs will soon replicate this breakthrough and improve it slightly.
There'll also probably be diminishing returns from learning-from-deployment. The first 1,000 consultant agents will each learn a ton from deployment. Less so the next 1,000. And is there such a long tail to consultant work that the millionth deployed instance is likely to see something super important the other 999,999 instances missed? In fact, I wouldn't be surprised if continual learning also ends up leading to a power law, but with respect to the number of instances deployed.
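Here is a toy simulation of that intuition, with the situation distribution, workload, and exponent all assumed rather than measured: if the situations agents encounter are Zipf-distributed, the cumulative number of genuinely new situations grows much more slowly than the number of deployed instances.

```python
# Toy simulation with assumed parameters (not data from any real deployment).
import numpy as np

rng = np.random.default_rng(0)

n_situation_types = 100_000   # assumed universe of distinct work situations
tasks_per_instance = 100      # assumed workload per deployed agent
zipf_exponent = 1.5           # assumed tail heaviness of how often situations recur

# Zipf-like probabilities over the finite set of situation types.
ranks = np.arange(1, n_situation_types + 1, dtype=float)
probs = ranks ** -zipf_exponent
probs /= probs.sum()

seen = set()
deployed = 0
for target in [10, 100, 1_000, 10_000]:
    extra_tasks = (target - deployed) * tasks_per_instance
    seen.update(rng.choice(n_situation_types, size=extra_tasks, p=probs).tolist())
    deployed = target
    print(f"{target:>6} instances deployed -> {len(seen):>6} distinct situations encountered")
# With these assumptions, each 10x in deployed instances yields only ~4-5x as many
# new situations - diminishing returns that look like a power law in instances.
```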
Besides, I just have some prior that competition will stay fierce, informed by the observation that all these previous supposed flywheels (user engagement on chat, synthetic data, etc.) have done very little to diminish the greater and greater competition between model companies. Every month (or less), the big three will rotate around the podium, with other competitors not that far behind. There is some force (potentially talent poaching, rumor mills, or reverse engineering) which has so far neutralized any runaway advantages a single lab might have had.
Noah Birnbaum @ 2026-02-01T15:41 (+9)
Vasco, you are the king of cross-posting good content onto the EA forum.
Vasco Grilo🔸 @ 2026-02-01T15:54 (+3)
Thanks, Noah!