Scaling of AI training runs will slow down after GPT-5

By Maxime Riché @ 2024-04-26T16:06 (+10)

My credence: 33% confidence in the claim that the growth in the number of GPUs used for training SOTA AI will slow down significantly directly after GPT-5. It is not higher because (1) decentralized training may be possible, (2) GPT-5 may be able to increase hardware efficiency significantly, (3) GPT-5 may be smaller than assumed in this post, and (4) race dynamics.

TLDR: Because energy access is a bottleneck for data centers, and continuing to scale would require building data centers that are OOMs larger than any existing today.

Update: See Vladimir_Nesov's comment or Erich_Grunewald's comment for why this claim is likely wrong, since decentralized training seems to be solved. 

The reasoning behind the claim:

Unrelated to the claim:

How big is that effect going to be?

Using values from: https://epochai.org/blog/the-longest-training-run, we have estimates that in a year, the effective compute is increased by:

Let's assume GPT-5 uses 10 times more GPUs than GPT-4 for training. With GPT-4 estimated at around 25k GPUs, that would mean roughly 250k GPUs, requiring around 250 MW of power. This is already larger than the largest data center reported in this article... Then, moving to GPT-6 with 2.5M GPUs would require 2.5 GW.
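A quick sanity check of these power figures (a minimal sketch; the ~1 kW per GPU, covering the accelerator plus cooling and networking overhead, is an assumption consistent with the 25k GPUs ≈ 25 MW figure below, not a measured value):

```python
# Back-of-the-envelope power estimate for a training cluster.
# Assumption: ~1 kW per GPU, including cooling and networking overhead.
KW_PER_GPU = 1.0

def cluster_power_mw(num_gpus: int, kw_per_gpu: float = KW_PER_GPU) -> float:
    """Total power draw in MW for a cluster of num_gpus GPUs."""
    return num_gpus * kw_per_gpu / 1_000

print(cluster_power_mw(25_000))     # GPT-4 scale: ~25 MW
print(cluster_power_mw(250_000))    # assumed GPT-5 scale: ~250 MW
print(cluster_power_mw(2_500_000))  # assumed GPT-6 scale: ~2,500 MW = 2.5 GW
```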

Building the infrastructure for GPT-6 may require a few years (e.g., using existing power plants and building a 2.5M GPU data center). For reference, OpenAI and Microsoft seem to have a $100B data center project running until 2028 (4 years); that's worth around 3M B200 GPUs (at $30k per unit).
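The GPU count implied by that budget is a simple division (a sketch; it assumes the full $100B goes to GPUs at ~$30k per unit, ignoring spending on buildings, power, and networking):

```python
budget_usd = 100e9          # reported OpenAI/Microsoft data center project
price_per_b200_usd = 30e3   # assumed unit price for a B200 GPU

print(budget_usd / price_per_b200_usd)  # ≈ 3.3 million GPUs
```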

Building the infrastructure for GPT-7 may require even more time (e.g., building 25 power plant units).

If the infrastructure for GPT-6 takes 4 years to assemble, then the growth in GPU count is limited to 1 OOM over 4 years (~x1.8/year).
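The annualized rate is just the fourth root of one order of magnitude:

```python
print(10 ** (1 / 4))  # ≈ 1.78, i.e. roughly x1.8/year growth in GPU count
```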

The total growth rate between GPT-4 and GPT-5 is x22/year or x6.2/year when using investment growth values from before ChatGPT.

Taking into account this slower growth of investment in training runs, the total growth rate between GPT-5 and GPT-6 would then be ~x4/year, i.e., the growth rate would be divided by 5.5 (or by 1.55 when using values from before ChatGPT).
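The slowdown factors follow directly from the growth rates above (a sketch using the post's own estimates):

```python
growth_gpt4_to_gpt5 = 22.0       # x/year, with post-ChatGPT investment growth
growth_gpt4_to_gpt5_pre = 6.2    # x/year, with pre-ChatGPT investment growth
growth_gpt5_to_gpt6 = 4.0        # x/year, limited by data center build time

print(growth_gpt4_to_gpt5 / growth_gpt5_to_gpt6)      # ≈ 5.5
print(growth_gpt4_to_gpt5_pre / growth_gpt5_to_gpt6)  # ≈ 1.55
```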

These estimates assume no efficient decentralized training.

Impact of GPT-5

One could assume that the growth rates of software and hardware efficiency will roughly double (increase by ~100%) because of the productivity gains from GPT-5 (relative to before ChatGPT).

In that case, the growth rate of effective compute after GPT-5 would be significantly above the growth rate before ChatGPT (~ x8.8/year vs. ~ x6/year before ChatGPT).
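For intuition on what that x8.8/year figure implies: combined with the ~x1.8/year cap on GPU growth derived above, it corresponds to hardware-plus-software efficiency improving at roughly x4.9/year (a sketch; treating effective compute growth as the product of GPU-count growth and efficiency growth is an assumption about how these numbers combine):

```python
gpu_growth = 1.8                 # x/year, limited by data center build time (from above)
effective_compute_growth = 8.8   # x/year, post-GPT-5 estimate in this post

implied_efficiency_growth = effective_compute_growth / gpu_growth
print(implied_efficiency_growth)  # ≈ 4.9x/year from hardware + software efficiency combined
```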


Erich_Grunewald @ 2024-04-26T16:28 (+10)

A 10-fold increase in the number of GPUs above GPT-5 would require a 1 to 2.5 GW data center, which doesn’t exist and would take years to build, OR would require decentralized training using several data centers. Thus GPT-5 is expected to mark a significant slowdown in scaling runs.

Why do you think decentralized training using several data centers won't prevent a significant slowdown in scaling runs? Gemini was already trained across multiple data centers.

SummaryBot @ 2024-04-26T16:37 (+1)

Executive summary: The scaling of AI training runs is expected to slow down significantly after GPT-5 due to the unsustainable power consumption required to continue scaling at the current rate, which would necessitate the equivalent of multiple nuclear power plants.

Key points:

  1. Current large data centers consume around 100 MW of power, limiting the number of GPUs that can be supported.
  2. GPT-4 used an estimated 15k to 25k GPUs, requiring 15 to 25 MW of power.
  3. A 10-fold increase in GPUs above GPT-5 would require a 1 to 2.5 GW data center, which doesn't exist and would take years to build.
  4. After GPT-5, the focus will shift to improving software efficiency, scaling at inference time, and decentralized training using multiple data centers.
  5. Scaling GPUs will be slowed down by regulations on lands, energy production, and build time, potentially leading to the construction of training data centers in low-regulation countries.
  6. The total growth rate of effective compute is expected to decrease significantly after GPT-5, from ~x22/year (or x6.2/year using pre-ChatGPT investment growth values) to ~x4/year, assuming no efficient decentralized training is developed.

 

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.