Selection Bias in Observational Estimates of Algorithmic Progress

By Parker_Whitfill @ 2025-08-18T01:48

This is a linkpost to http://arxiv.org/abs/2508.11033

This is a short research note I wrote about the difficulty of estimating algorithmic progress from observational data. For expositional purposes, I focus on Epoch's paper on algorithmic progress in language models, although the general lesson applies broadly.

I show that estimates of algorithmic progress can be contaminated by selection bias if, within the same year, compute levels are correlated with algorithm quality. If better algorithms tend to be paired with greater compute use, current estimates of algorithmic progress are too low; if better algorithms tend to be paired with lower compute use, current estimates are too high. This bias can be quantitatively large: in a basic simulation, algorithmic progress is under- or overstated by up to 4x depending on the degree of selection. Speculatively applying experimental results gives weak evidence that Epoch's estimate of algorithmic progress may be overstated by 9x.
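To make the mechanism concrete, here is a minimal Python sketch of one way this selection bias can arise. It is not the simulation from the note, and it is a simplified stand-in for a full scaling-law fit: the data-generating process, the function names (`ols`, `simulate`) and the parameters (`g`, `mu`, `delta`, `beta`) and their values are all hypothetical assumptions. Log compute grows by `mu` per year, algorithm quality grows by `g` per year, and `delta` sets how strongly a model's within-year compute deviation predicts its algorithm quality. Performance depends on effective compute (log compute plus algorithm quality), and algorithmic progress is read off as the ratio of the year coefficient to the log-compute coefficient.

```python
import random


def ols(X, y):
    """Ordinary least squares: solve the normal equations (X'X) b = X'y
    with Gauss-Jordan elimination and partial pivoting (pure Python)."""
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = []
    for i in range(k):
        row_i = [sum(x[i] * x[j] for x in X) for j in range(k)]
        row_i.append(sum(x[i] * yi for x, yi in zip(X, y)))
        A.append(row_i)
    for col in range(k):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        # Zero out this column in every other row.
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def simulate(delta, g=0.5, mu=1.0, beta=1.0, years=10,
             models_per_year=500, seed=0):
    """Estimated algorithmic progress (log-compute equivalents per year).

    Hypothetical data-generating process (an assumption, not the note's):
      log_c = mu * t + v               compute grows mu per year
      a     = g * t + delta * v + e    algorithm quality; delta sets the
                                       within-year compute/quality link
      y     = beta * (log_c + a) + noise
    Algorithm quality `a` is unobserved, so y is regressed on
    [1, log_c, t] only. Requires delta > -1.
    """
    rng = random.Random(seed)
    X, y = [], []
    for t in range(years):
        for _ in range(models_per_year):
            v = rng.gauss(0, 1)                  # within-year compute deviation
            log_c = mu * t + v
            a = g * t + delta * v + rng.gauss(0, 1)
            X.append([1.0, log_c, float(t)])
            y.append(beta * (log_c + a) + rng.gauss(0, 0.1))
    _, b_compute, b_year = ols(X, y)
    # Year effect expressed in units of the compute effect.
    return b_year / b_compute


if __name__ == "__main__":
    for d in (-0.3, 0.0, 0.3):
        print(f"delta={d:+.1f}  estimated progress={simulate(d):.3f}  (true g=0.5)")
```

In this toy model the estimate converges to (g - delta*mu) / (1 + delta) for delta > -1. The compute coefficient absorbs the part of algorithm quality that travels with compute, and the year coefficient loses the share of progress attributed to compute growth, so the estimate falls below `g` when `delta > 0` and rises above it when `delta < 0`, matching the direction of the bias described above.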