Transformative AI and Compute [Summary]

By lennart @ 2021-09-23T13:53 (+66)

This is the summary of the series Transformative AI and Compute - A holistic approach. You can find the sequence here and the links to the posts below:

0. Executive Summary

This series attempts to:

Introduce a simplified model of computing which serves as a foundational concept (Part 1 - Section 1).
Discuss the role of compute for AI systems (Part 1 - Section 2).
- In Part 1 - Section 2.3 you can find the updated compute plot you have been coming for.
Explore the connection of compute trends and more capable AI systems over time (Part 1 - Section 3).
Discuss the compute component in forecasting efforts on transformative AI timelines (Part 2 - Section 4).
Propose ideas for better compute forecasts (Part 2 - Section 5).
Briefly outline the relevance of compute for AI Governance (Part 3 - Section 6).
Conclude this report and discuss next steps (Section 7).
Provide a list of connected research questions (Appendix A).
Present common compute metrics and discuss their caveats (Appendix B).
Provide a list of startups in the AI hardware domain (Appendix C).

Abstract

Modern progress in AI systems has been driven and enabled mainly by acquiring more computational resources. AI systems rely on computation-intensive training runs — they require massive amounts of compute.

Learning about the compute requirements for training existing AI systems, and about those systems' capabilities, gives us a more nuanced understanding and lets us take appropriate action in the technical and governance domains to enable the safe development of potentially transformative AI systems.

To understand the role of compute, I decided to (a) do a literature review, (b) update existing work with new data, (c) investigate the role of compute for timelines, and lastly, (d) explore concepts to enhance our analysis and forecasting efforts.

In this piece, I present a brief analysis of AI systems’ compute requirements and capabilities, explore compute’s role for transformative AI timelines, and lastly, discuss the compute governance domain.

I find that compute, next to data and algorithmic innovation, is a crucial contributor to the recent performance of AI systems. We identify a doubling time of 6.2 months for the compute requirements of the final training run of state-of-the-art AI systems from 2012 to the present.
Besides more powerful hardware, spending on AI systems and algorithmic innovation are other factors that determine the amount of effective compute available, which is itself a component of forecasting models of transformative AI.

Therefore, as compute is a significant component and driver of AI systems’ capabilities, understanding the developments of the past and forecasting future results is essential. Compared to the other components, the quantifiable nature of compute makes it an exciting aspect for forecasting efforts and the safe development of AI systems.

I consequently recommend additional investigation of the highlighted components of compute, especially AI hardware. As compute forecasting and regulation require an in-depth understanding of hardware, hardware spending, the semiconductor industry, and much more, we recommend an interdisciplinary effort to inform interpretations and forecasts of compute trends. Those insights can then be used to inform policymaking and potentially to regulate access to compute.

Epistemic Status

This article is Exploratory to My Best Guess. I've spent roughly 300 hours researching this piece and writing it up. I am not claiming completeness for any enumerations. Most lists are the result of things I learned on the way and then tried to categorize.

I have a background in Electrical Engineering with an emphasis on Computer Engineering and have done research in the field of ML optimizations for resource-constrained devices — working on the intersection of ML deployments and hardware optimization. I am more confident in my view on hardware engineering than in the macro interpretation of those trends for AI progress and timelines.

This piece was a research trial to test my prioritization, interest, and fit for this topic. Instead of focusing on a single narrow question, this paper and research trial turned out to be broader, hence "a holistic approach". In the future, I plan to focus on a narrower, relevant research question within this domain. Please reach out.

Views and mistakes are solely my own.

Highlights per Section

1. Compute

Computation is the manipulation of information.
There are various types of computation. This piece is concerned with today’s predominant type: digital computers.
By compute, we usually mean a quantity of operations performed by our computer/processor. The term is also used for computational power: a rate of compute operations per unit of time.
Computation can be divided into memory, interconnect, and logic. Each of these components affects the performance of the others.
For an introduction to integrated circuits/chips, I recommend “AI Chips: What They Are and Why They Matter” (Khan 2020).
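To make the distinction between compute (a quantity of operations) and computational power (a rate) concrete, here is a minimal Python sketch. The per-chip peak rate, the cluster size, the run length, and especially the utilization factor are hypothetical assumptions chosen for round numbers, not measurements of any real chip; the sketch simply multiplies a rate by a duration.

```python
SECONDS_PER_DAY = 86_400


def total_compute_flop(
    peak_flop_per_s: float,
    utilization: float,
    n_chips: int,
    days: float,
) -> float:
    """Return compute (a quantity, in FLOP) from computational power (a rate).

    All inputs are hypothetical assumptions for illustration:
    - peak_flop_per_s: peak computational power of one chip (FLOP/s)
    - utilization: assumed fraction of peak actually achieved (0..1)
    - n_chips: number of chips used in parallel
    - days: wall-clock duration of the run
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    # Rate of the whole cluster, in FLOP/s.
    effective_rate = peak_flop_per_s * utilization * n_chips
    # Rate multiplied by time gives a quantity of operations.
    return effective_rate * days * SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical run: 1,000 chips at 1e14 FLOP/s peak, 30% utilization, 30 days.
    compute = total_compute_flop(1e14, 0.3, 1_000, 30)
    print(f"{compute:.3e} FLOP")  # prints 7.776e+22 FLOP
```

Real utilization varies widely with workload and hardware, which is why the sketch keeps it as an explicit, separately stated assumption rather than folding it into the peak rate.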

2. Compute in AI Systems

Compute is required for training AI systems. Training requires significantly more compute than inference (the process of using a trained model).
Next to algorithmic innovation and available training data, available compute for the final training run is one of the significant drivers of more capable AI systems.
OpenAI observed in 2018 that since 2012 the amount of compute used in the largest AI training runs has been doubling every 3.4 months.
In our updated analysis (n=57 ML systems, 1957 to 2021), we observe a doubling time of 6.2 months between 2012 and mid-2021 (n=45).
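A doubling time translates directly into an annual growth factor: 3.4 months corresponds to roughly 11.5x per year, while 6.2 months corresponds to roughly 3.8x per year. One common way to estimate a doubling time from a dataset of training runs is an ordinary least-squares fit of log2(compute) against time. The Python sketch below shows both calculations; it is an illustration under assumptions, not necessarily the exact fitting procedure used in the analysis above, and the usage data are synthetic, generated with a known 6.2-month doubling time so the fit can be checked.

```python
import math
from typing import Sequence


def annual_growth_factor(doubling_time_months: float) -> float:
    """Multiplicative growth per year implied by a doubling time in months."""
    return 2.0 ** (12.0 / doubling_time_months)


def fit_doubling_time_months(years: Sequence[float], flop: Sequence[float]) -> float:
    """Estimate a doubling time via least squares on log2(compute) versus time.

    years: publication dates as fractional years (e.g. 2015.5)
    flop:  training compute of each system in FLOP (must be positive)
    """
    if len(years) != len(flop) or len(years) < 2:
        raise ValueError("need at least two (year, flop) pairs of equal length")
    log_flop = [math.log2(c) for c in flop]
    mean_x = sum(years) / len(years)
    mean_y = sum(log_flop) / len(log_flop)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, log_flop))
    var = sum((x - mean_x) ** 2 for x in years)
    doublings_per_year = cov / var  # slope of the fitted line
    return 12.0 / doublings_per_year


if __name__ == "__main__":
    # Synthetic data only: compute that doubles exactly every 6.2 months from 2012 on.
    years = [2012 + 0.5 * i for i in range(19)]
    flop = [1e17 * 2 ** ((y - 2012) * 12 / 6.2) for y in years]
    print(round(fit_doubling_time_months(years, flop), 3))  # 6.2
    print(annual_growth_factor(6.2), annual_growth_factor(3.4))
```

On real data, the result depends heavily on which systems are included and on the chosen time window, so the fit is only as meaningful as the inclusion criteria behind it.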

3. Compute and AI Alignment

Compute is a component (an input) of AI systems that has driven the increasing capabilities of modern AI systems. There are reasons to believe that progress in computing capabilities, independent of further algorithmic innovation, might be sufficient to lead to transformative AI^[1].
Compute is a fairly coherent and quantifiable feature — probably the easiest input to AI progress to make reasonable quantitative estimates of.
- Measuring the quality of the other inputs (data and algorithmic innovation) is more complex.
- Put simply, compute varies along a single input axis, more or less compute, and we can expect an increase in compute to lead to more capable and potentially unsafe systems.
According to “The Bitter Lesson”, progress arrives via approaches that scale computation through search and learning, not by building knowledge into the systems. Approaches based on human intuition have been outpaced by approaches that leverage more computational resources.
The strong scaling hypothesis states that we only need to scale a specific architecture to achieve transformative or superhuman capabilities, and that this architecture might already be available. Scaling an architecture implies more compute for the training run.
Cotra's report on biological anchors estimates the computational power/effective compute required to train systems whose performance resembles, e.g., that of the human brain. Those estimates can provide compute milestones for transformative capabilities of AI systems.
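To make "scaling an architecture implies more compute" concrete: for dense transformer language models, Kaplan et al. (2020), listed in the references, approximate training compute as roughly 6 FLOP per parameter per training token (C ≈ 6ND), while a forward pass at inference costs roughly 2N FLOP per token. The Python sketch below applies this rule of thumb; it ignores attention and other overheads, so its outputs are order-of-magnitude estimates only. The GPT-3 figures (175 billion parameters, about 300 billion training tokens) come from Brown et al. (2020).

```python
def training_flop(n_params: float, n_tokens: float) -> float:
    """Approximate training compute of a dense transformer: C ~ 6 * N * D.

    About 2N FLOP per token for the forward pass and 4N for the backward pass.
    A rule of thumb that ignores attention and other overheads.
    """
    return 6.0 * n_params * n_tokens


def inference_flop_per_token(n_params: float) -> float:
    """Approximate forward-pass compute per processed token: ~2 * N."""
    return 2.0 * n_params


if __name__ == "__main__":
    gpt3_params, gpt3_tokens = 175e9, 300e9
    train = training_flop(gpt3_params, gpt3_tokens)
    print(f"GPT-3 training: ~{train:.2e} FLOP")  # ~3.15e+23 FLOP

    # Scaling model size and data by 10x each multiplies training compute by 100x.
    scaled = training_flop(10 * gpt3_params, 10 * gpt3_tokens)
    print(f"10x params and 10x tokens: {scaled / train:.0f}x the compute")  # 100x

    # Number of inference tokens whose forward passes cost as much as training.
    print(f"{train / inference_flop_per_token(gpt3_params):.1e} tokens")
```

The quadratic effect in the second example is the key point for forecasting: scaling parameters and data together makes the compute requirement grow much faster than either one alone.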

4. Forecasting Compute

For transformative AI timeline models with compute milestones, we are interested in how much effective compute we have available at year Y.
- We can break this down into (1) compute costs, (2) compute spending, and (3) algorithmic progress.
Hardware progress: For forecasting hardware progress, no single model can explain the improvements of recent years. Instead, a mix of Moore’s Law, chip architectures, and hardware paradigms provides applicable models and categories for thinking about progress.
- Performance improvements can happen significantly faster than the pure improvement in transistor count and density (Moore’s law) would indicate.
- We will see a fragmentation of applications into a slow lane and a fast lane. High-demand applications will move to the fast lane by designing, and benefitting from, specialized processors. In contrast, low-demand applications will be stuck in the slow lane, running on general-purpose processors. We should assume that AI will be in the fast lane.
Economies of scale: Either there will still be room for improvement in chip design, or chip design will stabilize, which enables economies of scale. Our hardware will first get better and then get cheaper.
Hardware spending: The current growth in spending is not sustainable; however, spending could still increase significantly, with estimates of up to 1% of US GDP (a megaproject comparable to the Manhattan Project or the Apollo program).
- However, it is still unclear what share of the compute trend is due to increased spending versus more performant hardware.
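The breakdown into compute costs, compute spending, and algorithmic progress can be sketched as a toy model: effective compute in a given year is spending multiplied by hardware price-performance (FLOP per dollar), multiplied by an algorithmic-efficiency factor. Everything in the Python sketch below is a hypothetical assumption: the base-year values, the growth rates, the use of simple exponential trends, and the hard spending cap standing in for the "up to 1% of US GDP" ceiling. It illustrates the structure of such models, not any published forecast.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeForecastAssumptions:
    """All fields are hypothetical inputs chosen by the forecaster."""

    spending_usd: float                    # spending on the largest run in the base year
    spending_growth_per_year: float        # multiplicative annual growth of that spending
    spending_cap_usd: float                # assumed ceiling, e.g. roughly 1% of US GDP
    flop_per_usd: float                    # hardware price-performance in the base year
    flop_per_usd_doubling_years: float     # time for FLOP/$ to double (compute costs)
    algo_efficiency_doubling_years: float  # time for algorithmic efficiency to double


def effective_compute(a: ComputeForecastAssumptions, years_ahead: float) -> float:
    """Effective compute (base-year-equivalent FLOP) for a run `years_ahead` years out."""
    # Compute spending: exponential growth until it hits the assumed cap.
    spending = min(a.spending_usd * a.spending_growth_per_year ** years_ahead,
                   a.spending_cap_usd)
    # Compute costs: FLOP per dollar improves exponentially.
    flop_per_usd = a.flop_per_usd * 2.0 ** (years_ahead / a.flop_per_usd_doubling_years)
    # Algorithmic progress: the same physical FLOP count buys more capability.
    algorithmic_multiplier = 2.0 ** (years_ahead / a.algo_efficiency_doubling_years)
    return spending * flop_per_usd * algorithmic_multiplier


if __name__ == "__main__":
    assumptions = ComputeForecastAssumptions(
        spending_usd=1e7,
        spending_growth_per_year=3.0,
        spending_cap_usd=2e11,
        flop_per_usd=1e17,
        flop_per_usd_doubling_years=2.5,
        algo_efficiency_doubling_years=1.5,
    )
    for years in (0, 5, 10, 20):
        print(years, f"{effective_compute(assumptions, years):.2e}")
```

Once the spending cap binds, further growth in effective compute has to come from the hardware and algorithmic terms alone, which is one reason the unresolved split between spending and hardware matters for forecasts.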

5. Better Compute Forecasts

Researchers should share insights into their AI systems’ training by disclosing the amount of compute used and related details.
- Ideally, we should make it a requirement for publications and reduce the technical burden of recording the used compute.
For forecasting hardware progress, we can rely on conceptual models and categories of innovation. We can break this down into:
- Progress in current computing paradigms
- Economy of scale for existing technologies
- Introduction of new computing paradigms
- Unknown unknowns
We should also monitor the dominant design strategy, as this informs our forecasts. The three design strategies are (1) hardware-driven algorithm design, (2) algorithm-driven hardware design, and (3) co-design of hardware and algorithms.
Metrics such as FLOPS/$ often provide limited insight, as they represent only one of the three computer components (memory, interconnect, and logic). Understanding their limitations is essential for forecasting.
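One way to see why a single metric such as FLOPS/$ can mislead is the roofline model, a standard performance model that this summary does not itself discuss. Under it, attainable throughput is capped either by peak compute (logic) or by memory bandwidth multiplied by the workload's arithmetic intensity (FLOP per byte moved). The Python sketch below compares two hypothetical chips with identical peak FLOP/s and price but different memory bandwidth; all numbers are assumptions, and interconnect, the third component, is not modelled.

```python
def attainable_flop_per_s(
    peak_flop_per_s: float,
    mem_bandwidth_bytes_per_s: float,
    arithmetic_intensity: float,
) -> float:
    """Roofline model: throughput is bounded by logic or by memory bandwidth.

    arithmetic_intensity: FLOP performed per byte moved to or from memory.
    """
    memory_bound = mem_bandwidth_bytes_per_s * arithmetic_intensity
    return min(peak_flop_per_s, memory_bound)


if __name__ == "__main__":
    price_usd = 10_000.0  # hypothetical, identical for both chips
    chips = {
        "chip A (fast memory)": (1e14, 2e12),  # (peak FLOP/s, bytes/s), hypothetical
        "chip B (slow memory)": (1e14, 5e11),
    }
    intensity = 20.0  # hypothetical workload: 20 FLOP per byte
    for name, (peak, bandwidth) in chips.items():
        achieved = attainable_flop_per_s(peak, bandwidth, intensity)
        print(f"{name}: peak FLOPS/$ = {peak / price_usd:.1e}, "
              f"achieved FLOPS/$ = {achieved / price_usd:.1e}")
```

Both hypothetical chips have the same peak FLOPS/$, yet on this memory-bound workload chip A achieves four times the throughput of chip B, a difference the headline metric hides entirely.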

6. Compute Governance

Compute is a unique AI governance node due to its physical space requirements, its energy demand, and its concentrated supply chain. These features make it a promising candidate for governance.
Controlling and governing access to compute can be harnessed to achieve better AI safety outcomes, for instance restricting compute access to non-safety-aligned actors.
As compute becomes a dominant cost factor at the frontier of AI research, it may start to resemble high-energy physics research, where a significant share of the budget is spent on infrastructure (unlike earlier CS research, where equipment costs have been fairly low).

7. Conclusions

In terms of published papers, research on compute trends, compute spending, and algorithmic efficiency (the field of macro ML research) is sparse, and more work at this intersection could quickly improve our understanding.
The field is currently bottlenecked by available data on macro ML trends: total compute used to train a model is rarely published, nor is spending. With these, it would be easier to estimate algorithmic efficiency and build better forecasting models.
The importance of compute also highlights the need for ML engineers working on AI safety to be able to deploy gigantic models.
- Therefore, more people should consider becoming an AI hardware expert or working as an ML engineer at safety-aligned organizations and enabling their deployment success.
Working at the intersection of technology and economics is also relevant, to inform spending estimates and our understanding of macro trends.
Research results in all of the mentioned fields could then be used to inform compute governance.
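The conclusions note that published compute figures would make it easier to estimate algorithmic efficiency. One simple approach, in the spirit of Hernandez and Brown (2020) listed in the references, is to compare the training compute needed to reach the same benchmark performance at two points in time. The Python sketch below turns such a pair of measurements into an efficiency doubling time; the example inputs are hypothetical, not data from any paper.

```python
import math


def efficiency_doubling_time_years(
    flop_needed_early: float,
    flop_needed_late: float,
    years_between: float,
) -> float:
    """Years for algorithmic efficiency to double, from two measurements.

    Both FLOP figures are the training compute needed to reach the *same*
    benchmark performance, measured `years_between` years apart.
    """
    if flop_needed_late >= flop_needed_early:
        raise ValueError("no efficiency gain between the two measurements")
    efficiency_gain = flop_needed_early / flop_needed_late
    return years_between / math.log2(efficiency_gain)


if __name__ == "__main__":
    # Hypothetical: 64x less compute needed for the same performance after 6 years.
    print(efficiency_doubling_time_years(6.4e18, 1e17, 6.0))  # 1.0
```

This only works if the compute for both runs is actually reported, which is exactly the data gap described above.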

Acknowledgments

This work was supported and conducted as a summer fellowship at the Stanford Existential Risks Initiative (SERI). Their support is gratefully acknowledged. I am thankful for joining this program and would like to thank the organizers for enabling this, and the other fellows for the insightful discussions.

I am incredibly grateful to Ashwin Acharya and Michael Andregg for their mentoring throughout the project. Michael's thoughts on AI hardware nudged me to reconsider my research interests and learn more about AI and compute. Thanks to Ashwin for bouncing ideas around, for his wealth of expertise in the domain, and for helping me put things into the proper context. Thanks for the input! I looked forward to every meeting and the thought-provoking discussions.

Thanks to the Swiss Existential Risk Initiative (CHERI) for providing the social infrastructure during my project. Having the opportunity to organize such an initiative in a fantastic team, accompanied by motivated young researchers, has been a wonderful experience.

I would like to thank Jaime Sevilla, Charlie Giattino, Will Hunt, Markus Anderljung, and Christopher Phenicie for their input and for discussing ideas.

Thanks to Jaime Sevilla, Jeffrey Ohl, Christopher Phenicie, Aaron Gertler, and Kwan Yee Ng for providing feedback on this piece.

References

Ahmed, Nur, and Muntasir Wahed. 2020. “The De-Democratization of AI: Deep Learning and the Compute Divide in Artificial Intelligence Research.” ArXiv:2010.15581 [Cs], October. http://arxiv.org/abs/2010.15581.
Amodei, Dario, and Danny Hernandez. 2018. “AI and Compute.” OpenAI. May 15, 2018. https://openai.com/blog/ai-and-compute/.
Anderljung, Markus, and Alexis Carlier. 2021. “Some AI Governance Research Ideas.” EA Forum. March 6, 2021. https://forum.effectivealtruism.org/posts/kvkv6779jk6edygug/some-ai-governance-research-ideas.
Branwen, Gwern. 2020. “The Scaling Hypothesis,” May. https://www.gwern.net/Scaling-hypothesis.
Brown, Tom B., Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, et al. 2020. “Language Models Are Few-Shot Learners.” ArXiv:2005.14165 [Cs], July. http://arxiv.org/abs/2005.14165.
Carey, Ryan. 2018. “Interpreting AI Compute Trends.” AI Impacts. July 10, 2018. https://aiimpacts.org/interpreting-ai-compute-trends/.
Carlsmith, Joseph. 2020. “How Much Computational Power Does It Take to Match the Human Brain?” Open Philanthropy. August 14, 2020. https://www.openphilanthropy.org/brain-computation-report.
Centre for the Governance of AI. 2020. “A Guide to Writing the NeurIPS Impact Statement.” Medium (blog). May 19, 2020. https://medium.com/@GovAI/a-guide-to-writing-the-neurips-impact-statement-4293b723f832.
Cotra, Ajeya. 2020. “Draft Report on AI Timelines.” September 19, 2020. https://www.alignmentforum.org/posts/KrJfoZzpSDpnrv9va/draft-report-on-ai-timelines.
Crox, John. 2019. “On AI and Compute.” EA Forum. March 4, 2019. https://forum.effectivealtruism.org/posts/8wEDjvpcdACvYGQTq/on-ai-and-compute.
Dafoe, Allan. 2018. “AI Governance: A Research Agenda.” Centre for the Governance of AI, Future of Humanity Institute, University of Oxford. https://www.fhi.ox.ac.uk/wp-content/uploads/GovAI-Agenda.pdf.
Davidson, Tom. 2021. “Report on Semi-Informative Priors.” Open Philanthropy. March 25, 2021. https://www.openphilanthropy.org/blog/report-semi-informative-priors.
Finnveden, Lukas. 2020. “Extrapolating GPT-N Performance.” AI Alignment Forum. December 18, 2020. https://www.alignmentforum.org/posts/k2SNji3jXaLGhBeYP/extrapolating-gpt-n-performance.
Garfinkel, Ben. 2018. “Reinterpreting ‘AI and Compute.’” AI Impacts. December 18, 2018. https://aiimpacts.org/reinterpreting-ai-and-compute/.
Grace, Katja, John Salvatier, Allan Dafoe, Baobao Zhang, and Owain Evans. 2016. “2016 Expert Survey on Progress in AI.” AI Impacts (blog). December 14, 2016. https://aiimpacts.org/2016-expert-survey-on-progress-in-ai/.
Hernandez, Danny, and Tom B. Brown. 2020. “Measuring the Algorithmic Efficiency of Neural Networks.” ArXiv:2005.04305 [Cs, Stat], May. http://arxiv.org/abs/2005.04305.
Hestness, Joel, Sharan Narang, Newsha Ardalani, Gregory Diamos, Heewoo Jun, Hassan Kianinejad, Md Mostofa Ali Patwary, Yang Yang, and Yanqi Zhou. 2017. “Deep Learning Scaling Is Predictable, Empirically.” ArXiv:1712.00409 [Cs, Stat], December. http://arxiv.org/abs/1712.00409.
Kaplan, Jared, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. 2020. “Scaling Laws for Neural Language Models.” ArXiv:2001.08361 [Cs, Stat], January. http://arxiv.org/abs/2001.08361.
Khan, Saif M. 2020. “AI Chips: What They Are and Why They Matter.” Center for Security and Emerging Technology (blog). April 2020. https://cset.georgetown.edu/research/ai-chips-what-they-are-and-why-they-matter/.
———. 2021. “The Semiconductor Supply Chain.” Center for Security and Emerging Technology (blog). January 2021. https://cset.georgetown.edu/publication/the-semiconductor-supply-chain/.
Los Alamos National Laboratory. 2013. “Massive Infrastructures Are Needed to Support Supercomputers.” March 25, 2013. https://www.lanl.gov/discover/publications/national-security-science/2013-april/what-is-under the floor-of-a-supercomputer.php.
Lyzhov, Alex. 2021. “‘AI and Compute’ Trend Isn’t Predictive of What Is Happening.” AI Alignment Forum. February 4, 2021. https://www.alignmentforum.org/posts/wfpdejMWog4vEDLDg/ai-and-compute-trend-isn-t-predictive-of-what-is-happening.
Microsoft Documentation. 2020. “Deploy ML Models to FPGAs - Azure Machine Learning.” September 24, 2020. https://docs.microsoft.com/en-us/azure/machine-learning/how-to-deploy-fpga-web-service.
Mirhoseini, Azalia, Anna Goldie, Mustafa Yazgan, Joe Wenjie Jiang, Ebrahim Songhori, Shen Wang, Young-Joon Lee, et al. 2021. “A Graph Placement Methodology for Fast Chip Design.” Nature 594 (7862): 207–12. https://doi.org/10.1038/s41586-021-03544-w.
MLCommons. 2021. “MLCommons™ Releases MLPerf™ Training v1.0 Results.” MLCommons. June 30, 2021. https://mlcommons.org/.
Radford, Alec, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. “Language Models Are Unsupervised Multitask Learners,” February, 24.
Sevilla, Jaime, Pablo Villalobos, Juan Felipe Cerón, Matthew Burtell, and Lennart Heim. 2021. “Parameter, Compute and Data Trends in Machine Learning.” Google Sheets (blog). June 19, 2021. https://docs.google.com/spreadsheets/d/1AAIebjNsnJj_uKALHbXNfn3_YsT6sHXtCU0q7OIPuc4/.
Shalf, John. 2020a. “The Future of Computing beyond Moore’s Law.” Philosophical Transactions of the Royal Society A, March. https://doi.org/10.1098/rsta.2019.0061.
———. 2020b. “Computing Beyond Moore’s Law.” July 14. https://cs.lbl.gov/assets/CSSSP-Slides/20200714-Shalf.pdf.
Sutton, Rich. 2019. “The Bitter Lesson.” March 13, 2019. http://www.incompleteideas.net/IncIdeas/BitterLesson.html.
Thompson, Neil C., Kristjan Greenewald, Keeheon Lee, and Gabriel F. Manso. 2020. “The Computational Limits of Deep Learning.” ArXiv:2007.05558 [Cs, Stat], July. http://arxiv.org/abs/2007.05558.
Thompson, Neil C., and Svenja Spanuth. 2021. “The Decline of Computers as a General Purpose Technology.” Communications of the ACM 64 (3): 64–72. https://doi.org/10.1145/3430936.

Transformative AI, as defined by Open Philanthropy in this blogpost: “Roughly and conceptually, transformative AI is AI that precipitates a transition comparable to (or more significant than) the agricultural or industrial revolution.” ↩︎

MichaelA @ 2021-09-29T10:59 (+2)

Very minor comment, but I think there's a mistaken pair of commas in the first sentence here, which confuses the sentence a bit:

Applications, with enough demand, will move to the fast lane by designing and benefitting from specialized processors. In contrast, others will be stuck in the slow lane running on general-purpose processors.

lennart @ 2021-09-30T07:58 (+1)

Thanks, I've edited it.

MichaelA @ 2021-09-29T10:58 (+2)

Thanks for this - I really appreciated how clear and focused-on-actual-takeaways (rather than "I'll discuss X, Y, Z topic") this summary is.

OpenAI observed in 2018 that since 2012 the amount of compute used in the largest AI training runs has been doubling every 3.4 months.
In our updated analysis (n=57, 1957 to 2021), we observe a doubling time of 6.2 months between 2012 and mid-2021.

What is n counting here? Like studies included in the lit review?
Why is n referring to things from 1957-2021 if the conclusion given is just about 2012-2021? It seems like the relevant n would be just the items about 2012-2021?
Is the difference between your estimate and OpenAI's because 2018-2021 involved much slower doubling times (perhaps three times as long?), because OpenAI missed or misinterpreted some things that happened in 2012-2018, because you're including pre-2012 things and those involved slower doubling times, or some mix of those factors?

(Probably I can find the answers in your next post, but maybe this is worth clarifying here too?)

lennart @ 2021-09-30T07:40 (+1)

Thanks, Michael.

n is counting the number of ML systems in the analysis at the point of writing (we have added more systems in the meantime). Examples of such systems are GPT-3 and AlphaFold; basically, each system is a row in our dataset.
Right, good point. I'll add the number of systems for the given time period.
That's hard to answer. I don't think OpenAI misinterpreted anything. For the moment, I think it's probably a mixture of:
- the inclusion criteria for the systems on which we base this trend
- actual slower doubling times, for reasons which we should figure out.

Nonetheless, as outlined in Part 1 - Section 2.3, I have not interpreted those trends yet, but I'm interested in a discussion and am trying to write up my thoughts on this in the future.

MarkusAnderljung @ 2021-09-24T22:00 (+2)

Thanks for this! I really look forward to seeing the rest of the sequence, especially on the governance bits.