Will AI Avoid Exploitation? (Adam Bales)
By Global Priorities Institute @ 2023-12-13T11:37 (+38)
This paper was published as a GPI working paper in December 2023.
Introduction
Recent decades have seen rapid progress in artificial intelligence (AI). Some people expect that in the coming decades, further progress will lead to the development of AI systems that are at least as cognitively capable as humans (see Zhang et al., 2022). Call such systems artificial general intelligences (AGIs). If we develop AGI then humanity will come to share the Earth with agents that are as cognitively sophisticated as we are.
Even in the abstract, this seems like a momentous event: while the analogy is imperfect, the development of AGI would bear some similarity to encountering an intelligent alien species that intends to make the Earth its home. Less abstractly, it has been argued that AGI could have profound economic implications, impacting growth, employment and inequality (Korinek & Juelfs, Forthcoming; Trammell & Korinek, 2020). And it has been argued that AGI could bring with it risks, including those arising from human misuse of powerful AI systems (Brundage et al., 2018; Dafoe, 2018) and those arising more directly from the AI systems themselves (Bostrom, 2014; Carlsmith, Forthcoming).
Given the potential stakes, it would be desirable to have some sense of what AGIs will be like if we develop them. Knowing this might help us prepare for a world where such systems are present. Unfortunately, it's difficult to speculate with confidence about what hypothetical future AI systems will be like.
However, a surprisingly simple argument suggests we can make predictions about the behaviour of AGIs (this argument is inspired by Omohundro, 2007, 2008; Yudkowsky, 2019). According to this argument, we should expect AGIs to behave as if maximising expected utility (EU).
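For readers who want the standard formal gloss (the formulation below is the textbook decision-theoretic one, not something spelled out in the paper itself): an EU maximiser chooses whichever available action has the greatest probability-weighted sum of utilities across possible states.

```latex
% Textbook expected-utility maximisation (a standard formulation, not taken from the paper).
% The agent picks an action a from the available set A that maximises EU(a), where P is the
% agent's probability function over states S, o(a, s) is the outcome of taking action a in
% state s, and U is the agent's utility function.
\[
  \mathrm{EU}(a) \;=\; \sum_{s \in S} P(s)\, U\bigl(o(a, s)\bigr),
  \qquad
  a^{*} \in \arg\max_{a \in A} \mathrm{EU}(a).
\]
```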
In rough terms, the argument claims that unless an agent decides by maximising EU, it will be possible to offer it a series of trades that leads to a guaranteed loss of something it values (an agent that's susceptible to such trades is said to be exploitable). Sufficiently sophisticated systems are unlikely to be exploitable, as exploitability plausibly interferes with acting competently, and sophisticated systems are likely to act competently. So, the argument concludes, sophisticated systems are likely to be EU maximisers. I'll call this the EU argument.
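To make the exploitation worry concrete, here is a minimal money-pump sketch; it is my own illustration rather than an example from the paper, and all names and numbers are hypothetical. An agent with cyclic preferences (preferring A to B, B to C, and C to A) pays a small fee for each trade it welcomes, and ends up holding the option it started with while being strictly worse off.

```python
# A toy money pump (illustrative only): an agent whose preferences are cyclic,
# rather than representable by a utility function, can be led through trades
# that return it to its starting option at a guaranteed cost.

# The agent strictly prefers A to B, B to C, and C to A.
PREFERS = {("A", "B"), ("B", "C"), ("C", "A")}

def accepts_trade(offered, held):
    """The agent pays a small fee to swap what it holds for an option it prefers."""
    return (offered, held) in PREFERS

def run_money_pump(start="C", fee=1, cycles=2):
    held, paid = start, 0
    for offered in ["B", "A", "C"] * cycles:  # each pass traces the preference loop once
        if accepts_trade(offered, held):
            held, paid = offered, paid + fee
    return held, paid

final_option, total_paid = run_money_pump()
print(final_option, total_paid)  # -> C 6: back to the starting option, six units poorer
```

An agent whose choices maximise a fixed utility function cannot be cycled like this, since a utility function cannot rank A above B, B above C, and C above A; that contrast is the intuition the EU argument builds on.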
In this paper, I'll discuss this argument in detail. In doing so, I'll have four aims. First, I'll show that the EU argument fails. Second, I'll show that reflecting on this failure is instructive: such reflection points us towards more nuanced and plausible alternative arguments. Third, the nature of these more nuanced arguments will highlight the limitations of our models of AGI, in a way that encourages us to adopt a pluralistic approach. And fourth, reflecting on such models will suggest that at least sometimes what matters is less developing a formal model of an AGI's decision-making procedure and more clarifying what sort of goals, if any, an AGI is likely to develop. So while my discussion will focus on the EU argument, I'll conclude with more general lessons about modelling AGI.