Potential Risks from Advanced Artificial Intelligence: The Philanthropic Opportunity

By Holden Karnofsky @ 2016-05-06T12:55 (+2)

This is a linkpost to https://www.openphilanthropy.org/blog/potential-risks-advanced-artificial-intelligence-philanthropic-opportunity

We’re planning to make potential risks from artificial intelligence a major priority this year. We feel this cause presents an outstanding philanthropic opportunity — with extremely high importance, high neglectedness, and reasonable tractability (our three criteria for causes) — for someone in our position. We believe that the faster we can get fully up to speed on key issues and explore the opportunities we currently see, the faster we can lay the groundwork for informed, effective giving both this year and in the future.

With all of this in mind, we’re placing a larger “bet” on this cause, this year, than we are placing even on other focus areas — not necessarily in terms of funding (we aren’t sure we’ll identify very large funding opportunities this year, and are more focused on laying the groundwork for future years), but in terms of senior staff time, which at this point is a scarcer resource for us. Consistent with our philosophy of hits-based giving, we are doing this not because we have confidence in how the future will play out and how we can impact it, but because we see a risk worth taking. In about a year, we’ll formally review our progress and reconsider how senior staff time is allocated.

This post will first discuss why I consider this cause to be an outstanding philanthropic opportunity. (My views are fairly representative, but not perfectly representative, of those of other staff working on this cause.) It will then give a broad outline of our planned activities for the coming year, some of the key principles we hope to follow in this work, and some of the risks and reservations we have about prioritizing this cause as highly as we are.

In brief:

My views on this cause have evolved considerably over time. I will discuss the evolution of my thinking in detail in a future post, but this post focuses on the case for prioritizing this cause today.

Importance

It seems to me that AI and machine learning research is currently on a very short list of the most dynamic, unpredictable, and potentially world-changing areas of science.[1] In particular, I believe that this research may lead eventually to the development of transformative AI, which we have roughly and conceptually defined as AI that precipitates a transition comparable to (or more significant than) the agricultural or industrial revolution. I believe there is a nontrivial likelihood (at least 10% with moderate robustness, and at least 1% with high robustness) that transformative AI will be developed within the next 20 years. For more detail on the concept of transformative AI (including a more detailed definition), and why I believe it may be developed in the next 20 years, see our previous post.

I believe that today’s AI systems are accomplishing a significant amount of good, and by and large, I expect the consequences of further progress on AI — whether or not transformative AI is developed soon — to be positive. Improvements in AI have enormous potential to improve the speed and accuracy of medical diagnosis; reduce traffic accidents by making autonomous vehicles more viable; help people communicate with better search and translation; facilitate personalized education; speed up science that can improve health and save lives; accelerate development of sustainable energy sources; and contribute on a huge number of other fronts to improving global welfare and productivity. As I’ve written before, I believe that economic and technological development have historically been highly beneficial, often despite the fact that particular developments were subject to substantial pessimism before they played out. I also expect that if and when transformative AI is very close to development, many people will be intensely aware of both the potential benefits and risks, and will work to maximize the benefits and minimize the risks.

With that said, I think the risks are real and important:

The above risks could be amplified if AI capabilities improved relatively rapidly and unexpectedly, making it harder for society to anticipate, prepare for, and adapt to risks. This dynamic could (though won’t necessarily) be an issue if it turns out that a relatively small number of conceptual breakthroughs turn out to have very general applications.

If the above reasoning is right (and I believe much of it is highly debatable, particularly when it comes to my previous post’s arguments as well as the importance of accident risks), I believe it implies that this cause is not just important but something of an outlier in terms of importance, given that we are operating in an expected-value framework and are interested in low-probability, high-potential-impact scenarios.[2] The underlying stakes would be qualitatively higher than those of any issues we’ve explored or taken on under the U.S. policy category, to a degree that I think more than compensates for e.g. a “10% chance that this is relevant in the next 20 years” discount. When considering other possible transformative developments, I can’t think of anything else that seems equally likely to be comparably transformative on a similar time frame, while also presenting such a significant potential difference between best- and worst-case imaginable outcomes.

One reason that I’ve focused on a 20-year time frame is that I think this kind of window should, in a sense, be considered “urgent” from a philanthropist’s perspective. I see philanthropy as being well-suited to low-probability, long-term investments. I believe there are many past cases in which it took a very long time for philanthropy to pay off,[3] especially when its main value-added was supporting the gradual growth of organizations, fields and research that would eventually make a difference. If I thought there were negligible probability of transformative AI in the next 20 years, I would still consider this cause important enough to be a focus area for us, but we would not be prioritizing it as highly as we plan to this year.

The above has focused on potential risks of transformative AI. There are also many potential AI developments short of transformative AI that could be very important. For example:

We are interested in these potential developments, and see the possibility of helping to address them as a potential benefit of allocating resources to this cause. With that said, my previously expressed views, if correct, would imply that most of the “importance” (as we’ve defined it) in this cause comes from the enormously high-stakes possibility of transformative AI.

Neglectedness

Both artificial intelligence generally and potential risks have received increased attention in recent years.[4] We’ve put substantial work into trying to ensure that we have a thorough landscape of the researchers, funders, and key institutions in this space. We will later be putting out a landscape document, which will be largely consistent with the landscape we published last year. In brief:

Bottom line - I consider this cause to be highly neglected, particularly by philanthropists, and I see major gaps in the relevant fields that a philanthropist could potentially help to address.

Tractability

It’s been the case for a long time that I see this cause as important and neglected, and that my biggest reservation has been tractability. I see transformative AI as very much a future technology – I’ve argued that there is a nontrivial probability that it will be developed in the next 20 years, but it is also quite plausibly more than 100 years away, and even 20 years is a relatively long time. Working to reduce risks from a technology that is so far in the future, and about which so much is still unknown, could easily be futile.

With that said, this cause is not as unique in this respect as it might appear at first. I believe that one of the things philanthropy is best-positioned to do is provide steady, long-term support as fields and institutions grow. This activity is necessarily slow. It requires being willing to support groups based largely on their leadership and mission, rather than immediate plans for impact, in order to lay the groundwork for an uncertain future. I’ve written about this basic approach in the context of policy work, and I believe there is ample precedent for it in the history of philanthropy. It is the approach we favor for several of our other focus areas, such as immigration policy and macroeconomic stabilization policy.

And I have come to believe that there is potentially useful work to be done today that could lay the groundwork for mitigating future potential risks. In particular:

I think there are important technical challenges that could prove relevant to reducing accident risks.

Added June 24: for more on technical challenges, see Concrete Problems in AI Safety.

I’ve previously put significant weight on an argument along the lines of, “By the time transformative AI is developed, the important approaches to AI will be so different from today’s that any technical work done today will have a very low likelihood of being relevant.” My views have shifted significantly for two reasons. First, as discussed previously, I now think there is a nontrivial chance that transformative AI will be developed in the next 20 years, and that the above-quoted argument carries substantially less weight when focusing on that high-stakes potential scenario. Second, having had more conversations about open technical problems that could be relevant to reducing risks, I’ve come to believe that there is a substantial amount of work worth doing today, regardless of how long it will be until the development of transformative AI.

Potentially relevant challenges that we’ve come across so far include value learning (designing AI systems to learn the values of other agents through e.g. inverse reinforcement learning); problems having to do with making reinforcement learning systems and other AI agents less likely to behave in undesirable ways (designing reinforcement learning systems that will not try to gain direct control of their rewards, that will avoid behavior with unreasonably far-reaching impacts, and that will be robust against differences between formally specified rewards and human designers’ intentions in specifying those rewards); reliability and usability of machine learning techniques (including transparency, understandability, and robustness against or at least detection of large changes in input distribution); formal specification and verification of deep learning, reinforcement learning, and other AI systems; better theoretical understanding of desirable properties for powerful AI systems; and a variety of challenges related to an approach laid out in a series of blog posts by Paul Christiano.

Going into the details of these challenges is beyond the scope of this post, but to give a sense for non-technical readers of what a relevant challenge might look like, I will elaborate briefly on one challenge. A reinforcement learning system is designed to learn to behave in a way that maximizes a quantitative “reward” signal that it receives periodically from its environment - for example, DeepMind’s Atari player is a reinforcement learning system that learns to choose controller inputs (its behavior) in order to maximize the game score (which the system receives as “reward”), and this produces very good play on many Atari games. However, if a future reinforcement learning system’s inputs and behaviors are not constrained to a video game, and if the system is good enough at learning, a new solution could become available: the system could maximize rewards by directly modifying its reward “sensor” to always report the maximum possible reward, and by avoiding being shut down or modified back for as long as possible. This behavior is a formally correct solution to the reinforcement learning problem, but it is probably not the desired behavior. And this behavior might not emerge until a system became quite sophisticated and had access to a lot of real-world data (enough to find and execute on this strategy), so a system could appear “safe” based on testing and turn out to be problematic when deployed in a higher-stakes setting. The challenge here is to design a variant of reinforcement learning that would not result in this kind of behavior; intuitively, the challenge would be to design the system to pursue some actual goal in the environment that is only indirectly observable, instead of pursuing problematic proxy measures of that goal (such as a “hackable” reward signal).

It appears to me that work on challenges like the above is possible in the near term, and could be useful in several ways. Solutions to these problems could turn out to directly reduce accident risks from transformative AI systems developed in the future, or could be stepping stones toward techniques that could reduce these risks; work on these problems could clarify desirable properties of present-day systems that apply equally well to systems developed in the longer-term; or work on these problems today could help to build up the community of people who will eventually work on risks posed by longer-term development, which would be difficult to do in the absence of concrete technical challenges.

I preliminarily feel that there is also useful work to be done today in order to reduce future misuse risks and provide useful analysis of strategic and policy considerations.

As mentioned above, I would like to see more institutions working on considering different potential scenarios with respect to transformative AI; considering how governments, corporations, and individual researchers should react in those scenarios; and working with machine learning researchers to identify potential signs that particular scenarios are becoming more likely.

I think it’s worth being careful about funding this sort of work, since it’s possible for it to backfire. My current impression is that government regulation of AI today would probably be unhelpful or even counterproductive (for instance by slowing development of AI systems, which I think currently pose few risks and do significant good, and/or by driving research underground or abroad). If we funded people to think and talk about misuse risks, I’d worry that they’d have incentives to attract as much attention as possible to the issues they worked on, and thus to raise the risk of such premature/counterproductive regulation.

With that said, I believe that potential risks have now received enough attention – some of which has been unfortunately exaggerated in my view – that premature regulation and/or intervention by government agencies is already a live risk. I’d be interested in the possibility of supporting institutions that could provide thoughtful, credible, public analysis of whether and when government regulation/intervention would be advisable, even if it meant simply making the case against such things for the foreseeable future. I think such analysis would likely improve the quality of discussion and decision-making, relative to what will happen without it.

I also think that technical work related to accident risks – along the lines discussed above – could be indirectly useful for reducing misuse risks as well. Currently, it appears to me that different people in the field have very different intuitions about how serious and challenging accident risks are. If it turns out that there are highly promising paths to reducing accident risks – to the point where the risks look a lot less serious – this development could result in a beneficial refocusing of attention on misuse risks. (If, by contrast, it turns out that accident risks are large and present substantial technical challenges, this makes work on such risks extremely valuable.)

Other notes on tractability.

Bottom line. I think there are real questions around the extent to which there is work worth doing today to reduce potential risks from advanced artificial intelligence. That said, I see a reasonable amount of potential if there were more people and institutions focused on the relevant issues; given the importance and neglectedness of this cause, I think that’s sufficient to prioritize it highly.

Some Open-Phil-specific considerations

Networks

I consider this a challenging cause. I think it would be easy to do harm while trying to do good. For example:

I think it is important for someone working in this space to be highly attentive to these risks. In my view, one of the best ways to achieve this is to be as well-connected as possible to the people who have thought most deeply about the key issues, including both the leading researchers in AI and machine learning and the people/organizations most focused on reducing long-term risks.

I believe the Open Philanthropy Project is unusually well-positioned from this perspective:

Time vs. money

One consideration that has made me hesitant about prioritizing this cause is the fact that I see relatively little in the way of truly “shovel-ready” giving opportunities. I list our likely priorities in the next section; I think they are likely to be very time-consuming for staff, and I am unsure of how long it will take before we see as many concrete giving opportunities as we do in some of our other focus areas.

By default, I prefer to prioritize causes with significant existing “shovel-ready” opportunities and minimal necessary time commitment, because I consider the Open Philanthropy Project to be short on capacity relative to funding at this stage in our development.

However, I think the case for this cause is compelling enough to outweigh this consideration, and I think a major investment of senior staff time this year could leave us much better positioned to find outstanding giving opportunities in the future.

Our plans

For the last couple of months, we have focused on:

Ultimately, we expect to seek giving opportunities in the following categories:

Getting to this point will likely require a great deal more work and discussion – internally and with the relevant communities more broadly. It could be a long time before we are recommending large amounts of giving in this area, and I think that allocating significant senior staff time to the cause will speed our work considerably.

Some overriding principles for our work

As we work in this space, we think it’s especially important to follow a few core principles:

Don’t lose sight of the potential benefits of AI, even as we focus on mitigating risks

Our work is focused on potential risks, because this is the aspect of AI research that seems most neglected at the moment. However, as stated above, I see many ways in which AI has enormous potential to improve the world, and I expect the consequences of advances in AI to be positive on balance. It is important to act and communicate accordingly.

Deeply integrate people with strong technical expertise in our work

The request for proposals we co-funded last year employed an expert review panel for selecting grantees. We wouldn’t have participated if it had involved selecting grantees ourselves with nontechnical staff. We believe that AI and machine learning researchers are the people best positioned to make many assessments that will be important to us, such as which technical problems seem tractable and high-potential and which researchers have impressive accomplishments.

Seek a lot of input, and reflect a good deal, before committing to major grants and other activities

As stated above, I consider this a challenging cause, where well-intentioned actions could easily do harm. We are seeking to be thoroughly networked and to seek substantial advice on our activities from a range of people, both AI and machine learning researchers and people focused on reduction of potential risks.

Support work that could be useful in a variety of ways and in a variety of scenarios, rather than trying to make precise predictions

I don’t think it’s possible to have certainty, today, about when we should expect transformative AI, what form we should expect it to take, and/or what the consequences will be. We have a preference for supporting work that seems robustly likely to be useful. In particular, one of our main goals is to support an increase in the number of people – particularly people with strong relevant technical backgrounds - dedicated to thinking through how to reduce potential risks.

Distinguish between lower-stakes, higher-stakes, and highest-stakes potential risks

There are many imaginable risks of advanced artificial intelligence. Our focus is likely to be on those that seem to have the very highest stakes, to the point of being potential global catastrophic risks. In our view currently, that means misuse risks and accident risks involving transformative AI. We also consider neglectedness (we prefer to work on risks receiving less attention from others) and tractability (we prefer to work on risks where it seems there is useful work to be done today that can help mitigate them).

Notes on AI and machine learning researchers’ views on the topics discussed here

Over the last couple of months, we have been reaching out to AI and machine learning researchers that we don’t already have strong relationships with in order to discuss our plans and background views and get their feedback. We have put particular effort into seeking out skeptics and potential critics. As of today, we have requested 35 conversations along these lines and had 25. About three-fourths of these conversations have been with tenure-track academics or senior researchers at private labs, and the remainder have been with students or junior researchers at top AI and machine learning departments and private labs.

We’ve heard a diverse set of perspectives. Conversations were in confidence and often time-constrained, so we wouldn’t feel comfortable attributing specific views to specific people. Speaking generally, however, it seems to us that:

Risks and reservations

I see much room for debate in the decision to prioritize this cause as highly as we are. I have discussed most of the risks and reservations I see in this post and the ones preceding it. Here I list the major ones in one place. In this section, my goal is to provide a consolidated list of risks and reservations, but not necessarily to give my comprehensive take on each.

With all of the above noted, I think it is important that a philanthropist in our position be willing to take major risks, and prioritizing this cause is one that I see as very worth taking.

Notes


  1. I’m not in a position to support this claim very systematically, but we have done a substantial amount of investigation and discussion of various aspects of scientific research, as discussed in our recent annual review. In a previous post, I addressed what I see as the most noteworthy other possible major developments in the next 20 years. ↩︎

  2. Here I mean that it scores significantly higher by this criterion than the vast majority of causes, not that it stands entirely alone. I think there are a few other causes that have comparable importance, though none that I think have greater importance, as we’ve defined it. ↩︎

  3. We’ve been accumulating case studies via our History of Philanthropy project, and we expect to publish an updated summary of what we know by the end of 2016. For now, there is some information available at our History of Philanthropy page and in a recent blog post. ↩︎

  4. See our previous post regarding artificial intelligence generally. See our writeup on a 2015 grant to support a request for proposals regarding potential risks. ↩︎