A list of good heuristics that the case for AI X-risk fails

By Aaron Gertler 🔸 @ 2020-07-16T09:56 (+25)

This is a linkpost to https://www.alignmentforum.org/posts/bd2K3Jdz82csjCFob/a-list-of-good-heuristics-that-the-case-for-ai-x-risk-fails

Note: I'm crossposting an Alignment Forum post, written by David Krueger, which contained a list I thought was worth building out further. There are some good comments at the original link!


I think one reason machine learning researchers don't think AI x-risk is a problem is that they haven't given it the time of day. And on some level, they may be right in not doing so!

We all need to do meta-level reasoning about what to spend our time and effort on. Even giving an idea or argument the time of day requires it to cross a somewhat high bar, if you value your time. Ultimately, in evaluating whether it's worth considering a putative issue (like the extinction of humanity at the hands (graspers?) of a rogue AI), one must rely on heuristics; by giving the argument the time of day, you've already conceded a significant amount of resources to it! Moreover, you risk privileging the hypothesis or falling victim to Pascal's Mugging.

Unfortunately, the case for x-risk from out-of-control AI systems seems to fail many powerful and accurate heuristics. At first glance, this can put proponents of this issue in a position similar to that of flat-earth conspiracy theorists. My goal here is to enumerate heuristics that arguments for AI takeover scenarios fail.

Ultimately, I think machine learning researchers should not refuse to consider AI x-risk when presented with a well-made case by a person they respect or have a personal relationship with, but I'm ambivalent as to whether they have an obligation to consider the case if they've only seen a few headlines about Elon. I do find it a bit hard to understand how one doesn't end up thinking about the consequences of super-human AI, since it seems obviously impactful and fascinating. But I'm a very curious (read "distractable") person...

A list of heuristics that say not to worry about AI takeover scenarios:

I'm pretty sure this list is incomplete, and I plan to keep adding to it as I think of or hear new suggestions! Suggest away!!

Also, to be clear, I am writing these descriptions from the perspective of someone who has had very limited exposure to the ideas underlying concerns about AI takeover scenarios. I think a lot of these reactions indicate significant misunderstandings about what people working on mitigating AI x-risk believe, as well as matters of fact (e.g. a number of experts have voiced concerns about AI x-risk, and a significant portion of the research community seems to agree that these concerns are at least somewhat plausible and important).


irving @ 2020-07-16T11:54 (+17)

As a meta-comment, I think it's quite unhelpful that some of these "good heuristics" are written as intentional strawmen where the author doesn't believe the assumptions hold. E.g., the author doesn't believe that there are no insiders talking about X-risk. If you're going to write a post about good heuristics, maybe try to make the good heuristic arguments actually good? This kind of post mostly just alienates me from wanting to engage in these discussions, which is a problem given that I'm one of the more senior AGI safety researchers.

vaniver @ 2020-07-17T02:41 (+11)

Presumably there are two categories of heuristics, here: ones which relate to actual difficulties in discerning the ground truth, and ones which are irrelevant or stem from a misunderstanding. I think it seems bad that this list implicitly casts the heuristics as being in the latter category, and rather than linking to why each is irrelevant or a misunderstanding it does something closer to mocking the concern.

For example, I would decompose the "It's not empirically testable" heuristic into two different components. The first is something like "it's way easier to do good work when you have tight feedback loops, and a project that relates to a single shot opportunity without a clear theory simply cannot have tight feedback loops." This was the primary reason I stayed away from AGI safety for years, and still seems to me like a major challenge to research work here. [I was eventually convinced that it was worth putting up with this challenge, however.]

The second is something like "only trust claims that have been empirically verified", which runs into serious problems with situations where the claims are about the future, or running the test is ruinously expensive. A claim that 'putting lamb's blood on your door tonight will cause your child to be spared' is one that you have to act on (or not) before you get to observe whether or not it will be effective, and so whether or not this heuristic helps depends on whether or not it's possible to have any edge ahead of time on figuring out which such claims are accurate.

irving @ 2020-07-17T16:27 (+4)

Yes, the mocking is what bothers me. In some sense the wording of the list means that people on both sides of the question could come away feeling justified without a desire for further communication: AGI safety folk since the arguments seem quite bad, and AGI safety skeptics since they will agree that some of these heuristics can be steel-manned into a good form.

Aaron Gertler @ 2020-07-22T09:58 (+2)

I wouldn't have used the word "Good" in the title if the author hadn't, and I share your annoyance with the way the word is being used. 

As for the heuristics themselves, I think that even the strawmen capture beliefs that many people hold, and I could see it being useful for people to have a "simple" version of a given heuristic to make this kind of list easier to hold in one's mind. However, I also think the author could have achieved simplicity without as much strawmanning.

Halstead @ 2020-07-16T14:15 (+11)

Phil Trammell makes a similar point about how we should update from the fact that smart people refuse to engage with AI risk arguments:

"The upshot here seems to be that when a lot of people disagree with the experts on some issue, one should often give a lot of weight to the popular disagreement, even when one is among the experts and the people's objections sound insane. Epistemic humility can demand more than deference in the face of peer disagreement: it can demand deference in the face of disagreement from one's epistemic inferiors, as long as they're numerous. They haven't engaged with the arguments, but there is information to be extracted from the very fact that they haven't bothered engaging with them."

MichaelA @ 2020-07-16T10:53 (+11)

I'd second the suggestion of checking out the comments.

There are also a few extra comments on the LessWrong version of the post, which aren't on the Alignment Forum version.

Here's the long-winded comment I made there:

I think this list is interesting and potentially useful, and I think I'm glad you put it together. I also generally think it's a good and useful norm for people to seriously engage with the arguments they (at least sort-of/overall) disagree with.

But I'm also a bit concerned about how this is currently presented. In particular:

I think these things will have relatively small downsides, given the likely quite informed and attentive audience here. But a bunch of psychological research I read a while ago (2015-2017) suggests there could be some degree of downsides. E.g.:

Information that initially is presumed to be correct, but that is later retracted or corrected, often continues to influence memory and reasoning. This occurs even if the retraction itself is well remembered. The present study investigated whether the continued influence of misinformation can be reduced by explicitly warning people at the outset that they may be misled. A specific warning--giving detailed information about the continued influence effect (CIE)--succeeded in reducing the continued reliance on outdated information but did not eliminate it. A more general warning--reminding people that facts are not always properly checked before information is disseminated--was even less effective. In an additional experiment, a specific warning was combined with the provision of a plausible alternative explanation for the retracted information. This combined manipulation further reduced the CIE but still failed to eliminate it altogether.

And also:

Information presented in news articles can be misleading without being blatantly false. Experiment 1 examined the effects of misleading headlines that emphasize secondary content rather than the article’s primary gist. [...] We demonstrate that misleading headlines affect readers’ memory, their inferential reasoning and behavioral intentions, as well as the impressions people form of faces. On a theoretical level, we argue that these effects arise not only because headlines constrain further information processing, biasing readers toward a specific interpretation, but also because readers struggle to update their memory in order to correct initial misconceptions.

Based on that sort of research (for a tad more info on it, see here), I'd suggest:

I also recognise that several of the heuristics really do seem good, and probably should make us at least somewhat less concerned about AI. So I'm not suggesting trying to make the heuristics all sound deeply flawed. I'm just suggesting perhaps being more careful not to end up with some readers' brains, on some level, automatically processing all of these heuristics as definite truths that definitely suggest AI x-risk isn't worthy of attention.

Sorry for the very unsolicited advice! It's just that preventing gradual slides into false beliefs (including from well-intentioned efforts that do actually contain the truth in them!) is sort of a hobby-horse of mine.

[And then I replied to myself with the following]

Also, one other heuristic/proposition that, as far as I'm aware, is simply factually incorrect (rather than "flawed but in debatable ways" or "actually pretty sound") is "AI researchers didn't come up with this concern, Hollywood did. Science fiction is constructed based on entertaining premises, not realistic capabilities of technologies." So there it may also be worth pointing out in some manner that, in reality, quite early on prominent AI researchers raised concerns somewhat similar to those discussed now.

E.g., I. J. Good apparently wrote in 1959:

Whether [an intelligence explosion] will lead to a Utopia or to the extermination of the human race will depend on how the problem is handled by the machines. The important thing will be to give them the aim of serving human beings.

Ben Pace @ 2020-07-16T21:36 (+9)

(I would include the original author’s name somewhere in the crosspost, especially at the top.)

Aaron Gertler @ 2020-07-17T11:45 (+7)

Done.

Halstead @ 2020-07-16T14:11 (+6)

I'm not sure whether this counts as a heuristic or not, but anyway... I think an important point is that it's not clear what the incentive problem is, or (more weakly) that the incentive problem is nowhere near as bad as suggested in classical AI risk arguments. Designers of AI systems have very strong incentives for them not to be as useless as painted in classical AI risk arguments. If you were running a paperclip company, you would very strongly want to figure out a way to make a paperclip maximiser that doesn't use people's atoms to make paperclips, and there would be lots of opportunities to learn as you go on this front - there would be incentives for capabilities to develop in step with safety.

The incentive problem is made better by the fact that the market for AI development is quite concentrated - there are <10 major players in the field who currently look most likely to make very advanced AGIs. Thus, you only need to get coordination on safety from <10 actors. In contrast, biotechnology is comparatively a complete nightmare from a coordination point of view, requiring coordination among pretty much all states. Climate change also seems difficult on this front, though less bad than bio.