Cognitive Science/Psychology As a Neglected Approach to AI Safety

By Kaj_Sotala @ 2017-06-05T13:46 (+40)

All of the advice on getting into AI safety research that I've seen recommends studying computer science and mathematics: for example, the 80,000 Hours AI safety syllabus provides a computer science-focused reading list and mentions that "Ideally your undergraduate degree would be mathematics and computer science".

There are obvious good reasons for recommending these two fields, and I agree that anyone wishing to make an impact in AI safety should have at least a basic proficiency in them. However, I find it a little concerning that cognitive science/psychology are rarely even mentioned in these guides. I believe that it would be valuable to have more people working in AI safety whose primary background is from one of cogsci/psych, or who have at least done a minor in them.

Here are examples of four lines of research into AI safety which I think could benefit from such a background:

From these four special cases, you could derive more general use cases for psychology and cognitive science within AI safety:

Here I would ideally offer reading recommendations, but the fields are so broad that any given book can only give a rough idea of the basics: for instance, the question of what kinds of world-models human brains use is just one of many, many subquestions that the fields cover. Hence my suggestion that some safety-interested people actually study these fields as a major, or at least a minor.

Still, if I had to suggest a couple of books, with the main idea of getting a basic grounding in the mindsets and theories of the fields so that it would be easier to read more specialized research: on the cognitive psychology/cognitive science side I'd suggest Cognitive Science by Jose Luis Bermudez (I haven't read it, but Luke Muehlhauser recommends it and it looked good to me based on the table of contents; see also Luke's follow-up recommendations behind that link); Cognitive Psychology: A Student's Handbook by Michael W. Eysenck & Mark T. Keane; and maybe Sensation and Perception by E. Bruce Goldstein. I'm afraid that I don't know of any good introductory textbooks on the social psychology side.


undefined @ 2017-06-06T13:34 (+11)

I am a psychology PhD student with a background in philosophy/evolutionary psychology. My current research focuses on two main areas: effective altruism, and the nature of morality, in particular the psychology of metaethics. My motivation for pursuing the former should be obvious, but my rationale for pursuing the latter is in part self-consciously about the third bullet point, "Defining just what it is that human values are." More basic than even defining what those values are, I am interested in what people take values themselves to be. For instance, we do not actually have good data on the degree to which people regard their own moral beliefs as objective or relative, how common noncognitivist or error-theoretic beliefs are in lay populations, etc.

Related to the first point, about developing an AI safety culture, there is also the matter of what we can glean psychologically about how the public is likely to receive AI developments. Understanding how people perceive AI, and technological change more broadly, could help us anticipate emerging social issues arising from advances in AI, and improve our ability to raise awareness of, and receptivity to, concerns about AI risk among nonexperts, policymakers, the media, and the public. Cognitive science has more direct value than areas like mine (social psychology/philosophy), but my areas of study could serve a valuable auxiliary function to AI safety.

undefined @ 2017-06-05T14:24 (+9)

I think these are all points that many people have considered privately or publicly in isolation, but which no one has thus far explicitly written down and connected. In particular, lots of people have independently observed that ontological crises in AIs are apparently similar to existential angst in humans, that ontology identification seems philosophically difficult, and that studying ontology identification in humans is therefore plausibly a promising route to understanding ontology identification for arbitrary minds. So, thank you for writing this up; it seems like something that quite badly needed to be written.

Some other problems that might be easier to tackle from this perspective include mind crime, nonperson predicates, and suffering risk, especially subproblems like suffering in physics.

undefined @ 2017-06-05T16:32 (+7)

Strong agreement. Considerations from cognitive science might also help us to get a handle on how difficult the problem of general intelligence is, and the limits of certain techniques (e.g. reinforcement learning). This could help clarify our thinking on AI timelines as well as the constraints which any AGI must satisfy. Misc. topics that jump to mind are the mental modularity debate, the frame problem, and insight problem solving.

This is a good article on AI from a cog sci perspective: https://arxiv.org/pdf/1604.00289.pdf

undefined @ 2017-06-17T11:32 (+5)

An effort has recently been started to improve the pipeline for getting people up to speed on AGI safety. I'm trying to champion a broad view of AGI safety, one that includes psychology.

Would anyone be interested in providing digested content? It would also be good to have an exit from the pipeline for psychology people interested in AGI. Would that be FHI? Who else would be good to talk to about what is required?

undefined @ 2017-08-16T21:52 (+4)

Excellent post; as a psych professor, I agree that psych and cognitive science are relevant to AI safety, and it's surprising that our insights from studying animal and human minds for the last 150 years haven't been integrated into mainstream AI safety work.

The key problem, I think, is that AI safety work seems to assume that there will be some super-powerful deep learning system attached to a general-purpose utility function connected to a general-purpose reward system, and that we have to get the utility/reward system exactly aligned with our moral interests.

That's not the way any animal mind has ever emerged in evolutionary history. Instead, minds emerge as a large number of domain-specific mental adaptations for solving particular problems, coordinated by superordinate 'modes of operation' called emotions and motivations. These can be described as implementing utility functions, but that's not their function; promoting reproductive success is. Some animals also evolve 'moral machinery' for nepotism, reciprocity, in-group cohesion, norm-policing, and virtue-signaling, but those mechanisms are also distinct and often at odds.

Maybe we'll be able to design AGIs that deviate markedly from this standard 'massively modular' animal-brain architecture, but we have no proof-of-concept for thinking that will work. Until then, it seems useful to consider what psychology has learned about preferences, motivations, emotions, moral intuitions, and domain-specific forms of reinforcement learning.

undefined @ 2017-06-08T21:00 (+3)

I got linked here while browsing a pretty random blog on deep learning; you're getting attention! (https://medium.com/intuitionmachine/seven-deadly-sins-and-ai-safety-5601ae6932c3)

undefined @ 2017-06-09T11:31 (+1)

Neat, thanks for the find. :)

undefined @ 2017-06-06T15:34 (+3)

What is your model of why other people in the AI safety field disagree with you/don't consider this as important as you?

undefined @ 2017-06-06T18:07 (+9)

My main guess is "they mostly come from a math/CS background so haven't looked at this through a psych/cogsci perspective and seen how it could be useful".

That said, some of my stuff linked above has mostly been met with silence. I presume it's a question of inferential silence (a sufficiently long inferential distance that a claim doesn't provoke even objections, just uncomprehending or indifferent silence), but there is also the possibility that I'm simply so wrong about the usefulness of my ideas that nobody is even bothering to tell me.

undefined @ 2017-06-08T20:01 (+2)

Kaj, I tend to promote your stuff a fair amount to end the inferential silence, and it goes without saying that I agree with all else you said.

Don't give up on your ideas or approach. I am dispirited that there are so few people thinking like you do out there.

Peter McIntyre @ 2017-06-20T18:58 (+1)

Hi Kaj,

Thanks for writing this. Since you mention some 80,000 Hours content, I thought I’d respond briefly with our perspective.

We had intended the career review and AI safety syllabus to be about what you’d need to do from a technical AI research perspective. I’ve added a note to clarify this.

We agree that there are a lot of approaches you could take to tackle AI risk, but currently expect that technical AI research will be where a large amount of the effort is required. However, we've also advised many people on non-technical routes to impacting AI safety, so we don't think it's the only valid path by any means.

We’re planning on releasing other guides and paths for non-technical approaches, such as the AI safety policy career guide, which also recommends studying political science and public policy, law, and ethics, among others.

undefined @ 2017-06-20T21:09 (+4)

Hi Peter, thanks for the response!

Your comment seems to suggest that you don't think the arguments in my post are relevant for technical AI safety research. Do you feel that I didn't make a persuasive case for psych/cogsci being relevant for value learning/multi-level world-models research, or do you not count these as technical AI safety research? Or am I misunderstanding you somehow?

I agree that the "understanding psychology may help persuade more people to work on/care about AI safety" and "analyzing human intelligences may suggest things about takeoff scenarios" points aren't related to technical safety research, but value learning and multi-level world-models are very much technical problems to me.

Peter McIntyre @ 2017-06-22T16:10 (+3)

We agree these are technical problems, but for most people, all else being equal, it seems more useful to learn ML rather than cog sci/psych. Caveats:

  1. Personal fit could dominate this equation though, so I'd be excited about people tackling AI safety from a variety of fields.
  2. It's an equilibrium: the more people are already attacking a problem using one toolkit, the more we should be sending people to learn other toolkits to attack it.

undefined @ 2017-06-22T20:38 (+3)

"it seems more useful to learn ML rather than cog sci/psych."

Got it. To clarify: if the question is framed as "should AI safety researchers learn ML, or should they learn cogsci/psych", then I agree that it seems better to learn ML.