Cognitive Science/Psychology As a Neglected Approach to AI Safety
By Kaj_Sotala @ 2017-06-05T13:46 (+40)
All of the advice on getting into AI safety research that I've seen recommends studying computer science and mathematics: for example, the 80,000 Hours AI safety syllabus provides a computer science-focused reading list, and mentions that "Ideally your undergraduate degree would be mathematics and computer science".
There are obvious good reasons for recommending these two fields, and I agree that anyone wishing to make an impact in AI safety should have at least a basic proficiency in them. However, I find it a little concerning that cognitive science and psychology are rarely even mentioned in these guides. I believe it would be valuable to have more people working in AI safety whose primary background is in cognitive science or psychology, or who have at least minored in one of them.
Here are four lines of AI safety research which I think could benefit from such a background:
- The psychology of developing an AI safety culture. Besides the technical problem of "how can we create safe AI", there is the social problem of "how can we ensure that the AI research community develops a culture where safety concerns are taken seriously". At least two existing papers draw on psychology to consider this problem: Eliezer Yudkowsky's "Cognitive Biases Potentially Affecting Judgment of Global Risks" uses cognitive psychology to discuss why people might misjudge the probability of risks in general, and Seth Baum's "On the promotion of safe and socially beneficial artificial intelligence" uses social psychology to discuss the specific challenge of motivating AI researchers to choose beneficial AI designs.
- Developing better analyses of "AI takeoff" scenarios. Currently humans are the only general intelligence we know of, so any analysis of what "expertise" consists of and how it can be acquired would benefit from the study of humans. Eliezer Yudkowsky's "Intelligence Explosion Microeconomics" draws on a number of fields to analyze the possibility of a hard takeoff, including some knowledge of human intelligence differences as well as the history of human evolution, whereas my "How Feasible is the Rapid Development of Artificial Superintelligence?" draws extensively on the work of a number of psychologists to argue that, based on what we know of human expertise, scenarios in which AI systems become major actors within timescales on the order of mere days or weeks remain within the range of plausibility.
- Defining just what it is that human values are. The project of AI safety can roughly be defined as "the challenge of ensuring that AIs remain aligned with human values", but it's also widely acknowledged that nobody really knows what exactly human values are - or at least, not to a sufficient extent that they could be given a formal definition and programmed into an AI. This seems like one of the core problems of AI safety, and one which can only be properly understood via a psychology-focused research program. Luke Muehlhauser's article "A Crash Course in the Neuroscience of Human Motivation" examined human values from the perspective of neuroscience, and my "Defining Human Values for Value Learners" sought to provide a preliminary definition of human values in computational terms, drawing from the intersection of artificial intelligence, moral psychology, and emotion research. Both of these are very preliminary papers, and it would take a full research program to pursue this question in more detail.
- Better understanding multi-level world-models. MIRI defines the technical problem of "multi-level world-models" as "How can multi-level world-models be constructed from sense data in a manner amenable to ontology identification?". In other words, suppose that we had built an AI to make diamonds (or anything else we care about) for us. How should that AI be programmed so that it could still accurately estimate the number of diamonds in the world after it had learned more about physics, and after it had learned that the things it calls "diamonds" are actually composed of protons, neutrons, and electrons? (See the toy sketch after this list for a concrete illustration.) While I haven't seen any papers that would explicitly tackle this question yet, a reasonable starting point would seem to be the question of "well, how do humans do it?". There, psych/cogsci may offer some clues. For instance, in the book Cognitive Pluralism, the philosopher Steven Horst argues that humans have multiple different, mutually incompatible mental models / reasoning systems - ranging from core knowledge systems to scientific theories - that they flexibly switch between depending on the situation. (Unfortunately, Horst approaches this as a philosopher, so he's mostly content to make the argument for this being the case in general, leaving it up to actual cognitive scientists to work out how exactly this works.) I previously also offered a general argument along these lines in my article "World-models as Tools", suggesting that at least part of the choice of a mental model may be driven by reinforcement learning in the basal ganglia. But this isn't saying much, given that all human thought and behavior seems to be at least in part driven by reinforcement learning in the basal ganglia. Again, this would take a dedicated research program.
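To make the fourth problem a little more concrete, here is a minimal toy sketch in Python. Everything in it - the object labels, the AtomCluster structure, and the hand-written bridge() rule - is my own illustrative invention rather than anything taken from MIRI's agenda. The point is simply that the re-identification step is currently supplied by hand, and the open problem is how an AI could construct it for itself as its world-model improves.

```python
# A deliberately simplistic sketch of the ontology identification problem.
# All names and structures below are illustrative inventions, not from any paper.

from dataclasses import dataclass

# --- Old (naive) ontology: the world is a list of labelled objects. ---
old_world = ["diamond", "rock", "diamond", "tree"]

def diamonds_in_old_ontology(world):
    """Utility in the old ontology: just count the objects labelled 'diamond'."""
    return sum(1 for obj in world if obj == "diamond")

# --- New ontology after "learning physics": the world is clusters of atoms. ---
@dataclass
class AtomCluster:
    element: str   # e.g. "C" for carbon
    lattice: str   # e.g. "tetrahedral" vs "organic"

new_world = [
    AtomCluster("C", "tetrahedral"),   # what we used to call a diamond
    AtomCluster("Si", "amorphous"),    # the rock
    AtomCluster("C", "tetrahedral"),   # another diamond
    AtomCluster("C", "organic"),       # the tree (also carbon, but not a diamond!)
]

def bridge(cluster):
    """Hand-written bridging rule: which low-level states count as 'diamond'?
    Automating the construction of rules like this is the open problem."""
    return cluster.element == "C" and cluster.lattice == "tetrahedral"

def diamonds_in_new_ontology(world):
    return sum(1 for cluster in world if bridge(cluster))

assert diamonds_in_old_ontology(old_world) == diamonds_in_new_ontology(new_world) == 2
```

The human analogue of that hand-written rule - how people keep caring about roughly the same things across major conceptual revisions - is exactly the kind of question that psych/cogsci research could inform.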
From these four special cases, you could derive more general use cases for psychology and cognitive science within AI safety:
- Psychology, as the study and understanding of human thought and behavior, helps guide actions aimed at understanding and influencing people's behavior in a more safety-aligned direction (related example: the psychology of developing an AI safety culture)
- The study of the only general intelligence we know about may provide information about the properties of other general intelligences (related example: developing better analyses of "AI takeoff" scenarios)
- A better understanding of how human minds work may help us figure out how we want the cognitive processes of AIs to work, so that they end up aligned with our values (related examples: defining human values, better understanding multi-level world-models)
Here I would ideally offer reading recommendations, but the fields are so broad that any given book can only give a rough idea of the basics; for instance, the topic of the world-models that human brains use is just one of many, many subquestions that the fields cover. Hence my suggestion that some safety-interested people actually study these fields as a major, or at least a minor.
Still, if I had to suggest a couple of books, with the main idea of getting a basic grounding in the mindsets and theories of the fields so that it would be easier to read more specialized research: on the cognitive psychology/cognitive science side I'd suggest Cognitive Science by Jose Luis Bermudez (I haven't read it, but Luke Muehlhauser recommends it and it looked good to me based on the table of contents; see also Luke's follow-up recommendations behind that link); Cognitive Psychology: A Student's Handbook by Michael W. Eysenck & Mark T. Keane; and maybe Sensation and Perception by E. Bruce Goldstein. I'm afraid that I don't know of any good introductory textbooks on the social psychology side.
undefined @ 2017-06-06T13:34 (+11)
I am a psychology PhD student with a background in philosophy/evolutionary psychology. My current research focuses on two main areas: effective altruism, and the nature of morality, in particular the psychology of metaethics. My motivation for pursuing the former should be obvious, but my rationale for pursuing the latter is in part self-consciously about the third bullet point, "Defining just what it is that human values are." Even more basic than defining what those values are, I am interested in what people take values themselves to be. For instance, we do not actually have good data on the degree to which people regard their own moral beliefs as objective/relative, how common noncognitivist or error-theoretic beliefs are in lay populations, etc.
Related to the first point, about developing an AI safety culture, there is also the matter of what we can glean psychologically about how the public is likely to receive AI developments. Understanding how people generally perceive AI, and technological change more broadly, could help us anticipate the social issues that emerge from advances in AI, and improve our ability to raise awareness of and increase receptivity to concerns about AI risk among nonexperts, policymakers, the media, and the public. Cognitive science has more direct value than areas like mine (social psychology/philosophy), but my areas of study could serve a valuable auxiliary function to AI safety.
undefined @ 2017-06-05T14:24 (+9)
I think these are all points that many people have considered privately or publicly in isolation, but that thus far no one has explicitly written down and connected. In particular, lots of people have independently made the observation that ontological crises in AIs are apparently similar to existential angst in humans, that ontology identification seems philosophically difficult, and that plausibly studying ontology identification in humans is a promising route to understanding ontology identification for arbitrary minds. So, thank you for writing this up; it seems like something that badly needed to be written.
Some other problems that might be easier to tackle from this perspective include mind crime, nonperson predicates, and suffering risk, especially subproblems like suffering in physics.
undefined @ 2017-06-05T16:32 (+7)
Strong agreement. Considerations from cognitive science might also help us to get a handle on how difficult the problem of general intelligence is, and the limits of certain techniques (e.g. reinforcement learning). This could help clarify our thinking on AI timelines as well as the constraints which any AGI must satisfy. Misc. topics that jump to mind are the mental modularity debate, the frame problem, and insight problem solving.
This is a good article on AI from a cog sci perspective: https://arxiv.org/pdf/1604.00289.pdf
undefined @ 2017-06-17T11:32 (+5)
An effort has recently been started to improve the pipeline for getting people up to speed on AGI safety. I'm trying to champion a broad view of AGI safety, one that includes psychology.
Would anyone be interested in providing digested content? It would also be good to have an exit from the pipeline for psychology people interested in AGI. Would that be FHI? Who else would be good to talk to about what is required?
undefined @ 2017-08-16T21:52 (+4)
Excellent post; as a psych professor I agree that psych and cognitive science are relevant to AI safety, and it's surprising that our insights from studying animal and human minds for the last 150 years haven't been integrated into mainstream AI safety work.
The key problem, I think, is that AI safety seems to assume that there will be some super-powerful deep learning system attached to some general-purpose utility function connected to a general-purpose reward system, and we have to get the utility/reward system exactly aligned with our moral interests.
That's not the way any animal mind has ever emerged in evolutionary history. Instead, minds emerge as large numbers of domain-specific mental adaptations that solve particular problems, coordinated by superordinate 'modes of operation' called emotions and motivations. These can be described as implementing utility functions, but that's not their function; promoting reproductive success is. Some animals also evolve some 'moral machinery' for nepotism, reciprocity, in-group cohesion, norm-policing, and virtue-signaling, but those mechanisms are also distinct and often at odds.
Maybe we'll be able to design AGIs that deviate markedly from this standard 'massively modular' animal-brain architecture, but we have no proof-of-concept for thinking that will work. Until then, it seems useful to consider what psychology has learned about preferences, motivations, emotions, moral intuitions, and domain-specific forms of reinforcement learning.
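A minimal toy sketch of that contrast, in Python. It is purely illustrative: the module names, thresholds, and mode-selection rule are invented for this example rather than drawn from the psychology literature, but it shows the basic idea of emotions/motivations as switches over domain-specific modules rather than as a single global utility function.

```python
# Toy sketch: domain-specific modules, each with its own narrow objective,
# coordinated by an emotion/motivation "mode" that decides which one runs.
# All names and thresholds are illustrative inventions.

def forage(state):
    return "move toward food" if state["hunger"] > 0.5 else "ignore food"

def avoid_threat(state):
    return "flee" if state["threat"] > 0.3 else "stay"

def seek_mates(state):
    return "court" if state["mating_opportunity"] else "ignore"

MODULES = {"foraging": forage, "threat": avoid_threat, "mating": seek_mates}

def current_mode(state):
    """Emotion/motivation as a superordinate mode of operation: it doesn't
    compute utilities, it just decides which module gets control right now."""
    if state["threat"] > 0.3:
        return "threat"       # fear overrides everything else
    if state["hunger"] > 0.5:
        return "foraging"
    return "mating"

def act(state):
    return MODULES[current_mode(state)](state)

print(act({"hunger": 0.9, "threat": 0.6, "mating_opportunity": True}))  # -> "flee"
print(act({"hunger": 0.9, "threat": 0.0, "mating_opportunity": True}))  # -> "move toward food"
```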
undefined @ 2017-06-08T21:00 (+3)
I got linked here while browsing a pretty random blog on deep learning; you're getting attention! (https://medium.com/intuitionmachine/seven-deadly-sins-and-ai-safety-5601ae6932c3)
undefined @ 2017-06-09T11:31 (+1)
Neat, thanks for the find. :)
undefined @ 2017-06-06T15:34 (+3)
What is your model of why other people in the AI safety field disagree with you/don't consider this as important as you do?
undefined @ 2017-06-06T18:07 (+9)
My main guess is "they mostly come from a math/CS background so haven't looked at this through a psych/cogsci perspective and seen how it could be useful".
That said, some of my stuff linked above has mostly been met with silence, and while I presume it's a question of inferential silence - a sufficiently long inferential distance that a claim doesn't provoke even objections, just uncomprehending or indifferent silence - there is also the possibility of me just being so wrong about the usefulness of my ideas that nobody's even bothering to tell me.
undefined @ 2017-06-08T20:01 (+2)
Kaj, I tend to promote your stuff a fair amount to end the inferential silence, and it goes without saying that I agree with all else you said.
Don't give up on your ideas or approach. I am dispirited that there are so few people thinking like you do out there.
Peter McIntyre @ 2017-06-20T18:58 (+1)
Hi Kaj,
Thanks for writing this. Since you mention some 80,000 Hours content, I thought I’d respond briefly with our perspective.
We had intended the career review and AI safety syllabus to be about what you’d need to do from a technical AI research perspective. I’ve added a note to clarify this.
We agree that there are a lot of approaches you could take to tackle AI risk, but we currently expect that technical AI research is where a large amount of the effort will be required. However, we've also advised many people on non-technical routes to impacting AI safety, so we don't think it's the only valid path by any means.
We’re planning on releasing other guides and paths for non-technical approaches, such as the AI safety policy career guide, which also recommends studying political science and public policy, law, and ethics, among others.
undefined @ 2017-06-20T21:09 (+4)
Hi Peter, thanks for the response!
Your comment seems to suggest that you don't think the arguments in my post are relevant for technical AI safety research. Do you feel that I didn't make a persuasive case for psych/cogsci being relevant for value learning/multi-level world-models research, or do you not count these as technical AI safety research? Or am I misunderstanding you somehow?
I agree that the "understanding psychology may help persuade more people to work on/care about AI safety" and "analyzing human intelligences may suggest things about takeoff scenarios" points aren't related to technical safety research, but value learning and multi-level world-models are very much technical problems to me.
Peter McIntyre @ 2017-06-22T16:10 (+3)
We agree these are technical problems, but for most people, all else being equal, it seems more useful to learn ML rather than cog sci/psych. Caveats:
- Personal fit could dominate this equation though, so I'd be excited about people tackling AI safety from a variety of fields.
- It's an equilibrium. The more people already attacking a problem using one toolkit, the more we should be sending people to learn other toolkits to attack it.
undefined @ 2017-06-22T20:38 (+3)
> it seems more useful to learn ML rather than cog sci/psych.
Got it. To clarify: if the question is framed as "should AI safety researchers learn ML, or should they learn cogsci/psych", then I agree that it seems better to learn ML.