A Research Agenda for Psychology and AI

By carter allen🔸 @ 2024-06-28T12:56 (+53)

I think very few people have thought rigorously about how psychology research could inform the trajectory of AI or humanity’s response to it. Despite this, there seem to be many important contributions psychology could make to AI safety. For instance, a few broad paths-to-impact that psychology research might have are:

  1. Helping people anticipate the societal response to possible developments in AI. In which areas is public opinion likely to be the bottleneck to greater AI safety?
  2. Improving forecasting/prediction techniques more broadly and applying this to forecasts of AI trajectories (the Forecasting Research Institute’s Existential Risk Persuasion Tournament is a good example).
  3. Describing human values and traits more rigorously to inform AI alignment, or to inform decisions about who to put behind the wheel in an AI takeoff scenario.
  4. Doing cognitive/behavioral science on AI models. For instance, developing diagnostic tools that can be used to assess how susceptible an AI decision-maker is to various biases.
  5. Modeling various risks related to institutional stability. For instance, arms races, risks posed by malevolent actors, various parties’ incentives in AI development, and decision-making within/across top AI companies.

I spent several weeks thinking about specific project ideas in these topic areas as part of my final project for BlueDot Impact’s AI Safety Fundamentals course. I’m sharing my ideas here because (a) there are probably large topic areas I’m missing, and I’d like people to point them out to me; (b) I’m starting my PhD in a few months and want to pursue some of these ideas, but I haven’t thought much about which of them are more or less valuable; and (c) I would love for anyone else to adopt any of the ideas here or reach out to me about collaborating! I also hope to write a future version of this post that incorporates more existing research (I haven’t thoroughly checked which of these project ideas have already been done).

Maybe another way this post could be valuable is that I've consolidated a lot of different resources, ideas, and links to other agendas in one place. I'd especially like for people to send me more things in this category, so that I or others can use this post as a resource for connecting people in psychology to ideas and opportunities in AI safety.

In the rest of this post, I list various topic areas I identified in no particular order, as well as any particular project ideas I had which struck me as potentially especially neglected & valuable in that area. Any feedback is appreciated!

Topic Areas & Project Ideas

1. Human-AI Interaction

1.1 AI Persuasiveness

A lot of people believe that future AI systems might be extremely persuasive, and that perhaps we should prepare for a world where interacting with AI models carries a risk of manipulation or brainwashing. How realistic is this concern? (Note: although I don’t think this sort of research can settle whether AI will ever be extremely persuasive, I think it could still be usefully informative.) For instance:

If AI models are already highly persuasive, can they be used as scalable tools for improving people’s reasoning or moral judgment, or increasing people’s concern about AI safety?

There's a rapidly growing corner of psychology interested in using AI for belief change. Links to some existing work related to AI persuasiveness:

1.2 Human-AI Capability Differences

On which kinds of tasks, if any, do humans have the best chance of remaining better than highly advanced AI systems?

See also: Moravec's paradox

1.3 Trust in AI

How much are people willing to defer to AI advisors:

How much should we expect risk compensation to affect individuals’ and institutions’ willingness to use advanced AI?

How malleable are people’s empirical beliefs about AI risk:

2. Forecasting AI Development

2.1 Foundational Work on Forecasting

What biases exist in probability forecasting?

How can people use new or existing statistical tools and AI models to improve their forecasting ability?

Meta-forecasting: what’s the relationship between cognitive effort given to making predictions and predictive accuracy? 
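To make “predictive accuracy” concrete for questions like this, here is a minimal sketch (with made-up forecasts and outcomes, not real data) of two standard scoring approaches from the forecasting literature: the Brier score and a crude calibration check by probability bin.

```python
# Minimal sketch with hypothetical forecasts/outcomes: two standard accuracy
# measures used in forecasting research, the Brier score and a rough
# calibration check by probability bin.
from collections import defaultdict

forecasts = [0.9, 0.7, 0.2, 0.6, 0.95, 0.1, 0.4, 0.8]   # stated P(event)
outcomes  = [1,   1,   0,   0,   1,    0,   1,   1]     # what actually happened

# Brier score: mean squared error of the probabilities (lower is better).
brier = sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)
print(f"Brier score: {brier:.3f}")

# Calibration: within each 20%-wide bin of stated probability,
# how often did the event actually occur?
bins = defaultdict(list)
for p, o in zip(forecasts, outcomes):
    bins[int(p * 5) / 5].append(o)          # bin by lower edge: 0.0, 0.2, ...
for edge in sorted(bins):
    hits = bins[edge]
    print(f"forecasts in [{edge:.1f}, {edge + 0.2:.1f}): "
          f"event occurred {sum(hits)}/{len(hits)} times")
```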

How do people (perhaps domain experts vs. non-experts) interpret probability forecasts? I think there are many promising ideas one could research here:

2.2 Applying Forecasting Research to AI Safety

How good are experts and non-experts at predicting the trajectory and/or speed of technological development? 

Why are people often hesitant to provide explicit probability estimates for AI risk (along with other similar high-magnitude risks like the odds of nuclear war)?

Someone could also do cross-cultural forecasting research projects with the aim of obtaining better forecasts of trends in international politics related to AI. Additional forecasting-tournament-style interventions might also be interesting. A few questions I think are good candidates for this sort of thing:

3. Understanding Human Values

I think more research on human values could be useful for two reasons. (1) To the extent one wants to align an advanced AI’s values to commonsense morality, it’s good to understand what commonsense morality actually is. But I think it’s also pretty obvious one wouldn’t want to align an extremely advanced AI system to commonsense morality, so the bigger reason is (2): whether it’s a good idea or not, people probably will align advanced AI systems to commonsense morality (even unknowingly), and so better understanding human values and their failure modes might be really useful for AI strategy more broadly. Some projects here fall into both categories.

3.1 Foundational Work on Human Values

To the extent we want AI to promote well-being, any work that aims to better understand what well-being is and what factors best predict well-being would be extremely useful. Some example questions:

Value spread:

Which of the 4 resolutions here best describes most people’s response to moral uncertainty? 

How fanatical are most people?

Of course, a lot of psychology work on human values already exists. I think these are a few really good examples which might inspire further projects/questions:

3.2 Applying Human Values Research to AI Safety

What values does it seem likely people will want AIs to maximize?

How true is the orthogonality thesis for human minds? Should we expect the same results in AI minds?

4. Institutional Stability

Papers on institutional stability related to AI safety:

Also, I recommend this list of interventions for improving institutional decision-making by @taoburga if you're interested in this topic; many items on his list could easily be explored from a psychology angle.

4.1 Arms Race Dynamics

Game theory work on AI arms races:

Conduct empirical work on the security dilemma: Do people interpret building a stronger defense as an offensive action? Conversely, do people interpret building offensive weapons as a defensive act because of mutually assured destruction or other power dynamics?

4.2 Malevolence and AI

I’m not sure it’s that useful (for AI safety specifically) for psychologists to focus on reducing human malevolence in general (i.e., designing interventions that could reduce average levels of malevolence in the general public). Conditional on malevolent actors posing a large risk in a future with advanced AI, it will probably take only a small number of bad actors for immense harm to be realized, in which case even very substantial improvements in people’s ability to detect and prevent malevolence wouldn’t substantially reduce these risks. However, I think it could still be valuable to estimate more rigorously the risk of AI misuse by select governments, or to forecast risks from bad actors (to better model AI risk). Some ideas:

How large is the risk posed by malevolent leaders in future worlds with advanced AI?

What are the underpinning motivations, beliefs, and mechanisms of malevolence (e.g., psychopathic, Machiavellian, narcissistic, and sadistic traits) and their associated behavioral patterns?

How can we best measure malevolence?

4.3 Identifying Ethical Decision-Makers

It could be really useful to have a clear understanding of which people are or aren’t epistemically virtuous and/or altruistically motivated, given the many possible scenarios in which humans will have to designate people to make decisions about AI systems’ constitutions, AI governance, preparation for transformative AI, which human values to maximize, etc. Some of these ideas are straight plagiarized from Psychology for Effectively Improving the Future, a research agenda created by some of my colleagues a few years ago.

How can we best identify people (e.g., policymakers) who will safeguard the future wisely? 

4.4 Miscellaneous

Model ways that advanced AI systems might engage in conflict with each other.

How might humans and AIs interact with each other in a conflict scenario?

How much should we expect the unilateralist’s curse to exacerbate the risks posed by unilateral decisions related to AI, such as data leaks, rogue model development, etc.?

Foundational work on institutional/organizational stability:

5. Anticipating the Societal Response to AI Developments

5.1 AI Sentience

How likely are people to believe that AI systems are conscious/sentient/agentic in the future?

How likely is it that social division over AI sentience could lead to large-scale societal conflict?

Assuming that people care about AI rights, should we expect them to care primarily about the front-facing AI models (i.e., models working in the background wouldn’t get any protections)?

Assuming that most AI systems will in fact have positive welfare, would people still morally oppose using “happy servant” type AIs as personal servants or for other tasks if they knew the AI systems were sentient?

Assuming that consumers could pay more for an AI model that suffered less, how much would they be willing to pay?

5.2 Modeling AI takeoff scenarios

One way to think about AI takeoff scenarios is as “all of the bottlenecks to progress substantially widening”: what are the relevant bottlenecks, and are any of them dependent on human psychology?

Developing better models of people’s responses to takeoff scenarios: 

5.3 Risk Aversion and AI

Very simply: how risk averse are the people leading AI development?

Project idea: Maybe people underestimate the value of risk-taking as they start to lose. For instance, when people are playing board games, a player who falls behind should become much more risk-seeking, since high-variance plays typically give a trailing player the best chance of winning; but people rarely do this, and instead often get Goodharted by proxies like “maximizing expected points per turn.”

^ In contrast, there’s also an idea that people might become unnecessarily risk-seeking in competitive environments even when it doesn’t benefit them, perhaps because they use the above heuristic too much. Either finding would be important.
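To illustrate the first point above, here is a minimal sketch of a toy two-player game (the payoff numbers are made up purely for illustration): the safe and risky policies have identical expected points per turn, but only the high-variance policy gives a trailing player any realistic chance of winning.

```python
import random

# Toy game sketch (made-up payoffs): a player starts `deficit` points behind
# and has `turns` turns left. Each turn they choose a safe move (always 5
# points) or a risky move (0 or 10 points, same expected value). The opponent
# always plays safe. Which policy wins more often for the trailing player?

def trailing_player_wins(deficit: int, turns: int, risky: bool) -> bool:
    gap = deficit                             # points behind the opponent
    for _ in range(turns):
        move = random.choice([0, 10]) if risky else 5
        gap += 5 - move                       # opponent scores 5 every turn
    return gap < 0                            # True if we ended up ahead

def win_rate(risky: bool, trials: int = 100_000) -> float:
    wins = sum(trailing_player_wins(deficit=8, turns=4, risky=risky)
               for _ in range(trials))
    return wins / trials

if __name__ == "__main__":
    print("safe policy win rate: ", win_rate(risky=False))  # 0.0: can never close the gap
    print("risky policy win rate:", win_rate(risky=True))   # ~0.31 despite identical EV
```

The expected points per turn are identical under both policies, so a player optimizing the “points per turn” proxy is indifferent between them, even though one policy wins roughly a third of the time and the other never does.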

5.4 Post-TAI Wellbeing 

Inspired by this Daron Acemoglu work: how likely is it that AI developments could cause most people’s lives to lose their meaning?

6. Cognitive Science of AI

I imagine there are many promising psychology projects that would apply methods in cognitive/computational psychology to AI models. This could involve doing things like exploring the presence of dual-process cognition in AI models or computationally modeling the way that AI brains make different kinds of trade-offs. I’m not very familiar with research in these areas, though, so I would love to hear more thoughts about how projects like this would look.

6.1 Miscellaneous

How accurately does shard theory describe human psychology/neuroscience? How likely is it that shard theory similarly describes AIs’ “psychologies”?

What are the neural bases of consciousness?

What is the nature of intelligence? What exactly do people mean when they talk about intelligence? What components are necessary for general intelligence?

In general, searching for similarities between human cognition and AI cognition could probably inform interpretability research.

To what extent are cognitive biases transmitted to AI models via processes like RLHF?

How likely is it that AI models will have positive/negative utility, assuming they are sentient? On one view, they might have quite high utility by default, because even in scenarios where they’re doing labor that people would find unpleasant, their reward functions would have been adapted specifically for those tasks.

See also: LessWrong large language model psychology research agenda.

7. Other Topics

7.1 Meta-Psychology and AI

Can we use AI to reliably simulate human data, to increase the speed of behavioral/cognitive science research? (Perhaps this technique could also be used to quickly rule out many of the above hypotheses, and to identify which are the most promising/unknown.) I know there’s already some work on this, but I couldn’t find it in two Google searches, so I moved on.
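As a very rough illustration of what this could look like, here is a minimal sketch using the OpenAI Python client (the model name, persona fields, and survey question are placeholder assumptions, and it assumes an API key is available in the environment): prompt a language model to role-play survey respondents, collect its answers, and then compare the simulated distribution against matched human data.

```python
# Minimal sketch of LLM-simulated survey respondents. Assumptions: the openai
# package is installed, OPENAI_API_KEY is set, and the model name, persona
# fields, and survey question below are placeholders chosen for illustration.
from openai import OpenAI

client = OpenAI()

QUESTION = "On a scale of 1-7, how worried are you about risks from advanced AI?"

def simulate_respondent(age: int, politics: str) -> str:
    prompt = (
        f"You are a {age}-year-old American who identifies as {politics}. "
        f"Answer the following survey question with a single number only.\n\n{QUESTION}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",                 # placeholder; any chat model works
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,                     # keep variability across simulated people
    )
    return response.choices[0].message.content.strip()

# Simulate a small synthetic sample; a real study would compare this
# distribution against matched human survey data (e.g., with a chi-squared test).
sample = [simulate_respondent(age, pol)
          for age in (25, 45, 65)
          for pol in ("liberal", "conservative")]
print(sample)
```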

Can we use AI to improve/test the replicability of social science research?

Undertake community-building projects for psychology x AI.

7.2 Politics

I think there are also likely to be many questions in political psychology that could be important for AI safety. For instance:

Project idea: Case studies of important people’s particular psychology (e.g. trying to get super familiar with Sam Altman’s judgment and decision-making practices to better anticipate his decisions).

7.3 Donation behavior

It might be useful to understand how to persuade people to give more to causes that would advance AI safety (or to move them away from things like “impact investing in AI companies that I see as societally beneficial”). There’s already a broad marketing literature on charitable donation behavior, but I don’t think anyone has connected it to AI or longtermist charities specifically.

7.4 Miscellaneous

Project idea: Test whether people are insufficiently willing to pay for information (one could use designs like the one in Amanda Askell’s moral value of information talk).

Someone could do some projects on motivation-based selection effects in AI: a person is only going to write an article on something if they feel strongly about it, so articles tend to express polarized viewpoints. Similarly, one will only build TAI if they feel strongly that it is going to be good, so TAI is likely to be built by overly optimistic people who systematically underestimate its danger (similar to the optimizer’s curse). This could inform science more broadly, and it also relates specifically to AI development.

Additional Resources for People Interested in Psychology and AI:

Please recommend more!

Forum Posts

Some excellent skeptical takes about this stuff by @Vael Gates: https://forum.effectivealtruism.org/posts/WHDb9r9yMFetG7oz5/

List of AI researchers with degrees in psychology by @Sam Ellis: https://forum.effectivealtruism.org/posts/fSDxnLcCn8h22gCYB/ea-psychology-and-ai-safety-research

Another post with some psychology and AI ideas by @Kaj_Sotala: https://forum.effectivealtruism.org/posts/WdMnmmqqiP5zCtSfv/cognitive-science-psychology-as-a-neglected-approach-to-ai

@Geoffrey Miller is running a course on AI and Psychology: https://forum.effectivealtruism.org/posts/rtGfSuaydeengBQpQ/seeking-suggested-readings-and-videos-for-a-new-course-on-ai

Fellowships/Programs/Labs

PIBBSS fellowship: https://pibbss.ai/resources/

Effective Thesis: https://effectivethesis.org/psychology-and-cognitive-science/

Institute working on modeling human moral judgment/psychology for AI safety: https://mosaic.allenai.org/

Research labs focused (at least somewhat) on psychology and AI:

Acknowledgments

Thanks to @Lucius Caviola and @Joshua_Lewis for ideas and discussions on many of these topics; a solid third of these thoughts come from one or both of them. Thanks also to @Chris Leong and others at BlueDot for helpful suggestions and comments!


Coleman @ 2024-07-03T17:28 (+6)

Hi, I just wanted to say this is a great post! My background is in psychology and AI, so I’m particularly excited about this piece and would love to talk more about some of your points, especially the key questions worth investigating that may inform AI governance strategy (my current focus!)

SummaryBot @ 2024-07-01T13:43 (+2)

Executive summary: Psychology research could make important contributions to AI safety in areas like anticipating societal responses to AI developments, improving forecasting techniques, describing human values, assessing AI decision-making biases, and modeling risks related to institutional stability and AI development.

Key points:

  1. Human-AI interaction research could explore AI persuasiveness, human-AI capability differences, trust in AI, and more.
  2. Foundational work on forecasting biases, aggregation techniques, and interpreting probability estimates could be applied to predicting AI development trajectories.
  3. Understanding human values, both in general and in the context of AI alignment, is crucial. This includes research on well-being, value spread, moral uncertainty, and fanaticism.
  4. Modeling institutional stability issues like AI arms race dynamics, risks from malevolent actors, identifying ethical decision-makers, and AI cooperation/conflict scenarios is important.
  5. Anticipating societal responses to AI developments in areas like AI sentience, takeoff scenarios, risk aversion, and post-TAI well-being could inform AI strategy.
  6. Applying cognitive science methods to understand the nature of AI cognition and its similarities/differences with human cognition is a promising area for interdisciplinary research.


This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.