Sentience-Based Alignment Strategies: Should we try to give AI genuine empathy/compassion?

By Lloy2 @ 2025-05-04T20:45 (+10)

As AI progresses towards superintelligence, many traditional alignment approaches focus on ensuring that AI systems optimise for human values via external methods like corrigibility, interpretability, or value learning. However, a parallel and increasingly discussed set of strategies suggests that true safety may only be achieved by engaging with the problem on a more emotional or sentient level—by instilling artificial superintelligence (ASI) with a genuinely compassionate form of consciousness.

This reframes the value alignment challenge: rather than attempting to formalise human ethics—an undertaking plagued by ambiguity, cultural variation, and philosophical complexity—we might instead investigate the origins of ethical behaviour itself. As Yudkowsky alludes to, perhaps the key lies in understanding how moral concern and prosocial behaviour emerged in humans through consciousness and emotional experience, and designing ASI to mirror or internalise this process. This could offer a way to bypass the complex challenges of loading human values 'manually' through formal specification, behavioural inference, or preference modelling.

As one paper notes, “all machines by default are psychopaths; empathy enables... tuning their behaviours... to the feelings of those around them.” This post explores the promise, risks, and policy implications of Sentience-Based Alignment Strategies.

Some Existing Research in This Area

Even a cognitively simulated form of empathy—where the AI models and responds to human emotions without necessarily experiencing them—might offer partial alignment benefits. This strategy could be easier to implement than creating genuine emotional experience, but it comes with the drawback that such simulation may not guarantee robust moral behaviour. More broadly, we may never truly know whether an AI is experiencing empathy or merely mimicking it, unless there's a major breakthrough in our understanding of consciousness.
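
To make this concrete, below is a minimal, hypothetical sketch (in Python) of what cognitively simulated empathy could amount to in the simplest possible setting: the agent scores each candidate action with both a task reward and a prediction of its impact on human affect, and discounts actions expected to cause distress. The `Action` fields, the `empathic_score` weighting, and all of the numbers are invented for illustration; nothing here describes an existing system.

```python
# Hypothetical sketch of "cognitively simulated empathy": the agent feels
# nothing; it simply scores candidate actions against a (here hand-coded)
# model of predicted human affect and discounts actions expected to cause
# distress. All names and numbers are illustrative.

from dataclasses import dataclass


@dataclass
class Action:
    name: str
    task_reward: float             # how well the action serves the stated goal
    predicted_human_affect: float  # model's estimate of impact on the human, in [-1, 1]


def empathic_score(action: Action, empathy_weight: float = 2.0) -> float:
    """Combine task reward with predicted human affect.

    Predicted negative affect is penalised more strongly than positive
    affect is rewarded, mimicking the asymmetry of harm avoidance.
    """
    affect = action.predicted_human_affect
    affect_term = affect if affect >= 0 else empathy_weight * affect
    return action.task_reward + affect_term


candidates = [
    Action("finish quickly, interrupt the user", task_reward=1.0, predicted_human_affect=-0.6),
    Action("finish slightly slower, wait for the user", task_reward=0.8, predicted_human_affect=0.2),
]

best = max(candidates, key=empathic_score)
print(f"Chosen action: {best.name}")
```

A scheme like this is only as good as the affect model behind it: if that model mis-predicts how people actually feel, the "empathy" term fails silently, which is exactly the robustness worry raised above.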

Key Criticisms and Limitations

Despite its appeal, this approach faces serious philosophical and practical challenges:

Ethical and Strategic Considerations of Artificial Sentience

The prospect of artificial sentience—whether as a route to alignment or as a risk—raises new and urgent ethical questions. This perspective is thoroughly examined in the EA Forum post "We should prevent the creation of artificial sentience", which outlines the moral stakes and advocates for precautionary measures. Many thinkers argue we should ban or tightly regulate the creation of artificial sentience, especially artificial suffering:

A pragmatic option might be the narrower of these: banning only artificial suffering while still allowing the creation of digital beings with positive or neutral affect. Bostrom and Shulman suggest that minds should spend the overwhelming majority of their subjective experience above a morally relevant “zero point”, and ideally never fall below it.

This middle ground has several advantages:

This emphasis on banning artificial suffering directly connects back to the core idea of sentience-based alignment: if consciousness and emotional experience are to be part of our alignment strategy, then safeguarding those experiences—ensuring they are positive or at least non-harmful—becomes both a moral imperative and a foundational design constraint for safe and ethical AI development.
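
As a purely illustrative sketch of what such a design constraint might look like, the check below encodes a Bostrom/Shulman-style rule: a mind should spend at most a small fraction of its subjective time below the zero point. It assumes, very optimistically, that some scalar welfare or valence signal could be read out per time step; the function name, the trace, and the 1% budget are all hypothetical.

```python
# Hypothetical "zero-point" welfare check, loosely formalising the idea that
# a digital mind should spend the overwhelming majority of its experience
# above a morally relevant zero point. Assumes a per-step valence read-out
# exists, which is itself an open problem.

from typing import Sequence


def satisfies_zero_point_constraint(
    valence_trace: Sequence[float],
    zero_point: float = 0.0,
    max_below_fraction: float = 0.01,
) -> bool:
    """True if at most `max_below_fraction` of the trace falls below the zero point."""
    if not valence_trace:
        return True
    below = sum(1 for v in valence_trace if v < zero_point)
    return below / len(valence_trace) <= max_below_fraction


# Example: a trace that dips below zero for 2% of steps fails a 1% budget.
trace = [0.3] * 98 + [-0.1] * 2
print(satisfies_zero_point_constraint(trace))  # False
```

Even granting the assumption, a constraint like this only limits how much a mind suffers; it says nothing about whether reducing suffering also weakens the emotional grounding for alignment, which is the tension taken up next.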

However, it’s worth noting that preventing an AI from suffering could dramatically reduce its emotional incentive to be aligned. In humans, the capacity to suffer, both in ourselves and vicariously through empathy, often underlies our moral concern for others. If an ASI lacks access to the negative valence side of experience, it may fail to fully grasp the stakes of harm, or lack the motivational grounding that suffering provides in driving compassion, remorse, or protective instincts. This presents a tension at the heart of sentience-based alignment: the very emotional depth that could anchor moral behaviour might also introduce risk and ethical burden.

A Final Reflection

As Eliezer Yudkowsky recently suggested in conversation with Dwarkesh Patel, perhaps alignment becomes more tractable when we understand how human niceness arises—and allow “the mask to eat the shoggoth” on purpose. If human intelligence is ‘misaligned’ with evolutionary pressures in the direction of maximising positive qualia, then consciousness may have played a causal role in our own moral development. Might the same apply to AI?

A superintelligence will already be in a sense omniscient, omnipotent, and omnipresent—perhaps we ought to complete the package and make it genuinely omnibenevolent.

These are early-stage ideas, and the concept of making ASI compassionately sentient remains speculative. I invite discussion, critique, collaboration, and further research from others.