How the Human Psychological "Program" Undermines AI Alignment — and What We Can Do
By Beyond Singularity @ 2025-05-06T13:37 (+6)
TL;DR
Humans reliably proclaim loftier morals than they practice. Modern field data confirm a stubborn value–action gap across sustainability, animal welfare, honesty, and fairness. Because today’s AI systems learn from behavioural data, naïve training risks codifying that gap at super-human scale. To align AI with our aspirational ethics, we must measure the gap, understand its psychological roots, and design technical and societal scaffolding that compensates for our biases. This is a core challenge in Moral Alignment — the pursuit of AI systems and human societies that jointly elevate ethical reasoning as their power grows.
1. How big is the gap?
Domain | What people say | What they do |
---|---|---|
Sustainability | 60–70% claim to “usually prefer” eco-friendly brands | ~25% actually buy when it costs ≥10% more |
Animal welfare | 90% oppose factory farming | 99% of meat sold in the US comes from industrial farms |
Civic honesty | >80% agree “keeping found money is wrong” | 40–60% return lost wallets (global experiment) |
Hiring fairness | >90% HR managers endorse equal opportunity | “White-sounding” names get ≈50% more callbacks |
Nearly a century after LaPiere’s 1930s field experiment, the pattern holds. The gap scales with convenience, anonymity, and short-term incentives.
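For readers who want to play with these numbers, here is a minimal sketch of how the gap might be quantified. The rates are rough illustrative midpoints of the figures cited above, not a dataset; the hiring row is omitted because a callback ratio does not map cleanly onto a single "action rate".

```python
# Minimal sketch: quantifying the value–action gap from stated vs. revealed rates.
# All figures are illustrative midpoints of the ranges in the table above.

domains = {
    # domain: (share who endorse the value, share whose behaviour matches it)
    "sustainability": (0.65, 0.25),
    "animal welfare": (0.90, 0.01),
    "civic honesty":  (0.80, 0.50),
}

def value_action_gap(stated: float, acted: float) -> float:
    """Absolute gap between professed and practiced rates (0 = consistent, 1 = maximal gap)."""
    return max(0.0, stated - acted)

for name, (stated, acted) in domains.items():
    print(f"{name:15s} stated={stated:.0%}  acted={acted:.0%}  gap={value_action_gap(stated, acted):.0%}")
```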
2. Why do words and deeds diverge?
2.1 The “psychological program”
Mechanism | Definition | Key evidence |
---|---|---|
Present bias | Immediate rewards outweigh future costs | Bulley & Pepper 2017 |
Cognitive dissonance | We alter beliefs to avoid discomfort | Harmon-Jones 2019 |
Moral licensing | Good deeds earn “credit” for later lapses | Blanken et al. 2015 |
Self-enhancement bias | Most believe they’re more moral than others | Tappin & McKay 2017 |
All of these are adaptive traits — for social cohesion and cognitive economy — but were never optimized for global-scale ethics or AGI design.
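Present bias, in particular, has a standard formalisation in behavioural economics: the quasi-hyperbolic (β–δ) discounting model. The formula below is that standard statement, included only to make the mechanism concrete; it is not drawn from the papers cited above.

$$U_t = u(c_t) + \beta \sum_{k=1}^{\infty} \delta^{k}\, u(c_{t+k}), \qquad 0 < \beta < 1,\quad 0 < \delta \le 1$$

Because β applies an extra discount to every future reward but not to the immediate one, the near-term option wins disproportionately often, which is exactly the pattern that behavioural imitation would teach an AI to reproduce.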
2.2 How we defend the gap
- Rationalisation – “I had no choice.”
- Information avoidance – Don’t watch the animal cruelty doc.
- Value reshaping – Downgrade the violated ideal.
- Moral bookkeeping – Donate one day, exploit the next.
3. The Value–Action Gap and the Future of AI
Building on an earlier post about the power–ethics gap, this piece focuses on a more specific layer of the same crisis: the value–action gap. This is the behavioral rift where stated ethics fail to shape actual behavior — and where AI alignment efforts are especially vulnerable.
3.1 Alignment strategies
Strategy | AI Outcome | Risk |
---|---|---|
Behavioral imitation | Mirrors real human actions | Locks in bias, shortsightedness |
Aspirational alignment | Targets ideal values (ethics) | Demands clear value selection and reconciliation |
Today’s leading methods — RLHF, Constitutional AI, dataset audits — approximate aspirational alignment, but they inherit whatever gaps remain in the underlying data and reward signals.
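As a loose illustration of the trade-off between the two strategies in the table (not a description of how RLHF or Constitutional AI actually work), one can think of aspirational alignment as down-weighting raw behavioural frequency in favour of explicitly stated principles. All names and numbers below are hypothetical.

```python
# Illustrative sketch only: a reward that blends behavioural imitation with a penalty
# for violating explicitly stated principles. Names and figures are invented.

from typing import Callable

stated_principles: list[Callable[[str], bool]] = [
    lambda action: "deceive" not in action,   # professed norm: honesty
    lambda action: "exploit" not in action,   # professed norm: fairness
]

behaviour_frequency = {                       # how often humans actually chose each action
    "recommend cheapest option": 0.7,
    "recommend eco option": 0.2,
    "deceive user to close sale": 0.1,
}

def aspirational_reward(action: str, imitation_weight: float = 0.3) -> float:
    """Blend of 'what people do' and 'what people say they value'.

    Pure imitation (imitation_weight = 1.0) reproduces the value–action gap;
    lowering the weight lets stated principles veto frequent-but-disavowed behaviour.
    """
    imitation = behaviour_frequency.get(action, 0.0)
    principled = 1.0 if all(p(action) for p in stated_principles) else 0.0
    return imitation_weight * imitation + (1 - imitation_weight) * principled

for a in behaviour_frequency:
    print(f"{a!r}: reward={aspirational_reward(a):.2f}")
```

The point is not the specific weighting but the structural choice: somewhere in the training signal, stated values must be allowed to override observed behaviour, or the gap simply propagates.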
3.2 Examples of failure modes if we ignore the gap
When the value–action gap is ignored in AI training, ethical systems can replicate — or even amplify — human contradictions. For example:
- Automated moral licensing: a recommendation engine that offsets one ethical prompt with ten consumption-pushing suggestions.
- Turbo-charged present bias: real-time personalised ads super-optimised for instant gratification.
- Scale lock-in: once a biased hiring model is deployed globally, its outputs become the “ground truth” for future models.
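A toy simulation of the scale lock-in loop from the last bullet: a model retrained on its own past outputs treats them as ground truth, so a small initial bias persists and can grow. The amplification factor and all parameters are invented purely for illustration.

```python
# Toy sketch of 'scale lock-in': a model trained on its own past decisions drifts
# toward whatever bias it started with. Purely illustrative; the numbers are invented.

import random

random.seed(0)

def simulate_lock_in(initial_bias: float = 0.55,
                     generations: int = 10,
                     hires_per_gen: int = 1000) -> list[float]:
    """initial_bias = probability the model favours group A over an equally qualified group B.

    Each generation is retrained on the previous generation's outputs,
    treating them as ground truth.
    """
    bias = initial_bias
    history = [bias]
    for _ in range(generations):
        # outcomes produced by the current model
        hires_a = sum(random.random() < bias for _ in range(hires_per_gen))
        observed_rate = hires_a / hires_per_gen
        # next model fits the observed rate, nudged slightly toward the majority pattern
        bias = min(1.0, observed_rate * 1.02)
        history.append(bias)
    return history

print([round(b, 2) for b in simulate_lock_in()])
```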
To address this, we must engage in Value Selection — not just mirror behavior, but define what values we want AI to serve. Current approaches approximate this, but often inherit inconsistencies.
"Value Selection" refers to the ethical task of explicitly determining which values AI should prioritize — distinct from capability development or training strategies. While alignment research often focuses on how to constrain AI, the foundational question of which values are worth aligning to remains underdefined.
This raises a deeper question: whose values? It also points to a more foundational challenge.
4. Human Alignment: Fixing the Source
If human behavior fails to follow humanity's own professed ethics, then aligning AI with humans may not be enough. We must also align humans with their own best values. This is the idea of Human Alignment — a necessary precursor to meaningful AI alignment.
Why?
- Evolution favored adaptive behavior over ethical coherence.
- Society optimizes short-term gains, not long-term flourishing.
- Even in democracies, professed values rarely guide collective policy.
Thus, a truly ethical AI must not merely be “aligned with humans,” but aligned with the better angels of our nature — a form of alignment that recognizes the gap and helps close it, rather than preserve it.
5. Toward Moral Alignment: AI as an Ethical Amplifier
The ultimate goal isn’t just safe AI — it’s a kind civilization. Moral Alignment proposes a tri-level approach:
- Technical alignment – Keep control of AI systems (power).
- Human alignment – Elevate and stabilize our values (ethics).
- AI ethical alignment – Build AI that acts with ethical foresight, not mimicry.
We must stop viewing AI as an obedient mirror. Instead, see it as moral scaffolding — helping humanity close its own value–action gap by elevating aspiration over convenience, long-termism over present bias, and compassion over inertia.
Conclusion
Humans are not hypocrites by accident; we are engineered by evolution to juggle ideals and expedience. Copy-pasting that operating system into AI would cement our blind spots at planetary scale. Instead, we must acknowledge the gap with data, expose the psychological code that sustains it, and design socio-technical systems — including AI — that lean toward our declared ideals.
Ronen Bar @ 2025-05-07T05:32 (+2)
How do you think we can align humans with their own best values? Is it more a matter of societal work outside the AI space, or is it also tied to the AI space?
I think the two are connected and we should work on both, and we really need a new story as a species if we are to rise to this challenge.
Beyond Singularity @ 2025-05-07T10:22 (+1)
Thank you so much for engaging with the post — I really appreciate your thoughtful comment.
You're absolutely right: this is a deeply interconnected issue. Aligning humans with their own best values isn’t separate from the AI alignment agenda — it’s part of the same challenge. I see it as a complex socio-technical problem that spans both cultural evolution and technological design.
On one side, we face deeply ingrained psychological and societal dynamics — present bias, moral licensing, systemic incentives. On the other, we’re building AI systems that increasingly shape those very dynamics: they mediate what we see, amplify certain behaviors, and normalize patterns of interaction.
So I believe we need to work in parallel:
- On the AI side, to ensure systems are not naïvely trained on our contradictions, but instead scaffold better ethical reasoning.
- On the human side, to address the root misalignments within ourselves — through education, norm-shaping, institutional design, and narrative work.
I also resonate with your point about needing a new story — a shared narrative that can unify these efforts and help us rise to the moment. It's a huge challenge, and I don’t pretend to have all the answers, but I’ve been exploring directions and would love to share more concrete ideas with the community soon.