Developing AI Safety: Bridging the Power-Ethics Gap (Introducing New Concepts)
By Ronen Bar @ 2025-04-16T11:25 (+18)
TLDR
- One of the most significant challenges facing humanity is the widening gap between our rapidly increasing power and our slower progress (if there is any) in ethical understanding and application. I term this the power-ethics gap; as history progresses, it has driven increasing suffering and death, inflicted by human actions on both animals and humans.
- AI threatens to dramatically expand the power-ethics gap.
- This creates an urgent imperative for humanity to cultivate better ethics.
- Currently, the AI safety field predominantly focuses on the critical task of maintaining human control over powerful AI systems, often overlooking the vital ethical dimension. Consequently, philosophical inquiry is essential to re-examine the field, introducing new concepts (or refining existing ones) that properly emphasize this ethical component.
- I suggest several new concepts in this post.
- More resources could be dedicated to the crucial question of AI value selection and to related work that aims to ensure the values embedded within AI systems align with an encompassing ethical view.
This post can be seen as a continuation of this post.
(To further explore this topic, you can watch a 34-minute video outlining a concept map of the AI space and potential additions. Recommended viewing speed: 1.25x).
This post drew some insights from the Sentientism podcast and the Buddhism for AI course.
My Point of View
I am looking at the AI safety space mainly through the three fundamental questions: What is? What is good? How do we get there?
- "What is?" relates to understanding reality through data and intelligence. I define intelligence as the capacity to make accurate predictions across increasingly complex counterfactual scenarios, given specific data and environmental context. The combination of data and intelligence constitutes power. Greater power enables a deeper understanding of "what is" and improved predictive capabilities across various scenarios.
- "What is good?" pertains to ethics – the principles guiding right action.
- "How to get there?" pertains to wisdom, the effective integration of power and ethics. Consider an analogy: power is the speed of a car, while ethics is the driver's skill. Low speed with an expert driver poses little risk. However, high speed with an unskilled driver will probably end in disaster. Wisdom, therefore, represents the effective application of power guided by ethical understanding.
Human History Trends
Historically, human power – driven by increasing data and intelligence – has been scaling rapidly and exponentially. Our ability to understand and predict "what is" continues to grow. However, our ethical development (understanding "what is good") is not keeping pace. The power-ethics gap is the car driving ever faster while the driver's skill improves only slightly as the ride goes on. This arguably represents one of the most critical problems globally. This imbalance has contributed significantly to increasing suffering and killing throughout history, potentially more so in recent times than ever before. The widening power-ethics gap appears correlated with large-scale, human-caused harm.
The Focus of the AI Safety Space
Eliezer Yudkowsky, who describes himself as 'the original AI alignment person,' is one of the most prominent figures in the AI safety space. His philosophical work, the many concepts he created, and his discussion forum and organizations have significantly shaped the AI safety field. I am in awe of his tremendous work and contribution to humanity, but he has a significant blind spot in his understanding of "what is". Yudkowsky operates within a framework in which, by his own claim, (almost) only humans are considered sentient, whereas scientific evidence suggests that probably all vertebrates, and possibly many invertebrates, are sentient. This discrepancy is crucial: one of the key founders of the AI safety space has built his perspective on an unscientific assumption that limits his view to a tiny fraction of the world's sentience.
The potential implications of this are profound, and it highlights the necessity of re-evaluating AI safety from a broader ethical perspective encompassing all sentient beings, both present and future. This requires introducing new concepts and potentially redefining existing ones. This work is critical because the pursuit of artificial intelligence is primarily focused on increasing power (capabilities), and hence risks further widening the existing power-ethics gap within humanity.
Since advanced AI threatens to take control and mastery away from humans, two crucial pillars for AI safety emerge: maintaining meaningful human control (power) and ensuring ethical alignment (ethics). Currently, the field heavily prioritizes the former, while the latter remains underdeveloped. From an ethical perspective, particularly one concerned with the well-being of sentientkind ('Sentientkind' being analogous to 'humankind' but inclusive of all feeling beings), AI safety and alignment could play a greater role. Given that AI systems may eventually surpass human capabilities, their embedded values will have immense influence.
We must strive to prevent an AI-driven power-ethics gap far exceeding the one already present in humans.
Suggesting New Concepts, Redefining or Highlighting Existing Ones
- Value Selection: AI alignment involves understanding "what is" (related to AI capabilities), determining "what is good" (related to value selection), and figuring out "how to get there" (related to technical alignment). "Value Selection"—the core ethical question of which values AI should pursue—receives insufficient focus. The lack of a universally accepted term is indicative of this (Paul Christiano's "The steering problem" post on the LessWrong forum revolves around this question, and other scholars have used different names for it). A clear, established term for this crucial ethical component is needed for productive discourse.
- Alignment: Should 'alignment' solely refer to maintaining human control? A broader definition could encompass AI acting in accordance with a robust understanding of both reality ("what is") and ethics ("what is good"). This also suggests differentiating between types of alignment, such as:
- Human-Centric Alignment: Focused primarily on aligning AI with human interests and control.
- Sentientkind Alignment: A broader goal of aligning AI with the well-being and interests of all sentient beings.
- Misaligned AI: Similarly, does a 'misaligned AI' refer only to loss of human control, or does it also imply ethical failure? We likely need sub-terms to distinguish scenarios: AI systems might be controllable but ethically misaligned, ethically aligned but uncontrollable, aligned on both fronts, or deficient in both (see the sketch after this list).
- Human-Centric AI: AI that is aligned with the values chosen by humans, regardless of what those values are.
- Omnikind AI / Sentientkind AI: Proposed terms for a model that considers the well-being of all sentient beings, reflecting a broader ethical foundation.
- Buddha AI: Reflecting a specific ethical framework advocated by The Monastic Academy for the Preservation of Life on Earth (MAPLE): developing AI that will "walk the path" of the Buddha, deeply understanding the world and becoming compassionate toward all beings.
- Human Alignment: Considering the causal chain (Evolution → Humans → AI → Superintelligence), achieving beneficial outcomes might require alignment at each stage: aligned humans create aligned AI, which creates aligned superintelligence. Nevertheless, humans are not ethically aligned, and evolutionary pressures prioritized survival and replication (by increasing power) over ethics, making complete alignment much harder. Hence I would argue that we are currently the smart human, not necessarily the wise human.
- There seems to be insufficient focus on this 'human alignment' prerequisite. While one can argue this falls under broader societal efforts and not AI safety/alignment ones, the urgency created by AI's power scaling suggests it might warrant dedicated attention, perhaps even a distinct movement focused on human alignment to close the power-ethics gap and create wisdom.
- Value Alignment Strategies: An encompassing term for the diverse set of technical methods (e.g., RLHF), frameworks (e.g., Constitutional AI, Coherent Extrapolated Volition), and approaches used for value alignment (making AI act according to human-chosen values, whatever they are).
- Moral Alignment: A proposed broader concept, somewhat overlapping with AI safety, but explicitly emphasizing the moral imperative to scale human and artificial ethics in response to escalating power. It encompasses the goals of maintaining human control (technical alignment), fostering human ethical development ('human alignment'), and ensuring AI systems are ethically sound ('AI ethical alignment').
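To make the four scenarios in the 'Misaligned AI' item concrete, here is a minimal, purely illustrative sketch in Python. The names (`ControlStatus`, `EthicalStatus`, `AlignmentState`) are hypothetical labels I am introducing for illustration, not an existing framework:

```python
from dataclasses import dataclass
from enum import Enum


class ControlStatus(Enum):
    CONTROLLABLE = "controllable by humans"
    UNCONTROLLABLE = "not controllable by humans"


class EthicalStatus(Enum):
    ALIGNED = "ethically aligned (considers all sentient beings)"
    MISALIGNED = "ethically misaligned"


@dataclass(frozen=True)
class AlignmentState:
    """One combination of human control and ethical alignment."""
    control: ControlStatus
    ethics: EthicalStatus


# The four scenarios the post distinguishes:
scenarios = [
    AlignmentState(ControlStatus.CONTROLLABLE, EthicalStatus.MISALIGNED),
    AlignmentState(ControlStatus.UNCONTROLLABLE, EthicalStatus.ALIGNED),
    AlignmentState(ControlStatus.CONTROLLABLE, EthicalStatus.ALIGNED),
    AlignmentState(ControlStatus.UNCONTROLLABLE, EthicalStatus.MISALIGNED),
]

for state in scenarios:
    print(f"{state.control.value}, {state.ethics.value}")
```

The point of the sketch is only that "misaligned AI" currently collapses two independent axes; naming each combination separately makes it easier to say which failure mode a given discussion is about.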
People use AI safety terms in different ways, so I would love to hear any thoughts on what I may have gotten wrong in my understanding of these concepts.
Beyond Singularity @ 2025-04-16T12:56 (+3)
Thank you for this deep and thought-provoking post! The concept of the "power-ethics gap" truly resonates and seems critically important for understanding current and future challenges, especially in the context of AI.
The analogy with the car, where power is speed and ethics is the driver's skill, is simply brilliant. It illustrates the core of the problem very clearly. I would even venture to add that, in my view, the "driver's skill" today isn't just lagging behind, but perhaps even degrading in some aspects due to the growing complexity of the world, information noise, and polarization. Our collective ability to make wise decisions seems increasingly fragile, despite the growth of individual knowledge.
Your emphasis on the need to shift the focus in AI safety from purely technical aspects of control (power) to deep ethical questions and "value selection" seems absolutely timely and necessary. This truly is an area that appears to receive disproportionately little attention compared to its significance.
The concepts you've introduced, especially the distinction between Human-Centric and Sentientkind Alignment, as well as the idea of "Human Alignment," are very interesting. The latter seems particularly provocative and important. Although you mention that this might fall outside the scope of traditional AI safety, don't you think that without significant progress here, attempts to "align AI" might end up being built on very shaky ground? Can we really expect to create ethical AI if we, as a species, are struggling with our own "power-ethics gap"?
It would be interesting to hear more thoughts on how the concept of "Moral Alignment" relates to existing frameworks and whether it could help integrate these disparate but interconnected problems under one umbrella.
The post raises many important questions and introduces useful conceptual distinctions. Looking forward to hearing the opinions of other participants! Thanks again for the food for thought!
VeryJerry @ 2025-04-16T13:44 (+1)
How can we unhypocritically expect AI superintelligence to respect "inferior" sentient beings like us when we do no such thing for other species?
How can we expect AI to have "better" (more consistent, universal, compassionate, unbiased, etc) values than us and also always only do what we want it to do?
What if extremely powerful, extremely corrigible AI falls into the hands of a racist? A sexist? A speciesist?
Some things to think about if this post doesn't click for you
Ronen Bar @ 2025-04-17T12:00 (+2)
And thinking more long term: when AGI builds a superintelligence, which will build the next agents, and humans end up somewhere 5-6 levels down the intelligence scale, what chance do we have of receiving moral consideration and care from those superior beings? Unless we realize we need to care for all beings, and build an AI that cares for all beings...