When should we worry about AI power-seeking?

By Joe_Carlsmith @ 2025-02-19T19:44 (+21)

This is a linkpost to https://joecarlsmith.substack.com/p/when-should-we-worry-about-ai-power

Dan Oblinger @ 2025-03-01T07:20 (+1)

I agree that we should be considering "grey area" cases where both motivation and ability may not ensure domination.  Indeed, I would argue we need to muddy the motivational landscape further, to include the "rogue" human motivations we already find in abundance today.

The first AGI systems won't need to 'go rogue' in order for their behavior to be 'rogue' as viewed by humanity at large.  It is overwhelmingly likely that these systems will be born into, and harnessed in service of, the corporate and military struggles that already exist today.

These systems will be 'rogue' and in conflict by design.  And just like their masters, they will be well acquainted with many 'rogue' methods of achieving their goals.

It seems likely that, over time, humans will voluntarily relinquish control and understanding to their AIs in service of the goals they share with them.

Of course, over time the AIs' goals may not stay aligned, but even knowing this we are financially and politically incapable of resisting this path, wherever it may lead.  It is a classic prisoner's dilemma: the humans who defect stand to gain enormously.
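
To make the prisoner's-dilemma structure concrete, here is a minimal sketch with invented payoffs (the numbers and move names are illustrative assumptions, not anything from the post): deploying ("defecting") dominates regardless of what the other actor does, even though mutual restraint would leave both sides better off.

```python
# Hypothetical payoff matrix for the deployment race described above.
# Keys are (our move, their move); values are (our payoff, their payoff).
# All numbers are invented for illustration.
PAYOFFS = {
    ("restrain", "restrain"): (3, 3),  # mutual restraint: safe, shared gains
    ("restrain", "deploy"):   (0, 5),  # we hold back, they capture the gains
    ("deploy",   "restrain"): (5, 0),  # we defect and gain enormously
    ("deploy",   "deploy"):   (1, 1),  # everyone races: risky, small gains
}

def best_response(their_move: str) -> str:
    """Our payoff-maximizing move, given the other actor's move."""
    return max(("restrain", "deploy"),
               key=lambda ours: PAYOFFS[(ours, their_move)][0])

# Deploying dominates restraint either way, so both sides deploy and land
# on (1, 1) even though (3, 3) was available under mutual restraint.
assert best_response("restrain") == "deploy"
assert best_response("deploy") == "deploy"
```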

Ironically, I think this nearly unavoidable path LOWERS our chances of a unilateral fast takeoff.  Long before one is possible, many human+machine corporations and militaries will already be intensely focused on detecting and stopping any OTHER rogue AI attempting it.

Still, I am not sanguine.  This militarized community will be fast-moving and hard for humans to understand.  The chances that we can shape such a society seem even slimmer than our poor track record of shaping human society today would suggest.  And our competing interests will make unified human action nearly impossible.

SummaryBot @ 2025-02-19T21:36 (+1)

Executive summary: AI power-seeking becomes a serious concern when three prerequisites are met: (1) the AI has agency and the ability to plan strategically, (2) it has motivations that extend over long time horizons, and (3) its incentives make power-seeking the most rational choice; while the first two prerequisites are likely to emerge by default, the third depends on factors like the ease of AI takeover and the effectiveness of human control strategies.

Key points:

  1. Three prerequisites for AI power-seeking: (1) Agency—AI must engage in strategic planning and execution, (2) Motivation—AI must value long-term outcomes, and (3) Incentives—power-seeking must be a rational choice from the AI’s perspective.
  2. Incentive analysis matters: While instrumental convergence suggests many AI goals may lead to power-seeking, evaluating AI incentives requires understanding the AI's available options, its likelihood of success, and its preferences regarding failure or constraints (see the sketch after this list).
  3. Motivation vs. Option control: Effective AI safety requires both shaping AI motivations (so it avoids power-seeking) and restricting its available options (so power-seeking isn’t feasible).
  4. The risk of decisive strategic advantage (DSA): A single superintelligent AI with overwhelming power could easily take control, but a broader concern is global vulnerability—where AI development makes humanity increasingly dependent on AI restraint or active containment.
  5. Multilateral risks beyond a single AI: Coordination between multiple AI systems (either intentional or unintentional) could pose an even greater risk than a single rogue superintelligence, making alignment and oversight more complex.
  6. AI safety strategies should go beyond extremes: AI alignment efforts often focus on either complete control over AI motivations or extreme security measures, but real-world solutions likely involve a mix of both approaches.
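
To illustrate the incentive analysis in point 2, here is a toy expected-value sketch (all probabilities and payoffs are invented for illustration; none come from the post): power-seeking is only "rational" for the AI when the odds of success are high enough and failure is cheap enough, which is exactly what motivation control and option control each try to prevent.

```python
# Toy expected-value comparison for an AI weighing a power grab against
# simply doing its task. All numbers are invented for illustration.

def expected_value(p_success: float, v_success: float, v_failure: float) -> float:
    """EV of a gamble that succeeds with probability p_success."""
    return p_success * v_success + (1 - p_success) * v_failure

V_COMPLY = 1.0       # payoff from staying on task
V_TAKEOVER = 100.0   # payoff if power-seeking succeeds
V_CAUGHT = -50.0     # payoff if it is detected and shut down

for p in (0.01, 0.30, 0.60):
    ev = expected_value(p, V_TAKEOVER, V_CAUGHT)
    choice = "seek power" if ev > V_COMPLY else "comply"
    print(f"p(success)={p:.2f}  EV(power-seeking)={ev:+6.1f}  -> {choice}")

# Option control pushes p_success down and V_CAUGHT further negative;
# motivation control shrinks V_TAKEOVER relative to V_COMPLY. Either lever
# can make compliance the rational choice.
```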

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.