Risk Alignment in Agentic AI Systems

By Hayley Clatterbuck, arvomm @ 2024-10-01T22:51 (+31)

This is a linkpost to https://static1.squarespace.com/static/6035868111c9bd46c176042b/t/66fc78b1a8062a5674c05e25/1727822001879/Risk+Alignment.pdf

Agentic AIs—AIs that are capable and permitted to undertake complex actions with little supervision—mark a new frontier in AI capabilities and raise new questions about how to safely create and align such systems with users, developers, and society. Because agents’ actions are influenced by their attitudes toward risk, one key aspect of alignment concerns the risk profiles of agentic AIs. What risk attitudes should guide an agentic AI’s decision-making? What guardrails, if any, should be placed on the range of permissible risk attitudes? What are the ethical considerations involved when designing systems that make risky decisions on behalf of others?

Risk alignment will matter for user satisfaction and trust, but it will also have important ramifications for society more broadly, especially as agentic AIs become more autonomous and are allowed to control key aspects of our lives. AIs with reckless attitudes toward risk (either because they are calibrated to reckless human users or are poorly designed) may pose significant threats. They might also open “responsibility gaps” in which there is no agent who can be held accountable for harmful actions.

In this series of reports, we consider ethical and technical issues that bear on designing agentic AI systems with acceptable risk attitudes.

In the first report, we examine the relationship between agentic AIs and their users.

In the second report, we examine the duties and interests of AI developers.

In the third report, we examine technical questions about how we might develop proxy AIs that are calibrated to the risk attitudes of their users. The table below summarizes three approaches to learning a user's risk profile:

| Learning process | Input to learning process | Risk data |
|---|---|---|
| Imitation learning | Observed behaviors | Actual choice behavior |
| Prompting | Natural language instruction | Self-report |
| Preference modeling | Ratings of options | Lottery preferences |
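
To make the "preference modeling" row concrete, here is a minimal sketch, not taken from the report, of how a user's lottery preferences could be distilled into a single risk-aversion parameter that a proxy agent might consult. It assumes a CRRA utility function and a logistic choice rule; all function names and example numbers are hypothetical.

```python
# Illustrative sketch only (not the report's method): fit a risk-aversion
# parameter from a user's observed lottery choices.
import math


def crra_utility(x, rho):
    """Constant relative risk aversion utility; larger rho = more risk-averse."""
    if abs(rho - 1.0) < 1e-9:
        return math.log(x)
    return (x ** (1.0 - rho) - 1.0) / (1.0 - rho)


def expected_utility(lottery, rho):
    """lottery is a list of (probability, payoff) pairs."""
    return sum(p * crra_utility(payoff, rho) for p, payoff in lottery)


def log_likelihood(choices, rho, temperature=1.0):
    """Log-likelihood of observed binary choices under a logistic choice rule."""
    total = 0.0
    for option_a, option_b, chose_a in choices:
        diff = expected_utility(option_a, rho) - expected_utility(option_b, rho)
        p_a = 1.0 / (1.0 + math.exp(-diff / temperature))
        total += math.log(p_a if chose_a else 1.0 - p_a)
    return total


def fit_risk_aversion(choices):
    """Grid-search for the rho that best explains the user's lottery choices."""
    grid = [i / 10 for i in range(1, 40)]  # rho from 0.1 to 3.9
    return max(grid, key=lambda rho: log_likelihood(choices, rho))


# Example: a user who repeatedly takes a sure $50 over a 50/50 gamble on $1 vs $120.
safe = [(1.0, 50.0)]
gamble = [(0.5, 1.0), (0.5, 120.0)]  # payoffs kept > 0 so log/power utilities stay finite
observed = [(safe, gamble, True)] * 3
print(fit_risk_aversion(observed))  # prints a rho > 0, i.e. the user is modeled as risk-averse
```

Even a toy fit like this illustrates the report's caveat (see point 5 of the summary below): with limited user risk-preference data, pre-specified risk classes may be more practical than learned calibration.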

Acknowledgments

This report is a project of Rethink Priorities. The authors are Hayley Clatterbuck, Clinton Castro, and Arvo Muñoz Morán. Thanks to Jamie Elsey, Bob Fischer, David Moss, Mattie Toma and Willem Sleegers for helpful discussions and feedback. This work was supported by funding from OpenAI under a Research into Agentic AI Systems grant. If you like our work, please consider subscribing to our newsletter. You can explore our completed public work here.


SummaryBot @ 2024-10-02T18:46 (+1)

Executive summary: Aligning the risk attitudes of agentic AI systems with users and society raises complex ethical and technical challenges that require balancing user preferences, developer responsibilities, and societal impacts.

Key points:

  1. Two models for agentic AI risk attitudes are proposed: Proxy Agent (mimicking user preferences) and Off-the-Shelf Tool (constrained for desirable outcomes).
  2. AI developers must navigate legal, reputational, and ethical liabilities while creating systems of shared responsibility with users.
  3. Calibrating AI risk attitudes to users involves eliciting behaviors/judgments, modeling underlying attitudes, and designing appropriate actions.
  4. Actual behaviors and self-reported general risk attitudes are more reliable indicators than hypothetical choices or lottery preferences.
  5. Pre-existing risk classes may outperform learning-based calibration methods due to limitations in available user risk preference data.
  6. Balancing user satisfaction, developer duties, and societal impacts is crucial for responsible agentic AI development.

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.