Catastrophic Risks from AI #1: Introduction
By Dan H @ 2023-06-22T17:09 (+28)
This is a linkpost to https://arxiv.org/abs/2306.12001
Dan H @ 2023-06-27T04:58 (+8)
A brief overview of the contents, page by page.
1: most important century and hinge of history
2: wisdom needs to keep up with technological power or else self-destruction / the world is fragile / Cuban Missile Crisis
3: unilateralist's curse
4: bio x-risk
5: malicious actors intentionally building power-seeking AIs / anti-human accelerationism is common in tech
6: persuasive AIs and eroded epistemics
7: value lock-in and entrenched totalitarianism
8: story about bioterrorism
9: practical malicious use suggestions
10: lethal autonomous weapons (LAWs) as an on-ramp to AI x-risk
11: automated cyberwarfare -> global destabilization
12: flash war, AIs in control of nuclear command and control
13: security dilemma means AI conflict can bring us to the brink of extinction
14: story about flash war
15: erosion of safety due to corporate AI race
16: automation of AI research; autonomous/ascended economy; enfeeblement
17: AI development reinterpreted as evolutionary process
18: AI development is not aligned with human values but with competitive and evolutionary pressures
19: gorilla argument, AIs could easily outclass humans in so many ways
20: story about an autonomous economy
21: practical AI race suggestions
22: examples of catastrophic accidents in various industries
23: potential AI catastrophes from accidents, Normal Accidents
24: emergent AI capabilities, unknown unknowns
25: safety culture (with nuclear weapons development examples), security mindset
26: sociotechnical systems, safety vs. capabilities
27: safetywashing, defense in depth
28: story about weak safety culture
29: practical suggestions for organizational safety
30: more practical suggestions for organizational safety
31: Bing and Microsoft Tay demonstrate how AIs can be surprisingly unhinged/difficult to steer
32: proxy gaming/reward hacking
33: goal drift
34: spurious cues can cause AIs to pursue wrong goals/intrinsification
35: power-seeking (tool use, self-preservation)
36: power-seeking continued (AIs with different goals could be uniquely adversarial)
37: deception examples
38: treacherous turns and self-awareness
39: practical suggestions for AI control
40: how AI x-risk relates to other risks
41: conclusion