Joe_Carlsmith
Former senior advisor at Open Philanthropy. Doctorate in philosophy from the University of Oxford. Opinions my own.
Posts
Video and transcript of talk on giving AIs safe motivations
by Joe_Carlsmith @ 2025-09-22 | +10 | 0 comments
Video and transcript of talk on "Can goodness compete?"
by Joe_Carlsmith @ 2025-07-17 | +34 | 0 comments
Video and transcript of talk on automating alignment research
by Joe_Carlsmith @ 2025-04-30 | +11 | 0 comments
Takes on "Alignment Faking in Large Language Models"
by Joe_Carlsmith @ 2024-12-18 | +72 | 0 comments
How might we solve the alignment problem? (Part 1: Intro, summary, ontology)
by Joe_Carlsmith @ 2024-10-28 | +18 | 0 comments
Video and transcript of presentation on Otherness and control in the age of AGI
by Joe_Carlsmith @ 2024-10-08 | +18 | 0 comments
Video and transcript of presentation on Scheming AIs
by Joe_Carlsmith @ 2024-03-22 | +23 | 0 comments
Empirical work that might shed light on scheming (Section 6 of "Scheming AIs")
by Joe_Carlsmith @ 2023-12-11 | +7 | 0 comments
Speed arguments against scheming (Sections 4.4-4.7 of “Scheming AIs”)
by Joe_Carlsmith @ 2023-12-08 | +6 | 0 comments
Simplicity arguments for scheming (Section 4.3 of "Scheming AIs")
by Joe_Carlsmith @ 2023-12-07 | +6 | 0 comments
The counting argument for scheming (Sections 4.1 and 4.2 of "Scheming AIs")
by Joe_Carlsmith @ 2023-12-06 | +9 | 0 comments
Arguments for/against scheming that focus on the path SGD takes (Section 3 of...
by Joe_Carlsmith @ 2023-12-05 | +7 | 0 comments
Non-classic stories about scheming (Section 2.3.2 of "Scheming AIs")
by Joe_Carlsmith @ 2023-12-04 | +12 | 0 comments
Does scheming lead to adequate future empowerment? (Section 2.3.1.2 of "Scheming...
by Joe_Carlsmith @ 2023-12-03 | +6 | 0 comments
The goal-guarding hypothesis (Section 2.3.1.1 of "Scheming AIs")
by Joe_Carlsmith @ 2023-12-02 | +6 | 0 comments
How useful for alignment-relevant work are AIs with short-term goals? (Section 2..
by Joe_Carlsmith @ 2023-12-01 | +6 | 0 comments
Is scheming more likely in models trained to have long-term goals? (Sections 2.2..
by Joe_Carlsmith @ 2023-11-30 | +6 | 0 comments
“Clean” vs. “messy” goal-directedness (Section 2.2.3 of “Scheming AIs”)
by Joe_Carlsmith @ 2023-11-29 | +7 | 0 comments
Two sources of beyond-episode goals (Section 2.2.2 of “Scheming AIs”)
by Joe_Carlsmith @ 2023-11-28 | +8 | 0 comments
Two concepts of an “episode” (Section 2.2.1 of “Scheming AIs”)
by Joe_Carlsmith @ 2023-11-27 | +11 | 0 comments
Situational awareness (Section 2.1 of “Scheming AIs”)
by Joe_Carlsmith @ 2023-11-26 | +12 | 0 comments
On “slack” in training (Section 1.5 of “Scheming AIs”)
by Joe_Carlsmith @ 2023-11-25 | +14 | 0 comments
Why focus on schemers in particular? (Sections 1.3 and 1.4 of “Scheming AIs”)
by Joe_Carlsmith @ 2023-11-24 | +10 | 0 comments
A taxonomy of non-schemer models (Section 1.2 of “Scheming AIs”)
by Joe_Carlsmith @ 2023-11-22 | +6 | 0 comments
Varieties of fake alignment (Section 1.1 of “Scheming AIs”)
by Joe_Carlsmith @ 2023-11-21 | +6 | 0 comments
New report: "Scheming AIs: Will AIs fake alignment during training in order to...
by Joe_Carlsmith @ 2023-11-15 | +71 | 0 comments
Superforecasting the premises in “Is power-seeking AI an existential risk?”
by Joe_Carlsmith @ 2023-10-18 | +114 | 0 comments
[Linkpost] Shorter version of report on existential risk from power-seeking AI
by Joe_Carlsmith @ 2023-03-22 | +49 | 0 comments
A Stranger Priority? Topics at the Outer Reaches of Effective Altruism (my...
by Joe_Carlsmith @ 2023-02-21 | +64 | 0 comments
[Linkpost] Human-narrated audio version of "Is Power-Seeking AI an Existential...
by Joe_Carlsmith @ 2023-01-31 | +9 | 0 comments
Video and Transcript of Presentation on Existential Risk from Power-Seeking AI
by Joe_Carlsmith @ 2022-05-08 | +97 | 0 comments
On expected utility, part 4: Dutch books, Cox, and Complete Class
by Joe_Carlsmith @ 2022-03-24 | +7 | 0 comments
On expected utility, part 3: VNM, separability, and more
by Joe_Carlsmith @ 2022-03-22 | +9 | 0 comments
On expected utility, part 2: Why it can be OK to predictably lose
by Joe_Carlsmith @ 2022-03-18 | +8 | 0 comments
On expected utility, part 1: Skyscrapers and madmen
by Joe_Carlsmith @ 2022-03-16 | +22 | 0 comments
Reviews of "Is power-seeking AI an existential risk?"
by Joe_Carlsmith @ 2021-12-16 | +71 | 0 comments
SIA > SSA, part 4: In defense of the presumptuous philosopher
by Joe_Carlsmith @ 2021-10-01 | +9 | 0 comments
SIA > SSA, part 3: An aside on betting in anthropics
by Joe_Carlsmith @ 2021-10-01 | +10 | 0 comments
SIA > SSA, part 2: Telekinesis, reference classes, and other scandals
by Joe_Carlsmith @ 2021-10-01 | +10 | 0 comments
SIA > SSA, part 1: Learning from the fact that you exist
by Joe_Carlsmith @ 2021-10-01 | +17 | 0 comments
In search of benevolence (or: what should you get Clippy for Christmas?)
by Joe_Carlsmith @ 2021-07-20 | +17 | 0 comments
Draft report on existential risk from power-seeking AI
by Joe_Carlsmith @ 2021-04-28 | +88 | 0 comments
On future people, looking back at 21st century longtermism
by Joe_Carlsmith @ 2021-03-22 | +102 | 0 comments
Alienation and meta-ethics (or: is it possible you should maximize helium?)
by Joe_Carlsmith @ 2021-01-15 | +19 | 0 comments