Joe_Carlsmith

Video and transcript of talk on "Can goodness compete?"
by Joe_Carlsmith @ 2025-07-17 | +30 | 0 comments

Video and transcript of talk on AI welfare
by Joe_Carlsmith @ 2025-05-22 | +22 | 0 comments

The stakes of AI moral status
by Joe_Carlsmith @ 2025-05-21 | +54 | 0 comments

Video and transcript of talk on automating alignment research
by Joe_Carlsmith @ 2025-04-30 | +11 | 0 comments

Can we safely automate alignment research?
by Joe_Carlsmith @ 2025-04-30 | +13 | 0 comments

AI for AI safety
by Joe_Carlsmith @ 2025-03-14 | +34 | 0 comments

Paths and waystations in AI safety
by Joe_Carlsmith @ 2025-03-11 | +22 | 0 comments

When should we worry about AI power-seeking?
by Joe_Carlsmith @ 2025-02-19 | +21 | 0 comments

What is it to solve the alignment problem?
by Joe_Carlsmith @ 2025-02-13 | +25 | 0 comments

How do we solve the alignment problem?
by Joe_Carlsmith @ 2025-02-13 | +28 | 0 comments

Fake thinking and real thinking
by Joe_Carlsmith @ 2025-01-28 | +78 | 0 comments

Takes on "Alignment Faking in Large Language Models"
by Joe_Carlsmith @ 2024-12-18 | +72 | 0 comments

Incentive design and capability elicitation
by Joe_Carlsmith @ 2024-11-12 | +9 | 0 comments

Option control
by Joe_Carlsmith @ 2024-11-04 | +11 | 0 comments

Motivation control
by Joe_Carlsmith @ 2024-10-30 | +18 | 0 comments

How might we solve the alignment problem? (Part 1: Intro, summary, ontology)
by Joe_Carlsmith @ 2024-10-28 | +18 | 0 comments

Video and transcript of presentation on Otherness and control in the age of AGI
by Joe_Carlsmith @ 2024-10-08 | +18 | 0 comments

What is it to solve the alignment problem? (Notes)
by Joe_Carlsmith @ 2024-08-24 | +32 | 0 comments

Value fragility and AI takeover
by Joe_Carlsmith @ 2024-08-05 | +39 | 0 comments

A framework for thinking about AI power-seeking
by Joe_Carlsmith @ 2024-07-24 | +44 | 0 comments

Loving a world you don’t trust
by Joe_Carlsmith @ 2024-06-18 | +65 | 0 comments

On “first critical tries” in AI alignment
by Joe_Carlsmith @ 2024-06-05 | +29 | 0 comments

On attunement
by Joe_Carlsmith @ 2024-03-25 | +28 | 0 comments

Video and transcript of presentation on Scheming AIs
by Joe_Carlsmith @ 2024-03-22 | +23 | 0 comments

On green
by Joe_Carlsmith @ 2024-03-21 | +61 | 0 comments

On the abolition of man
by Joe_Carlsmith @ 2024-01-18 | +71 | 0 comments

Being nicer than Clippy
by Joe_Carlsmith @ 2024-01-16 | +26 | 0 comments

An even deeper atheism
by Joe_Carlsmith @ 2024-01-11 | +26 | 0 comments

Does AI risk “other” the AIs?
by Joe_Carlsmith @ 2024-01-09 | +23 | 0 comments

When "yang" goes wrong
by Joe_Carlsmith @ 2024-01-08 | +57 | 0 comments

Deep atheism and AI risk
by Joe_Carlsmith @ 2024-01-04 | +65 | 0 comments

Gentleness and the artificial Other
by Joe_Carlsmith @ 2024-01-02 | +90 | 0 comments

Otherness and control in the age of AGI
by Joe_Carlsmith @ 2024-01-02 | +37 | 0 comments

Empirical work that might shed light on scheming (Section 6 of "Scheming AIs")
by Joe_Carlsmith @ 2023-12-11 | +7 | 0 comments

Summing up "Scheming AIs" (Section 5)
by Joe_Carlsmith @ 2023-12-09 | +9 | 0 comments

Speed arguments against scheming (Section 4.4-4.7 of “Scheming AIs”)
by Joe_Carlsmith @ 2023-12-08 | +6 | 0 comments

Simplicity arguments for scheming (Section 4.3 of "Scheming AIs")
by Joe_Carlsmith @ 2023-12-07 | +6 | 0 comments

The counting argument for scheming (Sections 4.1 and 4.2 of "Scheming AIs")
by Joe_Carlsmith @ 2023-12-06 | +9 | 0 comments

Arguments for/against scheming that focus on the path SGD takes (Section 3 of...
by Joe_Carlsmith @ 2023-12-05 | +7 | 0 comments

Non-classic stories about scheming (Section 2.3.2 of "Scheming AIs")
by Joe_Carlsmith @ 2023-12-04 | +12 | 0 comments

Does scheming lead to adequate future empowerment? (Section 2.3.1.2 of "Scheming...
by Joe_Carlsmith @ 2023-12-03 | +6 | 0 comments

The goal-guarding hypothesis (Section 2.3.1.1 of "Scheming AIs")
by Joe_Carlsmith @ 2023-12-02 | +6 | 0 comments

How useful for alignment-relevant work are AIs with short-term goals? (Section 2..
by Joe_Carlsmith @ 2023-12-01 | +6 | 0 comments

Is scheming more likely in models trained to have long-term goals? (Sections 2.2..
by Joe_Carlsmith @ 2023-11-30 | +6 | 0 comments

“Clean” vs. “messy” goal-directedness (Section 2.2.3 of “Scheming AIs”)
by Joe_Carlsmith @ 2023-11-29 | +7 | 0 comments

Two sources of beyond-episode goals (Section 2.2.2 of “Scheming AIs”)
by Joe_Carlsmith @ 2023-11-28 | +8 | 0 comments

Two concepts of an “episode” (Section 2.2.1 of “Scheming AIs”)
by Joe_Carlsmith @ 2023-11-27 | +11 | 0 comments

Situational awareness (Section 2.1 of “Scheming AIs”)
by Joe_Carlsmith @ 2023-11-26 | +12 | 0 comments

On “slack” in training (Section 1.5 of “Scheming AIs”)
by Joe_Carlsmith @ 2023-11-25 | +14 | 0 comments

Why focus on schemers in particular (Sections 1.3 and 1.4 of “Scheming AIs”)
by Joe_Carlsmith @ 2023-11-24 | +10 | 0 comments

A taxonomy of non-schemer models (Section 1.2 of “Scheming AIs”)
by Joe_Carlsmith @ 2023-11-22 | +6 | 0 comments

Varieties of fake alignment (Section 1.1 of “Scheming AIs”)
by Joe_Carlsmith @ 2023-11-21 | +6 | 0 comments

New report: "Scheming AIs: Will AIs fake alignment during training in order to...
by Joe_Carlsmith @ 2023-11-15 | +71 | 0 comments

Superforecasting the premises in “Is power-seeking AI an existential risk?”
by Joe_Carlsmith @ 2023-10-18 | +114 | 0 comments

In memory of Louise Glück
by Joe_Carlsmith @ 2023-10-15 | +22 | 0 comments

The “no sandbagging on checkable tasks” hypothesis
by Joe_Carlsmith @ 2023-07-31 | +16 | 0 comments

Predictable updating about AI risk
by Joe_Carlsmith @ 2023-05-08 | +135 | 0 comments

[Linkpost] Shorter version of report on existential risk from power-seeking AI
by Joe_Carlsmith @ 2023-03-22 | +49 | 0 comments

A Stranger Priority? Topics at the Outer Reaches of Effective Altruism (my...
by Joe_Carlsmith @ 2023-02-21 | +64 | 0 comments

Seeing more whole
by Joe_Carlsmith @ 2023-02-17 | +124 | 0 comments

Why should ethical anti-realists do ethics?
by Joe_Carlsmith @ 2023-02-16 | +118 | 0 comments

[Linkpost] Human-narrated audio version of "Is Power-Seeking AI an Existential...
by Joe_Carlsmith @ 2023-01-31 | +9 | 0 comments

On sincerity
by Joe_Carlsmith @ 2022-12-23 | +46 | 0 comments

Against meta-ethical hedonism
by Joe_Carlsmith @ 2022-12-02 | +27 | 0 comments

Against the normative realist's wager
by Joe_Carlsmith @ 2022-10-13 | +26 | 0 comments

Video and Transcript of Presentation on Existential Risk from Power-Seeking AI
by Joe_Carlsmith @ 2022-05-08 | +97 | 0 comments

On expected utility, part 4: Dutch books, Cox, and Complete Class
by Joe_Carlsmith @ 2022-03-24 | +7 | 0 comments

On expected utility, part 3: VNM, separability, and more
by Joe_Carlsmith @ 2022-03-22 | +9 | 0 comments

On expected utility, part 2: Why it can be OK to predictably lose
by Joe_Carlsmith @ 2022-03-18 | +8 | 0 comments

On expected utility, part 1: Skyscrapers and madmen
by Joe_Carlsmith @ 2022-03-16 | +22 | 0 comments

Simulation arguments
by Joe_Carlsmith @ 2022-02-18 | +39 | 0 comments

On infinite ethics
by Joe_Carlsmith @ 2022-01-31 | +96 | 0 comments

The ignorance of normative realism bot
by Joe_Carlsmith @ 2022-01-18 | +25 | 0 comments

Morality and constrained maximization, part 2
by Joe_Carlsmith @ 2022-01-12 | +9 | 0 comments

Morality and constrained maximization, part 1
by Joe_Carlsmith @ 2021-12-22 | +13 | 0 comments

Reviews of "Is power-seeking AI an existential risk?"
by Joe_Carlsmith @ 2021-12-16 | +71 | 0 comments

Anthropics and the Universal Distribution
by Joe_Carlsmith @ 2021-11-28 | +18 | 0 comments

On the Universal Distribution
by Joe_Carlsmith @ 2021-10-29 | +25 | 0 comments

SIA > SSA, part 4: In defense of the presumptuous philosopher
by Joe_Carlsmith @ 2021-10-01 | +8 | 0 comments

SIA > SSA, part 3: An aside on betting in anthropics
by Joe_Carlsmith @ 2021-10-01 | +10 | 0 comments

SIA > SSA, part 2: Telekinesis, reference classes, and other scandals
by Joe_Carlsmith @ 2021-10-01 | +10 | 0 comments

SIA > SSA, part 1: Learning from the fact that you exist
by Joe_Carlsmith @ 2021-10-01 | +16 | 0 comments

Can you control the past?
by Joe_Carlsmith @ 2021-08-27 | +46 | 0 comments

In search of benevolence (or: what should you get Clippy for Christmas?)
by Joe_Carlsmith @ 2021-07-20 | +17 | 0 comments

On the limits of idealized values
by Joe_Carlsmith @ 2021-06-22 | +80 | 0 comments

Draft report on existential risk from power-seeking AI
by Joe_Carlsmith @ 2021-04-28 | +88 | 0 comments

Problems of evil
by Joe_Carlsmith @ 2021-04-19 | +31 | 0 comments

The innocent gene
by Joe_Carlsmith @ 2021-04-05 | +16 | 0 comments

The importance of how you weigh it
by Joe_Carlsmith @ 2021-03-29 | +44 | 0 comments

On future people, looking back at 21st century longtermism
by Joe_Carlsmith @ 2021-03-22 | +102 | 0 comments

Against neutrality about creating happy lives
by Joe_Carlsmith @ 2021-03-15 | +95 | 0 comments

Care and demandingness
by Joe_Carlsmith @ 2021-03-08 | +54 | 0 comments

Subjectivism and moral authority
by Joe_Carlsmith @ 2021-03-01 | +15 | 0 comments

Two types of deference
by Joe_Carlsmith @ 2021-02-22 | +10 | 0 comments

Contact with reality
by Joe_Carlsmith @ 2021-02-15 | +47 | 0 comments

Killing the ants
by Joe_Carlsmith @ 2021-02-07 | +232 | 0 comments

Believing in things you cannot see
by Joe_Carlsmith @ 2021-02-01 | +15 | 0 comments

On clinging
by Joe_Carlsmith @ 2021-01-24 | +29 | 0 comments

A ghost
by Joe_Carlsmith @ 2021-01-21 | +2 | 0 comments

Actually possible: thoughts on Utopia
by Joe_Carlsmith @ 2021-01-18 | +86 | 0 comments

Alienation and meta-ethics (or: is it possible you should maximize helium?)
by Joe_Carlsmith @ 2021-01-15 | +19 | 0 comments

The impact merge
by Joe_Carlsmith @ 2021-01-13 | +29 | 0 comments

Shouldn't it matter to the victim?
by Joe_Carlsmith @ 2021-01-11 | +22 | 0 comments

Thoughts on personal identity
by Joe_Carlsmith @ 2021-01-08 | +21 | 0 comments

Grokking illusionism
by Joe_Carlsmith @ 2021-01-06 | +29 | 0 comments

The despair of normative realism bot
by Joe_Carlsmith @ 2021-01-03 | +34 | 0 comments

Thoughts on being mortal
by Joe_Carlsmith @ 2021-01-01 | +60 | 0 comments

Wholehearted choices and "morality as taxes"
by Joe_Carlsmith @ 2020-12-21 | +89 | 0 comments