Blog update: Reflective altruism

By David Thorstad @ 2024-12-15T16:05 (+107)

1. Introduction

My blog, Reflective Altruism, aims to use academic research to drive positive change within and around the effective altruism movement. Part of that mission involves engagement with the effective altruism community. For this reason, I try to give periodic updates on blog content and future directions (previous updates: here and here).

In today’s post, I want to say a bit about new content published in 2024 (Sections 2-3) and give an overview of other content published so far (Section 4). I’ll also say a bit about upcoming content (Section 5) as well as my broader academic work (Section 6) and talks (Section 7) related to longtermism. Section 8 concludes with a few notes about other changes to the blog.

I would be keen to hear reactions to existing content or suggestions for new content. Thanks for reading.

2. New series this year

I’ve begun five new series since last December.

  1. Against the singularity hypothesis: One of the most prominent arguments for existential risk from artificial agents is the singularity hypothesis. The singularity hypothesis holds roughly that self-improving artificial agents will grow at an accelerating rate until they are orders of magnitude more intelligent than the average human. I think that the singularity hypothesis is not on as firm ground as many advocates believe. My paper, “Against the singularity hypothesis,” makes the case for this conclusion. I’ve written a six-part series Against the singularity hypothesis summarizing this paper. Part 1 introduces the singularity hypothesis. Part 2 and Part 3 together give five preliminary reasons for doubt. The next two posts examine defenses of the singularity hypothesis by Dave Chalmers (Part 4) and Nick Bostrom (Part 5). Part 6 draws lessons from this discussion.
  2. Harms: Existential risk mitigation efforts have important benefits but also identifiable harms. This series discusses some of the most important harms of existential risk mitigation efforts. Part 1 discusses distraction from other pressing issues. Part 2 discusses surveillance. Part 3 discusses delayed technological development. Part 4 discusses inequality.
  3. Human biodiversity: Human biodiversity (HBD) is the latest iteration of modern race science. This series discusses the impact of HBD on effective altruism and adjacent communities, as well as the harms done by debating and propounding race science. Part 1 introduces the series. Part 2 focuses on events at Manifest 2023 and Manifest 2024. Part 3 discusses Richard Hanania. Part 4 discusses Scott Alexander.
  4. Papers I learned from: This series highlights papers that have informed my own thinking and draws attention to what might follow from them. Part 1 discusses a paper on temporal discounting by Harry Lloyd. Part 2 discusses a paper by Richard Pettigrew drawing a surprising consequence for longtermism on some leading approaches to risk-averse decisionmaking. Part 3 discusses a reply to Pettigrew by Nikhil Venkatesh and Kacper Kowalczyk.
  5. The scope of longtermism: Longtermists hold that in a large class of decision situations, the best thing we can do is what is best for the long-term future. The scope question for longtermism asks: how large is the class of decision situations for which this is true? This series discusses my paper on the scope of longtermism. I argue that the scope of longtermism may be narrower than many longtermists suppose. Part 1 introduces the series. Part 2 introduces one concern: rapid diminution. Part 3 discusses a second concern: washing out. The full paper contains a third concern and further discussion, both of which will be reflected in later posts in this series.

3. Continuations of old series

In addition to these new series, I have also added to three existing series this year.

  1. Billionaire philanthropy: Effective altruism relies increasingly on a few billionaire donors to sustain its operations. In this series, I ask what drives billionaire philanthropists, how they are taxed and regulated, and what sorts of influence they should be allowed to wield within a democratic society. An initial series of posts from late 2022 to early 2023 discussed the relationship between philanthropy and democracy (Part 2), patient philanthropy (Part 3), philanthropic motivations (Part 4), sources of wealth (Part 5), and extravagant spending (Part 6). After a long hiatus, I recently added the first of two posts on the role of donor discretion (Part 7).
  2. Epistemics: Effective altruists use the term ‘epistemics’ to describe practices that shape knowledge, belief and opinion within a community. This series focuses on areas in which community epistemics could be productively improved. This year, I added new posts discussing the role of legitimate authority within the effective altruism movement and a distinction between two types of decoupling.
  3. Exaggerating the risks: One of the core series of this blog, Exaggerating the risks, argues that many existential risk claims have been exaggerated. Part 1 introduced the series. Parts 2-5 (sub-series: “Climate risk”) looked at climate risk. Parts 6-8 (sub-series: “AI risk”) looked at the Carlsmith report on power-seeking AI. Parts 9-17 (sub-series: “Biorisk”) looked at biorisk. Since last December, I have rounded out the discussion of biorisk by responding to arguments by Toby Ord (Part 13) and Will MacAskill (Part 14), as well as to claimed biorisks from LLMs (Part 15 and Part 16). I concluded the subseries and drew lessons in Part 17.

4. Additional series

Not every series was expanded in 2024. In two cases, this happened because the series was complete.

  1. Existential risk pessimism and the time of perils: This series is based on my paper “Existential risk pessimism and the time of perils” (published here; note title change). The paper develops a tension between two claims: Existential Risk Pessimism (levels of existential risk are very high) and the Astronomical Value Thesis (efforts to reduce existential risk have astronomical value). It explores the Time of Perils hypothesis as a way out of the tension. Part 1 introduces the tension. Part 2 reviews failed solutions. Part 3 reviews a better solution: the Time of Perils hypothesis. Parts 4-6 review arguments for the time of perils hypothesis and argue that they do not succeed. Part 4 discusses space settlement. Part 5 discusses the existential risk Kuznets curve. Part 6 discusses wisdom growth. Part 7 discusses an application to the moral mathematics of existential risk, pursued further in my series on mistakes in the moral mathematics of existential risk. Part 8 discusses further implications. Part 9 discusses objections and replies. Special thanks are due to Toby Ord, who introduced a substantial fraction of the models used in this paper, as well as to Tom Adamczewski, who wrongly refuses credit for advancing these models. (A stripped-down sketch of the kind of model behind this tension appears just after this list.)
  2. Mistakes in the moral mathematics of existential risk: This series is based on my paper “Mistakes in the moral mathematics of existential risk”. The paper discusses three mistakes in the way that existential risk mitigation efforts are often valued. Correcting these mistakes reduces the expected value of existential risk mitigation efforts and also reveals important areas for future work. Part 1 discusses the first mistake: confusing cumulative existential risk with risk during a smaller time period. Part 2 discusses a second mistake, ignoring background risk, raised in my earlier paper on existential risk pessimism. Part 3 discusses the final mistake: ignoring population dynamics. Part 4 generalizes this discussion to more optimistic assumptions about population dynamics. Part 5 discusses implications.
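
To give a feel for where the tension in the first of these series comes from, here is a stripped-down sketch of the kind of model at issue. This is my own simplification rather than the paper’s full treatment: assume a constant value v for each century humanity survives, a constant per-century risk r, and an intervention that cuts this century’s risk by a relative fraction f.

```latex
% Stripped-down sketch (my simplification; the paper considers a broader family of models).
% v: value of each century humanity survives
% r: constant per-century existential risk
% f: relative reduction of this century's risk, from r to (1-f)r
\begin{align*}
  V &= \sum_{n=1}^{\infty} (1-r)^{n}\, v = \frac{v(1-r)}{r}
      && \text{expected value of the future} \\
  V_{f} &= \bigl(1-(1-f)r\bigr)\left(v + \frac{v(1-r)}{r}\right)
         = \bigl(1-(1-f)r\bigr)\frac{v}{r}
      && \text{value after the intervention} \\
  V_{f} - V &= f\,v
      && \text{gain from the intervention}
\end{align*}
```

On these assumptions, the payoff of cutting this century’s risk is capped at roughly one century’s worth of value, however long the future might be, and a high r (Pessimism) also makes the expected value of the future itself modest. Recovering astronomical value requires something like the Time of Perils hypothesis, on which r falls sharply after an initial perilous period.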

Several other series are on an extended pause.

  1. Academics review What we owe the future: In this series, I draw lessons from some of my favorite academic reviews of What we owe the future. Part 1 discusses Kieran Setiya’s review, focusing on the intuition of neutrality in population ethics. Part 2 discusses Richard Chappell’s review, which found many things to like in the book. Part 3 discusses Regina Rini’s review, focusing on the demandingness of longtermism, cluelessness, and the inscrutability of existential risk. I think that this series will remain paused for two reasons. First, several years have passed since the publication of What we owe the future. Second, I have not been overly impressed with the quality of many recent reviews.
  2. Belonging: This series discusses questions of inclusion and belonging within and around the effective altruism movement, with a focus on identifying avenues for positive and lasting change. Part 1, Part 2 and Part 3 discuss the Bostrom email and subsequent controversy. Part 4 discusses the TIME magazine investigation into sexual harassment and abuse within the effective altruism movement. I was very happy to see that this series did not grow in 2024, which was in many ways a calmer year than 2023. I hope that fewer posts will be necessary in this series going forward.
  3. The good it promises: This series is based on a volume of essays entitled The good it promises, the harm it does: Critical essays on effective altruism. The volume brings together a diverse collection of scholars, activists and practitioners to critically reflect on effective altruism. In this series, I draw lessons from papers contained in the volume. Part 1 introduces the book. Part 2 discusses Simone de Lima’s work on colonialism in vegan advocacy. Part 3 discusses Carol J. Adams’ work on lessons from feminist care ethics. Part 4 discusses Lori Gruen’s work on systematic change. Part 5 discusses work by Andrew deCoriolis and colleagues on strategies for animal advocacy. Part 6 and Part 7 discuss Alice Crary’s critique of effective altruism. I think that I have probably written all that I care to write about this book, though I may continue this series if there is interest in examining any specific chapters that have not yet been discussed.

5. Upcoming content

In 2025, I hope to add to existing series as well as to introduce several new series. Here are three series that I would like to introduce, if possible, in 2025:

  1. Beyond longtermism: When I came to Oxford in 2020, it was widely thought that a traditional package of consequentialist-adjacent views such as consequentialism, totalism, and fanaticism led almost inevitably to longtermism. The primary aim of my research on longtermism has been to show how someone who likes science, mathematics, decision theory, and the elements of a traditional consequentialist package may nonetheless not be convinced by longtermism. I am writing a book, Beyond longtermism, to weave together a series of challenges which together may amount to a case against longtermism that is compelling even to those with normative views similar to my own. I hope to start a blog series on this book in 2025, with the caveat that this series may need to be delayed to appease potential publishers.
  2. Getting it right: There are many things I would like to change about the effective altruism movement. But there are also important things that effective altruists get right. I spoke briefly in 2022 about ten things that I think the effective altruism movement gets right. I hope to make many of these points at post-length next year.
  3. What power-seeking theorems do not show: A leading argument for existential risk from artificial intelligence is the argument from power-seeking. This argument claims that a wide variety of artificial agents would find power conducive to their goals and hence would pursue it, leading to the permanent and existentially catastrophic disempowerment of humanity. A spate of recent power-seeking theorems aims to show formally that a wide variety of artificial agents would seek power in this way. I don’t think that these theorems succeed. My paper, “What power-seeking theorems do not show,” explains why I am not convinced. This series would discuss the paper together with some content, such as coverage of an early power-seeking theorem out of MIRI, that will likely be cut by journal referees.

I also hope to add to existing series in 2025. Here are some examples of the additions that I hope to make:

  1. Harms: I hope to continue my discussion of potential harms from existential risk mitigation efforts. Potential harms that I would like to discuss include regulatory capture and opportunity cost.
  2. Human biodiversity: There is a lot more to cover in this series. My plan is to continue by discussing rationalist-adjacent venues, focusing for several posts on the communities surrounding Astral Codex Ten and LessWrong. Then I will discuss the role of HBD in more traditional effective altruist venues, such as the EA Forum. Thankfully, HBD plays a smaller role in many corners of the effective altruist community than it does in the rationalist community. However, I think it is worth noting places where HBD still plays an important role.
  3. Papers I learned from: I would like to continue this series by allowing the authors of papers to speak for themselves, as in Part 3 of the series. I have tentatively secured a guest post by Dmitri Gallow on his paper about instrumental convergence. After that, I have no firm plans, though I am open to suggestions.
  4. The scope of longtermism: Parts 1-3 of this series have been written. These posts cover approximately the first half of my paper. I hope to wrap up the second half of this series in 2025.
  5. Billionaire philanthropy: At the very least, I hope to round out my discussion of donor discretion from Part 7. I may write further posts in this series, but it is not my highest priority at the moment.
  6. Epistemics: The next post in this series will discuss the practice of ironic authenticity, in which authors use irony to convey politically sensitive messages that the author likely endorses while providing a degree of plausible deniability. After that, I hope to write something about the declining role of global priorities research in many corners of the effective altruism movement.
  7. Exaggerating the risks: I certainly hope to add many posts to this series in 2025. I am not entirely sure which posts I would like to add. The natural continuation of this series would be a sub-series on AI risk. However, my existing academic papers cover most of the topics about AI risk on which I am comfortable speculating. I think it may be helpful to write a sub-series on lessons from previous discussions of existential risk, such as past overestimates of risk from self-replicating nanobots, the history of technology panics, and an argument by Petra Kosonen for an optimistic meta-induction from the failure of past risk predictions to the likely failure of current risk predictions.

6. Academic papers

I am, for the most part, a traditional academic. The core of my work is my academic research, expressed in academic papers and sometimes in scholarly monographs. I think that much of my academic work is of a significantly higher quality than my blogging, and I try when possible to help make the contents of my academic papers accessible, for example by writing blog series about them.

So far, I have written five academic papers about longtermism, four of which have been published. Here are the papers and their abstracts. I would encourage interested readers to have a look at the papers. I’ve linked to penultimate drafts of published papers so that they will not be behind a paywall (except when the paper itself is open access).

  1. Against the singularity hypothesis: The singularity hypothesis is a hypothesis about the future of artificial intelligence on which self-improving artificial agents will quickly become orders of magnitude more intelligent than the average human. Despite the ambitiousness of its claims, the singularity hypothesis has been defended at length by leading philosophers and artificial intelligence researchers. In this paper, I argue that the singularity hypothesis rests on undersupported growth assumptions. I show how leading philosophical defenses of the singularity hypothesis fail to overcome the case for skepticism. I conclude by drawing out philosophical and policy implications of this discussion.
  2. High risk, low reward: A challenge to the astronomical value of existential risk mitigation: Many philosophers defend two claims: the astronomical value thesis that it is astronomically important to mitigate existential risks to humanity, and existential risk pessimism, the claim that humanity faces high levels of existential risk. It is natural to think that existential risk pessimism supports the astronomical value thesis. In this paper, I argue that precisely the opposite is true. Across a range of assumptions, existential risk pessimism significantly reduces the value of existential risk mitigation, so much so that pessimism threatens to falsify the astronomical value thesis. I argue that the best way to reconcile existential risk pessimism with the astronomical value thesis relies on a questionable empirical assumption. I conclude by drawing out philosophical implications of this discussion, including a transformed understanding of the demandingness objection to consequentialism, reduced prospects for ethical longtermism, and a diminished moral importance of existential risk mitigation.
  3. Mistakes in the moral mathematics of existential risk: Longtermists have recently argued that it is overwhelmingly important to do what we can to mitigate existential risks to humanity. I consider three mistakes that are often made in calculating the value of existential risk mitigation. I show how correcting these mistakes pushes the value of existential risk mitigation substantially below leading estimates, potentially low enough to threaten the normative case for existential risk mitigation. I use this discussion to draw four positive lessons for the study of existential risk. (A small arithmetic illustration of the first of these mistakes appears just after this list.)
  4. The scope of longtermism: Longtermism is the thesis that in a large class of decision situations, the best thing we can do is what is best for the long-term future. The scope question for longtermism asks: how large is the class of decision situations for which this is true? In this paper, I suggest that the scope of longtermism may be narrower than many longtermists suppose. I identify a restricted version of longtermism: swamping axiological strong longtermism (swamping ASL). I identify three scope-limiting factors - probabilistic and decision-theoretic phenomena which, when present, tend to reduce the prospects for swamping ASL. I argue that these scope-limiting factors are often present in human decision problems, then use two case studies from recent discussions of longtermism to show how the scope-limiting factors lead to a restricted, if perhaps nonempty, scope for swamping ASL.
  5. What power-seeking theorems do not show: Recent years have seen increasing concern that artificial intelligence may soon pose an existential risk to humanity. One leading ground for concern is that artificial agents may be power-seeking, aiming to acquire power and in the process disempowering humanity. A range of power-seeking theorems seek to give formal articulation to the idea that artificial agents are likely to be power-seeking. I argue that leading theorems face five challenges, then draw lessons from this result.
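
As a quick arithmetic illustration of the first mistake discussed in the third paper (confusing cumulative existential risk with risk during a smaller time period), here is the relationship between constant per-century risk and cumulative risk. The numbers below are my own toy figures, not estimates from the paper; the point is only that the two quantities can differ by orders of magnitude, so substituting one for the other changes valuations dramatically.

```latex
% Per-century risk r vs. cumulative risk over N centuries
% (toy numbers for illustration only, not estimates from the paper).
\begin{align*}
  \text{cumulative risk over } N \text{ centuries} &= 1 - (1-r)^{N} \\
  r = 0.002,\ N = 1000 &\;\Rightarrow\; 1 - 0.998^{1000} \approx 0.86 \\
  \text{cumulative risk of } 0.5 \text{ over } N = 100 \text{ centuries}
      &\;\Rightarrow\; r = 1 - 0.5^{1/100} \approx 0.0069
\end{align*}
```

An intervention that only reduces risk during a smaller time period should be valued against the risk in that period rather than against cumulative risk; conflating the two is the first of the three mistakes.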

Finally, a quick thanks to the Survival and Flourishing Fund for funding my work on “What power-seeking theorems do not show.”

7. Talks and podcasts


Not everyone likes reading long papers or blog posts. I know that my writing is not especially accessible, so I have tried in the past several years to make the main ideas of my papers accessible by increasing the number of recorded talks and podcasts discussing my work. Here are some recent talks I have given on my papers:

  1. Against the singularity hypothesis: ANU; AI Safety Reading Group; EAGxVirtual 2024.
  2. High risk, low reward: A challenge to the astronomical value of existential risk mitigation: EAGxCambridge (full); EAGxOxford (short).
  3. Mistakes in the moral mathematics of existential risk: Self-recorded talk.
  4. The scope of longtermism: 7th Oxford Workshop on Global Priorities Research.

I’ve also done podcasts at The Gradient and Critiques of EA. I’ll be recording an episode of Bio(un)ethical with Leah Pierson and Sophie Gibert in January 2025.

8. Other blog changes

There have been two main changes to the blog this year. First, I’ve decreased my posting frequency from weekly to biweekly (once every two weeks). I did this because I am committed to providing the highest quality content that I can, and after starting a new job I was no longer confident that I could continue to provide high-quality content on a weekly basis. I think that I will probably continue posting biweekly in the future.

Second, I’ve finished migrating the site to its new domain, reflectivealtruism.com. This completes my effort to bring the branding of the blog in line with its mission of driving positive change within and around the effective altruism movement. Thanks to the many of you who contributed to the renaming and transition process.


mhendric🔸 @ 2024-12-16T16:24 (+33)

I love your blog and reliably find it to provide the highest-quality EA criticism I have found. I shifted my view on a handful of issues based on it. 

It may be helpful for non-philosophy readers to know that the journals these papers are published in are very impressive. For example, Ethics (the Mistakes in Moral Math of Longtermism paper) is the most well-regarded ethics journal I know of in our discipline, akin to e.g. what Science or Nature would be for a natural scientist.

I am somewhat disheartened that those papers did not gain visible uptake from key players in the EA space (e.g. 80K, Openphil), especially since it was published when most EA organizations strike me as moving strongly towards longtermism/AI risk. My sense is that it was briefly acknowledged, then simply ignored. I don't think that the same would have happened with e.g. a Science or Nature paper.

To stick with the Mistakes in Moral Math paper, for example: I think it puts forward a very strong argument against the very few explicit numerical models of EV calculations for longtermist causes. A natural longtermist response would be to either adjust models or present new models, incorporating factors such as background risk that are currently not factored in. I have not seen any such models. Rather, I feel like longtermist pitches often get very handwavey when pressed on explicit EV models that compare their interventions to e.g. AMF or Give Directly. I take it to be a central pitch of your paper that it is very bad that we have almost no explicit numerical models, and that those we have neglect crucial factors. To me, it seems like that very valid criticism went largely unheard. I have not seen new numerical EV calculations for longtermist causes since publication. This may of course be a me problem - please send me any such comparative analyses you know!

 I don't want to end on such a gloomy note - even if I were right that these criticisms are valid, and that EA fails to update on them, I am very happy that you do this work. Other critics often strike me as arguing in bad faith or being fundamentally misinformed - it is good to have a good-faith, quality critique to discuss with people. And in my EA-adjacent house, we often discuss your work over beers and food and greatly enjoy it haha. Please keep it coming!
 

justsaying @ 2024-12-17T03:00 (+29)

I think your coverage of Scott Alexander's alleged association with HBD is both unfair and unethical. This is evidenced in part by the fact that you lead your post about him with an allegedly leaked private email. You acknowledge deep into your post that you are largely basing your accusations on various unconfirmed sources yet you repeatedly summarize your claims about him without such disclaimers. Even if the email was real, it seems to form almost the entire basis of your case against him and you don't know the context of a private email. Taking the email at face value, it does not say the things you imply it says.

I don't know Scott personally but I have been a reader of his blog and various associated forums for many years. Contrary to your characterization, he has in fact actively pushed back against a lot of discussion around HBD on his blog and related spaces. I think your posting about him undermines your credibility elsewhere.

David Thorstad @ 2024-12-17T14:41 (+38)

Thanks! 

My discussion of Alexander is based on a number of sources. One of the primary sources is the email that you mention, though I do not think it is accurate to say that this source forms almost the entire basis of my case.

I do not publish documents whose authenticity I have significant reason to doubt, and in this case I excluded several documents because I could not confirm their authenticity to my satisfaction. In the case of the email that you discuss, my reasons for posting the email are as follows.

First, to my knowledge Alexander has never denied authorship of the email. This would, presumably, be a natural first step for someone who did not write it. Second, I asked publicly for credible denials by anyone, not just Alexander, of the authenticity of the email and found none. Third, I wrote personally to Alexander and offered him the opportunity to comment on the authenticity of the email. Alexander declined.

I am surprised by your comments about disclaimers. My intention was to include enough disclaimers to make a tax lawyer blush. For example:

First, the quote which leads the article does include a disclaimer: it is cited as stemming from an "(Alleged) email to Topher Brennan".

Second, the preliminary notes to the article directly express my attitudes towards authenticity, and offer to publicly apologize if I am wrong in attributing views to Alexander or anyone else. (So far, no one has offered any evidence that I was wrong, or even attempted to offer any sort of denial, evidenced or not.) I wrote: "A note about authenticity. I have done my best to verify that the cited views in this post are the work of the authors they are attributed to. This is difficult because Alexander has alleged that some of the most troubling comments attributed to him are not his. My policy has been to stick with the sources whose authenticity I am most confident in, and to the best of my knowledge the authenticity of all passages quoted in this post has not been publicly disputed by their alleged authors. (Alexander declined a request for comment on the authenticity of passages quoted in this post). However, I am conscious of the possibility of error. While I have closed comments on all posts in this series, I am highly open to correction on any matters of fact, and if I am wrong in attributing any views to an individual I will correct them and publicly apologize. You can reach me at reflectivealtruismeditor@gmail.com."

Third, immediately before presenting Alexander's email I write: "I am not able to confirm the veracity of the email, so readers are asked to use their own judgment, but I am also not aware of credible attempts by anyone, including Alexander, to deny the veracity of this email (Alexander declined to comment on the veracity of the email. I did check for other credible attempts to deny its veracity.)."

These are not the only disclaimers included in the article, but I think that they are more than adequate and included in the proper places. 

You seem skeptical of some of the implications that I draw from this email. I quote the email in full, then draw six implications, giving evidence for each. If you are concerned about some of these implications, you are of course welcome to say which implications you are concerned about and give reasons to think that they do not follow. 

I do acknowledge that Alexander has pushed back against some of the worst excesses of race science, including neoreaction. For example, I wrote: "I would be remiss if I did not remind readers that not all of Alexander’s contributions to this space are bad. Alexander has penned one of the most successful and best-known criticisms of the excesses of the reactionary movement, in the form of his 2013 “Anti-reactionary FAQ“, which has inspired useful follow-ups from others. Alexander does write some posts supportive of the cause of social justice, such as his “Social justice for the highly demanding of rigor,” though the implication that rigor is not already present in discussions of social justice may not be appreciated by all readers. Alexander has drawn attention to the real and important phenomenon of sexual harassment perpetrated by women, or against men. And we will see below that Alexander calls out Richard Hanania for his strange obsession with office romance." 

All of these behaviors are consistent with the conclusions that I draw in the article. I do not, and will not say that Alexander has never done anything to push back against racism or sexism. That would be a bald-faced lie. It is possible to push back against some bad behaviors and beliefs on these fronts while engaging in other troubling behaviors and holding other troubling beliefs. 

Jason @ 2024-12-17T14:47 (+34)

David offered Scott an opportunity to comment, and Scott turned it down. You can see that David also offered Julia an opportunity to comment on a short quote included in the blog post, and he ran the four-paragraph statement she provided which included context for the quoted material. Moreover, the leaked e-mail is not new, and Scott has had both the time and the platform to contextualize it if he wished to do so. 

If Scott has made public statements clearly inconsistent with the e-mail that David missed, then it would be much more helpful to show the receipts rather than vaguely assert that he has "actively pushed back against a lot of discussion around HBD on his blog and related spaces." Even accepting that assertion at face value, a history of pushing back against some unspecified elements of HBD discussion isn't inconsistent with the views expressed in the e-mail or otherwise discussed in David's post.

David Mathers🔸 @ 2024-12-17T12:12 (+16)

"I think your posting about him undermines your credibility elsewhere." This seems worryingly like epistemic closure to me (though it depends a bit what "elsewhere" refers to.) A lot of Thorstad's work is philosophical criticism of longtermist arguments, and not super-technical criticism either. You can surely just assess that for yourself rather than discounting it because of what he said about an unrelated topic, unless he was outright lying. I mostly agree with Thorstad's conclusions about Scott's views on HBD, but whilst that makes me distrust Scott's political judgement, it doesn't effect my (positive) view of the good stuff Scott has written about largely unrelated topics like whether antidepressants work, or the replication crisis.  

I'd also say that the significance of Scott sometimes pushing back against HBD stuff is very dependent on why he pushes back. Does he push back because he thinks people are spreading harmful ideas? Or does he push back because he thinks if the blog becomes too associated with taboo claims it will lose influence, or bring him grief personally? The former would perhaps indicate unfairness in Thorstad's portrayal of him, but the latter certainly would not. In the leaked email (which I think is likely genuine, or he'd say it wasn't, but of course we can't be 100% sure) he does talk about strategising to maintain his influence with liberals on this topic. My guess, as a long-time reader, is that it's a bit of both. I don't think Scott is sympathetic to people genuinely wanting to hurt Black people, and I'm sure there are Reactionary claims about race that he thinks are just wrong. But he's also very PR-conscious on this topic in my view. And it's hard to see why he's had so many HBD-associated folk on his blogroll if he doesn't want to quietly spread some of the ideas.

Jonas Hallgren @ 2024-12-18T12:45 (+21)

Thank you for this post David! 

I've from time to time engaged with my friends in discussion about your criticisms of longtermism and some existential risk calculations. I found that this summary post of your work and interaction clarifies my perspective on the general "inclination" that you have in engaging with the ideas, one that seems like a productive one!

Sometimes, I felt that it didn't engage with some of the core underlying claims of longtermism and existential risk, which did annoy me.

I want to respect the asymmetry in time spent underlying the following question, as I feel I could make myself less ignorant if I had the time to spend, which I feel I currently do not have. But what are your thoughts on Toby Ord's perspective and posts on existential risk? https://forum.effectivealtruism.org/posts/XKeQbizpDP45CYcYc/on-the-value-of-advancing-progress
and:
https://forum.effectivealtruism.org/posts/hh7bgsDzP6rKZ5bbW/robust-longterm-comparisons

I felt that some of the arguments were about discount rates and that they didn't really make that much moral sense to me, and neither did person-affecting views. I have a hard time seeing the arguments for them and maybe that's just the crux of the matter.

The following will be unfair to say, as I haven't spent the time required to fully understand your models, but I sometimes feel that there are deeper underlying assumptions and questions that you pass by in your arguments.

I will go to a domain I know well, AI Safety. For example, I agree that the power-seeking arguments are not fully true, especially not the early papers, yet it doesn't engage with later follow-up work such as:
https://arxiv.org/pdf/2303.16200

Finally, I believe that for the power-seeking claim, there's a large amount of evidence for power-seeking within real-world systems. For me it seems an overstep to reject power-seeking due to MIRI work?

You can redefine power-seeking itself as minimizing free energy, which is itself a theory of predictive processing or Active Inference, and that has been shown to have remarkable predictive capacity for saying useful things about systems that are alive. Yes, a specific interpretation of power-seeking may not hold true, but for me it is throwing the baby out with the bathwater.

I would love to hear your thoughts here and I'm looking forward to more good-faith discussions! (this is not sarcasm but I'm genuinely happy that you're engaging with good faith arguments!)

Edit: I do want to clarify that I do not believe that any AI system will converge towards instrumental goals, that it does make sense to question the foundations of the AI Safety assumptions, and that I applaud you for doing so. It is rather a question of how much it will do so, under what conditions, and in what systems. (I also made the language less combative.)

David Thorstad @ 2024-12-18T16:40 (+8)

Thanks Jonas! I really appreciate your constructive engagement. 

I’m not very sympathetic to pure time preference – in fact, most philosophers hate pure time preference and are leading the academic charge against it. I’m also not very sympathetic to person-affecting views, though many philosophers whom I respect are more sympathetic. I don’t rely on either of these views in my arguments, because I think they are false.


In general, my impression on the topic of advancing progress is that longtermists have not yet identified plausible interventions which could do this well enough to be worth funding. There has certainly been a lot written about the value of advancing progress, but not many projects have been carried out to actually do this. That tends to suggest to me that I should hold off on commenting about the value of advancing progress until we have more concrete proposals on the table. It also tends to suggest that advancing progress may be hard. 

I think that covers most of the content of Toby’s posts (advancing progress and temporal discounting). Perhaps one more thing that would be important to stress is that I would really like to see Oxford philosophers following the GPI model of putting out smaller numbers of rigorous academic papers rather than larger numbers of forum posts and online reports. It’s not very common for philosophers to engage with one another on internet forums – such writing is usually regarded as a form of public outreach. My first reaction to EA Forum posts by Toby and others would be that I’d like to see them written up as research papers. When ideas are written up online before they are published in academic venues, we tend to be skeptical that the research has actually been done or that it would hold up to scrutiny if it were.

On power-seeking, one of the most important features of academic papers is that they aim to tackle a single topic in depth, rather than a broad range of topics in less depth. So for example, my power-seeking paper does not engage with evolutionary arguments by Hendrycks and others because they are different arguments and would need to be addressed in a different paper.

To be honest, my selection of arguments to engage with is largely driven by the ability of those making the arguments to place them in top journals and conferences. Academics tend to be skeptical of arguments that have not cleared this bar, and there is little career value in addressing them. The reason why I addressed the singularity hypothesis and power-seeking is that both have had high-profile backing by quality authors (Bostrom, Chalmers) or in high-profile conferences (NeurIPS). I’d be happy to engage with evolutionary arguments when they clear this bar, but they’re a good ways off yet and I’m a bit skeptical that they will clear it. 

I don’t want to legislate a particular definition of power, because I don’t think there’s a good way to determine what that should be. My concern is with the argument from power-seeking. This argument needs to use a notion of power such that AI power-seeking would lead to existentially catastrophic human disempowerment. We can define power in ways that make power-seeking easier to establish, but this mostly passes the buck, since we will then need to argue why power-seeking in this sense would lead to existentially catastrophic human disempowerment.

One example of this kind of buck-passing is in the Turner et al. papers. For Turner and colleagues, power is (roughly) the ability to achieve valuable outcomes or to promote an agent’s goals. It’s not so hard to show that agents will tend to be power-seeking in this sense, but this also doesn’t imply that the results will be existentially catastrophic without further argument, and that argument will probably take us substantially beyond anything like the traditional argument from power-seeking. 

Likewise, we might, as you suggest, think about power-seeking as free-energy minimization and adopt a view on which most or all systems seek to minimize free energy. But precisely because this view takes free energy minimization to be common, it won’t put us very far on the way towards arguing that the results will be existentially catastrophic, nor even that they will involve human disempowerment (in the sense of failure to minimize free energy). 

I agree with you that it is unfair to conclude much from the failure of MIRI arguments. The discussion of the MIRI paper is cut from the version of the power-seeking paper that I submitted to a journal. Academic readers are likely to have had high credence that the MIRI argument was bad before reading it, so they will not update much on the failure of this argument. I included this argument in the extended version of the paper because I know that some effective altruists were moved by it, and also because I didn’t know of any other formal work that broke substantially from the Turner et al. framework. I think that the MIRI argument is not anywhere near as good as the Turner et al. argument. While I have many disagreements with Turner et al., I want to stress that their results are mathematically sophisticated and academically rigorous. The MIRI paper is neither.

I hope that helps! 

Jonas Hallgren @ 2024-12-19T08:53 (+6)

Thank you for that substantive response; I really appreciate it! It was also very nice that you mentioned the Turner et al. definitions; I wasn't expecting that.

(Maybe write a post on that? There's a comment that mentions uptake from major players in the EA ecosystem, and maybe if you acknowledge that you understand the arguments they would be more sympathetic? Just a quick thought, but it might be worth engaging there a bit more?)

I just wanted to clarify some of the points I was trying to make yesterday as I do realise that they didn't all get across as I wanted them to. 

I completely agree with you on the advancing progress point. I personally am quite against it at a "general" level; I do not believe that we will be able to counterfactually change the "rowing" speed that much in the grand scheme of things. I also believe that is the conclusion of Toby's posts, if I remember correctly. Toby was rather stating that existential risk reduction is worth a lot compared to any progress that we might be able to make. "Steering" away from the bad stuff is worth more. (That's the implicit claim from the modelling, even though he's as epistemically humble as you philosophers always are (which is commendable!).)

Now for the power-seeking stuff. I appreciate your careful reasoning about these things and I see what you mean in that there's no threat model from that claim in itself. If we say that the classical way it is construed is something that is equivalent to minimizing free energy, this is a tautological statement and doesn't help for existential risk. 

I think I can agree with you that we're not clear enough about the existential risk angle to have a clearly defined goal for what to do. I do think there's an argument there, but we have to be quite clear about how we're defining it for it to make foundational sense. A question that arises is whether, in the process of working on it, we get more clarity about what it fundamentally is, similar to a startup figuring out what they're doing along the way? It might still be worth the resources from an unknown-unknowns perspective and from the perspective of shifting institutional practices, if that makes sense? TAI is such a big thing and it will only happen once, so spending those resources on relatively shaky foundations might still make sense?

I'm, however, not sure that this is the case and Wei Dai for example has an entire agenda about "metaphilosophy" where the claim is that we're too philosophically confused to make sense of alignment. In general, I would agree that ensuring the philosophical and mathematical basis is very important to coordinate the field and it is something I've been thinking about for a while. 

I personally am trying to import ideas from existing fields that deal with generally intelligent agents in biology and cognitive science such as Active Inference and Computational Biology into the mix to see how TAI will affect society. If we see smaller branches of science as specific offshoots of philosophy then I think the places with the most rigorous thinking on the foundations are the ones that have dealt with it for a long time. I've found a lot of interesting models about misalignment in these areas that I think can be transported into the AI Safety frame. 

I really appreciate the deconstructive approach that you have to the intellectual foundations of the field. I do believe that there are alternatives to the classic risk story but you have to some extent break down the flaws in the existing arguments in order to advocate for new arguments. 

Finally, I think these threat models come from arguments similar to the ones in What Failure Looks Like from Paul Christiano and the "going out with a whimper" idea. This is also explored in Yuval Noah Harari's books Nexus and Homo Deus. This threat model is more similar to the authoritarian capture idea than to something like a runaway intelligence explosion.

I'm looking forward to more work in this area from you! 

Toby Tremlett🔹 @ 2024-12-16T12:20 (+12)

Thanks for sharing, David! And thank you for your continued dedication to providing constructive criticism. 

I've got a pretty long "to read" backlog; does anyone have recommendations for posts from Reflective Altruism which particularly changed their mind / gave them action-relevant updates?

Also, a question for David: is there a particular area of beliefs which you think the EA Forum community doesn't examine enough, and which you'd like to see us hold a debate week about? I'm planning a bunch of debate weeks next year, so I'm always open to suggestions.

David Thorstad @ 2024-12-16T13:31 (+4)

Thanks Toby!

Maybe biorisk or the time of perils could be good topics for a debate week?

Toby Tremlett🔹 @ 2024-12-16T13:32 (+8)

Thanks! Yep I'd be very interested in a biorisk week... I'll put your time of perils posts in my to-read list. 

SummaryBot @ 2024-12-16T13:01 (+1)

Executive summary: This blog update outlines recent and upcoming content for Reflective Altruism, which uses academic research to critically examine and improve the effective altruism movement, with particular focus on longtermism and existential risk claims.

Key points:

  1. New 2024 series challenge key EA concepts, including critiques of the singularity hypothesis, examining harms of existential risk mitigation, and analyzing the scope of longtermism.
  2. Author's academic work argues that existential risk claims are often exaggerated and that longtermism's scope may be narrower than commonly believed.
  3. Upcoming content will include series on "Beyond longtermism," positive aspects of EA ("Getting it right"), and critiques of AI power-seeking theorems.
  4. Blog posting frequency decreased from weekly to biweekly to maintain quality while balancing new job commitments.
  5. Content is delivered through multiple formats including academic papers, blog posts, talks, and podcasts to increase accessibility. 

 

 

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.