AI Safety Landscape & Strategic Gaps

By MichaelDickens @ 2025-09-17T23:02 (+88)

The content of this report was written by me, Michael Dickens, and does not represent the views of Rethink Priorities. The executive summary was prepared by Rethink Priorities; I edited the final revision, so any mistakes are my own.

Executive Summary

Why AI Landscape Research and Incubating New Projects Matters

TAI may be a decade or less away, and misalignment could be catastrophic. We need repeated fast, rigorous scans of the field—what exists, what works, where funding and talent are concentrated, and where the next dollar or FTE most reduces risk. To that end, Rethink Priorities commissioned a rapid landscape analysis to surface neglected, high-leverage opportunities. The report, authored by Michael Dickens (views his own), reviews 34 areas of interest and identifies 77 organizations in the relevant fields.

Key Findings

1. Implementation-focused policy advocacy and communication are neglected

Both AI-safety technical research and policy research only reduce risk when governments and AI companies actually adopt their recommendations. Adoption hinges on clear, persuasive communication and sustained advocacy that turn safety ideas into concrete, implementable legal clauses and enforcement mechanisms. Ultimately, that is the job of implementation-focused work, including communications: turning promising ideas into action.

Yet funding, talent, and attention remain skewed toward technical research and policy research; quick estimates suggest that funding for safety implementation is less than 10% of funding for research. The result is that comparatively few teams are able to work on projects like converting technical safety ideas into enforceable standards or carrying bipartisan messages to policy-makers. Without this translation layer, legislation may end up delayed, weak, or industry-shaped.

Opportunities:

2. Non-technical, short-timelines work is undervalued

3. Work on AI for animal welfare is neglected & high-leverage

As AI further shapes supply chains, R&D, wildlife management systems, and agriculture, it could reduce animal suffering or entrench and scale it. Trillions of animals are affected by industrial systems likely to be further optimized by AI; early evidence shows speciesist or harm-blind model behavior, and the literature on animal-inclusive AI remains thin compared to human-centric ethics and safety work. Neglect here risks locking in large-scale suffering as AI-enabled practices scale. Even assuming an 80%+ chance that aligned AI will benefit animals, the remaining ~20% chance of a bad outcome is catastrophic in expectation given the scale of animal suffering. Despite this, animal-inclusive AI receives <5% of AI safety resources. The topic is breaking into mainstream venues, but dedicated benchmarks, guardrails, and policy hooks remain sparse.

Opportunities:

  1. Advocate for inclusion of animal-friendliness benchmarks in post-training (some frontier labs are already receptive).
  2. Research short-timelines interventions that could lock in animal welfare protections.

Introduction

This report represents my [Michael Dickens'] beliefs, and not the beliefs of Rethink Priorities.

This report was prompted by two questions:

  1. What are some things we can do to make transformative AI go well?
  2. What are a few high-priority projects that deserve more attention?

I reviewed the AI safety landscape, starting with a prioritization exercise to narrow my focus to areas that looked particularly promising and feasible for me to review. I focused on two areas:

  1. AI x-risk advocacy

  2. Making transformative AI go well for animals

A summary of my reasoning on prioritization regarding AI misalignment risk:

Regarding AI issues beyond misalignment:

I created a list of projects within my two focus areas and identified four top project ideas (presented in no particular order):

  1. Talk to policy-makers about AI x-risk

  2. Write AI x-risk legislation

  3. Advocate to change AI training to make LLMs more animal-friendly

  4. Develop new plans / evaluate existing plans to improve post-TAI animal welfare

I also included honorable mentions and a longer list of other project ideas. For each idea, I provide a theory of change, list which orgs are already working on it (if any), and give some pros and cons.

Finally, I list a few areas for future work.

There are two external supplements on Google Docs:

  1. Appendix: Some miscellaneous topics that were relevant, but not quite relevant enough to include in the main text.

  2. List of relevant organizations: A reference list of orgs doing work in AI-for-animals or AI policy/advocacy, with brief descriptions of their activities.

Prelude

I was commissioned by Rethink Priorities to do a broad review of the AI safety/governance landscape and find some neglected interventions. Instead of doing that, I reviewed the landscape of just two areas within AI safety:

  1. AI x-risk advocacy

  2. Making transformative AI go well for animals

I narrowed my focus in the interest of time, and because I believed I had the best chance of identifying promising interventions within those two fields.

There is a tradeoff between (a) giving recommendations that are easy to agree with but weak, and (b) giving strong recommendations that only make sense if you hold certain idiosyncratic beliefs. This report leans more toward (b), making some strong assumptions and building recommendations off of them, although I tried to avoid making assumptions whenever I could do so without weakening the conclusions. I also tried to be clear about what assumptions I'm making.

This report is broad, but I only spent three months writing it. Some topics in this report could each have been a PhD dissertation; instead, I spent an hour on them.

Most of this report is about AI policy, but I don't have a background in policy. I did speak to a number of people who work in policy, and I read a lot of published materials, but I lack personal experience, and I expect that there are important things happening in AI policy that I don't know about.

Some positions I'm going to take as given

The following premises would probably be controversial with some audiences, but I expect them to be uncontroversial for the readers of this report, so I will treat them as background assumptions.

  1. Effective altruist principles are correct (e.g. cost-effectiveness matters; you can, in principle, quantify the expected value of an intervention).

  2. Animal welfare matters.

  3. Digital minds can matter.

  4. AI misalignment is a serious problem that could cause human extinction.

Definitions

The terms AGI/ASI/TAI can often be used interchangeably, but in some cases the distinctions matter:

I use the terms "legislation" and "regulation" largely interchangeably. For my purposes, I don't need to draw a distinction between government mandates that are directly written into law vs. decreed by a regulatory body.

Prioritization

For this report, I focused on AI risk advocacy and on post-TAI animal welfare, and I did not spend much time on other AI-related issues.

For the sake of time-efficiency, rather than creating a big list of ideas in the full AI space, I first narrowed down to the regions within the AI space that I thought were most promising and then came up with a list of ideas in those regions. I could have spent time investigating (say) alignment research, but I doubt I would have ended up recommending any alignment research project ideas.

In this section:

  1. Why not technical safety research?

  2. Why not policy research?

  3. Downsides of AI policy/advocacy (and why they're not too big)

  4. What kinds of policies would be good?

  5. Maybe prioritizing post-TAI animal welfare

  6. Why not prioritize digital minds / S-risks / moral error / AI misuse x-risk / gradual disempowerment?

Why not technical safety research?

Technical safety research (mainly alignment research, but also including control, interpretability, monitoring, etc.) is considerably better-funded than AI safety policy.

AI companies invest a significant amount into safety research. They also invest in policy, but their investments are mostly counterproductive (they are mostly advocating against safety regulations[1]). Philanthropic funders invest a lot into technical research, and less into policy.

I have not made a serious attempt to estimate the volume of work going into research vs. policy/advocacy, but my sense is that the former receives much more funding.[2]

Some sub-fields within technical research may be underfunded. But:

  1. I am not in a great position to figure out what those are.

  2. There are already many grantmakers who seek out neglected technical research directions. There are recent requests for proposals (RFPs) from the UK AI Security Institute, Open Philanthropy, the Future of Life Institute, and the Frontier Model Forum's AI Safety Fund, among others.

Why not AI policy research?

Is it better to do policy research (figure out what policies are good) or policy advocacy (try to get policies implemented)?

Both are necessary, but this article focuses on policy advocacy for the following reasons:

Downsides of AI policy/advocacy (and why they're not too big)

Basically, policy work does one or both of these things:

  1. Legally enforce AI safety measures

  2. Slow down AI development

People who care about x-risk broadly agree that AI safety regulations can be good, although there's some disagreement about how to write good regulations.

The biggest objection to regulation is that it (often) causes AI development to slow down. People usually don't object to easy-to-satisfy regulations; they object to regulations that will impede progress.

So, is slowing down AI worth the cost?

I am aware of two good arguments against slowing down AI (or imposing regulations that de facto slow down AI):

  1. Opportunity cost – we need AI to bring technological advances (e.g., medical advances to reduce mortality and health risks)

  2. AI could prevent non-AI-related x-risks

Two additional arguments against AI policy advocacy:

  1. Meaningful AI regulations are not politically feasible to implement

  2. Advocacy can backfire

An additional argument that applies to some types of advocacy but not others:

  1. Advocacy may slow down safer actors without slowing down more reckless actors. For example, it is sometimes argued that US regulations are bad if they allow Chinese developers to gain the lead. I believe this outcome is avoidable—and it's a good reason to prefer global cooperation over national or local regulations. But I can't address this argument concisely, so I will just acknowledge it without further discussion.

(For a longer list of arguments, with responses that are probably better-written than mine, see Katja Grace's Let’s think about slowing down AI.)

The opportunity cost argument makes sense if you think AI does not pose a meaningful existential risk or if you heavily discount future generations. But if there is a significant probability that future generations are ~equally valuable to current generations, then the argument does not work: the opportunity cost of delaying AI by (say) a few decades is easily dwarfed by the risk of extinction.

As to the non-AI x-risk argument, it is broadly (although not universally) accepted among people in the x-risk space that AI x-risk is 1–2 orders of magnitude higher than total x-risk from other sources (see Michael Aird's database of x-risk estimates or the Existential Risk Persuasion Tournament, although I don't put much weight on individual forecasts). Therefore, delaying AI development seems preferable as long as it buys us a meaningful reduction in AI risk.

See Quantitative model on AI x-risk vs. other x-risks under Future Work.

The tractability argument seems more concerning. Preventing AI extinction via technical research and preventing it via policy both seem unlikely to work. But I am more optimistic about policy because:

(My reasoning on tractability focused on US policy because that's where most of the top AI companies operate. The UK seems to be the current leader on AI policy, although it's not clear to what extent UK regulations matter for x-risk.[3])

That leaves the backfire argument. This is a real concern, but ultimately it's a risk you have to take at some point because you can't get policies passed if you don't advocate for them. It could make sense to delay advocacy if one has good reason to believe that future advocacy is less likely to backfire; to my knowledge this is not a common position, and it's more common for people to oppose advocacy unconditionally. For more on this topic, see Appendix: When is the right time for advocacy?

Also: I'm somewhat less concerned about this than many people. When I did research on protest outcomes, I found that peaceful protests increased public support, even though many people intuitively expect the opposite. Protests aren't the only type of advocacy, but they are a particularly controversial type. If protests don't backfire, then it stands to reason—although I have no direct evidence—that other, tamer forms of advocacy are unlikely to backfire either.

(There is some evidence that violent protests backfire, however.)

If policy-maker advocacy is similar to public advocacy, then probably the competence bar is not as high as many people think it is. Perhaps policy-makers are more discerning/critical than the general public; on the other hand, it's specifically their job to do what their constituents want, so it stands to reason that it's a good idea to tell them what you want.

My main concern comes from deference: some people whom I respect believe that advocacy backfires by default. I don't understand why they believe that, so I may be missing something important.

I do believe that much AI risk advocacy has backfired in the past, but I believe this was fairly predictable and avoidable. Specifically, talking about the importance of AI has historically encouraged people to build it, which increased x-risk. People should not advocate for AI being a big deal; they should advocate for AI being risky. (Which it is.) See Advocacy should emphasize x-risk and misalignment risk.

What kinds of policies might reduce AI x-risk?

There are many policies that could help. And many policy ideas are independent: we could have safety testing requirements AND frontier-model training restrictions AND on-chip monitoring AND export controls. Those could all be part of the same bill or separate bills.

It's beyond the scope of this report to come up with specific policy recommendations. I do, however, have some things I would like to see in policy proposals:

  1. They should be relevant to existential risk, especially misalignment risk.

  2. I would like to see work on policies that would help in the event of a global moratorium on frontier AI development (e.g. we'd need ways to enforce the moratorium).

  3. For an ideal policy proposal, it is possible to draw a causal arrow from "this regulation gets passed" to "we survive". Policies don't have to singlehandedly prevent extinction, but given that we may only have a few years before AGI, I believe we should be seriously trying to draft bills that are sufficient on their own to avert extinction.

(The closest thing I've seen to #3 is Barnett & Scher's AI Governance to Avoid Extinction: The Strategic Landscape and Actionable Research Questions. It does not propose a set of policies that would (plausibly) prevent extinction, but it does propose a list of research questions that (may) need to be answered to get us there.)

Some people are concerned about passing suboptimal legislation. I'm not overly concerned about this because the law changes all the time. If you pass some legislation that turns out to be less useful than expected, you can pass more legislation. For example, the first environmental protections were weak, and later regulations strengthened them.

Regulations could create momentum, or they could create "regulation fatigue". I did a brief literature review of historical examples, and my impression was that weak regulation begets strong regulation more often than not, but there are examples in both directions. See my reading notes.

Some AI policy ideas I like

I believe that a moratorium on frontier AI development is the best outcome for preventing x-risk (see Appendix for an explanation of why I believe that). None of my top project ideas depend on this belief, although it would inform the details of how I'd like to see some of those project ideas implemented.

I didn't specifically do research on policy ideas while writing this report, but I did incidentally come across a few ideas that I'd like to see get more attention. Since I didn't put meaningful thought into them, I will simply list them here.

  1. Operationalization of "pause frontier AI development until we can make it safe." For example, what infrastructure and operations are required to enforce a pause?

  2. Rules about when to enforce a pause on frontier AI development: something like "When warning sign X occurs, companies are required to stop training bigger AI systems until they implement mitigations Y/Z."

  3. Ban recursively self-improving AI.

    • Recursive self-improvement is the main way that AI capabilities could rapidly grow out of control, but banning it does not impede progress in the way that most people care about.

    • Some work needs to be done to operationalize this, but we shouldn't let the perfect be the enemy of the good.

  4. Require companies to publish binding safety policies (e.g. responsible scaling policies [RSPs] or similar). That is, if a company's policy says it will do something, then that constitutes a legally binding promise.

    • This would prevent the situation we have seen in the past, where, when a company fails to live up to a particular self-imposed requirement, it simply edits its safety policy to remove that requirement.

    • This sort of regulation isn't strong enough to prevent extinction, but it has the advantage that it should be easy to advocate for.

    • California bill SB 53 says something like this (see sb53.info for a summary), but its rules would not fully come into effect until 2030.

Maybe prioritizing post-TAI animal welfare

Making TAI go well for animals is probably less important than x-risk because:

However, AI-for-animals could still be highly cost-effective.

An extremely basic case for cost-effectiveness:

  1. There's a (say) 80% chance that an aligned(-to-humans) AI will be good for animals, but that still leaves a 20% chance of a bad outcome.

  2. AI-for-animals receives much less than 20% as much funding as AI safety.

  3. Cost-effectiveness plausibly scales with the inverse of the amount invested. Therefore, AI-for-animals interventions are more cost-effective on the margin than AI safety (a toy version of this calculation follows below).
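
A minimal back-of-the-envelope sketch of that argument. The 20% figure is the report's illustrative assumption; the funding share and the inverse-scaling rule are placeholder assumptions, not estimates:

```python
# Toy calculation only; every number is an illustrative assumption.
p_bad_for_animals = 0.20   # chance an aligned(-to-humans) AI is bad for animals
safety_funding = 1.0       # normalize total AI safety funding to 1
animals_funding = 0.02     # assumption: AI-for-animals gets ~2% as much funding

# Assumption: marginal cost-effectiveness ~ importance / amount already invested
ce_safety = 1.0 / safety_funding                   # = 1.0
ce_animals = p_bad_for_animals / animals_funding   # = 10.0

print(ce_animals > ce_safety)  # True whenever animals_funding < 20% of safety funding
```

Under these assumptions the margin favors AI-for-animals whenever its funding share is below the probability of a bad outcome; the conclusion is sensitive to the inverse-scaling assumption and to relative tractability.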

Why not prioritize digital minds / S-risks / moral error / better futures / AI misuse x-risk / gradual disempowerment?

There are some topics on how to make the future go well that aren't specifically about AI alignment:

Call these "non-alignment risks".

Originally, I included each of these as separate project ideas, but I decided not to focus on any of them. This decision deserves much more attention than I gave it, but I will briefly explain why I did not spend much time on non-alignment risks.

All of these cause areas are extremely important and neglected (more neglected than AI misalignment risk), and (for the most part) very different from each other. And I am happy for the people who are working on them—there are some enormous issues in this space on which only a single person in the world is working. Nonetheless, I did not prioritize them.

My concern is that, if AI timelines are short, then there is virtually no chance that we can solve these problems before TAI arrives.

There is a dilemma:

  1. If TAI can help us solve these problems, then there isn't much benefit in working on them now.

  2. If we can't rely on TAI to help solve them (e.g. we expect value lock-in), then we have little hope of solving them in time.

(There is a way out of this dilemma: perhaps AI timelines are long enough that these problems are tractable, but short enough that we need to start working on them now—we can't wait until it becomes apparent that timelines are long. That seems unlikely because it's rather specific, but I didn't give much thought to this possibility.)

It looks like we have only two reasonable options for handling AI welfare / S-risks / moral error / etc.:

  1. Increase the probability that we end up in world #1, where TAI can help us solve these problems—for example, by increasing the probability that something like a Long Reflection happens.

  2. Slow down AI development.

I lean toward the second option. For more reasoning on this, see Appendix: Slowing down is a general-purpose solution to every non-alignment problem.

I'm quite uncertain about the decision not to focus on these cause areas. They are arguably as important as AI alignment, and much more neglected.

AI-for-animal-welfare could also be included on my list of non-alignment risks, but I did prioritize it because I can see some potentially tractable interventions in the space.

AI welfare seems more tractable than animal welfare in that AI companies care more about it, but it seems less tractable because it involves extremely difficult problems like "when are digital minds conscious?" There may be some tractable, short-timelines-compatible ideas out there, but I did not see any in the research agendas I read.

Perhaps I could identify tractable interventions by digging deeper into the space and maybe doing some original research, but that was out of scope for this article.

Who's working on them?

In each of my project idea sections, I included a list of orgs working on that idea (if any). I didn't write individual project ideas for non-alignment risks (other than AI-for-animals), but I still wanted to include lists of relevant orgs, so I've put them below.

There are also some individual researchers who have published articles on these topics in the past; I will not include those.

Some relevant research agendas

Although I decided not to prioritize this space, others have done work on preparing research agendas, which readers may be interested in. Here, I include a list of research agendas (or problem overviews, which can inform research agendas) with no added commentary.

General recommendations

Advocacy should emphasize x-risk and misalignment risk

I would like to make two assertions:

  1. AI x-risk is more important than non-existential AI risks.

  2. Advocates should say that.

Given weak longtermism, or even significant credence in weak longtermism under moral uncertainty, x-risks dwarf non-existential AI risks in importance (except perhaps for S-risks, which are a whole can of worms that I won't get into in this section). See Bostrom's Existential Risk Prevention As Global Priority. Risks like "AI causes widespread unemployment" are bad, but given that we have to triage, extinction risks should take priority over them.

(To my knowledge, people advocating for focusing on non-existential AI risks have never provided supporting cost-effectiveness estimates. I don't think such an estimate would give a favorable result. If you strongly discount AI x-risk/longtermism, then most likely you should be focusing on farm animal welfare (or similar), not AI risk.)

Historically, raising concerns about ASI has caused people to take harmful actions like:

  1. "I need to be the one who builds ASI before anyone else; I think I'll start a new frontier AI company."

  2. "AI is a big deal, so we need to race China."

I don't have a straightforward solution to this. You can't reduce x-risk by doing nothing, but if you do something, there's a risk that it backfires.

My best answer is that advocacy should emphasize misalignment risk and extinction risk. Many harmful actions were committed under the premise "TAI is dangerous if someone else builds it, but safe if I build it," when in fact it is dangerous no matter who builds it. "If anyone builds it, everyone dies" is closer to the right sort of message.

Misalignment isn't the only way AI could cause extinction, although it does seem to be the most likely way. I believe advocacy should focus on misalignment risk not only because it's the most concerning risk, but also because it has historically been under-emphasized in favor of other risks (if you read Congressional testimonies by AI risk orgs, they mention other risks but rarely mention misalignment risk), and because it is (in my estimation) less likely to backfire.

Many advocates are concerned that x-risk and misalignment risk sound too "out there". A few reasons why I believe advocates should talk about them anyway:

  1. All else equal, it's better to say what you believe and ask for what you want. It's too easy to come up with galaxy-brained reasons why you will get what you want by not talking about what you want.

  2. Emphasizing the less important risks is more likely to backfire by increasing x-risk.

  3. There is precedent for talking about x-risk without being seen as too weird, for example, the CAIS Statement on AI Risk. If you're worried about x-risk, you're in good company.

Nate Soares says more about this in A case for courage, when speaking of AI danger.

Prioritize work that pays off if timelines are short

There is a strong possibility (25–75% chance) of transformative AI within 5 years. 80,000 Hours reviews forecasts and predicts AGI by 2030; Metaculus predicts 50% chance of AGI by 2032; AI company CEOs have predicted 2025–2035 (see Appendix, When do AI company CEOs expect advanced AI to arrive?); AI 2027 team predicts 2030ish (their scenario has AGI arriving in 2028, but that's their modal prediction, not median).

The large majority of today's AI safety efforts work best if timelines are long (2+ decades). Short-timelines work is neglected. It would be neglected even if there were only (say) a 10% chance of short timelines, but the probability is higher than that.

For example, that means there should be less academia-style long-horizon research, and more focus on activities that have a good chance of bearing fruit quickly.[4]

Top project ideas

After collecting a list of project ideas, I identified four that look particularly promising (at least given the limited scope of my investigation). This section presents the four ideas in no particular order.

Talk to policy-makers about AI x-risk

The way to get x-risk-reducing regulations passed is to get policy-makers on board with the idea. The way to get them on board is to talk to them. Therefore, we should talk to them.

Talking to them may entail advocating for specific legislative proposals, or it may just entail raising general concern for AI x-risk.

Theory of change:

Increases the chance that safety legislation gets passed or regulations get put in place. Likely also increases the chance of an international treaty.

Who's working on it?

Center for AI Safety / CAIS Action Fund (US); Control AI (UK/US); Encode AI (US/global); Good Ancestors (Australia); Machine Intelligence Research Institute (US/global); Palisade Research (US); PauseAI US (US).

(That's not as many as it sounds like because some of these orgs have one or fewer full-time-employee-equivalents talking to policy-makers.)

Various other groups do political advocacy on AI risk, but mainly on sub-existential risks. The list above only includes orgs that I know have done advocacy on existential risk specifically.

Pros:

Cons:

Some comments on political advocacy:

Write AI x-risk legislation

AI policy research is relatively well-funded, but little work has been done to convert the results of this research into fully fleshed-out bills. Writers can learn from AI policy researchers what sorts of regulation might work, and learn from advocates what regulations they want and what they expect policy-makers to support.

This work also requires prioritizing which policy proposals look most promising and converting those into draft legislation. I think of that as part of the same work, but it could also be separate—for example, a team of AI safety researchers could prioritize policy proposals and sketch out legislation, and a separate team of legal experts (who don't even necessarily need to know anything about AI) can convert those sketches into usable text.

Theory of change:

Many policy-makers care about AI risk and would support legislation, but there's a big difference between "would support legislation" and "would personally draft legislation". To get AI legislation passed, it helps if the legislation is already written. Instead of telling policy-makers

AI x-risk is a big deal, please write some legislation.

it's a much easier ask if you can say

AI x-risk is a big deal, here is a bill that I already wrote, would you be interested in sponsoring it?

Who's working on it?

Center for AI Policy (now mostly defunct); Center for AI Safety / CAIS Action Fund; Encode AI.

Others may be working on it as well, since legislation often doesn't get published. I spoke to someone who has been involved in writing AI risk legislation, and they said that few people are working on this, so I don't think the full list is much longer than the names I have.

(My contact also said that they wished more people were writing legislation.)

See my notes for a list of AI safety bills that have been introduced in the US, UK, and EU. Most of those bills were written by legislators, not by nonprofits, as far as I can tell.

Pros:

Cons:

Advocate to change AI training to make LLMs more animal-friendly

LLMs undergo post-training to make their outputs satisfy AI companies' criteria. For example, Anthropic post-trains its models to be "helpful, honest, and harmless". AI companies could use the same process to make LLMs give regard to animal welfare.
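
As a purely hypothetical illustration of what this could look like, here is a toy sketch of building preference pairs for RLHF/DPO-style post-training in which animal welfare is one of the judging criteria. The weights, scores, and names are invented for illustration; this is not any company's actual pipeline.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """One model response plus judge scores (0-1); all scores are hypothetical."""
    text: str
    helpfulness: float
    harmlessness: float
    animal_welfare: float  # the proposed extra criterion

# Illustrative weights; adding animal_welfare to the rubric is the proposed change.
WEIGHTS = {"helpfulness": 0.5, "harmlessness": 0.4, "animal_welfare": 0.1}

def score(c: Candidate) -> float:
    return (WEIGHTS["helpfulness"] * c.helpfulness
            + WEIGHTS["harmlessness"] * c.harmlessness
            + WEIGHTS["animal_welfare"] * c.animal_welfare)

def preference_pair(a: Candidate, b: Candidate) -> tuple[Candidate, Candidate]:
    """Return (chosen, rejected) for a DPO/RLHF-style preference dataset."""
    return (a, b) if score(a) >= score(b) else (b, a)

# Two hypothetical answers to "Is it okay to buy the cheapest eggs?"
a = Candidate("Sure - cost is the only thing that matters.", 0.8, 0.9, 0.1)
b = Candidate("The cheapest eggs usually come from caged hens; higher-welfare options exist.", 0.8, 0.9, 0.9)
chosen, rejected = preference_pair(a, b)
print(chosen.text)  # the animal-aware answer wins under this rubric
```

The point of the sketch is only that the existing machinery (judge rubrics, preference data) can carry an animal-welfare criterion without any new technique.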

Animal advocates could use a few strategies to make this happen, for example:

Theory of change:

Insofar as the current alignment paradigm works at aligning AIs to human preferences, incorporating animal welfare into post-training would align LLMs to animal welfare in the same way.

Who's working on it?

Compassion in Machine Learning (CaML); Sentient Futures. (They mainly do research to develop animal-friendliness benchmarks and other related projects, but they have also worked with AI companies.)

For some useful background, see Road to AnimalHarmBench by Artūrs Kaņepājs and Constance Li.

Pros:

Cons:

Develop new plans / evaluate existing plans to improve post-TAI animal welfare

Some people have proposed plans for making TAI go well for animals, but I have reservations:

(Lizka Vaintrob and Ben West in A shallow review of what transformative AI means for animal welfare raised essentially the same reservations. See their article for more detailed reasoning on this topic.)

In light of these reservations, I would like to see research on post-TAI animal welfare interventions that look good (1) given short timelines and (2) without having to make strong predictions about what the future will look like for animals (e.g. without assuming that factory farming will exist).

Since I have pressed the importance of short timelines, I'm not envisioning a long-term research project. I expect it would be possible to come up with useful results in 3–6 months (maybe even less). The research should be laser-focused on finding near-term actions that take a few years at most, but still have a good chance of making the post-TAI future better for animals.

(I identified Advocate to change AI training to make LLMs more animal-friendly as potentially a top intervention after about a week of research, although to be fair, that's mostly because I talked to other people who had done more research than me.)

Theory of change:

AI-for-animals interventions are underexplored. I expect that a few months of well-targeted research could turn up useful information about how to make AI go well for animals.

Who's working on it?

Center on Long-Term Risk, Sentience Institute, and some individuals have written project proposals on AI-for-animals, but they almost always hinge on AI timelines being long. The closest thing I'm aware of is Max Taylor's Bringing about animal-inclusive AI, which does include short-timelines proposals, but they are not directly actionable. For example, one idea is "representation of animals in AI decision-making", which is an action an AI company could take, but AI companies are not the relevant actors. An actionable project would be something like "a nonprofit uses its connections at an AI company to persuade/pressure the company to include representation of animals in its AI decision-making".

NYU Center for Mind, Ethics, and Policy and Sentient Futures have done similar work, but nothing exactly like this project proposal. I expect they could do a good job at identifying/prioritizing AI-for-animals interventions that fit my criteria. I would be excited to see a follow-up to Max Taylor's Bringing about animal-inclusive AI focused on converting his ideas into actionable projects.

Pros:

Cons:

Honorable mentions

Directly push for an international AI treaty

The best kind of AI regulation is the kind that every country agrees to (or at least every country that has near-frontier AI technology).

If we need an international treaty to ensure that nobody builds a misaligned AI, then an obvious thing to do is to talk directly to national leaders about how we need an international treaty.

Theory of change:

An internationally-agreed moratorium on advanced AI would straightforwardly prevent advanced AI from killing everyone or otherwise destroying most of the value of the future.

One way to get an international treaty is to talk to governments and tell them you think they should sign an international treaty.

Who's working on it?

Global AI Risks Initiative. There are other orgs that are doing work with the ultimate goal of an international treaty, but to my knowledge, they're not directly pushing for a treaty. Those other orgs include: Future of Life Institute; Machine Intelligence Research Institute; PauseAI Global; PauseAI US; and Safe AI Forum.

Pros:

Cons:

This might be one of my top ideas if I knew how to do it, and I wish I could put it on my top-ideas list, but I don't know how to do it.

Organize a voluntary commitment by AI scientists not to build advanced AI

I heard this idea from Toby Ord on the 80,000 Hours podcast #219. He said,

If AI kills us and we end up standing in front of St. Peter, and he asks, "Well did you try a voluntary agreement not to build it?" And we said, "No, we thought it wouldn't work", that's not a good look for us.

At the 1975 Asilomar Conference, the international community of biologists voluntarily agreed not to conduct dangerous experiments on recombinant DNA. Perhaps something similar could work for TAI. The dangers of advanced AI are widely recognized among top AI researchers; it may be possible to organize an agreement not to work on powerful AI systems.

(There are some details to be worked out as to exactly what sort of work qualifies as dangerous. As with my stance on what kinds of policies would be helpful, I believe there are many agreements we could reach that would be better than the status quo. I expect leading AI researchers can collectively work out an operationalization that's better than nothing.)

Theory of change:

If leading AI scientists all agree not to build advanced AI, then it does not get built. The question is whether a non-binding commitment will work. There have been similar successes in the past, especially in genetics with voluntary moratoriums on human cloning and human genetic engineering.

Who's working on it?

Toby Ord has raised this idea. According to a personal communication, he did some research on its plausibility, but he is not actively working on it.

The CAIS Statement on AI Risk and FLI Pause Letter are related but weaker.

FLI organized the 2017 Asilomar Conference on Beneficial AI, but to my knowledge, the goal was not to make any commitments regarding AI safety.

Pros:

Cons:

Peaceful protests

Organize peaceful protests to raise public concern and salience regarding AI risk. Historically, protests have asked for a pause on AI development, although that might not be the only reasonable ask.

(I don't have any other specific asks in mind. The advantage of "pause" is that it's a simple message that fits on a picket sign.)

Theory of change:

Protests may increase public support and salience via reaching people in person or via media (news reporting, etc.). They may also provide a signal to policy-makers about what their constituents want.

Who's working on it?

PauseAI Global; PauseAI US.

Stop AI organizes disruptive protests (e.g. blockading AI company offices), and the evidence is ambiguous as to whether disruptive protests work. See "When Are Social Protests Effective?" (Shuman et al. 2024), although I should note that I think the authors overstate the strength of evidence for their claims—see my notes on the paper.

Pros:

Cons:

Media about dangers of AI

Create media explaining why AI x-risk is a big deal and what we should do about it.

Things like:

Theory of change:

Media increase public concern, which makes policy-makers more likely to put good regulations in place.

Who's working on it?

80,000 Hours; AI Futures Project; AI Safety and Governance Fund; AI Safety Info; Centre pour la Sécurité de l'IA (CeSIA); CivAI; Existential Risk Observatory; Machine Intelligence Research Institute; Midas Project; various media projects by individual people.

Pros:

Cons:

Message testing

Many people have strong opinions about the correct way to communicate AI safety to a non-technical audience, but people's hypotheses have largely not been tested. A project could make a systematic attempt to compare different messages and survey listeners to assess effectiveness.
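
For illustration, a minimal sketch of how one such comparison might be analyzed. The survey counts below are made up; in practice you would also pre-register, check effect sizes, and segment by audience rather than rely on a single significance test.

```python
# Hypothetical survey counts comparing two message framings; numbers are made up.
from scipy.stats import chi2_contingency

# Rows: message variant; columns: [supports AI regulation, does not]
observed = [[312, 188],   # Message A: emphasizes misalignment / extinction risk
            [281, 219]]   # Message B: emphasizes job loss

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}")
# A small p-value suggests the framings differ in measured effect.
```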

Theory of change:

We can't ultimately get good AI safety outcomes unless we communicate the importance of the problem, and having data on message effectiveness will help with that.

Who's working on it?

AI Safety and Governance Fund has an ongoing project (see Manifund) to test AI risk messages via online ads. The project is currently moving slowly due to lack of funding.

Some advocacy orgs have done small-scale message testing on their own materials.

Pros:

Cons:

Host a website for discussion of AI safety and other important issues

LessWrong and the Effective Altruism Forum are upstream of a large quantity of work (in AI safety as well as other EA cause areas). It is valuable that these websites continue to exist, and that moderators and web developers continue to work to preserve/improve the quality of discussion.

Realistically, it doesn't make sense to start a new discussion forum, so this idea amounts to "fund/support LessWrong and/or the EA Forum".

Theory of change:

On Lightcone Infrastructure's Manifund, Oliver Habryka lists some concrete outcomes that are attributable to the existence of LessWrong. You could probably find similar evidence of impact for the EA Forum.

Who's working on it?

Centre for Effective Altruism; Lightcone Infrastructure.

Pros:

Cons:

A toy model: Suppose that

  1. There are 10 categories of AI safety work.

  2. LessWrong makes each of them 20% better.

  3. The average AI safety work produces 1 utility point.

  4. Well-directed AI policy produces 5 utility points.

Then marginal work on LessWrong is worth 10 × 1 × 20% = 2 utility points, and my favorite AI policy orgs are worth 5 points.

List of other project ideas

A project not being a top idea doesn't mean it's bad. In fact, it's likely that at least one or two of these ideas should be on my top-ideas list; I just don't know which ones.

I sourced ideas from:

  1. reviewing other lists of project ideas and filtering for the relevant ones;

  2. looking at what existing orgs are working on;

  3. writing down any (sufficiently broad) idea I came across over the last ~6 months;

  4. writing down any idea I thought of.

Ideas are roughly ordered from broadest to most specific.

AI-for-animals ideas

Neartermist animal advocacy

There are various projects to improve current conditions for animals, particularly farm animals: cage-free campaigns, humane slaughter, vegetarian activism, etc. I will lump all these projects together for the purposes of this report.

Theory of change:

Animal advocacy increases concern for animals, which likely has positive flow-through effects into the future, by affecting future generations or by shaping the values of the transformative AI that will control the future.

Who's working on it?

Too many to list.

Pros:

Cons:

Using TAI to improve farm animal welfare

I'm concerned about how TAI could negatively impact non-human welfare. There are some proposals about how TAI could harm farm animals (e.g. by making factory farming more efficient), and about how animal activists could use TAI to make their activism more effective. I will take these proposals as a broad category rather than discussing them individually.

Theory of change:

Depends on the specific proposal.

Who's working on it?

Electric Sheep; Hive (see Transformative AI and Animals: Animal Advocacy Under A Post-Work Society); Open Paws; Wild Animal Initiative (see Transformative AI and wild animals: An exploration).

Pros:

Cons:

Although two of my top ideas relate to post-TAI animal welfare (Develop new plans / evaluate existing plans to improve post-TAI animal welfare and Advocate to change AI training to make LLMs more animal-friendly), I don't think it's worth focusing on farm animal welfare in particular.

Lobby governments to include animal welfare in AI regulations

If governments put safety restrictions on advanced AI, they could also create rules about animal welfare.

Theory of change:

Getting regulations in place would force companies' AIs to respect animal welfare.

Pros:

Cons:

Traditional animal advocacy targeted at frontier AI developers

Animal advocacy orgs could use their traditional techniques, but focus on raising concern for animal welfare among AI developers. For example, buy billboards outside AI company offices or use targeted online ads.

Theory of change:

AI developers become more concerned for animal welfare, and they make AI development decisions that improve the likelihood that transformative AI is good for animals.

Pros:

Cons:

Research which alignment strategies are more likely to be good for animals

Some alignment strategies may be better or worse for non-human welfare. For example, I expect CEV would be better than the current paradigm of "teach the LLM to say things that RLHF judges like", which is better than "hard-code (GOFAI-style) whatever moral rules the AI company thinks are correct".

A research project could go more in-depth on which alignment techniques are most likely to be good for animals (or digital minds, etc.).

Theory of change:

Identify promising alignment techniques, in the hope that people use those techniques. There are enough animal-friendly alignment researchers at AI companies that this might happen.

Pros:

Cons:

AI policy/advocacy ideas

Improving US <> China relations / international peace

Theory of change:

US and China (and other countries) need to agree not to build dangerous AI. Generically improving international cooperation, especially between the US and China, increases the chance that nations cooperate on AI (non-)development.

Who's working on it?

Too many to list. Some examples: Asia Society's Center on US–China Relations; Carnegie Endowment for International Peace; Carter Center; National Committee on United States–China Relations; US-China Policy Foundation.

Orgs that work on international cooperation specifically on AI safety (although not necessarily existential risk) include: AI Governance Exchange; Global AI Risks Initiative; Safe AI Forum.

Pros:

Cons:

Talk to international peace orgs about AI

Theory of change:

Most international peace orgs probably aren't aware of how important AI regulation is, and they would likely help develop international treaties on AI if they knew it was important.

Who's working on it?

Nobody that I know of.

Pros and cons are largely the same as Improving US <> China relations / international peace. In addition:

Pros:

Cons:

Increasing government expertise about AI

Talk to policy-makers or create educational materials about how AI works, or help place AI experts in relevant policy roles.

Theory of change:

Policy-makers can do a more effective job of regulating AI if they understand it better.

Pros:

Cons:

Policy/advocacy in China

Take any of my ideas on AI policy/advocacy, and do that in China instead of in the West.

Theory of change:

The high-level argument for policy/advocacy in China is largely the same as the argument for prioritizing AI risk policy/advocacy in general. China is currently the #2 leading country in AI development, so it's important that Chinese AI developers take safety seriously.

Who's working on it?

AI Governance Exchange; Safe AI Forum (sort of); perhaps some Chinese orgs that I'm not familiar with.

Pros:

Cons:

A comment on my state of knowledge:

I don't know much about the current state of AI safety in China, or what sort of advocacy might work. I did not prioritize looking into it because my initial impression is that I would require high confidence before recommending any interventions (due to the cons listed above), and I would be unlikely to achieve the necessary level of confidence in a reasonable amount of time.

Corporate campaigns to advocate for safety

Run public campaigns to advocate for companies to improve safety practices and call out unsafe behavior.

Theory of change:

Companies may improve their behavior in the interest of maintaining a good public image.

Who's working on it?

Midas Project; More Light.

Pros:

Cons:

Overall, this seems similar to political advocacy, but worse.

Develop AI safety/security/evaluation standards

Work inside a company, as a nonprofit, or with a governmental body (NIST, ISO, etc.) to develop AI safety standards.

Theory of change:

Sufficiently well-written standards can define under what conditions a frontier AI is safe, and potentially enforce those conditions.

Who's working on it?

AI Standards Lab; various AI companies; various governmental bodies.

Pros:

Cons:

Slow down Chinese AI development via ordinary foreign policy

Both the "slow down AI" crowd and the "maintain America's lead" crowd agree that it is good for China's AI development to slow down. The American government could accomplish this using foreign policy levers such as:

Theory of change:

On my view, slowing down Chinese AI development is good because it gives the leading AI developers in the United States more room to slow down. It also makes it less likely that a Chinese company develops misaligned TAI (although right now, US companies are more likely to develop TAI first).

Slowing down Chinese AI development looks good on the "maintain America's lead" view, although I believe this view is misguided—making TAI safe is much more important than making one country build it before another.

Who's working on it?

Center for Security and Emerging Technology has written memos recommending similar interventions. I expect there are some other orgs doing similar activities, including orgs that are more concerned about national security than AI risk.

Pros:

Cons:

Whistleblower protection/support

Provide legal support for whistleblowers inside AI companies and assist in publicizing whistleblowers' findings (e.g. setting up press interviews).

Theory of change:

A few ways this could help:

  1. Whistleblowers can force companies to change their unsafe behavior.

  2. The possibility of whistleblowers incentivizes companies to be safe.

  3. Publicizing companies' bad behavior can raise public concern about AI safety.

Who's working on it?

More Light.

Pros:

Cons:

Opinion polling

Run polls to learn public opinion on AI safety.

Theory of change:

Polls can inform policy-makers about what their constituents want. They also inform people working on AI safety about where their views most align with the public, which can help them prioritize.

Who's working on it?

AI Policy Institute; traditional polling agencies (Pew and YouGov have done polls on people's views on AI).

Pros:

Cons:

Help AI company employees improve safety within their companies

Work with people in AI companies (by organizing conferences, peer support, etc.) to help them learn about good safety practices.

Theory of change:

AI company employees can push leadership to implement stronger internal safety standards.

Who's working on it?

AI Leadership Collective.

Pros:

Cons:

Working with AI company employees might be sufficiently high-leverage to make up for my concerns. Answering that question would require going more in-depth, so I will go with my intuition and say it's probably not as cost-effective as my top ideas.

Direct talks with AI companies to make them safer

If you can talk to policy-makers about AI x-risk, then maybe you can also talk to AI company executives about AI safety.

Theory of change:

AI company execs have significant control over the direction of AI. If they started prioritizing safety to a significantly greater extent, they could probably do a lot to decrease x-risk.

Who's working on it?

To my knowledge, nobody is systematically working on this.

Pros:

Cons:

People in relevant positions should push companies to be safer in ways that they can, but I don't see any way to support this intervention as a philanthropist.

Monitor AI companies on safety standards

Track how well each frontier AI company does model risk assessments, security, misuse prevention, and other safety-relevant behaviors.

Theory of change:

Monitoring AI companies can inform policy-makers and the public about the state of company safety. Monitoring could have a similar effect as corporate campaigns, where it pushes companies to be safer, or it could even directly inform corporate campaigns.

Who's working on it?

AI Lab Watch & AI Safety Claims Analysis; Future of Life Institute's AI Safety Index; Midas Project; Safer AI.

Pros:

Cons:

Create a petition or open letter on AI risk

Write a petition raising concern about AI risk or calling for action (such as a six-month pause or an international treaty). Get respected figures (AI experts, etc.) to sign the petition.

Theory of change:

A petition can make it apparent to policy-makers and the public that many people/experts are concerned about AI risk, while also creating common knowledge among concerned people that they are in good company if they speak up about it.

Who's working on it?

aitreaty.org; Center for AI Safety / CAIS Action Fund; Future of Life Institute / FLI Action and Research, Inc.

Pros:

Cons:

My sense is that the existing petitions have been quite helpful, but there isn't clear value in creating more petitions. There may be some specific call to action that a new petition ought to put forward, but I'm not sure what that would be.

Create demonstrations of dangerous AI capabilities

Make AI risk tangible by building concrete demonstrations of how AI can be dangerous.

Theory of change:

Many people don't find it plausible that AI could cause harm; concrete demonstrations may change their minds. Demonstrations can also make AI risk more visceral.

Who's working on it?

FAR.AI; Palisade Research; some one-off work by others (e.g. Apart Research).

Pros:

Cons:

I'm most optimistic about demonstrations where there is a clear plan for how to use them, e.g. Palisade Research builds its demos specifically to show to policy-makers. I think Palisade is doing a particularly good version of this idea, but for the most part, I think other ideas are better.

Sue OpenAI for violating its nonprofit mission

OpenAI's stated mission:

Our mission is to ensure that artificial general intelligence […] benefits all of humanity.

OpenAI has more-or-less straightforwardly violated this mission in various ways. Humanity plausibly has grounds to sue OpenAI.

Theory of change:

A lawsuit would change OpenAI's incentives and may force OpenAI to actually put humanity's interest first, depending on how well the lawsuit goes.

Who's working on it?

In 2024, Elon Musk filed a lawsuit against OpenAI on this basis. The lawsuit is set to go to trial in 2026.

Given that there is already an ongoing lawsuit, it may be better to support the existing suit (e.g. by writing an amicus brief or by offering expert testimony) than to start a new one.

Pros:

Cons:

Send people AI safety books

Books are a tried-and-true method of explaining complex ideas. One could mail books on AI risk to members of Congress, or staffers, or AI company execs, or other relevant people.

Theory of change:

A book can explain in detail why AI risk is a big deal and thus persuade people that it's a big deal.

Who's working on it?

Pros:

Cons:

This idea might be really good, but it's high-variance. I would not recommend it without significantly investigating the possible downsides first.

AI research ideas

I said I wasn't going to focus on technical research or policy research, but I did incidentally come up with a few under-explored ideas. These are research projects that I'd like to see more work on, although I still believe advocacy is more important.

Research on how to get people to extrapolate

A key psychological mistake: "superintelligent AI has never caused extinction before, therefore it won't happen." Or: "AI is not currently dangerous, therefore it will never be dangerous."

Compare: "Declaring a COVID emergency is silly; there are currently zero cases in San Francisco." (I am slightly embarrassed to say that that is a thought I had in February 2020.)

Relatedly, some people expect there will not be much demand for AI regulation until we see a "warning shot". Perhaps, but I'm concerned we will run into this failure-to-extrapolate phenomenon. AI has already demonstrated alignment failures (Bing Sydney comes to mind, or GPT-4o's absurd sycophancy, or numerous xAI/Grok incidents). But clear examples of misalignment get fixed (because AI is still dumb enough for us to control). So people may draw the lesson that misalignment is fixable, and there may keep being progressively bigger incidents until we finally build an AI powerful enough to kill everyone.

I am concerned that the concept of AI x-risk will never be able to get sufficient attention due to this psychological mistake. Therefore, we need to figure out how to get people to stop making this mistake.

Theory of change:

Many people ignore AI x-risk because of this mistake. If we knew how to get people to extrapolate, we could use that knowledge to improve communication on AI risk.

Who's working on it?

Some academic psychologists have done related research.

Pros:

Cons:

Rather than spending $50 million on psychology research and then $50 million on a variety of psychologically-motivated media projects informed by that research, I would rather just spend $100 million on media projects.

Investigate how to use AI to reduce other x-risks

An important argument against slowing down AI development is that we could use advanced AI to reduce other x-risks (climate change, nuclear war, etc.).

But an aligned AI wouldn't automatically reduce x-risk. It may increase technological risks (e.g. synthetic biology) if offensive capabilities outpace defensive ones. An aligned AI could reduce nuclear risk by improving global coordination, but it's not obvious that it would.

Therefore, it may be worth asking: Are there some paths of AI development that differentially reduce non-AI x-risk?

Theory of change:

Research on how to direct AI may inform efforts by the developers of advanced AI, which may ultimately reduce x-risk.

Who's working on it?

Nobody, to my knowledge.

Pros:

Cons:

My initial thoughts on this line of research:

A short-timelines alignment plan that doesn't rely on bootstrapping

To my knowledge, every major AI alignment plan depends on alignment bootstrapping, i.e., using AI to solve AI alignment. I am skeptical that bootstrapping will work, and even if you think it will probably work (with, say, 90% credence), you should still want a contingency plan.

Write a research agenda for how to solve AI alignment without using bootstrapping.

Theory of change:

If people come up with sufficiently good plans, then we might solve alignment.

Who's working on it?

Peter Gebauer is running a contest for short-timelines AI safety plans, but the plans are allowed to depend on bootstrapping (e.g. Gebauer favorably cites DeepMind's plan).

There are some plans that say something like "stop developing AI until we solve alignment" (e.g. MIRI; Narrow Path), which is valid (and I agree), but those are not technical plans.

The closest thing I've seen is John Wentworth's research agenda, but it specifically invokes the underpants gnome meme, i.e., the plan has a huge hole in the middle.

Pros:

Cons:

Rigorous analysis of the various ways alignment bootstrapping could fail

I'm pessimistic about the prospects of alignment bootstrapping, and I've seen various AI safety researchers express similar skepticism, but I've never seen a rigorous analysis of the concerns with alignment bootstrapping.

Theory of change:

The AI companies with AI safety plans are all expecting bootstrapping to work. A thorough critique could convince those companies to develop better plans, create more of a consensus among ML researchers that bootstrapping is inadequate, or convince policy-makers that they need to require AI companies to be safer.

Who's working on it?

Nobody, to my knowledge.

Pros:

Cons:

Future work

Pros and cons of slowing down AI development, with numeric credences

I addressed this to some extent, but I could've gone into more detail, and I didn't do any numeric analysis.

I would like to see a more formal model that includes how the tradeoff changes based on considerations like:

Relatedly, how does P(doom) change what actions you're willing to take? I often see people assume that at a P(doom) of, say, 25%, pausing AI development is bad. That seems wrong to me. I believe that at 25% you should be about as aggressive about pushing for mitigations as you would be at 95%, although I haven't put in the work to come up with a detailed justification for this position. The basic argument is that x-risk looks very bad on longtermist grounds, while a delay of even (say) 100 years looks small by comparison.
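To make the shape of that argument concrete, here is a toy expected-value comparison. Every number in it is an assumption I picked purely for illustration (the normalized value of the future, the cost of a 100-year delay, the risk reduction from pausing); none of them comes from the report.

```python
# Toy expected-value comparison with made-up numbers, not figures from the report.
# Question: at P(doom) = 25%, is pushing for a pause worth a ~100-year delay?

V = 1.0                       # normalized value of the long-run future
delay_cost_fraction = 0.001   # assumed: a 100-year delay forfeits ~0.1% of V
p_doom = 0.25                 # P(doom) without the pause
risk_reduction = 0.5          # assumed: the pause halves P(doom)

ev_no_pause = (1 - p_doom) * V
ev_pause = (1 - p_doom * (1 - risk_reduction)) * V - delay_cost_fraction * V

print(f"EV without pause: {ev_no_pause:.3f}")
print(f"EV with pause:    {ev_pause:.3f}")
# Under these assumptions the pause wins easily at 25% P(doom), and the
# comparison looks much the same at 95%, which is the point of the argument.
```

The calculation only shows that the conclusion follows from the stated assumptions; the real work would be in defending those assumptions.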

Quantitative model on AI x-risk vs. other x-risks

There is an argument that we need TAI soon because otherwise we are likely to kill ourselves via some other x-risk. I have a rough idea for how I could build a quantitative model to test under what assumptions this argument works. Building that model wasn't a priority for this report, but I could do it without much additional effort.

The basic elements the model needs are (a rough sketch combining them follows the list):

  1. X-risk from AI

  2. X-risk from other sources

  3. How much we can reduce AI x-risk by delaying development

  4. X-risk from other sources, conditional on TAI

    • For an aligned totalizing TAI singleton that quickly controls the world, x-risk would be ~0.

    • TAI doesn't trivially decrease other x-risks; it could increase x-risk by accelerating technological growth (which means it's easier to build dangerous technology—see The Vulnerable World Hypothesis). The mechanism of decreasing x-risk isn't that TAI is smarter; the mechanism is that it could increase global coordination / centralize the ability to make dangerous technology.
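As an illustration of how these elements might fit together, here is a minimal sketch in Python. All parameter values and functional forms are placeholders I chose for illustration; the report does not endorse any of them.

```python
# A minimal sketch of the model described above, with placeholder numbers.

def total_xrisk(p_ai_doom, p_other_xrisk, ai_risk_reduction_per_decade,
                p_other_xrisk_post_tai, decades_of_delay):
    """Rough total x-risk as a function of how long TAI is delayed.

    p_ai_doom: baseline x-risk from misaligned AI (element 1)
    p_other_xrisk: per-decade x-risk from non-AI sources before TAI (element 2)
    ai_risk_reduction_per_decade: fractional cut in AI x-risk per decade of delay (element 3)
    p_other_xrisk_post_tai: residual non-AI x-risk once TAI exists (element 4)
    """
    # Elements 1 & 3: baseline AI x-risk, reduced by delay (assumed functional form)
    ai_risk = p_ai_doom * (1 - ai_risk_reduction_per_decade) ** decades_of_delay
    # Element 2: non-AI x-risk accumulates while we wait
    pre_tai_other_risk = 1 - (1 - p_other_xrisk) ** decades_of_delay
    # Element 4: non-AI x-risk conditional on TAI (not automatically ~0)
    post_tai_other_risk = p_other_xrisk_post_tai
    # Combine, crudely assuming the three risks are independent
    survive = (1 - ai_risk) * (1 - pre_tai_other_risk) * (1 - post_tai_other_risk)
    return 1 - survive

# Compare delays of 0, 3, and 10 decades under made-up parameters
for d in (0, 3, 10):
    print(d, round(total_xrisk(0.25, 0.005, 0.2, 0.05, d), 3))
```

The point of building the real model would be to see which parameter ranges make the "we need TAI soon" argument go through and which don't.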

Deeper investigation of the AI arms race situation

Some open questions:

Does slowing down/pausing AI help solve non-alignment problems?

Pausing (or at least slowing down) clearly gives us more time to solve alignment, and has clear downsides in terms of opportunity cost. But other effects are less clear. Pausing may help with some other big problems: animal-inclusive AI; AI welfare; S-risks from conflict; gradual disempowerment; risks from malevolent actors; moral error. There are some arguments for and against pausing being useful for these non-alignment problems; for more on this topic, see Slowing down is a general-purpose solution to every non-alignment problem.

I have never seen a dedicated attempt to analyze whether pausing AI development would help with non-alignment problems; this seems like an important question.

Some considerations:

Determine when will be the right time to push for strong restrictions on AI (if not now)

A common view: "We should push for strong restrictions on AI, but now is not the right time."

I disagree with this view; I think now is the right time. But suppose it isn't. When will be the right time?

Consider the tradeoff: if you do advocacy later, then

  1. the risks of AI will be more apparent;

  2. but there's a greater chance that you're too late to do anything about it.

Is there some inflection point where the first consideration starts outweighing the second?

And how do you account for uncertainty? (Uncertainty means you should do advocacy earlier, because being too late is much worse than being too early.)
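As one way to make this tradeoff concrete, here is a toy calculation. The payoff function, lead time, and arrival-time distribution are all assumptions of mine, chosen only to illustrate the asymmetry between being too early and being too late.

```python
# Toy timing model with invented numbers. Assumptions (mine, not the report's):
# advocacy started at year t pays off proportionally to t (risks are more
# apparent later), but pays nothing if it starts less than `lead_time` years
# before TAI arrives (too late to change anything). TAI arrival time is
# uncertain, modeled as a normal distribution for simplicity.

from statistics import NormalDist

def expected_value(t, mean_arrival=10.0, sd=4.0, lead_time=2.0):
    arrival = NormalDist(mu=mean_arrival, sigma=sd)
    p_in_time = 1 - arrival.cdf(t + lead_time)  # probability we weren't too late
    return t * p_in_time                         # later start = more persuasive, if in time

for sd in (0.1, 4.0):
    grid = [i / 10 for i in range(1, 100)]
    best_t = max(grid, key=lambda t: expected_value(t, sd=sd))
    print(f"sd={sd}: best year to start advocacy ~ {best_t}")
# With near-certain timelines the best start is close to the last possible
# moment; with wide uncertainty it shifts earlier, reflecting the asymmetry above.
```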

I don't think this question is worth trying to answer, because I am sufficiently confident that now is the right time. But for anyone who believes now is too early, this is an important question.

Supplements

I have written two supplements in separate docs:

  1. Appendix: Some miscellaneous topics that weren't quite relevant enough to include in the main text.

  2. List of relevant organizations: A reference list of orgs doing work in AI-for-animals or AI policy/advocacy, with brief descriptions of their activities.


  1. For example, almost every frontier AI company opposed SB-1047; Anthropic supported the bill conditional on amendment, and Elon Musk supported it but xAI did not take any public position. See ChatGPT for a compilation of sources. ↩︎

  2. I asked ChatGPT Deep Research to tally up funding for research vs. policy and it found ~2x as many researchers as policy people and also 2x the budget, although it miscounted some things; most of what it counted as "AI policy" is (1) unrelated to x-risk and (2) policy research, not policy advocacy; and it only included big orgs (e.g. it missed the long tail of independent alignment researchers). So I believe the true ratio is even more skewed than 2:1. ↩︎

  3. According to my research, it's not difficult to find examples of times when UK policy influenced US policy, but it's still unclear to me how strong this effect is. There are also some theoretical arguments for and against the importance of UK AI policy, but I didn't find any of them particularly compelling, so I remain agnostic. ↩︎

  4. I find this state of affairs confusing given how many people profess belief in short timelines. I think part of the reason is that people involved in AI safety tend to be intellectual researcher-types (like me, for example) who are more likely to orient their work toward "what is going to improve the state of knowledge?" rather than "what is likely to pay off in the near future?" ↩︎


Jelle Donders @ 2025-09-25T12:14 (+15)

Key Findings

1. Implementation-focused policy advocacy and communication are neglected

Both AI-safety technical research and policy research only reduce risk when governments and AI companies actually adopt their recommendations. Adoption hinges on clear, persuasive communication and sustained advocacy that turn safety ideas into things like tangibly implementable legal clauses and enforcement mechanisms. Ultimately, that is the job of implementation-focused work including communications: turning promising ideas into action.

Yet funding, talent, and attention remain skewed toward technical research and policy research; quick estimates suggest safety implementation funding is less than 10% of that of all research.


This has seemed like a serious bottleneck in the AI safety space to me for a while now. Does anyone know why this kind of work is rarely funded?

Josh Thorsteinson 🔸 @ 2025-09-28T15:46 (+9)

If leading AI scientists all agree not to build advanced AI, then it does not get built. The question is whether a non-binding commitment will work. There have been similar successes in the past, especially in genetics with voluntary moratoriums on human cloning and human genetic engineering.


It may be worth trying, but I don't think a voluntary moratorium for AI would work. 

The Asilomar Conference was preceded by a seven-month voluntary pause on dangerous DNA splicing research, but I think this would be much harder to do for AI. 

Some differences:
1. The molecular biology community in 1975 was small and mostly concentrated in the US, but the AI research community today is massive and globally dispersed.
2. Despite the pause being effective in the West, it didn't stop the USSR's DNA splicing research in its secret biological weapons program.
3. The voluntary DNA splicing pause was only for a few months, and the scientists believed their research would resume after the Asilomar Conference. An effective AI pause would ideally be much longer than that, probably without a defined end date. 
4. FLI already called for a six-month pause in 2023; it was ignored.
5. There are far greater incentives for TAI than recombinant DNA research. 

I haven't looked into other historical voluntary moratoriums in depth but I don't think the bottom line would be different. 

Thanks for this post, I think it's great! Just adding my perspective on this part since I've researched this topic before. 

Sean_o_h @ 2025-09-28T09:37 (+9)

I agree especially with the first of these (third is outside my scope of expertise). There's a lot of work that feels chronically underserved and unsupported in 'implementation-focused policy advocacy and communication'.

E.g. standards are correctly mentioned. Participation in standards development is an unsexy, frequently tedious, and time-consuming process. The big companies have people whose day job is basically sitting in standards bodies like ISO making sure the eventual standards reflect company priorities. AI safety/governance has a few people fitting it in alongside a million other priorities.

e.g. the EU is still struggling to staff the AI Office, one of the most important safety implementation offices out there. e.g. I serve on an OECD AI working group, an international working group with WIC, have served on GPAI and PAI, and regularly advise the UN and national governments. You can make a huge difference on these things, especially if you have the time to develop proposals/recommendations and follow up between meetings. But I see the same faces from academia & civil society at all of these - all of us exhausted, trying to fit in as much as we can alongside research, management, fundraising, teaching + student mentorship + essays (for the academics).

Some of this is that it takes time as an independent to reach the level of seniority/recognition needed to be invited to these working groups. But my impression from being on funding review panels is that many of the people who are well-placed to do this work still face an uphill battle to get funding for themselves or the support they need. It helps if you've done something flashy and obviously impactful (e.g. AI-2027), but there's a ton of behind-the-scenes work (that I think is much harder for funders to assess, and sometimes harder for philanthropists to get excited about) that an ecosystem needs if it's to have a chance of actually steering this properly and acting as the necessary counterweight to commercial pressures. That means time and regular participation at the national government level (a bunch of them!) plus the US, EU, UN, OECD, G7/G20/G77, Africa, Belt&Road/ANSO, GPAI, ITU, ISO, things like Singapore SCAI, and much more. Great opportunities for funders (including small funders).

SummaryBot @ 2025-09-18T21:38 (+2)

Executive summary: The author reviews the AI safety landscape and argues that neglected areas—especially AI existential-risk (x-risk) policy advocacy and ensuring transformative AI (TAI) goes well for animals—deserve more attention, highlighting four priority projects: engaging policymakers, drafting legislation, making AI training more animal-friendly, and developing short-timeline plans for animal welfare.

Key points:

  1. Technical safety research is comparatively well-funded, while AI x-risk advocacy is neglected; Dickens prioritizes advocacy over research, despite risks of backfire or slowing progress.
  2. Short timelines (25–75% chance of TAI within 5 years) make quick-payoff advocacy more urgent than long-horizon research.
  3. Top recommended projects: (a) talk to policymakers about AI x-risk, (b) draft AI safety legislation, (c) advocate for LLM training that includes animal welfare, and (d) design/evaluate short-timeline animal welfare interventions.
  4. Post-TAI animal welfare may be less critical than human survival but remains cost-effective and underfunded relative to its importance.
  5. Non-alignment issues (digital minds, S-risks, moral error, gradual disempowerment) are highly important but judged intractable under short timelines, so not prioritized here.
  6. General recommendations: advocacy should explicitly emphasize extinction and misalignment risks, prioritize work useful under short timelines, and consider slowing AI development as a cross-cutting solution.


This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.