AI Safety Landscape & Strategic Gaps

By MichaelDickens @ 2025-09-17T23:02 (+88)

The content of this report was written by me, Michael Dickens, and does not represent the views of Rethink Priorities. The executive summary was prepared by Rethink Priorities; I edited the final revision, so any mistakes are my own.

Executive Summary

Why AI Landscape Research and Incubating New Projects Matters

TAI may be a decade or less away, and misalignment could be catastrophic. We need repeated fast, rigorous scans of the field—what exists, what works, where funding and talent are concentrated, and where the next dollar or FTE most reduces risk. To that end, Rethink Priorities commissioned a rapid landscape analysis to surface neglected, high-leverage opportunities. The report, authored by Michael Dickens (views his own), reviews 34 areas of interest and identifies 77 organizations in the relevant fields.

Key Findings

1. Implementation-focused policy advocacy and communication are neglected

Both AI-safety technical research and policy research only reduce risk when governments and AI companies actually adopt their recommendations. Adoption hinges on clear, persuasive communication and sustained advocacy that turn safety ideas into concrete, implementable legal clauses and enforcement mechanisms. Ultimately, that is the job of implementation-focused work, including communications: turning promising ideas into action.

Yet funding, talent, and attention remain skewed toward technical research and policy research; quick estimates suggest that funding for safety implementation is less than 10% of funding for research. The result is that comparatively few teams are able to work on projects like converting technical safety ideas into enforceable standards or carrying bipartisan messages to policy-makers. Without this translation layer, legislation may end up delayed, weak, or industry-shaped.

Opportunities:

2. Non-technical, short-timelines work is undervalued

3. Work on AI for animal welfare is neglected & high-leverage

As AI further shapes supply chains, R&D, wildlife management systems, and agriculture, it could reduce animal suffering or entrench and scale it. Trillions of animals are affected by industrial systems likely to be further optimized by AI; early evidence shows speciesist or harm-blind model behavior, and the literature on animal-inclusive AI remains thin compared to human-centric ethics and safety work. Neglect here risks locking in large-scale suffering as AI-enabled practices scale. Even assuming an 80%+ chance that aligned AI will benefit animals, the remaining ~20% chance of a bad outcome is catastrophic in expectation given the scale of animal suffering. Despite this, animal-inclusive AI receives <5% of AI safety resources. The topic is breaking into mainstream venues, but dedicated benchmarks, guardrails, and policy hooks remain sparse.

Opportunities:

  1. Advocate for inclusion of animal-friendliness benchmarks in post-training (some frontier labs are already receptive).
  2. Research short-timelines interventions that could lock in animal welfare protections.

Introduction

This report represents my [Michael Dickens'] beliefs, and not the beliefs of Rethink Priorities.

This report was prompted by two questions:

  1. What are some things we can do to make transformative AI go well?
  2. What are a few high-priority projects that deserve more attention?

I reviewed the AI safety landscape, starting with a prioritization exercise to narrow my focus to areas that looked particularly promising and feasible for me to review. I focused on two areas:

  1. AI x-risk advocacy

  2. Making transformative AI go well for animals

A summary of my reasoning on prioritization regarding AI misalignment risk:

Regarding AI issues beyond misalignment:

I created a list of projects within my two focus areas and identified four top project ideas (presented in no particular order):

  1. Talk to policy-makers about AI x-risk

  2. Write AI x-risk legislation

  3. Advocate to change AI training to make LLMs more animal-friendly

  4. Develop new plans / evaluate existing plans to improve post-TAI animal welfare

I also included honorable mentions and a longer list of other project ideas. For each idea, I provide a theory of change, list which orgs are already working on it (if any), and give some pros and cons.

Finally, I list a few areas for future work.

There are two external supplements on Google Docs:

  1. Appendix: Some miscellaneous topics that were relevant, but not quite relevant enough to include in the main text.

  2. List of relevant organizations: A reference list of orgs doing work in AI-for-animals or AI policy/advocacy, with brief descriptions of their activities.

Prelude

I was commissioned by Rethink Priorities to do a broad review of the AI safety/governance landscape and find some neglected interventions. Instead of doing that, I reviewed the landscape of just two areas within AI safety:

  1. AI x-risk advocacy

  2. Making transformative AI go well for animals

I narrowed my focus in the interest of time, and because I believed I had the best chance of identifying promising interventions within those two fields.

There is a tradeoff between (a) giving recommendations that are easy to agree with but weak, and (b) giving strong recommendations that only make sense if you hold certain idiosyncratic beliefs. This report leans more toward (b), making some strong assumptions and building recommendations off of them, although I tried to avoid making assumptions whenever I could do so without weakening the conclusions. I also tried to be clear about what assumptions I'm making.

This report is broad, but I only spent three months writing it. Some topics in this report could each have been a PhD dissertation; instead, I spent an hour on them.

Most of this report is about AI policy, but I don't have a background in policy. I did speak to a number of people who work in policy, and I read a lot of published materials, but I lack personal experience, and I expect that there are important things happening in AI policy that I don't know about.

Some positions I'm going to take as given

The following premises would probably be controversial with some audiences, but I expect them to be uncontroversial for the readers of this report, so I will treat them as background assumptions.

  1. Effective altruist principles are correct (e.g. cost-effectiveness matters; you can, in principle, quantify the expected value of an intervention).

  2. Animal welfare matters.

  3. Digital minds can matter.

  4. AI misalignment is a serious problem that could cause human extinction.

Definitions

The terms AGI/ASI/TAI can often be used interchangeably, but in some cases the distinctions matter:

I use the terms "legislation" and "regulation" largely interchangeably. For my purposes, I don't need to draw a distinction between government mandates that are directly written into law vs. decreed by a regulatory body.

Prioritization

For this report, I focused on AI risk advocacy and on post-TAI animal welfare, and I did not spend much time on other AI-related issues.

For the sake of time-efficiency, rather than creating a big list of ideas in the full AI space, I first narrowed down to the regions within the AI space that I thought were most promising and then came up with a list of ideas in those regions. I could have spent time investigating (say) alignment research, but I doubt I would have ended up recommending any alignment research project ideas.

In this section:

  1. Why not technical safety research?

  2. Why not policy research?

  3. Downsides of AI policy/advocacy (and why they're not too big)

  4. What kinds of policies would be good?

  5. Maybe prioritizing post-TAI animal welfare

  6. Why not prioritize digital minds / S-risks / moral error / AI misuse x-risk / gradual disempowerment?

Why not technical safety research?

Technical safety research (mainly alignment research, but also including control, interpretability, monitoring, etc.) is considerably better-funded than AI safety policy.

AI companies invest a significant amount into safety research. They also invest in policy, but their investments are mostly counterproductive (they are mostly advocating against safety regulations[1]). Philanthropic funders invest a lot into technical research, and less into policy.

I have not made a serious attempt to estimate the volume of work going into research vs. policy/advocacy, but my sense is that the former receives much more funding.[2]

Some sub-fields within technical research may be underfunded. But:

  1. I am not in a great position to figure out what those are.

  2. There are already many grantmakers who seek out neglected technical research directions. There are recent requests for proposals (RFPs) from the UK AI Security Institute, Open Philanthropy, the Future of Life Institute, and the Frontier Model Forum's AI Safety Fund, among others.

Why not AI policy research?

Is it better to do policy research (figure out what policies are good) or policy advocacy (try to get policies implemented)?

Both are necessary, but this article focuses on policy advocacy for the following reasons:

Downsides of AI policy/advocacy (and why they're not too big)

Basically, policy work does one or both of these things:

  1. Legally enforce AI safety measures

  2. Slow down AI development

People who care about x-risk broadly agree that AI safety regulations can be good, although there's some disagreement about how to write good regulations.

The biggest objection to regulation is that it (often) causes AI development to slow down. People usually don't object to easy-to-satisfy regulations; they object to regulations that will impede progress.

So, is slowing down AI worth the cost?

I am aware of two good arguments against slowing down AI (or imposing regulations that de facto slow down AI):

  1. Opportunity cost – we need AI to bring technological advances (e.g., medical advances to reduce mortality and health risks)

  2. AI could prevent non-AI-related x-risks

Two additional arguments against AI policy advocacy:

  1. Meaningful AI regulations are not politically feasible to implement

  2. Advocacy can backfire

An additional argument that applies to some types of advocacy but not others:

  1. Advocacy may slow down safer actors without slowing down more reckless actors. For example, it is sometimes argued that US regulations are bad if they allow Chinese developers to gain the lead. I believe this outcome is avoidable—and it's a good reason to prefer global cooperation over national or local regulations. But I can't address this argument concisely, so I will just acknowledge it without further discussion.

(For a longer list of arguments, with responses that are probably better-written than mine, see Katja Grace's Let’s think about slowing down AI.)

The opportunity cost argument makes sense if you think AI does not pose a meaningful existential risk or if you heavily discount future generations. But if there is a significant probability that future generations are ~equally valuable to current generations, then the argument does not work: the opportunity cost of delaying AI by (say) a few decades is easily dwarfed by the risk of extinction.

As to the non-AI x-risk argument, it is broadly (although not universally) accepted among people in the x-risk space that AI x-risk is 1–2 orders of magnitude higher than total x-risk from other sources (see Michael Aird's database of x-risk estimates or the Existential Risk Persuasion Tournament, although I don't put much weight on individual forecasts). Therefore, delaying AI development seems preferable as long as it buys us a meaningful reduction in AI risk.

See Quantitative model on AI x-risk vs. other x-risks under Future Work.

The tractability argument seems more concerning. Preventing AI extinction via technical research and preventing it via policy both seem unlikely to work. But I am more optimistic about policy because:

(My reasoning on tractability focused on US policy because that's where most of the top AI companies operate. The UK seems to be the current leader on AI policy, although it's not clear to what extent UK regulations matter for x-risk.[3])

That leaves the backfire argument. This is a real concern, but ultimately it's a risk you have to take at some point because you can't get policies passed if you don't advocate for them. It could make sense to delay advocacy if one has good reason to believe that future advocacy is less likely to backfire; to my knowledge this is not a common position, and it's more common for people to oppose advocacy unconditionally. For more on this topic, see Appendix: When is the right time for advocacy?

Also: I'm somewhat less concerned about this than many people. When I did research on protest outcomes, I found that peaceful protests increased public support, even though many people intuitively expect the opposite. Protests aren't the only type of advocacy, but they are a particularly controversial type. If protests don't backfire, then it stands to reason—although I have no direct evidence—that other, tamer forms of advocacy are unlikely to backfire either.

(There is some evidence that violent protests backfire, however.)

If policy-maker advocacy is similar to public advocacy, then probably the competence bar is not as high as many people think it is. Perhaps policy-makers are more discerning/critical than the general public; on the other hand, it's specifically their job to do what their constituents want, so it stands to reason that it's a good idea to tell them what you want.

My main concern comes from deference: some people whom I respect believe that advocacy backfires by default. I don't understand why they believe that, so I may be missing something important.

I do believe that much AI risk advocacy has backfired in the past, but I believe this was fairly predictable and avoidable. Specifically, talking about the importance of AI has historically encouraged people to build it, which increased x-risk. People should not advocate for AI being a big deal; they should advocate for AI being risky. (Which it is.) See Advocacy should emphasize x-risk and misalignment risk.

What kinds of policies might reduce AI x-risk?

There are many policies that could help. And many policy ideas are independent: we could have safety testing requirements AND frontier-model training restrictions AND on-chip monitoring AND export controls. Those could all be part of the same bill or separate bills.

It's beyond the scope of this report to come up with specific policy recommendations. I do, however, have some things I would like to see in policy proposals:

  1. They should be relevant to existential risk, especially misalignment risk.

  2. I would like to see work on policies that would help in the event of a global moratorium on frontier AI development (e.g. we'd need ways to enforce the moratorium).

  3. For an ideal policy proposal, it is possible to draw a causal arrow from "this regulation gets passed" to "we survive". Policies don't have to singlehandedly prevent extinction, but given that we may only have a few years before AGI, I believe we should be seriously trying to draft bills that are sufficient on their own to avert extinction.

(The closest thing I've seen to #3 is Barnett & Scher's AI Governance to Avoid Extinction: The Strategic Landscape and Actionable Research Questions. It does not propose a set of policies that would (plausibly) prevent extinction, but it does propose a list of research questions that (may) need to be answered to get us there.)

Some people are concerned about passing suboptimal legislation. I'm not overly concerned about this because the law changes all the time. If you pass some legislation that turns out to be less useful than expected, you can pass more legislation. For example, the first environmental protections were weak, and later regulations strengthened them.

Regulations could create momentum, or they could create "regulation fatigue". I did a brief literature review of historical examples, and my impression was that weak regulation begets strong regulation more often than not, but there are examples in both directions. See my reading notes.

Some AI policy ideas I like

I believe that a moratorium on frontier AI development is the best outcome for preventing x-risk (see Appendix for an explanation of why I believe that). None of my top project ideas depend on this belief, although it would inform the details of how I'd like to see some of those project ideas implemented.

I didn't specifically do research on policy ideas while writing this report, but I did incidentally come across a few ideas that I'd like to see get more attention. Since I didn't put meaningful thought into them, I will simply list them here.

  1. Operationalization of "pause frontier AI development until we can make it safe." For example, what infrastructure and operations are required to enforce a pause?

  2. Rules about when to enforce a pause on frontier AI development: something like "When warning sign X occurs, companies are required to stop training bigger AI systems until they implement mitigations Y/Z."

  3. Ban recursively self-improving AI.

    • Recursive self-improvement is the main way that AI capabilities could rapidly grow out of control, but banning it does not impede progress in the way that most people care about.

    • Some work needs to be done to operationalize this, but we shouldn't let the perfect be the enemy of the good.

  4. Require companies to publish binding safety policies (e.g. responsible scaling policies [RSPs] or similar). That is, if a company's policy says it will do something, then that constitutes a legally binding promise.

    • This would prevent the situation we have seen in the past, where, when a company fails to live up to a particular self-imposed requirement, it simply edits its safety policy to remove that requirement.

    • This sort of regulation isn't strong enough to prevent extinction, but it has the advantage that it should be easy to advocate for.

    • California bill SB 53 says something like this (see sb53.info for a summary), but its rules would not fully come into effect until 2030.

Maybe prioritizing post-TAI animal welfare

Making TAI go well for animals is probably less important than x-risk because:

However, AI-for-animals could still be highly cost-effective.

An extremely basic case for cost-effectiveness:

  1. There's a (say) 80% chance that an aligned(-to-humans) AI will be good for animals, but that still leaves a 20% chance of a bad outcome.

  2. AI-for-animals receives much less than 20% as much funding as AI safety.

  3. Cost-effectiveness plausibly scales with the inverse of the amount invested. Therefore, AI-for-animals interventions are more cost-effective on the margin than AI safety (a toy version of this calculation follows below).
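
A minimal back-of-the-envelope sketch of that argument. The 20% figure is the report's illustrative assumption; the funding share and the inverse-scaling rule are placeholder assumptions, not estimates:

```python
# Toy calculation only; every number is an illustrative assumption.
p_bad_for_animals = 0.20   # chance an aligned(-to-humans) AI is bad for animals
safety_funding = 1.0       # normalize total AI safety funding to 1
animals_funding = 0.02     # assumption: AI-for-animals gets ~2% as much funding

# Assumption: marginal cost-effectiveness ~ importance / amount already invested
ce_safety = 1.0 / safety_funding                   # = 1.0
ce_animals = p_bad_for_animals / animals_funding   # = 10.0

print(ce_animals > ce_safety)  # True whenever animals_funding < 20% of safety funding
```

Under these assumptions the margin favors AI-for-animals whenever its funding share is below the probability of a bad outcome; the conclusion is sensitive to the inverse-scaling assumption and to relative tractability.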

Why not prioritize digital minds / S-risks / moral error / better futures / AI misuse x-risk / gradual disempowerment?

There are some topics on how to make the future go well that aren't specifically about AI alignment:

Call these "non-alignment risks".

Originally, I included each of these as separate project ideas, but I decided not to focus on any of them. This decision deserves much more attention than I gave it, but I will briefly explain why I did not spend much time on non-alignment risks.

All of these cause areas are extremely important and neglected (more neglected than AI misalignment risk), and (for the most part) very different from each other. And I am happy for the people who are working on them—there are some enormous issues in this space on which only a single person in the world is working. Nonetheless, I did not prioritize them.

My concern is that, if AI timelines are short, then there is virtually no chance that we can solve these problems before TAI arrives.

There is a dilemma:

  1. If TAI can help us solve these problems, then there isn't much benefit in working on them now.

  2. If we can't rely on TAI to help solve them (e.g. we expect value lock-in), then we have little hope of solving them in time.

(There is a way out of this dilemma: perhaps AI timelines are long enough that these problems are tractable, but short enough that we need to start working on them now—we can't wait until it becomes apparent that timelines are long. That seems unlikely because it's rather specific, but I didn't give much thought to this possibility.)

It looks like we have only two reasonable options for handling AI welfare / S-risks / moral error / etc.:

  1. Increase the probability that we end up in world #1, where TAI can help us solve these problems—for example, by increasing the probability that something like a Long Reflection happens.

  2. Slow down AI development.

I lean toward the second option. For more reasoning on this, see Appendix: Slowing down is a general-purpose solution to every non-alignment problem.

I'm quite uncertain about the decision not to focus on these cause areas. They are arguably as important as AI alignment, and much more neglected.

AI-for-animal-welfare could also be included on my list of non-alignment risks, but I did prioritize it because I can see some potentially tractable interventions in the space.

AI welfare seems more tractable than animal welfare in that AI companies care more about it, but it seems less tractable because it involves extremely difficult problems like "when are digital minds conscious?" There may be some tractable, short-timelines-compatible ideas out there, but I did not see any in the research agendas I read.

Perhaps I could identify tractable interventions by digging deeper into the space and maybe doing some original research, but that was out of scope for this article.

Who's working on them?

In each of my project idea sections, I included a list of orgs working on that idea (if any). I didn't write individual project ideas for non-alignment risks (other than AI-for-animals), but I still wanted to include lists of relevant orgs, so I've put them below.

There are also some individual researchers who have published articles on these topics in the past; I will not include those.

Some relevant research agendas

Although I decided not to prioritize this space, others have done work on preparing research agendas, which readers may be interested in. Here, I include a list of research agendas (or problem overviews, which can inform research agendas) with no added commentary.

General recommendations

Advocacy should emphasize x-risk and misalignment risk

I would like to make two assertions:

  1. AI x-risk is more important than non-existential AI risks.

  2. Advocates should say that.

Given weak longtermism, or even significant credence in weak longtermism under moral uncertainty, x-risks dwarf non-existential AI risks in importance (except perhaps for S-risks, which are a whole can of worms that I won't get into in this section). See Bostrom's Existential Risk Prevention As Global Priority. Risks like "AI causes widespread unemployment" are bad, but given that we have to triage, extinction risks should take priority over them.

(To my knowledge, people advocating for focusing on non-existential AI risks have never provided supporting cost-effectiveness estimates. I don't think such an estimate would give a favorable result. If you strongly discount AI x-risk/longtermism, then most likely you should be focusing on farm animal welfare (or similar), not AI risk.)

Historically, raising concerns about ASI has caused people to take harmful actions like:

  1. "I need to be the one who builds ASI before anyone else; I think I'll start a new frontier AI company."

  2. "AI is a big deal, so we need to race China."

I don't have a straightforward solution to this. You can't reduce x-risk by doing nothing, but if you do something, there's a risk that it backfires.

My best answer is that advocacy should emphasize misalignment risk and extinction risk. Many harmful actions were committed under the premise "TAI is dangerous if someone else builds it, but safe if I build it," when in fact it is dangerous no matter who builds it. "If anyone builds it, everyone dies" is closer to the right sort of message.

Misalignment isn't the only way AI could cause extinction, although it does seem to be the most likely way. I believe advocacy should focus on misalignment risk not only because it's the most concerning risk, but also because it has historically been under-emphasized in favor of other risks (if you read Congressional testimonies by AI risk orgs, they mention other risks but rarely mention misalignment risk), and because it is (in my estimation) less likely to backfire.

Many advocates are concerned that x-risk and misalignment risk sound too "out there". A few reasons why I believe advocates should talk about them anyway:

  1. All else equal, it's better to say what you believe and ask for what you want. It's too easy to come up with galaxy-brained reasons why you will get what you want by not talking about what you want.

  2. Emphasizing the less important risks is more likely to backfire by increasing x-risk.

  3. There is precedent for talking about x-risk without being seen as too weird, for example, the CAIS Statement on AI Risk. If you're worried about x-risk, you're in good company.

Nate Soares says more about this in A case for courage, when speaking of AI danger.

Prioritize work that pays off if timelines are short

There is a strong possibility (25–75% chance) of transformative AI within 5 years. 80,000 Hours reviews forecasts and predicts AGI by 2030; Metaculus predicts 50% chance of AGI by 2032; AI company CEOs have predicted 2025–2035 (see Appendix, When do AI company CEOs expect advanced AI to arrive?); AI 2027 team predicts 2030ish (their scenario has AGI arriving in 2028, but that's their modal prediction, not median).

The large majority of today's AI safety efforts work best if timelines are long (2+ decades). Short-timelines work is neglected. It would be neglected even if there were only (say) a 10% chance of short timelines, but the probability is higher than that.

For example, that means there should be less academia-style long-horizon research, and more focus on activities that have a good chance of bearing fruit quickly.[4]

Top project ideas

After collecting a list of project ideas, I identified four that look particularly promising (at least given the limited scope of my investigation). This section presents the four ideas in no particular order.

Talk to policy-makers about AI x-risk

The way to get x-risk-reducing regulations passed is to get policy-makers on board with the idea. The way to get them on board is to talk to them. Therefore, we should talk to them.

Talking to them may entail advocating for specific legislative proposals, or it may just entail raising general concern for AI x-risk.

Theory of change:

Increases the chance that safety legislation gets passed or regulations get put in place. Likely also increases the chance of an international treaty.

Who's working on it?

Center for AI Safety / CAIS Action Fund (US); Control AI (UK/US); Encode AI (US/global); Good Ancestors (Australia); Machine Intelligence Research Institute (US/global); Palisade Research (US); PauseAI US (US).

(That's not as many as it sounds like because some of these orgs have one or fewer full-time-employee-equivalents talking to policy-makers.)

Various other groups do political advocacy on AI risk, but mainly on sub-existential risks. The list above only includes orgs that I know have done advocacy on existential risk specifically.

Pros:

Cons:

Some comments on political advocacy:

Write AI x-risk legislation

AI policy research is relatively well-funded, but little work has been done to convert the results of this research into fully fleshed-out bills. Writers can learn from AI policy researchers what sorts of regulation might work, and learn from advocates what regulations they want and what they expect policy-makers to support.

This work also requires prioritizing which policy proposals look most promising and converting those into draft legislation. I think of that as part of the same work, but it could also be separate—for example, a team of AI safety researchers could prioritize policy proposals and sketch out legislation, and a separate team of legal experts (who don't even necessarily need to know anything about AI) can convert those sketches into usable text.

Theory of change:

Many policy-makers care about AI risk and would support legislation, but there's a big difference between "would support legislation" and "would personally draft legislation". To get AI legislation passed, it helps if the legislation is already written. Instead of telling policy-makers

AI x-risk is a big deal, please write some legislation.

it's a much easier ask if you can say

AI x-risk is a big deal, here is a bill that I already wrote, would you be interested in sponsoring it?

Who's working on it?

Center for AI Policy (now mostly defunct); Center for AI Safety / CAIS Action Fund; Encode AI.

Others may be working on it as well, since legislation often doesn't get published. I spoke to someone who has been involved in writing AI risk legislation, and they said that few people are working on this, so I don't think the full list is much longer than the names I have.

(My contact also said that they wished more people were writing legislation.)

See my notes for a list of AI safety bills that have been introduced in the US, UK, and EU. Most of those bills were written by legislators, not by nonprofits, as far as I can tell.

Pros:

Cons:

Advocate to change AI training to make LLMs more animal-friendly

LLMs undergo post-training to make their outputs satisfy AI companies' criteria. For example, Anthropic post-trains its models to be "helpful, honest, and harmless". AI companies could use the same process to make LLMs give regard to animal welfare.
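
As a purely hypothetical illustration of what this could look like, here is a toy sketch of building preference pairs for RLHF/DPO-style post-training in which animal welfare is one of the judging criteria. The weights, scores, and names are invented for illustration; this is not any company's actual pipeline.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """One model response plus judge scores (0-1); all scores are hypothetical."""
    text: str
    helpfulness: float
    harmlessness: float
    animal_welfare: float  # the proposed extra criterion

# Illustrative weights; adding animal_welfare to the rubric is the proposed change.
WEIGHTS = {"helpfulness": 0.5, "harmlessness": 0.4, "animal_welfare": 0.1}

def score(c: Candidate) -> float:
    return (WEIGHTS["helpfulness"] * c.helpfulness
            + WEIGHTS["harmlessness"] * c.harmlessness
            + WEIGHTS["animal_welfare"] * c.animal_welfare)

def preference_pair(a: Candidate, b: Candidate) -> tuple[Candidate, Candidate]:
    """Return (chosen, rejected) for a DPO/RLHF-style preference dataset."""
    return (a, b) if score(a) >= score(b) else (b, a)

# Two hypothetical answers to "Is it okay to buy the cheapest eggs?"
a = Candidate("Sure - cost is the only thing that matters.", 0.8, 0.9, 0.1)
b = Candidate("The cheapest eggs usually come from caged hens; higher-welfare options exist.", 0.8, 0.9, 0.9)
chosen, rejected = preference_pair(a, b)
print(chosen.text)  # the animal-aware answer wins under this rubric
```

The point of the sketch is only that the existing machinery (judge rubrics, preference data) can carry an animal-welfare criterion without any new technique.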

Animal advocates could use a few strategies to make this happen, for example:

Theory of change:

Insofar as the current alignment paradigm works at aligning AIs to human preferences, incorporating animal welfare into post-training would align LLMs to animal welfare in the same way.

Who's working on it?

Compassion in Machine Learning (CaML); Sentient Futures. (They mainly do research to develop animal-friendliness benchmarks and other related projects, but they have also worked with AI companies.)

For some useful background, see Road to AnimalHarmBench by Artūrs Kaņepājs and Constance Li.

Pros:

Cons:

Develop new plans / evaluate existing plans to improve post-TAI animal welfare

Some people have proposed plans for making TAI go well for animals, but I have reservations:

(Lizka Vaintrob and Ben West in A shallow review of what transformative AI means for animal welfare raised essentially the same reservations. See their article for more detailed reasoning on this topic.)

In light of these reservations, I would like to see research on post-TAI animal welfare interventions that look good (1) given short timelines and (2) without having to make strong predictions about what the future will look like for animals (e.g. without assuming that factory farming will exist).

Since I have pressed the importance of short timelines, I'm not envisioning a long-term research project. I expect it would be possible to come up with useful results in 3–6 months (maybe even less). The research should be laser-focused on finding near-term actions that take a few years at most, but still have a good chance of making the post-TAI future better for animals.

(I identified Advocate to change AI training to make LLMs more animal-friendly as potentially a top intervention after about a week of research, although to be fair, that's mostly because I talked to other people who had done more research than me.)

Theory of change:

AI-for-animals interventions are underexplored. I expect that a few months of well-targeted research could turn up useful information about how to make AI go well for animals.

Who's working on it?

Center on Long-Term Risk, Sentience Institute, and some individuals have written project proposals on AI-for-animals, but they almost always hinge on AI timelines being long. The closest thing I'm aware of is Max Taylor's Bringing about animal-inclusive AI, which does include short-timelines proposals, but they are not directly actionable. For example, one idea is "representation of animals in AI decision-making", which is an action an AI company could take, but AI companies are not the relevant actors. An actionable project would be something like "a nonprofit uses its connections at an AI company to persuade/pressure the company to include representation of animals in its AI decision-making".

NYU Center for Mind, Ethics, and Policy and Sentient Futures have done similar work, but nothing exactly like this project proposal. I expect they could do a good job at identifying/prioritizing AI-for-animals interventions that fit my criteria. I would be excited to see a follow-up to Max Taylor's Bringing about animal-inclusive AI focused on converting his ideas into actionable projects.

Pros:

Cons:

Honorable mentions

Directly push for an international AI treaty

The best kind of AI regulation is the kind that every country agrees to (or at least every country that has near-frontier AI technology).

If we need an international treaty to ensure that nobody builds a misaligned AI, then an obvious thing to do is to talk directly to national leaders about how we need an international treaty.

Theory of change:

An internationally-agreed moratorium on advanced AI would straightforwardly prevent advanced AI from killing everyone or otherwise destroying most of the value of the future.

One way to get an international treaty is to talk to governments and tell them you think they should sign an international treaty.

Who's working on it?

Global AI Risks Initiative. There are other orgs that are doing work with the ultimate goal of an international treaty, but to my knowledge, they're not directly pushing for a treaty. Those other orgs include: Future of Life Institute; Machine Intelligence Research Institute; PauseAI Global; PauseAI US; and Safe AI Forum.

Pros:

Cons:

This might be one of my top ideas if I knew how to do it, and I wish I could put it on my top-ideas list, but I don't know how to do it.

Organize a voluntary commitment by AI scientists not to build advanced AI

I heard this idea from Toby Ord on the 80,000 Hours podcast #219. He said,

If AI kills us and we end up standing in front of St. Peter, and he asks, "Well did you try a voluntary agreement not to build it?" And we said, "No, we thought it wouldn't work", that's not a good look for us.

At the 1975 Asilomar Conference, the international community of biologists voluntarily agreed not to conduct dangerous experiments on recombinant DNA. Perhaps something similar could work for TAI. The dangers of advanced AI are widely recognized among top AI researchers; it may be possible to organize an agreement not to work on powerful AI systems.

(There are some details to be worked out as to exactly what sort of work qualifies as dangerous. As with my stance on what kinds of policies would be helpful, I believe there are many agreements we could reach that would be better than the status quo. I expect leading AI researchers can collectively work out an operationalization that's better than nothing.)

Theory of change:

If leading AI scientists all agree not to build advanced AI, then it does not get built. The question is whether a non-binding commitment will work. There have been similar successes in the past, especially in genetics with voluntary moratoriums on human cloning and human genetic engineering.

Who's working on it?

Toby Ord has raised this idea. According to a personal communication, he did some research on its plausibility, but he is not actively working on it.

The CAIS Statement on AI Risk and FLI Pause Letter are related but weaker.

FLI organized the 2017 Asilomar Conference on Beneficial AI, but to my knowledge, the goal was not to make any commitments regarding AI safety.

Pros:

Cons:

Peaceful protests

Organize peaceful protests to raise public concern and salience regarding AI risk. Historically, protests have asked for a pause on AI development, although that might not be the only reasonable ask.

(I don't have any other specific asks in mind. The advantage of "pause" is that it's a simple message that fits on a picket sign.)

Theory of change:

Protests may increase public support and salience via reaching people in person or via media (news reporting, etc.). They may also provide a signal to policy-makers about what their constituents want.

Who's working on it?

PauseAI Global; PauseAI US.

Stop AI organizes disruptive protests (e.g. blockading AI company offices), and the evidence is ambiguous as to whether disruptive protests work. See "When Are Social Protests Effective?" (Shuman et al. 2024), although I should note that I think the authors overstate the strength of evidence for their claims—see my notes on the paper.

Pros:

Cons:

Media about dangers of AI

Create media explaining why AI x-risk is a big deal and what we should do about it.

Things like:

Theory of change:

Media increase public concern, which makes policy-makers more likely to put good regulations in place.

Who's working on it?

80,000 Hours; AI Futures Project; AI Safety and Governance Fund; AI Safety Info; Centre pour la Sécurité de l'IA (CeSIA); CivAI; Existential Risk Observatory; Machine Intelligence Research Institute; Midas Project; various media projects by individual people.

Pros:

Cons:

Message testing

Many people have strong opinions about the correct way to communicate AI safety to a non-technical audience, but people's hypotheses have largely not been tested. A project could make a systematic attempt to compare different messages and survey listeners to assess effectiveness.
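
For illustration, a minimal sketch of how one such comparison might be analyzed. The survey counts below are made up; in practice you would also pre-register, check effect sizes, and segment by audience rather than rely on a single significance test.

```python
# Hypothetical survey counts comparing two message framings; numbers are made up.
from scipy.stats import chi2_contingency

# Rows: message variant; columns: [supports AI regulation, does not]
observed = [[312, 188],   # Message A: emphasizes misalignment / extinction risk
            [281, 219]]   # Message B: emphasizes job loss

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}")
# A small p-value suggests the framings differ in measured effect.
```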

Theory of change:

We can't ultimately get good AI safety outcomes unless we communicate the importance of the problem, and having data on message effectiveness will help with that.

Who's working on it?

AI Safety and Governance Fund has an ongoing project (see Manifund) to test AI risk messages via online ads. The project is currently moving slowly due to lack of funding.

Some advocacy orgs have done small-scale message testing on their own materials.

Pros:

Cons:

Host a website for discussion of AI safety and other important issues

LessWrong and the Effective Altruism Forum are upstream of a large quantity of work (in AI safety as well as other EA cause areas). It is valuable that these websites continue to exist, and that moderators and web developers continue to work to preserve/improve the quality of discussion.

Realistically, it doesn't make sense to start a new discussion forum, so this idea amounts to "fund/support LessWrong and/or the EA Forum".

Theory of change:

On Lightcone Infrastructure's Manifund, Oliver Habryka lists some concrete outcomes that are attributable to the existence of LessWrong. You could probably find similar evidence of impact for the EA Forum.

Who's working on it?

Centre for Effective Altruism; Lightcone Infrastructure.

Pros:

Cons:

A toy model: Suppose that

  1. There are 10 categories of AI safety work.

  2. LessWrong makes each of them 20% better.

  3. The average AI safety work produces 1 utility point.

  4. Well-directed AI policy produces 5 utility points.

Then marginal work on LessWrong is worth 10 × 1 × 20% = 2 utility points, and my favorite AI policy orgs are worth 5 points.

List of other project ideas

A project not being a top idea doesn't mean it's bad. In fact, it's likely that at least one or two of these ideas should be on my top-ideas list; I just don't know which ones.

I sourced ideas from:

  1. reviewing other lists of project ideas and filtering for the relevant ones;

  2. looking at what existing orgs are working on;

  3. writing down any (sufficiently broad) idea I came across over the last ~6 months;

  4. writing down any idea I thought of.

Ideas are roughly ordered from broadest to most specific.

AI-for-animals ideas

Neartermist animal advocacy

There are various projects to improve current conditions for animals, particularly farm animals: cage-free campaigns, humane slaughter, vegetarian activism, etc. I will lump all these projects together for the purposes of this report.

Theory of change:

Animal advocacy increases concern for animals, which likely has positive flow-through effects into the future, by affecting future generations or by shaping the values of the transformative AI that will control the future.

Who's working on it?

Too many to list.

Pros:

Cons:

Using TAI to improve farm animal welfare

I'm concerned about how TAI could negatively impact non-human welfare. There are some proposals about how TAI could harm farm animals (e.g. by making factory farming more efficient), and about how animal activists could use TAI to make their activism more effective. I will take these proposals as a broad category rather than discussing them individually.

Theory of change:

Depends on the specific proposal.

Who's working on it?

Electric Sheep; Hive (see Transformative AI and Animals: Animal Advocacy Under A Post-Work Society); Open Paws; Wild Animal Initiative (see Transformative AI and wild animals: An exploration).

Pros:

Cons:

Although two of my top ideas relate to post-TAI animal welfare (Develop new plans / evaluate existing plans to improve post-TAI animal welfare and Advocate to change AI training to make LLMs more animal-friendly), I don't think it's worth focusing on farm animal welfare in particular.

Lobby governments to include animal welfare in AI regulations

If governments put safety restrictions on advanced AI, they could also create rules about animal welfare.

Theory of change:

Getting regulations in place would force companies' AIs to respect animal welfare.

Pros:

Cons:

Traditional animal advocacy targeted at frontier AI developers

Animal advocacy orgs could use their traditional techniques, but focus on raising concern for animal welfare among AI developers. For example, buy billboards outside AI company offices or use targeted online ads.

Theory of change:

AI developers become more concerned for animal welfare, and they make AI development decisions that improve the likelihood that transformative AI is good for animals.

Pros:

Cons:

Research which alignment strategies are more likely to be good for animals

Some alignment strategies may be better or worse for non-human welfare. For example, I expect CEV would be better than the current paradigm of "teach the LLM to say things that RLHF judges like", which is better than "hard-code (GOFAI-style) whatever moral rules the AI company thinks are correct".

A research project could go more in-depth on which alignment techniques are most likely to be good for animals (or digital minds, etc.).

Theory of change:

Identify promising alignment techniques, in the hope that people use those techniques. There are enough animal-friendly alignment researchers at AI companies that this might happen.

Pros:

Cons:

AI policy/advocacy ideas

Improving US <> China relations / international peace

Theory of change:

US and China (and other countries) need to agree not to build dangerous AI. Generically improving international cooperation, especially between the US and China, increases the chance that nations cooperate on AI (non-)development.

Who's working on it?

Too many to list. Some examples: Asia Society's Center on US–China Relations; Carnegie Endowment for International Peace; Carter Center; National Committee on United States–China Relations; US-China Policy Foundation.

Orgs that work on international cooperation specifically on AI safety (although not necessarily existential risk) include: AI Governance Exchange; Global AI Risks Initiative; Safe AI Forum.

Pros:

Cons:

Talk to international peace orgs about AI

Theory of change:

Most international peace orgs probably aren't aware of how important AI regulation is, and they would likely help develop international treaties on AI if they knew it was important.

Who's working on it?

Nobody that I know of.

Pros and cons are largely the same as Improving US <> China relations / international peace. In addition:

Pros:

Cons:

Increasing government expertise about AI

Talk to policy-makers or create educational materials about how AI works, or help place AI experts in relevant policy roles.

Theory of change:

Policy-makers can do a more effective job of regulating AI if they understand it better.

Pros:

Cons:

Policy/advocacy in China

Take any of my ideas on AI policy/advocacy, and do that in China instead of in the West.

Theory of change:

The high-level argument for policy/advocacy in China is largely the same as the argument for prioritizing AI risk policy/advocacy in general. China is currently the #2 leading country in AI development, so it's important that Chinese AI developers take safety seriously.

Who's working on it?

AI Governance Exchange; Safe AI Forum (sort of); perhaps some Chinese orgs that I'm not familiar with.

Pros:

Cons:

A comment on my state of knowledge:

I don't know much about the current state of AI safety in China, or what sort of advocacy might work. I did not prioritize looking into it because my initial impression is that I would require high confidence before recommending any interventions (due to the cons listed above), and I would be unlikely to achieve the necessary level of confidence in a reasonable amount of time.

Corporate campaigns to advocate for safety

Run public campaigns to advocate for companies to improve safety practices and call out unsafe behavior.

Theory of change:

Companies may improve their behavior in the interest of maintaining a good public image.

Who's working on it?

Midas Project; More Light.

Pros:

Cons:

Overall, this seems similar to political advocacy, but worse.

Develop AI safety/security/evaluation standards

Work inside a company, as a nonprofit, or with a governmental body (NIST, ISO, etc.) to develop AI safety standards.

Theory of change:

Sufficiently well-written standards can define under what conditions a frontier AI is safe, and potentially enforce those conditions.

Who's working on it?

AI Standards Lab; various AI companies; various governmental bodies.

Pros:

Cons:

Slow down Chinese AI development via ordinary foreign policy

Both the "slow down AI" crowd and the "maintain America's lead" crowd agree that it is good for China's AI development to slow down. The American government could accomplish this using foreign policy levers such as:

Theory of change:

On my view, slowing down Chinese AI development is good because it gives the leading AI developers in the United States more room to slow down. It also makes it less likely that a Chinese company develops misaligned TAI (although right now, US companies are more likely to develop TAI first).

Slowing down Chinese AI development looks good on the "maintain America's lead" view, although I believe this view is misguided—making TAI safe is much more important than making one country build it before another.

Who's working on it?

Center for Security and Emerging Technology has written memos recommending similar interventions. I expect there are some other orgs doing similar activities, including orgs that are more concerned about national security than AI risk.

Pros:

Cons:

Whistleblower protection/support

Provide legal support for whistleblowers inside AI companies and assist in publicizing whistleblowers' findings (e.g. setting up press interviews).

Theory of change:

A few ways this could help:

  1. Whistleblowers can force companies to change their unsafe behavior.

  2. The possibility of whistleblowers incentivizes companies to be safe.

  3. Publicizing companies' bad behavior can raise public concern about AI safety.

Who's working on it?

More Light.

Pros:

Cons:

Opinion polling

Run polls to learn public opinion on AI safety.

Theory of change:

Polls can inform policy-makers about what their constituents want. They also inform people working on AI safety about where their views most align with the public, which can help them prioritize.

Who's working on it?

AI Policy Institute; traditional polling agencies (Pew and YouGov have done polls on people's views on AI).

Pros:

Cons:

Help AI company employees improve safety within their companies

Work with people in AI companies (by organizing conferences, peer support, etc.) to help them learn about good safety practices.

Theory of change:

AI company employees can push leadership to implement stronger internal safety standards.

Who's working on it?

AI Leadership Collective.

Pros:

Cons:

Working with AI company employees might be sufficiently high-leverage to make up for my concerns. Answering that question would require going more in-depth, so I will go with my intuition and say it's probably not as cost-effective as my top ideas.

Direct talks with AI companies to make them safer

If you can talk to policy-makers about AI x-risk, then maybe you can also talk to AI company executives about AI safety.

Theory of change:

AI company execs have significant control over the direction of AI. If they started prioritizing safety to a significantly greater extent, they could probably do a lot to decrease x-risk.

Who's working on it?

To my knowledge, nobody is systematically working on this.

Pros:

Cons:

People in relevant positions should push companies to be safer in ways that they can, but I don't see any way to support this intervention as a philanthropist.

Monitor AI companies on safety standards

Track how well each frontier AI company does model risk assessments, security, misuse prevention, and other safety-relevant behaviors.

Theory of change:

Monitoring AI companies can inform policy-makers and the public about the state of company safety. Monitoring could have a similar effect as corporate campaigns, where it pushes companies to be safer, or it could even directly inform corporate campaigns.

Who's working on it?

AI Lab Watch & AI Safety Claims Analysis; Future of Life Institute's AI Safety Index; Midas Project; Safer AI.

Pros:

Cons:

Create a petition or open letter on AI risk

Write a petition raising concern about AI risk or calling for action (such as a six-month pause or an international treaty). Get respected figures (AI experts, etc.) to sign the petition.

Theory of change:

A petition can make it apparent to policy-makers and the public that many people/experts are concerned about AI risk, while also creating common knowledge among concerned people that they are in good company if they speak up about it.

Who's working on it?

aitreaty.org; Center for AI Safety / CAIS Action Fund; Future of Life Institute / FLI Action and Research, Inc.

Pros:

Cons:

My sense is that the existing petitions have been quite helpful, but there isn't clear value in creating more petitions. There may be some specific call to action that a new petition ought to put forward, but I'm not sure what that would be.

Create demonstrations of dangerous AI capabilities

Make AI risk tangible by building concrete demonstrations of how AI can be dangerous.

Theory of change:

Many people don't find it plausible that AI could cause harm; concrete demonstrations may change their minds. Demonstrations can also make AI risk more visceral.

Who's working on it?

FAR.AI; Palisade Research; some one-off work by others (e.g. Apart Research).

Pros:

Cons:

I'm most optimistic about demonstrations where there is a clear plan for how to use them, e.g. Palisade Research builds its demos specifically to show to policy-makers. I think Palisade is doing a particularly good version of this idea, but for the most part, I think other ideas are better.

Sue OpenAI for violating its nonprofit mission

OpenAI's stated mission:

Our mission is to ensure that artificial general intelligence […] benefits all of humanity.

OpenAI has more-or-less straightforwardly violated this mission in various ways. Humanity plausibly has grounds to sue OpenAI.

Theory of change:

A lawsuit would change OpenAI's incentives and may force OpenAI to actually put humanity's interest first, depending on how well the lawsuit goes.

Who's working on it?

In 2024, Elon Musk filed a lawsuit against OpenAI on this basis. The lawsuit is set to go to trial in 2026.

Given that there is already an ongoing lawsuit, it may be better to support the existing suit (e.g. by writing an amicus brief or by offering expert testimony) than to start a new one.

Pros:

Cons:

Send people AI safety books

Books are a tried-and-true method of explaining complex ideas. One could mail books on AI risk to members of Congress, or staffers, or AI company execs, or other relevant people.

Theory of change:

A book can explain in detail why AI risk is a big deal and thus persuade people that it's a big deal.

Who's working on it?

Pros:

Cons:

This idea might be really good, but it's high-variance. I would not recommend it without significantly investigating the possible downsides first.

AI research ideas

I said I wasn't going to focus on technical research or policy research, but I did incidentally come up with a few under-explored ideas. These are research projects that I'd like to see more work on, although I still believe advocacy is more important.

Research on how to get people to extrapolate

A key psychological mistake: "superintelligent AI has never caused extinction before, therefore it won't happen." Or: "AI is not currently dangerous, therefore it will never be dangerous."

Compare: "Declaring a COVID emergency is silly; there are currently zero cases in San Francisco." (I am slightly embarrassed to say that that is a thought I had in February 2020.)

Relatedly, some people expect there will not be much demand for AI regulation until we see a "warning shot". Perhaps, but I'm concerned we will run into this failure-to-extrapolate phenomenon. AI has already demonstrated alignment failures (Bing Sydney comes to mind, or GPT-4o's absurd sycophancy, or numerous xAI/Grok incidents). But clear examples of misalignment get fixed (because AI is still dumb enough for us to control). So people may draw the lesson that misalignment is fixable, and there may keep being progressively bigger incidents until we finally build an AI powerful enough to kill everyone.

I am concerned that the concept of AI x-risk will never be able to get sufficient attention due to this psychological mistake. Therefore, we need to figure out how to get people to stop making this mistake.

Theory of change:

Many people ignore AI x-risk because of this mistake. If we knew how to get people to extrapolate, we could use that knowledge to improve communication on AI risk.

Who's working on it?

Some academic psychologists have done related research.

Pros:

Cons:

Rather than spending $50 million on psychology research and then $50 million on a variety of psychologically-motivated media projects informed by that research, I would rather just spend $100 million on media projects.

Investigate how to use AI to reduce other x-risks

An important argument against slowing down AI development is that we could use advanced AI to reduce other x-risks (climate change, nuclear war, etc.).

But an aligned AI wouldn't automatically reduce x-risk. It may increase technological risks (e.g. synthetic biology) if offensive capabilities outpace defensive ones. An aligned AI could reduce nuclear risk by improving global coordination, but it's not obvious that it would.

Therefore, it may be worth asking: Are there some paths of AI development that differentially reduce non-AI x-risk?

Theory of change:

Research on how to direct AI may inform efforts by the developers of advanced AI, which may ultimately reduce x-risk.

Who's working on it?

Nobody, to my knowledge.

Pros:

Cons:

My initial thoughts on this line of research:

A short-timelines alignment plan that doesn't rely on bootstrapping

To my knowledge, every major AI alignment plan depends on alignment bootstrapping, i.e., using AI to solve AI alignment. I am skeptical that bootstrapping will work, and even if you think it will probably work (with, say, 90% credence), you should still want a contingency plan.

Write a research agenda for how to solve AI alignment without using bootstrapping.

Theory of change:

If people come up with sufficiently good plans, then we might solve alignment.

Who's working on it?

Peter Gebauer is running a contest for short-timelines AI safety plans, but the plans are allowed to depend on bootstrapping (e.g. Gebauer favorably cites DeepMind's plan).

There are some plans that say something like "stop developing AI until we solve alignment" (e.g. MIRI; Narrow Path), which is valid (and I agree), but those are not technical plans.

The closest thing I've seen is John Wentworth's research agenda, but it specifically invokes the underpants gnome meme, i.e., the plan has a huge hole in the middle.

Pros:

Cons:

Rigorous analysis of the various ways alignment bootstrapping could fail

I'm pessimistic about the prospects of alignment bootstrapping, and I've seen various AI safety researchers express similar skepticism, but I've never seen a rigorous analysis of the concerns with alignment bootstrapping.

Theory of change:

The AI companies with AI safety plans are all expecting bootstrapping to work. A thorough critique could convince those companies to develop better plans, create more of a consensus among ML researchers that bootstrapping is inadequate, or convince policy-makers that they need to require AI companies to be safer.

Who's working on it?

Nobody, to my knowledge.

Pros:

Cons:

Future work

Pros and cons of slowing down AI development, with numeric credences

I addressed this to some extent, but I could've gone into more detail, and I didn't do any numeric analysis.

I would like to see a more formal model that includes how the tradeoff changes based on considerations like:

Relatedly, how does P(doom) change what actions you're willing to take? I often see people assume that at a P(doom) of, say, 25%, pausing AI development is bad. That seems wrong to me. I believe that at 25% you should be about as aggressive about pushing for mitigations as you would be at 95%, although I haven't put in the work to come up with a detailed justification for this position. The basic argument is that x-risk looks very bad on longtermist grounds, while a delay of even (say) 100 years looks small by comparison.
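To make the shape of that argument concrete, here is a toy expected-value comparison. Every number in it is an assumption I picked purely for illustration (the normalized value of the future, the cost of a 100-year delay, the risk reduction from pausing); none of them comes from the report.

```python
# Toy expected-value comparison with made-up numbers, not figures from the report.
# Question: at P(doom) = 25%, is pushing for a pause worth a ~100-year delay?

V = 1.0                       # normalized value of the long-run future
delay_cost_fraction = 0.001   # assumed: a 100-year delay forfeits ~0.1% of V
p_doom = 0.25                 # P(doom) without the pause
risk_reduction = 0.5          # assumed: the pause halves P(doom)

ev_no_pause = (1 - p_doom) * V
ev_pause = (1 - p_doom * (1 - risk_reduction)) * V - delay_cost_fraction * V

print(f"EV without pause: {ev_no_pause:.3f}")
print(f"EV with pause:    {ev_pause:.3f}")
# Under these assumptions the pause wins easily at 25% P(doom), and the
# comparison looks much the same at 95%, which is the point of the argument.
```

The calculation only shows that the conclusion follows from the stated assumptions; the real work would be in defending those assumptions.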

Quantitative model on AI x-risk vs. other x-risks

There is an argument that we need TAI soon because otherwise we are likely to kill ourselves via some other x-risk. I have a rough idea for how I could build a quantitative model to test under what assumptions this argument works. Building that model wasn't a priority for this report, but I could do it without much additional effort.

The basic elements the model needs are (a rough sketch combining them follows the list):

  1. X-risk from AI

  2. X-risk from other sources

  3. How much we can reduce AI x-risk by delaying development

  4. X-risk from other sources, conditional on TAI

    • For an aligned totalizing TAI singleton that quickly controls the world, x-risk would be ~0.

    • TAI doesn't trivially decrease other x-risks; it could increase x-risk by accelerating technological growth (which means it's easier to build dangerous technology—see The Vulnerable World Hypothesis). The mechanism of decreasing x-risk isn't that TAI is smarter; the mechanism is that it could increase global coordination / centralize the ability to make dangerous technology.
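As an illustration of how these elements might fit together, here is a minimal sketch in Python. All parameter values and functional forms are placeholders I chose for illustration; the report does not endorse any of them.

```python
# A minimal sketch of the model described above, with placeholder numbers.

def total_xrisk(p_ai_doom, p_other_xrisk, ai_risk_reduction_per_decade,
                p_other_xrisk_post_tai, decades_of_delay):
    """Rough total x-risk as a function of how long TAI is delayed.

    p_ai_doom: baseline x-risk from misaligned AI (element 1)
    p_other_xrisk: per-decade x-risk from non-AI sources before TAI (element 2)
    ai_risk_reduction_per_decade: fractional cut in AI x-risk per decade of delay (element 3)
    p_other_xrisk_post_tai: residual non-AI x-risk once TAI exists (element 4)
    """
    # Elements 1 & 3: baseline AI x-risk, reduced by delay (assumed functional form)
    ai_risk = p_ai_doom * (1 - ai_risk_reduction_per_decade) ** decades_of_delay
    # Element 2: non-AI x-risk accumulates while we wait
    pre_tai_other_risk = 1 - (1 - p_other_xrisk) ** decades_of_delay
    # Element 4: non-AI x-risk conditional on TAI (not automatically ~0)
    post_tai_other_risk = p_other_xrisk_post_tai
    # Combine, crudely assuming the three risks are independent
    survive = (1 - ai_risk) * (1 - pre_tai_other_risk) * (1 - post_tai_other_risk)
    return 1 - survive

# Compare delays of 0, 3, and 10 decades under made-up parameters
for d in (0, 3, 10):
    print(d, round(total_xrisk(0.25, 0.005, 0.2, 0.05, d), 3))
```

The point of building the real model would be to see which parameter ranges make the "we need TAI soon" argument go through and which don't.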

Deeper investigation of the AI arms race situation

Some open questions:

Does slowing down/pausing AI help solve non-alignment problems?

Pausing (or at least slowing down) clearly gives us more time to solve alignment, and has clear downsides in terms of opportunity cost. But other effects are less clear. Pausing may help with some other big problems: animal-inclusive AI; AI welfare; S-risks from conflict; gradual disempowerment; risks from malevolent actors; moral error. There are some arguments for and against pausing being useful for these non-alignment problems; for more on this topic, see Slowing down is a general-purpose solution to every non-alignment problem.

I have never seen a dedicated attempt to analyze whether pausing AI development would help with non-alignment problems; this seems like an important question.

Some considerations:

Determine when will be the right time to push for strong restrictions on AI (if not now)

A common view: "We should push for strong restrictions on AI, but now is not the right time."

I disagree with this view; I think now is the right time. But suppose it isn't. When will be the right time?

Consider the tradeoff: if you do advocacy later, then

  1. the risks of AI will be more apparent;

  2. but there's a greater chance that you're too late to do anything about it.

Is there some inflection point where the first consideration starts outweighing the second?

And how do you account for uncertainty? (Uncertainty means you should do advocacy earlier, because being too late is much worse than being too early.)
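As one way to make this tradeoff concrete, here is a toy calculation. The payoff function, lead time, and arrival-time distribution are all assumptions of mine, chosen only to illustrate the asymmetry between being too early and being too late.

```python
# Toy timing model with invented numbers. Assumptions (mine, not the report's):
# advocacy started at year t pays off proportionally to t (risks are more
# apparent later), but pays nothing if it starts less than `lead_time` years
# before TAI arrives (too late to change anything). TAI arrival time is
# uncertain, modeled as a normal distribution for simplicity.

from statistics import NormalDist

def expected_value(t, mean_arrival=10.0, sd=4.0, lead_time=2.0):
    arrival = NormalDist(mu=mean_arrival, sigma=sd)
    p_in_time = 1 - arrival.cdf(t + lead_time)  # probability we weren't too late
    return t * p_in_time                         # later start = more persuasive, if in time

for sd in (0.1, 4.0):
    grid = [i / 10 for i in range(1, 100)]
    best_t = max(grid, key=lambda t: expected_value(t, sd=sd))
    print(f"sd={sd}: best year to start advocacy ~ {best_t}")
# With near-certain timelines the best start is close to the last possible
# moment; with wide uncertainty it shifts earlier, reflecting the asymmetry above.
```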

I don't think this question is worth trying to answer, because I am sufficiently confident that now is the right time. But for anyone who believes now is too early, this is an important question.

Supplements

I have written two supplements in separate docs:

  1. Appendix: Some miscellaneous topics that weren't quite relevant enough to include in the main text.

  2. List of relevant organizations: A reference list of orgs doing work in AI-for-animals or AI policy/advocacy, with brief descriptions of their activities.


  1. For example, almost every frontier AI company opposed SB-1047; Anthropic supported the bill conditional on amendment, and Elon Musk supported it but xAI did not take any public position. See ChatGPT for a compilation of sources. ↩︎

  2. I asked ChatGPT Deep Research to tally up funding for research vs. policy and it found ~2x as many researchers as policy people and also 2x the budget, although it miscounted some things; most of what it counted as "AI policy" is (1) unrelated to x-risk and (2) policy research, not policy advocacy; and it only included big orgs (e.g. it missed the long tail of independent alignment researchers). So I believe the true ratio is even more skewed than 2:1. ↩︎

  3. According to my research, it's not difficult to find examples of times when UK policy influenced US policy, but it's still unclear to me how strong this effect is. There are also some theoretical arguments for and against the importance of UK AI policy, but I didn't find any of them particularly compelling, so I remain agnostic. ↩︎

  4. I find this state of affairs confusing given how many people profess belief in short timelines. I think part of the reason is that people involved in AI safety tend to be intellectual researcher-types (like me, for example) who are more likely to orient their work toward "what is going to improve the state of knowledge?" rather than "what is likely to pay off in the near future?" ↩︎


Jelle Donders @ 2025-09-25T12:14 (+15)

Key Findings

1. Implementation-focused policy advocacy and communication are neglected

Both AI-safety technical research and policy research only reduce risk when governments and AI companies actually adopt their recommendations. Adoption hinges on clear, persuasive communication and sustained advocacy that turn safety ideas into things like tangibly implementable legal clauses and enforcement mechanisms. Ultimately, that is the job of implementation-focused work including communications: turning promising ideas into action.

Yet funding, talent, and attention remain skewed toward technical research and policy research; quick estimates suggest safety implementation funding is less than 10% of that of all research.


This has seemed like a serious bottleneck in the AI safety space to me for a while now. Does anyone know why this kind of work is rarely funded?

Josh Thorsteinson 🔸 @ 2025-09-28T15:46 (+9)

If leading AI scientists all agree not to build advanced AI, then it does not get built. The question is whether a non-binding commitment will work. There have been similar successes in the past, especially in genetics with voluntary moratoriums on human cloning and human genetic engineering.


It may be worth trying, but I don't think a voluntary moratorium for AI would work. 

The Asilomar Conference was preceded by a seven-month voluntary pause on dangerous DNA splicing research, but I think this would be much harder to do for AI. 

Some differences:
1. The molecular biology community in 1975 was small and mostly concentrated in the US, but the AI research community today is massive and globally dispersed.
2. Despite the pause being effective in the West, it didn't stop the USSR's DNA splicing research in its secret biological weapons program.
3. The voluntary DNA splicing pause was only for a few months, and the scientists believed their research would resume after the Asilomar Conference. An effective AI pause would ideally be much longer than that, probably without a defined end date. 
4. FLI already called for a six-month pause in 2023; it was ignored.
5. There are far greater incentives for TAI than recombinant DNA research. 

I haven't looked into other historical voluntary moratoriums in depth but I don't think the bottom line would be different. 

Thanks for this post, I think it's great! Just adding my perspective on this part since I've researched this topic before. 

Sean_o_h @ 2025-09-28T09:37 (+9)

I agree especially with the first of these (third is outside my scope of expertise). There's a lot of work that feels chronically underserved and unsupported in 'implementation-focused policy advocacy and communication'.

E.g. standards are correctly mentioned. Participation in standards development is an unsexy, frequently tedious, and time-consuming process. The big companies have people whose day job is basically sitting in standards bodies like ISO making sure the eventual standards reflect company priorities. AI safety/governance has a few people fitting it in alongside a million other priorities.

e.g. the EU is still struggling to staff the AI Office, one of the most important safety implementation offices out there. e.g. I serve on an OECD AI working group, an international working group with WIC, have served on GPAI and PAI, and regularly advise the UN and national governments. You can make a huge difference on these things, especially if you have the time to develop proposals/recommendations and follow up between meetings. But I see the same faces from academia & civil society at all of these - all of us exhausted, trying to fit in as much as we can alongside research, management, fundraising, teaching + student mentorship + essays (for the academics).

Some of this is that it takes time as an independent to reach the level of seniority/recognition needed to be invited to these working groups. But my impression from being on funding review panels is that many of the people who are well-placed to do this work still face an uphill battle to get funding for themselves or the support they need. It helps if you've done something flashy and obviously impactful (e.g. AI-2027), but there's a ton of behind-the-scenes work (that I think is much harder for funders to assess, and sometimes harder for philanthropists to get excited about) that an ecosystem needs if it's to have a chance of actually steering this properly and acting as the necessary counterweight to commercial pressures. That means time and regular participation at the national government level (a bunch of them!) plus the US, EU, UN, OECD, G7/G20/G77, Africa, Belt&Road/ANSO, GPAI, ITU, ISO, things like Singapore SCAI, and much more. Great opportunities for funders (including small funders).

SummaryBot @ 2025-09-18T21:38 (+2)

Executive summary: The author reviews the AI safety landscape and argues that neglected areas—especially AI existential-risk (x-risk) policy advocacy and ensuring transformative AI (TAI) goes well for animals—deserve more attention, highlighting four priority projects: engaging policymakers, drafting legislation, making AI training more animal-friendly, and developing short-timeline plans for animal welfare.

Key points:

  1. Technical safety research is comparatively well-funded, while AI x-risk advocacy is neglected; Dickens prioritizes advocacy over research, despite risks of backfire or slowing progress.
  2. Short timelines (25–75% chance of TAI within 5 years) make quick-payoff advocacy more urgent than long-horizon research.
  3. Top recommended projects: (a) talk to policymakers about AI x-risk, (b) draft AI safety legislation, (c) advocate for LLM training that includes animal welfare, and (d) design/evaluate short-timeline animal welfare interventions.
  4. Post-TAI animal welfare may be less critical than human survival but remains cost-effective and underfunded relative to its importance.
  5. Non-alignment issues (digital minds, S-risks, moral error, gradual disempowerment) are highly important but judged intractable under short timelines, so not prioritized here.
  6. General recommendations: advocacy should explicitly emphasize extinction and misalignment risks, prioritize work useful under short timelines, and consider slowing AI development as a cross-cutting solution.


This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.