Concrete projects to prepare for superintelligence

By Forethought, William_MacAskill, finm @ 2026-03-27T20:03

This is a linkpost to https://www.forethought.org/research/concrete-projects-in-agi-preparedness

Introduction

There are lots of good, neglected, and pretty concrete projects people could set up to make the transition to superintelligence go better. This document describes some that readers might not have thought much about before. They are ordered roughly by how excited we are about them.[1] Of these, Forethought is actively working on AI character evaluation and space governance, and we are very interested in automating macrostrategy.

Summary

AI character evaluation. Start an independent org to evaluate and stress-test AI character traits (epistemic integrity, prosociality, appropriate refusals), hold developers accountable against their own model specs / constitutions, and suggest and incentivise improvements to the specs.

Automated macrostrategy. Create evaluations and benchmarks, collect human-generated training data, and build scaffolds to improve AI competence at big-picture strategic and philosophical reasoning.

AI security assessment. Start an independent org that evaluates AI models for sabotage and backdoors, and makes recommendations about AI constitutions.

Enabling deals. Start an independent organisation to broker deals with potentially misaligned AI models in order to incentivise early schemers to disclose misalignment and cooperate with alignment efforts.

AI for improving collective epistemics. E.g. build an AI chief of staff that helps users act in line with the better angels of their nature.

AI tools for coordination. Build AI for enabling coordination, like confidential monitoring and verification bots, and negotiation facilitators.

A space governance institute, like a “CSET for space”, both to work on important near-term space issues (e.g. data centres in space) and become a place of expertise for longer-term space governance issues.

Coalition of concerned ML scientists. Create a coalition of ML researchers (like an informal union) who commit to coordinated action (e.g. boycotts, conditions on participation in government projects) if AI developers cross minimal, uncontroversial red lines.

AI character evaluation

AI character[2] is a big deal, affecting most other cause areas.

There’s a lot of work to do on AI character:

In particular, someone could set up an independent organisation to evaluate AIs based on traits like epistemic integrity, prosociality, and behaviour (including appropriate refusals) in very high-stakes cases. It could cross-reference the published model specs with observed behaviours in realistic, stress-testing conditions (e.g. multi-agent dynamics, long conversations with real people), to hold developers accountable. It could also give qualitative reviews of model specs.

Automated macrostrategy

The basic argument is that:

  1. It would be extremely useful to have AI that can do macrostrategy and conceptual reasoning earlier than otherwise — even 3-6 months earlier could be a huge deal. This includes:
    1. Designing governance structures (e.g. rights and institutions for digital beings).
    2. Scoping emerging technological risks.
    3. Generating novel insights necessary to reach a great future (like the idea of acausal trade).
  2. We could potentially make this happen 3-6 months earlier through some combination of:
    1. Creating training data and evals / benchmarks for AI macrostrategy.
    2. Building scaffolds to improve AI macrostrategy performance.
    3. Creating infrastructure to enable AI researchers to build on each other (e.g. an improvement on journals + peer review).
    4. Training human managers in advance on how to get the most juice out of the latest AIs.
    5. Being prepared and willing to spend large amounts of money (≫$100m) on inference.

Work on this now could include:

On training data and evals: we think these could meaningfully improve the prospects for automated macrostrategy when it matters. It’s especially important to find people with good judgement to work on this, and it could be a big lift, so it’s worth starting early.

We’re not sure about the technical details, but it seems that competence and good judgement in philosophy and strategic thinking already lag behind other skills that are cheaper to train, and will continue to do so. One reason is that ground-truth answers are hard to generate, so we might need more examples produced by hand. It’s also less clear that we can trust the judgement of typical RLHF raters, because human competence in these areas is rare too. And there just aren’t many examples of great macrostrategic thinking in the training data.

So we should think about collecting training data, evals, and benchmarks (e.g. for training reward models, which can in turn be used to train reasoning models). Oesterheld et al. put together a dataset of conceptual arguments rated by thoughtful people. We’d love to see more of that kind of thing, though we’d probably need dozens of times more human evaluations to generate enough data to be meaningfully useful in training itself.

We could imagine an org which tries to collect evaluations or examples from (for example) grad students in fields like philosophy, and constructs benchmarks aimed at separating good reasoning from e.g. sycophancy, mere agreeableness, or avoiding taboo conclusions.
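For illustration, the rating-aggregation step could look something like the following minimal sketch, which fits Bradley–Terry quality scores to pairwise human judgements of arguments. All names and data here are hypothetical, and a real reward-model pipeline would be far more involved:

```python
# Minimal sketch (hypothetical data): fitting Bradley-Terry quality scores
# to pairwise human judgements of conceptual arguments -- the kind of
# preference signal a reward model for macrostrategy could be trained on.
import math

def fit_bradley_terry(pairs, n_items, lr=0.1, steps=2000):
    """pairs: list of (winner, loser) index pairs from human raters."""
    scores = [0.0] * n_items
    for _ in range(steps):
        grads = [0.0] * n_items
        for w, l in pairs:
            # P(winner beats loser) under the current scores
            p = 1.0 / (1.0 + math.exp(scores[l] - scores[w]))
            grads[w] += 1.0 - p
            grads[l] -= 1.0 - p
        for i in range(n_items):
            scores[i] += lr * grads[i]
    return scores

# Three hypothetical arguments; raters preferred 0 over 1, 1 over 2, 0 over 2.
pairs = [(0, 1), (1, 2), (0, 2), (0, 1), (1, 2)]
scores = fit_bradley_terry(pairs, n_items=3)
assert scores[0] > scores[1] > scores[2]
```

The fitted scores recover the raters’ implied ranking; at scale, the same kind of pairwise data could anchor benchmarks that separate good reasoning from mere agreeableness.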

AI security evaluations

AI-enabled concentration of power is a major risk, and there is loads to do. A new organisation (or project within an existing organisation) could:

An organisation with US national security expertise and credibility could be particularly valuable, by emphasising the risk of nation-state sabotage and the importance of AI that’s aligned with the US constitution.

Enabling deals with AIs

We could get into a situation where the newest AIs are misaligned, very capable, but not capable enough to successfully execute a takeover attempt on their own. If we don’t uncover evidence of misalignment, though, successors to these models could succeed in takeover. One solution would be to make a deal with the early scheming models, to incentivise them to disclose their misalignment and help with alignment efforts. Read more here, here, and here.

To make this happen, we could create an independent org focused on enabling credible precommitments and deals with AIs. This org could:

There are also a bunch of other things people could do, like:

Tools for collective epistemics

There’s a ton of low-hanging fruit for building socially useful tools on top of more-or-less existing LLM capabilities.

We’re especially interested in “epistemic tools” for increasing the general level of honesty and reasoning ability in society.

A key point here is that most of the impact from the most promising tools won’t come from helping individual users, but from changing the overall incentive landscape: e.g. if public actors know their claims will be automatically checked and their track records will be visible, they’ll be less inclined to write misleading content in the first place. Hence the focus on tools for collective over individual epistemics.

This piece (and the articles in the series) gives a few concrete ideas. A couple of examples of epistemic tools:

A “better angel” AI chief of staff. Within the next year or two, we expect “AI chiefs of staff” to become widespread. These would be AI agents that manage your life, acting like a chief of staff, executive assistant, and personal and work advisor all in one. The design of these, and how they present information and nudge their users, could have major impacts on user behaviour. We could try to get ahead of this, building the best AI chief of staff, and designing it so that it helps users act in accordance with their more reflective and enlightened preferences.

Reliability tracking: a system that compiles a public actor’s past statements, classifies them (factual claims, predictions, promises), scores them against what actually happened, and aggregates the results into a reliability rating. A reasonable starting point could be to audit the prediction track-record of well-known pundits, aiming to make high accuracy a point of pride, while still celebrating attempts to make predictions in the first place. A source of profit could be selling reliability assessments of corporate statements to finance companies that trade on them.
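As a toy illustration of the scoring step, resolved predictions could be scored with the Brier rule and averaged per actor. All names and numbers below are hypothetical:

```python
# Minimal sketch (hypothetical data) of reliability scoring: each resolved
# prediction gets a Brier score, aggregated into one rating per actor.
from dataclasses import dataclass

@dataclass
class Prediction:
    actor: str
    claim: str
    stated_prob: float  # probability the actor assigned
    outcome: bool       # what actually happened

def brier(p: Prediction) -> float:
    """Squared error between stated probability and the 0/1 outcome."""
    return (p.stated_prob - float(p.outcome)) ** 2

def reliability(preds: list[Prediction], actor: str) -> float:
    """Mean Brier score for one actor; lower is better (0 = perfect)."""
    own = [brier(p) for p in preds if p.actor == actor]
    return sum(own) / len(own)

preds = [
    Prediction("pundit_a", "Rates will rise", 0.9, True),
    Prediction("pundit_a", "Party X wins", 0.8, False),
    Prediction("pundit_b", "Rates will rise", 0.6, True),
]
# pundit_a: (0.01 + 0.64) / 2 = 0.325; pundit_b: 0.16
assert reliability(preds, "pundit_b") < reliability(preds, "pundit_a")
```

A real system would also need the harder upstream steps (retrieving statements, classifying them, resolving outcomes), but the aggregation itself is simple.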

Epistemic tools for strategic awareness

We’ll also highlight tools for strategic awareness: tools to surface information for making better-informed decisions, and to distribute access to that information. For example:

Ambient superforecasting: a platform which uses the best forecasting models to generate publicly available forecasts on important questions, so users can query it and get back superforecaster-level probability assessments.
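One standard method such a platform might use to combine multiple forecasts on the same question is the geometric mean of odds. A minimal sketch, with made-up probabilities:

```python
# Minimal sketch (hypothetical inputs): pooling forecaster probabilities
# via the geometric mean of odds, a common aggregation method.
import math

def aggregate_odds(probs):
    """Combine probabilities by taking the geometric mean of their odds."""
    odds = [p / (1 - p) for p in probs]
    gm = math.prod(odds) ** (1 / len(odds))
    return gm / (1 + gm)

# Three hypothetical forecasts on one question:
p = aggregate_odds([0.6, 0.7, 0.8])
assert 0.6 < p < 0.8  # the pooled forecast sits between the inputs
```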

Scenario planning: a platform built to generate likely implications of different courses of action, making it easier for users to analyse and choose between them.

Automated open-source intelligence: automated researchers which process huge amounts of publicly available information, to surface insights to the public which are normally hidden behind paywalls or trust networks. This project should be careful to choose areas where open-source intelligence is a public good (e.g. verifying compliance with treaties and sanctions, tracking corporate promise-breaking or law-breaking), rather than potentially destabilising areas (e.g. revealing military capabilities or vulnerabilities in ways that could increase conflict risk, or relatively benefitting bad actors).

Tools for coordination

As well as epistemic tools, we’re excited about tools for coordination, many of which could again be built with existing capabilities.

Some tools could enable cooperation where deals would otherwise go unmade, consensus exists but isn’t discovered, or people with aligned interests never find each other. We’ll highlight:

Negotiation facilitation: a platform to moderate negotiations or discussion between people (e.g. public consultations), to quickly surface key points of consensus, and suggest plans everyone can live with. Finding ways to automate complex negotiation is most promising where the space of possible compromises is huge and hard to search manually, such as multi-issue diplomatic or commercial negotiations.

Within tools for coordination, we’re especially excited about tools for assurance and privacy. In principle, LLMs let people show they have certain information without disclosing the information itself to other parties. This can unlock deals where information asymmetry, mutual distrust, or sensitivity of information normally blocks them. For example:

Confidential monitoring and verification: systems which act as trusted intermediaries, enabling actors to make deals that require sharing highly sensitive information without disclosing it directly. This is especially relevant for arms control, trade secret licensing, and other settings where verification is essential but full disclosure is unacceptable to all parties.

Structured transparency for democratic accountability: independent auditing systems which allow people to hold institutions to account in a fine-grained way without compromising legitimately sensitive information, by processing potentially sensitive information to produce publicly shareable audits.
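As a small illustration of the underlying idea, a classical cryptographic hash commitment lets a party prove later that it held a specific piece of information without disclosing it up front. This is a simpler analogue of, not a substitute for, the LLM-intermediary designs above; all values are hypothetical:

```python
# Minimal sketch (hypothetical values): a hash commitment scheme.
# Publish the commitment now; reveal message + nonce later to prove
# you held exactly that information all along.
import hashlib, secrets

def commit(message: bytes) -> tuple[bytes, bytes]:
    """Return (commitment, nonce); only the commitment is published."""
    nonce = secrets.token_bytes(16)
    return hashlib.sha256(nonce + message).digest(), nonce

def verify(commitment: bytes, nonce: bytes, message: bytes) -> bool:
    """After the reveal, anyone can check the commitment matches."""
    return hashlib.sha256(nonce + message).digest() == commitment

c, n = commit(b"our facility produced 40 units")
assert verify(c, n, b"our facility produced 40 units")
assert not verify(c, n, b"our facility produced 50 units")
```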

Space governance institute

Space governance could be a big deal for a few reasons:

There’s also a lot of change happening in the space world at the moment (primarily driven by SpaceX dramatically reducing launch costs), so now is an unusually influential time.

Forethought is currently running a 6-month research fellowship on space governance, with 3 full-time scholars, and 1–2 additional FTEs of support and research, including experts in space law.

Compared to other ideas in this list, we’re much less confident that space governance will turn out to be important, because space might only become relevant late into an intelligence explosion. The hope is to reach more certainty on some crux-y questions, and to get a better sense of concrete actions.

One potential practical project is to set up a “CSET for space”: a think tank that analyses the interaction between AI and space (in particular) and, perhaps, advocates in ways that run counter to corporate interests. Total lobbying in the space industry is apparently on the order of tens of millions of dollars per year, so even small amounts of investment could go a long way.

Some policy ideas that seem tentatively promising include:

What’s more, this organisation could become the go-to source for excellent non-corporate analysis of space-related policy, which could become increasingly important over the course of the intelligence and industrial explosions.

Coalition of concerned ML scientists

Currently, ML engineers and other technical staff at AI companies: (i) have prosocial motivations, often more than their leadership; (ii) have a lot of leverage over company policy, because they are crucial and hard to replace; (iii) will eventually lose much or most of their leverage after we get to fully automated AI R&D; and (iv) aren’t currently using their leverage as well as they could because, overall, there haven’t been serious efforts at coordination. Probably that’s a missed opportunity.

Someone could create a coalition (like an informal union) of ML researchers who agree to act en masse when needed. Getting it off the ground could involve loudly talking about the idea, setting out the core tenets, and getting commitments to join from influential early members. Doing all this via individual pledges could keep it legally safe from antitrust concerns. The organising body could then:

As well as actually taking actions, the mere existence of the coalition could improve things, just by making the threat of coordinated action salient to the AI companies.

This project would be a good fit for a former ML researcher, perhaps combined with someone with campaign and coalition-building experience. Some next steps on this would be to spec out the plan further, to investigate other examples of formal and informal unions (e.g. Tech Workers Coalition) and how they operate, and to build up a starting seed coalition of researchers. Whoever sets up this project should be careful about how it could backfire, or become less relevant through mission creep.

This article was created by Forethought. Read the original on our website.

  1. ^

    Thanks to Max Dalton, Stefan Torges, and everyone else at Forethought for the background behind this list. Others at Forethought disagree somewhat with what items should be in the top-tier list, as well as prioritisation within that tier.

  2. ^

    Desired propensities for a model, which can be explicitly described or at least gestured towards in a model spec.