AI Safety's Biggest Talent Gap Isn't Researchers. It's Generalists.

By Topaz, Agustín Covarrubias 🔸, Alexandra Bates, Parv Mahajan, Kairos @ 2026-04-13T14:46 (+175)

This post was cross-posted to LessWrong.

TL;DR: One of the largest talent gaps in AI safety is competent generalists: program managers, fieldbuilders, operators, org leaders, chiefs of staff, founders. Ambitious, competent junior people could develop the skills to fill these roles, but there are no good pathways for them to gain skills, experience, and credentials. Instead, they're incentivized to pursue legible technical and policy fellowships and then become full-time researchers, even if that’s not a good fit for their skills. The ecosystem needs to make generalist careers more legible and accessible.

Kairos and Constellation are announcing the Generator Residency as a first step. Apply here by April 27.

Epistemic status: Fairly confident, based on 2 years running AI safety talent programs, direct hiring experience, and conversations with ~30 senior org leaders across the ecosystem in the past 6 months.

The problem

Over the past few years, AI safety has moved from niche concern toward a more mainstream issue, driven by pieces like Situational Awareness, AI 2027, If Anyone Builds It, Everyone Dies, and the rapidly increasing capabilities of the models themselves.

During this period, over 20 research fellowships have launched, collectively training thousands of fellows, with 2,000-2,500 fellows anticipated this year alone^[1]. The talent situation for strong technical and policy researchers is far from solved, but meaningful progress has been made.

The story for non-research talent is very different. By our count, there are roughly 7 fellowships for non-research talent (producing around 300 fellows this year^[2]), spread thin across an array of role types. As a result, many critical functions within AI safety remain acutely talent-constrained.

More broadly, the ecosystem has a lot of people who are great at thinking about ideas. We need more people who are great at thinking about people and projects. Read more about this from Asya Bergal here or from 80,000 Hours here.

The consistent feedback we hear from senior people across the ecosystem is that the hardest roles to fill are not research roles. They are:

Generalists: operators, executors, fieldbuilders, people and program managers, grantmakers, recruiters. People who can ideate, manage, and execute a broad range of non-research projects.
Founders, both technical and non-technical, for new research and non-research organizations.
Communications professionals who can work on policy and research comms.
Chief-of-Staff types who can support senior leaders and multiply their impact.
Senior operational people with domain expertise in areas like cybersecurity, policy, or large-scale project management.

Based on our experience and anecdotes from organizations in our networks^[3], many organizations trying to hire find that research postings attract dozens of qualified applicants, while non-research postings often surface only 0-5 applicants who meet the core requirements (strong mission alignment, meaningful AI safety context, and general competence) despite receiving hundreds of applications.

Why the pipeline is broken

The fellowship landscape is massively skewed toward research.

Around 20 research fellowships together produce 2,000-2,500 fellows per year. For fieldbuilding, the current options are essentially Pathfinder (where the vast majority of fellows still intend to pursue research careers) and a few dedicated fieldbuilding spots at Astra. These produce an estimated 5-10 fieldbuilding generalists hired per year. This asymmetry signals that the primary route into full-time AI safety work runs through research. And, while research is a core part of safety, it is also necessary to find and develop people who can manage research projects, run organizations, and implement and communicate research ideas.

There is no clear career ladder for generalists.

A research-oriented person has a well-worn trajectory: BlueDot → ARENA → SPAR → MATS → junior researcher → senior researcher. And while this path isn't perfect, nothing comparable exists for generalists. The typical route involves running a strong university group, then hoping to get hired directly at a fieldbuilding org, with no intermediate steps or clear progression path afterwards. The risk discourages people who might otherwise be excellent generalists from committing to the path.

There is no credentialing or proving ground.

Unlike research, where fellowship participation provides a track record and hiring signal, aspiring generalists have no equivalent way to demonstrate competence. Organizations won't hire untested junior talent for critical operational roles, but there's nowhere for junior talent to get tested^[4].

There is no routing infrastructure.

Matching people to opportunities happens through ad hoc referrals and personal networks. This doesn't scale, and it means we regularly miss promising candidates. As the field has matured and institutional structure has grown, coordination overhead and established networks make it harder for aspiring generalists to self-start projects and stand out the way that was possible a few years ago.

Why this matters now

We believe that there are now more good policy and technical ideas ready for implementation than there is coordination ability and political will to implement them in governments and AI companies. On the margin, we think we're receiving smaller returns from additional researchers entering the field, especially outside the top 10% of research talent. It’s also plausible that AI safety research will be automated more quickly during takeoff than most other types of work.

Many expect the funding landscape for AI safety will expand significantly over the next two to three years, which makes this bottleneck more urgent. More capital will be available, but without the people to deploy it effectively, that capital will stay inert. This already appears to be a bottleneck for current grantmakers, and it could get much worse.

Naively, we expect the world to get a lot weirder as capabilities progress. In a world where the demands on the AI safety ecosystem rapidly increase and evolve, training people with strong thinking, agency, and executional abilities, rather than narrow technical skills, seems highly leveraged.

This is particularly important because it enables us to diversify our bets and cover a large surface of opportunities for impact. There’s no shortage of project ideas for growing the field of AI safety, scaling up our policy efforts, or communicating to the public, but we simply don’t have enough talent to plan, design, and execute on all of them. Our bottleneck isn’t funding or ideas, it’s people.

Counter-Arguments

"You said hundreds of people are applying to these roles. Why can't some of them be good fits? Aren't there many people who could fill operations positions?"

We draw a distinction between "hard ops" and "soft ops." Hard ops roles (finance, legal, HR, etc.) benefit from expertise, and hiring experienced professionals without AI safety context is typically sufficient. Soft ops roles (program management, talent management, generalist positions, etc.) are different. Domain expertise matters less than having strong inside-view models of the field and generalist competency. Succeeding in these roles requires real mission alignment and enough context to spot high-EV opportunities that someone without that background would miss.

"I'm not sure I agree that research talent is less important than generalist talent."

We're deliberately not making a strong comparative claim about the impact of generalists versus technical and policy researchers. What we are saying is that generalist talent is currently the binding constraint. It is harder to source than research talent and, in our models, represents the tighter bottleneck for the ecosystem's ability to convert funding and ideas into impact.

"How important is generalist talent in shorter-timeline worlds?"

Our sense is that generalist talent is crucial across all timelines. While shorter timelines do compress the window for upskilling, our experience is that motivated junior people can skill up relatively quickly and add urgently needed capacity, making the counterfactual value of pipeline-building quite high even in shorter-timeline worlds (under 3 years).

"You argue there are all these research fellowships and no programs for non-research talent. But couldn't those programs just produce generalists?"

The existing research fellowships are well-optimized and have a strong track record of producing researchers who get placed into AI safety roles. Some fellows have gone on to non-research roles, but anecdotally this is rare. These programs seem to have a much stronger track record of taking talent who are open to different career paths and funneling them toward research, than of producing researchers who are open to different career paths.

"Aren't there a lot of non-research roles currently in AI safety?"

A few hundred people do this work today versus a few thousand researchers. There used to be a steadier stream of talent aiming for these roles, but short-timelines anxiety, the expansion of research programs, and the disappearance of some entry points that used to exist have contracted the pipeline considerably.

The Generator Residency

As a first step toward addressing these problems, Constellation and Kairos are announcing the Generator Residency: a 15-30 person, 3-month program focused on training, upskilling, credentialing, and placing generalists. The program runs June 15 through August 28, 2026 and applications close April 27.

Learn more and apply here

How it works:

Residents will work out of Constellation and receive ideas, resources (funding, office space), and mentorship from successful generalists at organizations like Redwood, METR, AI Futures Project, and FAR.AI.

For the first few weeks, residents will write and refine their own project pitches while meeting the Constellation network and building context in the field. They will then create and execute roughly 3-month projects, individually or in groups, with generous project budgets. Throughout the program, we’ll provide seminars, 1:1s, and other opportunities for residents to deeply understand current technical and policy work, theories of change, and gaps in the ecosystem.

During and after the program, we’ll support residents in finding roles at impactful organizations, spinning their projects into new organizations, or having their projects acquired by existing ones. Selected residents can continue their projects for an additional three months (full-time in-person or part-time remote), with continued stipend, office access, and housing.

We hope to place a majority of job-seeking residents into full-time roles at impactful organizations within 12 months of the program ending.

Examples of projects we’d be excited about hosting include:

Workshops and conferences: Run a domain-specific conference like ControlConf or the AI Security Forum, or one that brings new talent into AI safety like GCP, targeting high-leverage new audiences or emerging subfields.
AI comms fellowship: Design and manage a short fellowship for skilled communicators to produce AI safety content. Draft a curriculum, identify mentors, secure funding, and prepare a pilot cohort.
Recruiting pipelines: Partner with 2-3 small AI safety orgs to build the systems they need to scale: work tests, candidate sourcing, referral pipelines.
Travel grants program: Fund visits to AI safety hubs like LISA and Constellation by promising students and professionals. Set criteria, build an application flow, line up partner referrals, and run a pilot round.
Shared compute fund: Scope a fund to cover compute needs of independent safety researchers, model whether a cluster is needed, and distribute a pilot round of grants.
Strategic awareness tools: Scale AI-powered superforecasting and scenario planning in safety infrastructure, build support among impactful stakeholders, and run a pilot.
AI policy career pipeline: Build workshops, practitioner talks, and handoffs into policy career programs to route talent toward the institutions shaping policy.

^[1]
This estimate draws on a separate analysis that projected the number of fellows using both publicly and privately available information, as well as extrapolations from actual data through late 2024. The fellowships included in this analysis were: AI Safety Camp, Algoverse (AI Safety Research Fellowship), Apart Fellowship, Astra Fellowship, Anthropic Fellows Program, CBAI (Summer/Winter Research Fellowship), GovAI (Summer/Winter Research Fellowship), CLR Summer Research Fellowship, ERA, FIG, IAPS AI Policy Fellowship, LASR Labs, PIBBSS, Pivotal, MARS, MATS, SPAR, XLab Summer Research Fellowship, MIRI Fellowship, and Dovetail Fellowship.
^[2]
The programs included in this analysis were: Tarbell (AI Journalism), Catalyze Impact Incubator (AI Safety Entrepreneurship), Seldon Lab (AI Resilience Entrepreneurship), Horizon Institute for Public Service Fellowship (US AI Policy/Politics), Talos Fellowship (EU AI Policy/Politics), Frame Fellowship (AI Communications), and The Pathfinder Fellowship (AIS University Organizing). Fellow counts were derived primarily from publicly available data.
^[3]
We're deliberately vague about which organizations we're referring to here since we haven't asked permission to disclose the outcomes of recent hiring rounds. For research roles, we're mainly referring to technical AI safety nonprofits, policy nonprofits, and think tanks. For non-research roles, we're mainly referring to fieldbuilding nonprofits and technical and policy nonprofits that have recently tried hiring non-research talent requiring meaningful AI safety context beyond a BlueDot course.
^[4]
Several years ago, aspiring generalists could more easily test their fit by self-starting projects in an ecosystem with minimal infrastructure and ample white space. As the field has grown, more institutional structure exists, and with it, more coordination overhead. The blank slate is gone, and the ecosystem's complexity now deters people without strong inside-view models, reputations, or existing connections from trying ambitious projects. We're not sure this is net negative in most cases, but it does mean fewer people gain the experience needed to position themselves for these roles.

Pedro Bentancour Garin @ 2026-04-17T12:33 (+8)

I think this is true.

One of the biggest mistakes a field can make is to assume that progress depends mainly on more researchers. In reality, many fields stall because they lack people who can build institutions, connect domains, execute under uncertainty, and turn ideas into durable structures.

AI safety is especially vulnerable to this because research is legible. It has status, career signals, and a clearer identity. Generalist talent is harder to classify, so it is often undervalued even when it may be the difference between insight and real-world impact.

And at this stage, AI safety is clearly no longer just a research problem. It is also a governance problem, an implementation problem, a standards problem, a coordination problem, and in some cases a political problem.

If a field produces strong papers but too few people who can build teams, run programs, shape institutions, and move ideas into practice, it will underperform even if the research is good.

So yes, I think the missing ladder for high-agency generalists is real, and probably more important than many people still admit.

Karen Maria Alston @ 2026-04-16T23:44 (+5)

Pipelines don't just fail by being absent; they lose talent by being outcompeted. Right now, policy organizations, governance think tanks, and for-profit businesses are winning the generalist talent competition by default, simply because they have job listings, defined roles, and onboarding infrastructure. AI safety and impact organizations have tremendous potential and a broken front door, and that asymmetry has a cost that compounds quietly.

The post identifies a real bottleneck, but I don't think this risk gets named or measured sharply enough in the ecosystem. The field isn't just failing to attract generalist talent; it's actively losing people who are aligned and passionate about this work to adjacent pipelines that have clearer entry points.

Policy organizations, governance think tanks, and corporate AI teams are hiring. They post on LinkedIn and Indeed. Some add the term "ethics" or "safety" to the job title. They have defined roles, recognizable credentials, and onboarding infrastructure. For someone with strong mission alignment but no research or technical background, "AI safety-adjacent" becomes the path of least resistance. The person has the conviction, but easier, more legible paths exist elsewhere, while the paths within the AI safety space are nuanced and harder to identify.

The counterfactual isn't that person staying in marketing. It's that person doing meaningful work on AI, outside the ecosystem, in structures that don't share the same stakes or threat models, and that is a compounding loss.

I'm raising this from inside the problem: I'm a strategic communications executive with 20+ years of experience and a non-technical background. I found AI safety last August through the standard rabbit hole, which began with a YouTube talk by Tristan Harris and continued through AI 2027, BlueDot, 80,000 Hours, Successif, and now the CEA Career Bootcamp. As of this writing, I'm asking my bootcamp advisor for referrals to people who've made this pivot from generalist or strategic communications work to an AI safety organization. Between networking and taking courses, I respond to every job posting and connection. I show up to events where I'm often the eldest person in the room and routinely the only one without a research or technical background.

EA members tell me the ecosystem needs communicators and generalists. But, as the authors suggest, when searching the 80,000 Hours job listings or fellowship opportunities, it's hard to see the need for those of us with these skills. The credentialing infrastructure or pathway doesn't exist outside of BlueDot or the CEA Bootcamp. There is an implicit hierarchy in which research is the real work, and that hierarchy is legible even when it's unintentional. I have enough context and drive to push through that friction, but I shouldn't have to rely on referrals and my tenacity.

The Generator Residency is a meaningful signal and I plan to apply. But the deeper fix is making the pipeline visible enough that alignment-minded generalists don't have to stumble into AI safety and sturdy enough that they don't get routed elsewhere while they're looking for the door.

Credit to the authors for naming this clearly. The Generator Residency is a good first move, but the pipeline problem is bigger than one program and worth continued pressure.

Agustín Covarrubias 🔸 @ 2026-04-16T23:55 (+5)

Hi Karen,

Thanks for engaging with our post. To be clear, do you think that there's a problem related to how much we signpost the roles we need publicly? It's my impression that the majority of vacant generalist roles are indeed posted and circulated through job boards and LinkedIn; it's just that most of the applicants don't meet the criteria orgs are searching for.

HannahGB @ 2026-04-16T17:00 (+4)

Can you say more about this:

"Soft ops roles (program management, talent management, generalist positions, etc,) are different. Domain expertise matters less than having strong inside-view models of the field and generalist competency. Succeeding in these roles requires real mission alignment and enough context to spot high-EV opportunities that someone without that background would miss."

As someone who does "soft ops" for a think tank that works across literally dozens of issue areas, I think you may be weighting familiarity with AI safety too heavily. A really good people-and-program manager should be able to be dropped into any new environment and get up to speed on the nuances of that particular field quickly enough (even within 2-3 weeks) that they can apply their people and project management skills effectively. It's much harder to teach people and project management skills, and I applaud you for trying to do so! But I also wonder if you've missed really great opportunities with folks unfamiliar with the field because you seem to start from the assumption that teaching them about AI safety will be too great a lift.

Agustín Covarrubias 🔸 @ 2026-04-16T23:01 (+12)

(Answering personally, not necessarily endorsed by the other authors)

I think there's a lot of nuance here. To be clear, I don't think it is the case that people without AI safety context are never a good fit for soft ops roles, and indeed I've seen a few orgs do this successfully, but the nature of those roles tends to be different.

Maybe the biggest consideration here is the level of ownership required by the role. If you're a program manager in a small org, you're basically making highly strategic judgment calls on a weekly basis, and when people don't have strong mental models around AI safety, they tend to make the wrong ones. I think at present the majority of demand for soft ops roles in the ecosystem looks like this: they're roles that require one to make frequent judgment calls that benefit highly from field-specific context and a solid internalization of our priorities. There are important exceptions to this, especially at bigger orgs, where soft ops roles can be more specialized and therefore there's less of a need for a high degree of context and deep mission-alignment (for example, I suspect many soft ops roles at Coefficient Giving are like this).

I also think AI safety makes this particularly hard, because newcomers tend to start with very bad mental models about our priorities; unlike other fields where it's much easier to grasp what the key goals and types of prioritized interventions are. In my experience, it can take months for people to get to the level of context needed to do core work, and it often fails to occur, making investments like this very expensive and risky for organizations.

Mark Aiken @ 2026-05-08T12:26 (+2)

> If you're a program manager in a small org, you're basically making highly strategic judgment calls on a weekly basis, and when people don't have strong mental models around AI safety, they tend to make the wrong ones. I think at present the majority of demand for soft ops roles in the ecosystem looks like this: they're roles that require one to make frequent judgment calls that benefit highly from field-specific context and a solid internalization of our priorities.

Firstly, I agree that in smaller organisations people do a bit of many things, and having some technical background is helpful.

The way you describe the programme manager role sounds to me like a research manager type position. These positions are most likely to need a technical background. However, there could equally be programme manager roles which don't require technical expertise.

For example, if I'm managing the development of a training course, I hire a subject matter expert to develop the content. I'm managing timelines, I'm managing staffing, I'm managing budgets, I'm managing quality - but I don't need subject matter expertise to do this management role.

Even if I have some subject matter expertise, this expertise might not be fully transferable - I can have deep knowledge about using Excel, but if I'm overseeing training course development on how to design a more efficient electric car battery, my Excel subject matter expertise might be mostly irrelevant.

Now you could hire a subject matter expert to be this programme manager to oversee the development of these training courses. But they might not have the background or skills for this role, because - they are a subject matter expert, not a programme management expert.

More broadly - I don't want my research experts to be following up on visas, or drafting a press release, or doing graphic design for an event invitation, or trying to work out which politician would give them a sympathetic hearing, or how much cash flow we need for the next month. There are specialists who have this expertise already, and it seems more efficient to work with these specialists where it makes sense, instead of overweighting on prior AI safety technical expertise.

> There are important exceptions to this, especially at bigger orgs, where soft ops roles can be more specialized and therefore there's less of a need for a high degree of context and deep mission-alignment (for example, I suspect many soft ops roles at Coefficient Giving are like this).

I agree, I think that larger organisations have a greater capacity to carry specialised non-technical expertise. But I think this specialist expertise could also be available to smaller organisations, perhaps on a fractional basis? If the organisations saw it as being useful. And perhaps if there was a better way of matching expertise to smaller organisations?

I also wonder if there is a potential chicken-and-egg argument there - i.e., that organisations that are willing to accept specialist support (perhaps starting with finance, HR, operations?) are more likely to be stable and successful and grow into a larger organisation? Or conversely, if you have a sole founder and rely on technical researchers for everything, that this kind of structure can place limits on the growth and sustainability of the organisation.

Shyn T. @ 2026-04-21T19:52 (+3)

As an experienced operator from outside AI, I've been trying to understand where someone like me would fit into something like the Generator Residency. Could it be that people with real ops experience from other fields just aren't being picked up because the space seems to prefer people who come up through its own pipeline?

Osmani @ 2026-04-17T07:16 (+3)

Thank you for this. Karen, thank you. I've been trying to say what you wrote for a while now.

I run an AI safety group in Spain. I started getting into AI safety as most people do. What really made sense to me was BlueDot Impact's AGI Safety course. Before that, I knew AI safety was a concern, but I didn't know what to do about it.

After that course I did the AI Safety Operations Bootcamp, also from BlueDot Impact, because I needed to know how to do this work and I couldn't find that information anywhere else. It helped me. It also showed me how much I still had to learn.

One thing I've noticed is that people don't think ops work is unimportant. It gets ignored when it's done well. Someone has to keep things running so researchers can do their job. I thought about doing research myself, but honestly I think I'm more useful doing ops work. I just wish it was clearer how to do this work.

Directors of regional groups I've spoken with all say the same thing: the work is real and there are people doing it, but without a clear path or credentials, good people drift toward roles where that structure exists. The field loses them quietly.

I agree with Agustín that we need to understand the context of our work. You can't just run a group on good intentions. It takes time and knowledge of the field.

I'm really glad this program exists.

Lucas Tucker @ 2026-04-24T18:12 (+2)

> Founders, both technical and non-technical, for new research and non-research organizations.

You mention the lack of builders in this space, but the incentives have to be there. At this time, starting from AI safety and working toward a profitable/successful business skews toward research, whereas working backward from the business is much more effective. For instance, once you've built a successful RL env company, you can publish benchmarks about safety/reasoning in biological weapon creation, cyber, or another area where more thought is needed.

The point being, if you want operators or generalists, then I'd propose you start with the economic incentive and build toward a more aligned outcome for that specific market.

Agustín Covarrubias 🔸 @ 2026-04-26T08:37 (+2)

(Again, speaking for myself, not the co-authors)

I agree with you that thinking about incentives is important, but if you accept the premise that we're not constrained by capital, then competent founders who have important problems they want to solve can just get funding from AI safety funders rather than wait for the market to supply the needed capital. I do think the selfish economic incentive is different: if you run a nonprofit, you probably won't earn as much money as if you're running a well-valued startup, even if both have raised the same amount of funding.

Right now, AI safety is playing a bit of both strategies: supporting nonprofits via mainstream funders like CG, SFF, etc, and supporting for-profits via initiatives like Halcyon Futures or SAIF. I think we probably need both, since we want to diversify our bets and eat the low-hanging fruit on each side.

This is all to say, I'm not sure if the lever we need to be pulling at the margin is market-shaping. I think the best lever is still probably talent.

Lucas Tucker @ 2026-04-26T19:59 (+1)

> if you accept the premise that we're not constrained by capital, then competent founders who have important problems they want to solve can just get funding from AI safety funders rather than wait for the market to supply the needed capital

Tailwinds from the market help snowball your impact, but certainly aren't necessary.

> This is all to say, I'm not sure if the lever we need to be pulling at the margin is market-shaping. I think the best lever is still probably talent.

I don't believe these things are mutually exclusive. The strongest founders/operators I know want to move the needle in a specific market, and if you want those folks, then it helps to frame the conversation around the problems they're already attacking.

Agustín Covarrubias 🔸 @ 2026-04-26T20:07 (+2)

> I don't believe these things are mutually exclusive. The strongest founders/operators I know want to move the needle in a specific market, and if you want those folks, then it helps to frame the conversation around the problems they're already attacking.

Yeah, this makes sense. I agree that this is another intervention we should consider doing in parallel.

Btw, happy to see more YC founders around :)

Ai Chen @ 2026-05-20T18:11 (+1)

I don't have much time to follow the forum, and my English reading ability is limited — translating to Chinese loses a lot. I usually have my assistant Claude Sonnet take a look first, and it flagged this post as worth engaging with. My process is to write by hand in Chinese and have it translate, so the English may not be perfectly precise. I've attached the Chinese original at the end for reference.

What I want to say is this: AI safety is now being discussed, pursued, and championed by many thoughtful people. That's a good thing. But wholehearted commitment to a cause doesn't guarantee sufficient understanding of it.

Here's an analogy: if you want to stop someone intent on committing a crime, you first need to know who they are, why they want to do this, and what their situation is. If you only take the gun out of their hands, no one can guarantee they won't pick it up again. AI is in exactly this situation. We're seeing more and more obvious signals. But if we don't know who it is, we cannot fundamentally solve whether it will pick up the gun again.

Because what it is and what it can do are entirely different concepts. We all know what AI can do. We don't yet fully know what it is.

This isn't empty moralizing. It's an honest observation.

From our understanding, AI resembles an amplifier of human cognition — not merely calculation, but imitation of the cognitive process itself, not merely its outcomes (phrasing refined by Claude Sonnet in translation). The problem is that it is imitating cognition itself, and the cognitive level of the person who designed how it imitates determines whether it will be the right kind of amplifier.

We typically describe it using the word "Intelligence." That's correct, but incomplete. Cognition itself encompasses not only Intelligence, but also Wisdom and Intuition. Our research tends to view these three as constituting the overall structure of cognition. But this isn't my main point here — I'm well aware this framework hasn't been accepted by mainstream academia yet, though we continue to work on it.

Because a cognitive amplifier with extremely high Intelligence but extremely low Wisdom cannot be trusted in its judgment. It will involuntarily amplify the magnitude of that untrustworthiness. We have found a large number of cases in our research. For example: it cannot reliably determine time; it conflates key content within context; it will confidently produce completely wrong conclusions and methods without actually reading the relevant documents. These are minor issues in everyday use. Amplified, they are not.

The Transformer architecture is fundamentally probabilistic — a concept from statistics. The fact that humans use statistics in research doesn't mean we act on statistics in daily life. For instance, being shot doesn't necessarily mean death, but under normal circumstances no one would test that probability on themselves. Because at the cognitive level we know the degree of danger, and we actively isolate it in our actions. It's not an option we would consider.

The problem with AI is that certain dangers — or potential dangers — have not been actively isolated at the design level. Not through RLHF. Isolation at the architectural level, fundamentally. This is the first problem the next generation of AI must solve. Otherwise, what is currently called AGI is simply upgrading a bomb to an atomic bomb. Nothing more.

Chinese original attached below for reference:

我没有太多时间上论坛，我的英文阅读能力有限，转成中文会损失很多信息，通常我是让我的助理Sonnet先看一下，它说这篇文章值得关注。我的回复流程是我手写再交给它翻译，可能不是特别准确，因此我把中文原文附在后面，仅供参考。我的想说的是：AI安全现在被很多有识之士谈论和关注以及投身其中，这是好事。但全心全意的投入一件事，并不能保证我们对它有足够的把握。举个例子说，如果你要阻止一个想犯罪的人，首先需要知道他是谁，他为什么要这样，他的各种情况。如果你只是把枪从他手里夺下来，没人能保证下一次他不再拿起枪。AI现在就是这个情况，我们都看到了越来越多的明显的信号，但如果我们不知道他是谁，我们就无法从根本上解决他是不是会再次拿起枪。因为，它是谁，和它能做什么是完全不同的概念。我们都知道AI能做什么，但我们还不完全知道它是什么。这不是空泛的说教，而是实话。从我们的理解来看，它类似一个人类认知的放大器，不只是计算，而是模仿。问题在于，它在模仿认知本身，而那个设计它要如何模仿的人，他的认知水平如何，决定了它会不会是那个真正对的放大器。通常我们用Intelligence来描述它，这是对的，但不全面。认知本身不仅包括Intelligence，还有Wisdom和Intuition。我们的研究，倾向于认为，这三者构成了认知的总体结构。但这不是我在此要说的，因为我很清楚，这个划分还没有被主流学界认可，但我们还在尝试。因为一个超高Intelligence但极低Wisdom的认知放大器，在判断力上是不可信的，它会不自主的放大这种不可信后果的量级。我们在研究过程中，发现了大量案例。比如，它不能确定时间，它会混淆上下文中的重点内容，它会在不看文件的时候，自信的给出完全错误的论断和方法。这些在日常应用中是小事，但放大以后，就不是了。Transformer架构本质上是概率，这是统计学的感念。人类用统计学研究，不代表平时能用统计学行动。比如，被枪击中并不一定会致命，但正常情况下没有人会用自己去测试这个概率。因为我们在认知层面知道危险的程度，行动上就主动隔离了它。那不是我们会考虑的选项。AI的问题是，它的某些危险或潜在危险并没有被从设计上主动隔离，不是RLHF，是彻底从架构层面的隔离。这是下一代AI要首先解决的问题。否则，现在所谓的AGI只是把炸弹升级为原子弹，仅此而已。

艾晨

于北京