The Game Board has been Flipped: Now is a good time to rethink what you’re doing

By LintzA @ 2025-01-28T21:20 (+288)

Introduction

Several developments over the past few months should cause you to re-evaluate what you are doing. These include:

  1. Updates toward short timelines
  2. The Trump presidency
  3. The o1 (inference-time compute scaling) paradigm
  4. Deepseek
  5. Stargate/AI datacenter spending
  6. Increased internal deployment
  7. Absence of AI x-risk/safety considerations in mainstream AI discourse

Taken together, these are enough to render many existing AI governance strategies obsolete (and probably some technical safety strategies too). There's a good chance we're entering crunch time and that should absolutely affect your theory of change and what you plan to work on.

In this piece I try to give a quick summary of these developments and think through the broader implications they have for AI safety. At the end of the piece I give some quick initial thoughts on how these developments affect what safety-concerned folks should be prioritizing. These are early days and I expect many of my takes will shift; I look forward to discussing in the comments! 

Implications of recent developments

Updates toward short timelines

There’s general agreement that timelines are likely to be far shorter than most expected. Both Sam Altman and Dario Amodei have recently said they expect AGI within the next 3 years. Anecdotally, nearly everyone I know or have heard of who was expecting longer timelines has updated significantly toward short timelines (<5 years). E.g. Ajeya’s median estimate is 99% automation of fully-remote jobs in roughly 6-8 years, 5+ years earlier than her 2023 estimate. On a quick look, prediction markets seem to have shifted to short timelines (e.g. Metaculus[1] & Manifold appear to have roughly 2030 median timelines to AGI, though they haven’t moved dramatically in recent months).

We’ve consistently seen performance on benchmarks far exceed what most predicted. Most recently, Epoch was surprised to see OpenAI’s o3 model achieve 25% on its FrontierMath benchmark (though there’s some controversy). o3 also showed surprisingly strong performance in coding. In many real-world domains we’re already seeing AI match top experts, and it seems poised to exceed them soon. 

With AGI looking so close, it's worth remembering that capabilities are unlikely to stall around human level. We may see far more capable systems very soon (perhaps months, perhaps years) after we get systems that match or exceed humans in most important domains. 

While nothing is certain, and there’s certainly potential for groupthink, I believe these bits of evidence should update us toward timelines being shorter.

Tentative implications:

The Trump Presidency

My sense is that many in the AI governance community were preparing for a business-as-usual case and either implicitly expected another Democratic administration or else built plans around it because it seemed more likely to deliver regulations around AI. It’s likely not enough to just tweak these strategies for the new administration - building policy for the Trump administration is a different ball game.

We still don't know whether the Trump administration will take AI risk seriously. During the first days of the administration, we've seen signs on both sides, with Trump pushing Stargate but also announcing that the US may levy up to 100% tariffs on Taiwanese semiconductors. So far Elon Musk has apparently done little to push for action to mitigate AI x-risk (though it’s still possible and could be worth pursuing), and we have few, if any, allies close to the administration. That said, it’s still early, and there's nothing partisan about preventing existential risk from AI (as opposed to, e.g., AI ethics), so I think there’s a reasonable chance we could convince Trump or other influential figures that these risks are worth taking seriously (e.g. Trump made promising comments about ASI recently and seemed concerned in his Logan Paul interview last year).

Tentative implications:

Important caveat: Democrats could still matter a lot if timelines aren’t extremely short or if we have years between AGI & ASI.[4] Dems are reasonably likely to take back control of the House in 2026 (70% odds), somewhat likely to win the presidency in 2028 (50% odds), and there's a possibility of a Democratic Senate (20% odds). That means the AI risk movement should still be careful about increasing polarization or alienating the Left. This is a tricky balance to strike and I’m not sure how to do it. Luckily, the community is not a monolith and, to some extent, some can pursue the long-game while others pursue near-term change.

The o1 paradigm

Alongside scaling up training runs, it appears that inference compute will be key to attaining human-level AI and beyond. Compared to the previous paradigm, compute can now be converted into capabilities much more quickly, simply by running the models for longer.

Tentative implications:

Deepseek

Deepseek is highly compute-efficient and they’ve managed to replicate the o1 paradigm at far lower cost (though not as low as it initially seemed). It seems possible that merely scaling up their current approach could yield enormous further returns (though this is unclear).

Deepseek’s methods are, for the most part, open source. That means anyone with a solid base model can now build an impressive reasoner on top of it at barely any additional cost.

Tentative implications:

Stargate/AI data center spending

OpenAI and partners intend to invest $100 billion in 2025 and $500 billion over the coming 4 years.[6] Microsoft intends to spend $80 billion on building data centers this year, and other companies seem similarly keen to pour money into compute.

The US government has gotten increasingly involved in AI and Sam Altman had a prominent place at Trump’s inauguration. So far, actual government involvement has mostly been in the form of helping companies get through the permitting process quickly. (more detail here)

Tentative implications:

Increased internal deployment

This is more speculative, but I expect we’ll see less and less of what labs are producing and may have less access to the best models. I expect this due to a number of factors including:

Tentative implications:

Absence of AI x-risk/safety considerations in mainstream AI discourse

For a while after ChatGPT, it looked like AI risk would be a permanent part of the discourse going forward, largely thanks to efforts like the CAIS AI Extinction Letter getting high-profile signatories and news coverage. For the past year, though, AI x-risk concerns have not had much airtime in the major media cycles around AI. There haven't been big safety-oriented stories in mainstream outlets about recent AI events with strong implications for AGI timelines and existential risk (e.g. Deepseek, Stargate). A notable example of the AI safety community's limited ability to shape media coverage was its decisive loss of the media game during the OpenAI board drama.

That said, we do have more people writing directly about AI safety and governance issues across a variety of Substacks and on Twitter/X now. We also have plenty of prominent people who could get into the news if we made a concerted effort (e.g. Yoshua Bengio, Geoff Hinton).

Tentative implications:

Implications for strategic priorities

Broader implications for US-China competition

Recent developments call into question any strategy built on the idea that the US will have a significant lead over China which it could use to, e.g., gain a decisive advantage or slow down and figure out safety. This is because:

Overall, the idea that the US could unilaterally win an AI race and impose constraints on other actors appears less likely now. I suspect this means an international agreement is far more important than we’d thought, though I'm not sure whether I think recent developments make that more or less likely. 

Note: The below takes are far more speculative and I have yet to dive into them in any depth. It still seems useful to give some rough thoughts on what I think looks better and worse given recent developments, but in the interest of getting this out quickly I’ll defer going into more detail until a later post.

What seems less likely to work?

What should people concerned about AI safety do now?

Acknowledgements

Many people commented on an earlier version of this post and were incredibly helpful for refining my views! Thanks especially to Trevor Levin, John Croxton, as well as several others who would rather not be named. Thanks also to everyone who came to a workshop I hosted on this topic!

 

  1. ^

That market predicted roughly 2040 timelines until early 2023, then dropped significantly to an average of around 2033, and is now down to 2030.

  2. ^

     I have an old write-up on this reasoning here which also talks about how to think about tradeoffs between short and long timelines.

  3. ^

     That said, given things could shift dramatically in 2028 (and 2026 to some extent) it could be worth having part of the community focus on the left still.

  4. ^

     E.g. Perhaps we get human-level research AIs in 2027 but don’t see anything truly transformative until 2029.

  5. ^

     See OpenAI’s Pro pricing plan of $200 per month. To the extent frontier models like o3 can be leveraged for alignment or governance work, it’s possible funders should subsidize their use. Another interesting implication is that, to the extent companies and individuals can pay more money to get smarter models/better answers, we could see increased stratification of capabilities which could increase rich-get-richer dynamics.

  6. ^

     Note that ‘intend’ is important here! They do not have the money lined up yet.

  7. ^

     That said, I don’t think we’ll ever know this to the level of confidence we’d need to convince decision-makers that slowing down wouldn’t hand the race to China.


Buck @ 2025-01-28T23:55 (+35)

Tentative implications:

  • People outside of labs are less likely to have access to the very best models and will have less awareness of where the state of the art is.
  • Warning shots are somewhat less likely as highly-advanced models may never be deployed externally.
  • We should expect to know less about where we’re at in terms of AI progress.
  • Working at labs is perhaps more important than ever to improve safety and researchers outside of labs may have little ability to contribute meaningfully.
  • Whistleblowing and reporting requirements could become more important, as without them the government would have little ability to regulate frontier AI.
  • Any regulation based solely on deployment (which has been quite common) should be adjusted to take into account that the most dangerous models may be used internally long before they're deployed. 

For what it's worth, I think that the last year was an update against many of these claims. Open source models currently seem to be closer to state of the art than they did a year ago or two years ago. Currently, researchers at labs seem mostly in worse positions to do research than researchers outside labs.

I very much agree that regulations should cover internal deployment, though, and I've been discussing risks from internal deployment for years.

quinn @ 2025-01-29T03:07 (+16)

nitpick: you say open source, which implies I can read it and rebuild it on my machine. I can't really "read" the weights in this way; I can run it on my machine, but I can't compile it without a berjillion chips. "Open weight" is the preferred nomenclature; it fits the situation better. 

(epistemic status: a pedantry battle, but this ship has sailed as I can see other commenters are saying open source rather than open weight). 

Davidmanheim @ 2025-01-29T23:16 (+3)

Thank you for fighting the good fight!

zdgroff @ 2025-01-29T13:54 (+15)

What seems less likely to work?

  • Work with the EU and the UK
    • Trump is far less likely to take regulatory inspiration from European countries and generally less likely to regulate. On the other hand, perhaps under a 2028 Dem administration we would see significant attention on EU/UK regulations.
    • The EU/UK are already scaling back the ambitions of their AI regulations out of fear that Trump would retaliate if they put limits on US companies.

 

Interesting—I've had the opposite take for the EU. The low likelihood of regulation in the US seems like it would make EU regulation more important since that might be all there is. (The second point still stands, but it's still unclear how much that retaliation will happen and what impact it will have.)

It depends on aspects of the Brussels effect, and I guess it could be that a complete absence of US regulation means companies just pull out of the EU in response to regulation there. Maybe recent technical developments make that more likely. On net, I'm still inclined to think these updates increase the importance of EU stuff.

For the UK, I think I'd agree—UK work seems to get a lot of its leverage from the relationship with the US.

Manuel Allgaier @ 2025-01-30T09:06 (+4)

Anthropic released Claude everywhere but the EU first, and their EU release happened only months later, so to some extent labs are already deprioritizing the EU market. I guess this trend would continue? Not sure. 

PabloAMC 🔸 @ 2025-01-30T12:02 (+2)

I think this had to do more with GDPR than the AI Act, so the late release in the EU might be a one-off case. Once you figure out how to comply with data collection, it should be straightforward to extend to new models, if they want to.

Manuel Allgaier @ 2025-01-30T14:42 (+4)

I did not say that this was due to the EU AI Act, agree that GDPR seems more likely. I mentioned it as an example of EU regulation leading to an AI Lab delaying their EU launch / deprioritizing the EU. 

Holly Elmore ⏸️ 🔸 @ 2025-01-29T07:11 (+15)

Great piece — great prompt to rethink things and good digests of implications.

If you agree that mass movement building is a priority, check out PauseAI-US.org (I am executive director), or donate here: https://www.zeffy.com/donation-form/donate-to-help-pause-ai

One implication I strongly disagree with is that people should be getting jobs in AI labs. I don’t see you connecting that to actual safety impact, and I sincerely doubt working as a researcher gives you any influence on safety at this point (if it ever did). There is a definite cost to working at a lab, which is capture and NDA-walling. Already so many EAs work at Anthropic that it is shielded from scrutiny within EA, and the attachment to “our player” Anthropic has made it hard for many EAs to do the obvious thing by supporting PauseAI. Put simply: I see no meaningful path to impact on safety working as an AI lab researcher, and I see serious risks to individual and community effectiveness and mission focus.

Ebenezer Dukakis @ 2025-01-30T08:58 (+16)

I see no meaningful path to impact on safety working as an AI lab researcher

This is a very strong statement. I'm not following technical alignment research that closely, but my general sense is that exciting work is being done. I just wrote this comment advertising a line of research which strikes me as particularly promising.

I noticed the other day that the people who are particularly grim about AI alignment also don't seem to be engaging much with contemporary technical alignment research. That missing intersection seems suspicious. I'm interested in any counterexamples that come to mind.

My subjective sense is there's a good chance we lose because all the necessary insights to build aligned AI were lying around, they just didn't get sufficiently developed or implemented. This seems especially true for techniques like gradient routing which would need to be baked into a big, expensive training run.

(I'm also interested in arguments for why unlearning won't work. I've thought about this a fair amount, and it seems to me that sufficiently good unlearning kind of just oneshots AI safety, as elaborated in the comment I linked.)

Holly Elmore ⏸️ 🔸 @ 2025-01-31T02:36 (+4)

Here’s our crux:

My subjective sense is there's a good chance we lose because all the necessary insights to build aligned AI were lying around, they just didn't get sufficiently developed or implemented.

For both theoretical and empirical reasons, I would assign a probability as low as 5% to there being alignment insights just lying around that could protect us at the superintelligence capabilities level and don’t require us to slow or stop development to implement in time. 

I don’t see a lot of technical safety people engaging in advocacy, either? It’s not like they tried advocacy first and then decided on technical safety. Maybe you should question their epistemology.

 

Holly Elmore ⏸️ 🔸 @ 2025-01-29T07:12 (+12)

What you seem to be hinting at, essentially espionage, may honestly be the best reason to work in a lab. But of course those people need to be willing to break NDAs and there are better ways to get that info than getting a technical safety job.

(Edited to add context for bringing up "espionage" and implications elaborated.)

OllieBase @ 2025-01-29T16:26 (+7)

Already so many EAs work at Anthropic that it is shielded from scrutiny within EA

 

What makes you think this? Zach's post is a clear counterexample here (though comments are friendlier to Anthropic) and I've heard of criticism of the RSPs (though I'm not watching closely). 

Maybe you think there should be much more criticism?

Holly Elmore ⏸️ 🔸 @ 2025-01-29T22:22 (+11)

There should be protests against them (PauseAI US will be protesting them in SF 2/28) and we should all consider them evil for building superintelligence when it is not safe!  Dario is now openly calling for recursive self-improvement. They are the villains-- this is not hard. The fact that you would think Zach's post with "maybe" in the title is scrutiny is evidence of the problem. 

Antoine de Scorraille @ 2025-01-29T18:05 (+6)

The post is an excellent summary of the current situation, yet it ignores the elephant in the room: we need to push for the Pause! The upcoming summit in France is the ideal place to do so. That's why I'm calling on you, the AI-worried reader, to join our Paris demonstration on the 10th: https://lu.ma/vo3354ab
Each additional participant would be a tremendous asset, as we are only a handful of people.
Assemble!

Joe Rogero @ 2025-01-29T18:35 (+5)

Honestly this writeup did update me somewhat in favor of at least a few competent safety-conscious people working at major labs, if only so the safety movement has some access to what's going on inside the labs if/when secrecy grows. The marginal extra researcher going to Anthropic, though? Probably not. 

Holly Elmore ⏸️ 🔸 @ 2025-01-29T22:25 (+11)

Connect the rest of the dots for me-- how does that researcher's access become community knowledge? How does the community do anything productive with this knowledge? How do you think people working at the labs detracts from other strategies?

Buck @ 2025-01-30T00:53 (+4)

I agree with you that people seem to somewhat overrate getting jobs in AI companies.

However, I do think there's good work to do inside AI companies. Currently, a lot of the quality-adjusted safety research happens inside AI companies. And see here for my rough argument that it's valuable to have safety-minded people inside AI companies at the point where they develop catastrophically dangerous AI.

Holly Elmore ⏸️ 🔸 @ 2025-01-30T07:46 (+8)

What you write there makes sense but it's not free to have people in those positions, as I said. I did a lot of thinking about this when I was working on wild animal welfare. It seems superficially like you could get the right kind of WAW-sympathetic person into agencies like FWS and the EPA and they would be there to, say, nudge the agency in a way no one else cared about to help animals when the time came. I did some interviews and looked into some historical cases and I concluded this is not a good idea. 

  1. The risk of being captured by the values and motivations of the org where they spend most of their daily lives before they have the chance to provide that marginal difference is high. Then that person is lost to the Safety cause or converted into a further problem. I predict that you'll get one successful Safety sleeper agent in, generously, 10 researchers who go to work at a lab. In that case your strategy is just feeding the labs talent and poisoning the ability of their circles to oppose them.
  2. Even if it's harmless, planting an ideological sleeper agent in firms is generally not the best counterfactual use of the person because their influence in a large org is low. Even relatively high-ranking people frequently have almost no discretion about what happens in the end. AI labs probably have more flexibility than US agencies, but I doubt the principle is that different. 

Therefore I think trying to influence the values and safety of labs by working there is a bad idea that is unlikely to be pulled off. 

 

OscarD🔸 @ 2025-01-30T13:50 (+4)

My sense is that of the many EAs who have taken EtG jobs quite a few have remained fairly value-aligned? I don't have any data on this and am just going on vibes, but I would guess significantly more than 10%. Which is some reason to think the same would be the case for AI companies. Though plausibly the finance company's values are only orthogonal to EA, while the AI company's values (or at least plans) might be more directly opposed.

Ebenezer Dukakis @ 2025-01-30T10:30 (+4)

In that case your strategy is just feeding the labs talent and poisoning the ability of their circles to oppose them.

It seems like your model only has such influence going one way. The lab worker will influence their friends, but not the other way around. I think two-way influence is a more accurate model.

Another option is to ask your friends to monitor you so you don't get ideologically captured, and hold an intervention if it seems appropriate.

Holly Elmore ⏸️ 🔸 @ 2025-01-31T02:50 (+4)

I think you, and this community, have no idea how difficult it is to resist value/mission drift in these situations. This is not a friend-to-friend exchange; it’s a small community of nonprofits and individuals up against the most valuable companies in the world. They aren’t just gonna pick up the values of a few researchers by osmosis.


From your other comment it seems like you have already been affected by the lab’s influence via the technical research community. The emphasis on technical solutions only benefits them, and it just so happens that to work on the big models you have to work with them. This is not an open exchange where they have been just as influenced by us. Sam and Dario sure want you and the US government to think they are the right safety approach, though.

Aaron Bergman @ 2025-01-29T21:46 (+2)

I’ll just highlight that it seems particularly crucial whether to view such NDAs as covenants or as contracts that are not intrinsically immoral to break.

It’s not obvious to me that it should be the former, especially when the NDA comes with what is basically a monetary incentive for not breaking it.

Holly Elmore ⏸️ 🔸 @ 2025-01-29T22:16 (+2)

If the supposed justification for taking these jobs is so that they can be close to what's going on, and then they never tell and (I predict) get no influence on what the company does, how could this possibly be the right altruistic move?

Joe Rogero @ 2025-01-29T18:32 (+14)

E.g. Ajeya’s median estimate is 99% automation of fully-remote jobs in roughly 6-8 years, 5+ years earlier than her 2023 estimate.

This seems more extreme than the linked comment suggests? I can't find anything in the comment justifying "99% automation of fully-remote jobs". 

Frankly I think we get ASI and everyone dies before we get anything like 99% automation of current remote jobs, due to bureaucratic inertia and slow adoption. Automation of AI research comes first on the jagged frontier. I don't think Ajeya disagrees? 

Josh Thorsteinson @ 2025-01-30T04:16 (+6)

They were referring to this quote, from the linked post: "Median Estimate for when 99% of currently fully remote jobs will be automatable."

Automatable doesn't necessarily imply that the jobs are actually automated. 

OscarD🔸 @ 2025-01-30T13:43 (+4)

The comment that Ajeya is replying to is this one from Ryan, who says his timelines are roughly the geometric mean of Ajeya's and Daniel's original views in the post. That is sqrt(4*13) = 7.2 years from the time of the post, so roughly 6 years from now.

As Josh says, the timelines in the original post were answering the question "Median Estimate for when 99% of currently fully remote jobs will be automatable".

So I think it was a fair summary of Ajeya's comment.

Jobst Heitzig (vodle.it) @ 2025-01-28T21:53 (+9)

I'm confused by your seeming "we vs China" viewpoint - who is this "we" that you are talking about?

yams @ 2025-01-28T22:14 (+3)

I wasn't really reading anything in this post as favoring the US over China in an "us vs them" way, so much as I was reading a general anxiety about multipolar scenarios. Also, very few of us are based in China relative to the US or EU. 'The country with the community of people who have been thinking about this the longest' is probably the country you want the thing to happen in, if it has to happen (somewhere).

fwiw I'm usually very anxious about this us vs them narrative, and didn't really feel this post was playing into it very strongly, other than in the above ways, which are (to me) reasonable.

Jobst Heitzig (vodle.it) @ 2025-01-28T22:55 (+3)

The author is using "we" in several places and maybe not consistently. Sometimes "we" seems to be them and the readers, or them and the EA community, and sometimes it seems to be "the US". Now you are also using an "us" without it being clear (at least to me) who that refers to.

Who do you mean by 'The country with the community of people who have been thinking about this the longest' and what is your positive evidence for the claim that other communities (e.g., certain national intelligence communities) haven't thought about that for at least as long?

yams @ 2025-01-29T00:54 (+4)

I took your comment to have two parts it was critiquing:

  1. The floating use of the word 'we' (which I agree is ambiguous, but basically don't care about).
  2. The 'us vs China' frame as a whole (which has some surface area with the floating antecedent of the word 'we', but is a distinct and less pedantic point).

My response was addressing point 2, not point 1, and this was intentional. I will continue not to engage on point 1, because I don't think it matters. If you're devoted to thinking Lintz is a race-stoking war hawk due to antecedent ambiguity in a quickly-drafted post, and my bid to dissuade you of this was ineffective, that's basically fine with me.

"Who do you mean by 'The country with the community of people who have been thinking about this the longest'"

The US — where ~all the labs are based, where ~all the AI safety research has been written, where Chinese ML engineers have told me directly that, when it comes to AI and ML, they think exclusively in English. Where a plurality of users of this forum live, where the large models that enabled the development of DeepSeek were designed to begin with, where the world's most powerful military resides, where....

Yes, a US-centric view is an important thing to inspect in all contexts, but I don't think it's unreasonable to think that the US is the most important actor in the current setting (could change but probably not), and to (for a US-based person who works on policy... in the US, as Lintz does, speaking to a largely US-based audience) use 'we' to refer to the US.

"What is your positive evidence for the claim that other communities (e.g., certain national intelligence communities) haven't thought about that for at least as long?"

I want to point out that this is unfair, since meeting this burden of proof would require comprehensive privileged knowledge of the activities over the past half century of every intelligence agency on the planet. My guess is you know that I don't have that! 

Things I do know:

  1. People who've been in touch with US intelligence agencies since ~2017 have reported that getting them to take AI seriously has been an uphill battle (not only from an x-risk perspective, but as 'a thing that will happen at all').
  2. US intelligence agencies are currently consulting with senior folks in our sphere, at least occasionally, which indicates that they may lack confident inside-view takes on the situation.
  3. The US is currently dropping the ball relative to its stated goals (i.e. Taiwan tariffs), which indicates that the more informed parts of government either don't have much power or aren't all that informed actually! (probably mostly the former, but I'm sure the latter has weight, too)

Finally: "Positive evidence.... have not" is a construction I would urge you to reconsider — positive evidence for a negative assertion is a notoriously difficult thing to get.

Patrick Hoang @ 2025-01-29T21:42 (+8)

What should EAs not in a position to act under short AI timelines do? You can read my response here but not all of us are working in AI labs nor expect to break in anytime soon.

You also suggested that, on a short-timelines model, we should heavily discount plans that take 5+ years:

Plans relying on work that takes several years should be heavily discounted - e.g. plans involving ASL5-level security of models if that’s 5+ years away

But I wouldn't apply such a huge discount rate if one still assigns some probability to longer AGI timelines. For example, if you believe AGI only has a 25% chance of occurring by 2040, you should discount 15+ year plans only by 25%. The real reason to discount certain long-term plans is that they are not tractable (e.g. I think executing a five-year career plan is tractable, but ASL5-level security is probably not, due to the government's slow speed).
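
To spell out that expected-value arithmetic (my own formalization of the 25% example above, assuming a 15-year plan is simply worth nothing in worlds where AGI arrives before it pays off):

    E[\text{plan value}] = P(\text{AGI by 2040}) \cdot 0 + P(\text{no AGI by 2040}) \cdot V
                         = 0.25 \cdot 0 + 0.75 \cdot V
                         = 0.75\,V

That is a 25% haircut rather than a near-total discount; on this view, tractability does the remaining work of ruling out plans like ASL5-level security.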

IWumbo @ 2025-01-29T21:23 (+7)
  • Some bottlenecks on improving communications:
    • What are we pushing for? E.g. What policies are good to aim for/realistic? What should we push labs to do?
    • What is a general idea we think the public could get behind and which sufficiently tracks AI risk prevention?

 

Big questions! It seems like the first is a question for everyone working on this, not just communications people. I'd be interested to know what comms can offer ahead of these questions being settled. What kind of priming/key understandings are most important to get across to people to make later persuasion easier?

Who is doing this kind of comms work right now and running into these bottlenecks? Through what orgs? Are they the same organizations doing the policy development work?

JuliaHP @ 2025-01-29T15:56 (+6)

Most of the "recent developments" which are supposed to point to an update in the target audience were quite predictable in advance (pretty much all of these things were predicted / assigned substantial probability by many people years in advance). To the extent that these cause an update in the reader, it is worth asking yourself; how could I have thought that faster. And more importantly, once you figure out what models would have let you predict the present, what further do these models predict into the future?

Manuel Allgaier @ 2025-01-30T14:49 (+5)

This is the best summary of recent developments I've seen so far, thanks a lot for writing this up! 

I've shared it with people in my AI Safety / AI Gov network, and we might discuss it in our AI Gov meetup tonight. 

Esben Kran @ 2025-01-29T07:56 (+5)

Great post, thank you for laying out the realities of the situation.

In my view, there are currently three main strategies pursued to solve X-risk:

  1. Slow / pause AI: Regulation, international coordination, and grassroots movements. Examples include UK AISI, EU AI Act, SB1047, METR, demonstrations, and PauseAI.
  2. Superintelligence security: From infrastructure hardening, RSPs, security at labs, and new internet protocols to defense of financial markets, defense against slaughterbots, and civilizational hedging strategies. Examples include UK ARIA, AI control, and some labs.
  3. Hope in AGI: Developing the aligned AGI and hoping it will solve all our problems. Examples include Anthropic and arguably most other AGI labs.

No. (3) seems weirdly overrated in AI safety circles. (1) seems incredibly important now and something radically under-emphasized. And in my eyes, (2) seems like the direction most new technical work should go. I will refer to Anthropic's safety researchers on whether the labs have a plan outside of (3).

Echoing @Buck's point that you now have less need to be inside a lab for model access reasons. And if it's to guide the organization, that has historically been somewhat futile in the face of capitalist incentives.

PabloAMC 🔸 @ 2025-01-29T08:16 (+4)

I don’t think the goal of regulation or evaluations is to slow down AGI development. Rather, the goal of regulation is to standardise minimal safety measures (some AI control, some security, etc. across labs) and create some incentives for safer AI. With evaluations, you can certainly use them in lobbying for a pause, but I think the main goal is to feed into regulation or control measures.

Esben Kran @ 2025-01-29T17:50 (+1)

The main effect of regulation is to control certain net-negative outcomes and hence to slow down net-negative AGIs. RSPs that require stopping development at ASL-4 or similar thresholds also fall under the pausing agenda. It might be a question of semantics, given how Pause AI and the Pause AI Letter have become the memetic sink for the term "pause AI"?

PabloAMC 🔸 @ 2025-01-29T23:12 (+2)

My point is that slowing AI down is often an unwanted side effect, from the regulator perspective. Thus, the main goal is raising the bar for safety practices across developers.

Ebenezer Dukakis @ 2025-01-30T08:08 (+3)

it's possible Trump is better placed to negotiate a binding treaty with China (similar to the idea that 'only Nixon could go to China'), even if it's not clear he'll want to do so.

Some related points:

I suppose one possible approach would be to try to get some sort of back-channel dialogue going, to start drafting a treaty which can be invoked if political momentum appears.

yams @ 2025-01-28T22:41 (+3)

I'd like to say I'm grateful to have read this post, it helped explicate some of my own intuitions, and has drawn my attention to a few major cruxes. I'm asking a bunch of questions because knowing what to do is hard and we should all figure it out together, not because I'm especially confident in the directions the questions point.

Should we take the contents of the section "What should people concerned about AI safety do now?" to be Alex Lintz's ordinal ranking of worthwhile ways to spend time? If so, would you like to argue for this ranking? If not, would you like to provide an ordinal ranking and argue for it?

"The US is unlikely to be able to prevent China (or Russia) from stealing model weights anytime soon given both technological constraints and difficulty/cost of implementing sufficient security protocols."

Do you have a link to a more detailed analysis of this? My guess is there's precedent for the government taking security seriously in relation to some private industry and locking all of their infrastructure down, but maybe this is somewhat different for cyber (and, certainly, more porous than would be ideal, anyway). Is the real way to guarantee cybersecurity just conventional warfare? (yikes)

What's the theory of change behind mass movement building and public-facing comms? We agitate the populace and then the populist leader does something to appease them? Or something else?

You call out the FATE people as explicitly not worth coalition-building with at this time; what about the jobs people (their more right-coded counterpart)? Historically we've been hesitant to ally with either group, since their models of 'how this whole thing goes' are sort of myopic, but that you mention FATE and not the jobs people seems significant.

"It would be nice to know if China would be capable of overtaking the US if we were to slow down progress or if we can safely model them as being a bit behind no matter how fast the US goes."

I think compute is the crux here. Dario was recently talking about how OOMs of chips matter, and the tens of thousands of chips needed for current DeepSeek models would be difficult to scale to the hundreds of thousands or millions that are probably necessary at some point in the chain. (Probably the line of 'reasoning models' descended from ~GPT-4 has worse returns per dollar spent than the line of reasoning models descended from GPT-5, esp. if the next large model is itself descended from these smaller reasoners). [<70 percent confident]

If that's true, then compute is the moat, and export controls/compute governance still get you a lot re: avoiding multipolar scenarios (and so shouldn't be deprioritized, as your post implies).

I'm also not sure about this 'not appealing to the American Left' thing. Like, there's some subset of conservative politicians that are just going to support the thing that enriches their donors (tech billionaires), so we can't Do The Thing* without some amount of bipartisan support, since there are bad-faith actors on both sides actively working against us.

"Developing new policy proposals which fit with the interests of Trump’s faction"

I'd like to point out that there's a middle ground between trying to be non-partisan and convincing people in good faith of the strength of your position (while highlighting the ways in which it synergizes with their pre-existing concerns), and explicitly developing proposals that fit their interests. This latter thing absolutely screws you if the tables turn again (as they did last time when we collaborated with FATE on, i.e., the EO), and the former thing (while more difficult!) is the path to more durable (and bipartisan!) wins.

*whatever that is

Ebenezer Dukakis @ 2025-01-30T08:33 (+2)

With regard to Deepseek, it seems to me that the success of mixture-of-experts could be considered an update towards methods like gradient routing. If you could localize specific kinds of knowledge to specific experts in a reliable way, you could dynamically toggle off / ablate experts with unnecessary dangerous knowledge. E.g. toggle off experts knowledgeable in human psychology so the AI doesn't manipulate you.

I like this approach because if you get it working well, it's a general tool that could help address a lot of different catastrophe stories in a way that seems pretty robust. E.g. to stop a malicious AI from gaining root access to its datacenter, ablate knowledge of the OS it's running on. To mitigate sophisticated cooperation between AIs that are supposed to be monitoring one another, ablate knowledge of game theory. Etc. (The broader point is that unlearning seems very generally useful. But the "Expand-Route-Ablate" style approach from the gradient routing paper strikes me as particularly promising, and could harmonize well with MoE.)
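
To make the mechanism concrete, here is a minimal toy sketch (my own illustration, assuming PyTorch; the AblatableMoE class, its parameters, and the choice of which experts to ablate are all invented for this example, not taken from the gradient routing paper or DeepSeek): a top-k mixture-of-experts layer where individual experts can be hard-masked out of the routing at inference time.

    # Toy ablatable mixture-of-experts layer (illustrative sketch only).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class AblatableMoE(nn.Module):
        def __init__(self, d_model=64, n_experts=8, top_k=2):
            super().__init__()
            self.top_k = top_k
            self.router = nn.Linear(d_model, n_experts)
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                              nn.Linear(4 * d_model, d_model))
                for _ in range(n_experts))
            # 1 = active, 0 = ablated; a hard mask applied to the routing logits.
            self.register_buffer("expert_mask", torch.ones(n_experts))

        def ablate(self, expert_ids):
            for e in expert_ids:
                self.expert_mask[e] = 0.0

        def forward(self, x):  # x: (batch, d_model)
            logits = self.router(x)
            # Ablated experts get -inf scores, so the router can never select them.
            logits = logits.masked_fill(self.expert_mask == 0, float("-inf"))
            weights, idx = logits.topk(self.top_k, dim=-1)
            weights = F.softmax(weights, dim=-1)
            out = torch.zeros_like(x)
            for k in range(self.top_k):
                for e, expert in enumerate(self.experts):
                    sel = idx[:, k] == e
                    if sel.any():
                        out[sel] += weights[sel, k, None] * expert(x[sel])
            return out

    moe = AblatableMoE()
    moe.ablate([3, 5])  # hypothetical experts believed to hold dangerous knowledge
    y = moe(torch.randn(4, 64))

The masking itself is the easy part; the open problem, as noted above, is getting dangerous knowledge to localize reliably into particular experts during training in the first place.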

I think a good research goal would be to try to eventually replicate Deepseek's work, except with highly interpretable experts. The idea is to produce a "high-assurance" model which can be ablated so undesired behaviors, like deception, are virtually impossible to jailbreak out of it (since the weights that perform the behavior are inaccessible). I think the gradient routing paper is a good start. To achieve sufficient safety we'll need new methods that are more robust and easier to deploy, which should probably be prototyped on toy problems first.

Aaron Bergman @ 2025-01-29T21:43 (+2)

Here is a kinda naive LLM prompt you may wish to use for inspiration and iterate on:

“List positions of power in the world with the highest ratio of power : difficulty to obtain. Focus only on positions that are basically obtainable by normal arbitrary US citizens and are not illegal or generally considered immoral

I’m interested in positions of unusually high leverage over national or international systems”
 

Vasco Grilo🔸 @ 2025-01-29T23:15 (+1)

Thanks for the post, Lintz. I would be happy to bet 10 k$ against short AI timelines.

yams @ 2025-01-30T02:07 (+5)

Two years ago short timelines to superintelligence meant decades. That you would structure this bet such that it resolves in just a few years is itself evidence that timelines are getting shorter.

That you would propose the bet at even odds also does not gesture toward your confidence.

Finally, what money means after superintelligence is itself hotly debated, especially for worst-case doomers (the people most likely to expect three year timelines to ASI are also the exact people who don't expect to be alive to collect their winnings).

I think it's possible for a betting structure to prove some kind of point about timelines, but this isn't it.

Vasco Grilo🔸 @ 2025-01-30T09:09 (+2)

Thanks, yams.

I am also potentially open to a bet where I transfer money to the person bullish on AI timelines now. I bet Greg Coulbourn 10 k$ this way. However, I would have to trust the person betting with me more than in the case of the bet I linked to above. On this, money being less valuable after superintelligence (including due to a supposedly higher risk of death) has the net effect of moving the break-even resolution date forward. As I say in the post I linked to, "We can agree on another resolution date such that the bet is good for you". The resolution date I proposed (end of 2028) was supposed to make the bet just slightly positive for people bullish on AI timelines. However, my views are closer to those of the median expert in 2023, whose median date of full automation was 2073.

[Figure: CDF from the ESPAI survey showing the median and central 50% of expert responses.]