My highly personal skepticism braindump on existential risk from artificial intelligence.

By NunoSempere @ 2023-01-23T20:08 (+431)

This is a linkpost to https://nunosempere.com/blog/2023/01/23/my-highly-personal-skepticism-braindump-on-existential-risk/

Summary

This document seeks to outline why I feel uneasy about high existential risk estimates from AGI (e.g., 80% doom by 2070). When I try to verbalize this, I view considerations like 

as real and important. 

I still think that existential risk from AGI is important. But I don’t view it as certain or close to certain, and I think that something is going wrong when people see it as all but assured. 

Discussion of weaknesses

I think that this document was important for me personally to write up. However, I also think that it has some significant weaknesses:

  1. There is some danger in verbalization leading to rationalization.
  2. It alternates controversial points with points that are dead obvious.
  3. It is to a large extent a reaction to my imperfectly digested understanding of a worldview pushed around the ESPR/CFAR/MIRI/LessWrong cluster from 2016-2019, which nobody might hold now.

In response to these weaknesses:

  1. I want to keep in mind that I do want to give weight to my gut feeling, and that I might want to update on a feeling of uneasiness rather than on its accompanying reasonings or rationalizations.
  2. Readers might want to keep in mind that parts of this post may look like a bravery debate. But on the other hand, I've seen that the points which people consider obvious and uncontroversial vary from person to person, so I don’t get the impression that there is that much I can do on my end for the effort that I’m willing to spend.
  3. Readers might want to keep in mind that actual AI safety people and AI safety proponents may hold more nuanced views, and that to a large extent I am arguing against a “Nuño of the past” view.

Despite these flaws, I think that this text was personally important for me to write up, and it might also have some utility to readers.

Uneasiness about chains of reasoning with imperfect concepts

Uneasiness about conjunctiveness

It’s not clear to me how conjunctive AI doom is. Proponents will argue that it is very disjunctive, that there are a lot of ways that things could go wrong. I’m not so sure.

In particular, when you see that a parsimonious decomposition (like Carlsmith’s) tends to generate lower estimates, you can conclude:

  1. That the method is producing a biased result, and try to account for that.
  2. That the topic under discussion is, in itself, conjunctive: that there are several steps that need to be satisfied. For example, “AI causing a big catastrophe” and “AI causing human extinction given that it has caused a large catastrophe” seem like they are two distinct steps that would need to be modelled separately.

I feel uneasy about only doing 1. and not doing 2. I think that the principled answer might be to split some probability into each case. Overall, though, I’d tend to think that AI risk is more conjunctive than it is disjunctive.
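
To make "split some probability into each case" concrete, here is a minimal sketch. Every number in it is a hypothetical placeholder rather than an estimate from this post or from Carlsmith's report, and the function names are made up for illustration. It multiplies the steps of a conjunctive chain, then takes a weighted average between that result and a separate holistic estimate, with the weight standing in for how much one believes the decomposition is unbiased.

```python
from math import prod


def conjunctive_estimate(step_probabilities):
    """Probability of the whole chain: the product of the conditional step probabilities."""
    return prod(step_probabilities)


def split_between_readings(p_chain, p_alternative, weight_on_chain):
    """Weighted average of the decomposition's answer and an alternative estimate."""
    return weight_on_chain * p_chain + (1 - weight_on_chain) * p_alternative


# Hypothetical conditional probabilities for a four-step chain (assumption).
steps = [0.8, 0.7, 0.6, 0.5]
p_chain = conjunctive_estimate(steps)  # about 0.168

# Hypothetical holistic estimate, used if the decomposition is biased (assumption).
p_holistic = 0.5

# Put half the weight on each reading (assumption).
blended = split_between_readings(p_chain, p_holistic, weight_on_chain=0.5)  # about 0.334

print(f"chain: {p_chain:.3f}, blended: {blended:.3f}")
```

The point of the sketch is only that neither reading has to be accepted wholesale: the final number moves smoothly with the weight given to each.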

I also feel uneasy about the social pressure in my particular social bubble. I think that the social pressure is for me to just accept Nate Soares’ argument here that Carlsmith’s method is biased, rather than to probabilistically incorporate it into my calculations. As in “oh, yes, people know that conjunctive chains of reasoning have been debunked, Nate Soares addressed that in a blogpost saying that they are biased”.

I don’t trust the concepts

My understanding is that MIRI and others’ work started in the 2000s. As such, their understanding of the shape that an AI would take doesn’t particularly resemble current deep learning approaches. 

In particular, I think that many of the initial arguments that I most absorbed were motivated by something like an AIXI (Solomonoff induction + some decision theory). Or, alternatively, by imagining what a very buffed-up Eurisko would look like. This seems to me like a fruitful generative approach which can generate things that could go wrong, rather than demonstrating that something will go wrong, or pointing to failures that we know will happen.

As deep learning attains more and more success, I think that some of the old concerns port over. But I am not sure which ones, to what extent, and in which context. This leads me to reduce some of my probability. Some concerns that apply to a more formidable Eurisko but which may not apply by default to near-term AI systems:

Uneasiness about in-the-limit reasoning

One particular form of argument, or chain of reasoning, goes like:

  1. An arbitrarily intelligent/capable/powerful process would be of great danger to humanity. This implies that there is some point, either at arbitrary intelligence or before it, such that a very intelligent process would start to be and then definitely be a great danger to humanity.
  2. If the field of artificial intelligence continues improving, eventually we will get processes that are first as intelligent/capable/powerful as a single human mind, and then greatly exceed it.
  3. This would be dangerous

The thing is, I agree with that chain of reasoning. But I see it as applying in the limit, and I am much more doubtful about it being used to justify specific dangers in the near future. In particular, I think that dangers that may appear in the long run may manifest in a limited and less dangerous form earlier on.

I see various attempts to give models of AI timelines as approximate. In particular:

AGI, so what? 

For a given operationalization of AGI, e.g., good enough to be forecasted on, I think that there is some possibility that we will reach such a level of capabilities, and yet that this will not be very impressive or world-changing, even if it would have looked like magic to previous generations. More specifically, it seems plausible that AI will continue to improve without soon reaching high shock levels which exceed humanity’s ability to adapt.

This would be similar to how the industrial revolution was transformative but not that transformative. One possible scenario for this might be a world where we have pretty advanced AI systems, but we have adapted to that, in the same way that we have adapted to electricity, the internet, recommender systems, or social media. Or, in other words, once I concede that AGI could be as transformative as the industrial revolution, I don't have to concede that it would be maximally transformative.

I don’t trust chains of reasoning with imperfect concepts

The concerns in this section, when combined, make me uneasy about chains of reasoning that rely on imperfect concepts. Those chains may be very conjunctive, and they may apply to the behaviour of an in-the-limit-superintelligent system, but they may not be as action-guiding for systems in our near to medium term future.

For an example of the type of problem that I am worried about, but in a different domain, consider Georgism, the idea of deriving all government revenues from a land value tax. From a recent blogpost by David Friedman: “since it is taxing something in perfectly inelastic supply, taxing it does not lead to any inefficient economic decisions. The site value does not depend on any decisions made by its owner, so a tax on it does not distort his decisions, unlike a tax on income or produced goods.”

Now, this reasoning appears to be sound. Many people have been persuaded by it. However, because the concepts are imperfect, there can still be flaws. One possible flaw might be that the land value would have to be measured, and that inefficiency might come from there. Another possible flaw was recently pointed out by David Friedman in the blogpost linked above, which I understand as follows: the land value tax rewards counterfactual improvement, and this leads to predictable inefficiencies because you want to be rewarding Shapley value instead, which is much more difficult to estimate.
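
For readers unfamiliar with the distinction between counterfactual value and Shapley value, here is a minimal sketch on a toy two-player game. The game, the player names and the function names are all hypothetical, and the sketch illustrates only the general difference between the two measures, not Friedman's argument about land. In the toy game, value is created only when both players cooperate.

```python
from itertools import permutations


def counterfactual_value(player, players, value):
    """Value of the full group minus the value of the group without this player."""
    everyone = frozenset(players)
    return value(everyone) - value(everyone - {player})


def shapley_value(player, players, value):
    """Average marginal contribution of the player over all orderings of the players."""
    orderings = list(permutations(players))
    total = 0.0
    for ordering in orderings:
        before = frozenset(ordering[: ordering.index(player)])
        total += value(before | {player}) - value(before)
    return total / len(orderings)


players = ["A", "B"]  # hypothetical players


def value(coalition):
    """Toy game (assumption): one unit of value, created only if A and B both take part."""
    return 1.0 if coalition == frozenset({"A", "B"}) else 0.0


for p in players:
    print(p, counterfactual_value(p, players, value), shapley_value(p, players, value))
```

Each player's counterfactual value is the full unit, so the counterfactual values sum to twice the value actually created, whereas the Shapley values split it evenly. Enumerating orderings is also factorial in the number of players, which gives one sense in which Shapley values are harder to estimate.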

I think that these issues are fairly severe when attempting to make predictions for events further in the horizon, e.g., ten, thirty years. The concepts shift like sand under your feet.

Uneasiness about selection effects at the level of arguments

I am uneasy about what I see as selection effects at the level of arguments. I think that there is a small but intelligent community of people who have spent significant time producing some convincing arguments about AGI, but no community which has spent the same amount of effort looking for arguments against.

Here is a neat blogpost by Phil Trammell on this topic.

Here are some excerpts from a casual discussion among Samotsvety Forecasting team members:

The selection effect story seems pretty broadly applicable to me. I'd guess most Christian apologists, Libertarians, Marxists, etc. etc. etc. have a genuine sense of dialectical superiority: "All of these common objections are rebutted in our FAQ, yet our opponents aren't even aware of these devastating objections to their position", etc. etc.

You could throw in bias in evaluation too, but straightforward selection would give this impression even to the fair-minded who happen to end up in this corner of idea space. There are many more 'full time' (e.g.) Christian apologists than anti-apologists, so the balance of argumentative resources (and so apparent balance of reason) will often look slanted.

This doesn't mean the view in question is wrong: back in my misspent youth there were similar resources re, arguing for evolution vs. creationists/ID (https://www.talkorigins.org/). But it does strongly undercut "but actually looking at the merits clearly favours my team" alone as this isn't truth tracking (more relevant would be 'cognitive epidemiology' steers: more informed people tend to gravitate to one side or another, proponents/opponents appear more epistemically able, etc.)


An example for me is Christian theology. In particular, consider Aquinas' five proofs of God (summarized in Wikipedia), or the various ontological arguments. Back in the day, it took me a bit to a) understand what exactly they are saying, and b) understand why they don't go through. The five ways in particular were written to reassure Dominican priests who might be doubting, and in their time they did work for that purpose, because the topic is complex and hard to grasp.


You should be worried about the 'Christian apologist' (or philosophy of religion, etc.) selection effect when those likely to discuss the view are selected for sympathy for it. Concretely, if on acquaintance with the case for AI risk your reflex is 'that's BS, there's no way this is more than 1/million', you probably aren't going to spend lots of time being a dissident in this 'field' versus going off to do something else.

This gets more worrying the more generally epistemically virtuous folks are 'bouncing off': e.g. neuroscientists who think relevant capabilities are beyond the ken of 'just add moar layers', ML Engineers who think progress in the field is more plodding than extraordinary, policy folks who think it will be basically safe by default etc. The point is this distorts the apparent balance of reason - maybe this is like Marxism, or NGDP targetting, or Georgism, or general semantics, perhaps many of which we will recognise were off on the wrong track.

(Or, if you prefer being strictly object-level, it means the strongest case for scepticism is unlikely to be promulgated. If you could pin folks bouncing off down to explain their scepticism, their arguments probably won't be that strong/have good rebuttals from the AI risk crowd. But if you could force them to spend years working on their arguments, maybe their case would be much more competitive with proponent SOTA).


It is general in the sense there is a spectrum from (e.g.) evolutionary biology to (e.g.) Timecube theory, but AI risk is somewhere in the range where it is a significant consideration.

It obviously isn't an infallible one: it would apply to early stage contrarian scientific theories and doesn't track whether or not they are ultimately vindicated. You rightly anticipated the base-rate-y reply I would make.

Garfinkel and Shah still think AI is a very big deal, and identifying them at the sceptical end indicates how far afield from 'elite common sense' (or similar) AI risk discussion is. Likewise I doubt that there being some incentives to be a dissident from this consensus means there isn't a general trend in selection for those more intuitively predisposed to AI concern.

There are some possible counterpoints to this, and other Samotsvety Forecasting team members made those, and that’s fine. But my individual impression is that the selection effects argument packs a whole lot of punch behind it.

One particular dynamic that I’ve seen some gung-ho AI risk people mention is that (paraphrasing): “New people each have their own unique snowflake reasons for rejecting their particular theory of how AI doom will develop. So I can convince each particular person, but only by talking to them individually about their objections.”

So, in illustration, the overall balance could look something like:

Whereas the individual matchup could look something like:

And so you would expect the natural belief dynamics stemming from that type of matchup. 

What you would want to do is to have all the evidence for and against, and then weigh it. 

I also think that there are selection effects around which evidence surfaces on each side, rather than only around which arguments people start out with.

It is interesting that when people move to the Bay area, this is often very “helpful” for them in terms of updating towards higher AI risk. I think that this is a sign that a bunch of social fuckery is going on. In particular, I think it might be the case that Bay area movement leaders identify arguments for shorter timelines and higher probability of x-risk with “the rational”, which produces strong social incentives to be persuaded and to come up with arguments in one direction.

More specifically, I think that “if I isolate people from their normal context, they are more likely to agree with my idiosyncratic beliefs” is a mechanism that works for many types of beliefs, not just true ones. And more generally, I think that “AI doom is near” and associated beliefs are a memeplex, and I am inclined to discount their specifics.

Miscellanea

Difference between in-argument reasoning and all-things-considered reasoning

I’d also tend to differentiate between the probability that an argument or a model gives, and the all-things-considered probability. For example, I might look at Ajeya’s timeline, and I might generate a probability by inputting my curves in its model. But then I would probably add additional uncertainty on top of that model.

My weak impression is that some of the most gung-ho people do not do this.

Methodological uncertainty

It’s unclear whether we can get good accuracy predicting dynamics that may happen across decades. I might be inclined to discount further based on that. One particular uncertainty that I worry about is that we can get “AI will be a big deal and be dangerous”, but with that danger taking a different shape than what we expected.

For this reason, I am more sympathetic to tools other than forecasting for long-term decision-making, e.g., as outlined here.

Uncertainty about unknown unknowns

I think that unknown unknowns mostly delay AGI. E.g., covid, nuclear war, and many other things could lead to supply chain disruptions. There are unknown unknowns in the other direction, but the higher one's probability goes, the more unknown unknowns should shift one towards 50%.
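
One way to make the last sentence concrete, as a sketch under an assumption that is mine rather than the post's: suppose that with some weight $w$ the unknown unknowns make an original estimate $p$ uninformative, in which case one falls back to 50%. The adjusted probability $p'$ and the size of the shift are then:

```latex
p' = (1 - w)\,p + w \cdot \tfrac{1}{2}
\qquad\Longrightarrow\qquad
p' - p = w\left(\tfrac{1}{2} - p\right)
```

With a hypothetical $w = 0.2$, an estimate of $p = 0.9$ moves down by $0.08$ to $0.82$, while an estimate of $p = 0.6$ moves down by only $0.02$ to $0.58$. The further the starting probability is from 50%, the larger the correction.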

Updating on virtue

I think that updating on virtue is a legitimate move. By this I mean to notice how morally or epistemically virtuous someone is, to update based on that about whether their arguments are made in good faith or from a desire to control, and to assign them more or less weight accordingly.

I think that a bunch of people around the CFAR cluster that I was exposed to weren't particularly virtuous and were willing to go to great lengths to convince people that AI is important. In particular, I think that isolating people from the normal flow of their lives for extended periods has an unreasonable effectiveness at making them more pliable and receptive to new and weird ideas, whether they are right or wrong. I am a bit freaked out about the extent to which ESPR, a rationality camp for kids in which I participated, did that.

(Brief aside: An ESPR instructor points out that ESPR separated itself from CFAR after 2019, and has been trying to mitigate these factors. I do think that the difference is important, but this post isn't about ESPR in particular but about AI doom skepticism and so will not be taking particular care here.)

Here is a comment from a CFAR cofounder, who has since left the organization, taken from this Facebook comment thread (paragraph divisions added by me):

Question by bystander: Around 3 minutes, you mention that looking back, you don't think CFAR's real drive was _actually_ making people think better. Would be curious to hear you elaborate on what you think the real drive was.

Answer: I'm not going to go into it a ton here. It'll take a bit for me to articulate it in a way that really lands as true to me. But a clear-to-me piece is, CFAR always fetishized the end of the world. It had more to do with injecting people with that narrative and propping itself up as important. 

We did a lot of moral worrying about what "better thinking" even means and whether we're helping our participants do that, and we tried to fulfill our moral duty by collecting information that was kind of related to that, but that information and worrying could never meaningfully touch questions like "Are these workshops worth doing at all?" We would ASK those questions periodically, but they had zero impact on CFAR's overall strategy. 

The actual drive in the background was a lot more like "Keep running workshops that wow people" with an additional (usually consciously (!) hidden) thread about luring people into being scared about AI risk in a very particular way and possibly recruiting them to MIRI-type projects. 

Even from the very beginning CFAR simply COULD NOT be honest about what it was doing or bring anything like a collaborative tone to its participants. We would infantilize them by deciding what they needed to hear and practice basically without talking to them about it or knowing hardly anything about their lives or inner struggles, and we'd organize the workshop and lectures to suppress their inclination to notice this and object. 

That has nothing to do with grounding people in their inner knowing; it's exactly the opposite. But it's a great tool for feeling important and getting validation and coercing manipulable people into donating time and money to a Worthy Cause™ we'd specified ahead of time. Because we're the rational ones, right? 😛 

The switch Anna pushed back in 2016 to CFAR being explicitly about xrisk was in fact a shift to more honesty; it just abysmally failed the is/ought distinction in my opinion. And, CFAR still couldn't quite make the leap to full honest transparency even then. ("Rationality for its own sake for the sake of existential risk" is doublespeak gibberish. Philosophical somersaults won't save the fact that the energy behind a statement like that is more about controlling others' impressions than it is about being goddamned honest about what the desire and intention really is.)

The dynamics at ESPR, a rationality camp I was involved with, were at times a bit more dysfunctional than that, particularly before 2019. For that reason, I am inclined to update downwards. I think that this is a personal update, and I don’t necessarily expect it to generalize.

I think that some of the same considerations that I have about ESPR might also hold for those who have interacted with people seeking to persuade, e.g., mainline CFAR workshops, 80,000 hours career advising calls, ATLAS, or similar. But to be clear, I haven't interacted much with those other groups myself, and my sense is that CFAR, which organized the iterations of ESPR up to 2019, went off the guardrails but that these other organizations haven't.

Industry vs AI safety community

It’s unclear to me what the views of industry people are. In particular, the question seems a bit confused. I want to get at the independent impression that people get from working with state-of-the-art AI models. But industry people may already be influenced by AI safety community concerns, so it’s unclear how to isolate the independent impression. Doesn’t seem undoable, though. 

Suggested decompositions

The above reasons for skepticism lead me to suggest the following decompositions for my forecasting group, Samotsvety, to use when forecasting AGI and its risks:

Very broad decomposition

I: 

II: 

  1. AI capabilities will continue advancing
  2. The advancement of AI capabilities will lead to social problems
  3. … and eventually to a great catastrophe
  4. … and eventually to human extinction

“Are we right about this stuff” decomposition

  1. We are right about this AGI stuff 
  2. This AGI stuff implies that AGI will be dangerous
  3. … and it will lead to human extinction

Inside the model/outside the model decomposition

I:

II:

Implications of skepticism

I view the above as moving me away from certainty that we will get AGI in the short term. For instance, I think that having 70 or 80%+ probabilities on AI catastrophe within our lifetimes is probably just incorrect, insofar as a probability can be incorrect. 

Anecdotally, I recently met someone at an EA social event who a) was uncalibrated, e.g., on Open Philanthropy’s calibration tool, but b) assigned 96% to AGI doom by 2070. Pretty wild stuff.

Ultimately, I’m personally somewhere around 40% for "By 2070, will it become possible and financially feasible to build advanced-power-seeking AI systems?", and somewhere around 10% for doom. I don’t think that the difference matters all that much for practical purposes, but:

  1. I am marginally more concerned about unknown unknowns and other non-AI risks
  2. I would view interventions that increase civilizational robustness (e.g., bunkers) more favourably, because these are a bit more robust to unknown risks and could protect against a wider range of risks
  3. I don’t view AGI soon as particularly likely
  4. I view a stance which “aims to safeguard humanity through the 21st century” as more appealing than “Oh fuck AGI risk”

Conclusion

I’ve tried to outline some factors about why I feel uneasy with high existential risk estimates. I view the most important points as:

  1. Distrust of reasoning chains using fuzzy concepts
  2. Distrust of selection effects at the level of arguments
  3. Distrust of community dynamics

It’s not clear to me whether I have bound myself into a situation in which I can’t update from other people’s object-level arguments. I might well have, and it would lead to me playing in a perhaps-unnecessary hard mode.

If so, I could still update from e.g.:

Lastly, I would loathe it if the same selection effects applied to this document: if I spent a few days putting this document together, it seems easy for the AI safety community to put a few cumulative weeks into arguing against it, just by virtue of being a community.

This is all.

Acknowledgements

I am grateful to the Samotsvety forecasters that have discussed this topic with me, and to Ozzie Gooen for comments and review. The above post doesn't necessarily represent the views of other people at the Quantified Uncertainty Research Institute, which nonetheless supports my research.


MichaelStJules @ 2023-01-30T20:53 (+45)

One of my main high-level hesitations with AI doom and futility arguments is something like this, from Katja Grace:

My weak guess is that there’s a kind of bias at play in AI risk thinking in general, where any force that isn’t zero is taken to be arbitrarily intense. Like, if there is pressure for agents to exist, there will arbitrarily quickly be arbitrarily agentic things. If there is a feedback loop, it will be arbitrarily strong. Here, if stalling AI can’t be forever, then it’s essentially zero time. If a regulation won’t obstruct every dangerous project, then it is worthless. Any finite economic disincentive for dangerous AI is nothing in the face of the omnipotent economic incentives for AI. I think this is a bad mental habit: things in the real world often come down to actual finite quantities. This is very possibly an unfair diagnosis. (I’m not going to discuss this later; this is pretty much what I have to say.)

"Omnipotent" is the impression I get from a lot of the characterization of AGI.

Another recent specific example here.

Similarly, I've had the impression that specific AI takeover scenarios don't engage enough with the ways they could fail for the AI. Some are based primarily on nanotech or engineered pathogens, but from what I remember of the presentations and discussions I saw, they don't typically directly address enough of the practical challenges for an AI to actually pull them off, e.g. access to the materials and a sufficiently sophisticated lab/facility with which to produce these things, little or poor verification of the designs before running them through the lab/facility (if done by humans), attempts by humans to defend ourselves (e.g. the military) or hide, ways humans can disrupt power supplies and electronics, and so on. Even if AI takeover scenarios are disjunctive, so are the ways humans can defend ourselves and the ways such takeover attempts could fail, and we have a huge advantage through access to and control over stuff in the outside world, including whatever the AI would "live" on and what powers it. Some of the reasons AI could fail across takeover plans could be common across significant shares of otherwise promising takeover plans, potentially placing a limit on how far an AI can get by considering or trying more and more such plans or more complex plans.

I've seen it argued that it would be futile to try to make the AI more risk-averse (e.g. sharply decreasing marginal returns), but this argument didn't engage with how risks for the AI from human detection and possible shutdown, threats by humans or the opportunity to cooperate/trade with humans would increasingly disincentivize such an AI from taking extreme action the more risk-averse it is.

I've also heard an argument (in private, and not by anyone working at an AI org or otherwise well-known in the community) that AI could take over personal computers and use them, but distributing computations that way seems extremely impractical for computations that run very deep, so there could be important limits on what an AI could do this way.

That being said, I also haven't personally engaged deeply with these arguments or read a lot on the topic, so I may have missed where these issues are addressed, but this is in part because I haven't been impressed by what I have read (among other reasons, like concerns about backfire risks, suffering-focused views and very low probabilities of the typical EA or me in particular making any difference at all).

Ardenlk @ 2023-01-25T10:29 (+34)

[writing in my personal capacity, but asked an 80k colleague if it seemed fine for me to post this]

Thanks a lot for writing this - I agree with a lot of (most of?) what's here.

One thing I'm a bit unsure of is the extent to which these worries have implications for the beliefs of those of us who are hovering more around 5% x-risk this century from AI, and who are one step removed from the bay area epistemic and social environment you write about. My guess is that they don't have much implication for most of us, because (though what you say is way better articulated) some of this is already naturally getting into people's estimates.

e.g. in my case, basically I think a lot of what you're writing about is sort of why for my all-things-considered beliefs I partly "defer at a discount" to people who know a ton about AI and have high x-risk estimates. Like I take their arguments, find them pretty persuasive, end up at some lower but still middlingly high probability, and then just sort of downgrade everything because of worries like the ones you cite, which I think is part of why I end up near 5%.

This kind of thing does have the problematic effect probably of incentivising the bay area folks to have more and more extreme probabilities - so that, to the extent that they care, quasi-normies like me will end up with a higher probability - closer to the truth in their view - after deferring at a discount.

titotal @ 2023-01-30T11:46 (+13)

The 5% figure seems pretty common, and I think this might also be a symptom of risk inflation. 

There is a huge degree of uncertainty around this topic. The factors involved in any prediction vary by many orders of magnitude, so it seems like we should expect the estimates to vary by orders of magnitude as well. So you might get some people saying the odds are 1 in 20, or 1 in 1000, or 1 in a million, and I don't see how any of those estimates can be ruled out as unreasonable. Yet I hardly see anyone giving estimates of 0.1% or 0.001%.

I think people are using 5% as a stand in for "can't rule it out". Like why did you settle at 1 in 20 instead of 1 in a thousand? 

philipkd @ 2023-08-14T17:26 (+3)

It looks like we landed on the same thought. User Muster the Squirrels quoted your comment in a reply to my comment on ACX. 

NunoSempere @ 2023-01-25T11:33 (+6)

Hey, 

  • Your last point about exaggeration incentives seems like an incentive that could exist, but I don't see it playing out.
  • For 80kh itself, considerations such as in this post might apply to career advisors, who have the tricky job of balancing charismatic persuasion with just providing evidence and stepping back when they try to help people make better career decisions.

alexrjl @ 2023-01-25T13:02 (+17)

[context: I'm one of the advisors, and manage some of the others, but am describing my individual attitude below]

FWIW I don't think the balance you indicated is that tricky, and think that conceiving of what I'm doing when I speak to people as 'charismatic persuasion' would be a big mistake for me to make. I try to:

  • Say things I think are true, and explain why I think them (both the internal logic and external evidence if it exists) and how confident I am.
  • Ask people questions in a way which helps them clarify what they think is true, and which things they are more or less sure of.
  • Make tradeoffs (e.g. between a location preference and a desire for a particular job) explicit to people who I think might be missing that they need to make one, but usually not then suggesting which tradeoff to make, but instead that they go and think about it/talk to other people affected by it.
  • Encourage people to think through things for themselves, usually suggesting resources which will help them do that/give a useful perspective as well as just saying 'this seems worth you taking time to think about'.
  • To the extent that I'm paying attention to how other people perceive me[1], I'm usually trying to work out how to stop people deferring to me when they shouldn't without running into the "confidence all the way up" issue.

  1. ^ in a work context, that is. I'm unfortunately usually pretty anxious about, and therefore paying a bunch of attention to, whether people are angry/upset with me, though this is getting better, and easy to mostly 'switch off' on calls because the person in front of me takes my full attention.

RobBensinger @ 2023-01-25T10:16 (+31)

I think writing this sort of thing up is really good; thanks for this, Nuno. :)

I also feel uneasy about the social pressure in my particular social bubble. I think that the social pressure is for me to just accept Nate Soares’ argument here that Carlsmith’s method is biased, rather than to probabilistically incorporate it into my calculations. As in “oh, yes, people know that conjunctive chains of reasoning have been debunked, Nate Soares addressed that in a blogpost saying that they are biased”.

It sounds like your social environment might be conflating four different claims:

  1. "I personally find Nate's arguments for disjunctiveness compelling, so I have relatively high p(doom)."
  2. "Nate's arguments have debunked the idea that AI risk is conjunctive, in the sense that he's given a completely ironclad argument for this that no remotely reasonable person could disagree with."
  3. "Nate and Eliezer have debunked the idea that multiple-stages-style reasoning is generically reliable (e.g., in the absence of very strong prior reasons to think that a class of scenarios is conjunctive / unlikely on priors)."
  4. "Nate has shown that it's unreasonable to treat anything as conjunctive, and that we can therefore take for granted that AI risk is non-conjunctive without even thinking about the real world or the problem structure."

I agree with 1 and 3, but very much disagree with 2 and 4:

If your social environment is promoting 2 and 4, then kudos for spotting this and resisting the pressure to say false stuff.

As deep learning attains more and more success, I think that some of the old concerns port over. But I am not sure which ones, to what extent, and in which context. This leads me to reduce some of my probability.

On net I think the deep learning revolution increases p(doom), mostly because it's a surprisingly opaque and indirect way of building intelligent systems, that gives you relatively few levers to control internal properties of the reasoner you SGDed your way to.

Deep learning also increases the probability of things like "roughly human-level AGIs run around for a few years before we see anything strongly superhuman", but this doesn't affect my p(doom) much because I don't see a specific path for leveraging this to prevent the world from being destroyed when we do reach superintelligence. Not knowing what's going on in your AGI system's brain continues to be a real killer.

Some concerns that apply to a more formidable Eurisko but which may not apply by default to near-term AI systems:

  • Alien values
  • Maximalist desire for world domination
  • Convergence to a utility function
  • Very competent strategizing, of the “treacherous turn” variety
  • Self-improvement
  • etc.

Some of those points may be true, but they don't update me a ton on p(doom) unless I see a plausible, likely-to-happen-in-real-life path from "nearer-term systems have less-scary properties" to "this prevents us from building the scary systems further down the road".

I think the usual way this "well, GPT-3 isn't that scary" reasoning goes wrong is that it mistakes a reason to have longer timelines ("current systems and their immediate successors seem pretty weak") for a reason to expect humanity to never build AGI at all. It's basically repeating the same structural mistake that caused people to not seriously think about AGI risk in 1980, 1990, 2000, and 2010: "AI today isn't scary, so it seems silly to worry about future AI".

Longer timelines do give us more time to figure out alignment, which lowers p(doom); but you still have to actually do the alignment in order to get out the microhopes, so it matters how much more hopeful you feel with an extra (e.g.) five years of time for humanity to work on the alignment problem.

For a given operationalization of AGI, e.g., good enough to be forecasted on, I think that there is some possibility that we will reach such a level of capabilities, and yet that this will not be very impressive or world-changing, even if it would have looked like magic to previous generations.

If this means, e.g., "AI that can do everything a human physicist or heart surgeon can do, that can't do other sciences and can't outperform humans", then I'd be really, really surprised if we ever see a state of affairs like that.

What, concretely, are some examples of world-saving things you think AI might be able to do 5+ years before world-endangering AGI is possible?

(Or, if you think the pre-danger stuff won't be world-saving, then why does it lower your p(doom)? Just by giving humanity a bit more time to notice the smoke?)

Or, in other words, once I concede that AGI could be as transformative as the industrial revolution, I don't have to concede that it would be maximally transformative.

This is true, but I don't use the industrial revolution as a frame for thinking about AGI. I don't see any reason to think AGI would be similar to the industrial revolution, except insofar as they're both "important technology-mediated things that happened in history". AGI isn't likely to have a similar impact to steam engines because science and reasoning aren't similar capabilities to a steam engine.

no community which has spent the same amount of effort looking for arguments against.

I think there might have been more effort spent seeking arguments against, but it's shallower effort: the kinds of arguments you find when you're trying to win a water-cooler argument or write a pop-sci editorial are different from the kinds of arguments you find when you're working full-time on the thing.

But my individual impression is that the selection effects argument packs a whole lot of punch behind it.

I don't think it has that much punch, partly just because I don't think EAs and rationalists are as unreasonable as Christian apologists (e.g., we're much more inclined to search for flaws in our own arguments). And partly because the field is just pretty old at this stage, and a lot of effort has gone into arguments in both directions by now.

Even if there were 10x as much effort going into "seek out new reasons to worry about AGI" as there were going into "seek out new reasons to relax about AGI", this matters a lot less when you're digging for the 1000th new argument than when you're digging for the 10th. There just aren't as many stones left unturned, and you can look at the available arguments yourself rather than wondering whether there are huge considerations the entire planet is missing.

It is interesting that when people move to the Bay Area, this is often very “helpful” for them in terms of updating towards higher AI risk. I think that this is a sign that a bunch of social fuckery is going on.

I think there's some social and psychological fuckery going on, but less than occurs in the opposite direction in the non-EA, non-rationalist, etc. superculture.

Partly I just think that because of my own experience. The idea of AIs rising up to overthrow humanity and kill us all just sounds really weird to me. I think this is inherently a really hard risk to take seriously on an emotional level -- because it's novel, because it sounds like science fiction, because it triggers our "anthropomorphize this" mental modules while violating many of the assumptions that we're used to taking for granted in human-like reasoners.

Sitting with the idea for sufficiently long, properly chewing on it and metabolizing it, considering and analyzing concrete scenarios, and having peers to talk about this stuff with -- all of that makes me feel much more equipped to weigh the arguments for and against the risks without leaning on shallow pattern-matching.

But that's just autobiography; if you had an easy time seriously entertaining AI risk before joining the community, and now you feel "pressured to accept AGI risk" rather than "free to agree or disagree", then by default I assume you're just right about your own experience.

[cont.]

RobBensinger @ 2023-01-25T10:45 (+17)

The actual drive in the background was a lot more like "Keep running workshops that wow people" with an additional (usually consciously (!) hidden) thread about luring people into being scared about AI risk in a very particular way and possibly recruiting them to MIRI-type projects. 

Even from the very beginning CFAR simply COULD NOT be honest about what it was doing or bring anything like a collaborative tone to its participants. We would infantilize them by deciding what they needed to hear and practice basically without talking to them about it or knowing hardly anything about their lives or inner struggles, and we'd organize the workshop and lectures to suppress their inclination to notice this and object. 

That paints a pretty fucked up picture of early-CFAR's dynamics. I've heard a lot of conflicting stories about CFAR in this respect, usually quite vague (and there are nonzero ex-CFAR staffers who I just flatly don't trust to report things accurately). I'd be interested to hear from Anna or other early CFAR staff about whether this matches their impressions of how things went down. It unfortunately sounds to me like a pretty realistic way this sort of thing can play out.

On my view, CFAR workshops can do a lot of useful things even if they aren't magic bullets that instantly make all participants way more rational. E.g.:

  • Community-building is just plain useful, and shared culture and experiences is legit just an excellent way to build it. There's no reason this has to be done in a dishonest way, and I suspect that some of the pressure to misclassify "community-building" stuff as "rationality-enhancing" stuff comes from the fact that the benefits of the former sound squishier.
  • I've been to two CFAR workshops (a decent one in late 2013, and an excellent one in late 2018; neither seemed dishonest or trying-to-wow-me), and IME a lot of the benefit came from techniques, language, and felt affordances to try lots of things in the months and years after the workshop -- along with a bunch of social connections where it was common knowledge "it's normal to try to understand and solve your problems, X Y Z are standard ways to get traction on day-to-day problems, etc.".

The latter also doesn't seem like a thing that requires any lying, and I'm very skeptical of the idea that it's useful-for-epistemics-on-net to encourage workshop participants to fuzz out parts of their models, heavily defer to Mysterious Unquestionable Rationality Experts, treat certain questions as inherently bad to think about or discuss, etc. In my experience, deliberate fuzziness in one narrow part of people's models often leaks out to distort thinking about many other things.

The switch Anna pushed back in 2016 to CFAR being explicitly about xrisk was in fact a shift to more honesty; it just abysmally failed the is/ought distinction in my opinion.

I don't understand this part; how does it fail the is/ought distinction?

"Rationality for its own sake for the sake of existential risk" is doublespeak gibberish.

I think this is awkwardly/cutely phrased, but is contentful (and actually a good idea) rather than being doublespeak.

The way I'd put it (if I'm understanding the idea right) is: "we're trying to teach rationality in a high-integrity, principled way, because we think rationality is useful for existential risk reduction".

"Rationality for its own sake" is a different sort of optimization target than "rationality when it feels locally useful for an extrinsic goal", among other things because it's easy to be miscalibrated about when rationality is more or less useful instrumentally. Going all-in on rationality (and not trying to give yourself wiggle room to be more or less rational at different times) can be a good idea, even if at base we're justifying policies like this for instrumental reasons on the meta level, as opposed to justifying the policy with rationales like "it's just inherently morally good to always be rational, don't ask questions, it just is".

Philosophical somersaults won't save the fact that the energy behind a statement like that is more about controlling others' impressions than it is about being goddamned honest about what the desire and intention really is.

Hm, this case seems like a stretch to me, which makes me a bit more skeptical about the speaker's other claims. But it's also possible I'm misunderstanding what the intent behind "Rationality for its own sake for the sake of existential risk" originally was, or how it was used in practice?

For instance, I think that having 70 or 80%+ probabilities on AI catastrophe within our lifetimes is probably just incorrect, insofar as a probability can be incorrect. 

I think you're wrong about this, though the "within our lifetimes" part is important: I think the case for extremely high risk is very strong, but reasoning about timelines seems way more uncertain to me.

(And way less important: if the human race is killed a few years after I die of non-AGI causes, that's no less of a tragedy.)

If we bracket the timelines part and just ask about p(doom), I think https://www.lesswrong.com/posts/Ke2ogqSEhL2KCJCNx/security-mindset-lessons-from-20-years-of-software-security and https://intelligence.org/2017/11/25/security-mindset-ordinary-paranoia/ make it quite easy to reach extremely dire forecasts about AGI. Getting extremely novel software right on the first try is just that hard. (And we have to do this eventually, even if we luck out and get a few years to play with weak AGIs before strong AGIs arrive.)

Rohin Shah @ 2023-01-25T16:16 (+19)

If we bracket the timelines part and just ask about p(doom), I think https://www.lesswrong.com/posts/Ke2ogqSEhL2KCJCNx/security-mindset-lessons-from-20-years-of-software-security and https://intelligence.org/2017/11/25/security-mindset-ordinary-paranoia/ make it quite easy to reach extremely dire forecasts about AGI. Getting extremely novel software right on the first try is just that hard.

Surely not. Neither of those makes any arguments about AI, just about software generally. If you literally think those two are sufficient arguments for concluding "AI kills us with high probability" I don't see why you don't conclude "PowerPoint kills us with high probability".

RobBensinger @ 2023-01-25T17:51 (+7)

Yep! To be explicit, I was assuming that general intelligence is very powerful, that you can automate it, and that it isn't (e.g.) friendly by default.

sphor @ 2023-01-27T15:23 (+5)

I'm not sure I understand what statements like "general intelligence is very powerful" mean even though it seems to be a crucial part of the argument. Can you explain more concretely what you mean by this? E.g. What is "general intelligence"? What are the ways in which it is and isn't powerful? 

RobBensinger @ 2023-01-30T04:14 (+10)

By "general intelligence" I mean "whatever it is that lets human brains do astrophysics, category theory, etc. even though our brains evolved under literally zero selection pressure to solve astrophysics or category theory problems".

Human brains aren't perfectly general, and not all narrow AIs/animals are equally narrow. (E.g., AlphaZero is more general than AlphaGo.) But it sure is interesting that humans evolved cognitive abilities that unlock all of these sciences at once, with zero evolutionary fine-tuning of the brain aimed at equipping us for any of those sciences. Evolution just stumbled into a solution to other problems, that happened to generalize to billions of wildly novel tasks.

To get more concrete:

  • AlphaGo is a very impressive reasoner, but its hypothesis space is limited to sequences of Go board states rather than sequences of states of the physical universe. Efficiently reasoning about the physical universe requires solving at least some problems (which might be solved by the AGI's programmer, and/or solved by the algorithm that finds the AGI in program-space; and some such problems may be solved by the AGI itself in the course of refining its thinking) that are different in kind from what AlphaGo solves.
    • E.g., the physical world is too complex to simulate in full detail, unlike a Go board state. An effective general intelligence needs to be able to model the world at many different levels of granularity, and strategically choose which levels are relevant to think about, as well as which specific pieces/aspects/properties of the world at those levels are relevant to think about.
    • More generally, being a general intelligence requires an enormous amount of laserlike strategicness about which thoughts you do or don't think: a large portion of your compute needs to be ruthlessly funneled into exactly the tiny subset of questions about the physical world that bear on the question you're trying to answer or the problem you're trying to solve. If you fail to be ruthlessly targeted and efficient in "aiming" your cognition at the most useful-to-you things, you can easily spend a lifetime getting sidetracked by minutiae / directing your attention at the wrong considerations / etc.
    • And given the variety of kinds of problems you need to solve in order to navigate the physical world well / do science / etc., the heuristics you use to funnel your compute to the exact right things need to themselves be very general, rather than all being case-specific. (Whereas we can more readily imagine that many of the heuristics AlphaGo uses to avoid thinking about the wrong aspects of the game state, or thinking about the wrong topics altogether, are Go-specific heuristics.)
  • GPT-3 is a very impressive reasoner in a different sense (it successfully recognizes many patterns in human language, including a lot of very subtle or conjunctive ones like "when A and B and C and D and E and F and G and H and I are all true, humans often say X"), but it too isn't doing the "model full physical world-states and trajectories thereof" thing (though an optimal predictor of human text would need to be a general intelligence, and a superhumanly capable one at that).
  • Some examples of abilities I expect humans to only automate once we've built AGI (if ever):
    • The ability to perform open-heart surgery with a high success rate, in a messy non-standardized ordinary surgical environment.
    • The ability to match smart human performance in a specific hard science field, across all the scientific work humans do in that field.
  • In principle, I suspect you could build a narrow system that is good at those tasks while lacking the basic mental machinery required to do par-human reasoning about all the hard sciences. In practice, I very strongly expect humans to find ways to build general reasoners to perform those tasks, before we figure out how to build narrow reasoners that can do them. (For the same basic reason evolution stumbled on general intelligence so early in the history of human tech development.)
    • (Of course, if your brain has all the basic mental machinery required to do other sciences, that doesn't mean that you have the knowledge required to actually do well in those sciences. An artificial general intelligence could lack physics ability for the same reason many smart humans can't solve physics problems.)

When I say "general intelligence is very powerful", a lot of what I mean is that science is very powerful, and that having all the sciences at once is a lot more powerful than the sum of each science's impact.

(E.g., because different sciences can synergize, and because you can invent new scientific fields and subfields, and more generally chain one novel insight into dozens of other new insights that critically depended on the first insight.)

Another large piece of what I mean is that general intelligence is a very high-impact sort of thing to automate because AGI is likely to blow human intelligence out of the water immediately, or very soon after its invention.

80K gives the (non-representative) example of how AlphaGo and its immediate successors compared to the human ability range on Go:

In the span of a year, AI had advanced from being too weak to win a single [Go] match against the worst human professionals, to being impossible for even the best players in the world to defeat.

I expect "general STEM AI" to blow human science ability out of the water in a similar fashion. Reasons for this include:

  • Software (unlike human intelligence) scales with more compute.
  • Current ML uses far more compute to find reasoners than to run reasoners. This is very likely to hold true for AGI as well.
  • We probably have more than enough compute already, and are mostly waiting on new ideas for how to get to AGI efficiently, as opposed to waiting on more hardware to throw at old ideas.
  • Empirically, humans aren't near a cognitive ceiling, and even narrow AI often suddenly blows past the human reasoning ability range on the task it's designed for. It would be weird if scientific reasoning were an exception.
  • Empirically, human brains are full of cognitive biases and inefficiencies. It's doubly weird if scientific reasoning is an exception even though it's visibly a mess with tons of blind spots, inefficiencies, motivated cognitive processes, and historical examples of scientists and mathematicians taking decades to make technically simple advances.
  • Empirically, human brains are extremely bad at some of the most basic cognitive processes underlying STEM. E.g., consider that human brains can barely do basic mental math at all.
  • Human brains underwent no direct optimization for STEM ability in our ancestral environment, beyond things like "can distinguish four objects in my visual field from five objects". In contrast, human engineers can deliberately optimize AGI systems' brains for math, engineering, etc. capabilities.
    • More generally, the sciences (and many other aspects of human life, like written language) are a very recent development. So evolution has had very little time to refine and improve on our reasoning ability in many of the ways that matter.
  • Human engineers have an enormous variety of tools available to build general intelligence that evolution lacked. This is often noted as a reason for optimism that we can align AGI to our goals, even though evolution failed to align humans to its "goal". It's additionally a reason to expect AGI to have greater cognitive ability, if engineers try to achieve great cognitive ability.
  • The hypothesis that AGI will outperform humans has a disjunctive character: there are many different advantages that individually suffice for this, even if AGI doesn't start off with any other advantages. (E.g., speed, math ability, scalability with hardware, skill at optimizing hardware...)
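
As a rough editorial illustration of what a "disjunctive character" means arithmetically, here is a minimal sketch contrasting a claim that needs every condition to hold with a claim that needs only one. The probabilities are made-up placeholders, not estimates from this comment or from any source, and treating the conditions as independent is a simplifying assumption that real advantages (speed, math ability, and so on) probably violate.

```python
from math import prod


def p_all(probs):
    """Conjunctive case: every condition must hold (independence assumed)."""
    return prod(probs)


def p_any(probs):
    """Disjunctive case: at least one condition suffices (independence assumed)."""
    return 1 - prod(1 - p for p in probs)


# Hypothetical placeholder values, chosen only to show the shape of the arithmetic.
illustrative = [0.3, 0.3, 0.3, 0.3]

print(f"all four needed:  {p_all(illustrative):.4f}")  # 0.3**4
print(f"any one suffices: {p_any(illustrative):.4f}")  # 1 - 0.7**4
```

With these placeholder inputs the conjunctive figure is under 1% and the disjunctive figure is roughly 76%, which is why it matters so much whether a hypothesis is framed as a chain of required steps or as a set of individually sufficient routes.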
Mauricio @ 2023-01-27T01:56 (+2)

Nitpick: doesn't the argument you made also assume that there'll be a big discontinuity right before AGI? That seems necessary for the premise about "extremely novel software" (rather than "incrementally novel software") to hold.

RobBensinger @ 2023-01-27T22:07 (+5)

I do think that AGI will be developed by methods that are relatively novel. Like, I'll be quite surprised if all of the core ideas are >6 years old when we first achieve AGI, and I'll be more surprised still if all of the core ideas are >12 years old.

(Though at least some of the surprise does come from the fact that my median AGI timeline is short, and that I don't expect us to build AGI by just throwing more compute and data at GPT-n.)

Separately and with more confidence, I'm expecting discontinuities in the cognitive abilities of AGI. If AGI is par-human at heart surgery and physics, I predict that this will be because of "click" moments where many things suddenly fall into place at once, and new approaches and heuristics (both on the part of humans and on the part of the AI systems we build), not just because of a completely smooth, incremental, and low-impact-at-each-step improvement to the knowledge and thought-habits of GPT-3.

"Superhuman AI isn't just GPT-3 but thinking faster and remembering more things" (for example) matters for things like interpretability, since if we succeed shockingly well at finding ways to reasonably thoroughly understand what GPT-3's brain is doing moment-to-moment, this is less likely to be effective for understanding what the first AGI's brain is doing moment-to-moment insofar as the first AGI is working in very new sorts of ways and doing very new sorts of things.

I'm happy to add more points like these to the stew so they can be talked about. "Your list of reasons for thinking AGI risk is high didn't explicitly mention X" is a process we can continue indefinitely if we want to, since there are always more background assumptions someone can bring up that they disagree with. (E.g., I also didn't explicitly mention "intelligence is a property of matter rather than of souls imparted into particular animal species by God", "AGI isn't thousands of years in the future", "most random goals would produce bad outcomes if optimized by a superintelligence"...)

Which specific assumptions should be included depends on the conversational context. I think it makes more sense to say "ah, I personally disagree with [X], which I want to flag as a potential conversational direction since your comment didn't mention [X] by name", as opposed to speaking as though there's an objectively correct level of granularity.

Like, the original thing I said was:

If we bracket the timelines part and just ask about p(doom), I think https://www.lesswrong.com/posts/Ke2ogqSEhL2KCJCNx/security-mindset-lessons-from-20-years-of-software-security and https://intelligence.org/2017/11/25/security-mindset-ordinary-paranoia/ make it quite easy to reach extremely dire forecasts about AGI. Getting extremely novel software right on the first try is just that hard. (And we have to do this eventually, even if we luck out and get a few years to play with weak AGIs before strong AGIs arrive.)

Which was responding to a claim in the OP that no EA can rationally have a super high belief in AGI risk:

For instance, I think that having 70 or 80%+ probabilities on AI catastrophe within our lifetimes is probably just incorrect, insofar as a probability can be incorrect. 

The challenge the OP was asking me to meet was to point at a missing model piece (or a disagreement, where the other side isn't obviously just being stupid) that can cause a reasonable person to have extreme p(AGI doom), given other background views the OP isn't calling obviously stupid. (E.g., the OP didn't say that it's obviously stupid for anyone to have a confident belief that AGI will be a particular software project built at a particular time and place.)

The OP didn't issue a challenge to list all of the relevant background views (relative to some level of granularity or relative to some person-with-alternative-views, which does need to be specified if there's to be any objective answer), so I didn't try to explicitly write out obvious popularly held beliefs like "AGI is more powerful than PowerPoint". I'm happy to do that if someone wants to shift the conversation there, but hopefully it's obvious why I didn't do that originally.

Mauricio @ 2023-01-31T23:20 (+6)

Fair! Sorry for the slow reply, I missed the comment notification earlier.

I could have been clearer in what I was trying to point at with my comment. I didn't mean to fault you for not meeting an (unmade) challenge to list all your assumptions--I agree that would be unreasonable.

Instead, I meant to suggest an object-level point: that the argument you mentioned seems pretty reliant on a controversial discontinuity assumption--enough that the argument alone (along with other, largely uncontroversial assumptions) doesn't make it "quite easy to reach extremely dire forecasts about AGI." (Though I was thinking more about 90%+ forecasts.)

(That assumption--i.e. the main claims in the 3rd paragraph of your response--seems much more controversial/non-obvious among people in AI safety than the other assumptions you mention, as evidenced by researchers criticizing it and researchers doing prosaic AI safety work.)

MaxRa @ 2023-01-25T14:01 (+4)

(Minor: I really liked your top-level comment but almost didn't read this second comment because I didn't immediately realize you split up your comment due to (I suppose) running out of space. Maybe worth it to add a "[cont.]" or something in such cases in future.)

RobBensinger @ 2023-01-25T17:58 (+3)

Added!

NunoSempere @ 2023-01-25T12:22 (+11)

Ok, so thinking about this, one trouble with answering your comment is that you have a self-consistent worldview which has contrary implications to some of the stuff I hold, but I feel that you are not giving answers with reference to stuff that I already hold, but rather to stuff that further references that worldview.

Let me know if this feels way off.

So I'm going to just pick one object-level argument and dig into that:

As deep learning attains more and more success, I think that some of the old concerns port over. But I am not sure which ones, to what extent, and in which context. This leads me to reduce some of my probability.

On net I think the deep learning revolution increases p(doom), mostly because it's a surprisingly opaque and indirect way of building intelligent systems, that gives you relatively few levers to control internal properties of the reasoner you SGDed your way to.

Well, I think that the question is: increased p(doom) compared to what? E.g., what were your default expectations before the DL revolution?

  • Compared to equivalent progress in a seed AI which has a utility function
    • Deep learning seems like it has some advantages, e.g.: it is [doing the kinds of things that were reinforced during its training in the past], which seems safer than [optimizing across a utility function programmed into its core, where we don't really know how to program utility functions].
      • E.g., GPT-3 seems wildly more safe than a seed AGI that had already reached that level of capabilities
        • "Oh yes, we just had to put in a desire to predict the world, an impulse for curiosity, in addition to the standard self-preservation drive and our experimental caring for humans module, and then just let it explore the internet" sounds fairly terrifying.
  • Compared to no progress in deep learning at all
    • Sure, I agree
  • Compared to something else
    • Depends on the something else.

RobBensinger @ 2023-01-29T03:34 (+7)

I feel that you are not giving answers with reference to stuff that I already hold, but rather to stuff that further references that worldview.

Sounds right to me! I don't know your worldview, so I'm mostly just reporting my thoughts on stuff, not trying to do anything particularly sophisticated.

what were your default expectations before the DL revolution?

I personally started thinking about ML and AGI risk in 2013, and I didn't have much of a view of "how are we likely to get to AGI?" at the time.

My sense is that MIRI-circa-2010 wasn't confident about how humanity would get to AI, but expected it would involve gaining at least some more (object-level, gearsy) insight into how intelligence works. "Just throw more compute at a slightly tweaked version of one of the standard old failed approaches to AGI" wasn't MIRI's top-probability scenario.

From my perspective, humanity got "unlucky" in three different respects:

  • AI techniques started working really well early, giving us less time to build up an understanding of alignment.
  • Techniques started working for reasons other than us acquiring and applying gearsy new insights into how reasoning works, so the advances in AI didn't help us understand how to do alignment.
  • And the specific methods that worked are more opaque than most pre-deep-learning AI, making it hard to see how you'd align the system even in principle.

E.g., GPT-3 seems wildly more safe than a seed AGI that had already reached that level of capabilities

Seems like the wrong comparison; the question is whether AGI built by deep learning (that's at the "capability level" of GPT-3) is safer than seed AGI (that's at the "capability level" of GPT-3).

I don't think GPT-3 is an AGI, or has the same safety profile as baby AGIs built by deep learning. (If there's an efficient humanly-reachable way to achieve AGI via deep learning.) So an apples-to-apples comparison would either think about hypothetical deep-learning AGI vs. hypothetical seed AGI, or it would look at GPT-3 vs. hypothetical narrow AI built on the road to seed AGI.

If we can use GPT-3 or something very similar to GPT-3 to save the world, then it of course matters that GPT-3 is way safer than seed AGI. But then the relevant argument would look something like "maybe the narrow AI tech that you get on the path to deep-learning AGI is more powerful and/or more safe than the narrow AI tech that you get on the path to seed AGI", as opposed to "GPT-3 is safer than a baby god" (the latter being something that's true whether or not the baby god is deep-learning-based).

Lizka @ 2023-01-25T10:48 (+9)

And partly because the field is just pretty old at this stage, and a lot of effort has gone into arguments in both directions by now.

Could you list some of your favorite/the strongest-according-to-you (collections of) arguments against AI risk? 

RobBensinger @ 2023-01-28T21:39 (+13)

Sure! It would depend on what you mean by "an argument against AI risk":

  • If you mean "What's the main argument that makes you more optimistic about AI outcomes?", I made a list of these in 2018.
  • If you mean "What's the likeliest way you think it could turn out that aligning AGI is unnecessary in order to do a pivotal act / initiate an as-long-as-needed reflection?", I'd currently guess it's using strong narrow-AI systems to accelerate you to Drexlerian nanotechnology (which can then be used to build powerful things like "large numbers of fast-running human whole-brain emulations").
  • If you mean "What's the likeliest way you think it could turn out that humanity's current trajectory is basically OK / no huge actions or trajectory changes are required?", I'd say that the likeliest scenario is one where AGI kills all humans, but this isn't a complete catastrophe for the future value of the reachable universe because the AGI turns out to be less like a paperclip maximizer and more like a weird sentient alien that wants to fill the universe with extremely-weird-but-awesome alien civilizations. This sort of scenario is discussed in Superintelligent AI is necessary for an amazing future, but far from sufficient.
  • If you mean "What's the likeliest way you think it could turn out that EAs are focusing too much on AI and should focus on something else instead?", I'd guess it's if we should focus more on biotech. E.g., this conjunction could turn out to be true: (1) AGI is 40+ years away; (2) by default, it will be easy for small groups of crazies to kill all humans with biotech in 20 years; and (3) EAs could come up with important new ways to avoid disaster if we made this a larger focus (though it's already a reasonably large focus in EA).

RobBensinger @ 2023-01-29T06:12 (+5)

Another way it could be bad that EAs are focusing on AI is if EAs are accelerating AGI capabilities / shortening timelines way more than we're helping with alignment (or otherwise increasing the probability of good outcomes).

sphor @ 2023-01-29T16:17 (+3)

Here are a few non-MIRI perspectives if you're interested:

What does it mean to align AI with human values?

The implausibility of intelligence explosion

Against the singularity hypothesis

Book Review: Reframing Superintelligence

RobBensinger @ 2023-02-07T16:52 (+14)

What does it mean to align AI with human values?

This article is... really bad.

It's mostly a summary of Yudkowsky/Bostrom ideas, but with a bunch of the ideas garbled and misunderstood.

Mitchell says that one of the core assumptions of AI risk arguments is "that any goal could be 'inserted' by humans into a superintelligent AI agent". But that's not true, and in fact a lot of the risk comes from the fact that we have no idea how to 'insert' a goal into an AGI system.

The paperclip maximizer hypothetical here is a misunderstanding of the original idea. (Though it's faithful to the version Bostrom gives in Superintelligence.) And the misunderstanding seems to have caused Mitchell to misunderstand a bunch of other things about the alignment problem. Picking one of many examples of just-plain-false claims:

"And importantly, in keeping with Bostrom’s orthogonality thesis, the machine has achieved superintelligence without having any of its own goals or values, instead waiting for goals to be inserted by humans."

The article also says that "research efforts on alignment are underway at universities around the world and at big AI companies such as Google, Meta and OpenAI". I assume Google here means DeepMind, but what alignment research at Meta does Mitchell have in mind??

Also: "Many researchers are actively engaged in alignment-based projects, ranging from attempts at imparting principles of moral philosophy to machines, to training large language models on crowdsourced ethical judgments."

... That sure is a bad picture of what looks difficult about alignment.

The implausibility of intelligence explosion

This essay is quite bad. A response here: A reply to Francois Chollet on intelligence explosion

I disagree with Thorstad and Drexler, but those resources seem much better.

NunoSempere @ 2023-01-25T13:29 (+7)

But my individual impression is that the selection effects argument packs a whole lot of punch behind it.

I don't think it has that much punch, partly just because I don't think EAs and rationalists are as unreasonable as Christian apologists (e.g., we're much more inclined to search for flaws in our own arguments). And partly because the field is just pretty old at this stage, and a lot of effort has gone into arguments in both directions by now.

Idk, I can't help but notice that your title at MIRI is "Research Communications", but there is nobody paid by the "Machine Intelligence Skepticism Institute" to put forth claims that you are wrong.

edit: removed superfluous "that".

RobBensinger @ 2023-01-25T18:27 (+10)

Idk, I can't help but notice that your title at MIRI is "Research Communications", but there is nobody paid by the "Machine Intelligence Skepticism Institute" to put forth claims that you are that wrong.

Since we're talking about p(doom), this sounds like a claim that my job at MIRI is to generate arguments for worrying more about AGI, and we haven't hired anybody whose job it is to generate arguments for worrying less.

Well, I'm happy to be able to cite that thing I wrote with a long list of reasons to worry less about AGI risk!

(Link)

RobBensinger @ 2023-01-25T18:28 (+9)

I'm not claiming the list is exhaustive or anything, or that everyone would agree with what should go on such a list. It's the reasons that update me the most, not the reasons that I'd expect to be most convincing to some third party.

But the very existence of public lists like this seems like something your model was betting against. Especially insofar as it's a novel list with items MIRI came up with itself that reflect MIRI-ish perspectives, not a curated list of points other groups have made.

NunoSempere @ 2023-01-25T19:44 (+4)

Nice

NunoSempere @ 2023-01-28T08:17 (+4)

It also occurs to me that I've also written an argument in favor of worrying about x-risk, here.

NunoSempere @ 2023-01-25T11:54 (+7)

Re: Arguments against conjunctiveness

So here the thing is that I don't find Nate's argument particularly compelling, and after a few times of the following pattern:

  1. Here is a reason to think that AI might not happen/might not cause an existential risk
  2. Here is an argument for why that reason doesn't apply, which could range from wrong to somewhat compelling to very compelling
  3. [Advocate proceeds to take the argument on 2. as sort of permission in their mind to assign maximal probability to AI doom]

I grow tired of it, and it starts to irk me.

RobBensinger @ 2023-01-25T18:36 (+3)

What's an example of "here is an argument for why that reason doesn't apply" that you think is wrong?

And are you claiming that Nate or I are "assigning maximal probability to AI doom", or doing this kind of qualitative black-and-white reasoning? If so, why?

Nate's post, for reference, was: AGI ruin scenarios are likely (and disjunctive)

NunoSempere @ 2023-01-26T12:43 (+3)

Rereading the post, I think that it has a bunch of statements about what Soares believes, but it doesn't have that many mechanisms, pathways, counter-considerations, etc.

E.g.,:

The world’s overall state needs to be such that AI can be deployed to make things good. A non-exhaustive list of things that need to go well for this to happen follows

  • The world needs to admit of an AGI deployment strategy (compatible with realistic alignable-capabilities levels for early systems) that prevents the world from being destroyed if executed.
  • At least one such strategy needs to be known and accepted by a leading organization.
  • Somehow, at least one leading organization needs to have enough time to nail down AGI, nail down alignable AGI, actually build+align their system, and deploy their system to help.
    • This very likely means that there needs to either be only one organization capable of building AGI for several years, or all the AGI-capable organizations need to be very cautious and friendly and deliberately avoid exerting too much pressure upon each other.
  • It needs to be the case that no local or global governing powers flail around (either prior to AGI, or during AGI development) in ways that prevent a (private or public) group from saving the world with AGI.

This is probably a good statement of what Soares thinks needs to happen, but it is not a case for that, so I am left to evaluate the statements and the claim that they are conjunctive with reference to their intuitive plausibility.

I think I might be a bit dense here.

E.g.,:

It needs to be the case that no local or global governing powers flail around (either prior to AGI, or during AGI development) in ways that prevent a (private or public) group from saving the world with AGI.

Idk, he later mentions the US government's COVID response, but I think the relevant branch of the government for dealing with AGI threats would probably be the Department of Defense, which seems much more competent, and seems capable of plays like blocking exports of semiconductor manufacturing equipment to China.

Greg_Colbourn @ 2023-01-25T10:34 (+5)

Deep learning also increases the probability of things like "roughly human-level AGIs run around for a few years before we see anything strongly superhuman", but this doesn't affect my p(doom) much because I don't see a specific path for leveraging this to prevent the world from being destroyed when we do reach superintelligence

But it does leave a path open to prevent doom: not reaching superintelligence! i.e. a global moratorium on AGI.

(The rest of the comment is great btw :))

RobBensinger @ 2023-01-25T10:47 (+4)

What causes the moratorium to be adopted, and how is it indefinitely enforced in all countries in the world?

(Also, if this can be achieved with roughly human-level AGIs running around, why can't it be achieved without such AGIs running around?)

Greg_Colbourn @ 2023-01-25T12:47 (+7)

Social/moral consensus? There is precedent with e.g. recombinant DNA or human genetic engineering (if only the AI Asilomar conference was similarly focused on a moratorium!) It might be hard to indefinitely enforce globally, but we might at least be able to kick the can down the road a couple of decades (as seems to have happened with the problematic bio research).

(It should be achieved without such AGIs running around, if we want to minimise x-risk. Indeed, we should have started on this already! I'm starting to wonder whether it might actually be the best option we have, given the difficulty, or perhaps impossibility(?) of alignment.)
 

Greg_Colbourn @ 2023-01-25T13:51 (+9)

Don't get me wrong, I'd love to live in a glorious transhuman future (like e.g. Iain M. Banks's Culture), but I just don't think it's worth the risk of doom, as things stand. Maybe after a few decades of moratorium, when we know a lot more, we can reassess (and hopefully we will still be able to have life extension so will personally still be around).

It now seems unfortunate that the AI x-risk prevention community was seeded from the transhumanist/techno-utopian community (e.g. Yudkowsky and Bostrom). This historical contingency is probably a large part of the reason why a global moratorium on AGI has never been seriously proposed/attempted.

RobBensinger @ 2023-01-25T18:17 (+7)

This historical contingency is probably a large part of the reason why a global moratorium on AGI has never been seriously proposed/attempted.

Seems very surprising if true — the Yudkowskians are the main group that worries we're screwed without a global moratorium, and the main group that would update positively if there were a way to delay AGI by a few decades. (Though they aren't the main group that thinks it's tractable to coordinate such a big delay.)

From my perspective Bostrom and Yudkowsky were the ones saying from the get-go that rushing to AGI is bad. E.g., in Superintelligence:

One effect of accelerating progress in hardware, therefore, is to hasten the arrival of machine intelligence. As discussed earlier, this is probably a bad thing from the impersonal perspective, since it reduces the amount of time available for solving the control problem and for humanity to reach a more mature stage of civilization.

(Though he flags that this is a "tentative conclusion" that "could be overturned, for example if the threats from other existential risks or from post-transition coordination failures turn out to be extremely large". If we were thinking about going from "AGI in 100 years" to "AGI in 300 years", I might agree; if we're instead going from "AGI in 15 years" to "AGI in 40 years", then the conclusion seems way less tentative to me, given how unsolved the alignment problem is!)

The transhumanists were the ones who centered a lot of the early discussion around differential technological development, a.k.a. deliberately trying to slow down scary tech (e.g. AGI) so it comes after anti-scary tech (e.g. alignment), or attempting to accelerate alignment work to the same effect.

The idea that Bostrom or Yudkowsky ever thought "the alignment problem is a major issue, but let's accelerate to AGI as quickly as possible for the sake of reaching the Glorious Transhumanist Future sooner" seems like revisionism to me, and I'm skeptical that the people putting less early emphasis on differential technological development back in 2014, in real life, would have somehow performed better in this counterfactual.

Greg_Colbourn @ 2023-01-25T20:13 (+6)

The idea that Bostrom or Yudkowsky ever thought "the alignment problem is a major issue, but let's accelerate to AGI as quickly as possible for the sake of reaching the Glorious Transhumanist Future sooner" seems like revisionism to me

I'm not saying this is (was) the case. It's more subtle than that. It's the kind of background worldview that makes people post this (or talk of "pivotal acts") rather than this

The message of differential technological development clearly hasn't had the needed effect. There has been no meaningful heed paid to it by the top AI companies. What we need now is much stronger statements. i.e. ones that use the word "moratorium". Why isn't MIRI making such statements? It doesn't make sense to go to 0 hope of survival without even seriously attempting a moratorium (or at the very least, publicly advocating for one).

RobBensinger @ 2023-01-25T23:33 (+11)

I think the blunt MIRI-statement you're wanting is here:

Capabilities work is currently a bad idea

Nate’s top-level view is that ideally, Earth should take a break on doing work that might move us closer to AGI, until we understand alignment better.

That move isn’t available to us, but individual researchers and organizations who choose not to burn the timeline are helping the world, even if other researchers and orgs don't reciprocate. You can unilaterally lengthen timelines, and give humanity more chances of success, by choosing not to personally shorten them.

Nate thinks capabilities work is currently a bad idea for a few reasons:

  • He doesn’t buy that current capabilities work is a likely path to ultimately solving alignment.
  • Insofar as current capabilities work does seem helpful for alignment, it strikes him as helping with parallelizable research goals, whereas our bottleneck is serial research goals. (See A note about differential technological development.)
  • Nate doesn’t buy that we need more capabilities progress before we can start finding a better path.

[...]

On Nate’s view, the field should do experiments with ML systems, not just abstract theory. But if he were magically in charge of the world's collective ML efforts, he would put a pause on further capabilities work until we've had more time to orient to the problem, consider the option space, and think our way to some sort of plan-that-will-actually-probably-work. It’s not as though we’re hurting for ML systems to study today, and our understanding already lags far behind today’s systems' capabilities.

[...]

Nate thinks that DeepMind, OpenAI, Anthropic, FAIR, Google Brain, etc. should hit the pause button on capabilities work (or failing that, at least halt publishing). (And he thinks any one actor can unilaterally do good in the process, even if others aren't reciprocating.)

Tangentially, I'll note that you might not want MIRI to say "that move isn't available to us", if you think that it's realistic to get the entire world to take a break on AGI work, and if you think that saying pessimistic things about this might make it harder to coordinate. (Because, e.g., this might require a bunch of actors to all put a lot of sustained work into building some special institution or law, that isn't useful if you only half-succeed; and Alice might not put in this special work if she thinks Bob is unconditionally unwilling to coordinate, or if she's confident that Carol is confident that Dan won't coordinate.)

But this seems like a very unlikely possibility to me, so I currently see more value in just saying MIRI's actual take; marginal timeline-lengthening actions can be useful even if we can't actually put the whole world on pause for 20 years.

Greg_Colbourn @ 2023-01-26T07:56 (+8)

This is good, but I don't think it goes far enough. And I agree with your comments re "might not want MIRI to say "that move isn't available to us"". It might not be realistic to get the entire world to take a break on AGI work, but it's certainly conceivable, and I think maybe at this point more realistic than expecting alignment to be solved in time (or at all?). It seems reasonable to direct marginal resources toward pushing for a moratorium on AGI rather than more alignment work (although I still think this should at least be tried too!)

Your and Nate's statement still implicitly assumes that AGI capabilities orgs are "on our side". The evidence is that they are clearly not. Demis is voicing caution at the same time that Google leadership have started a race with OpenAI (Microsoft). It's out of Demis' (and his seemingly toothless ethics board's) hands. Less acceptance of what has been tantamount to "existential safety washing", and more realpolitik, is needed. Better now might be to directly appeal to the public and policymakers. Or find a way to strategise with those with power. For example, should the UN Security Council be approached somehow? This isn't "defection".

Greg_Colbourn @ 2023-01-26T20:33 (+8)

I'm saying all this because I'm not afraid of treading on any toes. I don't depend on EA money (or anyone's money) for my livelihood or career[1]. I'm financially independent. In fact, my life is pretty good, all apart from facing impending doom from this! I mean, I don't need to work to survive[2], I've got an amazing partner and a supportive family. All that is missing is existential security! I'd be happy to have "completed it mate" (i.e. I've basically done this with the normal life of house, car, spouse, family, financial security etc); but I haven't - remaining is this small issue of surviving for a normal lifespan, having my children survive and thrive / ensuring the continuation of the sentient universe as we know it...

  1. ^

    Although I still care about my reputation in EA to be fair (can't really avoid this as a human)

  2. ^

    All my EA work is voluntary

RobBensinger @ 2023-01-28T21:18 (+4)

I think maybe at this point more realistic than expecting alignment to be solved in time (or at all?).

I think it's a lot more realistic to solve alignment than to delay AGI by 50 years. I'd guess that delaying AGI by 10 years is maybe easier than alignment, but it also doesn't solve anything unless we can use those 10 years to figure out alignment as well. For that matter, delaying by 50 years also requires that we solve alignment in that timeframe, unless we're trying to buy time to do some third other thing.

The difficulty of alignment is also a lot more uncertain than the difficulty of delaying AGI: it depends more on technical questions that are completely unknown from our current perspective. Delaying AGI by decades is definitely very hard, whereas the difficulty of alignment is mostly a question mark.

All of that suggests to me that alignment is far more important as a way to spend marginal resources today, but we should try to do both if there are sane ways to pursue both options today.

If you want MIRI to update from "both seem good, but alignment is the top priority" to your view, you should probably be arguing (or gathering evidence) against one or more of these claims:

  • AGI alignment is a solvable problem.
  • Absent aligned AGI, there isn't a known clearly-viable way for humanity to achieve a sufficiently-long reflection (including centuries of delaying AGI, if that turned out to be needed, without permanently damaging or crippling humanity).
    • (There are alternatives to aligned AGI that strike me as promising enough to be worth pursuing. E.g., maybe humans can build Drexlerian nanofactories without help from AGI, and can leverage this for a pivotal act. But these all currently seem to me like even bigger longshots than the alignment problem, so I'm not currently eager to direct resources away from (relatively well-aimed, non-capabilities-synergistic) alignment research for this purpose.)
  • Humanity has never succeeded in any political task remotely as difficult as the political challenge of creating an enforced and effective 50+ year global moratorium on AGI. (Taking into account that we have no litmus test for what counts as an "AGI" and we don't know what range of algorithms or what amounts of compute you'd need to exclude in order to be sure you've blocked AGI. So a regulation that blocks AGI for fifty years would probably need to block a ton of other things.)
  • EAs have not demonstrated the ability to succeed in political tasks that are way harder than any political task any past humans have succeeded on.

Greg_Colbourn @ 2023-01-29T10:44 (+8)

Even a 10 year delay is worth a huge amount (in expectation). We may well have a very different view of alignment by then (including perhaps being pretty solid on its impossibility? Or perhaps a detailed plan for implementing it? (Or even the seemingly very unlikely "..there's nothing to worry about")), which would allow us to iterate on a better strategy (we shouldn't assume that our outlook will be the same after 10 years!)

but we should try to do both if there are sane ways to pursue both options today.

Yes! (And I think there are sane ways).

If you want MIRI to update from "both seem good, but alignment is the top priority" to your view, you should probably be arguing (or gathering evidence) against one or more of these claims:

  • AGI alignment is a solvable problem.

There are people working on this (e.g. Yampolskiy, Landry & Ellen), and this is definitely something I want to spend more time on (note that the writings so far could definitely do with a more accessible distillation).

Absent aligned AGI, there isn't a known clearly-viable way for humanity to achieve a sufficiently-long reflection

I really don't think we need to worry about this now. AGI x-risk is an emergency - we need to deal with that emergency first (e.g. kick the can down the road 10 years with a moratorium on AGI research); then when we can relax a little, we can have the luxury to think about long term flourishing.

Humanity has never succeeded in any political task remotely as difficult as the political challenge of creating an enforced and effective 50+ year global moratorium on AGI.

I think this can definitely be argued against (and I will try and write more as/when I make a more fleshed out post calling for a global AGI moratorium). For a start, without all the work on nuclear proliferation and risk, we may well not be here today. Yes there has been proliferation, but there hasn't been an all-out nuclear exchange yet! It's now 77 years since a nuclear weapon was used in anger. That's a pretty big result I think! Also, global taboos around bio topics such as human genetic engineering are well established. If such a taboo is established, enforcement becomes a lesser concern, as you are then only fighting against isolated rogue elements rather than established megacorporations. Katja Grace discusses such taboos in her post on slowing down AI.

  • EAs have not demonstrated the ability to succeed in political tasks that are way harder than any political task any past humans have succeeded on.

Fair point. I think we should be thinking much wider than EA here. This needs to become mainstream, and fast.

Also, I should say that I don't think MIRI should necessarily be diverting resources to work on a moratorium. Alignment is your comparative advantage so you should probably stick to that. What I'm saying is that you should be publicly and loudly calling for a moratorium. That would be very easy for you to do (a quick blog post/press release). But it could have a huge effect in terms of shifting the Overton Window on this. As I've said, it doesn't make sense for this not to be part of any "Death with Dignity" strategy. The sensible thing when faced with ~0% survival odds is to say "FOR FUCK'S SAKE CAN WE AT LEAST TRY AND PULL THE PLUG ON HUMANS DOING AGI RESEARCH!?!", or even "STOP BUILDING AGI YOU FUCKS!" [Sorry for the language, but I think it's appropriate given the gravity of the situation, as assumed by talk of 100% chance of death etc.]

RobBensinger @ 2023-01-31T06:58 (+6)

Even a 10 year delay is worth a huge amount (in expectation). We may well have a very different view of alignment by then (including perhaps being pretty solid on its impossibility?

Agreed on all counts! Though as someone who's been working in this area for 10 years, I have a newfound appreciation for how little intellectual progress can easily end up happening in a 10-year period...

(Or even the seemingly very unlikely "..there's nothing to worry about")

I have a lot of hopes that seem possible enough to me to be worth thinking about, but this specific hope isn't one of them. Alignment may turn out to be easier than expected, but I think we can mostly rule out "AGI is just friendly by default".

But it could have a huge effect in terms of shifting the Overton Window on this.

In which direction?

:P 

I'm joking, though I do take seriously that there are proposals that might be better signal-boosted by groups other than MIRI. But if you come up with a fuller proposal you want lots of sane people to signal-boost, do send it to MIRI so we can decide if we like it; and if we like it as a sufficiently-realistic way to lengthen timelines, I predict that we'll be happy to signal-boost it and say as much.

As I've said, it doesn't make sense for this not to be part of any "Death with Dignity" strategy. The sensible thing when faced with ~0% survival odds is to say "FOR FUCK'S SAKE CAN WE AT LEAST TRY AND PULL THE PLUG ON HUMANS DOING AGI RESEARCH!?!", or even "STOP BUILDING AGI YOU FUCKS!" [Sorry for the language, but I think it's appropriate given the gravity of the situation, as assumed by talk of 100% chance of death etc.]

I strongly agree and think it's right that people... like, put some human feeling into their words, if they agree about how fucked up this situation is? (At least if they find it natural to do so.)

NunoSempere @ 2023-01-25T12:27 (+4)

I think there's some social and psychological fuckery going on, but less than occurs in the opposite direction in the non-EA, non-rationalist, etc. superculture.

Yes, but you could think that the fuckery in the EA/rat community is concentrated on the topic of AI, and that the EA/rat communities can develop defenses against normal social fuckery but not vice versa.

NunoSempere @ 2023-01-25T11:45 (+3)

Hey, thanks for your lengthy comment. For future reference I would have found it more convenient if you had an individual comment for each consideration :)

RobBensinger @ 2023-01-26T17:58 (+4)

I could do that, though some of my points are related, which might make it confusing when karma causes the comments to get rearranged out of order.

bruce @ 2023-01-23T21:58 (+26)

Despite these flaws, I think that this text was personally important for me to write up, and it might also have some utility to readers.

Just a brief comment to say that I definitely appreciated you writing this post up, as well as linking to Phil's blog post! I share many of these uncertainties, but have often just assumed I'm missing some important object-level knowledge, so it's nice to see this spelled out more explicitly by someone with more exposure to the relevant communities. Hope you get some engagement!

Un Wobbly Panda @ 2023-01-23T22:24 (+14)

I'm missing some important object-level knowledge, so it's nice to see this spelled out more explicitly by someone with more exposure to the relevant community. Hope you get some engagement!

I know several people from outside EA (Ivy League, Ex-FANG, work in ML, startup, Bay Area) and they share the same "skepticism" (in quotes because it's not the right word, their view is more negative). 

I suspect one aspect of the problem is the sort of "Timnit Gebru"-style yelling and also sneering, often from the leftist community that is opposed to EA more broadly (but much of this leftist sneering was nucleated by the Bay Area communities).

This gives proponents of AI-safety an easy target, funneling online discourse into a cul de sac of tribalism. I suspect this dynamic is deliberately cultivated on both sides, a system ultimately supported by a lot of crypto/tech wealth. This leads to where we are today, where someone like Bruce (not to mention many young people) gets confused.

bruce @ 2023-01-24T00:21 (+67)

I don't follow Timnit closely, but I'm fairly unconvinced by much of what I think you're referring to RE: "Timnit Gebru-style yelling / sneering", and I don't want to give the impression that my uncertainties are strongly influenced by this, or by AI-safety community pushback to those kinds of sneering. I'd be hesitant to agree that I share these views that you are attributing to me, since I don't really know what you are referring to RE: folks who share "the same skepticism" (but some more negative version).

When I talk about uncertainty, some of these are really just the things that Nuno is pointing out in this post. Concrete examples of what some of these uncertainties look like in practice for me personally include:

  • I don't have a good inside view on timelines, but when EY says our probability of survival is ~0% this seems like an extraordinary claim that doesn't seem to be very well supported or argued for, and something I intuitively want to reject outright, but don't have the object level expertise to meaningfully do so. I don't know the extent to which EY's views are representative or highly influential in current AI safety efforts, and I can imagine a world where there's too much deferring going on. It seems like some within the community have similar thoughts.
  • When Metaculus suggests a median prediction of 2041 for AGI and a lower quartile prediction of 2030, I don't feel like I have a good way of working out how much I should defer to this.
  • I was surprised to find that there seems to be a pretty strong correlation between working on AI safety and having been involved in the EA/LW community, and there are many people outside of the EA/LW space who have a much lower P(doom) than those inside these spaces, even those who prima facie have strong incentives to have an accurate gauge of what P(doom) actually is.
    • To use a highly imperfect analogy, if being an EA doctor was highly correlated with a belief that [disease X] is made up, and if a significant majority of mainstream medicine believes [disease X] is a real phenomenon, this would make me more uncertain about whether EAs are right, as I have to weigh the likelihood that a significant majority of mainstream medicine is wrong against the possibility that EA doctors somehow have a way of interpreting noise / information in a much less biased way than non-EA doctors.
    • Of course, it's plausible that EAs are on to something legit, and everyone else is underweighting the risks. But all I'm saying is that this is an added uncertainty.
    • It doesn't help that, unlike COVID (another place EAs say they were ahead of the curve in), it's much easier to retreat into some unfalsifiable position when it comes to P(doom) and AI safety, and also (very reasonably) not helpful to point to base rates. 
  • While I wouldn't suggest that 80,000 hours has "gone off the guardrails" in the way Nuno suggests CFAR did, on particularly uncharitable days I do get the sense that 80,000 hours feels more like a recruitment platform for AI and longtermist careers for a narrow readership that fit a particular philosophical view, which was not what it felt like back in ~2015 when I first read it (at that time, perhaps a more open, worldview-diversified career guide for a wider audience). Perhaps this reflects a deliberate shift in strategy, but I do get the sense that if this is the case, additional transparency about this shift would be helpful, since the target audience are often fairly impressionable, idealistic high school students or undergrads looking for a highly impactful career.
  • Another reason I'm slightly worried about this, and related to the earlier point about deferral, is that Ben Todd says: "For instance, I agree it's really important for EA to attract people who are very open minded and curious, to keep EA alive as a question. And one way to do that is to broadcast ideas that aren't widely accepted." [emphasis added]
    • But my view is that this approach does not meaningfully differentiate between "attracting open-minded and curious people" VS "attracting highly deferring + easily influenced people", and 80,000 hours may inadvertently contribute to the latter if not careful.
    • Relatedly, there have been some recent discussions around deference around things like AI timelines, as well as in the EA community generally.


------------

I don't want this comment thread to get into a discussion about Timnit, because I think that detracts from engagement with Nuno's post, but a quick comment:[1]

Regardless of one's views on the quality of Timnit-style engagement, it's useful to consider the extent to which the AI safety community may need to coordinate with people who might be influenced by that category of "skepticism". This could either be at the object-level, or in terms of considering why some classes of "skepticism" seem to gain so much traction despite AI safety proponents' disagreements with them. It might point to better ways for the AI safety community to communicate its ideas, or help maximise buy-in from key stakeholders and focus on shared goals. (Thinking about some of these things has been directly influential in how some folks view ongoing UN engagement processes, for example).

  1. ^

    Which should not be read as an endorsement of Timnit-style engagement.

Un Wobbly Panda @ 2023-01-24T01:20 (+13)

Thank you for this useful content and explaining your beliefs.

I don't follow Timnit closely, but I'm fairly unconvinced by much of what I think you're referring to RE: "Timnit Gebru-style yelling / sneering", and I don't want to give the impression that my uncertainties are strongly influenced by this, or by AI-safety community pushback to those kinds of sneering.

My comment is claiming a dynamic that is upstream of, and produces, the information environment you are in. This produces your "skepticism" or "uncertainty". To expand on this: without this dynamic, the facts and truth would be clearer, and you would not be uncertain or feel the need to update your beliefs in response to a forum post.

My comment is not implying you are influenced by "Gebru-style" content directly. It is sort of implying the opposite/orthogonal. The fact you felt it necessary to distance yourself from Gebru several times in your comment, essentially because a comment mentioned her name, makes this very point itself.

I don't really know what you are referring to RE: folks who share "the same skepticism" (but some more negative version), but I'd be hesitant to agree that I share these views that you are attributing to me, since I don't know what their views are.

Yes, I affirm that "skepticism" or "uncertainty" are my words. (I think the nature of this "skepticism" is secondary to the main point in my comment and the fact you brought this up is symptomatic of the point I made). 

At the same time, I think the rest of your comment suggests my beliefs/representation of you was fair (e.g. sometimes uncharitably seeing 80KH as a recruitment platform for "AI/LT" would be consistent with skepticism).

 

In some sense, my comment is not a direct reply to you (you are even mentioned in third person). I'm OK with this, or even find the resulting response desirable, and it may have been hard to achieve in any other way.

NunoSempere @ 2023-01-24T15:46 (+5)

Here is a post by Scott Alexander which I think might be pointing to a phenomenon similar to what you are hinting at. Money quote:

Consider the war on terror. They say that every time the United States bombs Pakistan or Afghanistan or somewhere, all we’re doing is radicalizing the young people there and making more terrorists. Those terrorists then go on to kill Americans, which makes Americans get very angry and call for more bombing of Pakistan and Afghanistan.

Taken as a meme, it’s a single parasite with two hosts and two forms. In an Afghan host, it appears in a form called ‘jihad’, and hijacks its host into killing himself in order to spread it to its second, American host. In the American host it morphs in a form called ‘the war on terror’, and it hijacks the Americans into giving their own lives (and tax dollars) to spread it back to its Afghan host in the form of bombs.

From the human point of view, jihad and the War on Terror are opposing forces. From the memetic point of view, they’re as complementary as caterpillars and butterflies. Instead of judging, we just note that somehow we accidentally created a replicator, and replicators are going to replicate until something makes them stop.

timunderwood @ 2023-01-24T09:03 (+7)

Object level point:

"I don't have a good inside view on timelines, but when EY says our probability of survival is ~0% this seems like an extraordinary claim that doesn't seem to be very well supported or argued for, and something I intuitively want to reject outright, but don't have the object level expertise to meaningfully do so. I don't know the extent to which EY's views are representative or highly influential in current AI safety efforts, and I can imagine a world where there's too much deferring going on. It seems like some within the community have similar thoughts."

EY's views of doom being basically certain are fairly marginal. They definitely are part of the conversation, and he certainly is not the only person who holds them. But most people who are actively working on AI safety see the odds of survival as much higher than roughly 0% -- and I think most people see the P(doom) as actually much lower than 80%.

The key motivating argument for AI safety being important, even if you think that EY's model of the world might be false (though it also might be true), is that while it is easy to come up with plausible reasons to think that P(doom) is much less than 1, it is very hard to dismiss enough of the arguments for it to get P(doom) close to zero.

Greg_Colbourn @ 2023-01-25T14:24 (+11)

Yes, I think it's good that there is basically consensus here on AGI doom being a serious problem; the argument seems to be one of degree. Even OP says p(AGI doom by 2070) ~ 10%.

David Johnston @ 2023-01-24T22:38 (+8)

I think Gary Marcus seems to play the role of an “anti-AI-doom” figurehead much more than Timnit Gebru. I don’t even know what his views on doom are, but he has established himself as a prominent critic of “AI is improving fast” views and seemingly gets lots of engagement from the safety community.

I also think Marcus’ criticisms aren’t very compelling, and so the discourse they generate isn’t terribly valuable. I think similarly of Gebru’s criticism (I think it’s worse than Marcus’, actually), but I just don’t think it has as much impact on the safety community.

NunoSempere @ 2023-01-23T22:40 (+6)

Brutal comment, love it

NunoSempere @ 2023-01-23T22:40 (+2)

Thanks!

Pato @ 2023-01-24T08:19 (+20)

Personally I have trouble understanding this post. Could you write simpler?

Aaron_Scher @ 2023-01-24T19:22 (+50)

Here are my notes which might not be easier to understand, but they are shorter and capture the key ideas:

  • Uneasiness about chains of reasoning with imperfect concepts
    • Uneasy about conjunctiveness: It’s not clear how conjunctive AI doom is (AI doom being conjunctive would mean that Thing A and Thing B and Thing C all have to happen or be true in order for AI doom; this is opposed to being disjunctive where either A, or B, or C would be sufficient for AI Doom), and Nate Soares’s response to Carlsmith’s powerseeking AI report is not a silver bullet; there is social pressure in some places to just accept that Carlsmith’s report uses a biased methodology and to move on. But obviously there’s some element of conjunctiveness that has to be dealt with.
    • Don’t trust the concepts: a lot of the early AI Risk discussions came before Deep Learning. Some of the concepts should port over to near-term-likely AI systems, but not all of them (e.g., Alien values, Maximalist desire for world domination)
      • Uneasiness about in-the-limit reasoning: Many arguments go something like this: an arbitrarily intelligent AI will adopt instrumental power seeking tendencies and this will be very bad for humanity; progress is pushing toward that point, so that’s a big deal. Often this line of reasoning assumes we hit in-the-limit cases around or very soon after we hit greater than human intelligence; this may not be the case.
      • AGI, so what?: Thinking AGI will be transformative doesn’t mean maximally transformative. e.g., the Industrial Revolution was transformative but not maximally so, because people adapted to it
    • I don’t trust chains of reasoning with imperfect concepts: When your concepts are not very clearly defined/understood, it is quite difficult to accurately use them in complex chains of reasoning.
  • Uneasiness about selection effects at the level of arguments
    • “there is a small but intelligent community of people who have spent significant time producing some convincing arguments about AGI, but no community which has spent the same amount of effort looking for arguments against”
    • The people who don’t believe the initial arguments don’t engage with the community or with further arguments. If you look at the reference class “people who have engaged with this argument for more than 1 hour” and see that they all worry about AI risk, you might conclude that the argument is compelling. However, you are ignoring the major selection effects in who engages with the argument for an hour. Many other ideological groups have a similar dynamic: the class “people who have read the new testament” is full of people who believe in the Christian god, which might lead you to believe that the balance of evidence is in their favor — but of course, that class of people is highly selected for those who already believe in god or are receptive to such a belief.
    • “the strongest case for scepticism is unlikely to be promulgated. If you could pin folks bouncing off down to explain their scepticism, their arguments probably won't be that strong/have good rebuttals from the AI risk crowd. But if you could force them to spend years working on their arguments, maybe their case would be much more competitive with proponent SOTA”
    • Ideally we want to sum all the evidence for and all the evidence against and compare. What happens instead is skeptics come with 20 evidence and we shoot them down with 50 evidence for AI risk. In reality there could be 100 evidence against and only 50 evidence for, and we would not know this if we didn’t have really-well-informed skeptics or we weren’t summing their arguments over time.
    • “It is interesting that when people move to the Bay area, this is often very “helpful” for them in terms of updating towards higher AI risk. I think that this is a sign that a bunch of social fuckery is going on.”
      • “More specifically, I think that “if I isolate people from their normal context, they are more likely to agree with my idiosyncratic beliefs” is a mechanism that works for many types of beliefs, not just true ones. And more generally, I think that “AI doom is near” and associated beliefs are a memeplex, and I am inclined to discount their specifics.”
  • Miscellanea
    • Difference between in-argument reasoning and all-things-considered reasoning: Often the gung-ho people don’t make this distinction.
    • Methodological uncertainty: forecasting is hard
    • Uncertainty about unknown unknowns: Most of the unknown unknowns seem likely to delay AGI, things like Covid and nuclear war
    • Updating on virtue: You can update based on how morally or epistemically virtuous somebody is. Historically, some of those pushing AI Risk were doing so not for the goal of truth seeking but for the goal of convincing people
    • Industry vs AI safety community: Those in industry seem to be influenced somewhat by AI Safety, so it is hard to isolate what they think
  • Conclusion
    • Main classes of things pointed out: Distrust of reasoning chains using fuzzy concepts, Distrust of selection effects at the level of arguments, Distrust of community dynamics
    • Now in a position where it may be hard to update based on other people’s object-level arguments
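
To make the conjunctive vs. disjunctive distinction in the first bullet concrete, here is a minimal Python sketch. The step probabilities are made up for illustration (they are not estimates from the post or from Carlsmith's report), and the steps are assumed to be independent, which real arguments about AI doom need not satisfy.

```python
from math import prod

def conjunctive(probs):
    """P(all steps hold), assuming the steps are independent."""
    return prod(probs)

def disjunctive(probs):
    """P(at least one step holds), assuming the steps are independent."""
    return 1 - prod(1 - p for p in probs)

# Hypothetical inputs: three steps, each judged 50% likely.
steps = [0.5, 0.5, 0.5]
print(conjunctive(steps))  # 0.125
print(disjunctive(steps))  # 0.875
```

The same three inputs give very different headline numbers depending on which structure is assumed, which is roughly why it matters how conjunctive AI doom actually is.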

NunoSempere @ 2023-01-24T23:27 (+6)

Nice, thanks, great summary.

Pato @ 2023-01-29T19:23 (+2)

Thanks! I think I understood everything now and in a really quick read.

NunoSempere @ 2023-01-24T09:04 (+21)

  • Some people said some things
  • I think that the things those people said might not be as likely as they say they are, because:
    • I think the things those people say require many conditions to happen
    • I think the things those people say are worded using fuzzy concepts, and so we can't trust that the concepts are correct
    • I think that in my social circle, there are many people saying those things and few people saying the opposite things, which gives the impression that the things that the first group say are probably true. In particular, the many will explore more arguments and so sound more convincing.
    • Some of the people who said the things are bad, and I used to listen to the bad people a lot.

You might also enjoy this tweet summary.

The above is condescending/a bit too simple, but I thought it was funny, hope you get some use out of it.

Pato @ 2023-01-29T19:12 (+4)

Thank you a lot! I wasn't expecting a summary, I wrote it so maybe you could have it as a consideration for future posts, so I guess I should have written less simple.

Arepo @ 2023-01-26T23:13 (+14)

I strongly agree on the dubious epistemics. A couple of corroborating experiences: 

Also on incentives, I think there's a strong status-based and a reasonable financial incentive for academics to find controversial results that merit a 'more research needed' line at the end of every paper. So if we lived in a world where the impact of some new technology is likely to be relatively minor, I would still expect a subset of academics to make a name for themselves highlighting all the things that could go wrong (cf for example the debates on the ethics of autonomous vehicles, which seem to be a complete nonissue in practice, but seem to have supported a cottage industry of concerned philosophers).

On industry vs the AI safety community, I think we're seriously underupdating on the fact that so many people in industry aren't concerned by this. Google engineers (for example) are an incredibly intelligent bunch who probably understand AI software substantially better than most AI safety researchers (since there are strong feedback mechanisms on who is employed at Google, whereas success in AI safety seems to be far more subjective). So from that perspective, AI safety is one of the most oversubscribed fields of existential risk.

Lastly, I think going from 90% doom to 10% doom could be more profound than you think. I'm working on a bit of software atm to implement the model I suggested here. I don't think it's going to let us conclude anything very confidently, but that's sort of the point - various plausible inputs seem like they could give virtually any outcome from 'non-extinction catastrophes are almost as bad as extinction' to 'non-extinction catastrophes strongly reduce extinction risk'.

Benjamin Hilton @ 2023-01-27T19:19 (+14)

Hi! Wanted to follow up as the author of the 80k software engineering career review, as I don't think this gives an accurate impression. A few things to say:

  • I try to have unusually high standards for explaining why I believe the things I write, so I really appreciate people pushing on issues like this.
     
  • At the time, when you responded to <the Anthropic person>, you said "I think <the Anthropic person> is probably right" (although you added "I don't think it's a good idea to take this sort of claim on trust for important career prioritisation research"). 
     
  • When I leave claims like this unsourced, it’s usually because I (and my editors) think they’re fairly weak claims, and/or they lack a clear source to reference. That is, the claim is effectively a piece of research based on general knowledge (e.g. I wouldn't source the claim "Biden is the President of the USA") and/or interviews with a range of experts, and the claim is weak or unimportant enough not to investigate further. (FWIW I think it’s likely I should have prioritised writing a longer footnote on why I believe this claim.)

    The closest data is the three surveys of NeurIPS researchers, but these are imperfect. They ask how long it will take until there is "human-level machine intelligence". The median expert asked thought there was around a 1 in 4 chance of this by 2036. Of course, it's not clear that HLMI and transformative AI are the same thing, or that HLMI being developed soon necessarily means that HLMI will be made by scaling and adapting existing ML methods. In addition, no survey data pre-dates 2016, so it's hard to say that these views have changed based solely on survey data. (I've written more about these surveys and their limitations here, with lots of detail in footnotes; and I discuss the timelines parts of those surveys in the second paragraph here.)

    As a result, when I made this claim I was relying on three things. First, that there are likely correlations that make the survey data relevant (i.e., that many people answering the survey think that HLMI will be relatively similar to or cause transformative AI, and that many people answering the survey think that if HLMI is developed soon that suggests it will be ML-based). Second, that people did not think that ML could produce HLMI in the past (e.g. because other approaches like symbolic AI were still being worked on, because texts like Superintelligence do not focus on ML and this was not widely remarked upon at the time despite that book’s popularity, etc.). Third, that people in the AI and ML fields who I spoke to had a reasonable idea of what other experts used to think and how that has changed (note I spoke to many more people than the one person who responded to you in the comments on my piece!).

    It's true that there may be selection bias on this third point. I'm definitely concerned about selection bias for shorter timelines in general in the community, and plan to publish something about this at some point. But in general I think that the best way, as an outsider, to understand what prevailing opinions are in a field, is to talk to people in that field – rather than relying on your own ability to figure out trends across many papers, many of which are difficult to evaluate, many of which may not replicate. I also think that asking about what others in the field think, rather than what the people you're talking to think, is a decent (if imperfect) way of dealing with that bias.

    Overall, I thought the claim I made was weak enough (e.g. "many experts" not "most experts" or "all experts") that I didn't feel the need to evaluate this further.
     
  • It's likely, given you’ve raised this, that I should have put this all in a footnote. The only reason I didn't is that I try to prioritise, and I thought this claim was weak enough to not need much substantiation. I may go back and change that now (depending on how I prioritise this against other work).

Arepo @ 2023-01-27T23:36 (+6)

Thanks Benjamin, I upvoted. Some things to clarify on my end:

  • I think the article as a whole was good, or I would have said so!
  • I did and do think a) that Anthropic Person (AP) was probably right, b) that their attitude was nevertheless irresponsible and epistemically poor and c) that I made it clear at the time that, despite them being probably right, I thought this needed more justification (the last exchange I have a record of was me reopening the comment thread that you'd resolved to state that I really did think this was important and you re-closing it without further comment)
  • My concern with poor epistemics was less with reference to you - I presume you were working under time constraints in an area you didn't have specialist knowledge on - than to AP, who had no such excuse.
  • I would have had no factual problem with the claim 'many experts believe'. The phrasing that I challenged was 'many experts now believe' (emphasis mine), on the grounds that it implies positive change over time - that the proportion of experts who believe this is increasing. That doesn't seem anything like as self-evident as a comment about the POTUS.
  • Fwiw I think the rate of change (and possibly even second derivative) of expert beliefs on such a speculative and rapidly evolving subject is much more important than the absolute number or even proportion of experts with the relevant belief, especially since it's very hard to define who even qualifies as an expert in such a field (per my comment, most of the staff at Google - and other big AI companies - could arguably qualify)
  • If you'd mentioned that other experts you'd spoken to had a sense that sentiment was changing (and mentioned those conversations as a pseudocitation) I would have been substantially less concerned by the point - though I do think it's important enough to merit proper research (though such research would have probably been beyond the scope of your 80k piece), and not to imply stronger conclusions than the evidence we have merits.
  • I am definitely worried about selection bias - the whole concept of 'AI safety researcher' screams it (not that I think you only spoke to such people, but any survey which includes them and excludes 'AI-safety-concern-rejecting researcher/developer' seems highly likely to get prejudicial results)

But in general I think that the best way, as an outsider, to understand what prevailing opinions are in a field, is to talk to people in that field – rather than relying on your own ability to figure out trends across many papers, many of which are difficult to evaluate, many of which may not replicate. I also think that asking about what others in the field think, rather than what the people you're talking to think, is a decent (if imperfect) way of dealing with that bias.

I don't understand what counterclaim you're making here. I strongly agree that the opinions of experts are very important, hence my entire concern about this exchange!

Benjamin Hilton @ 2023-01-30T15:15 (+2)

Thanks for this! Looks like we actually roughly agree overall :)

aogara @ 2023-01-27T23:31 (+4)

I think it's noteworthy that surveys from 2016, 2019, and 2022 have all found roughly similar timelines to AGI (50% by ~2060) for the population of published ML researchers. On the other hand, the EA and AI safety communities seem much more focused on short timelines than they were seven years ago (though I don't have a source on that). 

Benjamin Hilton @ 2023-01-30T15:32 (+7)

There are important reasons to think that the change by the EA community is within the measurement error of these surveys, which makes this less noteworthy.

(Say you put +/- 10 years and +/- 10% on all these answers. There are loads of reasons why you wouldn't actually assess the uncertainty like this (e.g. probabilities can't go below 0 or above 1), but it helps to get a feel for the uncertainty. Then you get something like:

  • 10%-30% chance of TAI by 2026-2046
  • 40%-60% by 2050-2070
  • and 75%-95% by 2100

Then many many EA timelines and shifts in EA timelines fall within those errors.)
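
As a rough sketch of that back-of-the-envelope check, the Python below builds the same bands and tests whether a given forecast falls inside any of them. The central points are simply the midpoints of the ranges quoted above, not figures taken from the surveys; applying +/- 10 years to 2100 is an extra assumption, since no year range is given for it; and the example forecasts are hypothetical.

```python
# Midpoints of the ranges quoted above, as (year, percent) pairs.
# Assumption: read off the quoted ranges, not taken from the surveys.
# Assumption: the +/- 10 years is also applied to 2100.
CENTRAL = [(2036, 20), (2060, 50), (2100, 85)]

def band(year, pct, d_years=10, d_pct=10):
    """Return ((year_lo, year_hi), (pct_lo, pct_hi)), clipping percentages to [0, 100]."""
    return (year - d_years, year + d_years), (max(0, pct - d_pct), min(100, pct + d_pct))

def within_error(year, pct, central=CENTRAL):
    """True if a forecast of `pct`% by `year` falls inside any of the bands."""
    for c_year, c_pct in central:
        (y_lo, y_hi), (p_lo, p_hi) = band(c_year, c_pct)
        if y_lo <= year <= y_hi and p_lo <= pct <= p_hi:
            return True
    return False

# Hypothetical forecasts, for illustration only.
print(within_error(2040, 25))  # True: inside the 10%-30% by 2026-2046 band
print(within_error(2030, 50))  # False: 50% by 2030 is outside every band
```

This treats the error bars as simple rectangles, which is crude for the reasons already noted, but it is enough to see which forecasts are distinguishable from the survey medians under these assumptions.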

Reasons why these surveys have huge error

1. Low response rates.

The response rates were really quite low.

2. Low response rates + selection biases + not knowing the direction of those biases

The surveys plausibly had a bunch of selection biases in various directions.

This means you need a larger sample to converge on the population means, so the surveys probably aren't representative. But we're much less certain in which direction they're biased.

Quoting me:

For example, you might think researchers who go to the top AI conferences are more likely to be optimistic about AI, because they have been selected to think that AI research is doing good. Alternatively, you might think that researchers who are already concerned about AI are more likely to respond to a survey asking about these concerns

3. Other problems, like inconsistent answers in the survey itself

AI Impacts wrote some interesting caveats here, including:

  • Asking people about specific jobs massively changes HLMI forecasts. When we asked some people when AI would be able to do several specific human occupations, and then all human occupations (presumably a subset of all tasks), they gave very much later timelines than when we just asked about HLMI straight out. For people asked to give probabilities for certain years, the difference was a factor of a thousand twenty years out! (10% vs. 0.01%) For people asked to give years for certain probabilities, the normal way of asking put 50% chance 40 years out, while the ‘occupations framing’ put it 90 years out. (These are all based on straightforward medians, not the complicated stuff in the paper.)
  • People consistently give later forecasts if you ask them for the probability in N years instead of the year that the probability is M. We saw this in the straightforward HLMI question, and most of the tasks and occupations, and also in most of these things when we tested them on mturk people earlier. For HLMI for instance, if you ask when there will be a 50% chance of HLMI you get a median answer of 40 years, yet if you ask what the probability of HLMI is in 40 years, you get a median answer of 30%.

The 80k podcast on the 2016 survey goes into this too.

NunoSempere @ 2023-01-26T23:58 (+11)

Cheers! You might want to follow up with 80,000 hours on the epistemics point, e.g., alexrjl would probably be interested in hearing about and then potentially addressing your complaints.

Greg_Colbourn @ 2023-01-25T10:25 (+11)

This seems like a pretty general argument for being sceptical of anything, including EA!

NunoSempere @ 2023-01-25T11:40 (+9)

I agree. In particular, it's a pretty general argument to be more skeptical of something than its most gung-ho advocates. And it's stronger the more you think that these dynamics are going on. But, for example, physics subject to experimental verification pretty much voids this sort of objection.

Greg_Colbourn @ 2023-01-25T10:44 (+2)

Also, the ending rubbed me up the wrong way a bit(!):

"Lastly, I would loathe it if the same selection effects applied to this document: If I spent a few days putting this document together, it seems easy for the AI safety community to easily put a few cumulative weeks into arguing against this document, just by virtue of being a community."

Is this basically saying that you aren't interested in engaging in the object level arguments? (Or the meta level arguments either, for that matter?) As you say:

Readers might want to keep in mind that parts of this post may look like a bravery debate.

And:

It’s not clear to me whether I have bound myself into a situation in which I can’t update from other people’s object-level arguments.

NunoSempere @ 2023-01-25T11:28 (+4)

Well, no, that was me saying that I thought it was a live possibility that this comment section would be swarmed by people with the opposite opinion in a way which was unpleasant for me. This didn't end up happening!

D0TheMath @ 2023-01-24T00:52 (+9)

I like this, and think it's healthy. I recommend talking to Quintin Pope for a smart person who has thought a lot about alignment, and has come to the informed, inside-view conclusion that we have a 5% chance of doom (or just reading his posts or comments). He has updated me downwards on doom a lot.

Hopefully it gets you in a position where you're able to update more on evidence that I think is evidence, by getting you into a state where you have a better picture of what the best arguments against doom would be.

MichaelDickens @ 2023-01-24T06:26 (+25)

Is 5% low? 5% still strikes me as a "preventing this outcome should plausibly be civilization's #1 priority" level of risk.

Lukas_Gloor @ 2023-01-25T13:01 (+4)

Yes to (paraphrased) "5% should plausibly still be civilization's top priority."

However, in another sense, 5% is indeed low!

I think that's a significant implicit source of disagreement over AI doom likelihoods – what sort of priors people start with.

The following will be a bit simplistic (in reality proponents of each side will probably state their position in more sophisticated ways).
 
On one side, optimists may use a prior of "It's rare that humans build important new technology and it doesn't function the way it's intended."

On the other side, pessimists can say that it has almost never happened that people who developed a revolutionary new technology displayed a lot of foresight about its long-term consequences when they started using it. For instance, there were comparatively few efforts at major social media companies to address ways in which social media might change society for the worse. Or, same reasoning for the food industry and the obesity epidemic or online dating and its effects on single parenthood rates. 

I'm not saying revolutions in these sectors were overall negative for human happiness – just that there are what seem to be costly negative side-effects where no one competent has ever been "in charge" of proactively addressing them (nor do we have good plans to address them anytime soon). So, it's not easily apparent how we'll suddenly get rid of all these issues and fix the underlying dynamics, apart from "AI will give us god-like power to fix everything." The pessimists can argue that humans have never seemed particularly "in control" over technological progress. There's this accelerating force that improves things on some metrics but makes other things worse elsewhere. (Pinker-style arguments for the world getting better seem one-sided to me – he mostly looks at trends that were already relevant 100s of years ago, but doesn't talk about "newer problems" that only arose as Molochian side-effects of technological progress.)

AI will be the culmination of all that (of the accelerating forces that have positive effects on immediately legible metrics, but negative effects on some other variables due to Molochian dynamics). Unless we use it to attain a degree of control that we never had, it won't go well.

To conclude, there's a sense in which believing "AI doom risk is only 5%" is like believing that there's a 95% chance that AI will solve all the world's major problems. Expressed in that way, it seems like a pretty strong claim.

(The above holds especially for definitions of "AI doom" where humanity would lose most of its long-term "potential." That said, even if by "AI doom" one means something like "people all die," it stands to argue that one likely endpoint/attractor state from not being able to fix all the world's major problems will be people's extinction, eventually.)

I've been meaning to write a longer post on these topics at some point, but may not get to it anytime soon.

D0TheMath @ 2023-01-25T17:48 (+3)

Eh, I don’t think this is a priors game. Quintin has lots of information, I have lots of information, so if we were both acting optimally according to differing priors, our opinions likely would have converged.

In general I’m skeptical of arguments of disagreement which reduce things to differing priors. It’s just not physically or predictively correct, and it feels nice because now you no longer have an epistemological duty to go and see why relevant people have differing opinions.

Lukas_Gloor @ 2023-01-25T19:49 (+3)

That would be a valid reply if I had said it's all about priors. All I said was that I think priors make up a significant implicit source of the disagreement – as suggested by some people thinking 5% risk of doom seems "high" and me thinking/reacting with "you wouldn't be saying that if you had anything close to my priors."

Or maybe what I mean is stronger than "priors." "Differences in underlying worldviews" seems like the better description. Specifically, the worldview I identify more with, which I think many EAs don't share, is something like "The Yudkowskian worldview where the world is insane, most institutions are incompetent, Inadequate Equilibria is a big deal, etc." And that probably affects things like whether we anchor way below 50% or above 50% on what the risks should be that the culmination of accelerating technological progress will go well or not.

In general I’m skeptical of arguments of disagreement which reduce things to differing priors. It’s just not physically or predictively correct, and it feels nice because now you no longer have an epistemological duty to go and see why relevant people have differing opinions.

That's misdescribing the scope of my point and drawing inappropriate inferences. The last time I made an object-level argument about AI misalignment risk was just 3h before your comment. (Not sure it's particularly intelligible, but the point is, I'm trying! :) )
So, evidently, I agree that a lot of the discussion should be held at a deeper level than the one of priors/general worldviews.

Quintin has lots of information, I have lots of information, so if we were both acting optimally according to differing priors, our opinions likely would have converged.

I'm a fan of Shard theory and some of the considerations behind it have already updated me towards a lower chance of doom than I had before starting to incorporate it more into my thinking. (Which I'm still in the process of doing.)

D0TheMath @ 2023-01-24T06:46 (+4)

Yeah, he’s working on it, but it’s not his no. 1 priority. He developed shard theory.

ben.smith @ 2023-01-24T16:40 (+3)

Agreed, but 5% is much lower than "certain or close to certain", which is the starting point Nuno Sempere said he was sceptical of.

I don't know that anyone thinks doom is "certain or close to certain", though the April 1 post could be read that way. 5% is also much lower than, say, 50%, which seems to be a somewhat more common belief.

NunoSempere @ 2023-01-24T01:02 (+3)

Thanks!

Arepo @ 2023-01-26T00:07 (+2)

What outcome does he specifically predict 5% probability of?

Ozzie Gooen @ 2023-01-26T23:29 (+8)

I earlier gave some feedback on this, but more recently spent more time with it. I sent these comments to Nuno, and thought they could also be interesting to people here.

Greg_Colbourn @ 2023-01-25T13:37 (+6)

The conjunctive/disjunctive dichotomy seems to be a major crux when it comes to AI x-risk. How much do belief in human progress, belief in a just world, the Long Peace, or even deep-rooted-by-evolution (genetic) collective optimism (all things in the "memetic water supply") play into the belief that the default is not-doom? Even as an atheist, I think it's sometimes difficult (not least because it's depressing to think about) to fully appreciate that we are "beyond the reach of God".

NunoSempere @ 2023-01-24T01:37 (+6)

Note: After talking with an instructor, I added a "brief aside" section pointing out that ESPR separated itself from CFAR in 2019 and has been trying to mitigate the factors I complain about since then. This is important for ESPR in terms of not having its public reputation destroyed, but doesn't really affect the central points in the post.

slg @ 2023-01-26T04:22 (+5)

Appreciated this post! Have you considered crossposting this to LessWrong? Seems like an important audience for this.

NunoSempere @ 2023-01-26T12:33 (+3)

Hey, I considered it and decided not to, but you are welcome to cross post it (or the original blog post <https://nunosempere.com/blog/2023/01/23/my-highly-personal-skepticism-braindump-on-existential-risk/>)

slg @ 2023-01-26T13:32 (+4)

Alright; I'll do so later today!

Arturo Macias @ 2023-01-24T09:03 (+5)

I completely agree with this position, but my take is different: nuclear war risk is high all the time, and all geopolitical and climate risks can increase it. It is perhaps not existential for the species, but it certainly is for civilization. Given this, for me it is the top risk, and to some extent, all efforts for progress, political stabilization, and climate risk mitigation are modestly important in themselves, and massively important in how they affect nuclear war risk.

Now, the problem with AI risk is that our understanding of why and how AI works is limited. If my understanding is correct, we have constructed AlphaZero mainly by growing it, not by designing it. We really don't understand "how it works". The "black box risk" is huge, and until we have a better theoretical understanding of AI, all efforts will be mainly useless. The "information bottleneck principle" was one attempt at this, but interest in it faded. I think other generalizing principles have not been proposed, but I am a user, not a developer, so I could be wrong.

Stephen Clare @ 2023-01-24T14:10 (+4)

Would you mind writing a bit more about the connection between climate change and nuclear risk?

Arturo Macias @ 2023-01-24T14:23 (+6)

It increases large migrations, political instability, drought... and those create geopolitical instability and raise the probability of conventional war, revolution, etc. Those events can easily trigger a nuclear war as long as a nuclear power is involved. Do the math: around 1/3 of people live in nuclear-armed nations.

Nuclear war turns historical risk into something that can have consequences on geological timescales. I don't believe there is really any existential risk (on the timescale of decades) other than nuclear war. There is only one "precipice" but many ways to fall there.

Roman Leventov @ 2023-01-24T17:40 (+3)

  • Alien values
  • Maximalist desire for world domination
  • Convergence to a utility function
  • Very competent strategizing, of the “treacherous turn” variety
  • Self-improvement

Alien values are guaranteed unless we explicitly impart non-alien ethics to AI, which we currently don't know how to do, and we don't know (or can't agree) what that ethics should be like. The next two points are synonymous with each other and are also basically synonymous with "alien values". The treacherous turn is indeed unlikely (link).

Self-improvement is a given; the only question is where the "ceiling" of this improvement is. It might not be that "far", by some measure, from human intelligence, or that difference may still not allow AI to plan that far ahead due to the intrinsic unpredictability of the world. So the world may start to move extremely fast (see below), but the horizon of planning and predictability of that movement may not be longer than it is now (or it could be even shorter).

For a given operationalization of AGI, e.g., good enough to be forecasted on, I think that there is some possibility that we will reach such a level of capabilities, and yet that this will not be very impressive or world-changing, even if it would have looked like magic to previous generations. More specifically, it seems plausible that AI will continue to improve without soon reaching high shock levels which exceed humanity’s ability to adapt.

I think you implicitly underestimate the cost of coordination among humans. Huge corporations are powerful but also very slow to act. AI corporations will be very powerful and also very fast and potentially very coherent in their strategy. This will be a massive change.

Sharmake @ 2023-01-23T21:05 (+2)

The point is this distorts the apparent balance of reason - maybe this is like Marxism, or NGDP targeting, or Georgism, or general semantics, perhaps many of which we will recognise were off on the wrong track.

I do note one is not like the others here. Marxism is probably way more wrong than any of the other beliefs, and I feel like the inclusion of the others rather weakens the case here.

NunoSempere @ 2023-01-23T21:33 (+2)

Note that you can also think of many other topics people feel strongly about and what the balance of reason looks like, e.g., feminist theory, monarchism & divine right of kings, anarcho-capitalist theory, Freudian psychology, Maxwell's theory of electromagnetism, intelligent design, etc.

Arturo Macias @ 2023-01-24T11:59 (+5)

"Maxwell's theory of electromagnetism"

Is this a test of attention? Or is there something I'm missing?

NunoSempere @ 2023-01-24T12:30 (+3)

Well, I don't want to make the implication that everything that people believe in strongly is wrong. For example, in the case of Maxwell's theory of electromagnetism, community dynamics basically don't matter in the face of overwhelming experimental verification. 

NunoSempere @ 2023-01-24T12:33 (+8)

Freudian psychology is another case which I think is interesting, in that it was pre-paradigmatic, and had both insights that have been incorporated into the mainstream (e.g., to analyze subconscious motivations) as well as some batshit insane stuff.

Arturo Macias @ 2023-01-24T14:00 (+3)

All the rest of the list is between "utterly wrong" and "quite controversial", except for electromagnetism, which is "almost completely right"!

NunoSempere @ 2023-01-24T14:32 (+2)

I don't have a point I'm arriving at here, but here is a paragraph from a philosophy of science book I'm reading that might be interesting:

& in general the history of the theory of electromagnetism is fairly interesting.

Arturo Macias @ 2023-01-24T15:11 (+3)

Now I see: my mathematician bias in action. For me, Maxwellian electromagnetism means Maxwell's (or Heaviside's) equations. Of course, I have no idea about "physical interpretations" (either then or now). All the physics I know comes from "Physics for mathematicians" books.

Greg_Colbourn @ 2023-01-25T10:25 (+6)

This is an important point. The difficulty with AGI x-risk is that experimental verification isn't really possible (short of catastrophic-but-not-existential warning shots, which the most doomy people think are unlikely). Can anyone steelman this with any strongly held beliefs that are justified without resort to overwhelming empirical verification? Maybe certain moral beliefs like the Golden Rule? But what about risks? Is there a precedent for a non-empirically-verified, commonly accepted belief in a risk?

NunoSempere @ 2023-01-25T11:43 (+1)

Belief in "preventing nuclear war from producing widespread annihilation is important" seems reasonably widespread, and it is supported by empirical evidence that nuclear bombs are possible, even though the claim that nuclear war would be such that it would produce widespread annihilation hasn't been verified. But of course you can see that such widespread annihilation would be possible, by bombing the most populous cities in order.

Greg_Colbourn @ 2023-01-25T12:24 (+3)

Yeah, I thought about nuclear risk, but Hiroshima and Nagasaki seem like good enough evidence for the possibility of widespread annihilation (or even Trinity for that matter). This would only be a good example if there was widespread appreciation for GCR potential from nuclear risk before any nuclear detonations. I don't think there was? (Especially considering that there were only a few short years (1933-1945) from theory to practice with the nuclear chain reaction.)

Sabs @ 2023-01-25T07:34 (+4)

Shakespeare authorship is another classic one. Oxfordians will usually know far more about Shakespeare than you or I do!