Against most, but not all, AI risk analogies
By Matthew_Barnett @ 2024-01-14T19:13 (+43)
Davidmanheim @ 2024-01-15T07:24 (+12)
"The analogies establish almost nothing of importance about the behavior and workings of real AIs"
You seem to be saying that there is some alternative that establishes something about "real AIs," but then you admit these real AIs don't exist yet, and you're discussing "expectations of the future" by proxy. I'd like to push back, and say that I think you're not really proposing an alternative, or that to the extent you are, you're not actually defending that alternative clearly.
I agree that arguing by analogy about current LLM behavior is less useful than having a working theory of interpretability and LLM cognition - though we don't have any such theory, as far as I can tell - but I have an even harder time understanding why what you're proposing is a superior way of discussing a future situation that isn't amenable to that type of theoretical analysis, where we are trying to figure out which intuitions we do and do not share, and which models are or are not appropriate for describing the future technology. I'm not seeing a gears-level model proposed, and I'm not seeing concrete predictions.
Yes, arguing by analogy can certainly be slippery and confusing, and I think it would benefit from grounding in concrete predictions. And the use of any specific base rate is deeply contentious, since reference classes are always debatable. But at least it's clear what the argument is, since it's an analogy. In contrast, arguing by direct appeal to your intuitions, where you claim your views are a "straightforward extrapolation of current trends," is done without reference to your reasoning process. And that reasoning process, because it lacks an explicit gears-level model, rests on informal human reasoning, which, as Lakoff argues, is itself deeply rooted in metaphor. That seems worse - it's reasoning by analogy with extra steps.
For example, what does "straightforward" convey, when you say "straightforward extrapolation"? Well, the intuition the word builds on is that extrapolating in a straight line, as opposed to exponentially or discontinuously, is better or simpler. Is that mode of prediction easier to justify than reasoning via analogies to other types of minds? I don't know, but it's not obvious, and dismissing one as analogy while seeing the other as "straightforward" seems confused.
Chris Leong @ 2024-01-14T21:53 (+10)
Even if there are risks to using analogies for persuasion, we need analogies in order to persuade people. While a lot of people here are strong abstract thinkers, that kind of thinking is really rare. Most people need something more concrete to latch onto. Uniform disarmament is a losing strategy, and it isn't justified here, as I don't think the analogies are as weak as you think. If you tell me what you consider to be the two weakest analogies above, I'm sure I'd be able to steelman at least one of them.
If we want to improve epistemics, a better strategy would probably be to always try to pair analogies (at least for longer texts, within reason). So identify an analogy that describes how you think about AI, identify an alternative plausible analogy for how one could think about it, and then explain why your analogy is better, or where you believe AI lies between the two.
Many proponents of AI risk seem happy to critique analogies when they don't support the desired conclusion, such as the anthropomorphic analogy.
Of course! Has there ever been a single person in the entire world who has embraced all analogies, rather than just the analogies they find useful and relevant?
Maybe you're claiming that AI risk proponents reject analogies in general when someone uses an analogy that supports the opposite conclusion, but accept the validity of analogies when they support their own conclusion. If this were the case, it would be bad, but I don't actually think this is what is happening. My guess would be that you've seen situations where someone has used an analogy to critique AI safety, and the AI safety person said something along the lines of "Analogies are often misleading," and you took this as a rejection of analogies in general rather than as a reminder to check whether the analogy actually applies.
Matthew_Barnett @ 2024-01-14T22:06 (+5)
Maybe you're claiming that AI risk proponents reject analogies in general when someone uses an analogy that supports the opposite conclusion, but accept the validity of analogies when they support their own conclusion. If this were the case, it would be bad, but I don't actually think this is what is happening.
Then perhaps you can reply to the examples I used in the post when arguing that analogies are often used selectively? I named two examples: (1) a preference for an analogy to chimps rather than to golden retrievers when arguing about AI alignment, and (2) a preference for an analogy to human evolution rather than an analogy to within-lifetime learning when arguing about inner misalignment.
I do think that a major element of my thesis is that many analogies appear to be chosen selectively. While I advocate that we should not merely switch analogies, I think that if we are going to use analogies-as-arguments anyway, then we should try to find ones that are the most plausible and natural. And I don't currently see much reason to prefer the chimp and evolution analogies over their alternatives in that case.
Chris Leong @ 2024-01-20T05:22 (+8)
I actually thought that the discussion of the chimp analogy was handled pretty well in the podcast. Ajeya brought up that example and then Rob explicitly brought up an alternate mental model of it being a tool (like Google Maps). Discussing multiple possible mental models is exactly what you want to be doing to guard against biases. I agree that it would have been nice to discuss an analogy more like a golden retriever or a kid as well, but there are always additional issues that could be discussed.
I agree Ajeya didn't really provide her reasons for seeing the chimp analogy as useful there, but I think it's valuable as a way of highlighting the AI equivalent of the nature vs. nurture debate. Many people talk about AIs using the analogy of children and assume that we can produce moral AIs just by treating them well/copying good human parenting strategies. I think the chimp analogy is useful as a way of highlighting that appearances can be deceiving.
Matthew_Barnett @ 2024-01-20T20:42 (+2)
I actually thought that the discussion of the chimp analogy was handled pretty well in the podcast. Ajeya brought up that example and then Rob explicitly brought up an alternate mental model of it being a tool (like Google Maps)
The tool analogy appeared to have been brought up as a way of strawmanning/weakmanning people who disagree with them. I think the analogy to Google Maps is not actually representative of how most intelligent AI optimists reason about AI as of 2023 (even if Holden Karnofsky used it in 2012, before the deep learning revolution). The full quote was,
Rob Wiblin: Right. I guess the idea there is that you might think that the chimp is learning that people are to be trusted and it’s all good, but it’s a different mind that thinks differently and draws different conclusions, and it might have particular tendencies that are not obvious to you, particular impulses that are not relatable to you.
The shrinking number of people who are not troubled by any of this at all, I assume that most of them have a different analogy in mind, which is like a can opener or a toaster. OK, that’s a little bit silly. To be more sympathetic, the analogy that they have in their mind is that this is a tool that we’ve made, that we’ve designed.
Ajeya Cotra: Like Google Maps.
Rob Wiblin: Like Google Maps. “We designed it to do the thing that we want. Why do you think it’s going to spin out of control? Tools that we’ve made have never spun out of control and started acting in these bizarre ways before.” If the analogy you have in mind is something like Google Maps, or your phone, or even like a recommendation algorithm, it makes sense that it’s going to seem very counterintuitive in that case to think that it’s going to be dangerous. It’ll be way less intuitive in that case than in the case where you’re thinking about raising a gorilla.
Ajeya Cotra: Yeah. I think the real disanalogy between Google Maps and all of this stuff and AI systems is that we are not producing these AI systems in the same way that we produced Google Maps: by some human sitting down, thinking about what it should look like, and then writing code that determines what it should look like.
Many people talk about AIs using the analogy of children and assume that we can produce moral AIs just by treating them well/copying good human parenting strategies. I think the chimp analogy is useful as a way of highlighting that appearances can be deceiving.
As I said in the post, I think the chimp analogy can be good for conveying the logical possibility of misalignment. Indeed, appearances can be deceiving. But I don't see any particularly strong reasons to think appearances actually are deceiving here. What evidence is there that AIs won't actually just be aligned by default given good "parenting strategies," i.e., reasonably good training regimes? (And again, I'm not saying AIs will necessarily be aligned by default. I just think this question is uncertain, and I don't think the chimp analogy is actually useful as a mental model of the situation here.)
Chris Leong @ 2024-01-20T22:16 (+2)
There are lots of people who think about AI as a tool.
Matthew_Barnett @ 2024-01-21T04:39 (+3)
A lot of people think about AI in all sorts of inaccurate ways, including those who argue for AI pessimism. "AI is like Google Maps" is not at all how most intelligent AI optimists such as Nora Belrose, Quintin Pope, Robin Hanson, and so on, think about AI in 2024. It's a weakman, in a pretty basic sense.
RobertM @ 2024-01-15T07:10 (+1)
I think that neither of those is a selective use of analogies. They do point to similarities between things we have access to and future ASI that you might not think are valid similarities, but that is one thing that makes analogies useful - they can locate disagreements in people's models very quickly, since they're structurally meant to transmit information in a highly compressed fashion.
skluug @ 2024-02-07T14:57 (+1)
Interesting post! I think analogies are good for public communication but not for understanding things at a deep level. They're a good way to quickly map something you haven't thought about at all onto something you're already familiar with. I think effective mass communication is quite important, and we shouldn't let the perfect be the enemy of the good.
I wouldn't consider my Terminator comparison an analogy in the sense of the other items on this list. Most of the other items have the character of "why might AI go rogue?" and then they describe something other than AI that is hard to understand or goes rogue in some sense and assert that AI is like that. But Terminator is just literally about an AI going rogue. It's not so much an analogy as a literal portrayal of the concern. My point wasn't so much that you should proactively tell people that AI risk is like Terminator, but that people are just going to notice this on their own (because it's incredibly obvious), and contradicting them makes no sense.