[Linkpost] Sam Harris on "The Fall of Sam Bankman-Fried"
By michel @ 2022-11-16T05:35 (+104)
This is a linkpost to https://www.samharris.org/podcasts/making-sense-episodes/303-the-fall-of-sam-bankman-fried
Sam Harris just shared a 20-minute podcast episode with his thoughts on the FTX crash, Sam Bankman-Fried (SBF), and effective altruism. I think he had a good take, and I'm glad he shared it.
Highlights (written retrospectively; hope I'm not misrepresenting):
- Sam Harris discusses how he didn't expect what looks like serious wrongdoing – and likely fraud – from SBF, whom he had glorified on his podcast in a December 2021 episode on Earning to Give. Even listening back to that episode again, Sam Harris didn't detect any deviousness in SBF.
- He acknowledges there's a lot we don't know yet about SBF's character and how long this has been going on.
- Sam mentions that he's steered clear of the EA community because he has always found it "cult-like".
- But he defends EA principles and says that this whole situation means "exactly zero" for his commitment to those ideas.
- He compares criticizing EA ideas in light of FTX to criticizing the scientific method in light of elite scientists faking results.
- He extends his defense to utilitarianism and consequentialism: if you're critiquing consequentialist ethics on the basis that it led to bad consequences, you're "seriously confused."
RobBensinger @ 2022-11-16T06:20 (+34)
if you're critiquing consequentialist ethics on the basis that it led to bad consequences, you're "seriously confused."
... Though it's totally valid for a consequentialist to say "consequentialism at the level of actions has worse consequences than consequentialism at the level of policies/dispositions, so I'll do the latter kind of consequentialism".
(And, similarly, it's valid for a consequentialist to observe that causal decision theory has worse consequences than LDT/FDT/UDT, and ditch causal decision theory on that basis. There are better and worse ways of doing consequentialism.)
Arepo @ 2022-11-16T10:46 (+1)
'Valid' yes, but we should also be cautious of assuming valid = good idea. Plenty of people do 'naive' consequentialism extremely well - that's arguably what the EA movement was founded on - so we shouldn't assume it or CDT are necessarily inadequate merely because people have invented more complex alternatives.
RobBensinger @ 2022-11-16T12:45 (+4)
'Valid' yes, but we should also be cautious of assuming valid = good idea.
I mean, the arguments I stated are valid and the premises are true, so the conclusions are indeed good ideas. Like, we should in-real-life do consequentialism at the policy level, not at the level of individual actions; this is not just a hypothetical.
Plenty of people do 'naive' consequentialism extremely well - that's arguably what the EA movement was founded on
Arguable indeed! I claim that this is false. (E.g., https://www.lesswrong.com/s/waF2Pomid7YHjfEDt/p/K9ZaZXDnL3SEmYZqB and Timeless Decision Theory were in the water 2-3 years before the term "effective altruism" was coined.)
so we shouldn't assume it or CDT are necessarily inadequate merely because people have invented more complex alternatives.
CDT is inadequate because it gets you less utility, not because it's simpler. And I don't know why you think FDT/UDT/LDT is more complex than CDT in the first place; I'd say CDT is more complex.
Arepo @ 2022-11-16T15:03 (+5)
Like, we should in-real-life do consequentialism at the policy level, not at the level of individual actions; this is not just a hypothetical.
What does 'not just a hypothetical' mean? You've asserted something about ethics on which I and many philosophers disagree. I therefore claim that you shouldn't treat its 'validity' as showing it's a good idea. You seem to just want to assert it is anyway.
Arguable indeed! I claim that this is false. (E.g., https://www.lesswrong.com/s/waF2Pomid7YHjfEDt/p/K9ZaZXDnL3SEmYZqB and Timeless Decision Theory were in the water 2-3 years before the term "effective altruism" was coined.) ... CDT is inadequate because it gets you less utility
Firstly, the EA movement started several years before the term was coined - I believe Toby started taking GWWC pledges around 2006, GiveWell formally launched in 2007, and both were presumably doing research into the subject for quite a while beforehand. The rationalist community also wasn't involved from the start, and since many rationalists reject effective altruism I don't think you can claim its proponents were instrumental in its founding.
Secondly, I have seen neither formal nor empirical proof that CDT 'gets you less utility' than any of the rationalist community's alternatives. The arguments for the latter are all ultimately philosophical, and many smart people seem unconvinced by them.
RobBensinger @ 2022-11-16T19:27 (+4)
I therefore claim that you shouldn't treat its 'validity' as showing it's a good idea. You seem to just want to assert it is anyway.
I wrote a comment reminding EAs that you should do consequentialism at the level of policies, not at the level of individual actions. I happened to use the word "valid" in that comment. You replied "'Valid' yes, but we should also be cautious of assuming valid = good idea.", so I clarified that indeed, I'm claiming that this is an actually good idea, not a version of "valid" that's non-action-relevant (e.g., because the premises are false).
I don't require all philosophers to agree with me before I'm willing to say something, and I'm happy to hear counter-arguments if you think I'm wrong.
Firstly, the EA movement started several years before the term was coined - I believe Toby started taking GWWC pledges around 2006, Givewell formally launched in 2007, and both were presumably doing research into the subject for quite a while beforehand. The rationalist community also wasn't involved from the start, and since many rationalists reject effective altruism I don't think you can claim its proponents were instrumental in its founding.
Seems pretty arbitrary to me to decide that GiveWell and GWWC were forerunners for EA, but LW, FHI, etc. weren't. A lot of the people at GiveWell meetups in the early days were rationalists, the Oxford community nucleated in part around rationalists like Bostrom, EAs like Holden Karnofsky were reading the Sequences and attending Singularity Summits at an early date, etc.
In practice, early EA formed from a confluence of GiveWell, Oxford philosophers (including rationalists like Bostrom), the Extropians mailing list and LessWrong, the ethos and cause areas of Peter Singer, and various smaller strains. You can decide to use labels in a way that carves out some of those as "real EA" and some as "not real EA", but this doesn't change the sociological reality of which people were talking to which other people, who was donating to and working on standard EA cause areas early on, who was endorsing EA principles early on, etc.
Secondly, I have seen neither formal nor empirical proof that CDT 'gets you less utility' than any of the rationalist community's alternatives.
CDT gets you less utility in, for example, the twin prisoner's dilemma and Newcomb's problem, Death in Damascus, Ahmed's random coin, Death on Olympus: https://www.pdcnet.org/jphil/content/jphil_2020_0117_0005_0237_0266 . Along with Newcomb's transparent problem and Parfit's hitchhiker and mechanical blackmail: https://arxiv.org/pdf/1710.05060.pdf.
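To make the "less utility" claim concrete, here is a minimal sketch (standard Newcomb stakes assumed: $1,000 in the transparent box, $1,000,000 in the opaque box iff one-boxing is predicted) of the average winnings of an agent known to follow each fixed policy, against a predictor of accuracy p:

```python
# Minimal sketch, assuming standard Newcomb payoffs: the opaque box holds
# $1,000,000 iff the predictor expects one-boxing; the transparent box always
# holds $1,000. We compute the average payoff of each fixed, predictable policy.

def expected_payoff(one_box: bool, p: float) -> float:
    """Average payoff of a fixed policy against a predictor with accuracy p."""
    if one_box:
        # Predictor correct (prob p): opaque box is full -> 1,000,000; wrong -> 0.
        return p * 1_000_000 + (1 - p) * 0
    # Predictor correct (prob p): opaque box is empty -> 1,000; wrong -> 1,001,000.
    return p * 1_000 + (1 - p) * 1_001_000

for p in (0.50, 0.55, 0.75, 0.99, 1.00):
    print(f"p={p:.2f}  one-box: {expected_payoff(True, p):>12,.0f}"
          f"  two-box: {expected_payoff(False, p):>12,.0f}")

# One-boxers come out ahead whenever p > ~0.5005, i.e. whenever the predictor
# is even slightly better than chance at these stakes.
```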
CDT proponents can hope to one day discover a similarly realistic and general class of problems where CDT outperforms FDT, thus making the choice more complicated or arbitrary; but there are good theoretical arguments to expect such a class not to exist (since FDT is in a sense just CDT plus taking into account more of the world's actual structure), and regardless, it's not open to debate whether FDT agents get more utility than CDT agents in the above dilemmas.
Traditional CDT is just transparently wrong, at least if you care about utility more than you care about any particular cognitive ritual you use in the pursuit of utility. The usual response from CDT proponents isn't "you're wrong, it's CDT that tends to get more utility"; it's "utility is the wrong criterion to use to choose between these decision theories".
Arepo @ 2022-11-17T13:41 (+3)
First of all, I want to apologise if I came across as rude in the previous comments. The revelations of the last week have been frustrating on many counts, and I find myself ever more sceptical of EA managers. This is obviously nothing against you personally. With that said:
I clarified that indeed, I'm claiming that this is an actually good idea, not a version of "valid" that's non-action-relevant (e.g., because the premises are false).
Thank you for clarifying. My objection is not so much to what you claim is a good idea as to how you claim it:
'There are better and worse ways of doing consequentialism'
'this is not just a hypothetical.'
'CDT is inadequate'
'it's not open to debate'
'Traditional CDT is just transparently wrong'
You say you're happy to hear counter-arguments, but it sounds very much like you've made up your mind. From some random dude on the forum I wouldn't pay these comments a second thought, but (the last week notwithstanding) I would hope a high-ranking EA at a prominent EA organisation would hold himself to better epistemic standards - i.e. just express yourself with more humility. I do not think your claims are necessarily wrong; I just think it's far too soon to assert they're necessarily right. For example:
The usual response from CDT proponents isn't "you're wrong, it's CDT that tends to get more utility"; it's "utility is the wrong criterion to use to choose between these decision theories".
The concept of 'CDT proponents' already presupposes a lot about your interlocutor - typically that they're willing to take thought experiments such as Parfit's Hitchhiker et al. at face value. But just as one can reject reasoning by thought experiment as a methodology for ethics, one can reject it as a methodology for decision theory. I would personally say the DT thought experiments are even worse than those traditionally used for ethics: they ask you to imagine things that aren't logically consistent with the real world, such as humans who can make categorical changes to their own behaviour, arbitrarily perceptive psychologists, literally identical humans, and even omniscient spirits and gods who defy causality. If one is not willing to assume such concepts have real-world instantiations, one has much weaker grounds to 'ditch CDT', since no-one has shown it underperforms in the actual world (indeed, I've tricked actual rationalists out of small amounts of real-world utility by persuading them to one-box at inopportune moments).
A lot of the people at GiveWell meetups in the early days were rationalists
Being an early adopter is not the same as being a founder, let alone the same as your influencers being founders. A lot of people at the Oxford proto-EA meetups were dancers. That doesn't make Shakira an EA founder.
Seems pretty arbitrary to me to decide that GiveWell and GWWC were forerunners for EA, but LW, FHI, etc. weren't ... You can decide to use labels in a way that carves out some of those as "real EA" and some as "not real EA", but this doesn't change the sociological reality of which people were talking to which other people, who was donating to and working on standard EA cause areas early on, who was endorsing EA principles early on, etc.
You challenged my claim that EA was founded on naive consequentialism, by which, to clarify, I mean the ideas of Toby Ord and Brian Tomasik to combine greater generosity with greater effectiveness for a multiplicative effect, and to a slightly lesser extent the ideas of GiveWell to simply be more effective in one's giving.
Rationalists may well have joined in these ideas early on, but there was plenty written on Less Wrong about why one should reject EA ideas, and AFAIK nothing at all written before GWWC or Brian's essays encouraging people to put their money where their mouth was. Fundamentally, GWWC's concept is what EA is. Rationalism's concept is a much more general 'how we might think about things more clearly' - obviously a useful thing for EAs to be able to do, but no more fundamentally related to EA than it is to chess.
Regardless, I don't think you'll disagree that Toby, Brian and Holden were among EA's founders. And, by doing relatively simple cost-effectiveness analyses and (in Toby's case) encouraging people to think constantly about what their money could be spent on, about the counterfactuals of earning to give, etc., I think it's clear that Toby and Brian were using as naive an in-practice-genuinely-meant consequentialism as anyone has ever employed. Whether or not you count Eliezer as a co-founder of the movement, it therefore seems very strange to say it's 'false' that the movement was founded on naive consequentialism simply because he later argued against it.
RobBensinger @ 2022-11-17T22:08 (+5)
I don't think "known causal decision theorist Sam Bankman-Fried committed multibillion-dollar fraud, therefore we should be less confident that causal decision theory is false" is a good argument. There are (IMO) things some EAs should soul-search about after this, but "downgrade our confidence in literally all EA-associated claims" is the wrong lesson.
they ask you to imagine things that aren't logically consistent with the real world, such as humans who can make categorical changes to their own behaviour,
Do you mean that FDT requires that humans be capable of following through on things like "pay Parfit's hitchhiker"? I'd say it's obvious that humans can follow through on that kind of commitment. Humans may not be able to be 100% confident in their own future behavior, but 100% confidence isn't required.
arbitrarily perceptive psychologists,
See https://www.lesswrong.com/posts/RhAxxPXrkcEaNArnd/notes-on-can-you-control-the-past, especially:
[...] There's a cute theorem I've proven (or, well, I've jotted down what looks to me like a proof somewhere, but haven't machine-checked it or anything), which says that if you want to disagree with logical decision theorists, then you have to disagree in cases where the predictor is literally perfect. The idea is that we can break any decision problem down by cases (like "insofar as the predictor is accurate, ..." and "insofar as the predictor is inaccurate, ...") and that all the competing decision theories (CDT, EDT, LDT) agree about how to aggregate cases. So if you want to disagree, you have to disagree in one of the separated cases. (And, spoilers, it's not going to be the case where the predictor is on the fritz.)
I see this theorem as the counter to the decidedly human response "but in real life, predictors are never perfect". "OK!", I respond, "But decomposing a decision problem by cases is always valid, so what do you suggest we do under the assumption that the predictor is accurate?"
Even if perfect predictors don't exist in real life, your behavior in the more complicated probabilistic setting should be assembled out of a mixture of ways you'd behave in simpler cases. Or, at least, so all the standard leading decision theories prescribe. So, pray tell, what do you do insofar as the predictor reasoned accurately? [...]
LDT doesn't require that any predictors be perfectly accurate in real life. It just requires that there be agents that can predict your future behavior better than chance.
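To make the quoted case-split concrete (the notation here is my own illustration, not from the linked post): for an action $a$ and a predictor of accuracy $p$,

$$\mathbb{E}[U(a)] \;=\; p \cdot \mathbb{E}\big[U(a) \mid \text{predictor correct}\big] \;+\; (1-p) \cdot \mathbb{E}\big[U(a) \mid \text{predictor mistaken}\big].$$

All the competing theories accept this weighting over cases; they disagree only about the first conditional term. That is why the disagreement doesn't evaporate when $p$ falls short of 1 — with lopsided Newcomb-style stakes, the "predictor correct" term dominates the comparison as soon as $p$ is meaningfully better than chance.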
literally identical humans,
Not required, for the same reason. E.g., LDT comes into play whenever two humans make decisions based on a similar reasoning process (like "we both are using the long division algorithm to solve this math problem"), not just when the full brain-state is identical.
Like, to be clear, you can make literally identical humans, because there's nothing physically impossible about emulating a human brain in computing hardware, and emulations are trivial to copy.
And it's even more obvious that AI systems are copyable; and "figure out decision theory so we can better understand AI reasoning" is indeed the primary reason MIRI folks care about CDT vs. LDT.
But just because literal copies are a real-world example we need to take into account in AI (and, someday, in ems) doesn't mean that any of the core arguments for LDT require there to be literal copies of agents. This is discussed in https://www.lesswrong.com/posts/RhAxxPXrkcEaNArnd/notes-on-can-you-control-the-past.
and even omniscient spirits and gods who defy causality.
How so? This is sort of an assumption in parts of algorithmic decision theory (AIXI has all possible worlds in its hypothesis space, and runs on compute that's bigger than any of the possible worlds it's reasoning about, though it isn't indexically omniscient / doesn't start off knowing which world it's in). But I don't know of any standard LDT arguments that require omniscience or causal loops or anything.
indeed, I've tricked actual rationalists out of small amounts of real-world utility by persuading them to one-box at inopportune moments
"I defected and the opponent cooperated" could mean one of two different things:
- "The opponent cooperated even though they knew I would defect".
- "I tricked the opponent into thinking that I would cooperate, and then I defected anyway".
Re case 1: FDT advises defection in decision problems where your opponent defects, so any rationalists who endorse 1 are not following FDT's prescriptions. Obviously it shouldn't count as a strike against FDT if rationalists lose money by diverging from FDT.
Re case 2: no decision theory can protect agents from ever being tricked by others, or protect agents from the general fact that having false beliefs will make you lose utility. "If FDT agents believe falsehood X, they'll reliably lose money" is true for many values of X, but this doesn't help distinguish FDT from other theories; no decision algorithm can magically protect you from losing utility if you lack good world-models.
(This is why decision problems in the literature generally stipulate that the agent knows what situation it's in. It's clearly a strike against a decision theory if it predictably fails when the agent knows what's going on; whereas if the agent fails when it's clueless, the blame may lie with the false/incomplete world-model, rather than with the decision algorithm.)
You could respond "but CDT can't make this particular mistake", but I don't think this should be convincing unless you're pointing to a case where CDT does better than FDT while the FDT agent has relevantly accurate beliefs. Otherwise I can just respond, "The CDT agent is guaranteed to lose in the cases where cooperate-cooperate equilibria are achievable; so both agents will lose utility in various situations, but CDT has the additional defect that it loses when it has correct world-models, not just when it has incorrect ones."
It's one thing to err when you don't know enough to do better; it's another thing to light money on fire and watch it burn for no reason, when you know exactly how to get more utility.
LDT agents achieving rational cooperate-cooperate equilibria can be compared to trading partners who realize gains from trade. You can respond "But being willing to ever trade opens up the possibility of being cheated; how about if I instead precommit to never trading in any circumstance, so no one can cheat me." And that's indeed an option available to you. (And in fact, it's one the FDT agent will take too if they're in a weird world where this disposition is somehow rewarded. FDT is flexible enough to cover this case, whereas CDT isn't flexible enough to self-modify to FDT when needed.)
But in the real world, it's not actually a good idea to throw all trade opportunities out the window a priori, because (a) the value of honest trade is too large to be worth throwing away, and (b) if you're worried that you're bad at identifying cheaters, you can just default to defecting in all cases except the ones where you're extremely confident that you're dealing with a non-cheater.
FDT's prescription in this case, "defect unless you're confident enough that the other person really will cooperate iff you're the sort of person who cooperates in this situation", is strictly better than CDT's "defect no matter what", because you can always set the required confidence level higher within FDT. FDT just says, "Don't rule out the possibility of coordination totally."
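As a rough sketch of that tunable-threshold point (the payoffs and counterpart model below are assumptions for illustration): suppose that with probability c your counterpart is a reliable conditional cooperator — it cooperates iff it predicts you will — and is otherwise an unconditional defector, with standard prisoner's dilemma payoffs. A "cooperate iff c clears a threshold" policy need never score below unconditional defection, since raising the threshold far enough simply recovers unconditional defection:

```python
# Sketch under assumed payoffs (C/C = 3, D/D = 1, you cooperate against a
# defector = 0) and an assumed world: with probability c the counterpart is a
# reliable conditional cooperator, otherwise an unconditional defector.

def ev_cooperate(c: float) -> float:
    # Conditional cooperator reciprocates (3); defector exploits you (0).
    return c * 3 + (1 - c) * 0

def ev_defect(c: float) -> float:
    # Conditional cooperator predicts the defection and defects too (1); defector defects (1).
    return c * 1 + (1 - c) * 1

def threshold_policy(c: float, t: float) -> float:
    """'Cooperate iff confidence c >= t'; unconditional defection is the case t > 1."""
    return ev_cooperate(c) if c >= t else ev_defect(c)

for c in (0.0, 0.2, 0.5, 0.9):
    print(f"c={c:.1f}  threshold policy (t=0.4): {threshold_policy(c, 0.4):.2f}"
          f"  always defect: {ev_defect(c):.2f}")

# The threshold policy matches unconditional defection when coordination looks
# unlikely, and pulls ahead as soon as the chance of genuine reciprocation
# clears the (adjustable) threshold.
```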
That said, if in-real-life people who endorse FDT consistently get their lunch stolen by people who endorse CDT, even though this goes against FDT's prescriptions, then I would update on that and tell human beings to follow something closer to CDT in their daily life.
This is a crux for me in the context of "what advice should I give humans?", even if it's not a crux for the application to AI.
It would just be very weird if humans are unable to implement FDT well.
Arepo @ 2022-11-18T01:28 (+2)
You said "The rationalist community also wasn't involved from the start". I think this is false almost no matter how you slice it.
I've given a timeline to the contrary which you don't seem to contradict, so I have little more to say here. If you think that 'some rationalists were at some EA events' implies that 'Eliezer Yudkowsky's post from ~2 years later was somehow foundational to the EA movement', then I don't think we're going to agree.
I don't think "known causal decision theorist Sam Bankman-Fried committed multibillion-dollar fraud, therefore we should be less confident that causal decision theory is false" is a good argument.
I haven't said anything to the effect that SBF's behaviour should update us on decision theory, so please don't put that in my mouth. I said that I would like to see you, as a prominent EA, show more epistemic humility.
Do you mean that FDT... E.g., LDT comes into play
I didn't mention any decision theory except CDT, which I have not seen sufficient reason to reject based on the thought experiments you've cited. For example, I expect a real jeep driver in a real desert with no knowledge of my history to have no better than base-rate chance of guessing my intentions based on the decision theory I've at some stage picked out. I expect a real omniscient entity with seemingly perfect knowledge of my actions to raise serious questions about personal identity, to which a reasonable answer is 'I will one-box because it will cause future simulations like me to get more utility'. I don't have the bandwidth to trawl through every argument and make a counterargument to the effect that the parameters are ill-defined, but that seems to be the unifying factor among them. If you think your views are provable, then don't link me to multiple flowery thousand-word essays: just copy and paste the formal proof!
I initially misunderstood you as making a claim that early EAs were philosophically committed to "naive consequentialism" in the sense of "willingness to lie, steal, cheat, murder, etc. whenever the first-order effects of this seem to outweigh the costs".
Your original comment was about how 'consequentialism at the level of actions has worse consequences than consequentialism at the level of policies/dispositions', which said nothing about lying, stealing, etc. It was presented as a counterpoint to Harris who, to my knowledge, does neither of those things with any regularity.
Toby Ord's PhD thesis, which he completed while working on GWWC, was on 'global consequentialism', which explicitly endorses act-level reasoning if, on balance, it will lead to the best effect. His solicitation for people to do something actively beneficent rather than just be a satisficing citizen ran very much against the disinterested academic stylings of rule-consequentialist reasoning in practice. You can claim it was advocating a 'policy or disposition of giving', but if you're going to use such language so broadly, you no longer seem to be disagreeing with the original claim that 'if you're critiquing consequentialist ethics on the basis that it led to bad consequences, you're seriously confused'.
RobBensinger @ 2022-11-17T23:06 (+2)
Being an early adopter is not the same as being a founder, let alone the same as your influencers being founders.
You said "The rationalist community also wasn't involved from the start". I think this is false almost no matter how you slice it. "OK, the rationalist community was involved but not all of the founders were rationalists" is a different claim, and I agree with that claim.
You challenged my claim that EA was founded on naive consequentialism, by which, to clarify, I mean the ideas of Toby Ord and Brian Tomasik to combine greater generosity with greater effectiveness for a multiplicative effect, and to a slightly lesser extent the ideas of Givewell to simply be more effective in one's giving.
If by "The EA movement was founded on people doing 'naive' consequentialism extremely well" you just meant "Toby Ord and Brian Tomasik did a lot of good by helping propagate the idea that generosity and effectiveness are good things, and GiveWell did a lot of good by encouraging people to try to donate to charity in a more discerning way", then I don't disagree.
I initially misunderstood you as making a claim that early EAs were philosophically committed to "naive consequentialism" in the sense of "willingness to lie, steal, cheat, murder, etc. whenever the first-order effects of this seem to outweigh the costs". I'd want to see more evidence before I believed that, and I'd also want to be very clear that I don't consider the same set of people (Ord, Tomasik, GiveWell) to be a majority of "early EA", whether we're measuring in head count, idea influence, idea quality and novelty, etc.
(Some examples of early-EA ideas that strike me as pushing against naive consequentialism: Eliezer's entire oeuvre, especially the discussion of TDT, ethical injunctions, and "ends don't always justify the means"; moral uncertainty, e.g. across different moral theories; general belief in moral progress and in the complexity of value, which increase the risk that we're morally mistaken today; approaches to moral uncertainty such as the parliamentary model; "excited altruism" and other framings that emphasized personal values and interests over obligation; the unilateralist's curse; ideas like Chesterton's Fence and "your confidence level inside the argument is different from your confidence level outside the argument", which push toward skepticism of first-order utility calculations.)
I do think that early EA was too CDT-ish, and (relatedly) too quick to play down the costs of things like lying and manipulating others. I think it's good that EA grew out of that to some degree, and I hope we continue to grow out of it more.
sphor @ 2022-11-18T02:48 (+1)
You say you're happy to hear counter-arguments, but it sounds very much like you've made up your mind.
FWIW I think there's no inherent tension here, and it's a healthy attitude. You need to make up your mind at some point, and saying (and acting) like you're happy to hear counter-arguments after that is very good.
Edited to add: I'm not making or implying any comments on the wider discussion, just this narrow point.
Arepo @ 2022-11-18T11:17 (+2)
I'd agree the first three remarks are on the strong side of reasonable, but the last two seem epistemically unhygienic. I am extremely confident, e.g., that global total scalar valence utilitarianism is a better answer to the questions moral philosophy seeks to answer than any alternative that's been proposed, but I would never use a phrase like 'not up for debate', which sounds childish to me.
HakonHarnes @ 2022-11-16T10:47 (+24)
I also generally found this podcast encouraging, and Sam is an eloquent speaker.
I did, however, find his characterisation of conventional philanthropic organisations rather strange. He highlights a perverse incentive: organisations would not really want to solve the issue they are ostensibly working on, as doing so would put them out of business. Although this is perhaps true in a strict theoretical sense, and there may be some unconscious/systemic drivers of this type of behaviour as well, it seems a very odd thing to focus on. This isn't even what differentiates EA from other philanthropy as far as I can gather (why would it not also apply to EA-aligned orgs?).
Also, I've noticed over the years that Sam has a tendency to label critiques and objections as "confusion". It's become something of a trigger word for me. His opponents are always "confused" and misunderstanding him (which, in fairness, does happen a fair bit), whereas he himself is never confused about the pushback he receives. I find that he is sometimes in fact the one misunderstanding his opponent.
Just wanted to put that out there, perhaps you'll notice the same thing when listening to Sam in the future :)