A Happiness Manifesto: Why and How Effective Altruism Should Rethink its Approach to Maximising Human Welfare

By MichaelPlant @ 2018-10-25T15:48 (+83)

TL;DR I argue effective altruists can and should use happiness surveys to determine cost-effectiveness and show how doing this generates some substantially different charity recommendations from those given by GiveWell.

Abstract:

I argue that, despite some long-standing doubts, happiness can be measured through self-reports and therefore happiness surveys should be used to determine how much happiness different outcomes produce. Specifically, I recommend life satisfaction (LS), found by asking “Overall, how satisfied are you with your life nowadays?” (0 - 10), as a suitable, though imperfect, measure of happiness. As, presumably, everyone values happiness to some extent, it follows that everyone should incorporate information from LS scores when determining how to do the most good. I show how the LS approach could be applied to a novel area: assessing which charities produce the most happiness. According to GiveWell, which does not assess charities solely (or even primarily) by LS scores, certain life-saving and poverty-alleviating charities in the developing world do the most good per dollar. I show that, if we understand good in terms of maximising self-reported LS, alleviating poverty is surprisingly unpromising whereas mental health interventions, which have so far been overlooked, seem more effective. Various philosophical and methodological issues leave it unclear whether GiveWell’s life-saving recommendations are more cost-effective than the mental health charity I discuss. I conclude by explaining the implications of this analysis for effective altruists who want to increase human happiness.^¹

1. Introduction

2. The relationship between happiness and measures of subjective well-being

3. Can happiness really be measured?

4. Can we compare individuals’ happiness scores?

5. Using life satisfaction as the common currency for well-being adjusted life years (WALYs)

6. Evaluating the life satisfaction impact of (GiveWell) charities

7. What should effective altruists do next?

Annex A: How good are QALYs and DALYs as proxies for happiness?

1. Introduction

How should we compare the impact of various outcomes, such as the treatment of a health condition or poverty reduction, in terms of how much good they do? The current method used by effective altruists (EAs) is, ultimately, to rely on their own subjective judgements; the value of these outcomes is weighed in the mind. EAs have often used health metrics, such as QALYs and DALYs - which rely on other people’s aggregated subjective judgements - but health is not the only item of interest, so subjective judgements are still needed to decide how to compare the value of health outcomes to other outcomes. As they make clear in their cost-effectiveness analysis, GiveWell determine the value of different interventions by polling their staff members. The main question in their analysis is “how many years of doubled consumption are as morally valuable as saving the life of an under-5-year old child?” GiveWell then use the median answer to establish the trade-off ratio between these two outcomes.
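To make the mechanics concrete, here is a minimal Python sketch of how a median trade-off ratio turns a staff poll into a common unit of value. The staff answers are invented for illustration and are not GiveWell's figures; the function names are hypothetical.

```python
from statistics import median

def trade_off_ratio(staff_answers: list[float]) -> float:
    """Median answer to: how many years of doubled consumption are as
    morally valuable as saving the life of an under-5-year-old child?"""
    return median(staff_answers)

def value_in_doubling_years(lives_saved: float, doubling_years: float,
                            ratio: float) -> float:
    """Express an outcome in 'years of doubled consumption' using the ratio."""
    return lives_saved * ratio + doubling_years

# Hypothetical poll of five staff members (illustrative numbers only)
ratio = trade_off_ratio([20, 35, 40, 50, 100])
print(ratio)  # 40
print(value_in_doubling_years(lives_saved=2, doubling_years=10, ratio=ratio))  # 90
```

Every input to `ratio` is a subjective judgement; once the median is fixed, everything downstream is arithmetic.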

We might think relying on our subjective judgements is both unobjectionable and unavoidable on the grounds that we are making moral evaluations, and there is simply no other way to do this. However, claiming these are moral judgements is only partly true and, indeed, may not be true at all. Some of what appear to be moral evaluations are judgements about facts and so, in principle, empirical questions. Here’s an analogy. Suppose you and I are trying to determine which of two oddly-shaped jars contains more water. What sort of assessment are we making here? Not a moral judgement, but a subjective judgement of fact. We could go on to say the best jar is the one that holds the most water, which would be a moral judgement.^² Yet, regardless of whether we’ve made that moral evaluation, the question of how much water the jars hold is still a factual one. Now, suppose you and I are trying to assess which of two outcomes - curing some health condition or alleviating poverty by some amount - increases happiness by more. Suppose we agree on our concept of happiness: a positive balance of enjoyment over suffering. Are we making a moral evaluation when we state which outcome we think would increase happiness more? Clearly, as the jar analogy showed, we are not. Note that we need not assume happiness is of any moral importance - perhaps we conclude only liberty has value - to compare the outcomes.

Deciding (a) which thing or things are intrinsically valuable and thus constitute a good, and (b) how to aggregate goods to determine overall value - in philosophical jargon, an axiology, the method of ranking states of affairs in terms of their ultimate value - is, certainly, a moral judgement.^³ Suppose, for example, we decide the value of an outcome is the sum total of happiness in it. Once we’ve done that, determining how much good an outcome contains is, in principle, an empirical question. Any subjective assessments one makes thereafter will be judgements of facts, not of value. I hope it’s obvious that we will, where possible, want to measure the good(s) directly, and that objective measurement of the facts ‘trumps’ our subjective evaluations of the facts. If we could measure the water capacity of the jars, that would settle the question of which held more and render our guesswork obsolete.

If we want to have productive disagreements with one another about which outcomes do more good, it’s important to make it clear whether disagreements arise from claims about value or from claims about facts. Suppose Singer and MacAskill disagree about whether option A is better than option B. Given Singer and MacAskill have (I think) the same views on value, the disagreement is a factual one.^⁴ By contrast, because GiveWell’s analysis combines the opinions of multiple people who have different views about value, it’s not possible to tell whether one disagrees with GiveWell (supposing one does) because one differs about facts or about values. To make this plain: as noted, the key question in GiveWell’s cost-effectiveness analysis is “how many years of doubled consumption are as morally valuable as saving the life of an under-5-year old child?” Here is a non-exhaustive list of factors two people could disagree about when answering that question, along with whether each factor is a question of fact or of value.

Factor	Value or fact question?
How much does doubling consumption for a year increase well-being?	Fact, for a given account of well-being
What is well-being?	Value
How much well-being would the child have if it lived?	Fact, given expected well-being
How many years will the child live?	Fact
How bad is death (e.g. is it better, all else equal, to save a 2-year-old or a 20-year-old)?	Value

This document aims to illuminate one factor that, presumably, will be of interest to everyone: how much different outcomes improve happiness. While this has generally been judged subjectively, it is a question of fact, not of value (for a given account of happiness). I argue that, despite long-standing doubts, happiness can be measured through population surveys and therefore we should use data from happiness surveys, rather than relying on our own subjective judgements, to determine what increases happiness and by how much. Specifically, I recommend life satisfaction (‘LS’) scores, found by asking “How satisfied are you with your life nowadays?” (on a scale from 0 “not at all” to 10 “completely”), as the most suitable (although not ideal) proxy measure of happiness. While these claims may be unfamiliar and contentious within philosophy and effective altruism, they are common knowledge within (certain corners of) economics and psychology. This part of the document largely restates claims made by others.^⁵ Having argued that we should use LS scores to determine what maximises happiness in general, I apply this approach to a question that has not yet been addressed: which charities are most cost-effective at producing happiness. I show how the LS approach differs from that of GiveWell, who do not primarily use LS scores to determine cost-effectiveness, and how it generates partially different recommendations.

The rest of the document proceeds as follows. Section 2 explains how ‘subjective well-being’ (SWB), as it is normally called in social science, is measured, how SWB relates to happiness and why LS is a suitable proxy for happiness. Section 3 argues SWB measures are valid and reliable. Section 4 argues SWB measures can be used to make interpersonally cardinal comparisons. Section 5 outlines how LS can be used to determine what increases happiness in general. Section 6 applies the LS approach to charity evaluation. Section 7 sets out future work for effective altruists interested in increasing happiness.

2. The relationship between happiness and measures of subjective well-being

Social scientists (mostly economists and psychologists) talk about measures of ‘subjective well-being’ (‘SWB’), which are, to quote Dolan and Metcalfe (2012), ‘ratings of thoughts and feelings about life’. SWB is typically thought to have three components (OECD 2013):

Evaluation (sometimes called ‘cognitive’) - a reflective assessment of a person’s life or some specific aspect of it. Life satisfaction is a life evaluation question, but not the only one.

Experience (sometimes called ‘affective’ or ‘hedonic’) - a person’s feelings or emotional states, typically measured with reference to a particular point in time.

Eudaimonia - a sense of meaning and purpose in life, or psychological functioning.

For a list of example SWB questions, see OECD (2013, Annex A).

What is the relationship between the SWB measures and happiness?^⁶ Often, measures of SWB are referred to as measures of ‘happiness’. This is technically incorrect and also misleading. On the earlier definition of happiness - a positive balance of enjoyment over suffering - the experience component of SWB is identical with happiness. Evaluations measure how people feel about their lives, rather than how happy they feel during them. Eudaimonic measures may tap into psychological states - ones related to meaning, whatever this is - that, presumably, feel enjoyable to experience and thus comprise happiness, but do not capture all the psychological states relevant to happiness. Hence, SWB is not only a measure of happiness.

The ‘gold standard’ for measuring happiness is the experience sampling method (ESM), where participants are prompted to record their feelings and possibly their activities one or more times a day.^⁷ While this is an accurate record of how people feel, it is expensive to implement and intrusive for respondents. A more viable approach is the day reconstruction method (DRM) where respondents use a time-diary to record and rate their previous day. DRM produces comparable results to ESM, but is less burdensome to use (Kahneman et al. 2004).

Given we are interested in measuring happiness, we might think we should ignore the non-experience components altogether. Practically, however, this is infeasible and we are forced to rely on life satisfaction measures as the main proxy measure for happiness (a ‘proxy’ measure is an indirect measure of the phenomenon of interest). It is much easier to collect LS data as it requires just one quick question that takes subjects around 30 seconds to answer, whereas the DRM takes approximately 40 minutes to fill out. As a result of this ease of use, LS is the SWB measure on which most data has been collected and most analysis done. It is now possible to say, a point I return to later, to what extent various outcomes cause an absolute increase in life satisfaction on a 0-10 scale, which is what we need to determine cost-effectiveness (see Layard et al. 2018). By contrast, to the best of my knowledge, there is insufficient research on experience measures to draw the same conclusions.

How much of a problem is it to use evaluative measures in lieu of experience ones? Experience and evaluative measures are conceptually different and answered in somewhat different ways. As Deaton and Stone (2013) explain:

Hedonic [i.e. experience] measures are uncorrelated with education, vary over the days of the week, improve with age, and respond to income only up to a threshold. Evaluative measures remain correlated with income even at high levels of income, are strongly correlated with education, are often U-shaped in age, and do not vary over the days of the week (Stone et al. 2010; Kahneman and Deaton 2010)

This doesn’t mean evaluative measures can’t be used as proxies for happiness. The evaluative and experience measures do correlate, suggesting evaluative judgements are, in part if not in whole, determined by how happy people are (OECD 2013, p32-34). While Deaton and Stone identify some cases where they come apart, it’s hard to think of instances where there would be different priorities, either for governments or for effective altruists, if the goal was trying to maximise life satisfaction scores rather than happiness scores. The sensible approach seems to be to use happiness data where it’s available and LS data where it isn’t, and, when using LS data to determine cost-effectiveness, to keep in mind how the two might differ. Further work to investigate if and when using one measure over another would generate different priorities seems valuable.

While eudaimonia measures are regarded as a component of SWB, I will not refer to them again. Not only are they not the most relevant component, little data has been collected on them and it’s not conceptually clear what they capture.

3. Can happiness really be measured?

There are long-standing doubts (in economics) that happiness either can be or needs to be measured. Some historical context may be helpful here. According to Layard (2003):

In the eighteenth century Bentham and others proposed that the object of public policy should be to maximise the sum of happiness in society. So economics evolved as the study of utility or happiness, which was assumed to be in principle measurable and comparable across people. It was also assumed that the marginal utility of income was higher for poor people than for rich people, so that income ought to be redistributed unless the efficiency cost was too high.

All these assumptions were challenged by Lionel Robbins in his famous book on the Nature and Significance of Economic Science published in 1932. Robbins argued correctly that, if you wanted to predict a person’s behaviour, you need only assume he has a stable set of preferences. His level of happiness need not be measurable nor need it be compared with other people. Moreover economics was, as Robbins put it, about “the relationship between given ends and scarce means”, and how the “ends” or preferences came to be formed was outside its scope.

Interest in measuring happiness has returned in recent decades.^⁸ This seems to be caused by 1) the Easterlin paradox, the (contested) finding that while richer people are more satisfied with their lives than poor people, an increase in average wealth does not raise average life satisfaction; 2) the behavioural economic work of Tversky, Kahneman and others suggesting individuals do not, when left to their own devices, seem to maximise their own utility;^⁹ and 3) growing dissatisfaction with GDP as a measure of progress.^¹⁰

The idea governments should measure SWB and use it to guide policy has started to take root. In 2013 the OECD issued guidelines recommending its member-nations collect SWB data:

There is now widespread acknowledgement that measuring subjective well-being is an essential part of measuring quality of life alongside other social and economic dimensions [...] The Guidelines also outline why measures of subjective well-being are relevant for monitoring and policy making.

The UK’s Office for National Statistics has been collecting data on SWB since 2012 and currently polls 158,000 people a year. (Readers unfamiliar with SWB measures may find its FAQs helpful.)

Now, some scholars who argue we shouldn’t use SWB measures, such as Fleurbaey, Schokkaert and Decancq (2009), nevertheless accept that such measures are meaningful:

With the mass of data accumulated on happiness and satisfaction and the development of their econometric exploitation, subjective utility seems more measurable than ever. There now seem to be good reasons to trust the existence of sufficient regularity in human psychology, so that interpersonal comparisons appear feasible in principle. These new developments have triggered a revival of welfarism as well. If utility can be measured after all, why not take it as the metric of social welfare? Several authors have taken this line (Kahneman et al. 2004b, Layard 2005). However, none of the recent developments in the field of measurement directly undermine the arguments that were raised against welfarism in the philosophical debates of the previous decades. The fact that something becomes easier to measure does not give any new normative reason to rely on it.

While the above explains that there has been a shift in opinion regarding the measurability of SWB, I have not yet said why we should think the SWB measures are accurate, i.e. that they succeed in measuring what they set out to measure.

The accuracy of a measure is usually assessed in terms of its validity and reliability. Validity refers to whether the measure captures the underlying concept that it purports to measure. Suppose I try to measure your height by weighing you on a set of bathroom scales. The scales might be a valid measure of weight but it’s clear, I hope, they are not a valid measure of height. Reliability is about whether the measure gives consistent results in identical circumstances (i.e. whether it has a high signal-to-noise ratio). If my scales produce a random number every time I step on them, they are not reliable. Reliability is necessary but not sufficient for validity; if I used a normal, working set of scales to measure your height it would give me the same score each time, and so be reliable (assuming your weight doesn’t fluctuate), but still wouldn’t be valid. As the reliability and validity of SWB scales have been covered at great length in OECD (2013) and elsewhere, I will largely confine myself to explaining the key ideas and providing several illustrative quotations from that document.

Reliability can be assessed in two ways: by internal consistency - whether the items within a multi-item scale correlate, or whether different scales of the same measure correlate - and by test-retest reliability, where the same question is given to the same respondent more than once at different times. Note that if the item in question genuinely changes between measurements, we would expect test-retest reliability to be low.

Regarding life evaluations, quoting (OECD 2013, pp47):

Bjørnskov (2010), for example, finds a correlation of 0.75 between the average Cantril Ladder measure of life evaluation from the Gallup World Poll and life satisfaction as measured in the World Values Survey for a sample of over 90 countries. [...] Test-retest results for single item life evaluation measure tend to yield correlations of between 0.5 and 0.7 for time period of 1 day to 2 weeks (Krueger and Schkade, 2008). Michalos and Kahlke (2010) report that a single-item measure of life satisfaction had a correlation of 0.65 for a one year period and of 0.65 for a two-year period.

And regarding affect/experience measures:

There is less information available on the reliability of measure of affect and eudaimonic well-being than is the case for measures of life evaluation. However, the available information is largely consistent with the picture for life satisfaction. In terms of internal consistency reliability, Diener et al. (2009) report [...] the positive, negative and affective balance subscale of their Scale of Positive and Negative Experience (SPANE) have alphas of 0.84, 0.88, and 0.88 respectively. [...] In the case of test-retest reliability, [...] Krueger and Schkade (2008) report test-retest scores of 0.5 and 0.7 for a range of different measures of affect over a 2-week period.

The authors conclude the life evaluation and affect measures exhibit sufficient correlation, by the standards of social science, to be deemed acceptably reliable.

Validity, by contrast, is somewhat harder to test than reliability because the underlying phenomena SWB measures attempt to capture are subjective, hence there is no objective way to demonstrate success. Nevertheless, there are three ways to assess validity. All of these ultimately rely on whether the measures conform to our expectations about the item we intend to measure.

The first is face validity - do respondents judge the questions to be an appropriate way to measure the concept of interest? If not, it’s likely the measures aren’t valid. In the case of SWB measures, it’s fairly obvious that they are, e.g. that asking people whether they felt happy yesterday is a good way to assess whether they felt happy yesterday. Participants aren’t generally asked about face validity directly, but it can be tested via (a) response speed and (b) non-response rates: if people take a long time to answer, or don’t answer at all, that suggests they don’t understand the question. Median response times for SWB questions are around 30 seconds for single-item measures, suggesting the questions are not conceptually difficult (ONS, 2011). Quoting from (OECD 2013, pp49): “in a large analysis by Smith (2013) covering three datasets [...] and over 400,000 observations, item-specific non-response rates for life evaluation and affect were found to be similar for those for [the straightforward] measures of educational attainment, marital and labour force status” which, again, supports the face validity of the questions.

The second is convergent validity - does the item correlate with other proxy measures for the same concept? Kahneman and Krueger (2006) list the following as correlates of both high life satisfaction and happiness: smiling frequency; smiling with the eyes (“unfakeable smile”); ratings of one’s happiness made by friends; frequent verbal expressions of positive emotions; happiness of close relatives; self-reported health. In addition, OECD (2013) states “Diener (2011), summarising the research in this area, notes that life satisfaction predicts suicidal ideation (r=0.44) and the low life satisfaction scores predicted suicide 20 years later in a later epidemiological survey from Finland (after controlling for other risk factors [...])”. Such items allow us to assess the measures from the perspective of falsifiability: if we expect that (say) those with low life satisfaction would commit suicide more often, but our measure of life satisfaction found that those with high LS commit suicide more often, that would suggest the measure lacked validity. As it stands, the results support the validity of the experience and evaluation measures of SWB.

The third is construct validity - while convergent validity assesses how closely the measure correlates with other proxy measures of the same concept, construct validity concerns itself with whether the measure performs in the way we expect it to. From OECD (2013, pp51):

Measures of [SWB] broadly show the expected relationship with other individual, social and economic determinants. Among individuals, higher incomes are associated with higher levels of life satisfaction and affect, and wealthier countries have higher average levels of both types of subjective well-being than poorer countries (Sacks, Stevenson and Wolfers, 2010). At the individual level, health status, social contact and education and being in a stable relationship with a partner are all associated with higher levels of life satisfaction (Dolan, Peasgood and White, 2008), while unemployment has a large negative impact on life satisfaction (Winkelmann and Winkelmann, 1998). Kahneman and Krueger (2006) report intimate relations, socialising, relaxing, eating and praying are associated with higher levels of positive affect; conversely, commuting, working and childcare and housework are associated with low level of net positive affect. Boarini et al. (2012) find that affect measures have the same broad set of drivers as measures of life satisfaction, although the relative importance of some factors changes.

Major life events, such as unemployment, marriage, divorce and widowhood, are shown to result in long-term, substantial changes to SWB, just as one would expect. The time series in figure 1, from Clark, Diener, Georgellis and Lucas (2007), displays the LS impact of such events for males (controlling for other variables) before, during and after they occur (the y-axis records the change in LS on a 0-10 scale; the results are similar for females). Note that the time series shows anticipation of events. We can see, for example, a decrease in LS leading up to a divorce, whereas widowhood is barely anticipated and comes as a huge shock.

Figure 1. The dynamic effects of life and labour market events on life satisfaction (male) (Clark, Diener, Georgellis and Lucas 2007). The y-axis represents the absolute change in life satisfaction on a 0-10 scale.

Figure 2, from Clark, Fleche, Layard, Powdthavee and Ward (2017, p100), shows a similar time series, this time for disability, from three different data-sets. Individuals seem to partially, rather than fully, adapt to disability.^¹¹ This is what we might suppose would happen: becoming disabled is very bad, but being disabled is somewhat less bad as one’s lifestyle and mindset adjust. It’s worth noting here that one major potential objection to the use of SWB measures is that people do not really adapt to changes in circumstances; they simply change how they use their scales. However, if such scale re-norming did take place, we would expect to see adaptation to all conditions. Yet we do not see this: the LS scores in figure 1 above show people adapt to some things and not others. Further, Oswald and Powdthavee (2008) find less adaptation to severe disability than to mild or moderate disability, suggesting scale re-norming is not occurring and that the SWB scores reflect reality.

Figure 2. Adaptation to disability in different country data-sets (Clark et al. 2017, p100)

As mentioned before, if the SWB measures had produced counter-intuitive results (got the ‘wrong answers’) that could lead us to conclude they were not valid. The above seems to match our expectations.

One finding that might, at least at first, seem counterintuitive is the relationship between SWB and income. While there is little disagreement that richer people within a given country report higher SWB (on both experience and evaluation measures), and that richer countries report higher SWB, there is less consensus over whether SWB increases over time as countries become wealthier. This is the so-called ‘Easterlin Paradox’, displayed in figure 3 below (Clark et al. 2018, p203). A critical response to SWB measures could be made as follows: “the Easterlin Paradox shows increasing overall economic prosperity doesn’t increase SWB. But it’s obvious increasing overall economic prosperity should raise SWB. Therefore, the SWB measures must be wrong”.

Such a response is too quick. First, the debate still rages over whether the Easterlin Paradox holds - Stevenson and Wolfers (2008) argue it does not; Easterlin et al. (2016) reply. Second, as Clark (2016) notes, a large body of research finds individual SWB depends not just on the individual’s own income, but also on their income relative to that of the reference group they compare themselves to. Thus, if I am wealthier than you, I should expect to have higher SWB. However, if my income rises but the incomes of those I compare myself to also rise, these effects cancel out, leaving my SWB unchanged. Hence the Easterlin paradox can be explained in large part by the phenomenon of social comparison: we judge our lives against those of others.

Figure 3. Change in subjective well-being and GDP/head over time

In a particularly insightful study by Solnick and Hemenway (2005), individuals were asked to choose between different states of the world, as follows.

A: Your current yearly income is $50,000; others earn $25,000

B: Your current yearly income is $100,000; others earn $200,000

Absolute income is higher in B than in A, while relative income is higher in A than in B. Individuals express a clear preference for A, suggesting the importance of relative income. Hence, on further analysis, the Easterlin paradox is not as counter-intuitive as it might seem.
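The social-comparison explanation can be illustrated with a toy model in Python. It assumes, purely for illustration, that LS rises with the log of one's own income and falls with the log of one's reference group's income; the functional form, coefficients and baseline are hypothetical, not estimates from Clark (2016) or Solnick and Hemenway (2005).

```python
import math

def toy_life_satisfaction(own_income: float, reference_income: float,
                          beta_own: float = 0.3, beta_ref: float = 0.3,
                          baseline: float = 5.0) -> float:
    """Hypothetical LS model: own income helps, reference-group income hurts.

    Coefficients are illustrative assumptions; output is not clamped to 0-10.
    """
    return (baseline
            + beta_own * math.log(own_income)
            - beta_ref * math.log(reference_income))

# Cross-section: earning more than your reference group raises LS
richer = toy_life_satisfaction(60_000, 40_000)
poorer = toy_life_satisfaction(40_000, 40_000)

# Over time: if everyone's income doubles, the terms cancel (when beta_own == beta_ref)
before = toy_life_satisfaction(40_000, 40_000)
after = toy_life_satisfaction(80_000, 80_000)

# Solnick and Hemenway's choice: A = ($50k, others $25k), B = ($100k, others $200k)
option_a = toy_life_satisfaction(50_000, 25_000)
option_b = toy_life_satisfaction(100_000, 200_000)
print(richer > poorer, math.isclose(before, after), option_a > option_b)  # True True True
```

If `beta_own` exceeded `beta_ref`, uniform income growth would still raise LS somewhat, so how much of the Easterlin pattern social comparison explains depends on the relative size of the two effects.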

Overall, the evaluation and experience SWB measures seem both reliable and valid.

4. Can we compare individuals’ happiness scores?

We now move on from whether SWB measures are accurate, to whether they are comparable between individuals. Here’s a potential concern: for a given scale, say life satisfaction, is going from 7 to 8 for one person equivalent to another person going from 2 to 3? In jargon, this is the question of whether the scales exhibit interpersonal cardinality. Readers who are not interested in or concerned by this problem are welcome to skip to section 5.

I unpack this concern in stages.

The first question to ask is: does the underlying phenomenon of interest - the thing the SWB scales are trying to measure - have a cardinal structure, or is it merely ordinal? That is, does it represent something that can be quantified - like length, height, weight, etc. - or does it merely represent an ordering - like ‘A is taller than B’? (1st, 2nd, 3rd … are the ordinal numbers; 1, 2, 3, … are the cardinal numbers.)

It is intuitively obvious happiness is cardinal, as revealed by our linguistic usage. It is entirely sensible to say “X hurt twice as much as Y” or “I feel 10 times better than I did yesterday”.^¹² If happiness were ordinal, the most we could say would be “X hurts worse than Y” and “I feel better today than I did yesterday”.

If life satisfaction scales capture a psychological state of satisfaction, then this would be cardinal; as above, intuitively, one can feel twice as satisfied about X vs Y.^¹³

Given the underlying phenomenon of interest has a cardinal structure, the next question is whether individuals’ reporting on the scale is equal-interval (another term for this is linear), i.e. whether going from 5/10 to 6/10 is the same size of improvement as going from 7/10 to 8/10. One worry is that individuals interpret SWB scales as logarithmic, like the Richter scale, where the magnitude of going from 6/10 to 7/10 is 10 times that of going from 5/10 to 6/10, rather than as linear/equal-interval.^¹⁴
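The difference between the two readings can be made concrete. In this Python sketch, a linear reading maps each reported point to an equal amount of the underlying state, while a hypothetical Richter-style reading maps each point to ten times the magnitude of the point below; the function names are illustrative.

```python
def linear_reading(score: int) -> float:
    """Equal-interval reading: each point is the same amount of the underlying state."""
    return float(score)

def log10_reading(score: int) -> float:
    """Richter-style reading: each point is ten times the magnitude of the one below."""
    return 10.0 ** score

def step(reading, lo: int, hi: int) -> float:
    """Size of the underlying change when a report moves from lo to hi."""
    return reading(hi) - reading(lo)

for reading in (linear_reading, log10_reading):
    ratio = step(reading, 6, 7) / step(reading, 5, 6)
    print(reading.__name__, ratio)
# linear_reading 1.0
# log10_reading 10.0
```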

While possible, non-linear reporting seems unlikely.^¹⁵ Experimental evidence from Van Praag (1993) suggests that when presented with a number of (non SWB-related) points, respondents automatically treat the difference between points as roughly equal-interval. Further, it is intuitively much harder for ordinary people (i.e. non-mathematicians) to report how happy/satisfied they feel on a logarithmic scale than on a linear one. If I ask myself “how happy am I right now on a 0-10 logarithmic scale?”, to answer this question I first have to think “how happy am I on a linear 0-10 scale?” I then try to remember how logarithms work and convert from there. This is so much harder to do that I assume scale use must be equal-interval.

Given the scales have intrapersonal cardinality, the final question is whether they have interpersonal cardinality: is one person’s reported one point increase on a 0 to 10 scale equivalent to a one point increase for someone else?

There are two different concerns here. First, individuals could correctly report where they are between the minimum and maximum points of the scales, but have different capacities for SWB. There could be ‘utility monsters’ who experience 1000 times more happiness than others. Second, individuals could have the same maximum and minimum capacities, but use the scales differently. Suppose almost everyone reports a given sensation as 6/10, but a few people report the same feeling as an 8/10; keeping the same terminology, this latter group are ‘language monsters’.

We can make three replies. The first applies to both concerns: so long as these differences are randomly distributed, they will wash out as ‘noise’ across large numbers of people. There will be as many people with a greater capacity for SWB as with less, and as many who use the scale too conservatively as use it too generously. Second, in response to utility monsters, it seems unlikely, given our shared biology, that the utility capacities of humans will, in practice, vary by very much.^¹⁶ Third, regarding language, we do, in general, tend to regulate one another’s language use. For instance, if I say “I’m having a terrible day: I stubbed my toe” you are likely to say “Hold on. That’s not a terrible day. That’s a mildly bad day”. A hypothesis, which could conceivably be tested, is that this language regulation pushes us towards using SWB scales in a similar way. If language did not have a shared meaning, it would be of no use at all.

We might object to this last point that, even if groups regulate their members’ language use, different groups could still use scales differently. As an empirical test of this, Helliwell et al. (2016) studied immigrants moving to Canada from 100 different countries. They found that, regardless of country of origin, the average levels and distributions of life satisfaction among immigrants mimic those of Canadians, suggesting LS reports are primarily driven by life circumstances. If there really were substantial cultural differences in LS scale use, this result would not occur.

Therefore, it seems reasonable to interpret SWB data as interpersonally cardinal. However, as this point seems important, more work here would be welcome.

5. Using life satisfaction as the common currency for well-being adjusted life years (WALYs)

Suppose we accept we can use LS scores to measure happiness. What next? One straightforward option would be to measure the LS (and other SWB) impacts of programmes directly in RCTs. If we know the costs of a programme, we could then establish how much it costs to produce one ‘life satisfaction point-year’ or ‘LSP’. One LSP is equivalent to increasing life satisfaction for one person by one point on a 0-10 scale for one year. This method is structurally similar to assessing cost per Quality-Adjusted Life Year (QALY), which effective altruists are already familiar with and I won’t go into. The difference is that QALYs are measured on a 0-1 scale whereas LS is on a 0-10 scale.
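The LSP arithmetic can be sketched as follows. The helper names and the $500 programme below are hypothetical, chosen only to show the units.

```python
def lsps(ls_change: float, years: float, people: int = 1) -> float:
    """Life satisfaction point-years: LS points gained x years x people affected."""
    return ls_change * years * people


def cost_per_lsp(programme_cost: float, total_lsps: float) -> float:
    """Dollars needed to produce one LSP."""
    return programme_cost / total_lsps


def lsps_per_1000(programme_cost: float, total_lsps: float) -> float:
    """LSPs produced per $1,000 spent."""
    return 1000 * total_lsps / programme_cost


# Hypothetical programme: costs $500 and raises LS by 0.5 points
# for 2 people for 1 year.
gained = lsps(0.5, years=1, people=2)   # 1.0 LSP
print(cost_per_lsp(500, gained))        # 500.0
print(lsps_per_1000(500, gained))       # 2.0
```

Reporting results as LSPs per $1,000 rather than dollars per LSP means larger numbers are better. This is the convention used in the charity comparisons below.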

Table 1. How adult life satisfaction (0-10) is affected by current circumstances (BHPS) (cross-section) (Clark et al. 2018, p199)

I expect many effective altruists will welcome the idea of using LSPs instead of QALYs. Most EAs already accept that, in principle, we need a measure of ‘well-being adjusted life-years’ (WALYs).^¹⁷ QALYs capture health. As I noted at the start, health is not all that matters. We also need a common currency that allows health and non-health outcomes to be traded off against one another, and a non-arbitrary method to determine the value of outcomes in this currency. LSPs could partially or fully fulfil the role of the WALY metric. For those who think happiness is the only intrinsic good, LSPs should be sufficient, unless and until a better measure of happiness can be found. Those who value goods other than happiness will, presumably, value happiness to some extent. Inasmuch as they do, LSPs will be one aspect of WALYs they need to consider alongside other goods.^¹⁸

RCT data using LS will not always be available. Where it is not, an alternative way to determine how different outcomes affect LS is to rely on data from large population surveys. Using a multivariate regression analysis that controls for different circumstances, researchers can estimate how strongly LS is associated with various other factors. Table 1 from Clark et al. (2018, p199) contains the results of such an analysis. It covers both the impact a given change has on an individual’s own LS and the impact it has on others’ LS.

This information can be used to infer the expected LS effect of a given outcome without requiring an RCT, at least where the outcome is straightforward to measure, as unemployment is. In other cases, the relationship between life satisfaction and other measures, such as particular health metrics, will need to be established so that those metrics can be converted into LS scores. Some of this work has been done: see Layard (2016) for a table converting LS scores into other SWB measures and various health metrics.
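The survey-regression approach described above can be sketched with ordinary least squares, here fitted using `numpy.linalg.lstsq`. Everything in the example is an assumption made for illustration: the survey data are randomly generated, and the "true" coefficients are made up. They are not the Clark et al. (2018) estimates.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 10_000

# Synthetic survey data (NOT real data): log2 income and an unemployment dummy.
log2_income = rng.normal(loc=10.0, scale=1.0, size=n)
unemployed = rng.binomial(1, 0.1, size=n)

# Hypothetical "true" effects, used only to generate fake LS responses.
TRUE_INTERCEPT, TRUE_INCOME, TRUE_UNEMPLOYED = 2.0, 0.3, -0.7
ls = (TRUE_INTERCEPT
      + TRUE_INCOME * log2_income
      + TRUE_UNEMPLOYED * unemployed
      + rng.normal(loc=0.0, scale=1.5, size=n))

# Design matrix with an intercept column. OLS estimates each factor's
# association with LS while holding the other factor constant.
X = np.column_stack([np.ones(n), log2_income, unemployed])
coef, _, _, _ = np.linalg.lstsq(X, ls, rcond=None)

for name, value in zip(["intercept", "per doubling of income", "unemployed"], coef):
    print(f"{name:>24}: {value:+.2f}")
```

The estimated coefficients should land close to the made-up values used to generate the data. Real analyses control for many more circumstances. Using log income, as here, is what makes "the effect of doubling income" a single number.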

Two sets of comments are worth making before we turn to charity analysis. First, three remarks on the results in the table that will be relevant again shortly:

1) each doubling of income is associated with the same constant increase in life satisfaction (i.e. LS rises with the logarithm of income);
2) the gain one individual receives from a doubled income causes a loss in LS to others that is nearly as large;
3) mental health, employment and partnership have a much bigger per-person impact than a doubling of income does.

Second, while it is already possible to estimate the LS effect of many outcomes, if effective altruists want to use SWB data to assess effectiveness, they should encourage researchers - most obviously those working in global development - to collect it alongside other variables. This only requires quickly surveying individuals at the start and end of an impact assessment. This generates extra work, but also allows direct measurement of the outcome that is (presumably) of most interest.
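The first two remarks about the table can be illustrated with a short sketch. It assumes LS is linear in log2(income). `BETA` and `SPILLOVER` are hypothetical values chosen only to show the structure; they are not the Table 1 estimates.

```python
import math


def ls_change_from_income(old_income: float, new_income: float,
                          beta_per_doubling: float) -> float:
    """Own LS change, assuming LS is linear in log2(income)."""
    return beta_per_doubling * math.log2(new_income / old_income)


BETA = 0.3        # hypothetical LS gain per doubling of own income
SPILLOVER = 0.25  # hypothetical LS loss to others per doubling (nearly as large)

# A doubling is worth the same whether you start poor or rich.
print(ls_change_from_income(1_000, 2_000, BETA))    # 0.3
print(ls_change_from_income(10_000, 20_000, BETA))  # 0.3

# Net effect across everyone: own gain minus the loss imposed on others.
own = ls_change_from_income(1_000, 2_000, BETA)
net = own - SPILLOVER * math.log2(2_000 / 1_000)
print(round(net, 2))  # 0.05
```

When the spillover is nearly as large as the private gain, the aggregate effect of making some people richer is close to zero, even though each recipient gains.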

6. Evaluating the life satisfaction impact of (GiveWell) charities

GiveWell has identified the charities it considers to do the most good per dollar. We can use the LS lens to assess how good these charities are at increasing life satisfaction. GiveWell’s top charities can be divided into two groups:

(1) life-saving charities, such as the Against Malaria Foundation;
(2) life-improving charities, which increase individuals’ well-being during their lives, such as GiveDirectly and the Schistosomiasis Control Initiative (SCI).

According to GiveWell, the vast majority of the benefit of their recommended life-improving charities arises from eventual income and consumption gains, rather than from gains to health.^¹⁹ We’ve already seen that making people richer is a surprisingly unpromising way of increasing LS in developed countries (as increasing the wealth of some reduces the happiness of others). I show this also seems to happen at low levels of income, such that treating mental health via a charity like StrongMinds looks much more cost-effective. Then I illustrate how to compare life-saving to life-improving charities. I claim it’s unclear, due to some methodological issues, which of these interventions increases LS more effectively.

6.1 Life-improving charities

Let’s start with GiveDirectly, a charity which provides unconditional cash transfers to Kenyan farmers. There have now been three studies of it that used life satisfaction data (alongside other SWB metrics). Research suggests GiveDirectly’s cash transfers increase life satisfaction by about 0.3 points on a 0-10 scale.^²⁰ This was measured after 4.3 months on average. However, let’s assume that:

● this effect lasts a whole year;
● it applies to everyone in the recipient household;
● there are 5 people per household on average.

The average cash transfer is $750. Under these assumptions, it generates 1.5 LSPs (0.3 x 1 x 5), implying a cost-effectiveness of 2 LSPs/$1000.
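The GiveDirectly arithmetic above, written out as a minimal sketch. The constants are the figures and assumptions stated in the text. The estimate counts recipient households only, so it ignores the spillovers discussed next.

```python
LS_GAIN = 0.3          # LS points per recipient (from the studies cited above)
YEARS = 1.0            # assumption: the effect lasts a whole year
HOUSEHOLD_SIZE = 5     # assumption: everyone in the household gains equally
TRANSFER_COST = 750    # average transfer, in dollars

total_lsps = LS_GAIN * YEARS * HOUSEHOLD_SIZE
lsps_per_1000 = 1000 * total_lsps / TRANSFER_COST

print(f"{total_lsps:.1f} LSPs per transfer")   # 1.5 LSPs per transfer
print(f"{lsps_per_1000:.1f} LSPs per $1,000")  # 2.0 LSPs per $1,000
```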

However, this estimate is likely to substantially overstate the effectiveness of cash transfers, because it only accounts for the life satisfaction increase of recipients. Research into GiveDirectly has suggested that their cash transfers make some people wealthier (and so more satisfied with life) but also have negative spillovers: they make non-recipients less satisfied. As Haushofer, Reisinger and Shapiro (2015, p1) state:

The decrease in life satisfaction induced by transfers to neighbors more than offsets the direct positive effect of transfers, and is largest for individuals who did not receive a direct transfer themselves.

This might seem surprising, but the finding that increasing wealth leaves aggregate LS relatively unchanged is entirely consistent with the findings in table 1 and results mentioned in Clark (2016) above.

We might hope these negative spillovers would dissipate eventually and, over the long run, cash transfers would be effective in increasing life satisfaction. However, a new 2018 study on the long-term (3 year) effects of GiveDirectly by Haushofer and Shapiro (2018, p. 22) finds recipients, compared to non-recipients in distant villages, have 40% more assets but that recipients do no better on a psychological well-being index. GiveWell discuss this study, note it suggests cash transfers are less effective than they thought, but state they are awaiting the results of GiveDirectly’s “general equilibrium” study, which aims to assess spillover effects, before updating their cost-effectiveness assessment.²¹

I also look forward to further research, but for the moment I think the evidence suggests it’s far from obvious cash transfers have a robust, positive effect on happiness (measured as life satisfaction) in either the short or the long-term. Many people assume cash transfers must increase happiness, but it’s unclear what evidence someone could produce to support this intuition.

There are a few objections one could make to defend the effectiveness of cash transfers here.

First, one could simply ignore the negative spillovers. It’s unclear how this would be justified. Even if we did, as I will shortly show, StrongMinds, a mental health charity, looks more cost-effective anyway.

Second, we could say that LS scores give the intuitively wrong answer here and therefore are not a valid measure of happiness (specifically, the claim is that they lack construct validity). This response seems unconvincing. The critic would need to explain how LS seems to get the wrong result in this case whilst getting the right result in many other areas. If LS measures succeed in measuring individuals’ life satisfaction, presumably they do so in general.

Third, one could claim that happiness is not all that matters. Yet, presumably, it is one of the things that matters. Hence, most (if not all) people will want to consider the impact on happiness. Plausibly, cash transfers could be justified on non-happiness grounds, such as autonomy promotion. Someone who pressed this would need to determine how to trade off happiness against autonomy and come to an overall decision about which charity did the most good; this is not a concern I can address here.²²

Fourth, we might hope there are long-term, societal effects of increasing wealth, even if it doesn’t increase the aggregate life satisfaction of the immediate recipients and their neighbours over the first 3 or so years. Note this would be a very different justification for donating to GiveDirectly from the usual one, which is that it benefits the recipients. It would require substantially altering the cost-effectiveness analysis. Further, it’s not obviously true that increasing a country’s wealth will increase aggregate life satisfaction. I’ve already mentioned the Easterlin paradox, which refers to developed countries. A particularly arresting case of development from a low level failing to increase happiness is China: its SWB seems to have fallen between 1990 and 2015, even though per capita GDP increased fivefold (Easterlin, Wang and Wang 2017).²³

Fifth, we might think this problem could be avoided if money were given to everyone in the village, rather than just to some. This ignores the concerns about social comparisons. If social comparisons occur, then we would expect making everyone in village A richer to reduce life satisfaction in village B, an adjacent non-recipient village. We could respond to this with “Fine. But what if we made everyone richer?” which is a restatement of the previous objection.

To be clear, the concern about GiveDirectly isn’t that it is an ineffective way of alleviating poverty. Rather, the concern is that alleviating poverty is surprisingly ineffective at increasing happiness (measured as LS). Thus, from the LS perspective, it is unsatisfactory to object that other top-rated GiveWell charities are more effective than GiveDirectly at alleviating poverty. What needs to be shown is that alleviating poverty increases happiness, and it’s unclear what evidence supports this thesis.

We should now be concerned about the happiness-increasing effectiveness of all of GiveWell’s life-improving charities. Those charities are deemed effective on the assumption that they increase wealth and that increasing wealth does good overall. To illustrate, the Schistosomiasis Control Initiative (SCI), a charity which treats children for intestinal worms, is a top-rated GiveWell charity. In fact, GiveWell consider it more cost-effective at doing good than GiveDirectly. We might think the worries about the ineffectiveness of poverty alleviation would not apply here, as SCI provides a physical health treatment. Yet GiveWell claim that only 2% of SCI’s impact comes from ‘short-term health gains’. The remaining 98% arises from ‘eventual income and consumption gains’: dewormed children earn more in later life, and their well-being rises as a result of this additional income. Hence, the same doubts extend to SCI as well.

An objection here is that we should expect interventions which help people earn their own money, such as by improving their health, to increase happiness by more than cash transfers, which simply give money to them.^²⁴ This seems unlikely: it’s generally argued that the merit of cash transfers is that people can invest the money and use it to earn more for themselves later. Hence the long-term value of cash transfers comes from earned income too.

Now we turn to StrongMinds, a mental health charity that provides interpersonal group therapy to women in Uganda. The LS analysis of Clark et al. (2018) suggests treating mental health is among the most cost-effective ways for developed-world governments to increase happiness. It is therefore the natural first place to look when searching for an effective charitable intervention. There is no research which has directly measured the LS impact of treating mental health, so I estimate its effectiveness using other data.^²⁵ To save space I have put this analysis into a spreadsheet. I infer that the treatment effect is a gain of 0.2 LS points per year for 4 years. StrongMinds say their per-participant costs are $102 (StrongMinds Q1.2018 report). That suggests the impact is 0.8 LSPs (4 years x 0.2 LS gain) per $102, or 8 LSPs/$1000 (rounding up from 7.84). There is not space here to go into the details of mental health treatments or argue they are effective.^²⁶
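The StrongMinds estimate above, as a minimal sketch using the figures stated in the text. Unlike the GiveDirectly calculation, there is no household multiplier here: only the participant's own gain is counted.

```python
LS_GAIN_PER_YEAR = 0.2      # inferred treatment effect, in LS points
YEARS = 4                   # assumed duration of the effect
COST_PER_PARTICIPANT = 102  # dollars, per StrongMinds' own reporting

total_lsps = LS_GAIN_PER_YEAR * YEARS           # 0.8 LSPs per participant
lsps_per_1000 = 1000 * total_lsps / COST_PER_PARTICIPANT

print(f"{lsps_per_1000:.2f} LSPs per $1,000")   # 7.84 LSPs per $1,000
```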

Through the LS lens, StrongMinds is more effective than GiveDirectly (and other poverty-alleviating charities) simply because it seems to clearly increase net happiness.^²⁷ It’s possible another charity or organisation will be much more effective at increasing happiness than StrongMinds, but that is a topic I plan to cover elsewhere.²⁸

For those interested in increasing happiness, this result is important. Using the empirical data on happiness illuminates a new category of intervention, treating mental health, that effective altruists have so far overlooked and that now appears more cost-effective than alleviating poverty.^²⁹ Part of this must be due to effective altruists’ historic reliance on health metrics (QALYs and DALYs), which seem to underrate the happiness impact of mental health conditions relative to physical ones. I discuss this in Annex A.

6.2 Life-saving charities

As noted in the introduction, the key question in GiveWell’s cost-effectiveness evaluation is “how many years of doubled consumption are as morally valuable as saving the life of an under-5-year-old child?” GiveWell need an answer to this to compare the cost-effectiveness of life-saving to life-improving charities. As I stated, they answer it by combining judgements about facts with judgements about value.

Using LS scores we can take a different approach to making this comparison, which is to work out the cost-effectiveness of life-saving interventions in LSPs. First we need to know the cost to save a life. According to GiveWell’s estimates, the Against Malaria Foundation (AMF) saves a life (i.e. prevents a premature death) for around $3,500.³⁰

Second, we need the number of years that person would have lived for. Suppose AMF grants 60 counterfactual years of life.

Third, we need to establish how many net LSPs the person gains per year, i.e. how much better their life is than the ‘neutral point’ equivalent to being dead. Average life satisfaction in Kenya, where AMF operates, is 4.4/10 (Helliwell, Layard and Sachs 2017, p28). Now we run into a problem. Life satisfaction surveys don’t ask people to specify what point on the 0 to 10 scale they would consider equivalent to not being alive. 0 is labelled ‘not at all’ and 10 ‘completely satisfied’. Intuitively, the midpoint of the scale, 5, would be the neutral point. Yet, if that’s true, then saving lives through AMF would, in fact, be bad: 4.4 is below the neutral point, so AMF would be prolonging bad lives, lives worth not living.³¹

Let’s suppose instead the neutral point is 4. If this is so, saving the child is worth 0.4 life satisfaction points a year for 60 years, thus 24 LSPs (0.4 x 60).

Given the $3,500 cost, we can calculate cost-effectiveness as 6.9 LSPs/$1,000. Earlier, I estimated StrongMinds’ cost-effectiveness was around 8 LSPs/$1,000. Hence, we can now compare our life-saving and life-improving interventions in the same units of cost-effectiveness.

A problem for our analysis is that these cost-effectiveness numbers are highly dependent on a (so far) arbitrary decision about where the neutral point goes. Suppose someone instead set the neutral point at 3, which intuitively seems too low. AMF’s cost-effectiveness would then leap to about 24 LSPs/$1,000 (1.4 x 60 = 84 LSPs per $3,500), and it would be more cost-effective than StrongMinds.
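The sensitivity to the neutral point can be sketched by sweeping it across a few values. The inputs are the text's figures: average LS of 4.4, 60 counterfactual years, a $3,500 cost per life, and the StrongMinds estimate. The function name is illustrative.

```python
AVG_LS_KENYA = 4.4     # average LS in Kenya (Helliwell, Layard and Sachs 2017)
YEARS_GAINED = 60      # assumed counterfactual years of life
COST_PER_LIFE = 3_500  # dollars per life saved (GiveWell estimate)
STRONGMINDS = 1000 * 0.2 * 4 / 102  # about 7.84 LSPs/$1,000, estimated earlier


def amf_lsps_per_1000(neutral_point: float) -> float:
    """LSPs per $1,000 for saving a life, given a neutral point on the 0-10 scale."""
    net_ls_per_year = AVG_LS_KENYA - neutral_point  # negative if below neutral
    total_lsps = net_ls_per_year * YEARS_GAINED
    return 1000 * total_lsps / COST_PER_LIFE


for neutral in (5, 4, 3):
    value = amf_lsps_per_1000(neutral)
    verdict = "beats" if value > STRONGMINDS else "trails"
    print(f"neutral point {neutral}: {value:+.1f} LSPs/$1,000 ({verdict} StrongMinds)")
```

Moving the neutral point by a single point, from 4 to 3, multiplies AMF's estimated cost-effectiveness by three and a half. A neutral point of 5 makes it negative. The ranking against StrongMinds flips within this range.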

How could we settle where the neutral point is? Two strategies seem possible. First, we could find out at which LS scores individuals report, on experience measures of SWB, that their lives feel neutral. Second, we could poll people to ask at what LS score out of 10 they would be indifferent between living with that score for the rest of their life and dying. I am unaware of any analysis conducted along either line. Thus, for the moment, this unfortunately remains a matter of armchair conjecture.

Thus, using LS scores, we can establish the net happiness of different outcomes. This is a question of fact. While we can make much of the calculation without relying on subjective judgements, we have had to rely on one regarding the location of the neutral point.

What is a moral judgement, however, and one that must be made implicitly or explicitly, is how bad death is. On the ‘life comparative’ account of the badness of death, the value of saving a life is the total well-being the person would have had if they’d lived. The numbers I’ve produced above implicitly assumed this was the correct view.

One popular alternative view about the badness of death, and the view GiveWell staffers take, is the Time-Relative Interest Account (TRIA).^³² According to TRIA, it’s more important to save (say) 20-year-olds than 2-year-olds, even though saving the 2-year-old would, we suppose, cause around 18 more years of life to be lived. The reason to count the 2-year-old for less is that very young children, being relatively underdeveloped, have a weaker interest in continuing to live than a fully-developed 20-year-old. We can see from the ‘moral weights’ tab of GiveWell’s cost-effectiveness analysis that GiveWell staffers seem to adopt TRIA.^³³ Depending on the staff member, the value of saving an over-5-year-old is 100% to 400% of the value of saving an under-5-year-old; the median weight is 200%.^³⁴

Note that advocates of TRIA will still need to know the total well-being the person would have had. TRIA takes this as an input and then discounts it according to the age of the person to determine the value of saving the life (ascertaining this discount is a moral judgement).

Let’s suppose we are TRIA advocates and so halve the cost-effectiveness of AMF. If the neutral point is 3, the cost-effectiveness of AMF is about 12 LSPs/$1,000, only about 50% more than the estimate for StrongMinds.^³⁵ If the neutral point is 4, AMF is 3.5 LSPs/$1,000 and thus less cost-effective.
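The TRIA adjustment can be sketched as a weight on the total well-being the saved child would have had. The 0.5 weight is the text's simplifying assumption, reflecting the median 200% weight on saving an over-5-year-old. The other inputs are the text's figures, and the parameter name `tria_weight` is illustrative.

```python
AVG_LS_KENYA = 4.4
YEARS_GAINED = 60
COST_PER_LIFE = 3_500
STRONGMINDS = 1000 * 0.2 * 4 / 102  # about 7.84 LSPs/$1,000


def amf_lsps_per_1000(neutral_point: float, tria_weight: float = 1.0) -> float:
    """LSPs/$1,000 for saving a life.

    tria_weight scales the total well-being the person would have had:
    1.0 is the life-comparative account; 0.5 is the simplified TRIA
    discount for an under-5-year-old used in the text.
    """
    total_lsps = (AVG_LS_KENYA - neutral_point) * YEARS_GAINED
    return 1000 * tria_weight * total_lsps / COST_PER_LIFE


for neutral in (3, 4):
    value = amf_lsps_per_1000(neutral, tria_weight=0.5)
    print(f"neutral {neutral}: {value:.1f} LSPs/$1,000, "
          f"{value / STRONGMINDS:.2f}x StrongMinds")
```

Because the TRIA discount and the neutral point enter the calculation multiplicatively, both judgements need to be made before AMF and StrongMinds can be ranked.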

Further complications arise when we try to account for the ‘social value’ of saving lives, i.e. the impact saving a life has on everyone apart from the saved individual. We can divide this up into:

(1) the effects of the death on friends and family;
(2) concerns about under- or overpopulation;^³⁶
(3) the meat eater problem, i.e. the impact saving lives has on increasing animal suffering due to meat consumption.^³⁷

(2) and (3) are important but too much of a diversion from our discussion here on using LS measures. Figures for (1) could be derived using LS data. Oswald and Powdthavee (2008) estimate that, shortly after the event, the death of a child causes a 0.6 loss (on a 0-10 LS scale), that of a spouse a 1.3 loss, and that of a parent a 0.4 loss. Further work would be needed to assess the total counterfactual LS impact of saving a life on friends and family over time, and to adjust these figures for the developing country setting.³⁸

Having explained the LS approach and shown how it reaches different results from GiveWell, the natural next step would be to explain GiveWell’s method. We could then ask whether the differences emerge from disagreements about value or disagreements about facts. Unfortunately, this is not straightforward. GiveWell’s key metric (again, trading off years of doubled consumption against the value of saving an under-5-year-old) combines judgements of fact and value. I understand some GiveWell staff incorporate SWB surveys into their judgements, but I am not aware of any staff member who bases their analysis solely on them. What’s more, as GiveWell take the median answer of their staff, the ‘GiveWell view’ is a composite held by no one in particular. Hence it’s impossible to know exactly where the disagreements lie. My approach has been to make clear what the best available evidence about happiness says about the happiness impact of various outcomes. This is information everyone, I expect, will need to take into account. Readers who do not believe the best outcome is the one with the largest sum of happiness will need to adjust the analysis accordingly.

To summarise this section: it is now practically possible to use self-reported life satisfaction scores to determine how much various outcomes increase happiness. I applied the LS approach to evaluating the cost-effectiveness of GiveWell’s top charities and showed poverty-alleviating charities are unpromising compared to mental health interventions; empirical and philosophical questions remain over comparing the value of life-saving to life-improving interventions.

7. What should effective altruists do next?

Relying on life satisfaction (and other SWB measures) to tell us how to maximise human happiness is a new approach for effective altruists. Suppose we think this is the correct method, as I have argued we should, and we accept its results. Then it challenges the current assumptions within EA about how to increase happiness. Most obviously, it suggests looking at the best ways to improve mental health, which is currently not regarded as a priority. The above analysis, comparing developing-world charities, is just the first step in ascertaining how individuals can maximise happiness with their time and money. Much more work is required:

● We will need new evaluations of interventions and charities.
● We will need to identify the relevant and useful players within the happiness-increasing space and build a community around it.
● We will need to think about what high-impact careers look like for those who want to maximise human welfare.^³⁹
● We will need to develop a research strategy and determine its priorities.

This is a large challenge. After EAGxNetherlands in 2018, a small ‘Human Welfare Task Force’ (HWTF) formed to think about how we might take this forward. It currently consists of myself, Alex Lintz, Denisa Pop, Robin van Dalen, Siebe Rozendal, Peter Brietbart and Jessica van Haaften. If you wish to be involved, please email siebe[at]eagroningen.org and michael.plant[at]philosophy.ox.ac.uk.

If you agree with the Manifesto, here are some other concrete actions you can take:

● Arrange a local EA gathering where you discuss this issue, possibly watching my talk on Maximising World Happiness from EA Global London 2017.

● Get yourself up to speed on the latest research. I’ve produced a Reading List: Happiness for Effective Altruists and Other Humans.

● We have started working on a Human Welfare Research Agenda of questions we think need answering. You are welcome to make suggestions or pick something from the list to begin investigating. If you would like to investigate something, please get in touch so we can coordinate more effectively. We are also collecting some other research documents in our Google Drive folder.

● Join the Facebook group Effective Altruism, Mental Health and Happiness.

● If you currently donate to anti-poverty charities, you could switch your donation to an effective mental health charity instead. That said, given the uncertainty about the best way to promote human welfare, you may want to wait for further information. We are considering the possibility of setting up a research organisation focused on human happiness. If you’re interested in funding research into this area, please email me.

Annex A: How good are QALYs and DALYs as proxies for happiness?

Effective altruists have tended to use health metrics (QALYs and DALYs) as the proxy for WALYs. However, these standard health metrics are misleading proxies for happiness. For ease, I quote at length from Clark et al. (2018, p85):

In the QALY system, the impact of a given illness in reducing the quality of life is measured using the replies of patients to a questionnaire known as the EQ5D. Patients with each illness give a score of 1, 2, or 3 to each of five questions (on Mobility, Self-care, Usual Activities, Physical Pain, and Mental Pain). To get an overall aggregate score for each illness a weight has to be attached to each of the scores. For this purpose members of the public are shown 45 cards on each of which an illness is described in terms of the five EQ5D dimensions. For each illness members of the public are then asked, “Suppose you had this illness for ten years. How many years of healthy life would you consider as of equivalent value to you?” The replies to this question provide 45N valuations, where there are N respondents. The evaluations can then be regressed on the different EQ5D dimensions. These “Time Trade-Off” valuations measure the proportional Quality of Life Lost (measured by equivalent changes in life expectancy) that results from each EQ5D dimension.

As can be seen, these QALY values reflect how people who have mostly never experienced these illnesses imagine they would feel if they did so. A better alternative is to measure directly how people actually feel when they actually do experience the illness.

The result would be very different. Figure [4] contrasts the outcomes from these two different approaches. The existing QALY weights are shown by the shaded bars of Figure [4]. This scale has been normalized so that the bars can be compared with those from a regression of life-satisfaction on the same variables. This latter regression is shown in the black bars in the figure—the magnitudes here are not β-statistics but the absolute impact of each variable on life-satisfaction (0–1). As can be seen from the lower part of the figure, the public hugely underestimated by how much mental pain (compared with physical pain) would reduce their satisfaction with life.

Figure 4. How life satisfaction (0-1) is affected by the EQ5D, compared with weights used in QALYs
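The Time Trade-Off step in the quotation above can be sketched for a single illness card. This is a minimal sketch: the three answers are hypothetical, and the full procedure's regression across all 45 cards and the five EQ5D dimensions is omitted. The key point is that the inputs are imagined valuations, not reports from people experiencing the illness.

```python
YEARS_WITH_ILLNESS = 10.0  # the card asks about living with the illness for ten years


def tto_weight(equivalent_healthy_years: float) -> float:
    """Quality-of-life weight implied by one Time Trade-Off answer (1.0 = full health)."""
    return equivalent_healthy_years / YEARS_WITH_ILLNESS


# Hypothetical answers from three members of the public for the same card.
answers = [7.0, 8.0, 8.5]
weights = [tto_weight(a) for a in answers]
mean_weight = sum(weights) / len(weights)
qol_lost = 1 - mean_weight  # proportional quality of life lost

print(f"mean weight {mean_weight:.3f}, quality of life lost {qol_lost:.3f}")
```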

QALYs are not a very good guide to what makes people satisfied. They are based on people’s preferences over how bad they imagine various health states are, rather than how bad those states are when people experience them, and, as noted earlier, we are not very good at imagining what makes us or others happy. To highlight a particularly striking discrepancy, Dolan and Metcalfe (2012, from whom the above figure 4 is derived) report that subjects agreed to hypothetically give up as many years of their remaining life, about 15%, to be cured of ‘some difficulty walking’ as they would to be cured of ‘moderate anxiety or depression’. However, on SWB measures ‘moderate anxiety or depression’ is associated with a 10 times greater loss of life satisfaction, and an 18 times greater loss of daily affect, than ‘some difficulty walking’. This seems compelling evidence, if any were needed, that if we rely on people’s preferences about imagined futures we will get the wrong answers about what makes individuals happy. The explanation is that, when imagining the future, we fail to anticipate that our ‘psychological immune system’ will ‘kick in’ and cause us to adapt to some circumstances but not others: what Gilbert et al. (2009) call ‘immune neglect’. Conditions such as mobility impairment are things we stop paying attention to, whereas mental illnesses are comparatively ‘full-time’ and continue to affect our subjective experiences.

I’m unaware of any studies comparing DALYs and SWB measures directly, but given how DALYs are constructed - typically by asking experts for ratings - we would expect the same problems to occur. See Sassi (2006) for a comparison of the methodologies for QALYs and DALYs.

The implication of this analysis is that we should substantially lower our estimates of how cost-effective physical health interventions are relative to mental health interventions. This applies wherever we had previously judged them by QALYs and DALYs, as Giving What We Can did in their reports into mental health (GWWC 2015, 2016). Thus, unless we find physical health interventions that are incredibly cheap compared to mental health treatments, we should be sceptical that physical health interventions will turn out to be comparatively more cost-effective. The possible exception would be using opiates to treat severe pain: pain is clearly very bad for well-being and opiates can be very cheap (Knaul et al. 2018).

Perhaps effective altruism has overlooked mental illness largely because of the movement’s early reliance on QALYs/DALYs as an approximation of well-being. Given how much QALYs underrate the badness of mental illness, it’s not much of a surprise that individuals using those metrics would be led to the (false) conclusion that mental health is comparatively unimportant.

1 I am very grateful to the following for their helpful comments on this document: James Snowden, John Halstead, Sjir Hoeijmakers, Kellie Liket, Denisa Pop, Peter Brietbart, Jessica van Haaften-Carr, Siebe Rozendal and Alex Lintz. Special thanks must go to the latter two, who also proofread multiple drafts and removed my innumerable mistakes.

2 Assuming we mean ‘best’ in a moral, rather than merely aesthetic, sense.

3 Technically, (a) and (b) suffice only to give us a fixed-population axiology. A third component (c), specifying who the bearers of value are (i.e. present, actual, necessary or possible people), is needed to give us a variable-population axiology. Perhaps confusingly, what philosophers call a ‘population axiology’ usually just specifies (b) and (c).

4 Both think the value of a state of affairs is the total happiness of those who exist in that state of affairs (hedonic totalism?)

5 See Bronsteen et al. (2013) for the argument that cost-benefit analysis should be replaced with ‘well-being analysis’, i.e. the use of SWB measures. Clark et al. (2018) argue SWB should be used to determine policy and set out the state of the art on how this could be done. Dolan and Metcalfe (2012) and Dolan, Layard and Metcalfe (2011) make recommendations for how governments should measure and use well-being.

6 The relationship between what social scientists call ‘SWB’ and philosophers call ‘well-being’ is too big a digression for this document.

7 Arguably, an even better method would be to measure brain waves, assuming we could correlate happiness with brain states.

8 OECD (2013, p20) notes “during the 1990s there was an average of less than five articles on happiness or related subjects each year in the journals covered by the Econlit database. By 2008 this had risen to over fifty each year”

9 E.g. Thinking, Fast and Slow by Kahneman (2011)

10 E.g. see the Sen-Fitoussi-Stiglitz report: Commission on the Measurement of Economic and Social Progress (2009)

11 In a meta-analysis, Luhmann et al. (2012) compare the rates of adaptation on evaluative and experience measures of SWB, finding some differences.

12 A doubt here is whether, given the diversity of experiences, there is any one property that all happy (and unhappy) experiences have, such that we can quantify them on a single scale. Following Crisp (2006) I think there is: the property of pleasantness, or ‘hedonic tone’. Even though headaches and heartbreaks feel different, I find nothing confusing in saying one can feel as bad as another. For dissent, see, e.g., Nussbaum (2012).

13 An alternative motivation for measuring life satisfaction is that it captures a cognitive judgement of the extent to which someone’s preferences are satisfied - i.e. the world is going the way they want it to - rather than a feeling. It is unclear whether this cognitive judgement could be cardinal. As Hausmann (1995) argues at length, while we can order preferences (preferences are about how entire worlds go), there is in principle no conceptual unit of distance between our ranked preferences to generate cardinality. This is too much of a diversion to discuss further here. Given we are ultimately interested in happiness, and evaluations are relevant only as proxies for it, it seems convenient to side-step the problem and say evaluations capture a felt strength of satisfaction.

14 The worry individuals may self-report in this way has been raised with me separately by Toby Ord and James Snowden.

15 If it turned out reported SWB was a function of log(actual SWB) that wouldn’t make it impossible to make interpersonal cardinal comparisons: we simply need to convert individuals’ scores onto a linear scale.
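To make footnote 15 concrete, here is a minimal sketch that assumes, purely for illustration, that every respondent reports the natural log of some positive "actual" SWB quantity. The function names are hypothetical, and the log form is the footnote's hypothetical, not an established finding.

```python
import math

def reported_from_actual(actual_swb: float) -> float:
    """Hypothetical reporting function: the reported score is the natural log of actual SWB."""
    return math.log(actual_swb)

def actual_from_reported(reported: float) -> float:
    """Invert the assumed log reporting function to recover a linear-scale value."""
    return math.exp(reported)

# Reported scores one point apart...
reports = [1.0, 2.0, 3.0]
linear = [actual_from_reported(r) for r in reports]
# ...correspond to unequal gaps in actual SWB once converted back to a linear scale.
gaps = [b - a for a, b in zip(linear, linear[1:])]

print([round(x, 2) for x in linear])  # [2.72, 7.39, 20.09]
print([round(g, 2) for g in gaps])    # [4.67, 12.7]
```

The conversion only supports interpersonal comparisons if everyone shares the same reporting function. The sketch builds that in by applying a single function to all scores.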

16 This same thinking will not extend to comparing humans to animals, or comparing humans to some hypothetical humans genetically modified to experience greater happiness.

17 MacAskill explicitly states this in Doing Good Better

18 Suppose someone thought there are two intrinsic goods, happiness and autonomy. They would need a measure of autonomy, perhaps autonomy-adjusted life years (AALYs), and they would need to set up a conversion rate between LSPs and AALYs to determine which outcome did the most good on the composite measure.
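Footnote 18's composite measure can be sketched as a simple weighted sum. Everything here is hypothetical: the `Outcome` class, the AALY figures, and especially the exchange rate `LSP_PER_AALY`, which the footnote leaves unspecified.

```python
from dataclasses import dataclass

# Hypothetical exchange rate between the two goods; the footnote does not specify one.
LSP_PER_AALY = 2.0

@dataclass
class Outcome:
    name: str
    lsps: float   # life satisfaction points produced
    aalys: float  # hypothetical autonomy-adjusted life years produced

    def composite_value(self, lsp_per_aaly: float = LSP_PER_AALY) -> float:
        """Convert AALYs into LSPs so both goods are scored on one scale."""
        return self.lsps + self.aalys * lsp_per_aaly

outcomes = [Outcome("A", lsps=10.0, aalys=1.0), Outcome("B", lsps=6.0, aalys=4.0)]
best = max(outcomes, key=lambda o: o.composite_value())
print(best.name, best.composite_value())  # B 14.0
```

If the rate is lowered to 0.5, A becomes the better outcome (10.5 vs 8.0 LSP-equivalents). The composite ranking therefore depends directly on the conversion rate.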

19 See the ‘results’ table of v4 of GiveWell’s CEA: only 2% of the benefit of deworming charities is due to health. I return to this momentarily.

20 References and calculations available in this spreadsheet: Life satisfaction impact of treating mental health vs alleviating poverty

21 Available at: https://blog.givewell.org/2018/05/04/new-research-on-cash-transfers/

22 See footnote 13.

23 The authors attribute this decline (or, lack of increase) to reductions in job security and vast labour movements which eroded the social fabric of society.

24 John Halstead suggested this concern to me.

25 According to Layard (personal conversation), no work has yet been done to measure the impact of mental health treatment using LS scores.

26 See Thrive by Layard and Clark (2015) for a good book-length summary.

27 StrongMinds is still four times more effective than GiveDirectly even if, for some unclear reason, GiveDirectly’s negative spillovers are ignored.

28 For example, Friendship Bench, a Zimbabwean mental health programme, has had an RCT conducted on its intervention (Chibanda et al. 2015). The study shows that the intervention is more effective, in terms of per-participant improvement in standardised mental health scores, than either StrongMinds’ evaluation of their own programme or my estimate of StrongMinds’ programme. I do not (yet) have per-participant costs for Friendship Bench, so I cannot estimate its cost-effectiveness.

29 Neither MacAskill nor Singer mentions it in their 2015 books. GWWC does have two blog posts on mental health (2015, 2016), but both say - based on a DALY analysis - it looks less cost-effective than other health interventions; of the more than 700 international charities GiveWell has listed as ‘contacted’ or ‘considered’, only one appears related to mental health.

30 GiveWell, “2018 GiveWell Cost-Effectiveness Analysis — Version 4,” 2018, https://docs.google.com/spreadsheets/d/1moyxmsn4UjhH3CzFJmPwAN7LUAkmmaoXDb6bdW3WILg/edit#gid=1364064522.

31 Some people I’ve spoken to have suggested it’s bad to save lives solely on the grounds that those in the developing world lead lives below the neutral point.

32 TRIA is most associated with Jeff McMahan. See Liao (2007) for a discussion.

33 An alternative possibility is that they adopt the life comparative account but think it’s better to save older people because the ‘social value’ of saving lives - the impact on others - is greater for older than younger people.

34 Interestingly, we can also use LS scores as an intuition check on the relative weights GiveWell staffers attach to saving an under-5 vs doubling consumption for 1 year. The median GW staffer thinks saving an under-5 is 50 times more valuable than doubling income for 1 year. Assuming the child would live 60 years, each year of the child’s life must be worth 50/60 ≈ 0.83 times the effect of doubling income for a year. That implies, for instance, that the child is 0.83 LSPs above the neutral line (plausible if the child lives at 4.8/10 and death is 4) and that doubling 1 person’s income for a year has a 1 LSP effect (not at all plausible given earlier evidence).
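The arithmetic behind footnote 34 can be checked in a few lines. The 50x ratio and the 60-year lifespan come from the footnote; the function name is illustrative.

```python
RATIO_SAVE_VS_DOUBLE = 50  # median GiveWell staffer: saving an under-5 vs doubling income for 1 year
YEARS_SAVED = 60           # assumed remaining life of the child

def implied_per_year(doubling_effect_lsp: float) -> float:
    """LSPs per year above neutral the saved child must have for the 50x weighting to hold."""
    return RATIO_SAVE_VS_DOUBLE * doubling_effect_lsp / YEARS_SAVED

print(round(implied_per_year(1.0), 2))  # 0.83
```

Read the other way, if the child sits 0.83 LSPs above neutral, the weighting implies that doubling income for a year is worth about 0.83 × 60 / 50 ≈ 1 LSP. A smaller income effect would imply a proportionally smaller per-year gain for the child.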

35 It would be more accurate to say that AMF is deemed to have a cost-effectiveness morally equivalent to a life-improving intervention producing 12.2 LSPs/$1,000.

36 See Greaves (2018) for a summary arguing it’s unclear whether the Earth is under- or overpopulated

37 See e.g. Weathers (2016) and EA concepts (n.d.). Note the meat eater problem also provides a further reason to reduce the value of economic development, not just the value of saving lives: the longer people live and the richer they become, the more meat they eat.

38 If a loss of life is to be expected, it is presumably less bad.

39 Outside the EA world, there are people working on human happiness. Some economists work on what governments can do for their citizens. Some psychologists work on how individuals can find happiness for themselves. Some physicians work on how to improve mental health treatments. Some organisations, such as Action for Happiness, seek to teach individuals to promote happiness in their own lives and communities. What is lacking is any organisation that seeks to answer the EA question: how can individuals use their spare resources to make other people as happy as possible? One answer to this question is, of course, ‘coordinate with one another’.

undefined @ 2018-11-26T04:21 (+9)

Thanks for writing this very detailed analysis. I especially enjoyed the arguments for why we can compare LS scores between people, like the Canadian immigrant study.

The section I found most surprising was the part on GiveWell using the Time-Relative Interest Account. I've always thought of some kind of egalitarianism as being relatively important to EA - the idea that all people are in some sense equally deserving of happiness/welfare/good outcomes. We might save a young person over an old person, but this is only because by doing this we're counterfactually saving more life-years.

For example, here is Giving What We Can:

People[2] are equal — everyone has an equal claim to being happy, healthy, fulfilled and free, whatever their circumstances. All people matter, wherever they live, however rich they are, and whatever their ethnicity, age, gender, ability, religious views, etc. [emphasis added]

But the TRIA explicitly goes against this. It directly weighs a year of health for a 25 year old as being inherently more valuable than a year of health for a 5 year old - or a 50 year old. This seems very perverse. Is it really acceptable to cause a large amount of pain to a child, in order to prevent a smaller amount of pain for an adult? I think the majority of people would not agree with this - if anything people prefer to prioritize the suffering of children over that of adults.

undefined @ 2018-11-30T17:26 (+3)

Hello Larks. Glad you found it useful. On equality, it's going to turn on how you think equality should be understood. If you think we should give equal weight to the 'time-relative-interest-adjusted' value of people's lives, you might think it is correct to believe that saving a 25-year-old is better for that person than saving a 2-year-old is for that person.

FWIW, intuitions seem quite split on deprivationism vs TRIA about deaths. What people find weird about deprivationism is that there is some sharp point when someone starts to matter. Say someone begins to exist 90 days after conception. Then saving someone at 89 days would be morally unimportant, whereas saving them at 91 days would be hugely important. TRIA, by contrast, takes a more gradual approach.
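Here is a toy comparison of the two views, under loudly hypothetical assumptions. Age is counted in days from conception, a life lasts 70 years, the 90-day threshold comes from the example, and TRIA's time-relative interest is modelled as a weight rising linearly to 1 by 20 years after conception. None of these parameters come from the literature; the point is only the shape of the two functions.

```python
# Hypothetical parameters: ages in days since conception.
EXISTENCE_DAY = 90                 # threshold from the example above
LIFE_EXPECTANCY_DAYS = 70 * 365    # assumed lifespan, for illustration only
TRIA_FULL_DAY = 20 * 365           # assumed age at which time-relative interest is full

def deprivationist_value(age_days: int) -> float:
    """Life-days lost by death, counted only once the person has begun to exist."""
    if age_days < EXISTENCE_DAY:
        return 0.0
    return float(LIFE_EXPECTANCY_DAYS - age_days)

def tria_value(age_days: int) -> float:
    """Life-days lost, weighted by a time-relative interest that rises gradually with age."""
    weight = min(age_days / TRIA_FULL_DAY, 1.0)
    return weight * (LIFE_EXPECTANCY_DAYS - age_days)

for day in (89, 91):
    print(day, round(deprivationist_value(day)), round(tria_value(day)))
```

In this sketch, the deprivationist value jumps from 0 to roughly 25,000 life-days between day 89 and day 91, while the TRIA-style value barely moves.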

undefined @ 2018-11-30T23:49 (+9)

It seems to me that TRIA is really stretching the definition of 'equality'. Could I not equally suggest a Citizenship-Relative-Interest-Account? This would fit well with people's nationalistic intuitions. Indeed, if we look at the list of things GWWC claimed EAs do not discriminate based on, we could circumvent all of them with cunningly crafted X-Relative-Interest-Accounts.

I agree a moral discontinuity would be very perverse. But it seems there are many better options. For example, a totalist view - that people matter even before they are conceived - avoids this issue, and doesn't suffer from the various inconsistencies that person-affecting views do. Alternatively, if you thought that we should not value people who don't exist in any way, conception provides a clear discontinuity in many ways, such that it does not seem like it would be weird if there was a moral value discontinuity there also.

But I think the biggest problem is that, even if you accept TRIA, I suspect that most people's moral intuitions would produce a very different weighting distribution. Specifically, they would be more averse to causing pain to 5 year olds than adults - especially adult men. If I have time I might look into whether there has been any empirical research on the subject; it could be a useful project.

undefined @ 2018-12-01T11:47 (+3)

First, I want to say that I do not endorse TRIA. This post wanted to look at applying the SWB approach given what people's moral views seem to be, rather than evaluate how good those views are. GiveWell staff and many EAs (implicitly) endorse TRIA, hence I discussed it.

FWIW, I don't think the concern that TRIA ignores equality really hits the mark. If you think what matters is interest, then you weight by the strength of interest, and - adding some further theory - young children don't seem to have such strong interests in survival as older humans. I think there are deep problems with TRIA, but I don't think concerns about equality are one of them.

undefined @ 2018-10-26T12:11 (+8)

Contrary to the post, I think non-linear reporting is quite likely. The problem is not so much in "reporting" but in "perception". More about it here:

https://www.lesswrong.com/posts/7Kv5cik4JWoayHYPD/nonlinear-perception-of-happiness#TMPkeCnPe85zqdZqn

Because of this, the discussion along the lines of "Further, it is intuitively much harder for ordinary people (i.e. non-mathematicians) to report how happy/satisfied they feel on a logarithmic scale than a linear one." is mostly irrelevant - because the log() or other nonlinear transformation is happening in the perception stage, it is mostly hidden from the people themselves. (You can compare the arguments given with something like asking people to rate star brightness on a logarithmic scale.)

On a positive note, I agree with footnote 15. Having the right non-linear transformation would also probably explain away part of the results where people appear inconsistent or aggregate in the wrong way (à la Kahneman): their brains may be correctly aggregating the raw, untransformed quantity.

Conclusion: IMO the possibility of nonlinear perception of happiness is a crucial consideration for this whole business and should be given research priority. I'm probably quite bad at explaining it (eg the above LW post which was probably incomprehensible for most readers)

undefined @ 2018-10-26T18:12 (+3)

I tend to agree with you that it is still unclear whether the relationship between 'raw' happiness (as mentioned in your post) and perceived happiness is linear or not. Log makes intuitive sense in some ways but the implication that happiness increases linearly with wealth is extremely counterintuitive to me.

I tend to think something akin to prospect theory's weighting of happiness relative to some set point may be plausible. I also think there will be some odd effects near the boundaries (0 and 10).
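One way to make the prospect-theory suggestion concrete is Tversky and Kahneman's (1992) value function, using their estimated parameters for monetary gambles. Carrying those parameters over to happiness measured against a set point is a hypothetical illustration only, as is the set point of 7.

```python
# Parameters are Tversky & Kahneman's (1992) estimates for money;
# applying them to happiness relative to a set point is purely hypothetical.
ALPHA = 0.88    # curvature for gains
BETA = 0.88     # curvature for losses
LAMBDA = 2.25   # loss-aversion coefficient

def value_relative_to_set_point(level: float, set_point: float) -> float:
    """Prospect-theory-style value of a happiness level, judged against a reference point."""
    x = level - set_point
    if x >= 0:
        return x ** ALPHA
    return -LAMBDA * (-x) ** BETA

set_point = 7.0  # hypothetical set point on a 0-10 scale
gain = value_relative_to_set_point(8.0, set_point)
loss = value_relative_to_set_point(6.0, set_point)
print(round(gain, 2), round(loss, 2))  # 1.0 -2.25
```

In this sketch, a one-point fall below the set point weighs 2.25 times as much as a one-point rise, and sensitivity diminishes further from the set point in both directions. Boundary effects near 0 and 10 are not modelled.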

For me the project we're trying to push forward with this manifesto is in large part about trying to figure out satisfactory answers to questions like these, so I agree it's a priority.

Despite these open questions, I tend to think it's useful to proceed as if it were linear for now, as that seems to be the status quo in the field and to me it seems more intuitive. Even if we're wrong about linearity, this measurement system should lead to more accurate approximations of 'raw' happiness than, say, QALYs.

undefined @ 2018-10-30T15:41 (+2)

Hello Jan.

I don't think I follow your post. One idea is that happiness functions like sound, where it takes twice as much of an increase in sound to cause the same increase in perceived loudness. This seems confused if we extend it to happiness. What are we supposed to say: it takes twice as much of an increase in happiness to cause the same increase in perceived happiness? That would be crazy, because we only have one item, happiness, rather than two. If we plotted this on a graph, both axes would be the same.

An alternative is that it takes twice as much of an increase in brain activity (of a certain kind) to cause the same increase in perceived happiness. Okay. But people report their perceived happiness (or life satisfaction), not their brain states.

undefined @ 2018-10-30T18:32 (+2)

Imagine, for example, there exists some "S1 happiness" which is actually important for humans, and on which they base their decisions, and some "S2 perception of happiness" which people report. People incorrectly assume (a) these are the same, and (b) they have good introspective access to it.

For a more direct comparison than sound, imagine there is no such thing as money, and you ask people to report their perceived wealth on a scale of 1 to 10. Now introduce money, and compare how the 1..10 ratings relate to monetary wealth. I would bet you would get a nonlinear relationship in this case.

bfinn @ 2019-08-28T20:35 (+6)

Very good article. Re marriage, and also the Easterlin Paradox:

Re fig 1, marriage, more recent research shows the apparent reversion of satisfaction to singlehood level (or less) a few years after marriage actually just results from not controlling for age properly. Specifically, from the U-shaped happiness curve during the life course - people tend to marry before middle age, so get less happy as they approach middle age, not because of the marriage. (I think this may be mentioned in the recent Origins of Happiness book.)

Re fig 3 for UK, actually if you look at a longer time-series e.g. Eurobarometer (since 1973), UK life satisfaction has gone up significantly - from around 7.1 in the 1970s to 7.7/10 in 2018 (after transforming to 0-10 scale); and rather more since the only older survey I could find, 5.7/10 in 1948.

Also re China, you say elsewhere 'it’s SWB seems to have gone down been 1990 and 2015, even though per capita GDP increased by 5 times'. I haven't read the report you refer to, but the World Values Survey on happiness (as opposed to life satisfaction) shows the proportion of Chinese who are happy rising from 67% in 1993 to 85% in 2014. (You can check it on the second chart here: https://ourworldindata.org/happiness-and-life-satisfaction# )

So maybe the Easterlin paradox does not exist, or at least, is limited.

MakoYass @ 2021-03-30T20:38 (+2)

I am really puzzled by those graphs, mm. But as to the Easterlin paradox, it's still alive: http://repec.iza.org/dp7234.pdf Happiness has been increasing, and so has GDP, but the rates of increase still don't seem to have much of a relationship.

bfinn @ 2021-03-30T20:50 (+2)

Thanks for this. There's been quite a bit more research since that paper, including by Easterlin, so not sure how relevant it is now. The latest I know FWIW is from last year's book by Richard Layard, Can We Be Happier?, which says it's unclear but maybe economic growth often increases happiness but not always.

undefined @ 2018-12-16T01:17 (+6)

Great topic, and hopefully I'm not coming off as pedantic when I say that we really need to make a distinction between happiness (which I argue is one of the fleeting emotions no different than fear or anger or embarrassment) and fulfillment (aka serenity, inner peace, contentment). People who are fulfilled embrace all of their emotions (including happiness) without judgement or an attempt to control, and they process them by connecting with themselves and others in healthy ways, which further leads to fulfillment. We are social, emotional apes, and all of the higher order emotions exist in order to bring the social group closer together.

So when it comes to measuring good, we should be measuring fulfillment, which itself is just a proxy for healthy emotional connections with other people. See the Harvard Longitudinal Study, Blue Zones, and Awakened Ape and even what it means to be a prosocial ape as some sources that point towards this theory.

undefined @ 2018-12-17T06:14 (+2)

I do feel the need to clarify that I'm talking about people who have already met Maslow's first and second levels. I do agree that there are people out there who lack in basic food/water, clothing, shelter, hygiene, health and safety. What I'm talking about here is essentially levels 3 and 4.

michaelchen @ 2019-07-30T20:41 (+5)

Diener et al. (2018) make some statements suggesting that life satisfaction isn't a great proxy for subjective well-being:

In the current article, we review evidence from the first representative sample of humanity, the Gallup World Poll, and include many more nations that are very poor and troubled. We find that the majority of people are above neutral in affect balance but not life satisfaction.

people around the globe are more likely to score in the negative zone on judgment measures of life satisfaction than on experiential measures—life satisfaction versus affect balance. We found more negative scores for life satisfaction than for affect balance. This might be because affect balance tends to depend more on the meeting of universal needs, such as for social support, and societies on the whole have developed reasonably good methods for meeting these needs, even when poverty or even conflict are present. In contrast, life satisfaction depends on comparing one’s life to standards that change to some degree with culture and expectancies. Thus, many people in our sample might have scored low on life satisfaction because they desire the material life they perceive in wealthy nations, and conclude that they fall far short of this standard. It could be that life satisfaction is more influenced by movable standards compared with affect, and this means that even after substantial progress, people may believe that they fall short of their goals (Graham & Pettinato, 2006). The divergent patterns of the results for life satisfaction and for affect balance highlight the importance of measuring multiple aspects of SWB (Diener, 1984).

It also suggests that lacking material needs and lacking social support have a large impact on happiness, so it might be worthwhile to look into improving that in addition to working on mental illness:

Are there circumstances in which most people are no longer happy? To answer this question, we selected the respondents who had experienced five adverse events during the past year: (a) had been assaulted, (b) had property or money stolen, (c) had health problems, (d) did not have enough money for food, and (e) did not have enough money for shelter. In this group, only 26% of respondents (n = 796) evaluated their life as above 5, and only 51% (n = 1,567) reported more positive than negative affect. Thus, life circumstances may override the natural coping responses that most mentally healthy people possess. Finally, we examined people who had the above five adverse life events but also two other negative social circumstances—(a) nobody they could call on for support in an emergency and (b) feeling that they were not respected.4 Thus, this group had experienced physical and material problems during the past year as well as lack of social support. In this group of unfortunate individuals, only 20% had a positive affect-balance score (n = 72), 10% had a neutral score (i.e., they experienced the same number of positive and negative feelings, n = 37), and 70% had a negative score (n = 257).

For life satisfaction, only 12% (n = 43) had a score above neutral, 14% (n = 51) were at the neutral point, and 74% (n = 271) were below the neutral point. Thus, we found that most individuals who experienced very negative life events and lacked social support were far from happy. Although most people in the globe were happy, those who experienced physical, financial, and interpersonal problems were not.

Diener, E., Diener, C., Choi, H., & Oishi, S. (2018). Revisiting “Most People Are Happy”—And Discovering When They Are Not. Perspectives on Psychological Science, 13(2), 166–170. https://doi.org/10.1177/1745691618765111

undefined @ 2018-10-28T08:00 (+3)

Have you seen Bond & Lang 2018? Their abstract:

We replicate nine key results from the happiness literature: the Easterlin Paradox, the ‘U-shaped’ relation between happiness and age, the happiness trade-off between inflation and unemployment, cross-country comparisons of happiness, the impact of the Moving to Opportunity program on happiness, the impact of marriage and children on happiness, the ‘paradox’ of declining female happiness, and the effect of disability on happiness. We show that none of the findings can be obtained relying only on nonparametric identification. The findings in the literature are highly dependent on one's beliefs about the underlying distribution of happiness in society, or the social welfare function one chooses to adopt. Furthermore, any conclusions reached from these parametric approaches rely on the assumption that all individuals report their happiness in the same way. When the data permit, we test for equal reporting functions, conditional on the existence of a common cardinalization from the normal family. We reject this assumption in all cases in which we test it.

The paper seems extremely relevant to section 4.

Also, in contrast to your experience in trying to think of happiness nonlinearly, I find it quite easy to do. Intuitively, the difference between 1 (maximal unhappiness) and 2 is much bigger than between 2 and 3. Same for 8 vs 9 and 9 vs 10.
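The intuition that the ends of the scale are stretched can be formalised, hypothetically, by assuming responses come from squashing an unbounded latent happiness variable through a logistic curve, with each 1-10 response taken at the midpoint of its band. The logit link and the `latent_from_score` helper are assumptions for illustration, not something the commenter or Bond and Lang propose.

```python
import math

def latent_from_score(score: int, n_points: int = 10) -> float:
    """Map a 1..n score to an unbounded latent value via the logit of its band midpoint.

    The logistic link is a hypothetical modelling choice, not an established result.
    """
    p = (score - 0.5) / n_points
    return math.log(p / (1 - p))

# Latent distance covered by each one-point step on the reported scale.
steps = {
    (a, a + 1): latent_from_score(a + 1) - latent_from_score(a)
    for a in range(1, 10)
}
for pair in [(1, 2), (2, 3), (8, 9), (9, 10)]:
    print(pair, round(steps[pair], 2))
```

Under this assumption, the steps from 1 to 2 and from 9 to 10 each span roughly twice as much latent happiness as the steps from 2 to 3 and from 8 to 9.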

undefined @ 2018-10-31T09:14 (+2)

Hello Guzey. Yes, this paper has caused quite a stir. It's very hard to understand (at least as a non-economist) what the paper is saying, as it's filled with jargon and formulae, and the argument seems to turn on statistical considerations that are outside my scope of expertise. I had to ask a couple of economists to explain it to me.

As I now understand it, the authors' main objection is to the use of 3-point scales. What you can infer from such scales depends on what you think the underlying distribution of the data is that's being allocated into those three categories. If you make very different assumptions (e.g. utility is unbounded and the points on the scale are not 'equal-interval', i.e. the same distance apart) you can reverse the results. Such a reminder is useful and it's important to examine these assumptions. That said, their argument is not specifically about happiness measures, but about the use of scales with only a few points, so it's somewhat confusing that that's where they've leveled their critique. It's increasingly thought that 3-point measures are unreliable, and the modern literature predominantly uses 10-point scales. The authors don't test the robustness of their claims against 10-point scales (where, I gather, they would be less plausible).

I don't think I've updated my views on the basis of this article. I still don't fully get what the paper is doing, so I'm relying on those I consider my epistemic superiors in this domain - the economists I've spoken to - and they don't think this poses a problem (relies on odd assumptions, doesn't apply elsewhere, etc.).

undefined @ 2018-11-04T04:14 (+1)

I think that the evidence you present in section 4, e.g. that people interpret scales as equal-interval and that immigrants have similar SWB, could be a good response to this paper, though, because it suggests that we can interpret the discrete life satisfaction scale as cardinal and just aggregate it instead.

undefined @ 2018-11-01T04:18 (+1)

The theoretical results don't depend on the scale being 3-point. Their argument deals directly with the assumed underlying normal distributions and transforms them into log-normal distributions with the order of the expected values reversed, so it doesn't matter how you've estimated the parameters of the normal distributions or if you've even done it at all.
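The mechanism described above can be illustrated with closed-form means. If a latent variable X is normal with mean mu and standard deviation sigma, then exp(X) is log-normal with mean exp(mu + sigma^2/2). A group with a lower latent mean but more spread can therefore end up with the higher mean after the transformation. The two groups' parameters below are made up for illustration and are not taken from Bond and Lang.

```python
import math

def lognormal_mean(mu: float, sigma: float) -> float:
    """Mean of exp(X) when X ~ Normal(mu, sigma**2)."""
    return math.exp(mu + sigma ** 2 / 2)

# Hypothetical latent distributions for two groups (not taken from any dataset).
group_a = (1.0, 0.5)  # higher latent mean, less spread
group_b = (0.9, 1.0)  # lower latent mean, more spread

print("normal means:", group_a[0], group_b[0])
print("log-normal means:", round(lognormal_mean(*group_a), 2), round(lognormal_mean(*group_b), 2))
```

Because exp() is monotonic, every individual keeps the same rank, and any discretised response would be unchanged. Only the assumed cardinalisation differs, yet the ordering of group means reverses.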

In the case of life satisfaction scales, is there any empirical evidence we could use to decide the form of the underlying continuous distribution?

They do suggest that you could "use objective measures to calibrate cardinalizations of happiness", e.g. with incidence of mental illness, or frequencies of moods, as the authors have done something similar here https://www.nber.org/papers/w19243.

undefined @ 2018-11-30T17:30 (+2)

Well, I confess I don't fully understand the paper and a further social scientist I've since spoken to had a different take on what the paper said altogether. I'll try to bring this up with a few more people.

undefined @ 2018-10-26T20:25 (+3)

When considering observer moments, as in the analysis of immune neglect, I suspect chronic pain shoots to the top pretty fast. My guess is that testing mental health interventions along with a gold-standard checklist of intervention strategies would be a potential top charity.

undefined @ 2018-10-31T09:18 (+1)

This could be true, but needs much more analysis. I've spoken about increasing opiates for pain relief elsewhere (e.g. my EAG 2017 talk, forum posts here on drug policy reform). It's not easy to compare systemic change - which is what opiate reform would be - to micro-interventions in general. Also I'm not aware of there being any data on the badness of chronic pain on a 1-10 LS scale to be able to make comparisons anyway.

ben.smith @ 2021-03-30T21:05 (+2)

I was a bit worried about some possible methodological issues with the GiveWell measures of life satisfaction. I looked into the data, and the issue doesn't completely undermine the result, but I think having looked closely I am now moderately less convinced that the negative spillover effect observed is a real problem.

Some measures of "life evaluation" use a technique like the Cantril Ladder; this is often used as a measure of happiness or subjective well-being (e.g., in the World Happiness Report 2021). In the words of that report, the Cantril ladder question "asks respondents to think of a ladder, with the best possible life for them being a 10, and the worst possible life being a 0. They are then asked to rate their own current lives on that 0 to 10 scale".

When measuring the "negative spillover" effects discussed above in Section 6, the Cantril would be an inappropriate measure to use. That's because, when respondents think of the ladder and imagine the "best possible life" and the "worst possible life", many are likely to anchor on to exemplars that are salient/at hand, like, for instance, the distribution of people in their local community.

Imagine two people in a community, Andrew and Bob. You give Andrew $100 and Bob $0, and then ask Bob to rate his own life on a 0 to 10 scale. Bob might think of Andrew, who just got $100 for nothing, and think of that as particularly good. His idea of how good life can get just got a little bit higher. (If you think this is silly, and that Bob would instead imagine, I don't know, Elon Musk or some other fantastically privileged person, keep in mind that respondents probably have only a few seconds to answer this questionnaire and what is salient is very important. For more discussion on the importance of salience for questionnaire respondents, consult the work of psychologist Norbert Schwarz at USC.) So Bob would be relatively lower on the ladder and he'd give himself a lower life evaluation score. But it isn't clear this really reflects any of Bob's subjective experience of happiness from day to day, other than times when he's asked to rate himself on Cantril ladders. That would depend on him actually making those subjective comparisons and feeling bad about them on a regular basis.
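Bob's situation can be captured in a toy model in which a ladder rating is simply one's position between a salient "worst" and "best" life. The linear form, the wealth units and the anchor values are hypothetical assumptions, not anything from the Cantril Ladder literature.

```python
def ladder_rung(own_wealth: float, worst: float, best: float) -> float:
    """Hypothetical rating: position of one's own life between the salient worst and best, on 0-10."""
    return 10 * (own_wealth - worst) / (best - worst)

bob_wealth = 50.0  # Bob's circumstances do not change
before = ladder_rung(bob_wealth, worst=0.0, best=100.0)
after = ladder_rung(bob_wealth, worst=0.0, best=200.0)  # Andrew's windfall raises the salient 'best'
print(before, after)  # 5.0 2.5
```

Bob's own circumstances are identical in both calls. Only the salient anchor moves, yet his rating halves.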

Fortunately, the GiveWell study did not use a Cantril Ladder. As explained by Haushofer, Reisinger, and Shapiro (2015), they used four well-being measures:

the “happiness” and “life satisfaction” questions from the World Values Survey; the total score on the Center for Epidemiologic Studies Depression scale (CESD) (Radloff 1977); and total score on the Perceived Stress Scale (PSS) (Cohen, Kamarck, and Mermelstein 1983).

The happiness and life satisfaction questions from the World Values Survey can be viewed here and are:

1. "Taking all things together, would you say you are very happy, rather happy, not very happy, or not at all happy?"
2. "All things considered, how satisfied are you with your life as a whole these days? Using this card on which 1 means you are “completely dissatisfied” and 10 means you are “completely satisfied” where would you put your satisfaction with your life as a whole?"

Of the four well-being measures, the negative spillover result relates to the Life Satisfaction question specifically. It is possible that when answering (2) a respondent might use the same kind of social comparison processes that Bob used in the example above. But importantly, these weren't suggested to the respondent by the surveyor. If the respondent had used social comparison, they did so unprompted, and it's not so hard to imagine they do that often in their life in a way that affects their emotional affect.

Nevertheless, I'd be even more convinced if an effect on the happiness question had been observed, and the fact that the study observed a negative spillover result on Life Satisfaction and not Happiness does suggest that perhaps social comparison is occurring for Life Satisfaction in a way that doesn't seem to impact on the Happiness measure. If you think that Life Satisfaction tells us something additional to the Happiness measure about basic intrinsic hedonic utils, you should probably still be concerned about the negative spillover. If you think that Life Satisfaction is only important to the extent that it affects self-reported happiness, you should be cautious about interpreting the result of the negative spillover.

What the survey didn't do, because it is expensive and difficult and requires respondents to at least be able to text on a cellphone, is measure momentary positive and negative affect through a method like Ecological Momentary Assessment. I'd be interested in a future study that does this and checks (1) whether the GiveWell intervention has any effect on momentary positive and negative affect and (2) whether there are negative spillover effects on those measures.

undefined @ 2018-11-27T16:16 (+2)

First of all, to Michael and team: Thanks a lot for writing this. I am very happy that this topic is now receiving more attention.

A couple of thoughts on two parts of the manifesto:

1) “The evaluative and experience measures do correlate, suggesting evaluative judgements are, in part if not in whole, determined by how happy people are.” (OECD 2013, pp. 32-34).

Looking into the source for this claim reveals that the correlation between life satisfaction and positive/negative affect is actually fairly low (r = 0.25 with positive affect and -0.25 with negative affect). Also, a correlation does not tell us which causes which. This suggests to me that relying only on life satisfaction as a proxy could be misleading if we want to measure happiness as the affective component of SWB. On the other hand, as most studies to date have not measured this component, there does not seem to be a viable alternative (or is anyone aware of other potential proxies?). In any case, as the manifesto states, we will need to be very cautious in interpreting LS outcomes.

2) "Let’s suppose instead the neutral point is 4. If this is so, saving the child is worth 0.4 life satisfaction points a year for 60 years, thus 24 LSPs (0.4 x 60)."

This calculation does not account for potential future changes in the happiness of people in Kenya. The World Happiness Report 2018 (https://s3.amazonaws.com/happiness-report/2018/WHR_web.pdf, p. 24) shows that happiness in Kenya increased by 0.27 points between 2008-2010 and 2015-2017. This suggests that LS is rising, in which case saving lives might do more good than initially calculated. However, I don't know of enough data to propose a reasonable alternative calculation. Does anyone know more?
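To see how much such a trend could matter, here is a minimal sketch of the quoted calculation (0.4 LSPs a year above the neutral point, for 60 years) with an optional yearly change in LS added. The function name and the linear-growth assumption are hypothetical; the growth rate simply extrapolates the WHR's 0.27-point rise over the roughly seven years between the midpoints of 2008-2010 and 2015-2017, as an illustration rather than a forecast.

```python
def life_satisfaction_points(initial_gap: float, years: int, annual_change: float = 0.0) -> float:
    """Total LS points above the neutral point accrued over `years` years.

    initial_gap: LS minus the neutral point in the first year (0.4 in the example).
    annual_change: hypothetical linear change in LS per year; 0.0 reproduces
    the constant-LS calculation (0.4 x 60 = 24 LSPs).
    """
    return sum(initial_gap + annual_change * year for year in range(years))


# The constant-LS case from the manifesto.
constant = life_satisfaction_points(0.4, 60)

# Assumption: LS keeps rising by 0.27 points per ~7 years, linearly,
# for the whole 60-year period (a strong extrapolation).
trend = 0.27 / 7
rising = life_satisfaction_points(0.4, 60, trend)

print(f"constant LS: {constant:.1f} LSPs")
print(f"rising LS:   {rising:.1f} LSPs")
```

Under these assumptions the total rises from 24 to roughly 92 LSPs. The point is less the number than how sensitive the result is to the trend assumption; a flattening trend would give something much closer to 24.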

undefined @ 2018-11-30T17:20 (+3)

Hello Jasper, thanks for these.

On 1), I agree the correlation is only partial, which is why I said that we should use the LS data cautiously, keeping in mind when the two measures come apart. I think it would be worth writing up where they conflict and diverge.

In the case of mental health vs poverty, I think moving from life satisfaction to affect measures would leave the priority ranking unchanged: Dolan and Metcalfe (2012) indicate that mental health has a bigger impact on affect than on LS, and Kahneman and Deaton (2008) that income has a bigger impact on LS than on affect. Hence, given that StrongMinds already seems 4x more cost-effective even if we ignore the negative externalities of the income transfers on LS, moving to affect measures would only increase the comparative cost-effectiveness of mental health treatment. I accept this is somewhat complicated and should be written up in greater depth.

It's also unfortunate that my claim here is hypothetical rather than based on actual affect measures of poverty alleviation vs mental health treatment. I'm currently talking to a couple of economists in the hope we can actually find this out!

2) You're quite right. I thought about getting into that but reckoned it was too complicated for an already long piece. I agree it would be worth thinking about how someone's LS would arc over the course of their whole life. Another worthy research question!

undefined @ 2018-11-04T06:05 (+1)

Regarding interpersonal comparisons:

“So long as these differences are randomly distributed, they will wash out as ‘noise’ across large numbers of people.”

I think this is a crucial assumption that may not hold when comparing groups. There could be group differences in how the scales are interpreted, due to differences in experience between the groups (e.g. a disability), and the "groups" could even be the same people before and after some event.

That immigrants to Canada seem to use the scales similarly to Canadians doesn't mean they weren't using them differently before they came to Canada. I think we actually discussed scale issues with life satisfaction on Facebook before (prompted by you?), and the differences that remain after adjusting for item responses seem to suggest different interpretations of the scale (or the items), or different relationships between the items. Two examples (cited in one of the papers in your reading list, https://www.researchgate.net/publication/230603396_Theory_and_Validity_of_Life_Satisfaction_Scales):

https://www.researchgate.net/profile/Robert_Biswas-Diener/publication/225502271_The_Divergent_Meanings_of_Life_Satisfaction_Item_Response_Modeling_of_the_Satisfaction_with_Life_Scale_in_Greenland_and_Norway/links/570184a808aea6b7746a7df9.pdf?origin=publication_list

http://www.people.virginia.edu/~so5x/IRT%20JRP%20revision%202.pdf

But there's an obvious response here: we should use item response theory (IRT) to adjust for different scale interpretations.
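A minimal simulation can make the distinction between random and group-level differences concrete. Both hypothetical groups below have the same true LS by construction; individuals differ randomly in how they use the scale, and in the second case one group also shares a systematic shift in scale use (standing in for, e.g., the same people after a disability). The function, parameter values and normally distributed noise are all illustrative assumptions, and reports are not clipped to the 0-10 range.

```python
import random
import statistics


def mean_reported_ls(true_ls: float, n: int, scale_shift_sd: float = 0.0,
                     group_shift: float = 0.0, seed: int = 0) -> float:
    """Mean reported LS for n people who all have the same true LS.

    scale_shift_sd: spread of individual, randomly distributed differences in scale use.
    group_shift: hypothetical shift in scale use shared by everyone in the group.
    """
    rng = random.Random(seed)
    reports = [true_ls + rng.gauss(0.0, scale_shift_sd) + group_shift for _ in range(n)]
    return statistics.mean(reports)


TRUE_LS = 6.0  # both groups are, by construction, equally satisfied
N = 100_000

# Case 1: only random individual differences in scale use; they wash out.
a = mean_reported_ls(TRUE_LS, N, scale_shift_sd=1.0, seed=1)
b = mean_reported_ls(TRUE_LS, N, scale_shift_sd=1.0, seed=2)
print(f"random differences only: gap = {a - b:+.3f}")

# Case 2: group B also shares a systematic shift; it does not wash out.
b_shifted = mean_reported_ls(TRUE_LS, N, scale_shift_sd=1.0, group_shift=-0.5, seed=2)
print(f"with a group-level shift: gap = {a - b_shifted:+.3f}")
```

Random individual differences leave a gap close to zero, while the shared shift shows up almost one-for-one as an apparent 0.5-point difference in well-being that is not real. That is the case where an adjustment for scale use, such as IRT, would be needed.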

undefined @ 2018-10-26T01:08 (+1)

It looks like there might be confounders in the time series because there is a negative "effect" on life satisfaction prior to becoming disabled or unemployed. (With divorce and widowhood it's plausible that some people would see it coming years in advance.)

undefined @ 2018-10-26T08:15 (+1)

Good point.

I think it does make some sense for unemployment, though, as at least some proportion of people will foresee becoming unemployed, either because they think they'll get fired or because they are dissatisfied with their jobs to the point they want to quit.

The data on disability are a bit more troubling, but (without having read the report) one reason might be that the measure was official registration (or even official diagnosis) of disability. For disabilities with slow onset (arthritis, some disabling diseases like MS, etc.), there would perhaps be a period of discomfort and pain before someone becomes officially disabled. Keep in mind that only some proportion of subjects need to have slow-onset conditions to skew the average toward a negative dip before diagnosis (governmental or medical).
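The arithmetic behind that last point is simple enough to sketch. In the hypothetical model below, sudden-onset cases keep a constant LS until registration, while a share of slow-onset cases decline linearly in the years before it; the function name, the linear decline and all the numbers are made up for illustration.

```python
def average_ls_before_registration(years_before: int, share_slow_onset: float,
                                   decline_per_year: float,
                                   baseline: float = 7.0) -> dict[int, float]:
    """Average LS t years before official registration, for t = years_before..0.

    Hypothetical model: sudden-onset cases stay at `baseline`; slow-onset cases
    start declining `years_before` years ahead and lose `decline_per_year` each year.
    """
    averages = {}
    for t in range(years_before, -1, -1):
        slow_onset_ls = baseline - decline_per_year * (years_before - t)
        # Population average: a weighted mix of the two kinds of case.
        averages[t] = (1 - share_slow_onset) * baseline + share_slow_onset * slow_onset_ls
    return averages


# Only 30% slow-onset cases, yet the average still dips before registration.
for t, ls in average_ls_before_registration(3, 0.3, 0.5).items():
    print(f"{t} years before registration: {ls:.2f}")
```

Even though 70% of this hypothetical sample sees no change before registration, the average falls steadily in the years leading up to it, which would look like an anticipatory "effect" in the time series.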

undefined @ 2018-11-30T17:31 (+2)

FWIW, some disabilities you might see coming (if your health is worsening, say), and I think the analysis didn't have data on what was causing the disability, so it's a bit hard to say.
