Are there good introductory materials that explain how charity evaluation works?
By Eevee @ 2022-05-06T03:03 (+26)
I'm interested in how the major charity evaluators (GiveWell, Charity Navigator/ImpactMatters, SoGive, etc.) measure the impact of charities, how their approaches differ, and what the pros and cons of each approach are.
Luke Freeman @ 2022-05-06T05:57 (+13)
Most of these organisations publish their methodology, but I'm not aware of a single resource that comprehensively compares them.
The closest I've seen is the "choosing a charity" article on Giving What We Can.
Also, many of the specific details of the different approaches change over time. And some organisations use different methods depending on the cause.
Samuel Dupret @ 2022-05-23T10:24 (+5)
Hello,
I am a research analyst at the Happier Lives Institute where we do cost-effectiveness analyses (CEAs) of charities and interventions such as StrongMinds and GiveDirectly.
I am not aware of a document that collects all of the groups who do CEAs and compares and contrasts their methods. I would be quite interested in seeing one. The creation of such a document could look like a table with evaluating groups (GiveWell, OpenPhil, SoGive, Founders Pledge, Charity Entrepreneurship, Happier Lives Institute, etc.) as rows and important methodological questions as columns. This could be useful as it allows evaluators to learn from each other's methodology and allows people to see the different assumptions under which different evaluators are operating.
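To make this concrete, a rough sketch of what such a table could look like is below. The Happier Lives Institute row reflects what I describe later in this comment; the other rows and the choice of columns are placeholders to be filled in, not claims about those evaluators:

| Evaluator | What is "good"? | How is it measured? | What evidence is accepted? | Subjective evaluator inputs? |
|---|---|---|---|---|
| GiveWell | ? | ? | ? | ? |
| SoGive | ? | ? | ? | ? |
| Happier Lives Institute | Subjective wellbeing | Direct self-reports (happiness, life satisfaction, affect) | Meta-analyses of multiple studies, ideally RCTs | Avoided as much as possible |
| ... | ? | ? | ? | ? |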
This raises the question: what are the important elements/questions/assumptions to consider? I will give my list based on thinking and methods at the Happier Lives Institute.
In a CEA, first you need to think about the effect. In EA, we are concerned with doing (the most) good. This leads to the first big question: what is "good"? A lot of people might agree that income or health are not intrinsic goods that we want to increase for their own sake; rather, they are instrumental, good because of what they do for people. An intrinsic good (something good in and of itself) would be wellbeing. There are different theories of wellbeing, and I am not going to get into them here, but you can read some summaries here and here.
Say that, like the Happier Lives Institute, we think wellbeing is the good we want to maximise, and that we take a subjective wellbeing interpretation of this: what is good are people's mental states, their evaluations of their lives, and the positive feelings (and lack of negative feelings) they experience. Great, we have defined the good/the effect. Note that you don't have to agree 100% with an evaluator for their evaluation to be useful: you might value elements outside of human wellbeing and still find someone's evaluation in terms of wellbeing very useful for your prioritisation of charities.
Next step: how do we measure this good/effect? This isn't trivial, as we can agree on the good we want and still measure it in very different ways. For example, we, as evaluators, could make a list of things we believe improve people's subjective wellbeing (e.g., income and health) and make inferences about how much they improve wellbeing (e.g., for every 1 DALY averted, X units of wellbeing are gained). Alternatively, and this is the method we use at the Happier Lives Institute, we could use the most direct measures of subjective wellbeing: answers to questions about how happy people are, how satisfied with their lives they are, and how many positive or negative feelings they have been experiencing.
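As a toy illustration of the difference between the two routes (a minimal sketch; every number below is a made-up placeholder, not a real HLI or GiveWell figure):

```python
# Route 1 (indirect): convert an instrumental outcome (health) into wellbeing
# units via an assumed conversion factor ("for every 1 DALY averted, X units").
dalys_averted_per_person = 0.8   # hypothetical impact estimate
wellbeing_units_per_daly = 2.0   # hypothetical conversion factor, the "X"
indirect_effect = dalys_averted_per_person * wellbeing_units_per_daly  # 1.6

# Route 2 (direct): measure subjective wellbeing itself, e.g. the difference in
# mean life-satisfaction scores (0-10 scale) between treatment and control.
mean_ls_treatment = 5.4   # hypothetical survey means
mean_ls_control = 4.9
direct_effect = mean_ls_treatment - mean_ls_control  # 0.5 points
```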
Once we know what we are measuring and how, there are a bunch of methodological questions that can be asked, at different levels of precision. For example, what sort of data does the evaluator value? Only RCTs, or any sort of empirical data? If multiple studies are combined, what technique is used (what kind of meta-analysis)? Are predictions from experts used? Are the priors (subjective beliefs) of evaluators, which might or might not be informed by empirical data, combined with the data? Should evaluators give subjective estimates for variables for which there isn't any data? Should we apply subjective discounts for the quality of the data? [At the Happier Lives Institute we favour meta-analyses of multiple studies, ideally RCTs, and we bring in some reasoning about causal mechanisms, but we try to avoid subjective estimates as best we can.]
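To make one of these questions concrete, here is a minimal sketch of the simplest kind of meta-analysis, fixed-effect inverse-variance pooling (the study effects and standard errors are invented; real analyses would also consider random-effects models, heterogeneity, and study quality):

```python
import numpy as np

# Hypothetical per-study effects (e.g. change in life satisfaction, 0-10 scale)
# and their standard errors.
effects = np.array([0.30, 0.15, 0.45, 0.20])
ses = np.array([0.10, 0.08, 0.20, 0.05])

# Fixed-effect inverse-variance pooling: weight each study by 1 / SE^2,
# so more precise studies count for more.
weights = 1.0 / ses**2
pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))

print(f"pooled effect = {pooled:.3f} +/- {1.96 * pooled_se:.3f} (95% CI half-width)")
```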
I'd like to mention some bonus, but quite important, considerations about how we handle effects at the Happier Lives Institute. We are not just interested in the effect on people at the moment they receive the intervention (e.g., when they receive a cash transfer), but also the long-term effect: How long does the effect last? Does it decay? Does it grow? How quickly does it change? Then we also want to think about the effects on people beyond the recipients of the intervention: spillovers. How much does helping one individual affect their household or their community?
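A hedged sketch of how duration and spillovers enter the arithmetic; the exponential-decay model and every number here are my own illustrative assumptions, not HLI's actual figures:

```python
# If an initial effect E0 decays exponentially at rate r per year, the total
# effect integrated over all time is E0 / r (the integral of E0 * exp(-r*t)).
E0 = 0.5   # initial effect, life-satisfaction points (hypothetical)
r = 0.2    # annual decay rate (hypothetical)
total_per_recipient = E0 / r   # 2.5 point-years

# Household spillovers: n other household members each receive a fraction s
# of the recipient's effect.
n, s = 4, 0.3
total_with_spillovers = total_per_recipient * (1 + n * s)  # 5.5 point-years
```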
Finally, there's the cost. I think this can be simple for charities at the point of delivery: the money they receive divided by the number of interventions they deliver gives you the cost of delivering the intervention to one person. But this is likely more complicated in other situations. For example, what is the cost of influencing policy?
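Putting effect and cost together is then simple division. A toy example continuing the hypothetical numbers from the sketches above:

```python
budget = 1_000_000        # money the charity receives (hypothetical, USD)
people_reached = 40_000   # interventions delivered
cost_per_person = budget / people_reached   # $25

effect_per_person = 5.5   # total effect from the spillover sketch above
effect_per_1000_usd = effect_per_person / cost_per_person * 1_000  # 220 point-years
```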
A meta-consideration when comparing evaluators: how much time and how many people are involved in doing the analysis? Is this the work of 2 weeks or 2 months?
These are all really interesting questions and getting an idea of how different evaluators answer them is IMHO useful for the public, current evaluators, and future evaluators trying to learn how to do CEAs.
Charles He @ 2022-05-06T07:59 (+3)
To onlookers: I'm not exactly answering the OP's question and there's high mansplaining potential, but my guess is that this comment has a chance of being useful.
Below is a narrative, not something I fully believe, because I think it's incomplete (and I didn't actually read the sources I quoted). But I think there is a chance this narrative is useful for the perspective it tries to share.
Narrative:
Basically, the underlying value of evaluators comes from deep competence and powerful "models of reality"; this value does not come from a method.
Evaluators always share something like a "public method", such as spreadsheets and long essays, but these are not at all a complete presentation of how they reached their views, so relying on them is problematic.
To be clear, all evaluators use numbers and tools like spreadsheets, and develop systems and processes. But in the end, these are surface methods: a sort of window dressing, explanation, or heuristic for decisions that rest on a much deeper latent pool of skill and understanding. This skill and understanding isn't legible to most people.
This is directly relevant to some motivations of the OP's question, if they are trying to find and compare each org's "Way to Measure Charities". This is because, in the narrative you are reading, it's hard to learn where the actual content comes from, and copying the surface presentation is an anti-pattern.
Charles He @ 2022-05-06T08:18 (+3)
To see this, here are some stories:
A. GiveWell has deep understanding of science
This statement is incredible if you stop to think about it:
[Rob Wiblin says that GiveWell said:]
"There's a high probability that the impact that this has is much smaller than what's suggested in that study. And it might even be nothing."
"There's actually a decent chance that this will just not replicate, and in fact, that it won't have any impact at all."
(I'm not sure what GiveWell actually said, but let's go with this paraphrase.)
Notice how much skill these statements convey. You would have to understand the process of scientific replication, and the related quality and robustness of the specific paper and subdomain. In scientific literatures, there are many one-off papers like this; how did they choose this one?
Also, even though GiveWell had this understanding, it's very hard for anyone to measure the meta-knowledge of science that lets them make this statement. How would you even go about asking about it?
Note that this is the right thing to do: getting a replication or doing a further investigation might take an impractically long time, during which many lives might be lost. It's a deeply impressive statement. It's hard to publicly say that the impact could collapse with high probability and still respectfully ask for donations.
B. GiveWell is backing off EEV (explicit expected value)
The author says that EEV has flaws: we can't just take expected values literally to make decisions.
They then use "Bayesian priorsā in a labored, lengthy way to explain how decisions are made. (Because this Bayesian explanation is poorly seated, this stirs up a wobbly consequent debate.)
No one actually thinks anyone is updating some statistical model here. Maybe a shorter explanation the author could have made was: "Look, we are really good, and have deep models of the world; the data is only one layer."
(I didn't actually read most of the above links).
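For the gist of that Bayesian adjustment without the length: the core move in the EEV post is, roughly, precision-weighted shrinkage of a noisy cost-effectiveness estimate toward a prior. A minimal sketch under a normal-normal model (every number is invented for illustration; this is my reading, not GiveWell's actual model):

```python
# Normal-normal Bayesian shrinkage: the posterior mean is a precision-weighted
# average of the prior mean and the explicit estimate. Numbers are hypothetical.
prior_mean, prior_var = 1.0, 0.5    # background distribution for similar programs
est_mean, est_var = 10.0, 8.0       # a striking but noisy study-based estimate

posterior_precision = 1 / prior_var + 1 / est_var
posterior_mean = (prior_mean / prior_var + est_mean / est_var) / posterior_precision
# ~1.53 here: the noisier the estimate (larger est_var), the more it shrinks
# toward the prior.
```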
Note that I've used GiveWell for all these examples. I think this is because GiveWell is the institution that, at the very least, stands well above the rest of EA's evaluators.
On less generous views, which EAs should be aware of, other evaluators might be much lower-fidelity simulacra: sort of like hiring a webmaster to copy the patterns into a new website.
GiveWell's quality comes from the people who have moved through it over the years, not from a spreadsheet or formula that can be handed down.
So this comment claims to touch on the OP's question.
Charles He @ 2022-05-06T08:38 (+3)
In addition to touching on the OP's question, I think this is extremely important to point out because, of all the meta institutions that can go wrong, evaluators, or a "GiveWell for X", are structurally the most prone to misuse, and this is dangerous.
To explain:
- Recall the narrative above, that evaluator skill is hard to see. As a result, the public face of an evaluator is inherently superficial (and often deliberately dressed for public consumption and socialization). Few actually see the underlying knowledge or ability.
- Evaluators must also be independent: they can't just be voted down by the entities they review. They must resist public pressure, sort of like a judge.
The danger is that these features can be captured by parties who talk the talk, and who then occupy the spaces meant for these evaluator institutions.
There is the further danger that, once they start polishing their public copy, it's hard to draw the line. It's a slippery place to be. It's not difficult to glide into long or wacky worldviews, or to introduce technical methods that are shallow and defeat examination through rhetoric or obfuscation.
While malice can occur, the issues often aren't driven by it.
To see this, click on my name. You'll see writing in many comments. Some of it touches on purportedly deep topics. It sounds impressive, but is it true?
What if most of my comments and ideas are sophomoric and hardly informative at all? If I lack deep understanding, I'll never be able to see that I'm a hack.
I can't tell whether I'm merely a person with just enough knowledge to write a coherent story. Many observers will know even less; but regardless, I'll continue on whether I'm right or wrong.
In the same way that someone can write endlessly about meta topics or talk about politics, it can be tempting to feel that evaluation is easy.
The underlying patterns are similar; unfortunately, the inertia and power of the institutions created can be large. Once started, it's unclear what mechanism, if any, is available to stop them.
Ines @ 2022-05-08T17:35 (+1)
Much of SoGive's methodology is outlined on this blog, which I think is pretty accessible for beginners (but I think some parts are out of date).