How we work, #3: Our analyses involve judgment calls

By GiveWell @ 2023-12-06T22:09 (+32)

This is a linkpost to https://blog.givewell.org/2023/12/05/how-we-work-judgment-calls/

Author: Isabel Arjmand

This post is the third in a multi-part series, covering how GiveWell works and what we fund. Through these posts, we hope to give a better understanding of our research and decision-making.

Our goal is to recommend funding to the programs we believe have the greatest impact per dollar donated. There's no simple algorithm for answering that question; it necessarily involves making judgment calls. Our first post in this series discussed the importance of cost-effectiveness analyses and the many factors we consider; in this post, we'll share how we make decisions with imperfect evidence, how we consider multiple perspectives, and a few case studies that illustrate the judgment calls involved.

Making decisions with imperfect evidence

Our work relies heavily on evidence, but the available evidence never answers a question with certainty.

Academic literature and its limitations

We don't take the results of any given study at face value.[1] Instead, we often make adjustments along the way to come to a final estimate. As part of that process, we might consider factors like the quality and design of the underlying studies, the risk of publication bias, and how closely a study's setting matches the contexts where a program actually operates.

Some questions can't easily be addressed by studies but are still important for assessing the impact of a program. Those include topics like how to value different good outcomes relative to one another and what other actors would do in the absence of our funding.

Considering multiple perspectives

Some donors and other experts might reasonably disagree with our overall estimates; while we try to come to the best answer we can, our analyses necessarily involve subjective judgments.

In setting values for important parameters, we don't just rely on a single GiveWell researcher's opinion. In addition to reviewing scientific literature, we inform our views via conversations with subject-matter experts, reviews we commission from external consultants, and internal discussion and review among our researchers.

Case studies

Combining data and intuition: Estimating the effect of water chlorination on mortality

Mortality reduction is the single largest driver of the cost-effectiveness of water quality interventions in our models, but it's not straightforward to estimate how many lives clean water can save. For several years, we didn't recommend substantial funding to water quality programs; our best estimates placed their cost-effectiveness below that of our top charities. New results shared with us in 2020 led us to update our view: a meta-analysis by Michael Kremer and his team pooled mortality results from studies representing several types of water quality interventions in settings with unsafe water, providing stronger evidence for the effect of water quality on mortality as well as a higher point estimate for that effect than we'd previously used.

While this meta-analysis was a substantial update, we estimate the effect of water quality on mortality is smaller than the study results imply. At face value, the meta-analysis estimated a reduction in all-cause mortality among young children of about 30%—not only higher than could be explained by a reduction in diarrhea alone, but also higher than we'd expect when taking into account indirect effects (i.e., deaths that occur because waterborne diseases may make people more vulnerable to other causes of death).[2]

To come to our estimate of the effect size, we completed our own meta-analysis, and we commissioned two external experts (see here and here) to review the original meta-analysis so that we could consider their views. We pooled data from the five RCTs in the Kremer meta-analysis that focus on chlorination (as opposed to other water quality interventions) and have follow-up lengths of at least one year (to exclude studies in which we think publication bias is more likely).[3] We excluded one RCT that meets our other criteria because we think its results are implausibly high and don't represent the true effect of chlorination interventions (more in the footnote).[4] It's unorthodox to exclude studies for this reason when conducting a meta-analysis, but we chose to do so because we think it gives us an overall estimate that is more likely to represent the true effect size.

Our estimate suggests that chlorination reduces all-cause mortality among children under five by roughly 12%, as compared to 30% in the Kremer meta-analysis on water quality interventions.[5] (We then adjust this 12% figure for the particular context in which a program takes place, as described in our previous blog post.)
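To illustrate the basic mechanics of pooling results across studies (not GiveWell's exact method, which also involves the exclusions and adjustments described above), here is a minimal sketch of fixed-effect, inverse-variance pooling of risk ratios. The study names, risk ratios, and confidence intervals below are placeholders, not the actual RCTs in the meta-analysis:

```python
import math

# Illustrative fixed-effect, inverse-variance pooling of risk ratios (RRs)
# on the log scale. All inputs are made-up placeholders.
studies = [
    {"name": "Study A", "rr": 0.85, "ci": (0.70, 1.03)},
    {"name": "Study B", "rr": 0.90, "ci": (0.75, 1.08)},
    {"name": "Study C", "rr": 0.82, "ci": (0.65, 1.04)},
]

weighted_sum = 0.0
total_weight = 0.0
for s in studies:
    log_rr = math.log(s["rr"])
    # Recover the standard error from the 95% CI: log-scale width = 2 * 1.96 * SE.
    se = (math.log(s["ci"][1]) - math.log(s["ci"][0])) / (2 * 1.96)
    weight = 1 / se**2  # tighter estimates get more weight
    weighted_sum += weight * log_rr
    total_weight += weight

pooled_rr = math.exp(weighted_sum / total_weight)
print(f"Pooled risk ratio: {pooled_rr:.2f}")
print(f"Implied reduction in all-cause mortality: {1 - pooled_rr:.0%}")
```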

Our approach to calculating this effect has sparked thoughtful comments from others, such as Witold Więcek and Matthew Romer and Paul Romer Present. We think reasonable researchers might make different choices about how to interpret the literature and how to come to an overall estimate, and we're still considering other approaches. At the same time, we're currently comfortable with the approach we've chosen as a way of reaching a sensible bottom line.

Valuing disparate outcomes: Comparing clubfoot treatment to life-saving programs

In early 2023, we made a grant to support MiracleFeet's treatment of clubfoot, a painful congenital condition that limits mobility. To compare the cost-effectiveness of this program to other programs we fund, we need an estimate of how valuable it is to successfully treat a child's case of clubfoot compared to how valuable it is to avert the death of a child.

This question—how bad is a case of clubfoot?—doesn't have a "correct" numerical answer. The Institute for Health Metrics and Evaluation's Global Burden of Disease (GBD) study is our usual source for estimates of how bad different diseases and disabilities are, but it doesn't have an estimate specifically for clubfoot. We spoke with a disability expert and reviewed related literature to get a better sense of clubfoot’s impact on an affected person. We wanted to understand how painful clubfoot is, how much it impacts a person's ability to do things, and how stigma related to clubfoot affects well-being and opportunities.

Ultimately, we chose to use the GBD weight for "disfigurement level 2 with pain and moderate motor impairment due to congenital limb deficiency" as a proxy for bilateral clubfoot (i.e., clubfoot affecting both legs). Our understanding is that this GBD weight accounts for a physical disability that is sore and itchy, where the person has some difficulty moving around, and where other people might stare and comment. We selected it based on our understanding of the experience of untreated clubfoot, informed by the literature we read and our conversations with a disability expert and MiracleFeet. We use 75% of that value as a proxy for unilateral clubfoot (i.e., clubfoot affecting just one leg).[6] With these values, each case of clubfoot averted is, over the course of someone's lifetime, one-quarter as valuable as averting the death of a young child.[7] But we're still very uncertain about the true value—our best estimate is that there's a 50% chance that the "true" value is between one-eighth and one-third as good as averting a death.[8]
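As a rough consistency check on the "one-quarter" figure, here is a sketch in moral-weight units where doubling one person's consumption for a year counts as 1 unit (footnote 7 gives the half-as-good comparison). The years-of-benefit and value-of-averting-a-death inputs are illustrative placeholders, not GiveWell's published parameters:

```python
# Back-of-envelope check on "one-quarter as valuable as averting a death,"
# in units where doubling one person's consumption for one year = 1 unit.
# The inputs below are illustrative assumptions, not published parameters.

value_per_clubfoot_year_averted = 0.5  # footnote 7: half as good as doubling income for a year
years_of_benefit = 60                  # assumed years lived with the benefit of treatment
value_of_averting_child_death = 120    # placeholder for averting a young child's death, same units

lifetime_value = value_per_clubfoot_year_averted * years_of_benefit  # 30 units
ratio = lifetime_value / value_of_averting_child_death               # 0.25
print(f"Treating one case is ~{ratio:.2f}x as valuable as averting a child death")

# At roughly $1,200 per case averted (per the post), that would imply about
# 4 * $1,200 = $4,800 per "death-equivalent" averted.
```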

These estimates imply that MiracleFeet, which we believe averts a case of clubfoot for roughly $1,200,[9] is about as cost-effective as our top charities. That's why we've funded it and are interested in learning more.

Anticipating the likely decisions of other actors: Predicting the impact of technical assistance for syphilis screening and treatment

We recommended a grant to Evidence Action for syphilis screening and treatment during pregnancy, which reduces adverse outcomes including neonatal mortality and stillbirths. Rather than directly delivering tests and antibiotics, Evidence Action is working to support the governments in Cameroon and Zambia in providing the program. The goal is to support the Ministries of Health in switching from using HIV rapid tests to using HIV/syphilis dual rapid tests for pregnant people and providing penicillin to people who test positive for syphilis.

For policy-oriented work, we have to make predictions and think about counterfactuals: What will happen with Evidence Action's support, and what would have happened without it? In assessing the expected cost-effectiveness of this grant, two key questions are how Evidence Action's support changes the likelihood (and timing) of the governments adopting dual testing, and how it changes the share of people who test positive for syphilis who then receive treatment.

Focusing just on Zambia, we estimated a 50% chance that the dual test would have been adopted by the government in the absence of Evidence Action's support, and we also estimated that had it been adopted by the government, it would have happened about a year later than with Evidence Action's support. With Evidence Action's support, we estimate an 80% chance of successful adoption, and a 70% chance that the program successfully transitions to the government. Much of the projected impact comes from increases in treatment; for Zambia, we're projecting 70% of people who test positive will receive treatment with Evidence Action's support, as compared to 45% (which we think is very roughly the current treatment rate) without it.

These figures are subjective; they reflect our perspective on factors like the apparent level of government interest in implementing dual testing, the simplicity of the intervention, and the fact that it leverages existing HIV testing infrastructure.
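To make concrete how inputs like these combine into an expected counterfactual impact, here is a minimal sketch using the Zambia figures above. The simple way the inputs combine here (and the omission of timing, transition risk, and program scale) is a simplifying assumption for illustration, not GiveWell's actual grant model:

```python
# A minimal sketch of the counterfactual reasoning above, using the Zambia
# figures from the post. The structure is an illustrative simplification.

p_adopt_with_support = 0.80     # chance the government adopts dual testing with Evidence Action
p_adopt_without_support = 0.50  # chance it adopts anyway (about a year later)
treatment_rate_with = 0.70      # projected share of positives treated with support
treatment_rate_without = 0.45   # rough current treatment rate without support

# Expected treatment coverage among people who test positive, in each world.
expected_with = p_adopt_with_support * treatment_rate_with
expected_without = p_adopt_without_support * treatment_rate_without

print(f"Expected treatment coverage with support:    {expected_with:.0%}")
print(f"Expected treatment coverage without support: {expected_without:.0%}")
print(f"Counterfactual gain credited to the grant:   {expected_with - expected_without:.0%}")
```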

We made predictions about the outcome of this grant here and will be able to look back and compare our predictions to what actually happens, though of course we won't be able to find out what would have happened if we hadn't funded the program.

Conclusion

We aim to create estimates that represent our true beliefs. Our cost-effectiveness analyses are firmly rooted in evidence but also incorporate adjustments and intuitions that aren't fully captured by scientific findings alone. (When there are gaps in the evidence base, we also fund additional research where feasible—more on that in a later post.)

Ultimately, we make decisions based on our best estimates even though some of the values are subjective and even though we aren't sure we're right. Our work requires taking action despite incomplete information. We make recommendations that reflect our true beliefs about where funding can do the most good, and we share our decision-making process transparently so that people can evaluate our conclusions.

Notes


As an aside, GiveWell is often associated with randomized controlled trials (RCTs), which underpin many of our recommendations, but we also consider other experimental or observational evidence, especially when RCTs aren't feasible. RCTs provide particularly strong causal evidence because of how they're designed: the idea is to randomly allocate people or sets of people (e.g., villages or schools) to either a "treatment" group (which receives the intervention in question) or a "control" group (which doesn't). Randomization should mean the two groups are similar as long as the study population is large enough, so any substantial differences in outcomes can be attributed to the intervention.
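As a toy illustration of this logic, the sketch below (invented numbers, simplified setup) randomly splits a simulated population carrying an unobserved risk factor into two groups; with a large enough sample, the groups come out nearly identical at baseline:

```python
import random

# Toy illustration of why randomization supports causal claims: randomly
# assigning a large population to two groups balances even characteristics
# the researchers never measured. All numbers here are invented.

random.seed(0)
n = 10_000
# Each person carries an unobserved baseline risk factor.
population = [random.gauss(50, 10) for _ in range(n)]

random.shuffle(population)
treatment, control = population[: n // 2], population[n // 2:]

def mean(xs):
    return sum(xs) / len(xs)

print(f"Mean baseline risk, treatment group: {mean(treatment):.2f}")
print(f"Mean baseline risk, control group:   {mean(control):.2f}")
# The means are nearly identical, so any later difference in outcomes can
# be attributed to the intervention rather than to baseline differences.
```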

For example, we've recently reviewed observational evidence on several research questions for which experimental evidence is hard to come by (for ethical or other reasons). ↩︎

Studies on the mortality impact of historical water quality improvements suggest that water quality interventions may reduce mortality from non-waterborne diseases. This is called the Mills-Reincke phenomenon and provides a plausible explanation for the unexpectedly large estimate of the impact of chlorination on mortality in children under five. ↩︎

See this section of our intervention report for more detail. ↩︎

Briefly, Haushofer et al. 2021 finds a 63% reduction in all-cause mortality in children under five (95% confidence interval of 13% to 105%). It estimates that chlorination was 18 percentage points higher in the treatment group than in the control group. At face value, that result implies that drinking clean water averts most of the deaths that would otherwise occur among young children and/or there were very large spillover effects from water chlorination programs (even at the low end of the 95% confidence interval). As a rough back-of-envelope: scaling the 63% overall reduction by the 18-percentage-point difference in chlorination implies an effect per household actually induced to chlorinate of well over 100%, which is only possible if the program also benefited households that didn't chlorinate. Our view is that this is unlikely to be the case. See footnote 39 on this page for more detail. ↩︎

See footnote 2 here. ↩︎

See this section of our cost-effectiveness analysis. ↩︎

Or, to give another point of comparison, we're using an estimate where averting a year of clubfoot is half as good as doubling a person's income for a year. For a table of our moral weights for averting deaths and increasing consumption, see this document. ↩︎

See this grant page, including the table in the summary. The figures in the above paragraph don't include the additional benefit we estimate from potential increases in income in adulthood if clubfoot is successfully treated. ↩︎

See this grant page for data. ↩︎


SummaryBot @ 2023-12-07T13:05 (+1)

Executive summary: GiveWell makes judgment calls in cost-effectiveness analyses due to imperfect information, aiming to recommend the most impactful giving opportunities.

Key points:

  1. There are gaps and uncertainties in academic evidence that require subjective assessments and adjustments.
  2. GiveWell draws on literature reviews, experts, internal review, and external consultants to inform decisions.
  3. Case studies illustrate judgment calls, like adjusting mortality estimates, valuing disability averted, and predicting government adoption of interventions.
  4. Final recommendations reflect GiveWell's true beliefs given available data, recognizing some subjectivity.
  5. Transparency allows evaluation of GiveWell's reasoning behind conclusions.

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.