My answers to Animal Charity Evaluators’ questions about cost-effectiveness analyses

By Vasco Grilo🔸 @ 2025-02-08T09:55 (+25)

Vince Mak, charity evaluations manager at Animal Charity Evaluators (ACE), asked me on January 2 for feedback on the methods they use in their cost-effectiveness analyses. Below are Vince’s questions, and my answers from January 6 with very minor edits. ACE mentioned they have already received adequate feedback on these from me and various other experts, and do not currently plan to engage further on this post.

1. Systematically estimating years of the impact of different interventions (also described as the amount of time the change was counterfactually brought forward). We know of a few estimates for corporate campaigns and policy change, but for most CEAs years of impact are one of the most uncertain and guessed parameters.

I agree it makes sense to model the benefits of corporate campaigns and legislative reform as making the proposed changes happen sooner by a certain number of years, and that this is a very uncertain parameter, but I have not thought about a systematic way to estimate it. Martin Gould and Saulius Šimčikas have thought about this.

Note that interventions may have some benefits given failure too. I would estimate the overall expected benefits in animal-years from ("probability of success"*"years of impact given success" + (1 - "probability of success")*"years of impact given failure")*"number of animals that can be affected". The expected benefits from failure will be the main driver of the overall expected benefits if the probability of success is sufficiently low.
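As a minimal sketch of this expected-value calculation in Python (all input values below are hypothetical placeholders, not estimates for any real campaign):

```python
# Overall expected benefits in animal-years, including benefits given failure.
# All inputs are hypothetical placeholders.
p_success = 0.3          # probability of success
years_success = 10       # years of impact given success
years_failure = 1        # years of impact given failure
animals_affected = 2e6   # number of animals that can be affected

expected_years = p_success * years_success + (1 - p_success) * years_failure
expected_benefits = expected_years * animals_affected
print(f"{expected_benefits:.3g} animal-years")  # 7.4e+06 animal-years
```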

2. Similarly, we are looking to improve our estimates of the likelihood of success of a project for prospective CEAs. This estimate has to be a guess, but we are looking for a more systematic way to make these guesses.

I would estimate the probability of success of the target action being assessed from w_1*o_1 + w_2*o_2 + ... + w_n*o_n, where w_i and o_i are the weight and outcome of reference action i. The weights should add up to 1. Actions which are not considered are implicitly weighted at 0. o_i is 0 if reference action i failed, and 1 if it succeeded. Reference actions which are more similar to the target action should be weighted more heavily. For example, for the public broiler welfare campaign that started in Portugal in December 2024, one can consider all the past broiler welfare campaigns (implicitly weighting at 0 all other actions), and weight more heavily those which are public (similar tactic), in Portugal (if there were any, but there are not; same country), in the European Union (same region), in high income countries (similar income), or recent (similar period).

Here is a quick example of how to calculate the probability of success. Say we have a reference class with 3 actions:

- Reference action 1: weight of 0.4, outcome of 1 (success).
- Reference action 2: weight of 0.5, outcome of 0 (failure).
- Reference action 3: weight of 0.1, outcome of 1 (success).

The estimated probability of success for the target action would be 50 % (= 0.4*1 + 0.5*0 + 0.1*1).

If the weight is the same for all actions, the probability of success of the target action becomes "number of successful reference actions"/"number of reference actions", which is the base rate among the reference class. For the reference class above, it would be 2/3.
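A minimal sketch of both calculations, using the 3-action reference class above:

```python
# Weighted reference-class estimate of the probability of success.
weights = [0.4, 0.5, 0.1]  # similarity weights; must add up to 1
outcomes = [1, 0, 1]       # 1 = reference action succeeded, 0 = it failed

p_weighted = sum(w * o for w, o in zip(weights, outcomes))
print(p_weighted)  # 0.5

# With equal weights, the estimate reduces to the base rate.
p_base_rate = sum(outcomes) / len(outcomes)
print(p_base_rate)  # 0.666...
```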

To weight each country the same, w_i = (1/"number of countries covered by the reference actions")*(1/"number of reference actions in the country of reference action i"). Likewise for tactics, regions, country income groups, periods, or other sets of actions. One can replace "country" with "set of actions" in the formula.

If one has information about the base rates in countries, but not the outcomes of the actions there, the probability of success can be calculated from "weight of country 1"*"base rate among the reference actions in country 1" + "weight of country 2"*"base rate among the reference actions in country 2" + ... + "weight of country N"*"base rate among the reference actions in country N". Likewise for tactics, regions, country income groups, periods, or other sets of actions. The weights should add up to 1. Vicky Cox knows about some base rates.
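Here is a sketch of equal-per-country weighting and of the mixture of country base rates; the countries, outcomes, and base rates below are all hypothetical:

```python
from collections import Counter

# Hypothetical reference actions, with the country of each action and its outcome.
countries = ["FR", "FR", "FR", "DE"]
outcomes = [1, 0, 1, 1]

# Equal weight per country:
# w_i = (1/"number of countries")*(1/"number of reference actions in the country of action i").
n_countries = len(set(countries))
actions_per_country = Counter(countries)
weights = [1 / n_countries / actions_per_country[c] for c in countries]
print(sum(weights))  # ~1.0, as required

p_equal_country = sum(w * o for w, o in zip(weights, outcomes))
print(p_equal_country)  # 0.833..., i.e. (2/3 + 1)/2

# Equivalent mixture of country base rates, if only those are known.
base_rates = {"FR": 2 / 3, "DE": 1.0}     # hypothetical base rates
country_weights = {"FR": 0.5, "DE": 0.5}  # must add up to 1
p_mixture = sum(country_weights[c] * base_rates[c] for c in base_rates)
print(p_mixture)  # 0.833...
```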

I think the method I described is better than directly guessing the probability of success, and probably achieves the best trade-off between simplicity and accuracy, but it still requires guessing the weights. To avoid this, one can model the probability of success as the logistic function p(x_1, x_2, ..., x_n) = 1/(1 + e^-(a_1*x_1 + a_2*x_2 + ... + a_n*x_n + b)), where x_1, x_2, ..., and x_n are the variables influencing the probability of success, and a_1, a_2, ..., a_n, and b are the parameters describing their effect. For example, if the probability of success mainly depended on the spending on the action (s), and real GDP per capita of the country of the action in the year it started (g), one could model the probability of success as p(s, g) = 1/(1 + e^-(a_1*s + a_2*g + b)). To weight all reference actions the same, the parameters are determined with a logistic regression of the outcome on the spending and real GDP per capita, among a given reference class. To weight reference actions differently, one can use a weighted logistic regression.
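A minimal sketch of the weighted logistic regression with scikit-learn; the reference actions, their outcomes, and the similarity weights below are all hypothetical:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical reference class: spending on the action (k$) and real GDP per
# capita of its country (k$), with outcome 1 = success, 0 = failure.
X = np.array([[50, 20], [200, 35], [80, 25], [300, 45], [30, 15], [150, 30]])
y = np.array([0, 1, 0, 1, 0, 1])

# Similarity weights for a *weighted* logistic regression (hypothetical).
sample_weight = np.array([1.0, 0.5, 1.0, 0.5, 1.0, 0.5])

# Fits p(s, g) = 1/(1 + e^-(a_1*s + a_2*g + b)) to the weighted reference class.
model = LogisticRegression().fit(X, y, sample_weight=sample_weight)

# Probability of success of a hypothetical target action.
target = np.array([[100, 28]])
print(model.predict_proba(target)[0, 1])
```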

3. We use something resembling 90% subjective confidence intervals in our CEAs but would like to have more nuance around where we set upper and lower bounds.

I have run lots of Monte Carlo simulations, but have mostly moved away from them. I strongly endorse maximising expected welfare, so I think the final point estimate of the expected cost-effectiveness is all that matters in principle if it accounts for all the considerations. In practice, there are other inputs that matter because not all considerations will be modelled in that final estimate. However, I do not see this as an argument for modelling uncertainty per se. I see it as an argument for modelling the considerations which are currently not covered, at least informally (more implicitly), and ideally formally (more explicitly), such that the final point estimate of the expected cost-effectiveness becomes more accurate.

That being said, I believe modelling uncertainty is useful if it affects the estimation of the final expected cost-effectiveness. For example, one can estimate the expected effect size linked to a set of RCTs with inverse-variance weighting from w_1*e_1 + w_2*e_2 + ... + w_n*e_n, where w_i and e_i are the weight and expected effect size of study i, and w_i = (1/"variance of the effect size of study i")/(1/"variance of the effect size of study 1" + 1/"variance of the effect size of study 2" + ... + 1/"variance of the effect size of study n"). In this estimation, the uncertainty (variance) of the effect sizes of the studies matters because it directly affects the expected aggregated effect size.
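A minimal sketch of inverse-variance weighting, with hypothetical effect sizes and variances:

```python
import numpy as np

# Hypothetical expected effect sizes and variances from a set of RCTs.
effects = np.array([0.30, 0.50, 0.10])
variances = np.array([0.04, 0.09, 0.01])

# w_i is proportional to 1/variance_i, and the weights add up to 1.
precisions = 1 / variances
weights = precisions / precisions.sum()

expected_effect = weights @ effects
print(expected_effect)  # ~0.17, dominated by the most precise study
```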

Holden Karnofsky's post Why we can’t take expected value estimates literally (even when they’re unbiased) is often mentioned to point out that unbiased point estimates do not capture all information. I agree, but the clear failures of point estimates described in the post can be mitigated by adequately weighting priors, as is illustrated in the post. Applying inverse-variance weighting, the final expected cost-effectiveness is "mean of the posterior cost-effectiveness" = "weight of the prior"*"mean of the prior cost-effectiveness" + "weight of the estimate"*"mean of the estimated cost-effectiveness" = ("mean of the prior cost-effectiveness"/"variance of the prior cost-effectiveness" + "mean of the estimated cost-effectiveness"/"variance of the estimated cost-effectiveness")/(1/"variance of the prior cost-effectiveness" + 1/"variance of the estimated cost-effectiveness"). If the estimated cost-effectiveness is way more uncertain than the prior cost-effectiveness, the prior cost-effectiveness will be weighted much more heavily, and therefore the final expected cost-effectiveness, which integrates information about the prior and estimated cost-effectiveness, will remain close to the prior cost-effectiveness.
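A minimal sketch of this precision-weighted combination, with hypothetical numbers chosen so that the estimated cost-effectiveness is much more uncertain than the prior:

```python
# Precision-weighted (inverse-variance) combination of a prior and an estimate.
# Hypothetical numbers: a modest prior and a high but very noisy estimate.
prior_mean, prior_var = 1.0, 1.0
estimate_mean, estimate_var = 100.0, 10_000.0

posterior_mean = (prior_mean / prior_var + estimate_mean / estimate_var) / (
    1 / prior_var + 1 / estimate_var
)
print(posterior_mean)  # ~1.01; the noisy estimate barely moves the prior
```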

It is still important to ensure that the final point estimate for the expected cost-effectiveness is unbiased. This requires some care in converting input distributions to point estimates, but Monte Carlo simulations requiring more than one distribution can very often be avoided. For example, if "cost-effectiveness" = ("probability of success"*"years of impact given success" + (1 - "probability of success")*"years of impact given failure")*"number of animals that can be affected"*"DALYs averted per animal-year improved"/"cost", and all these variables are independent (as usually assumed in Monte Carlo simulations for simplicity), the expected cost-effectiveness will be E("cost-effectiveness") = ("probability of success"*E("years of impact given success") + (1 - "probability of success")*E("years of impact given failure"))*E("number of animals that can be affected")*E("DALYs averted per animal-year improved")*E(1/"cost"). This is because E("constant a"*"distribution X" + "constant b") = a*E(X) + b, and E(X*Y) = E(X)*E(Y) if X and Y are independent. Note that E(1/"cost") generally differs from 1/E("cost"), so one should use the mean of the reciprocal of the cost, not the reciprocal of the mean cost.
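As a minimal sketch (all input distributions below are hypothetical), one can check that the analytic expectation agrees with a full Monte Carlo simulation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Hypothetical independent input distributions.
p = 0.3                                                # probability of success
years_s = rng.lognormal(mean=2.0, sigma=0.5, size=n)   # years of impact given success
years_f = rng.lognormal(mean=0.0, sigma=0.5, size=n)   # years of impact given failure
animals = rng.lognormal(mean=13.0, sigma=1.0, size=n)  # animals that can be affected
dalys = rng.lognormal(mean=-3.0, sigma=0.5, size=n)    # DALYs averted per animal-year
cost = rng.lognormal(mean=11.0, sigma=0.3, size=n)     # cost in $

# Full Monte Carlo estimate of E("cost-effectiveness").
mc = np.mean((p * years_s + (1 - p) * years_f) * animals * dalys / cost)

# Analytic estimate from the means of the inputs; note E(1/"cost"), not 1/E("cost").
analytic = (
    (p * years_s.mean() + (1 - p) * years_f.mean())
    * animals.mean()
    * dalys.mean()
    * np.mean(1 / cost)
)
print(mc, analytic)  # agree up to sampling noise
```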

4. When working on more speculative and long-term interventions (research, meta-interventions, long-term policy change), in some cases researchers opt to create a speculative CEA (such as this one about research by AIM), and in other cases, they decide that a CEA is too speculative (for example Animal Advocacy Careers do not measure their final impact on animals, but rather their outcomes). We want a more principled way to determine how much weight to give to a CEA for decision-making. We are also looking to better understand approaches to modeling the impact of speculative and long-term interventions.

Elie Hassenfeld, the CEO of GiveWell, said on the Clearer Thinking podcast that "GiveWell cost-effectiveness estimates are not the only input into our decisions to fund malaria programs and deworming programs, there are some other factors, but they're certainly 80% plus of the case". GiveWell also clarified that "The numerical cost-effectiveness estimate in the spreadsheet is nearly always the most important factor in our recommendations, but not the only factor".

Quantifying the benefits of animal welfare interventions is less reliable than quantifying those of global health and development ones, because there is much more data on the latter. So a final cost-effectiveness estimate in DALY/$ or similar will carry more weight in decisions about recommendations and grants in global health and development than in animal welfare. However, I would still like to see more quantification in animal welfare.

Not quantifying implies not thinking about trade-offs (explicitly), but these will always be made (implicitly).

Both nearterm and longterm policy change can be modelled as accelerating the target policy changes by a given number of years.

Research interventions can be modelled as accelerating the discovery or improvement of direct interventions by a given number of years. Then one can look into how this changes funding decisions, for example asking what would follow if a research project from Faunalytics or Rethink Priorities costing 100 k$ accelerated a direct intervention by 3 years, as in the sketch below.
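A minimal sketch of this acceleration model; the annual impact of the accelerated intervention is a hypothetical placeholder, not an estimate for any real research project:

```python
# Hypothetical sketch: a research project costing 100 k$ accelerates a direct
# intervention by 3 years. The annual impact is a placeholder, not an estimate
# for Faunalytics or Rethink Priorities.
cost = 100e3            # $
acceleration = 3        # years the intervention is brought forward
annual_impact = 20e3    # DALY/year from the accelerated intervention

impact = acceleration * annual_impact  # 60 kDALY
print(impact / cost)                   # 0.6 DALY/$
```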

Fundraising interventions can be modelled as moving funds from less to more cost-effective interventions. For instance, if FarmKind spends 100 k$, and moves 100 k$ from 0.1 to 1 DALY/$, and 200 k$ from 0 to 1 DALY/$, their impact would be 290 kDALY (= 100*10^3*(1 - 0.1) + 200*10^3*(1 - 0)), and their cost-effectiveness would be 2.90 DALY/$ (= 290*10^3/(100*10^3)), 2.90 (= 2.9/1) times the cost-effectiveness of the interventions to which they moved funds.
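A minimal sketch of the FarmKind example above:

```python
# The FarmKind example: 100 k$ spent, 100 k$ moved from 0.1 to 1 DALY/$,
# and 200 k$ moved from 0 to 1 DALY/$.
spending = 100e3
moved = [(100e3, 0.1, 1.0), (200e3, 0.0, 1.0)]  # ($ moved, from, to) in DALY/$

impact = sum(amount * (to - fro) for amount, fro, to in moved)
print(impact)             # 290000.0 DALY, i.e. 290 kDALY
print(impact / spending)  # 2.9 DALY/$
```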

Personal consumption can be assumed to have a cost-effectiveness of 0 in the context of fundraising for top animal welfare interventions. If each doubling of net income is worth averting 0.5 DALYs, as assumed by Open Philanthropy, personal consumption of people with a low net income of 5 k$/year (roughly the minimum wage in Portugal) has a cost-effectiveness of 10^-4 DALY/$ (= 0.5/(5*10^3)). This is still way lower than the cost-effectiveness of the organisations promoted by FarmKind. I estimate cage-free campaigns have a cost-effectiveness of 4.59 DALY/$.

Career change interventions can be modelled as accelerating changes in jobs by a number of years. The impact of a career change can be estimated from "acceleration in years"*("increased impact via direct work in DALY/year" + "increased impact via donations in DALY/year"), where the increased impact via donations in DALY/year is the increase in annual donations in $/year times the cost-effectiveness of the donation targets in DALY/$.
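A minimal sketch of this formula, with hypothetical numbers for the career change (the donation cost-effectiveness of 4.59 DALY/$ reuses the cage-free estimate mentioned earlier):

```python
# Hypothetical career change; all numbers are placeholders.
acceleration = 2      # years the job change is brought forward
direct_impact = 100   # increased impact via direct work in DALY/year
donations = 10e3      # increase in annual donations in $/year
donation_ce = 4.59    # cost-effectiveness of the donation targets in DALY/$

impact = acceleration * (direct_impact + donations * donation_ce)
print(impact)  # 92000.0 DALY
```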

5. Final unit for measuring animal suffering. In our most recent cycle, we started using AIM’s new unit for animal suffering, Suffering-Adjusted Days (SADs). We are considering using it again, but want to review other options and better understand the upsides or downsides of this unit.

You can ask Vicky for AIM’s doc with ways to improve their estimation of SADs, and search for “Vasco” to see my suggestions. My main criticism is that AIM's pain intensities dramatically underestimate the intensity of excruciating pain, and therefore the cost-effectiveness of interventions decreasing it. To illustrate, I estimate the past cost-effectiveness of the Shrimp Welfare Project (SWP) is 639 DALY/$, or 173 times as cost-effective as cage-free campaigns. For AIM’s pain intensities, and my guess that hurtful pain is as intense as fully healthy life, I calculate the past cost-effectiveness of SWP is 0.484 DALY/$, which is only 0.0757 % (= 0.484/639) of my estimate above, and 10.5 % (= 0.484/4.59) of my estimate for the cost-effectiveness of cage-free campaigns.

In other words, I estimate SWP is way more cost-effective than cage-free campaigns, whereas AIM's pain intensities, together with my guess that hurtful pain is as intense as fully healthy life, imply their cost-effectivenesses are not that different. I currently much prefer using my own guesses for the pain intensities, as I did in my cost-effectiveness analyses of SWP and corporate campaigns for chicken welfare.

Vicky said they will start integrating the feedback they have received on SADs in February, although I do not know when updated estimates will be available.


SummaryBot @ 2025-02-10T20:07 (+3)

Executive summary: The author provides detailed feedback on Animal Charity Evaluators' (ACE) cost-effectiveness analysis (CEA) methods, suggesting ways to systematically estimate years of impact, probability of success, uncertainty modeling, and the evaluation of speculative interventions, while also critiquing the Suffering-Adjusted Days (SADs) metric.

Key points:

  1. Estimating years of impact: The author supports modeling corporate campaigns and legislative reforms as accelerating change but acknowledges the difficulty in systematically estimating this effect. Expected benefits should account for both success and failure probabilities.
  2. Probability of success estimation: The author proposes a weighted reference class approach to estimate success probability, favoring a logistic regression model over direct guessing to improve accuracy.
  3. Modeling uncertainty: While Monte Carlo simulations are common, the author prefers maximizing expected welfare through improved modeling rather than focusing on uncertainty estimates, emphasizing the importance of unbiased point estimates.
  4. Assessing speculative and long-term interventions: The author advocates for more quantification in animal welfare CEAs and suggests modeling research, policy, and fundraising interventions as accelerating beneficial changes, similar to GiveWell’s approach.
  5. Final unit for measuring animal suffering: The author critiques AIM’s Suffering-Adjusted Days (SADs) for underestimating intense pain, arguing that it undervalues high-impact interventions like shrimp welfare and proposing alternative pain intensity estimates.


This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.