GWWC's 2024 evaluations of evaluators

By Giving What We Can🔸, Aidan Whitfield🔸, Sjir Hoeijmakers🔸 @ 2024-11-27T08:44 (+177)

Introduction

The Giving What We Can research team is excited to share the results of our 2024 round of evaluations of charity evaluators and grantmakers!

In this round, we completed three evaluations that will inform our donation recommendations for the 2024 giving season. As with our 2023 round, there are substantial limitations to these evaluations, but we nevertheless think that they are a significant improvement on a landscape in which there were no independent evaluations of evaluators’ work.

In this post, we share the key takeaways from each of our 2024 evaluations and link to the full reports. We also include an update explaining our decision to remove The Humane League from our list of recommended programs. Our website has now been updated to reflect the new fund and charity recommendations that came out of these evaluations.

Please also see our website for more context on why and how we evaluate evaluators.

We look forward to your questions and comments!

Key takeaways from each of our 2024 evaluations

The three evaluators that we evaluated in our 2024 round of the evaluating evaluators project were:

Global health and wellbeing

Founders Pledge Global Health and Development Fund (FP GHDF)

Based on our evaluation, we have decided not to currently include FP GHDF in our list of recommended charities and funds and do not plan to allocate a portion of GWWC’s Global Health and Wellbeing Fund budget to the FP GHDF at this point. However, we still think FP GHDF is worth considering for impact-focused donors and we will continue to host the program on the GWWC donation platform.

Some takeaways that inform this decision include:

Our belief that FP GHDF evaluations do not robustly demonstrate that the opportunities they fund exceed their stated bar of 10x GiveDirectly in expectation, due to the presence of errors and insufficiently justified subjective inputs we found in some of their back-of-the-envelope calculations (BOTECs) that could cause the estimated cost-effectiveness to fall below this threshold.
Our uncertainty about the future quality of evaluations following the recent departure of the senior researcher responsible for managing the fund and our belief that there are currently insufficient peer review processes, such as red-teaming, in place to reassure us that the fund quality will remain at least consistent following this transition.

Taken together, these issues left us unable to justifiably conclude that the FP GHDF is currently competitive with GiveWell’s recommendations.

While our decision is not to recommend FP GHDF at this time, we would like to emphasise that we did not conclude that the marginal cost-effectiveness of the GHDF is unambiguously not competitive with GiveWell’s recommended charities and funds — in fact, we think the GHDF might be competitive with GiveWell now or in the near future. Instead, our finding is that we can’t currently justifiably conclude that the fund is competitive with GiveWell in expectation, based on the evidence we have seen in our evaluation and the uncertainty surrounding staffing changes.

We also highlight positive findings from our evaluation. For example, we think that FP GHDF’s strategy of funding promising early-stage organisations plausibly has significant positive externalities, and the fact that FP GHDF has made several grants to organisations before GiveWell indicates they have some track record for identifying these promising funding opportunities.

For more information, please see our 2024 evaluation report for FP GHDF.

Animal welfare

Animal Charity Evaluators’ Movement Grants (ACE MG)

Based on our evaluation, we have decided to include ACE MG in our list of recommended charities and funds and we expect to allocate half of GWWC’s Effective Animal Advocacy Fund to ACE MG.

The key finding informing this decision is that ACE MG’s recent marginal grants and overall grant decision-making appear to be of sufficiently high quality to be competitive with our current top recommendation in animal welfare, EA Funds’ Animal Welfare Fund (EA AWF), as assessed in our 2023 evaluation of EA AWF. This is in large part due to our view that ACE MG has improved meaningfully since our previous evaluation, when we concluded that ACE MG performed slightly less strongly on several proxies for the marginal cost-effectiveness of its grants. In particular, we noted improvements in the quality of ACE MG’s:

Marginal grantmaking
Weighting of geographical neglectedness
Approach to grant selection and sizing

We did find what we think to be room for improvement in ACE MG’s approach to more clearly integrating scope, particularly when comparing between applicants implementing different types of interventions. However, we don’t think this room for improvement affects ACE MG’s position as, to our knowledge, among the best places to recommend to donors.

We do not believe that our investigation showed ACE MG to be justifiably better or worse than EA AWF in terms of marginal cost-effectiveness (based on our 2023 evaluation of EA AWF), so we continue to recommend EA AWF as a donation option alongside ACE MG and to allocate half of our Effective Animal Advocacy Fund to EA AWF.

For more information, please see our 2024 evaluation report for ACE MG.

Animal Charity Evaluators’ Charity Evaluation Program

Based on our evaluation, we have decided not to currently include ACE’s Recommended Charities or the Recommended Charity Fund (RCF) in our list of recommended charities and funds nor to allocate a portion of GWWC’s Effective Animal Advocacy Fund to these programs at this point in time. However, we still think ACE’s Recommended Charities are worth considering for impact-focused donors and we may consider inviting some of these programs to apply to become supported programs on the GWWC donation platform in early 2025.

Some takeaways that informed our decision include:

While ACE has made improvements to its approach for capturing scope and cost-effectiveness since our 2023 evaluation, we are not yet convinced that these play a sufficient role in ACE’s ultimate charity recommendation decisions for us to be confident that ACE reliably makes scope-sensitive recommendations.
We believe that ACE’s Charity Evaluation Program’s current approach does not sufficiently emphasise marginal cost-effectiveness as the most decision-relevant factor in evaluation decisions. For example, ACE primarily evaluates on the basis of organisations’ existing programs, rather than explicitly focusing their evaluation on those programs that are most likely to be funded on the margin. This is in direct contrast to ACE’s Movement Grants, which explicitly evaluate the programs to which ACE would be directing funding on the margin.

Additionally, while our decision is not to rely on ACE’s Charity Evaluation program, we would like to emphasise that we are not making the case that the marginal cost-effectiveness of (any particular) ACE Recommended Charity is clearly not competitive with ACE MG or the EA AWF. In fact, we think it is plausible that some of these charities are competitive with our recommended funds, and may be great options to consider for donors with particular worldviews and who have time to delve deeper into ACE’s materials. Instead, our conclusion is that we can’t justifiably conclude that the charity evaluation program consistently identifies charities competitive with the funding opportunities supported by our current recommendations (ACE MG and EA AWF), based on the evidence we have seen in our evaluation.

For more information, please see our 2024 evaluation report for ACE’s Charity Evaluation Program.

Additional recommendation updates

The Humane League’s corporate campaigns program

As part of our review of our recommended programs for the 2024 Giving Season, we have made the decision to remove The Humane League’s (THL) corporate campaigns program from our list of recommended programs. Our reason for this is that we believe we can no longer justifiably recommend THL’s program alongside our existing recommendations in animal welfare.

While our decision was to no longer recommend THL’s program, we want to be clear that our decision does not reflect any negative update on THL’s work. Instead, this is reflective of evidence being out of date, us maintaining the scope of the evaluating evaluators project, and keeping to our principles of usefulness, justifiability and transparency.

Specifically, this conclusion is predominantly based on the following:

We generally base our charity and fund recommendations on our evaluating evaluators project, for reasons explained here. This year, the evaluators and grantmakers we’ll rely on – based on our evaluations – will be EA Funds’ Animal Welfare Fund and ACE’s Movement Grants, and neither of these currently recommends THL as a charity (as neither makes charity recommendations).
In THL’s case last year, as explained in our report, we went beyond the scope of our usual project by using evidence provided by three further evaluators (Founders Pledge, Rethink Priorities and Open Philanthropy) to supplement the recommendation of an evaluator we had evaluated but decided not to rely on at that point (ACE) in order to ultimately recommend THL’s program. This was based on our overarching principles of usefulness, transparency and justifiability, via our judgement that providing a competitive alternative to the EA AWF – if we transparently and justifiably could – would be valuable.
However, because we expect the approach of grantmakers/evaluators to change more slowly than the work/programs of an individual charity, we are not comfortable relying on the information we used last year without undertaking a reinvestigation,^[1] which we don’t expect to do (see below).
Given we now have two recommendations in animal welfare, we consider it somewhat less useful to look further into THL (or any other individual program in animal welfare) and we don’t think we can currently justify going beyond the scope of our evaluating evaluators project in the same way as we could last year.

THL’s corporate campaigns remain a supported program on our platform for donors who wish to continue supporting their work, and though we can no longer justifiably recommend them, we still think they are an option worth considering for impact-focused donors with particular worldviews and time to engage with the evidence supporting their corporate campaigns program.

Conclusion

We have created a webpage for those interested in learning more about our 2024 iteration of evaluating evaluators.

In the future, we plan to continue to evaluate evaluators – extending the list beyond the seven we’ve covered across our first two rounds, improving our methodology, and reinvestigating evaluators we’ve previously looked into to see how their approach/work has or has not changed.

^[1] Note that we do maintain our recommendation of EA AWF, as (like we stated above) we expect the quality of evaluations of a grantmaker – and by extension, the quality of the programs it funds when provided with an extra dollar – to change more slowly than the program funded by providing an extra dollar to any individual charity.

Matt_Lerner @ 2024-12-03T20:18 (+81)

FP Research Director here.

I think Aidan and the GWWC team did a very thorough job on their evaluation, and in some respects I think the report serves a valuable function in pushing us towards various kinds of process improvements.

I also understand why GWWC came to the decision they did: to not recommend GHDF as competitive with GiveWell. But I'm also skeptical that any organization other than GiveWell could pass this bar in GHD, since it seems that in the context of the evaluation GiveWell constitutes not just a benchmark for point-estimate CEAs but also a benchmark for various kinds of evaluation practices and levels of certainty.

I think this comes through in three key differences in perspective:

Can a grant only be identified as cost-effective in expectation if lots of time is spent making an unbiased, precise estimate of its cost-effectiveness?
Should CEAs be the singular determinant of whether or not a grant gets made?
Is maximizing calculated EV in the case of each individual grant the best way to ensure cost-effectiveness over the span of an entire grantmaking programme?

My claim is that, although I'm fairly sure GWWC would not explicitly say "yes" to each of these questions, the implication of their approach suggests otherwise. FP, meanwhile, thinks the answer to each is clearly "no." I should say that GWWC has been quite open in saying that they think GHDF could pass the bar or might even pass it today, but I share other commenters' skepticism that this could be true by GWWC's lights in the context of the report! Obviously, though, we at FP think the GHDF is >10x.

The GHDF is risk-neutral. Consequently, we think that spending time reducing uncertainty about small grants is not worthwhile: it trades off against time that could be spent evaluating and making more plausibly high-EV grants. As Rosie notes in her comment, a principal function of the GHDF has been to provide urgent stopgap funding to organizations that quite often end up actually receiving funding from GW. Spending GW-tier effort getting more certain about $50k-$200k grants literally means that we don't spend that time evaluating new high-EV opportunities. If these organizations die or fail to grow quickly, we miss out on potentially huge upside of the kind that we see in other orgs of which FP has been an early supporter. Rosie lists several such organizations in her comment.

The time and effort that we don't spend matching GiveWell's time expenditure results in higher variance around our EV estimates, and one component of that variance is indeed human error. We should reduce that error rate — but the existence of mistakes isn't prima facie evidence of lack of rigor. In our view, the rigor lies in optimizing our processes to maximize EV over the long-term. This is why we have, for instance, guidelines for time expenditure based on the counterfactual value of researcher time. This programme entails some tolerance for error. I don't think this is special pleading: you can look at GHDF's list of grantees and find a good number that we identified as cost-effective before having that analysis corroborated by later analysis from GiveWell or other donors. This historical giving record, in combination with GWWC's analysis, is what I think prospective GHDF donors should use to decide whether or not to give to the Fund.

Finally, a common (and IMO reasonable) criticism of EA-aligned or EA-adjacent organizations is an undue focus on quantification: "looking under the lamppost." We want to avoid this without becoming detached from the base numeric truth, and one way we do so is by allowing difficult-to-quantify considerations to tilt us toward or away from a prospective grant. We do CEAs in nearly every case, but for the GHDF they serve an indicative purpose (as they often do at e.g. Open Phil) rather than a determinative one (as they often do at e.g. GiveWell). Non-quantitative considerations are elaborated and assessed in our internal recommendation template, which GWWC had access to but which I feel they somewhat underweighted in their analysis. These kinds of considerations find their way into our CEAs as well, particularly in the form of subjective inputs that GWWC, for their part, found unjustified.

Jason @ 2024-12-05T03:37 (+12)

[highly speculative]

It seems plausible to me that the existence of higher degrees of random error could inflate a more error-tolerant evaluator's CEAs for funded grants as a class. Someone could probably quantify that intuition a whole lot better, but here's one thought experiment:

Suppose ResourceHeavy and QuickMover [which are not intended to be GiveWell and FP!] are evaluating a pool of 100 grant opportunities and have room to fund 16 of them. Each has a policy of selecting the grants that score highest on cost-effectiveness. ResourceHeavy spends a ton of resources and determines the precise cost-effectiveness of each grant opportunity. To keep the hypo simple, let's suppose that all 100 have a true cost-effectiveness of 10.00-10.09 Units, and ResourceHeavy nails it on each candidate. QuickMover's results, in contrast, include a normally distributed error with a mean of 0 and a standard deviation of 3.

In this hypothetical, QuickMover is the more efficient operator because the underlying opportunities were ~indistinguishable anyway. However, QuickMover will erroneously claim that its selected projects have a cost-effectiveness of ~13+ Units because it unknowingly selected the 16 projects with the highest positive error terms (i.e., those with an error of +1 SD or above). Moreover, the random distribution of error determined which grants got funded and which did not -- which is OK here since all candidates were ~indistinguishable but will be problematic in real-world situations.

While the hypo is unrealistic in some ways, it seems that given a significant error term, which grants clear a 10-Unit bar may be strongly influenced by random error, and that might undermine confidence in QuickMover's selections. Moreover, significant error could result in inflated CEAs on funded grants as a class (as opposed to all evaluated grants as a class) because the error is in some ways a one-way ratchet -- grants with significant negative error terms generally don't get funded.

I'm sure someone with better quant skills than I could emulate a grant pool with variable cost-effectiveness in addition to a variable error term. And maybe these kinds of issues, even if they exist outside of thought experiments, could be too small in practice to matter much?
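The thought experiment above can be checked with a small Monte Carlo simulation. This is a minimal sketch under stated assumptions, not anyone's actual grant model: it assumes each opportunity's true cost-effectiveness is drawn uniformly from the 10.00-10.09 Unit range in the hypo, adds normal error with SD 3 to get QuickMover's estimate, funds the top 16 of 100 estimates, and repeats many times. The function name `simulate_quickmover`, the trial count and the seed are hypothetical choices for illustration.

```python
import random
import statistics


def simulate_quickmover(n_grants=100, n_funded=16, error_sd=3.0,
                        true_low=10.00, true_high=10.09,
                        n_trials=1000, seed=0):
    """Return (mean estimated CE, mean true CE) of the grants QuickMover funds.

    Hypothetical setup from the thought experiment: every opportunity's true
    cost-effectiveness is uniform on [true_low, true_high] Units, and
    QuickMover's estimate adds N(0, error_sd) noise to it.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimated_means, true_means = [], []
    for _ in range(n_trials):
        true_ce = [rng.uniform(true_low, true_high) for _ in range(n_grants)]
        estimates = [t + rng.gauss(0.0, error_sd) for t in true_ce]
        # Fund the n_funded opportunities with the highest *estimated* CE.
        ranked = sorted(range(n_grants), key=lambda i: estimates[i], reverse=True)
        funded = ranked[:n_funded]
        estimated_means.append(statistics.mean(estimates[i] for i in funded))
        true_means.append(statistics.mean(true_ce[i] for i in funded))
    return statistics.mean(estimated_means), statistics.mean(true_means)


claimed, actual = simulate_quickmover()
print(f"claimed CE of funded grants: {claimed:.2f} Units")
print(f"true CE of funded grants:    {actual:.2f} Units")
```

Under these assumptions the claimed average for funded grants should land well above 13 Units while the true average stays near 10.05, which is the gap the hypo describes (often called the optimizer's curse).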

Karthik Tadepalli @ 2024-12-08T14:20 (+2)

It's definitely true that all else equal, uncertainty inflates CEAs of funded grants, for the reasons you identify. (This is an example of the optimizer's curse.) However:

This risk is lower when the variance in true CE is large, especially if it's larger than the variance due to measurement error. To the extent we think this is true of the opportunities we evaluate, this reduces the quantitative contribution of measurement error to CE inflation. More elaboration in this comment.
Good CEAs are conservative in their choices of inputs for exactly this reason. The goal should be to establish the minimal conditions for a grant to be worth making, as opposed to providing precise point estimates of CE.
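The first point above can be illustrated by letting true cost-effectiveness vary across the pool as well. The sketch below is a hypothetical setup, not a model of any real grant pool: true CE is drawn from a normal distribution with mean 10 Units and a chosen SD, measurement error keeps an SD of 3, and the funder again picks the top 16 of 100 estimates. It reports the average overestimate among funded grants (estimate minus truth) and their average true CE; the function name `funded_overestimate` is illustrative only.

```python
import random
import statistics


def funded_overestimate(true_sd, error_sd=3.0, n_grants=100, n_funded=16,
                        n_trials=1000, seed=1):
    """Return (mean overestimate, mean true CE) among funded grants.

    Hypothetical pool: true CE ~ N(10, true_sd),
    estimate = true CE + N(0, error_sd).
    """
    rng = random.Random(seed)
    overestimates, true_means = [], []
    for _ in range(n_trials):
        true_ce = [rng.gauss(10.0, true_sd) for _ in range(n_grants)]
        estimates = [t + rng.gauss(0.0, error_sd) for t in true_ce]
        ranked = sorted(range(n_grants), key=lambda i: estimates[i], reverse=True)
        funded = ranked[:n_funded]
        # Positive values mean funded grants look better on paper than they are.
        overestimates.append(statistics.mean(estimates[i] - true_ce[i] for i in funded))
        true_means.append(statistics.mean(true_ce[i] for i in funded))
    return statistics.mean(overestimates), statistics.mean(true_means)


for true_sd in (0.5, 3.0, 10.0):
    gap, true_mean = funded_overestimate(true_sd)
    print(f"true-CE SD {true_sd:>4}: overestimate {gap:.2f}, "
          f"true CE of funded {true_mean:.2f}")
```

As the spread in true CE grows relative to the error SD, the overestimate among funded grants should shrink while their true CE rises, because selection increasingly picks up real differences rather than noise.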

Aidan Whitfield🔸 @ 2024-12-04T23:37 (+5)

Thanks for the comment, Matt! We are very grateful for the transparent and constructive engagement we have received from you and Rosie throughout our evaluation process.

I did want to flag that you are correct in anticipating that we do not agree with the three differences in perspective that you note here, nor do we think our approach implies that we do:

1) We do not think a grant can only be identified as cost-effective in expectation if a lot of time is spent making an unbiased, precise estimate of cost-effectiveness. As mentioned in the report, we think a rougher approach to BOTECing intended to demonstrate opportunities meet a certain bar under conservative assumptions is consistent with a GWWC recommendation. When comparing the depth of GiveWell’s and FP’s BOTECs we explicitly address this:

[This difference] is also consistent with FP’s stated approach to creating conservative BOTECs with the minimum function of demonstrating opportunities to be robustly 10x GiveDirectly cost-effectiveness. As such, this did not negatively affect our view of the usefulness of FP’s BOTECs for their evaluations.

Our concern is that, based on our three spot checks, it is not clear that FP GHDF BOTECs do demonstrate that marginal grants in expectation surpass 10x GiveDirectly under conservative assumptions.

2) We would not claim that CEAs should be the singular determinant of whether a grant is made. However, considering that CEAs seem decisive in GHDF grant decisions (in that grants are only made from the GHDF when they are shown by BOTEC to be >10x GiveDirectly in expectation), we think it is fair to assess these as important decision-making tools for the FP GHDF as we have done here.

3) We do not think maximising calculated EV in the case of each grant is the only way to maximise cost-effectiveness over the span of a grantmaking program. We agree some risk-neutral grantmaking strategies should be tolerant of some errors and ‘misses’, which is why we checked three grants, rather than only checking one. Even after finding issues in the first grant, we were still open to relying on FP GHDF if these issues seemed likely to occur only to a limited extent, but in our view their frequency across the three grants we checked was too high to currently justify a recommendation.

I hope these clarifications make it clear that we do think evaluators other than GiveWell (including FP GHDF) could pass our bar, without requiring GiveWell levels of certainty about individual grants.

Rosie_Bettle @ 2024-12-05T09:39 (+13)

Hey Aidan,

I want to acknowledge my potential biases for any new comment thread readers (I used to be the senior researcher running the fund at FP, most or all of the errors highlighted in the report are mine, and I now work at GiveWell). These are personal views.

I think getting further scrutiny and engagement into key grantmaking cruxes is really valuable. I also think the discussion this has prompted is cool. A few points from my perspective:

As Matt’s comment points out, there is a historical track record for many of these grants. Some have gone on to be GiveWell-supported, or (imo) have otherwise demonstrated success in a way that suggests they were a ‘hit’. In fact, with the caveat that there are a good number of recent ones where it’s too early to tell, there hasn’t yet been one that I consider a ‘miss’. Is it correct to update primarily from 3 spot checks of early-stage BOTECs (my read of this report) rather than from what actually happened after the grant was made? Is this risking goodharting?
Is this really comparing like for like? In my view, small grants shouldn’t require as strong an evidence base as, say, a multimillion-dollar grant, mainly due to the time expenditure reasons that Matt points out. I am concerned that this report moves us further toward a point where (due to the level of rigour and therefore time expenditure required) the incentives for grantmaking orgs are to only make really large grants. I think this systematically disadvantages smaller orgs, and I think this is a negative thing (I guess your view here partially depends on your view on point ‘3’ below).
In my view, a crucial crux here is the value of supporting early-stage stuff, alongside other potentially riskier items, such as advocacy and giving multipliers. I am genuinely uncertain, and think that smart and reasonable people can disagree here. But I agree with Matt’s point that there’s significant upside through potentially generating large future room for funding at high cost-effectiveness. This kind of long-term optionality benefit isn’t typically included in an early-stage BOTEC (because doing a full VOI is time-consuming) and I think it’s somewhat underweighted in this report.
I no longer have access to the BOTECs to check (since I’m no longer at FP) and again I think the focus on BOTECs is a bit misplaced. I do want to briefly acknowledge though that I’m not sure that all of these are actually errors (but I still think it’s true that there are likely some BOTEC errors, and I think this would be true for many/ most orgs making small grants).

Aidan Whitfield🔸 @ 2024-12-11T04:58 (+3)

Hi Rosie, thanks for sharing your thoughts on this! It’s great to get the chance to clarify our decision-making process so it’s more transparent, in particular so readers can make their own judgement as to whether or not they agree with our reasoning about FP GHDF. Some of my thoughts on each of the points you raise:

We agree there is a positive track record for some of FP GHDF’s grants and this is one of the key countervailing considerations against our decision not to rely on FP GHDF in the report. Ultimately, we concluded that the instances of ‘hits’ we were aware of were not sufficient to conclude that we should rely on FP GHDF into the future. Some of our key reasons for this included:
1. These ‘hits’ seemed to fall into clusters for which we expect there is a limited supply of opportunities, e.g., several that went on to be supported by GiveWell were AIM-incubated charities. This means we expect these opportunities to be less likely to be the kinds of opportunities that FP GHDF would fund on the margin with additional funding.
2. We were not convinced that these successes would be replicated in the future under the new senior researcher (see our crux relating to consistency of the fund).
Ultimately, what we are trying to do is establish where the next dollar can be best spent by a donor. We agree it might not be worth it for a researcher to spend as much time on small grants, but this by itself should not be a justification for us to recommend small grants over large ones (agree point 3 can be a relevant consideration here though).
We agree that the relative value donors place on supporting early stage and riskier opportunities compared to more established orgs could be a crux here. However, we still needed a bar against which we could assess FP GHDF (i.e., we couldn’t have justifiably relied on FP GHDF on the basis of this difference in worldview, independent of the quality of FP GHDF’s grantmaking). As such, we tried to assess whether FP GHDF grant evaluations convincingly demonstrated that opportunities met their self-stated bar. As we have acknowledged in the report, just because we don’t think the grant evaluations convincingly show opportunities meet the bar, doesn’t mean they really don’t (e.g., the researcher may have considered information not included in the grant evaluation report). However, we can only assess on the basis of the information we reviewed.
Regarding our focus on the BOTECs potentially being misplaced, I want to be clear that we did review all of these grant evaluations in full, not just the BOTECs. If we thought the issues we identified in the BOTECs were sufficiently compensated for by reasoning included in the grant evaluations more generally this would have played a part in our decision-making. I think assessing how well the BOTECs demonstrate opportunities surpass Founders Pledge’s stated bar was a reasonable evaluation strategy because: a) As mentioned above, these BOTECs were highly decision relevant — grants were only made if BOTECs showed opportunities to surpass 10x GiveDirectly and we know of no instances where an opportunity scored above 10x GiveDirectly and would not have been eligible for FP GHDF funding. b) The BOTECs are where many of the researcher’s judgements are made explicit and so can be assessed. At least for the three evaluations we reviewed in detail, a significant fraction of the work in the grant evaluation was justifying inputs to the BOTECs. On the other point raised here, it is true that not all of the concerns we had with the BOTECs were errors. Some of our concerns related to inputs that seemed (to us) optimistic and were, in our view, insufficiently justified considering the decision-relevant effect they had on the overall BOTEC. While not errors, these made it more difficult for us to justifiably conclude that the FP GHDF grants were in expectation competitive with GiveWell.
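To make the point about optimistic subjective inputs concrete, here is a minimal BOTEC sketch. Everything in it is hypothetical: the input names, the grant size, the benchmark value per dollar for GiveDirectly and the numbers themselves are placeholders, not figures from FP's, GiveWell's or GWWC's analyses. It assumes the BOTEC is a simple product of inputs, compares the result with a 10x GiveDirectly bar, and solves for the break-even value of one subjective input, which is one way to ask how much a conclusion depends on that input.

```python
import math

# All values below are hypothetical placeholders for illustration only.
GIVEDIRECTLY_VALUE_PER_DOLLAR = 0.003  # assumed benchmark, value units per $
BAR_MULTIPLE = 10.0                    # the "10x GiveDirectly" bar
GRANT_SIZE = 100_000                   # assumed grant size in $


def multiple_of_givedirectly(inputs, grant_size=GRANT_SIZE):
    """Cost-effectiveness as a multiple of the (hypothetical) GiveDirectly benchmark.

    Assumes the BOTEC is a plain product of its inputs, giving total value units.
    """
    total_value = math.prod(inputs.values())
    return total_value / grant_size / GIVEDIRECTLY_VALUE_PER_DOLLAR


def break_even(inputs, name, grant_size=GRANT_SIZE):
    """Value of inputs[name] at which the BOTEC lands exactly on the bar."""
    others = math.prod(v for k, v in inputs.items() if k != name)
    return BAR_MULTIPLE * GIVEDIRECTLY_VALUE_PER_DOLLAR * grant_size / others


# Hypothetical input sets: a best guess and a more conservative alternative.
best_guess = {"people_reached": 20_000, "prob_success": 0.6,
              "value_per_person": 0.5, "counterfactual_share": 0.8}
conservative = {"people_reached": 15_000, "prob_success": 0.4,
                "value_per_person": 0.4, "counterfactual_share": 0.7}

for label, inputs in (("best guess", best_guess), ("conservative", conservative)):
    multiple = multiple_of_givedirectly(inputs)
    verdict = "clears" if multiple >= BAR_MULTIPLE else "misses"
    print(f"{label}: {multiple:.1f}x GiveDirectly ({verdict} the bar)")

print(f"break-even prob_success: {break_even(best_guess, 'prob_success'):.3f}")
```

With these made-up numbers the best-guess inputs give 16.0x and the conservative set gives 5.6x, and the break-even probability of success (holding the other best-guess inputs fixed) is 0.375, so whether the grant clears the bar hinges on how well that one input is justified.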

Animal Charity Evaluators @ 2024-11-27T09:28 (+79)

We would like to extend our gratitude to Giving What We Can (GWWC) for conducting the "Evaluating the Evaluators" exercise for a second consecutive year. We value the constructive dialogue with GWWC and their insights into our work. While we are disappointed that GWWC has decided not to defer to our charity recommendations this year, we are thrilled that they have recognized our Movement Grants program as an effective giving opportunity alongside the EA Animal Welfare Fund.

Movement Grants

After reflecting on GWWC’s 2023 evaluation of our Movement Grants (MG) program we made several adjustments, all of which are noted in GWWC’s 2024 report. We’re delighted to see that the refinements we made to our program this year have led to grantmaking decisions that meet GWWC’s bar for marginal cost-effectiveness and that they will recommend our MG program on their platform and allocate half of their Effective Animal Advocacy Fund to Movement Grants.

As noted by GWWC, ACE’s MG program is unique in its aim to fund underserved segments of the global animal advocacy movement and address two key limitations to effectiveness within the movement:

Limited evidence about which interventions are effective and in which contexts
Disproportionate attention devoted to some regions and animal groups

For impact-focused donors seeking opportunities to build an evidence-based and resilient global animal advocacy movement, our Movement Grants program is an effective giving opportunity which supports brilliant animal advocates all over the world.

Alongside their recommendation of our MG program, GWWC has outlined several areas for improvement that we are grateful for and will reflect on.

We agree with these suggestions by GWWC:

Improving the MG model to reflect our movement-building strategy—we have already started to revise our theory of change for the MG program to explicitly separate the pathways to impact for grants that are made primarily on the basis of movement-building and which are made on a more ‘direct impact’ basis. This will help us better account for movement building in our model and make future assessments about whether we want to maintain this approach to our grantmaking.
Revising our use of Impact Potential (IP) scores in the MG model—the IP scores played a minimal role in our final grant decisions this year. Before the GWWC evaluation, we had already decided to revise our use of these scores because they do not model scope in a sufficiently useful way and they run the risk of combining parameters in our model in such a way that clouds the decision-relevant information.
Better integrating scope comparisons—while our current MG model has scope baked into some of the factors (e.g. theory of change, long-term impact), we agree with GWWC that a more robust approach would be to make scope a factor that our grant reviewers score separately. We intend to make this adjustment in our next round of Movement Grants and we appreciate the specific logistical suggestions from GWWC on how we might do this.
Improving the documentation of our reasoning for making grant decisions—this is mainly related to our internal processes that don’t have any bearing on our grant decisions; however, we agree with GWWC that despite our diligent record keeping for how our thinking evolves through the grant review process, we need a better record that summarizes, in one place, the main rationale and cruxes for each grant decision. This is something we intend to implement in our next granting round.

We also want to note the following challenge:

GWWC recommends that we introduce a clearer framework for prioritizing between interventions. We agree with this recommendation—of the possible interventions available to help animals, some are already excluded from applying or rejected at an early stage from our grantmaking based on the scope of impact. However, while we intend to make further improvements in scope comparison between interventions, there remain challenges due to the many externalities that affect intervention effectiveness and our ability to estimate them. GWWC notes in the report that we appear resistant to doing this because it would be unhelpfully speculative. We want to clarify that we are willing to make speculations where we think they will be useful while highlighting the challenges. There is a difference between the more speculative forward-looking cost-effectiveness analyses (CEAs) we would be undertaking as a grantmaker compared with the CEAs the Charity Evaluations team undertakes on completed work. To overcome this, we may also consider comparing the known cost-effectiveness of the most similar organizations’ previous work or leveraging CEAs that have used the total available information on an intervention (e.g. this estimate). However, we remain cautious about spending time and resources on trying to find comparable cost estimates between interventions when doing so might not sufficiently increase the overall marginal cost-effectiveness of our grant decisions. We also want to note that this is a challenge for any animal advocacy funder, not just ACE. This is an area where we expect to continue to try different approaches and improve year-on-year, balancing available information and our team’s capacity.

We are grateful for the rare opportunity to reflect deeply on our work and to learn from GWWC’s perspectives so that we can award grants that are the most impactful for animals. We are especially thankful too for the larger GWWC and EA community that is willing to support highly promising projects to help some of the most neglected individuals who suffer greatly.

Charity Evaluations

On the other hand, we are disappointed that GWWC does not find our Charity Evaluations program justifiably competitive with MG (and the EA Animal Welfare Fund) and believe that donors might miss out on some of the most impactful donation opportunities because of GWWC’s decision. We will elaborate on the relationship between our Charity Evaluations and Movement Grants below, but first address some points specific to Charity Evaluations.

We agree with some of GWWC’s conclusions and suggestions for improvement, which appear in their 2024 report. We think that focusing on these will improve the quality of our recommendations moving forward:

Devote more resources to quantitatively modeling cost-effectiveness, including better defining upper and lower bounds and exploring new ways to assess speculative programs (such as using a mix of qualitative and quantitative arguments).
Explicitly assess charities’ strategic prioritization, such as their own focus on cost-effectiveness. While we develop a sense of this internally as we evaluate charities, we think it could be helpful to take it into account more systematically.
Explore other methods for more comprehensively capturing the magnitude of impact and limiting factors. This might mean adapting the theory of change analysis to include systematically assessing the weakest-seeming and most fundamental assumptions.
Reconsider the minimum and maximum sizes of grants we’re willing to make through the Recommended Charity Fund (RCF), for example by potentially reconsidering the safeguards in the disbursement model that lead to more consistent grant amounts over time.

However, there are also areas where we disagree with GWWC’s conclusions. While we acknowledge these parts of our methods have room for improvement, we think the changes that they suggest may not make a meaningful difference to the quality of our recommendations:

Include all decision-relevant factors in the published charity reviews. GWWC has highlighted their concerns based on our public reviews, such as recommendation decisions that seemed to take insufficient account of scope (e.g., comparing ÇHKD and Sinergia’s cost-effectiveness results, arguments for longer-term paths to impact, etc.). While we agree that it’d be ideal to more clearly publicly demonstrate and formalize cross-charity comparisons and the other thinking that leads to our decisions, we also want to note that we already consider much of what GWWC suggests in our internal decision-making. We’ve made significant adjustments to our charity reviews in the past year and will continue to refine them to ensure they are transparent and informative while remaining accessible and encouraging effective giving. We think it’s possible that the Charity Evaluations program was disadvantaged in this evaluation by our in-depth charity reviews, since GWWC’s comments seem to be informed more by what we published than by the quality of our internal decisions.
Update the decision-making process so that it directly compares all recommended charities on marginal cost-effectiveness. Our basis for deciding whether to add a Recommended Charity is whether we think it would lead to more animals being helped on the margin (compared to having a smaller number of Recommended Charities), which is conceptually different from ranking charities. Given the types of uncertainty currently faced by the animal advocacy movement when it comes to calculating cost-effectiveness, we decide whether a charity should be recommended based on a range of decision criteria rather than scoring and ranking charities based on our sense of their relative marginal cost-effectiveness. In the future, if we had sufficiently robust evidence to form reliable cost-effectiveness estimates, including evidence or good proxies for speculative work with complex long-term theories of change, it’s possible we would move more toward the kind of ranking approach that GWWC suggests. Additionally, we consider relative cost-effectiveness during each Recommended Charity Fund distribution, where we adjust the size of each grant depending on the most up-to-date plans that charities share with us.
Increase emphasis on the marginal dollar. We think our current methods adequately track where influenced funding would go. With the exception of granting restricted funding, we already implement GWWC’s suggestions. For example, we consider how charities’ programs are likely to change with gaining or losing an ACE recommendation and we also take into account charities’ strategic prioritization (although we don’t assess it systematically, as noted above, and some of this information is kept confidential so it doesn’t appear in our charity reviews). Additionally, this consideration only comes into play relatively infrequently; in most cases, ACE-influenced funding does not go toward novel programs but toward expansions of current programs, investments in the charity as a whole, or funding gaps.
Give more consideration to cost-effectiveness analyses (CEAs) in decisions not to recommend charities. Similar to GWWC’s evaluations of evaluators, ACE looks for evidence to justifiably conclude that a charity should be recommended, and we do not recommend charities in the absence of that evidence. When we say that a CEA did not play a meaningful role in our decision not to recommend a charity, we mean that it was insufficient to justify a recommendation. From that perspective, CEAs play just as much of a role to recommend as to not recommend a particular charity. While there are improvements we can make to our CEAs so that they are more indicative of cost-effectiveness (or lack thereof), we don’t see this as an indicator of a lack of scope sensitivity.

GWWC also suggests some broader strategic shifts in our programs that we plan to consider as a part of upcoming strategic planning. While these changes would help align our Charity Evaluations program more closely with GWWC’s criteria, we’re currently unsure if they would do the most good for the animal advocacy movement and for animals. These include:

Making restricted recommendations to specific programs, granting restricted funding, and removing the constraint that the fund grants to all current Recommended Charities each round. We need to carefully consider any trade-offs between the benefits of this idea and downstream considerations such as the needs of our donor base and losing the benefits of allowing Recommended Charities the flexibility to pursue what they need the most to maximize their impact. Also, restricting funding likely introduces some degree of funging, reducing any cost-effectiveness gains.
Allowing Recommended Charities to apply for Movement Grants. Theoretically, this would mean Recommended Charities could apply for targeted Movement Grants funding for their marginal programs (which we don’t currently permit). However, this might risk negatively impacting the movement-building goals of the Movement Grants program.
Changing the evaluation cycle from two years to one year. In theory, this would allow for a more precise assessment of charities’ upcoming programs, which is why ACE used a one-year cycle until 2021. However, we changed to a two-year cycle largely based on feedback from the charities we evaluated that annual evaluations were overly demanding/time-consuming. We’ll carefully consider the balance between timely evaluations, the burden on charities that we evaluate, and our team capacity (for comparison, ACE has an evaluations team of 5.5 FTE compared to GiveWell’s ~39 researchers).
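ACE's point that restricted funding "likely introduces some degree of funging" can be made concrete with a toy model. The sketch below is hypothetical and is not ACE's method. It assumes the charity had already planned to spend some of its own unrestricted money on the targeted program. A restricted grant then partly displaces that planned spending, and the freed-up money flows to the charity's other work at a different cost-effectiveness. All function names, parameters, and numbers are illustrative only.

```python
"""Toy model of funging (fungibility) in restricted grants.

Hypothetical assumption: the charity had already planned to spend
`planned_spend` of its unrestricted money on the targeted program. A
restricted grant first displaces that planned spending; the displaced
money then goes to the charity's other work at `ce_other`.
"""


def counterfactual_value(grant: float, planned_spend: float,
                         ce_target: float, ce_other: float) -> float:
    """Impact attributable to a restricted grant, net of funging."""
    # Money the charity no longer needs to spend on the target program
    displaced = min(grant, planned_spend)
    # Genuinely new spending on the target program
    additional = grant - displaced
    return additional * ce_target + displaced * ce_other


if __name__ == "__main__":
    # Illustrative numbers: target program 3x as cost-effective as other work
    print(counterfactual_value(100, 0, 3, 1))   # no funging
    print(counterfactual_value(100, 60, 3, 1))  # partial funging
```

With a grant of 100, 60 of which the charity had already planned to spend, and the targeted program three times as cost-effective as the charity's other work, the grant buys 180 units of impact rather than the 300 a calculation ignoring funging would suggest.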

Moving forward, we will continue evolving the Charity Evaluations program to find the organizations that can do the most good with additional donations, and we thank GWWC for critically engaging with our work. We also appreciate that they acknowledge the difficulties of our work and the inherent differences between evaluators and funders in the animal advocacy space. However, given those difficulties, we’re currently not sure whether ACE or any other evaluator that recommends whole charities in this space would be seen by GWWC as competitive with charitable funds that give restricted grants. Because of this, we’re not sure that GWWC’s current approach leads to the best outcomes for animals.

How ACE views Movement Grants vs. Charity Recommendations

We want to acknowledge that the language GWWC uses implies that they see our Movement Grants (MGs) and Charity Evaluations programs as “competitive” with each other. This is not a view we share—we see them as complementary.

Although there’s some overlap between charities that are a fit for each program, they serve different purposes:

The Recommended Charity Fund (RCF)
- provides high-impact evidence-based donation opportunities, and
- ensures critical organizations can do their underfunded and neglected work.
The Movement Grants program
- funds highly promising projects,
- is more likely to fund hits-based opportunities, and
- focuses on movement-building.

The programs are complementary and supplement each other:

MGs support charities that may go on to become Recommended Charities (RCs)—this is evidenced by the fact that the majority of our current RCs are former MG recipients.
Some charities are ineligible for our Charity Evaluations because RCs need to be able to absorb more funding and have more of a track record, and ACE needs to have confidence in them over a longer time horizon (the entire two-year recommendation cycle). This means that the RCF misses some of the highest-impact funding opportunities because it doesn't fund independent projects, brand-new charities, or only specific projects at a charity that might do a mixture of more and less cost-effective work.
We don’t give MGs to RCs for the duration of their recommendation, so the MG program also misses some of the highest-impact funding opportunities.
MGs can support former RCs if they re-enter an exploratory phase and need project- or program-specific funding to test out their ideas.

They also serve different donors:

In our view neither clearly has a higher expected value than the other. However, since the MG program funds projects with a smaller track record or projects that can have a longer-term impact in building the movement, which leads to more uncertainty and variance in outcomes, it is more suited to donors with a higher risk tolerance. On the other hand, the RCF includes more established, highly cost-effective organizations with a strong track record and a clear path to impact and appeals to more risk-averse donors.
Funds work well for individual donors, but some users of ACE’s and GWWC’s services, such as certain Effective Giving Initiatives and other third parties, cannot collect donations for regranting funds. They must instead set up their own fund composed of established, well-researched charities that can take in large amounts of unrestricted additional donations, and/or direct donations to specific charities.
The effective animal advocacy space is still deeply neglected and mostly funded by existing animal advocates. Therefore, we see ACE’s unique role as critical in securing counterfactually new donations to high-impact charities. We aim to engage audiences who are currently unfamiliar with both effective giving and EAA, and who might never change their individual behavior (for instance by changing their diet). Converting them to support farmed and wild animal welfare requires more accessible and simple solutions. Individual Recommended Charities are more intuitive and easier to understand, and may be more appealing to this audience.

We are proud to support all of our current Recommended Charities and Movement Grantees, and would like to take this opportunity to celebrate the impactful work they do to help make the world a kinder place for animals.

- ACE Team

Aidan Whitfield🔸 @ 2024-12-02T04:07 (+4)

Thank you for your comment! We’ve really appreciated the open and thoughtful way ACE engaged with us throughout these evaluations.

We are excited to be adding Movement Grants to our list of recommended programs this year, and we think the improvements we observed since our last evaluation are a testament to ACE’s constructive approach to engaging with feedback. We are also excited to continue highlighting opportunities like the Recommended Charity Fund and several of ACE’s Recommendations as supported programs on our platform.

Vasco Grilo🔸 @ 2024-11-28T14:01 (+19)

From your 2024 evaluator research page:

Note: we were highly time-constrained in completing this second iteration of our project. On average, we spent about 10 researcher-days per organisation we evaluated (in other words, one researcher spending two full work weeks), of which only a limited part could go into direct evaluation work — a lot of time needed to go into planning and scoping, discussion on the project as a whole, communications with evaluators, and so on.

I think you did great based on the time you spent. Going through your evaluations, I thought to myself "good point" (or similar) so many times! Have you considered fundraising specifically for your evaluator research? I would be interested in supporting you as long as some (e.g. 1/3) of the additional funds go towards evaluating animal welfare evaluators.

Aidan Whitfield🔸 @ 2024-12-02T04:11 (+4)

Thanks for the positive feedback! We are actively considering the future of the GWWC research team, including whether we invest additional resources in future evaluating evaluators projects. To my knowledge, we have not considered fundraising specifically for the evaluating evaluators project. This might be an option we consider at some point, but I think we are probably unlikely to do so in the near future.

Vasco Grilo🔸 @ 2024-11-27T21:37 (+19)

Thanks, Aidan and Sjir! I really like this project.

While our decision is not to recommend FP GHDF at this time, we would like to emphasise that we did not conclude that the marginal cost-effectiveness of the GHDF is unambiguously not competitive with GiveWell’s recommended charities and funds — in fact, we think the GHDF might be competitive with GiveWell now or in the near future.

I do not understand why you "think the GHDF might be competitive with GiveWell now". From your report, it seems quite clear to me that donating to GiveWell's funds is better now. You "looked into [3] grants that were relatively large (by FP GHDF standards), which we expect to be higher effort/quality", and found major errors in 2 grants, and a major methodological oversight in the other.

Grant 1
When the grant was made, the opportunity was determined to be 42x GiveDirectly. However, on a re-evaluation for a later grant to the same organisation (where the original BOTEC was re-purposed), an extra zero was discovered in one of the inputs that had been used — which, when remedied, reduced the overall estimate to 12x GiveDirectly.
[...]
Grant 2
In the second grant we looked into, the effect size used to estimate the impact of the grant applied to children above a certain age, but annualised total childhood mortality in the region was used as the assumed base mortality rate to which the effect size was applied. Because most childhood mortality occurs in the first few months of life, we believe this is a mistake that overestimated the impact of the intervention by about 50%. When we applied adjustments to the model to account for this, the overall cost-effectiveness of the BOTEC fell below 10x GiveDirectly.
[...]
Grant 3
In the third grant we reviewed, our major concern was that FP modelled the probability that a novel technology would be significantly scaled up at 35%, with what we considered to be insufficient justification. This was particularly concerning because small reductions to this estimate resulted in the BOTEC going below 10x GiveDirectly.
Notably, in a subsequent evaluation conducted about 6 months after the first, FP adjusted this probability down to 25%. While we think it is positive FP re-evaluated this:
1) We still think this probability could be too high (for similar reasons to those noted in Grant 1).
2) Plugging this number into the original BOTEC sends the estimated cost-effectiveness below 10x GiveDirectly, which reinforces the point that the original BOTEC should not have been relied on in the original grant without further justification.
3) In the more recent evaluation — which repurposed the original grant evaluation and BOTEC, but where the same probability was modelled at 25% rather than 35% — there was no reference to why the number was updated or additional information about why the estimate was chosen. This increases our concerns around FP’s transparency in justifying the inputs used in their BOTECs.
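The sensitivity concern in Grant 3 can be illustrated with simple arithmetic. The sketch below assumes, purely for illustration, that the BOTEC's output is directly proportional to the scale-up probability; FP's actual model structure isn't described here, so treat this as a hypothetical. Under that assumption, revising the probability from 35% to 25% multiplies the estimate by 25/35 (about 0.71), so any original estimate below 14x GiveDirectly would fall under the 10x bar. The function names and the 13x example are hypothetical.

```python
"""Hypothetical sensitivity check for a BOTEC that is linear in one input.

Assumption (not taken from FP's model): cost-effectiveness, in multiples of
GiveDirectly, is directly proportional to the scale-up probability.
"""

BAR = 10.0  # FP GHDF's stated bar: 10x GiveDirectly


def rescaled_estimate(estimate: float, old_input: float, new_input: float) -> float:
    """Rescale an estimate that is proportional to a single input."""
    return estimate * new_input / old_input


def max_original_estimate(bar: float, old_input: float, new_input: float) -> float:
    """Largest original estimate that still falls to `bar` after the input change."""
    return bar * old_input / new_input


def break_even_input(estimate: float, old_input: float, bar: float = BAR) -> float:
    """Input value at which the rescaled estimate exactly equals the bar."""
    return old_input * bar / estimate


if __name__ == "__main__":
    # Grant 3: probability revised from 35% to 25%
    ceiling = max_original_estimate(BAR, 0.35, 0.25)
    print(f"Original estimates below {ceiling:.1f}x fall under the bar")
    # Hypothetical 13x original estimate, for illustration only
    print(f"13x becomes {rescaled_estimate(13.0, 0.35, 0.25):.2f}x")
    print(f"Break-even probability for 13x: {break_even_input(13.0, 0.35):.1%}")
```

Under this linear assumption, a hypothetical 13x estimate drops to roughly 9.3x at 25%, and its break-even probability is about 27%, which shows why a weakly justified 35% input carries so much weight.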

jackva @ 2024-11-27T22:01 (+12)

Worth pointing out that the FP staff who could reply to this are on Thanksgiving break, so a response will probably take until next week.

Rosie_Bettle @ 2024-11-27T22:56 (+80)

Hey Vasco, these are my personal thoughts and not FP’s (I have now left FP, and anything FP says should take precedence). I have pretty limited capacity to respond, but a few quick notes—

First, I think it’s totally true that there are some BOTEC errors, many/ most of them mine (thank you GWWC for spotting them— it’s so crucial to a well-functioning ecosystem, and more selfishly, to improving my skills as a grantmaker. I really value this!)

At the same time, these are hugely rough BOTECs that were never meant to be rigorous CEAs: they were being used as decision-making tools to enable quick decision-making under limited capacity (I do not take the exact numbers seriously: I expect they're wrong in both directions), with many factors beyond the BOTEC going into grantmaking decisions.

I don’t want to make judgments about whether the fund (while I was there) was surpassing GiveWell; I'm super happy to leave this to others. I was focused on funders who would not counterfactually give to GW, meaning that this was less decision-relevant for me.

I think it's helpful to look at the grant history from FP GHDF. Here are all the grants that I think have been made by FP GHDF since Jan 2023 (apologies if I’ve missed any):

* New Incentives, Sightsavers, Pure Earth, Evidence Action

* r.i.c.e, FEM, Suvita (roughly, currently scaling orgs that were considerably smaller/more early-stage when we originally granted)

* 4 are ecosystem multiplier-y grants (Giving Multiplier, TLYCS, Effective Altruism Australia, Effektiv Spenden)

* 1 was an advocacy grant to 1DaySooner (malaria vax roll out stuff), 1 was an advocacy grant to LEEP

* 5 are recent young orgs that we think are promising, but of course supporting young orgs is hit and miss (Ansh, Taimaka, Essential, IMPALA, HealthLearn)

* 1 was a deworming research grant

* 1 was an anti-corruption journalism grant which we think is promising due to economic growth impacts (OCCRP)

I think it's plausible that I spent too little time on these grant evals, and this probably contributed to BOTEC errors. But I feel pretty good about the actual decision-making, although I am very biased:

* At least 2 or 3 of these struck me as being really time-sensitive (all grants are time-sensitive in a way, but I'm talking about ‘might have to shut down some or all operations’ or ‘there is a time-sensitive scaling opportunity’).
* I think there is a bit of a gap for early-ish funding, and there are benefits to opening up more room for funding by scaling these orgs (i.e. funding orgs beyond seed funding, but before they can absorb, or have the track record for, multi-million grants). It's still early days, but I feel pretty good about the trajectories of the young orgs that FP GHDF supported.
* Having a few high EV, ‘big if true’ grants feels reasonable to me (advocacy, economic growth, R and D).

I hope this context is useful, and note that I can’t speak to FP’s current/ future plans with the FP GHDF. I value the engagement- thanks.

Aidan Whitfield🔸 @ 2024-12-02T04:16 (+6)

Thanks for the comment! While we did find issues that we think imply Founders Pledge’s BOTECs don’t convincingly show that the FP GHDF’s grants surpass 10x GiveDirectly in expectation in terms of marginal cost-effectiveness, we don’t think we can justifiably conclude from this that we are confident these grants don’t pass this bar. As mentioned in the report, this is partly because:

FP may have been sufficiently conservative in other inputs to compensate for the problems we identified
There are additional direct benefits (for example, morbidity benefits) to the grants that FP acknowledged but decided not to model
There may be additional large positive externalities from funding these early-stage and more neglected opportunities

Rosie’s comment also covers some other considerations that bear on this and provides useful context that is relevant here.

huw @ 2024-12-03T02:12 (+13)

I’ve been going through the evaluation reports and it seems like GWWC might not be as confident in Longview’s Emerging Challenges Fund or the EA Long-Term Future Fund as they are in their choices for GHD and Animal Welfare. The reports for these funds often include some uncertainties, like:

There were several limitations to the evaluation, including various conflicts of interest, limited access to relevant information (in part because the LTFF was not set up to be evaluated in this way), our lack of direct expertise and only limited access to expert external input, and the general difficulty of evaluating funding opportunities in the global catastrophic risk cause area.

We don’t know of any clearly better alternative donation option in reducing GCRs

On the other hand, the Founders Pledge GHD fund wasn’t fully recommended due to more specific methodological issues:

Our belief that FP GHDF evaluations do not robustly demonstrate that the opportunities they fund exceed their stated bar of 10x GiveDirectly in expectation, due to the presence of errors and insufficiently justified subjective inputs we found in some of their BOTECs that could cause the estimated cost-effectiveness to fall below this threshold.

Until I read various posts around the forum and personally looked into what LTFF in particular was funding, I was under the impression—partly from GWWC’s messaging—that the LTFF was at least comparable to a GiveWell or even an ACE. This is partly because GWWC usually recommend their GCR funds at the same time as these other funds.

It might be on me for having the wrong assumptions, so I wrote out my chain of thinking, and I’m keen to find out where we disagree:

There might be different evaluation standards between cause areas
These varying standards could mean a higher risk of wastage or even fraud in more risky funding opportunities; see this analysis
GWWC’s messaging and presentation might not fully convey these differences (the paragraph for these two funds briefly mentions lower tractability and an increased need for specialised grantmaking)
It might be challenging for potential donors to grasp these differences without clear communication

(I originally posted this to the 2024 recommendations but thought it might be more constructive / less likely to cause any issues over in this thread)

Michael Townsend🔸 @ 2024-12-03T11:59 (+15)

(I no longer work at GWWC, but wrote the reports on the LTFF/ECF, and was involved in the first round of evaluations more generally.)

In general, I think GWWC's goal here is "to support donors in having the highest expected impact given their worldview", which can come apart from supporting donors to give to the most well-researched/vetted funding opportunities. For instance, if you have a longtermist worldview, or perhaps take AI x-risk very seriously, then I'd guess you'd still want to give to the LTFF/ECF even if you thought the quality of their evaluations was lower than GiveWell's.

Some of this is discussed in "Why and how GWWC evaluates evaluators" in the limitations section:

Finally, the quality of our recommendations is highly dependent on the quality of the charity evaluation field in a cause area, and hence inconsistent across cause areas. For example, the state of charity evaluation in animal welfare is less advanced than that in global health and wellbeing, so our evaluations and the resulting recommendations in animal welfare are necessarily lower-confidence than those in global health and wellbeing.

And also in each of the individual reports, e.g. from the ACE MG report:

As such, our bar for relying on an evaluator depends on the existence and quality of other donation options we have evaluated in the same cause area.
In cause areas where we currently rely on one or more evaluators that have passed our bar in a previous evaluation, any new evaluations we do will attempt to compare the quality of the evaluator’s marginal grantmaking and/or charity recommendations to those of the evaluator(s) we already rely on in that cause area.
For worldviews and associated cause areas where we don’t have existing evaluators we rely on, we expect evaluators to meet the bar of plausibly recommending giving opportunities that are among the best options for their stated worldview, compared to any other opportunity easily accessible to donors.

huw @ 2024-12-03T12:25 (+2)

Mmm, so maybe the crux is at (3) or (4)? I think that GWWC may be assuming too much about how viewers are interpreting the messaging and presentation around the evaluations. I think there is probably a way to signal the differences in evaluation strength while still maintaining the BYO worldview approach?

Michael Townsend🔸 @ 2024-12-03T18:11 (+7)

Just speaking for myself, I'd guess those would be the cruxes, though I don't personally see easy fixes. I also worry that you could err on the side of being too cautious, by potentially adding warning labels that give people an overly negative impression compared to the underlying reality. I'm curious if there are examples where you think GWWC could strike a better balance.

I think this might be symptomatic of a broader challenge for effective giving for GCR, which is that most of the canonical arguments for focusing on cost-effectiveness involve GHW-specific examples that don't clearly generalize to the GCR space. But I don't think that indicates you shouldn't give to GCR, or care about cost-effectiveness in the GCR space; from a very plausible worldview (or at least, the worldview I have!) the GCR-focused funding opportunities are the most impactful funding opportunities available. It's just that the kind of reasoning underlying those recommendations/evaluations is quite different.

ElliotJDavies @ 2024-12-04T14:28 (+6)

canonical arguments for focusing on cost-effectiveness involve GHW-specific examples, that don't clearly generalize to the GCR space.

I am not sure I understand the claim being made here. Do you believe this to be the case, because of a tension between hits based and cost-effective giving?

If so, I may disagree with the point. Fundamentally, if you're a "hits-based" grant-maker, you still care about (1) the amount of impact as a result of a hit, (2) the odds of getting a hit, (3) indicators which may lead up to getting a hit, and (4) the marginal impact of your grant.

1&2) Require solid theory of change, and BOTEC EV calculations
3) Good M&E

Fundamentally, I wouldn't see much of a tension between hits based and cost-effective giving, other than a much higher tolerance for risk.
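The claim that hits-based giving still reduces to expected-value reasoning, with a higher tolerance for risk, can be sketched as follows. All probabilities and payoffs here are made-up illustrative numbers, not estimates for any real grant. The sketch models only points (1) and (2) above as a two-outcome payoff. It shows that a low-probability, high-payoff grant and a near-certain, modest-payoff grant can have the same expected value while differing sharply in variance.

```python
"""Hypothetical comparison of a 'hits-based' grant with a safer grant.

All probabilities and payoffs are illustrative, in arbitrary impact units.
Only (1) impact of a hit and (2) odds of a hit are modelled.
"""
import math
from dataclasses import dataclass


@dataclass
class Grant:
    name: str
    p_hit: float               # (2) odds of getting a hit
    impact_if_hit: float       # (1) impact if the hit happens
    impact_if_miss: float = 0.0

    def expected_value(self) -> float:
        return self.p_hit * self.impact_if_hit + (1 - self.p_hit) * self.impact_if_miss

    def std_dev(self) -> float:
        # Two-outcome payoff: sd = |hit - miss| * sqrt(p * (1 - p))
        spread = abs(self.impact_if_hit - self.impact_if_miss)
        return spread * math.sqrt(self.p_hit * (1 - self.p_hit))


if __name__ == "__main__":
    safe = Grant("safe", p_hit=0.9, impact_if_hit=10.0)
    hits = Grant("hits-based", p_hit=0.05, impact_if_hit=180.0)
    for g in (safe, hits):
        print(f"{g.name}: EV={g.expected_value():.1f}, sd={g.std_dev():.1f}")
```

Both hypothetical grants have an expected value of 9 units, but the hits-based one has a standard deviation more than ten times larger. This is the risk-tolerance difference, not a difference in how cost-effectiveness is reasoned about.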

huw @ 2024-12-05T01:49 (+2)

I suppose to tack onto Elliot’s answer, I’m curious about what you see the differences in reasoning to be. If it is merely that GCR giving opportunities are more hits-based / high variance, I could see, for example, a small label being applied on the GWWC website next to higher-risk opportunities with a link to something like the explanations you’ve written above (and the evaluation reports).

That kind of labelling feels like only a quantitative difference from the current binary evaluations (as in, currently GWWC signals inclusion/exclusion, but could extend that to signal for strength of evaluation or risk of opportunity).

ElliotJDavies @ 2024-12-04T14:17 (+4)

Good job on highlighting this. While I very much understand GWWC's angle of approach here, I can see that there's essentially a dynamic that could be playing out whereby some areas (Animal Welfare and Global Development) get increasingly rigorous, while other areas (Longtermist problem-areas and Meta-EA) don't receive the same benefit.

Aidan Whitfield🔸 @ 2024-12-05T08:03 (+3)

Thanks for the comment! While we think it could be correct that the quality of evaluations differs between our recommendations in different cause areas, my view is that the evaluating evaluators project applies pressure to increase the strength of evaluations across all cause areas. In our evaluations we communicate areas where we think evaluators can improve. Because we are evaluating multiple options in each cause area, if in future evaluations we find one of our evaluators has improved and another has not, then the latter evaluator is less likely to be recommended in future, which provides an incentive for both evaluators to improve their processes over time.

Aidan Whitfield🔸 @ 2024-12-04T23:51 (+3)

Thanks for your comment, Huw! I think Michael has done a great job explaining GWWC’s position on this, but please let us know if we can offer any clarifications.

KarolinaSarek🔸 @ 2024-12-03T13:12 (+12)

Thank you for these thorough reports and the project as a whole! As Chair of the EA Animal Welfare Fund, I'm very grateful for GWWC's continued work evaluating evaluators and grantmakers in the animal welfare space and personally grateful for their work across all cause areas. I think this sort of meta-evaluation is incredibly valuable. I'm particularly excited to see ACE's Movement Grants join the recommended list this year; their improvements are great developments for the field. Last year's evaluation of AWF was also very helpful for our team, and we're looking forward to the re-evaluation next year. It's encouraging to see the evaluation and funding ecosystem becoming increasingly robust. Thank you for your work!

Aidan Whitfield🔸 @ 2024-12-04T23:39 (+3)

Thanks so much for your comment, Karolina! We are looking forward to re-evaluating AWF next year.

MHR @ 2024-11-27T15:10 (+12)

Thank you Sjir and Aidan for this excellent work! I think it's quite valuable for community epistemics to have someone doing this kind of high-quality meta-evaluation. And it seems like your dialogue with ACE has been very productive.

Selfishly, as someone who makes a number of donations in the animal welfare space, I'm also excited by the possibility of some more top animal charities becoming supported programs on the donation platform :)

Aidan Whitfield🔸 @ 2024-12-02T04:19 (+5)

Thanks so much for your comment, we appreciate the positive feedback! We plan to open applications for our supported programs status in Q1 2025 and intend to invite ACE-recommended charities to apply. We will make decisions about which charities we onboard based on our inclusion criteria, which we’ll update early next year but don't expect to change dramatically.

Caroline Mills @ 2024-12-03T23:52 (+10)

Hi there, writing on behalf of The Humane League!

THL is grateful to GWWC for their support of our corporate campaigns to reduce farmed animal suffering, previously as a recommended program and moving forward as a supported program. GWWC provides an important service to the philanthropic community, and the funds they have directed to THL have driven highly impactful victories to reduce animal suffering, including a recent groundbreaking victory that will spare 700,000 hens from immense suffering annually once implemented and kickstart the cage-free supply in Asia, a neglected yet highly important region for corporate campaigns.

While we are disappointed that our corporate campaigns to reduce farmed animal suffering will no longer be listed as a recommended program, we understand GWWC's decision to limit the scope of their recommendations in line with their evaluating evaluators project. We hope that donors will continue to support our corporate campaigns and the Open Wing Alliance as highly impactful giving opportunities to reduce the suffering of millions of sentient beings and build an effective global animal advocacy movement, given the strong marginal impact of our supported regranting programs in neglected regions. We are grateful that GWWC will continue accepting gifts on THL's behalf; in addition, gifts can be made directly to THL, or through our other international charity partners for donors outside the US seeking tax deductions.

If you have any questions about THL’s programs or high-impact funding opportunities, please reach out to Caroline Mills at cmills@thehumaneleague.org.

With gratitude,

The THL team

Aidan Whitfield🔸 @ 2024-12-04T23:32 (+1)

Thanks for your comment, Caroline! We are excited to continue hosting THL as a supported program on the GWWC platform, so donors can continue supporting your important work.

Vasco Grilo🔸 @ 2024-11-28T17:34 (+6)

I very much agree with your decision not to recommend ACE's RCF, and I think your evaluation was great.

Another example where we think ACE could have engaged more robustly with their CEAs is in cases where ACE did not reach estimates for Suffering-Adjusted Days (SADs) averted per dollar.

Did you have the chance to look into the pain intensities Ambitious Impact uses to calculate SADs? They are not public, but you can ask Vicky Cox for the sheet. I think they hugely underweight excruciating pain, such that interventions addressing this (like Shrimp Welfare Project’s Humane Slaughter Initiative) have their cost-effectiveness underestimated a lot. Feel free to ask Vicky for my comments on the pain intensities.

Nitpick: should ACE's additional funds discount for low uncertainty be higher than 0 % (and lower than 20 %), or is there a typo in the table below?

ACE considers charities’ priorities for additional funds — in other words, how the charity would use unexpected additional funding — and assesses ‘uncertainty about effectiveness and tractability of plans’ — how likely they think each line item in the charity’s plan is to experience diminishing returns or not play out as expected under the charity’s growth plan. ACE assesses each line item in a charity’s proposed future plans on a scale from very low uncertainty to very high uncertainty, with a corresponding discount applied to the cost of the line item.
Category	Discount
Very low uncertainty	0%
Low uncertainty	0%
Moderate/low uncertainty	20%
Moderate uncertainty	40%
Moderate/high uncertainty	60%
High uncertainty	80%
Very high uncertainty	100%
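To make the table concrete, here is a minimal Python sketch of how an uncertainty discount might be applied to the line items in a charity's plan for additional funds. The percentages come from the table above. The assumptions that each discount scales a line item's cost linearly (cost × (1 − discount)) and that discounted costs are then summed across the plan, as well as the names `DISCOUNT_PCT`, `discounted_cost` and `discounted_plan_total`, are illustrative and not taken from ACE's documentation.

```python
# Discount (in whole percent) per uncertainty category, copied from the table above.
DISCOUNT_PCT = {
    "Very low uncertainty": 0,
    "Low uncertainty": 0,
    "Moderate/low uncertainty": 20,
    "Moderate uncertainty": 40,
    "Moderate/high uncertainty": 60,
    "High uncertainty": 80,
    "Very high uncertainty": 100,
}


def discounted_cost(cost: float, category: str) -> float:
    """Return a line item's cost after its uncertainty discount.

    Assumption (not ACE's stated formula): the discount scales the cost
    linearly, i.e. cost * (1 - discount). Integer percentages keep the
    arithmetic exact for round-number costs.
    """
    return cost * (100 - DISCOUNT_PCT[category]) / 100


def discounted_plan_total(line_items: list[tuple[float, str]]) -> float:
    """Sum discounted costs over (cost, category) pairs in a hypothetical plan."""
    return sum(discounted_cost(cost, category) for cost, category in line_items)
```

For example, a hypothetical plan with a $100K low-uncertainty item and a $50K moderate-uncertainty item would count as $130K under these assumptions. Because very low and low uncertainty both map to 0%, the sketch treats them identically, which is exactly the feature the nitpick questions.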

Aidan Whitfield🔸 @ 2024-12-02T04:36 (+4)

Thanks for the comment! We didn’t look deeply into the SADs framework as part of our evaluation, as we didn’t think this would have been likely to change our final decision. It is possible we will look into this more in future evaluations. I currently expect use of this framework to be substantially preferable to a status quo where there is not a set of conceptually meaningful units for comparing animal welfare interventions.

On ACE’s additional funds discounts, our understanding is that the 0% discount for low uncertainty is not a typo. ACE lists their uncertainty categories and the corresponding discounts under Criterion 2 on their evaluation criteria page.

mjkerrison🔸️ @ 2024-11-29T05:12 (+5)

Love to see these reports!

I have two suggestions/requests for 'crosstabs' on this info (which is naturally organised by evaluator, because that's what the project is!):

As of today, which evaluators/charities sit where on the recommendation scale. The info for that is mostly on GWWC's website but not quite organised as such. I'm thinking of rows for cause areas and columns for buckets, e.g. 'Recommended' at one end and 'Maybe not cost-effective' at the other (though maybe you'd drop things off altogether). Just something to help visualise what's moved and by how much, and broadly why things sit where they do (e.g. THL corporate campaigns sliding off the recommended list for 'procedural' reasons, so not in the Recommended column but now in a 'Nearly' column or something).
I'd love a clear checklist of what you think needs improvement per evaluated program, to help make the list a little more evergreen. I think all that info is in your reporting, but if you called it out, I think it would (1) help evaluated programs and (2) help donors get a sense of how up-to-date each recommendation is (given the rotating/rolling nature of the evaluation program) and possibly do their own assessment of whether the charity 'should' be recommended 'now'.

Aidan Whitfield🔸 @ 2024-12-02T05:17 (+3)

Thanks for the comment — we appreciate the suggestions!

With respect to your first suggestion, I want to clarify that our goal with this project is to identify evaluators that recommend among the most cost-effective opportunities in each cause area according to a sufficiently plausible worldview. This means among our recommendations we don’t have a view about which is more cost-effective, and we don’t try to rank the evaluators that we don’t choose to rely on. That said, I can think of two resources that might somewhat address your suggestion:

This section on our 2024 evaluating evaluators page explains which programs have changed status following our 2024 evaluations and why
In the other supported programs section of our donation platform, we roughly order the programs based on our preliminary impression of which might be most interesting to impact-focused donors in each cause area. To do this we take into account factors like if we’ve previously recommended them and if they are currently recommended by an impact-focused evaluator.

With respect to your second suggestion, while we don’t include a checklist as such, we try to include the major areas for improvement in the conclusion section of each report. In future we might consider organising these more clearly and making them more prominent.

Vasco Grilo🔸 @ 2024-11-28T11:26 (+5)

From your evaluation of ACE MG:

Using quite an expansive definition of institutional diet change advocacy, we identified that two of the 30 EA AWF grants (constituting 0.27% of funding) went to these kinds of activities, compared to six out of 28 MG grants (constituting 11.12% of funding).
[...]
Are institutional diet change advocacy campaigns competitive with other opportunities?
To understand whether this difference constituted a concern, we asked experts for their views on the promisingness of institutional diet change advocacy.
[...]
On existing evidence, some institutional diet change advocacy campaigns can meet a cost-effectiveness bar, but this is not typical.
It’s probable that most of the impact of institutional diet change advocacy runs through direct impact rather than movement building.

Which diet change interventions are more cost-effective than cage-free corporate campaigns? Meaningfully reducing meat consumption is an unsolved problem, and I suspect many diet change interventions are harmful because they lead to beef and pork being replaced with poultry meat, eggs (which can be part of a vegetarian diet), fish and other seafood.

In any case, I do not think this is that important for your recommendation. You found only 11.1 % of the money granted by ACE MG went to diet change interventions, and it does not look like marginal grants were super related to diet change.

In support of being more funding constrained, ACE also provided us with a summary of how they would likely have disbursed an additional $200K–300K USD in the 2024 round. This included a mixture of fully funding more of the opportunities that they only partially funded, and funding some opportunities that missed out altogether in the round — in particular, two opportunities related to insect farming. We attempted to naively model the scope of some of the opportunities that MG indicated they would have funded and concluded that we expect these opportunities to have relatively high marginal cost-effectiveness — competitive with other opportunities ACE MG funded.

Aidan Whitfield🔸 @ 2024-12-02T06:17 (+4)

Thanks for the comment! I first want to highlight that in our report we are specifically talking about institutional diet change interventions that reduce animal product consumption by replacing institutional (e.g., school) meals containing animal products with meals that don’t. This approach, which constitutes the majority of diet change programs that ACE MG funds, doesn’t necessarily involve convincing individuals to make conscious changes to their consumption habits.

Our understanding of a common view among the experts we consulted is that diet change interventions are generally not competitive with promising welfare asks in terms of cost-effectiveness, but that some of the most promising institutional diet change interventions plausibly could be. For example, I think some of our experts would have considered the grant ACE MG made to the Plant-Based Universities campaign worth funding. Reasons for this include:

The organisers have a good track record
The ask is for a full transition to plant-based catering, which reduces small animal replacement concerns
The model involves training students to campaign, meaning the campaign can reach more universities than the organisation could by going school-to-school themselves

As noted in the report, not all experts agreed that the institutional diet change interventions were on average competitive with the welfare interventions ACE MG funded. However, as you noted, this probably has a fairly limited impact on how cost-effective ACE MG is on the margin, not least because these grants made up a small fraction of ACE MG’s 2024 funding.

Vasco Grilo🔸 @ 2024-12-02T10:45 (+2)

Thanks, Aidan. For reference, I estimated corporate campaigns for chicken welfare are 25.1 times as cost-effective as School Plates, which is a program aiming to increase the consumption of plant-based foods at schools and universities in the United Kingdom.

Vasco Grilo🔸 @ 2024-11-27T21:50 (+5)

However, because we expect the approach of grantmakers/evaluators to change more slowly than the work/programs of an individual charity, we are not comfortable relying on the information we used last year without undertaking a reinvestigation,^[1] which we don’t expect to do (see below)

I think the rate of change is overwhelmingly a function of size, not of whether it concerns grantmakers/evaluators or work/programs. I would expect THL's work/programs to change more slowly than the Animal Welfare Fund. In any case, I agree with your decision not to recommend THL. As you say, it is more consistent with the scope of your project:

We generally base our charity and fund recommendations on our evaluating evaluators project, for reasons explained here. This year, the evaluators and grantmakers we’ll rely on – based on our evaluations – will be EA Funds’ Animal Welfare Fund and ACE’s Movement Grants, and neither of these currently recommends THL as a charity (as neither makes charity recommendations).

Aidan Whitfield🔸 @ 2024-12-02T05:20 (+4)

Hi Vasco, thanks for the comment! I should clarify that we are saying that we expect marginal cost-effectiveness of impact-focused evaluators to change more slowly than marginal cost-effectiveness for charities. All else equal, we think size is plausibly a useful heuristic. However, because we are looking at the margin, both the program itself and its funding situation can change, and as THL hasn’t been evaluated for how it allocates funding on the margin or starts new programs, but just on the quality of its marginal programs at the time of evaluation, there is a less robust signal there than there is for EA AWF, which we did evaluate on the basis of how it allocates funding on the margin. I hope that makes sense!

Vasco Grilo🔸 @ 2025-07-31T17:09 (+4)

From your evaluation of ACE's Movement Grants:

Removal of funding caps
ACE announced that they removed the $50K USD funding cap on Movement Grants applications and the $150K USD lifetime cap on Movement Grants applicants as of 2024. We were concerned in our 2023 evaluation that these caps constrained ACE MG’s ability to size grants in a way that was optimal for marginal cost-effectiveness. As such, we think that this reflects an improvement. Some early benefits of removing these caps have already been realised, with ACE MG advising two promising-seeming grants this year for amounts over $50K USD.
Seven grantees received a grant of $50K USD in 2024, which we were initially concerned might reflect that a $50K USD cap is still implicitly used by ACE MG. However, among the six of these that we have information on, five applied for $50K and received full funding; only the sixth received partial funding. For this sixth grantee, we do not think that the award of $50K is good evidence of an implicit $50K cap, as there is information in MG’s notes that they were also considering a smaller amount for this applicant and increased this to $50K — implying the grant was not intended to be larger and then simply rounded down to an arbitrary bar. We are also aware of other instances where MG was considering grants larger than $50K USD, but decided against making them for reasons not related to their previous cap.
Overall, we think that the removal of the arbitrary cap on grant size is a positive update that has enabled ACE MG to grant to highly promising opportunities that would otherwise not be available to the program.

Among the 24 Movement Grants ACE announced on 30 July 2025, the amount granted fell between 49.7 k$ and 51 k$ for 1/3 (= 8/24) of them. Is there an informal cap, @eleanor mcaree?
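The question above amounts to checking what share of announced grants cluster just around $50K. A minimal Python sketch of that check follows; the function name `share_in_band` and the inclusive band boundaries are assumptions for illustration. A high share is only suggestive of an informal cap, since on its own it cannot distinguish a cap from applicants simply asking for about $50K and being fully funded.

```python
from fractions import Fraction


def share_in_band(amounts: list[float], low: float, high: float) -> Fraction:
    """Return the fraction of grant amounts within [low, high], bounds inclusive.

    Hypothetical helper: a large share just around a round number is a
    signal worth asking about, not proof of an implicit cap.
    """
    if not amounts:
        raise ValueError("need at least one grant amount")
    in_band = sum(1 for amount in amounts if low <= amount <= high)
    return Fraction(in_band, len(amounts))
```

With the figures quoted above, 8 of 24 grants in the band gives `Fraction(8, 24)`, which reduces to `Fraction(1, 3)`.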

From the same evaluation:

In the future, we would hope for ACE MG to have a clearer framework for prioritising between interventions. However, given that our review of more than 10 individual grants across our various investigations indicates that ACE MG’s grants are generally promising, we do not think this concern is sufficient to constitute a meaningful difference in the expected cost-effectiveness of ACE MG and EA AWF. This is particularly true as EA AWF was not doing clearly better on this metric in our 2023 evaluation than ACE MG is now:
[W]e saw some references to the numbers of animals that could be affected if an intervention went well, but we didn’t see any attempt at back-of-the-envelope calculations to get a rough sense of the cost-effectiveness of a grant, nor any direct comparison across grants to calibrate scoring. — GWWC’s 2023 evaluation of AWF

I wish there were a greater focus on cost-effectiveness analyses.

eleanor mcaree @ 2025-08-14T14:09 (+3)

Hey Vasco,

Sorry I was out of the country without internet so only just getting back to this!

Of those eight grants, six received 100% of what they applied for. One organisation was partially funded because we collaborated with the Strategic Animal Funding Circle, who were excited to partially fund this organisation; otherwise, we likely would have fully funded their grant. The final organisation is one we have been funding at the same level for a few years, and we weren't as excited about what they would do with additional funding compared to the other projects we were considering.

In general, our experience is that organisations that are seeking larger grants tend to apply to be evaluated by ACE, as the Recommended Charity Fund raises more money, so usually disburses larger grants and offers more stable funding. As we raise more funds for Movement Grants we might see orgs applying for larger amounts, as well as proactively encouraging applicants to think about what they would do with more funding.

We have started making improvements in our cost-effectiveness analyses, although we are constrained by a number of factors, namely limited team capacity, the volume of applications, and the exploratory nature of applications, so I want to be honest in saying there is still room for improvement here.

Thanks for the questions, and do keep posting or email me if you have any other thoughts on how we can improve our grant making!

Vasco Grilo🔸 @ 2025-08-16T18:57 (+2)

Thanks for clarifying, Eleanor!
