Mo Putera's Quick takes

By Mo Putera @ 2023-09-26T09:56 (+4)


Mo Putera @ 2025-06-19T13:19 (+66)

My favorite midsized grantmaker is Scott Alexander's ACX Grants, mainly because I've enjoyed his blog for over a decade and it's been really nice to see the community that sprang up around his writing grow and flourish, especially the EA stuff. His recent ACX Grants 1-3 Year Updates is a great read in this vein. Some quotes:

The first cohort of ACX Grants was announced in late 2021, the second in early 2024. In 2022, I posted one-year updates for the first cohort. Now, as I start thinking about a third round, I’ve collected one-year updates on the second and three-year updates on the first. ...
The total cost of ACX Grants, both rounds, was about $3 million. Do these outcomes represent a successful use of that amount of money? ...
It’s harder to produce Inside View estimates, because so many of the projects either produce vague deliverables (eg a white paper that might guide future action) or intermediate results only (eg getting a government to pass AI safety regulations is good, but can’t be considered an end result unless those regulations prevent the AI apocalypse). Because we tend towards incubating charities and funding research (rather than last-mile causes like buying bednets), achieved measurable deliverables are thin on the ground. But here are things that ACX grantees have already accomplished:
Improved the living/slaughter conditions of 30 million fish.
Helped create Manifold Markets, a prediction market site with thousands of satisfied users, whose various spinoffs play a central role in the rationalist/EA community.
Helped create thousands of jobs in Rwanda and other developing countries
Passed an instant runoff vote proposition in Seattle.
Saved between a few dozen and a few hundred lives in Nigeria through better obstetric care.
And here are some intermediate deliverables from grantees:
Made Australian government take AI x-risk more seriously (estimated from 50th percentile to 60th percentile outcome)
Gotten the End Kidney Deaths Act (could save >1000 lives and billions of dollars per year) in front of Congress, with decent odds of passing by 2026.
Plausibly saved 2 billion chickens from painful death over next decade².
Antiparasitic medication oxfendazole continues to advance through the clinical trial process.
And here are some things that have not been delivered yet but that I remain especially optimistic about:
Creation of anti-mosquito drones that provide a second level of defense along with bednets.
Revolutionize diagnosis of traumatic brain injury
Improve dietary guidelines in developing countries
Continue to support research and adoption of far UV light for pandemic prevention
Reduce lead poisoning in Nigeria
I think these underestimate success since many projects have yet to pay off (or to convince me to be especially optimistic), and others have paid off in vague hard-to-measure ways.

This is a beautifully crosswise oriented slice of the entire collective endeavor of effective altruism, and quite a lot of good done (or poised to be done) helped by a not-that-large sum of $3M over 2 cohorts given that GW and OP move 2 OOMs more $ per year.

It's also been quite intellectually enriching to just see the sheer diversity of proposals to make the world better in these cohorts; e.g. I was a bit let down to learn that the Far Out Initiative didn't pan out ($50k to fund a team to work on pharmacologic and genetic interventions to imitate the condition of Jo Cameron, a 77-year old Scottish woman who is both incapable of experiencing any physical or psychological suffering and has lived an astonishingly well-adjusted life despite that, by creating painkillers to splice into farm animals to promote cruelty-free meat and "end all suffering in the world forever").

Of Scott's lessons learned, this one stood out to me in light of the recent elitism in EA survey I just took, I think because I was leaning towards the same hope he had:

One disappointing result was that grants to legibly-credentialled people operating in high-status ways usually did better than betting on small scrappy startups (whether companies or nonprofits). For example, Innovate Animal Ag was in many ways overdetermined as a grantee - former Yale grad and Google engineer founder, profiled in NYT, already funded by Open Philanthropy - and they in fact did amazing work. On the other hand, there were a lot of promising ACX community members with interesting ideas who were going to turn them into startups any day now, but who ended up kind of floundering (although this also describes Manifold, one of our standout successes). One thing I still don't understand is that Innovate Animal Ag seemed to genuinely need more funding despite being legibly great and high status - does this screen off a theoretical objection that they don't provide ACX Grants with as much counterfactual impact? Am I really just mad that it would be boring to give too many grants to obviously-good things that even a moron could spot as promising?

The other takeaway of his that gave me mixed feelings was this one, I think because I'd been secretly hoping for some form of work-life balance compatibility with really effective (emphasis) direct-work altruism:

Someone (I think it might be Paul Graham) once said that they were always surprised how quickly destined-to-be-successful startup founders responded to emails - sometimes within a single-digit number of minutes regardless of time of day. I used to think of this as mysterious - some sort of psychological trait? Working with these grants has made me think of it as just a straightforward fact of life: some people operate an order of magnitude faster than others. The Manifold team created something like five different novel institutions in the amount of time it's taken some other grantees to figure out a business plan; I particularly remember one time when I needed something, sent out a request to talk about it with two or three different teams, and the Manifold team had fully created the thing and were pestering me to launch a trial version before some of the other people had even gotten back to me. I take no pleasure in reporting this - I sometimes take a week or two to answer emails, and all of the predictions about my personality that this implies would be correct - but it's increasingly something that I look for and respect. A lot of the most successful grants succeeded quickly, or at least were quick to get on a promising track. Since everything takes ten times longer than people expect, only someone who moves ten times faster than people expect can get things done in a reasonable amount of time.

Edited to add: I appreciated this comment by Alex Toussaint, an ACX grantee:

Tornyol (anti-mosquito drones) is based in France and we couldn't have got the support from ACX Grants from a local VC. ...
VCs, like potential employees or clients, have reading grids (i.e. rubrics, a transliteration of « une grille de lecture ») to evaluate pitches. The great thing I found about ACX Grants is that the grid is different, and encourages different kinds of projects. Founder obsession for a problem seems to be encouraged in ACX Grants, although it's clearly discouraged for very early VC funding. VCs like very well made slides, communication abilities, and beautiful people in general, while I've found no such bias for ACX Grants. Being based outside the US is a big minus for American VCs, but ACX Grants almost seems to be favoring it. VCs tend to think a lot by analogy (the Uber for X, the Cursor for Y ...) while I found ACX Grants to be much more thinking from first principles than the median VC I met.
I'm not criticizing the VC reading grid. It obviously comes from experience and it tends to work financially for them. But you have to remember that a large part of the decision comes down to the potential for a quite early (3-4 years) and billion-dollar exit option. Not all projects fit that and it's a good thing to support the other. The other advantage of it is that it selects founders that can go through the hoops of making their project fit the grid. That proves VCs the founders are capable of adapting their message to their interlocutors, which is highly necessary when raising further money, recruiting or discussing with any partner. That's something ACX Grants does not seem to value much.
All in all, ACX Grants is great in that it provides funding with a very unique reading grid, so it helps projects that could get no help anywhere else.

NickLaing @ 2025-06-20T08:38 (+4)

Fantastic summary love it!

Made some comments on the small org vs. Big org thing, was going to reply here but it became a mini essay so put it on my quicktakes lol.

Mo Putera @ 2025-06-20T10:58 (+12)

Thanks Nick, and great take as usual (for others' convenience, here it is)

I myself work at a CE-incubated charity, so I'm of course inclined to agree with you on the reasons you listed as to how CE's approach mitigates the disadvantages smaller orgs and individuals have vs larger ones.

(As a tangent this is also why I have incredible respect for what you've managed to build at OneDay Health, AFAICT you don't have any of those advantages we benefit from! Seriously: since 2017, 53(!) nurse-led health centers launched leading to 340k patients treated, >$600k saved by patients, 165k malaria cases treated, 125k under-5s treated is phenomenal. I wish you gave a talk at EAG on how you and the team did this, lots of lessons for aspiring "moral entrepreneurs" I'm sure. Sorry btw if this makes you feel awkward I've always wanted to express this)

That said I do think Scott is pointing to a slightly different thing than big vs small orgs, which is traditionally impressive credentials and ways of working vs non-traditional credentials or the lack thereof. I took Scott's hope (which I shared) to be that there are a lot more people than we think who are "diamonds in the rough" (they may not have gone to Oxbridge / Ivy etc or have training & experience in medicine / law / consulting / tech / whatever prestigious career, and their ideas for making the world better may not be the usual ideas everyone agrees are "best" but oddball ones that make you go "... huh?", and most talent-spotters filter for these kinds of markers and would exclude them), but Scott (who doesn't have a traditionally-impressive background himself) would see their potential and give them a shot, and the follow-up would hopefully prove him right. He's disappointed that this doesn't seem to be true, which suggests that those traditionally impressive credentials really do give a lot of hard-to-fake signal of your projects panning out. I mean I don't think this is all that surprising, but also this is grist for the mill of the discussion around EA feeling elitist and exclusive to people with more "relatable" or less privileged backgrounds who nevertheless really want to contribute meaningfully to the whole "doing good better" project.

NickLaing @ 2025-06-20T11:11 (+5)

Yeah I think he might be combining/ conflating both the elitism and the bigger org issues actually. Based on "grants to legibly-credentialled people operating in high-status ways usually did better than betting on small scrappy startups" and "there were a lot of promising ACX community members with interesting ideas who were going to turn them into startups any day now, but who ended up kind of floundering".

It makes me sad too, but I do agree on the traditionally impressive credentials front. There are definitely diamonds in the rough but they ain't so easy to find!

Tristan Williams @ 2025-06-21T11:40 (+3)

Love the quick thoughts with quotes, wouldn't have read it otherwise and now glad to have sat through some of the insights :)

Mo Putera @ 2025-11-03T06:21 (+64)

I just learned via Martin Sustrik about the late Sofia Corradi,

the spiritual mother of Erasmus, the European student exchange programme, or, in the words of Umberto Eco, “that thing where a Catalan boy goes to study in Belgium, meets a Flemish girl, falls in love with her, marries her, and starts a European family.”

Sustrik points out that none of the glowing obituaries for her mention the sheer scale of Erasmus. The Fulbright in the US is the 2nd largest comparable program, but it's a very distant second:

So far, approximately sixteen million people have taken part in the exchanges. That amounts to roughly 3% of the European population. And with the ever growing participation rates the ratio is going to get gradually even higher.
In short, this thing is HUGE.

Sustrik argues that the Erasmus programme is gargantuan-scale social engineering done right:

Substantial portion of students actually does want to spend some time abroad. It’s no different from the Western European marriage pattern, where young people left their parental homes to work as servants, farmhands, or apprentices before they married and set up their own households.
The much-maligned idea of social engineering, in this case, doesn’t mean forcing people to do something they don’t want to do. It means removing the obstacles that prevent them from doing what they already want.
Before Erasmus, studying abroad was seen as having fun rather than as serious academic work, something to be punished rather than rewarded. Universities were reluctant to recognize studies completed elsewhere. Erasmus, with its credit transfer system, changed that and thus unleashed a massive wave of student exchanges.

The backstory to how Sofia came to focus on Erasmus is touching:

In 1957, in her fourth year of studies, she received the opportunity to study in the United States thanks to a Fulbright scholarship. She spent a year at Columbia University where she attended a master’s course in comparative university legislation.^[3]^[4] Upon her return to Rome in 1958, however, her degree was not recognised by the Italian educational system.^[4]^[5] She recalled how she felt humiliated in front of other students as her time in the US was dismissed as a "vacation", and how a functionary had told her "Columbia, you say? I've never heard of that before".^[5]^[6] She had to spend an extra year to obtain her Italian degree.^[5] The experience led her to the idea of creating a system of recognition of courses taken abroad and the promotion of university exchanges.^[5]^[7]
Such ideas had already been put forward in Italy, but without any concrete results.^[7] After graduating Corradi pursued research on the right to education at the United Nations and became a scientific consultant for the Association of Rectors of Italian Universities at the age of 30.^[5]^[6] It was a post she gained in part to her diploma from Columbia, and she used her position to lobby intensively for her idea of a university exchange programme and mutual recognition. ... (more on Wikipedia)

I've previously wondered what a shortlist of people who've beneficially impacted the world at the scale of ~100 milliBorlaugs might look like, and suggested Melinda & Bill Gates and Tom Frieden. (A "Borlaug" is a unit of impact I made up, it means a billion lives saved.) If you buy Corradi's argument that the Erasmus programme is at heart really a peace programme and that it deserves some credit for the long period of relative peace we've experienced globally post-WWII, then Sofia Corradi seems eminently deserving of inclusion.

Gemini 3 Pro's attempt to visualise Sofia Corradi's beneficial impact in Shapley value terms:

Let's define our terms:
Y-Axis: "Level of European Youth-Driven Integration" (0-100%)
This is not economic integration (like the Euro) or political integration (like the Parliament).
It specifically measures the socio-cultural intermingling, mutual understanding, and reduction of nationalistic stereotypes among young Europeans. This is the "peace program" aspect.
It starts at a low baseline post-WWII, as even with the EEC, borders remained strong culturally.
X-Axis: Time (1950 - 2025)
Key Event 1: Treaty of Rome (1957) - Establishes the EEC. A step towards economic integration, but limited youth movement.
Key Event 2: Erasmus Program Launch (1987) - The crucial inflection point.
Key Event 3: Schengen Agreement (1995) - Eliminates internal border checks. Facilitates Erasmus, but Erasmus already laid the cultural groundwork.
Key Event 4: Euro Adoption (1999/2002) - Further economic integration, making cross-border life easier.
Now, let's plot two scenarios:
"Actual Timeline: With Corradi & Erasmus" (Solid Blue Line): Represents the observed trajectory of youth integration.
"Counterfactual Timeline: Without Corradi & Delayed Erasmus" (Dashed Red Line): This is where we attribute Corradi's Shapley value.
Delayed Launch: As argued, without her tireless 30-year lobbying, a pan-European student exchange might have emerged, but likely much later (e.g., 2002).
Fragmented Design: Even if it launched, it would likely have been a collection of bilateral agreements, lacking the standardized credit transfer (her key design contribution). This means a slower, less efficient ramp-up of integration.
The "Area Between the Curves" will visually represent Sofia Corradi's Shapley Value, showing the accelerated and enhanced integration due to her efforts.
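
As a rough illustration of the "area between the curves" idea in the quoted description, here is a minimal Python sketch. Only the two launch years (1987 and the quote's example of 2002) come from the text above. The baseline, ceiling, and ramp rates are hypothetical placeholders chosen to keep the arithmetic easy to follow, not estimates of real integration levels, and the function names are made up for illustration.

```python
# Toy model: integration stays at a baseline until the programme launches,
# then rises linearly until it hits a ceiling.
# All levels and rates below are HYPOTHETICAL placeholders, not data.

def ramp(year, launch, rate, baseline=10.0, ceiling=80.0):
    """Integration level (0-100) in `year` for a programme launched in `launch`."""
    if year <= launch:
        return baseline
    return min(ceiling, baseline + rate * (year - launch))

def actual(year):
    # Erasmus launches in 1987 (from the text); 2 points/year is an assumption.
    return ramp(year, launch=1987, rate=2.0)

def counterfactual(year):
    # Delayed launch in 2002 (the quote's example); the slower 1 point/year
    # stands in for the "fragmented design" and is an assumption.
    return ramp(year, launch=2002, rate=1.0)

def area_between(f, g, start, end):
    """Trapezoidal integral of f - g over [start, end] in one-year steps."""
    total = 0.0
    for year in range(start, end):
        left = f(year) - g(year)
        right = f(year + 1) - g(year + 1)
        total += 0.5 * (left + right)
    return total

gap = area_between(actual, counterfactual, 1950, 2025)
print(f"Area between curves: {gap:.1f} percentage-point-years")
```

With these placeholder inputs the gap is 1170.5 percentage-point-years. The figure itself carries no meaning; the sketch only shows how an earlier launch and a faster ramp-up both widen the area.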

Erich_Grunewald 🔸 @ 2025-11-07T00:52 (+4)

Thanks for sharing this. I did an Erasmus exchange year in Italy in 2010-11 that was very important for my personal growth, although it was not particularly beneficial professionally or academically.

Yarrow Bouchard🔸 @ 2025-11-04T16:39 (+3)

Quite interesting!

Mo Putera @ 2025-03-18T12:12 (+46)

Counting people is hard. Here are some readings I've come across recently on this, collected in one place for my own edification:

Oliver Kim's How Much Should We Trust Developing Country GDP? is full of sobering quotes. Here's one: "Hollowed out by years of state neglect, African statistical agencies are now often unable to conduct basic survey and sampling work... [e.g.] population figures [are] extrapolated from censuses that are decades-old". The GDP anecdotes are even more heartbreaking
Have we vastly underestimated the total number of people on Earth? Quote: "Josias Láng-Ritter and his colleagues at Aalto University, Finland, were working to understand the extent to which dam construction projects caused people to be resettled, but while estimating populations, they kept getting vastly different numbers to official statistics. To investigate, they used data on 307 dam projects in 35 countries, including China, Brazil, Australia and Poland, all completed between 1980 and 2010, taking the number of people reported as resettled in each case as the population in that area prior to displacement. They then cross-checked these numbers against five major population datasets that break down areas into a grid of squares and estimate the number of people living in each square to arrive at totals... According to their analysis, the most accurate estimates undercounted the real number of people by 53 per cent on average, while the worst was 84 per cent out."
David Nash's Nigeria's Missing 50 Million People argues that (quote) "Nigeria's official population (~220-230 million) may be significantly inflated and could be closer to 170-180 million (another article claims 120 million) likely driven by political and financial incentives for states". The comments are insightful too, e.g. David's comment that Uganda and Burkina Faso have the opposite problem ("in Burkina Faso the issue was that GDP per capita numbers were calculated from industrial output divided by population estimates so in order to look good, local government had an incentive to underestimate population so they seemed richer"), and Sjlver's comment comparing AMF's population data from distributing bednets to every household to UNFPA data; I've copied their table below:

jablevine @ 2025-03-21T17:47 (+5)

Good links. My favorite example is Papua New Guinea, which doubled their population estimate after a UN Population Fund review. Chapter 1 of Fernand Braudel's The Structures of Everyday Life is a good overview of the problem in historical perspective.

Mo Putera @ 2025-03-22T01:13 (+3)

Wow, that's nuts. Thanks for the pointer.

Mo Putera @ 2024-08-02T13:24 (+46)

Striking paper by Anant Sudarshan and Eyal Frank (via Dylan Matthews at Vox Future Perfect) on the importance of vultures as a keystone species.

To quote the paper and newsletter: the basic story is that vultures are extraordinarily efficient scavengers, eating nearly all of a carcass less than an hour after finding it, and farmers in India historically relied on them to quickly remove livestock carcasses, so they functioned as a natural sanitation system in helping to control diseases that could otherwise be spread through the carcasses they consume. In 1994, farmers began using diclofenac to treat their livestock, due to the expiry of a patent long held by Novartis leading to the entry of cheap generic brands made by Indian companies. Diclofenac is a common painkiller, harmless to humans, but vultures develop kidney failure and die within weeks of digesting carrion with even small residues of it. Unfortunately this only came to light via research published a decade later in 2004, by which time the number of Indian vultures in the wild had tragically plummeted from tens of millions to just a few thousand today, the fastest decline for a bird species in recorded history and the largest in magnitude since the extinction of the passenger pigeon.

When the vultures died out, far more dead animals lay around rotting, transmitting pathogens to other scavengers like dogs and rats and entering the water supply. Dogs and rats are less efficient than vultures at fully eliminating flesh from carcasses, leading to a higher incidence of human contact with infected remains, and they're also more likely to transmit diseases like anthrax and rabies to people. Sudarshan and Frank estimate that this led to ~100,000(!) additional deaths each year from 2000-05 due to a +4.2%(!) increase in all-cause mortality among the 430 million people living in districts that once had a lot of vultures, which is staggering; this is e.g. more than the death toll in 2001 from HIV/AIDS (92,000), malaria (53,000), and alcohol use disorders (14,000).

(Cause X, anyone? Preventing a hundred thousand deaths a year for less than half a billion dollars annually clears the GiveWell top charity-level threshold, and half a billion is in the ballpark of Open Philanthropy's entire annual grantmaking...)
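
The back-of-envelope arithmetic behind that parenthetical can be written out as a short Python sketch. The two inputs are the round figures from the text above (roughly 100,000 deaths per year and a budget of half a billion dollars). The function name is made up for illustration, and the result is an upper bound on cost per death averted only under the assumption that the whole budget would prevent all of those deaths.

```python
def cost_per_death_averted(annual_budget_usd, deaths_averted_per_year):
    """Simple ratio: dollars spent per death averted in a year."""
    return annual_budget_usd / deaths_averted_per_year

# Round figures taken from the post; treating the full budget as fully
# effective is an assumption made for this illustration only.
annual_budget_usd = 500_000_000
deaths_averted_per_year = 100_000

cost = cost_per_death_averted(annual_budget_usd, deaths_averted_per_year)
print(f"${cost:,.0f} per death averted")  # $5,000 per death averted
```

Any budget below half a billion dollars, with the same number of deaths averted, pushes the figure below $5,000.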

So what to do? For vultures in particular, Sudarshan and Frank say their results "inform current vulture recovery efforts in India, and conservation efforts elsewhere" e.g. parts of Africa and Spain, albeit without elaborating. More broadly, they hope their paper informs better policymaking by providing "a particularly stark example of the type of hard-to-reverse and unpredictable costs that must be accounted for when evaluating the introduction of new chemicals into fragile and diverse ecosystems", stating "it is plausible that a counterfactual policy regime in India that tested chemicals for their toxicity to at least keystone species might have avoided the collapse of vultures". They conclude:

In the absence of empirical estimates of the social benefits conferred by different species, conservation policy may be heavily influenced by existence values unrelated to utility. The vulture is not a particularly attractive bird and evokes rather different emotions at first sight than do more charismatic poster-animals of wildlife conservation such as tigers and panda bears. Nevertheless our results suggest that subjective existence values alone may not be the best way to formulate conservation policy.

The remark that vultures are not particularly attractive reminds me of the overlooked plight of farmed chickens, shrimp, insects etc for not being charismatic fauna. (I am admittedly sort of emotionally conflating the welfare of vultures with their ecosystem importance as a keystone species here.)

Mo Putera @ 2025-04-15T16:19 (+43)

I just learned about Zipline, the world's largest autonomous drone delivery system, from YouTube tech reviewer Marques Brownlee's recent video, so I was surprised to see Zipline pop up in a GiveWell grant writeup of all places. I admittedly had the intuition that if you're optimising for cost-effectiveness as hard as GW do, and that your prior is as skeptical as theirs is, then the "coolness factor" would've been stripped clean off whatever interventions pass the bar, and Brownlee's demo both blew my mind with its coolness (he placed an order on mobile for a power bank and it arrived by air in thirty seconds flat, yeesh) and also seemed the complete opposite of cost-effective (caveating that I know nothing about drone delivery economics). Quoting their "in a nutshell" section:

In December 2024, GiveWell recommended a $54,620 grant to Zipline for a six-month scoping project. Zipline will use this time to review ways that they could use drones to increase vaccination uptake, especially in hard-to-reach areas with low vaccination coverage and high rates of vaccine-preventable diseases. ...
We recommended this grant because:
Drones are an intuitive way to bring vaccines closer to communities with low coverage rates, especially when demand for vaccination exists, but conditions like difficult terrain, poor infrastructure, weak cold chain, or insecurity make it difficult for families to access immunizations.
This grant aligns with our strategy of making several scoping grants to high-potential organizations to source promising ideas for solving bottlenecks in the routine immunization system, and then testing these concepts.

Okay, but what about cost-effectiveness? Their "main reservations" section says

Evidence on cost-effectiveness of drones for vaccine delivery is limited, and we have not modeled the cost-effectiveness of the types of programs that Zipline plans to consider, nor the value of information for this scoping grant.
An internal review conducted in 2023 focused on drones for health was generally skeptical about there being many opportunities in this area that would meet GiveWell’s bar, although this scoping grant will focus on the most promising scenario (remote areas with high rates of vaccine-preventable diseases and low vaccination coverage rates).

Is there any evidence of cost-effectiveness at all then? According to Zipline, yes — e.g. quoting the abstract from their own 2025 modelling study:

Objectives: In mid-2020, the Ghana Health Service introduced Zipline’s aerial logistics (centralized storage and delivery by drones) in the Western North Region to enhance health supply chain resilience. This intervention led to improved vaccination coverage in high-utilization districts. This study assessed the cost-effectiveness of aerial logistics as an intervention to improve immunization coverage.
Methods: An attack rate model, adjusted for vaccination coverage and vaccine efficacy, was used to estimate disease incidence among vaccinated and unvaccinated populations, focusing on 17 022 infants. Incremental cost-effectiveness ratios of US dollar per averted disability-adjusted life-year (DALY) were evaluated from societal and government perspectives, using real-world operations data. ...
Results: In 2021, aerial logistics averted 688 disease cases. Incremental cost-effectiveness ratios were $41 and $58 per averted DALY from the societal and government perspectives, respectively. The intervention was cost-saving when at least 20% of vaccines delivered by aerial logistics replaced those that would have been delivered by ground transportation, with potential government savings of up to $250 per averted DALY. Probabilistic sensitivity analysis confirmed the robustness of these findings.

That's super cost-effective. For context, the standard willingness-to-pay to avert a DALY is 1x per capita GDP or $2,100 in Ghana, so 35-50x higher. Also:

... we calculated that aerial logistics facilitated the completion of an additional 14 979 full immunization courses... We estimated that 4 children’s lives (95% CI 2–7) were saved in these districts during 2021. ... the intervention averted a total of $20 324 in treatment costs and $2819 for caregivers between lost wages and transport.
At a cost of $0.66 per incremental FIC (fully immunized child), this approach outperforms other delivery methods analyzed in the review, including the most cost-effective category of interventions identified, namely “Delivery Approach” interventions, such as monthly immunization by mobile teams in villages and the enhancement of satellite clinic immunization practices.

(GW notes that they'd given Zipline's study a look and "were unable to quickly assess how key parameters like program costs and the impact of the program on vaccination uptake and disease were being estimated". Neither can I. Still pretty exciting)
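
The "35-50x" comparison above can be checked with a few lines of Python. The inputs are the figures already quoted: the $2,100 willingness-to-pay threshold for Ghana and the study's two ICERs. The variable names are made up for illustration, and the sketch takes the study's ICERs at face value, which GiveWell's caveat suggests is a real assumption.

```python
# Willingness-to-pay threshold per DALY averted (1x per capita GDP, from the post)
wtp_per_daly_usd = 2100

# Incremental cost-effectiveness ratios from the quoted abstract (USD per DALY averted)
icer_usd = {"societal": 41, "government": 58}

# How many times cheaper than the threshold each ICER is
ratios = {perspective: wtp_per_daly_usd / icer for perspective, icer in icer_usd.items()}

for perspective, ratio in ratios.items():
    print(f"{perspective}: {ratio:.1f}x below the threshold")
```

This gives roughly 51x from the societal perspective and 36x from the government perspective, in line with the rounded 35-50x range.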

NickLaing @ 2025-04-15T18:25 (+8)

Zipline have been around for about 10 years I think - boy do they have the cool factor. One big issue is that they can only carry a really tiny amount of stuff. Also the places where they can potentially save money have to be super hard to access, because a dirt cheap motorcycle which can go 50km for a dollar of fuel can carry 50x as much weight.

My lukewarm take is that they have done well, but as with most things haven't quite lived up to their initial hype.

Mo Putera @ 2025-04-17T04:24 (+6)

Interesting, I got the opposite impression from their about page ("4,000+ hospitals and health centers served, 51% fewer deaths from postpartum hemorrhaging in hospitals Zipline serves, 96% of providers report increased access to vaccinations in their area" which I assume means they're already targeting those hard-to-access areas), but of course they'd want to paint themselves in a good light and I'd be inclined to trust your in-the-field experience far more (plus general skepticism just being a sensible starting point).

Actually your point about a cheap bike being able to carry a lot more stuff makes obvious sense, and so makes me wonder how Zipline's modelling study in Ghana can claim that their cost per incremental fully immunised child was cheaper than "monthly immunization by mobile teams" which I assume includes dirt bikes.

NickLaing @ 2025-04-17T10:30 (+4)

Don't be inclined to trust my in-the-field experience, Zipline has plenty of that too!

I just had a read of their study but couldn't see how they calculated costing (the most important thing).

One thing to note is that vaccine supply chains currently often unnecessarily use trucks and cars rather than motorcycles because, well, GAVI has funded them, so they may well be fairly comparing to the status quo rather than to other, more efficient methods. For the life of me I don't know why so many NGOs use cars for so many things that public transport and motorcycles could do, sometimes orders of magnitude cheaper. Comparing to the status quo is a fair enough thing to do (probably what I would do) but might not be investigating the most cost-effective way of doing things.

Also I doubt they are including R and D and the real drone costs in the costs of that study, but I'll try and dig and get more detail.

It annoys me that most modeling studies focus so hard on their math method, rather than explaining how they estimate their cost input data - which is really what defines the model itself.

Mo Putera @ 2025-04-18T04:32 (+2)

The modelling study has a "costs" section (quoted below), but for what it's worth GiveWell said they "were unable to quickly assess how key parameters like program costs... were being estimated" so I don't think this quote will satisfy you:

Given the Ghana Health Service (GHS)'s dominant role, the government perspective in this analysis included healthcare treatment costs and incremental last mile delivery (LMD) costs. The societal perspective also accounted for externalities such as caregivers’ wage loss and transport costs.
To calculate the total cost for aerial LMD of vaccines, we analyzed Zipline’s monthly operational costs and the depreciation of capital expenditures for the GH4 distribution center in the Western North Region. These were adjusted to 2023 US dollar values, and the corresponding portion attributed to vaccine delivery was determined, resulting in a cost per dose of $0.27.
To estimate the incremental cost of the intervention, we took into account that the impact of aerial logistics on vaccination rates can be explained through either a pure expansion of access (ie, health facilities receiving vaccine doses that they otherwise would not have) or more efficient access (ie, health facilities receiving the same number of vaccine doses they would have otherwise received but in a more timely manner, leading to fewer missed opportunities of vaccination). Anecdotal evidence suggests that the impact is likely a combination of both factors. The distinction is significant when computing costs in an ICER: in the former, aerial logistics LMD cost is an additional expense to the existing supply chain cost for the government, whereas, in the latter, aerial logistics LMD replaces the traditional supply chain cost for transporting those vaccines. ...
Due to the absence of detailed data on traditional LMD, we were unable to differentiate between incremental and replaced doses within the number of doses delivered with aerial logistics during the intervention period. To mitigate the impact of this uncertainty on our estimations, for our primary ICER calculation, we proceeded with the conservative assumption that all doses delivered by aerial logistics during this period were incremental. This approach may inflate our incremental cost estimates but ensures the solidity of our findings amid the well-known ambiguous quality and high variance of the traditional LMD data that were used for illustrative purposes in the sensitivity analysis.

But no input numbers, just methods and a dash of conservatism.
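To make the incremental-vs-replaced distinction in the quote concrete, here's a minimal Python sketch. Only the $0.27 aerial cost per dose comes from the study; the dose count, the traditional LMD cost per dose, and the 50/50 split are placeholders I made up for illustration, since the paper doesn't publish those inputs.

```python
# Sketch: how the "all doses are incremental" assumption affects incremental cost.
# Only AERIAL_COST_PER_DOSE is from the study; every other number is hypothetical.

AERIAL_COST_PER_DOSE = 0.27  # USD per dose, from the quoted study


def incremental_cost(doses, share_incremental, traditional_cost_per_dose):
    """Extra delivery cost of aerial LMD relative to the status quo.

    Incremental doses add the full aerial cost. Replaced doses only add the
    difference between the aerial and the traditional delivery cost.
    """
    incremental_doses = doses * share_incremental
    replaced_doses = doses - incremental_doses
    return (incremental_doses * AERIAL_COST_PER_DOSE
            + replaced_doses * (AERIAL_COST_PER_DOSE - traditional_cost_per_dose))


doses = 100_000          # hypothetical number of doses delivered by drone
traditional_cost = 0.20  # hypothetical traditional LMD cost per dose, USD

conservative = incremental_cost(doses, 1.0, traditional_cost)  # study's primary assumption
mixed = incremental_cost(doses, 0.5, traditional_cost)         # hypothetical 50/50 split

print(f"all doses incremental: ${conservative:,.0f}")
print(f"half the doses replaced: ${mixed:,.0f}")
```

Whenever traditional delivery costs anything at all, the all-incremental assumption gives the higher incremental cost, which is the sense in which the authors call it conservative. What the sketch can't tell us is how large the gap is in practice, because that depends on exactly the inputs the paper leaves out.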

I share your annoyance re: modelling studies. Garbage in garbage out as they say (not accusing Zipline of putting garbage data into their model of course!)

Re: NGOs using trucks and cars unnecessarily, I'm just speculating here but I wonder if it's got a bit to do with the NGOs wanting to attract "top talent" (salary difference being the main attractor but also "you get to ride in a car instead of on a bike" being implicitly part of the "comp package", sort of like how top talent in higher-income countries are lured to prestigious industries by not just pay but "comped stays in nice hotels" or whatever). This paper I read awhile back made me think of that: The unintended consequences of NGO-provided aid on government services in Uganda. It argues that NGOs sometimes "poach" scarce local skilled government workers via higher pay, resulting in various adverse effects, although I guess it's a bit different in this case because the adverse effects happen as a result of the pay structure (NGO workers who would've otherwise distributed health products instead sell household products like soap and fortified oil because they get paid on a per-piece basis).

Sam Anschell @ 2025-04-16T01:57 (+7)

Can confirm; Zipline is ridiculously cool. I saw their P1 Drones in action in Ghana and met some of their staff at EA conferences. Imo, Zipline is one of the most important organizations around for last-mile delivery infrastructure. They're a key partner for GAVI, and they save lives unbelievably cost-effectively by transporting commodities like snakebite antivenom within minutes to people who need it.

Their staff and operations are among the most high-performing of any organization I've ever seen. Here are some pics from a visit I took to their Bay Area office in October 2024. I'd highly recommend this Mark Rober video, and checking out Zipline's website. If any software engineers are interested in high-impact work, I would encourage you to apply to Zipline!

Mo Putera @ 2025-04-17T04:35 (+4)

Thanks for the links! And for the pics, makes me feel like I'm glimpsing the future, but it's already here, just unevenly distributed. Everything you say jibes with both what GiveWell said about Zipline in their grant writeup

Though this is our first engagement with Zipline, there are some early signals that we might be well-aligned as partners. Zipline’s proposal specifically calls out a few elements that are important to GiveWell: a) emphasis on cost-effectiveness , b) plans to establish an M&E framework early on for any potential pilots, and c) interest in evaluation and learning

as well as the vibe I get from their about page, stuff like

Zipline is on a mission to build the world’s first logistics system that serves all people equally. ...
Our customers rely on Zipline to save lives, reduce emissions, increase economic opportunity, and provide new logistics services at scale. ...

and 3 out of their 4 most prominent "output statistic" claims being health-oriented

51% fewer deaths from postpartum hemorrhaging in hospitals Zipline serves
4,000+ hospitals and health centers served by Zipline
96% of providers report increased access to vaccinations in their area

Yeah the pointer to snakebite antivenom delivery feels useful, you reminded me of how big a problem it is.

NickLaing @ 2025-04-16T12:21 (+4)

Yep, snakebite is one of the few slam-dunk use cases for me here. Until we design a cheap, heat-stable antivenom I think drones that can get there in under an hour might be the best option in quite a wide range of places.

Angelina Li @ 2025-04-15T17:39 (+4)

Nice! I've been enjoying your quick takes / analyses, and find your writing style clear/easy to follow. Thanks Mo! (I think this could have been a great top level post FWIW, but to each their own :) )

Mo Putera @ 2025-04-17T04:13 (+2)

That's really kind of you Angelina :) I think top-level posting makes me feel like I need to put in a lot of work to pass some imagined quality bar, while quick takes feel more "free and easy"? Also I hesitate to call any of my takes "analyses", they're more like "here's a surprising thing I just learned, what do y'all think?"

Mo Putera @ 2025-06-22T14:39 (+42)

Ivan Gayton was formerly mission head at Doctors Without Borders. His interview (60 mins, transcript here) with Elizabeth van Nostrand is full of eye-opening anecdotes; no single one is representative of the whole interview, so it's worth listening to / reading it all. Here's one, on the sheer level of poverty and how giving workers higher wages (even if just $1/day vs the local market rate of $0.25/day "for nine hours on the business end of a shovel") distorted the local economy to the point of completely messing up society:

[00:06:07] Ivan: I had a real moment when I had this construction crew that was rebuilding a wing of the hospital and there were 30 people on this construction crew. And at some point, my boss, the project coordinator says to me, "Ivan, why are you just so obsessed with the construction crew always working? Constantly working and, and you know, never lacking for something to do." And I'm like, "well, because you know, you have a whole crew of 30 people, it's terribly expensive if they're doing nothing, I mean, if they sit there and do nothing all day, that costs, oh wait, $30, huh? Maybe I'll just relax about that."
'Cause you know, my last gig I'd been a forestry project manager, crew of 75 people who cost $450 a day each. So if they, you know, lose an hour of productivity, that's like huge money. A day of productivity is unthinkable.

(Aside: I'm having trouble believing the $450/person-day cost for forestry crew back in the late 90s and early 00s, isn't that $90k/year or $155k today?)
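For what it's worth, here's the back-of-the-envelope arithmetic behind that aside as a small Python sketch. The crew sizes and daily rates are from the interview; the 200 working days per year is my own assumption (it's what makes $450/day come out to $90k/year), and the "implied multiplier" is just whatever factor turns $90k into $155k, not an official inflation figure.

```python
# Back-of-the-envelope check of the labour-cost figures quoted from the interview.
FORESTRY_CREW_SIZE = 75
FORESTRY_DAILY_COST = 450    # USD per person-day, from the interview
MSF_CREW_SIZE = 30
MSF_DAILY_WAGE = 1           # USD per person-day, from the interview

WORKING_DAYS_PER_YEAR = 200  # assumption, not from the interview

forestry_crew_per_day = FORESTRY_CREW_SIZE * FORESTRY_DAILY_COST
msf_crew_per_day = MSF_CREW_SIZE * MSF_DAILY_WAGE
annual_per_forester = FORESTRY_DAILY_COST * WORKING_DAYS_PER_YEAR

# Factor needed to turn the annualised figure into roughly $155k "today".
implied_multiplier = 155_000 / annual_per_forester

print(f"forestry crew per day: ${forestry_crew_per_day:,}")
print(f"MSF construction crew per day: ${msf_crew_per_day:,}")
print(f"cost ratio between crews: {forestry_crew_per_day / msf_crew_per_day:,.0f}x")
print(f"annualised cost per forester: ${annual_per_forester:,}")
print(f"implied multiplier to reach $155k: {implied_multiplier:.2f}")
```

Changing WORKING_DAYS_PER_YEAR is the quickest way to see how sensitive the "isn't that $90k/year?" reaction is to how seasonal the forestry work was.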

[00:06:56] Ivan: So I bring that to this, you know, African construction crew and the construction crew themselves are kind of exhausted. Like, good lord, this guy's nuts. but that realization that... 30 people on the business end of sledgehammers and shovels and travels cost way less than one hour of my time for an entire day. Wow. That was shocking.
And we were paying more than the local market rate for unskilled labor. I mean, at that time, this is 2003, the, the local market rate for nine hours on the business end of a shovel was a shiny new quarter, 25 cents. We were paying a dollar. So we had this huge lineup of people to work. I kind of rotated through all the villagers, to give as many people as possible a chance for the real unskilled labor. I think the head construction crew guy was getting two bucks a day.
[00:07:52] Elizabeth: yeah, so maybe let's get into the economics of this. On one hand, it seems very generous to pay people four times their normal wage, and it's, you know, a trivial cost to MSF. On the other hand, that does distort the local economy.
[00:08:07] Ivan: distort is putting it mildly. It just completely messes up the local society. I mentioned that I had done this back in the envelope calculation that we were 75% of the local economy. I mean, what that actually means is we destroyed and distorted the local economy completely; as development practice that would've been utterly and completely unethical.
The only justification for doing something like that is an acute emergency, which it was, it was nigh on a hundred thousand people with literally no access to healthcare whatsoever. The amount of avoidable suffering and death that was going on that we could actually alleviate was something that, you know, in sort of humanitarian practice, I guess we arrogate to ourselves the idea that we can, in a sufficiently emergency situation, justify doing things that would be unethical development practice.
[00:09:06] Elizabeth: Do you think the village was worse off for having the hospital located in their village?
[00:09:11] Ivan: oh yeah. Because we obviously brought this flood of money in, but where does the money go? The doctors and nurses, they're not even local. They're from the capital city. So you're bringing in people from the capitol who then lord it over the local people, price of food jumps up, price of accommodation goes insane. The trickle down opportunities are to be sex workers and cleaners and, you know, servants for these, for these newly created royalty.
[00:09:46] Elizabeth: you might hope that if the price of food goes up, but their wages are also going up because they're working for the hospital or tangentially, then that would compensate?
[00:09:54] Ivan: Well yeah. For the people who are already, you know, have access to the labor market and are already able to sort of get in on that. Sure. I mentioned that I actually, I deliberately kind of rotated through the villagers to give lots of people a chance, but still, if you're not one of the people who gets a chance or even ever had a chance, or was somebody who's, you know, on the outs with the local powerful people, then we, as these foreigners providing these jobs, we never even see those people.
They don't even get to apply for a job with us. We never even know of their existence. So those people, now, the price of everything is jumped. There's a bunch of newly, much more wealthy people around them, and they're excluded from that. They don't see any of the benefit and all of the harm. So it's, it's terrible.

NickLaing @ 2025-06-23T06:03 (+13)

Yeah this rhymes with everything I've seen. I have a deeply unpopular opinion built on years of experience that NGOs generally pay people way too much. Wrote about it (quite poorly) a whopping 8 years ago!

https://ugandapanda.com/2017/04/17/ngos-part-1-pay-your-workers-less/

The thing that makes me doubt my opinion is that I'm yet to find a local Ugandan who publicly agrees with me, and most privately disagree with me too. "More money coming in is better" seems to be the common sense line, despite the inflation (my town Gulu is the most expensive in the country), distorted education system and dragging the best people to less important jobs.

It's not only NGOs; it also means good business ideas can fail because of high salary bills, when they could have worked and grown to employ hundreds/thousands more if the foreigner just paid market rates not 3x....

I think it's better to just give people money, GiveDirectly-style, rather than pay more. It doesn't distort the economy much.

It's a really tricky one emotionally and intellectually, and I find it very difficult to manage when I'm the one with the power to pay more

Buck @ 2025-06-25T23:28 (+3)

This was great, thanks for the link!

Mo Putera @ 2025-01-21T05:12 (+42)

I just learned that Trump signed an executive order last night withdrawing the US from the WHO; this is his second attempt to do so.

WHO thankfully weren't caught totally unprepared. Politico reports that last year they "launched an investment round seeking some $7 billion “to mobilize predictable and flexible resources from a broader base of donors” for the WHO’s core work between 2025 and 2028. As of late last year, the WHO said it had received commitments for at least half that amount".

Full text of the executive order below:

WITHDRAWING THE UNITED STATES FROM THE WORLD HEALTH ORGANIZATION
By the authority vested in me as President by the Constitution and the laws of the United States of America, it is hereby ordered:
Section 1. Purpose. The United States noticed its withdrawal from the World Health Organization (WHO) in 2020 due to the organization’s mishandling of the COVID-19 pandemic that arose out of Wuhan, China, and other global health crises, its failure to adopt urgently needed reforms, and its inability to demonstrate independence from the inappropriate political influence of WHO member states. In addition, the WHO continues to demand unfairly onerous payments from the United States, far out of proportion with other countries’ assessed payments. China, with a population of 1.4 billion, has 300 percent of the population of the United States, yet contributes nearly 90 percent less to the WHO.
Sec. 2. Actions. (a) The United States intends to withdraw from the WHO. The Presidential Letter to the Secretary-General of the United Nations signed on January 20, 2021, that retracted the United States’ July 6, 2020, notification of withdrawal is revoked.
(b) Executive Order 13987 of January 25, 2021 (Organizing and Mobilizing the United States Government to Provide a Unified and Effective Response to Combat COVID–19 and to Provide United States Leadership on Global Health and Security), is revoked.
(c) The Assistant to the President for National Security Affairs shall establish directorates and coordinating mechanisms within the National Security Council apparatus as he deems necessary and appropriate to safeguard public health and fortify biosecurity.
(d) The Secretary of State and the Director of the Office of Management and Budget shall take appropriate measures, with all practicable speed, to:
(i)    pause the future transfer of any United States Government funds, support, or resources to the WHO;
(ii)   recall and reassign United States Government personnel or contractors working in any capacity with the WHO; and
(iii) identify credible and transparent United States and international partners to assume necessary activities previously undertaken by the WHO.
(e) The Director of the White House Office of Pandemic Preparedness and Response Policy shall review, rescind, and replace the 2024 U.S. Global Health Security Strategy as soon as practicable.
Sec. 3. Notification. The Secretary of State shall immediately inform the Secretary-General of the United Nations, any other applicable depositary, and the leadership of the WHO of the withdrawal.
Sec. 4. Global System Negotiations. While withdrawal is in progress, the Secretary of State will cease negotiations on the WHO Pandemic Agreement and the amendments to the International Health Regulations, and actions taken to effectuate such agreement and amendments will have no binding force on the United States.
Sec. 5. General Provisions. (a) Nothing in this order shall be construed to impair or otherwise affect:
(i)   the authority granted by law to an executive department or agency, or the head thereof; or
(ii) the functions of the Director of the Office of Management and Budget relating to budgetary, administrative, or legislative proposals.
(b) This order shall be implemented consistent with applicable law and subject to the availability of appropriations.
(c) This order is not intended to, and does not, create any right or benefit, substantive or procedural, enforceable at law or in equity by any party against the United States, its departments, agencies, or entities, its officers, employees, or agents, or any other person.
THE WHITE HOUSE,
    January 20, 2025.

huw @ 2025-01-21T22:17 (+11)

Someone noted that at the rate of US GHD spending, this would cost ~12,000 counterfactual lives. A tremendous tragedy.

Larks @ 2025-01-23T20:23 (+5)

Is WHO cost-effectiveness similar to US GHD spending?

Mo Putera @ 2025-01-22T06:56 (+2)

That's heartbreaking. Thanks for the pointer.

Mo Putera @ 2025-04-10T08:31 (+41)

What's the most cost-effective economic growth-boosting intervention? It's cat mascots. I just learned about Tama the calico cat (via @thatgoodnewsgirl on Instagram), who "gained fame for being a railway station master and operating officer at Kishi Station on the Kishigawa Line in Kinokawa, Wakayama Prefecture, Japan".

An orange, black and white cat, wearing a collar and a small hat, sat on a chair decorated like a strawberry.

The career section of her Wikipedia page astounded me:

Tama was born in Kinokawa, Wakayama, and was raised with a group of stray cats that used to live close to Kishi Station. They were regularly fed by passengers and by Toshiko Koyama, the informal station manager at the time.
The station was near closure in 2004 because of financial problems on the rail line. Around this time, Koyama adopted Tama. Eventually the decision to close the station was withdrawn after the citizens demanded that it stay open.^[3] In April 2006, the newly formed Wakayama Electric Railway destaffed all stations on the Kishigawa Line to cut costs, and at the same time evicted the stray cats from their shelter to make way for new roads leading to the stations. Koyama pleaded with Mitsunobu Kojima, president of Wakayama Electric Railway, to allow the cats to live inside Kishi Station; Kojima, seeing Tama as a maneki-neko (beckoning cat), agreed to the request.^[4] ...
In lieu of an annual salary, the railway provided Tama with a year's worth of cat food and a gold name tag for her collar stating her name and position. A station master's hat was specially designed and made to fit Tama, and took more than six months to complete.^[6] In July 2008, a summer hat was also issued to Tama for hotter weather.^[7] Tama's original gold name tag was stolen by a visitor on October 10, 2007, but a replica was quickly made to replace it.^[8]
The publicity from Tama's appointment led to an increase in passengers by 17% for that month as compared to January 2006; ridership statistics for March 2007 showed a 10% increase over the previous financial year. A study estimated that the publicity surrounding Tama has contributed 1.1 billion yen to the local economy.^[9]

That's actually lowballing Tama. If you click on footnote 9, it's ¥1.38 billion ($13.1 million at the exchange rate back then):

With 55,000 more people having used the Kishigawa Line than would normally be expected, Tama is being credited with a contribution to the local economy calculated to have reached as much as 1.1 billion yen (10.44 million dollars) in 2007 alone, according to a study announced last week.
Katsuhiro Miyamoto, a professor at Kansai University's School of Accountancy, said picture books and other merchandise featuring the feline stationmaster also produced significant economic effects. A television appearance and other publicity surrounding Tama -- who receives cat food in lieu of a salary -- was worth 280 million yen, according to Miyamoto.

In contrast, Tama was really cheap. Gemini 2.5 Pro helpfully estimated the following cost breakdown, which adds up to a median of ¥110,000:

A year's supply of food probably cost ¥30,000 - ¥60,000
Vet care could range from ¥30,000 - ¥70,000 annually to cover everything from routine check-ups to vaccinations, flea/tick prevention, and potential unexpected health issues
Litter, bedding, toys, and grooming supplies probably cost ¥10,000 - ¥20,000
Her promotion to "super station master" (superintendent equivalent) came with an office (a converted ticket booth containing a litter box) estimated at ¥50,000 - ¥110,000, amortised over Tama's nearly decade-long career
Tama's two custom-made hats (summer and other seasons) weren't cheap, probably ¥10,000 - ¥30,000 each, but worth including in the compensation package to retain top talent
In contrast her two gold-plated name tags (the first was stolen by a visitor) likely weren't as expensive, ¥2,000 - ¥5,000

This works out to almost exactly 10,000:1 benefit-cost ratio just from increased ridership, and that's excluding merchandise, TV appearance and other publicity etc — well beyond even Open Phil's higher-than-ever 2,100x bar!
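Here's that arithmetic as a minimal Python sketch, in case anyone wants to poke at the inputs. The cost ranges and benefit figures are the ones quoted above. Taking range midpoints of just the three recurring items is my reading of how the ¥110,000 "median" comes about, and the exchange rate is the one implied by the article's "1.1 billion yen (10.44 million dollars)".

```python
# Recurring annual cost ranges in JPY (low, high), from the Gemini estimate above.
RECURRING = {
    "food": (30_000, 60_000),
    "vet care": (30_000, 70_000),
    "litter, bedding, toys, grooming": (10_000, 20_000),
}

RIDERSHIP_BENEFIT_JPY = 1_100_000_000  # 2007, from the cited study
PUBLICITY_BENEFIT_JPY = 280_000_000    # TV appearance and other publicity
# Exchange rate implied by the article's yen and dollar figures.
JPY_PER_USD = 1_100_000_000 / 10_440_000


def midpoint(low, high):
    """Midpoint of a (low, high) cost range."""
    return (low + high) / 2


annual_cost_jpy = sum(midpoint(low, high) for low, high in RECURRING.values())
ridership_bcr = RIDERSHIP_BENEFIT_JPY / annual_cost_jpy
total_benefit_jpy = RIDERSHIP_BENEFIT_JPY + PUBLICITY_BENEFIT_JPY
total_benefit_usd = total_benefit_jpy / JPY_PER_USD

print(f"annual recurring cost: JPY {annual_cost_jpy:,.0f}")
print(f"ridership benefit-cost ratio: {ridership_bcr:,.0f}:1")
print(f"total benefit: JPY {total_benefit_jpy:,} (about USD {total_benefit_usd:,.0f})")
```

The one-off items (office, hats, name tags) are deliberately left out here. Amortising them over Tama's career would push the ratio somewhat below 10,000:1, though nowhere near enough to threaten the conclusion.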

In fact, a few hundred thousand Tamas would already double global economic growth, and the world's Tama-carrying capacity is probably 3-4 OOMs higher at least (the World Population Review's estimate of over a billion cats worldwide today should be interpreted as a lower bound). Talk about feline-powered explosive economic growth potential...

Tama's office

Joseph @ 2025-04-10T14:53 (+11)

A few thoughts. First, it is a really cute story, and I'm glad you shared it. It feels very Japanese.

Second, marketing and tourism aren't often considered as major areas for economic development and growth (at least not in the popular press books I've read or the EA circles I've been in), but this is a simple little case study to demonstrate that having a mascot (or anything else that people like, from fancy buildings to locations tied to things people like) can drive economic activity. But it is also hard to predict in advance what will be a hit. I bet that lots of places have beautiful murals, cute animals, historical importance, lovely scenery, and similar attractions without having much of a positive return on investment. For me, the notable thing about Tama's story is how little money was needed to add something special to the local station. A lot of investments related to tourism are far more expensive.

A final thought, one that maybe folks more versed in economics can help me with. Should we consider this an example of economic growth? Or is this just shifting spending/consumption from one place to another? Would people who spent money to ride this train otherwise have spent that money doing something else: riding a different train, visiting a park, etc.?

Benjamin M. @ 2025-04-12T19:33 (+5)

Cats' economic growth potential likely has a heavy-tailed distribution, because how else would cats knock things off shelves with their tail. As such, Open Philanthropy needs to be aware that some cats, like Tama, make much better mascots than other cats. One option would be to follow a hits-based strategy: give a bunch of areas cat mascots, and see which ones do the best. However, given the presence of animal welfare in the EA movement, hitting cats is likely to attract controversy. A better strategy would be to identify cats that already have proven economic growth potential and relocate them to areas most in need of economic growth. Tama makes up 0.00000255995% of Japan's nominal GDP (or something thereabouts, I'm assuming all Tama-related benefits to GDP occurred in the year 2020). If these benefits had occurred in North Korea, they would be 0.00086320506% of nominal GDP or thereabouts. North Korea is also poorer, so adding more money to its economy goes further. Japan and North Korea are near each other, so transporting Tama to North Korea would be extremely cheap. Assuming Tama's benefits are the same each year and are independent of location (which seems reasonable, I asked ChatGPT for an image of Tama in North Korea and it is still cute), catnapping Tama would be highly effective. One concern is that there might be downside risk, because people morally disapprove of kidnapping cats. On the other hand, people expressing moral disapproval of kidnapping cats are probably more likely to respect animal's boundaries by not eating meat, thus making this an intervention that spans cause areas. In conclusion: EA is solved, all we have to do is kidnap some cats.

Toby Tremlett🔹 @ 2025-04-10T09:20 (+5)

I love that there is a disagree react: "hmm... no, seems like the most cost-effective economic growth boosting intervention is not in fact cat mascots"

NickLaing @ 2025-04-13T15:42 (+4)

I would imagine there are some replicability issues...

Love the post 🤩

Mo Putera @ 2025-04-02T05:15 (+34)

I spent most of my early career as a data analyst in industry, which engendered in me a deep wariness of quantitative data sources and plumbing, and a neverending discomfort at how often others tended to just take them as given for input into consequential decision-making, even if at an intellectual level I knew their constraints and other priorities justified it and they were doing the best they could. ...and then I moved to global health applied research and realised that the data trustworthiness situation was so much worse I had to recalibrate a lot of expectations / intuitions.

In that regard I appreciate GiveWell's new guidance on burden note:

Disease burden estimates, such as child mortality rates, are a key input in our cost-effectiveness analyses. Historically, for consistency and convenience, we've primarily relied on a single source for these estimates.
Going forward, we plan to consider multiple sources for burden estimates, apply a higher level of scrutiny to these estimates, and adjust for potential biases or inaccuracies, like we do when estimating other parameters in our models.
This change has already led to us making over $25m in additional grants we would not have otherwise. (Footnote: Our updated estimates of malaria burden in Chad have led us to allocate $3.3 million in grantmaking for seasonal malaria chemoprevention (more), and $25.9m for insecticide-treated nets (not yet published).) We expect to consider additional research to improve estimates of burden of disease in the future.

The rest of the note was cathartic to skim-read. For instance, when I looked into the idea of distributing low-cost glasses to correct presbyopia in low-income countries awhile back (a problem that afflicts over 1.8 billion people globally, with >$50 billion in lost potential productivity annually in LMICs alone), the industry data analyst in me was dismayed to learn that the WHO didn't even collect data on how many people needed glasses prior to 2008, so governments and associated stakeholders understandably prioritised allocation of resources towards surgical and medical interventions instead. I think the existence of orgs like IHME and OWID greatly improves the GHD data situation nowadays, but there are many "pockets" where it remains a far cry from what it could be, so I appreciated that GiveWell said they're considering

Fund data collection. This includes potentially funding additional nationally representative surveys (DHS/MIS/MICS) or additional modules to these surveys, or supporting more autopsy data collection to better understand cause-specific mortality, particularly for malaria in sub-Saharan Africa. Our guess is that part of the reason different models disagree is that the data underlying these models is limited. We may look for cases where we could fund additional data collection to improve burden of disease estimates.

Another example: a fair bit of my earlier analyst work involved either reconciling discrepant figures for ostensibly similar metrics (e.g. campaign revenue breakdowns etc) or root-cause analysing-via-data-plumbing whether a flagged metric needed to be acted on or was a false positive, which made me appreciate this section:

Key uncertainties: ...
There are likely technical nuances we haven't captured. We've found that comparisons between sources are more complex than they first appear. For example, we recently learned that IGME and IHME define diarrheal diseases differently. Similar technical differences likely exist elsewhere.
Possible next steps:
Get a better understanding of what’s driving differences in models. This may come from bringing together modeling groups in regions with high disagreement to understand methodological differences.
Look for ways to improve model transparency. We’ve found it difficult to engage with burden of disease models, and think that finding ways to see inside the black box of how they produce estimates may make it easier to understand which estimates to rely on and how to improve them.

NickLaing @ 2025-04-03T07:02 (+4)

This is fantastic to hear! The Global burden of disease process (while the best and most reputable we have) is surprisingly opaque and hard to follow in many cases. I haven't been able to find the spreadsheets with their calculations.

Their numbers are usually reasonable but bewildering in some cases and obviously wrong in others. GiveWell moving towards combining GBD with other sensible models is a great way forward.

It's a bit unfortunate that the best burden of disease models we have aren't more understandable.

Mo Putera @ 2025-12-02T07:19 (+28)

I admire influential orgs that publicly change their mind due to external feedback, and GiveWell is as usual exemplary of this (see also their grant "lookbacks"). From their recently published Progress on Issues We Identified During Top Charities Red Teaming, here's how external feedback changed their bottom-line grantmaking:

In 2023, we conducted “red teaming” to critically examine our four top charities. We found several issues: 4 mistakes and 10 areas requiring more work. We thought these could significantly affect our 2024 grants: $5m-$40m in grants we wouldn’t have made otherwise and $5m-$40m less in grants we would have made otherwise (out of ~$325m total).
This report looks back at how addressing these issues changed our actual grantmaking decisions in 2024. Our rough estimate is that red teaming led to ~$37m in grants we wouldn't have made otherwise and prevented ~$20m in grants we would have made otherwise, out of ~$340m total grants. The biggest driver was incorporating multiple sources for disease burden data rather than relying on single sources. There were also several cases where updates did not change grant decisions but led to meaningful changes in our research.

Some self-assessed progress that caught my eye — incomplete list, full one here; these "led to important errors or... worsened the credibility of our research" (0 = no progress made, 10 = completely resolved):

Failure to engage with outside experts (8/10): We spent 240 days at conferences/site visits in 2024 (vs. 60 in 2023). We think this type of external engagement helped us avoid ~$4m in grants and identify new grant opportunities like Uduma water utility ($480,000). We've established ongoing relationships with field experts. (more)
Failure to check burden data against multiple sources (8/10): By using multiple data sources for disease burden, we made ~$34m in grants we likely wouldn't have otherwise and declined ~$14m in grants we probably would have made. We've implemented comprehensive guidelines for triangulating data sources. (more)
Failure to account for individuals receiving interventions from other sources (7/10): We were underestimating how many people would get nets without our campaigns, reducing cost-effectiveness by 20-25%. We've updated our models but have made limited progress on exploring routine distribution systems (continuous distribution through existing health channels) as an alternative or complement to our mass campaigns. (more)
Failure to estimate interactions between programs (7/10): We adjusted our vitamin A model to account for overlap with azithromycin distribution (reducing effectiveness by ~15%) and accounted for malaria vaccine coverage when estimating nets impact. We've developed a framework to systematically address this. (more)

(As an aside, I've noticed plenty of claims of GW top charity-beating cost-effectiveness figures both on the forum and elsewhere, and I basically never give them the credence I'd give to GW's own estimates. That's because of the kind of (usually downward) adjustments mentioned above, like people receiving interventions from other sources or interactions between programs, and the sheer thoroughness of GW's reasoning behind those adjustments. Seriously, click on any of those "(more)"s.)

Some other issues they'd "been aware of at the time of red teaming and had deprioritized but that we thought were worth looking into following red teaming" — again incomplete list, full one here:

Insufficient attention to inconsistency across cost-effectiveness analyses (CEAs) (8/10): We made our estimates of long-term income effects of preventive health programs more consistent (now 20-30% of benefits across top charities vs. previously 10-40%) and fixed implausible assumptions on indirect deaths (deaths prevented, e.g., by malaria prevention that aren’t attributed to malaria on cause-of-death data). We've implemented regular consistency checks. (more)
Insufficient attention to some fundamental drivers of intervention efficacy (7/10): We updated our assumptions about net durability and chemical decay on nets (each changing cost-effectiveness by -5% and 11% across geographies) and consulted experts about vaccine efficacy concerns, but we haven't systematically addressed monitoring intervention efficacy drivers across programs. (more)
Insufficient sideways checks on coverage, costs, and program impact (7/10): We funded $900,000 for external surveys of Evidence Action's water programs, incorporated additional DHS data in our models, and added other verification methods. We've made this a standard part of our process but think there are other areas where we’d benefit from additional verification of program metrics. (more)
Insufficient follow-up on potentially concerning monitoring and costing data (7/10): We’ve encouraged Helen Keller to improve its monitoring (now requiring independent checks of 10% of households), verified AMF's data systems have improved, and published our first program lookbacks. However, we still think there are important gaps. (more)

I always had the impression GW engaged outside experts a fair bit, so I was pleasantly surprised to learn they thought they weren't doing enough of it and then actually followed through so seriously. This is an A+ example of organisational commitment to and follow-through on self-improvement, so I'd like to quote this section in full:

In 2024, we spent ~240 days at conferences or site visits, compared to ~60 in 2023. We spoke to experts more regularly as part of grant investigations, and tried a few new approaches to getting external feedback. While it’s tough to establish impact, we think this led to four smaller grants we might not have made otherwise (totalling ~$1 million) and led us to deprioritize a ~$10 million grant we might’ve made otherwise.
More detail on what we said we’d do to address this issue and what we found (text in italics is drawn from our original report):
More regularly attend conferences with experts in areas in which we fund programs (malaria, vaccination, etc.).
In 2024, our research team attended 16 conferences, or ~140 days, compared to ~40 days at conferences in 2023.
We think these conferences helped us build relationships with experts and identify new grant opportunities. Two examples:
A conversation with another funder at a conference led us to re-evaluate our assumptions on HPV coverage and ultimately deprioritize a roughly $10 million grant we may have made otherwise.
We learned about Uduma, a for-profit rural water utility, at a conference and made a $480,000 grant to them in November 2024.
We also made more site visits. In 2023, we spent approximately 20 days on site visits. In 2024, the number was approximately 100 days.
Reach out to experts more regularly as part of grant investigations and intervention research. We’ve always consulted with program implementers, researchers, and others through the course of our work, but we think we should allocate more relative time to conversations over desk research in most cases.
Our research team has allocated more time to expert conversations. A few examples:
Our 2024 grants for VAS to Helen Keller International relied significantly on conversations with program experts. Excluding conversations with the grantee, we had 15 external conversations.
We’ve set up longer-term contracts with individuals who provide us regular feedback. For example, our water and livelihoods team has engaged Daniele Lantagne and Paul Gunstensen for input on grant opportunities and external review of our research.
We spoke with other implementers about programs we’re considering. For example, we discussed our 2024 grant to support PATH’s technical assistance to support the rollout of malaria vaccines with external stakeholders in the space.
This led to learning about some new grant opportunities. For example:
The $150,000 grant to International Rescue Committee (IRC) for desk-based scoping of programs to increase vaccination coverage stemmed from a conversation with an expert.
We connected with Zipline at a conference, and made a $55,000 grant to them for desk-based scoping of drones to increase vaccination coverage in December 2024.
We are currently considering a $4 million grant that we learned about through an expert conversation.
Experiment with new approaches for getting feedback on our work.
In addition to the above, we tried a few other approaches we hadn’t (or hadn’t extensively) used before. Three examples:
Following our red teaming of GiveWell’s top charities, we decided to review our iron grantmaking to understand what were the top research questions we should address as we consider making additional grants in the near future. We had three experts review our work in parallel to internal red teaming, so we could get input and ask questions along the way. We did not do this during our top charities red teaming, in the report of which we wrote “we had limited back-and-forth with external experts during the red teaming process, and we think more engagement with individuals outside of GiveWell could improve the process.”
We made a grant to Busara to collect qualitative information on our grants to Helen Keller International's vitamin A supplementation program in Nigeria.
We funded the Center for Global Development to understand why highly cost-effective GiveWell programs aren’t funded by other groups focused on saving lives. This evaluation was designed to get external scrutiny from an organization with expertise in global health and development, and by other funders and decision-makers in low- and middle-income countries.

Some quick reactions:

I like that GW thinks they should allocate more time to expert conversations vs desk research in most cases
I like that GW are improving their own red-teaming process by having experts review their work in parallel
I too am keen to see what CGD find out re: why GW top-recommended programs aren't funded by other groups you'd expect to do so
the Zipline exploratory grant is very cool, I raved about it previously
I wouldn't have expected that the biggest driver in terms of grants made/not made would be failure to sense check raw data in burden calculations; while they've done a lot to redress this there's still a lot more on the horizon, poised to affect grantmaking for areas like maternal mortality (prev. underrated, deserves a second look)
funnily enough, they self-scored 5/10 on "insufficient focus on simplicity in cost-effectiveness models"; as someone who spent all my corporate career ~~pained by~~ working with big messy spreadsheets and who's also checked out GW's CEAs over the years I think they're being a bit harsh on themselves here...

Ben Kuhn has a great essay about how

all my favorite people are great at a skill I’ve labeled in my head as “staring into the abyss.”
Staring into the abyss means thinking reasonably about things that are uncomfortable to contemplate, like arguments against your religious beliefs, or in favor of breaking up with your partner. It’s common to procrastinate on thinking hard about these things because it might require you to acknowledge that you were very wrong about something in the past, and perhaps wasted a bunch of time based on that (e.g. dating the wrong person or praying to the wrong god). However, in most cases you have to either admit this eventually or, if you never admit it, lock yourself into a sub-optimal future life trajectory, so it’s best to be impatient and stare directly into the uncomfortable topic until you’ve figured out what to do. ...
I noticed that it wasn’t just Drew (cofounder and CEO of Wave) who is great at this, but many of the people whose work I respect the most, or who have had the most impact on how I think. Conversely, I also noticed that for many of the people I know who have struggled to make good high-level life decisions, they were at least partly blocked by having an abyss that they needed to stare into, but flinched away from.
So I’ve come to believe that becoming more willing to stare into the abyss is one of the most important things you can do to become a better thinker and make better decisions about how to spend your life.

I agree, and I think there's an organisational analogue as well, which GiveWell exemplifies above.

Mo Putera @ 2025-08-16T08:37 (+26)

Stuart Buck's new post over at The Good Science Project has one of the hardest-hitting openings I've read in a while:

Many common medical practices do not have strong evidence behind them. In 2019, a group of prominent medical researchers—including Robert Califf, the former Food and Drug Administration (FDA) Commissioner—undertook the tedious task of looking into the level of evidence behind 2,930 recommendations in guidelines issued by the American Heart Association and the American College of Cardiology. They asked one simple question: how many recommendations were supported by multiple small randomized trials or at least one large trial? The answer: 8.5%. The rest were supported by only one small trial, by observational evidence, or just by “expert opinion only.”
For infectious diseases, a team of researchers looked at 1,042 recommendations in guidelines issued by the Infectious Diseases Society of America. They found that only 9.3% were supported by strong evidence. For 57% of the recommendations, the quality of evidence was “low” or “very low.” And to make matters worse, more than half of the recommendations considered low in quality of evidence were still issued as “strong” recommendations.
In oncology, a review of 1,023 recommendations from the National Comprehensive Cancer Network found that “…only 6% of the recommendations … are based on high-level evidence”, suggesting “a huge opportunity for research to fill the knowledge gap and further improve the scientific validity of the guidelines.”
Even worse, as shown in a great book co-authored by current FDA official Vinay Prasad, there are many cases where not only is a common medical treatment lacking the evidence to support it, but also one or more randomized trials have shown that the treatment is useless or even harmful!

(I'll refrain from quoting the rest and suggest instead you check out his post)

Charlie_Guthmann @ 2025-08-17T21:30 (+3)

sent to my dad who is an editor at FPIN. I think he only quickly skimmed it so grain of salt here but this is what he had to say.

"I already know that we waste a lot of money on things that don't work or work poorly. Knowing that they don't work has not yet been enough."

and

"China may be able to do the plan as described because of their command economy rather than the influence wielded by the groups the article described as barriers."

Mo Putera @ 2025-03-27T05:32 (+23)

Buried deep in the PEPFAR Report's appendix - methodology section is a nice "introduction to global health programs" mini-article that also addresses some lay misconceptions about foreign aid and suggests a better way to think about it all in one go; it's a shame that most folks won't read it, so I'm reposting it here for ease of future reference.

Introduction to Global Health Programs
Many people are skeptical of foreign aid and other attempts to help the global poor—and they’re right to be! A lot of foreign aid is poorly targeted, counterproductive, or simply a waste of money. From PlayPumps to TOMS shoes to One Laptop Per Child, the news is full of well-intentioned programs that had nowhere near the effect their boosters advertised. Many prominent experts, such as William Easterly and Angus Deaton, question whether foreign aid works at all.
Development economists, charity evaluators, and other specialists perform “program evaluations,” which ask questions like:
Does the problem we’re trying to solve actually exist?
Why does the problem exist?
Is the program well-implemented?
Is the program having the effect that we expected?
Is the program too expensive? Can some other program get the same results for less money?
In general, program evaluations are interested in finding out what the effects of a program are. The effect of a program is the difference between the outcome (what actually happened) and the counterfactual (what would have happened without the program being implemented). It’s impossible to measure the counterfactual because the counterfactual is about the same people at the same time. The counterfactual can only ever be estimated. Program evaluators have come up with many different ways of estimating the counterfactual, which we’ll talk about on the main page.
Researchers have found that global health interventions are far more likely to work than programs like PlayPumps or One Laptop Per Child. It’s easy to be wrong about whether a school system needs laptops, especially in a country far away from your own; it’s much harder to be wrong about whether a country has sky-high rates of HIV/AIDS. We don’t know much about the causes of poverty or what makes countries develop economically; we know much more about the causes of HIV and what makes HIV progress to AIDS. PlayPumps were a brand-new invention that might not work; antiretroviral medications are well-tested, well-understood, and widely used in the developed world. For this reason, the charity evaluator GiveWell—which specializes in cost-effective ways of helping the global poor—mostly recommends charities that provide healthcare.
Foreign aid often has unintended consequences: for example, giving people shoes (like TOMS shoes did) can put local shoemakers out of work; foreign aid can lead to governments prioritizing the wishes of foreign donors over the wishes of their own people. Providing healthcare has many fewer negative unintended consequences than other forms of foreign aid: providing antiretrovirals is unlikely to put small local antiretroviral manufacturers out of work. It can have some other unintended consequences, like loss of democratic accountability.
But providing healthcare could also have positive unintended consequences. Healthy children may be more likely to go to school. Then healthy adults may be able to work more and earn more for their families. These effects can be huge. For instance, the charity evaluator GiveWell estimates that a third of the benefit of insecticide-treated bednet distribution, a treatment to avert deaths from malaria, comes from increased income.
Poverty in America is horrible: no one should be unsure how to pay for rent, food, or healthcare. But Americans are extraordinarily rich compared to the rest of the world: a person at the poverty line in the United States is in the top 15% wealthiest people in the world, even if you adjust for how far money goes in each country. Many of the world’s poorest people live in the countries PEPFAR works in: about two-fifths of people in sub-Saharan Africa live on less than $2.15 a day, adjusted for how far money goes. Since these people are so poor, they don’t have many of the opportunities Americans take for granted. Of course, the first priority of the United States government should be to help American citizens. But if you’re used to charity at home, it can be shocking how cheap it is to help people abroad.
Most of all: just throwing money at foreign aid doesn’t fix anything. But that isn’t a reason to give up—not with millions of lives at stake. If we’re careful and thoughtful, and if we actually check whether what we’re doing does any good, then we don’t have to be PlayPumps or TOMS shoes. We can concretely, robustly make things better.
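
The "outcome minus counterfactual" definition in the excerpt can be made concrete with a tiny sketch of my own. The numbers are purely hypothetical (not from the PEPFAR report), and using a comparison group's average as the counterfactual estimate is just one simple approach among the many the report alludes to:

```python
from statistics import mean

def estimated_effect(program_outcomes, comparison_outcomes):
    """Effect = observed outcome minus the *estimated* counterfactual.

    The true counterfactual (the same people, same time, no program) can't be
    observed, so here it is approximated by a comparison group's average.
    """
    counterfactual_estimate = mean(comparison_outcomes)
    return mean(program_outcomes) - counterfactual_estimate

# Hypothetical deaths per 1,000 people in three program areas and
# three similar areas without the program.
program_areas = [20, 30, 10]
comparison_areas = [50, 40, 60]

effect = estimated_effect(program_areas, comparison_areas)
print(f"Estimated effect: {effect} deaths per 1,000")
```

A negative number here means fewer deaths than the estimated counterfactual. The estimate is only as good as the assumption that the comparison areas really resemble what the program areas would have looked like without the program.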

NickLaing @ 2025-03-27T07:52 (+4)

Wow that's fantastic - I wonder who wrote it, that seems extremely EA flavoured.

Mo Putera @ 2025-03-27T12:00 (+6)

You're bang on, definitely a few EAs involved:

Authors:
Kelsey Piper is a journalist at Vox.
Leah Libresco Sargeant is a journalist.
Colin Aitken is a postdoctoral scholar in development economics at the University of Chicago.
Alex Randall is a foreign aid and procurement expert.
Bruce Tsai is a doctor.
Dave Kasten is a consultant.
Zac Hatfield-Dodds is a fellow of the Python Software Foundation.
Keller Scholl is a PhD candidate in policy analysis.
Clara Collier is the Editor in Chief at Asterisk.
Rishi Mago is a software engineer at Amazon.
We speak only for ourselves and our consciences. None of our respective institutions have reviewed this work. Alex was formerly employed by a USAID contractor that received PEPFAR funding. She did not work directly on PEPFAR programs. We thank Emily Lin for serving as our Webmaster. We are indebted to a number of external reviewers, including Saloni Dattani and Andrew Martin.

Mo Putera @ 2024-06-24T10:50 (+23)

Curious what people think of Gwern Branwen's take that our moral circle has historically narrowed as well, not just expanded (so contra Singer), so we should probably just call it a shifting circle. His summary:

The “expanding circle” historical thesis ignores all instances in which modern ethics narrowed the set of beings to be morally regarded, often backing its exclusion by asserting their non-existence, and thus assumes its conclusion: where the circle is expanded, it’s highlighted as moral ‘progress’, and where it is narrowed, what is outside is simply defined away.
When one compares modern with ancient society, the religious differences are striking: almost every single supernatural entity (place, personage, or force) has been excluded from the circle of moral concern, where they used to be huge parts of the circle and one could almost say the entire circle. Further examples include estates, houses, fetuses, prisoners, and graves.

(I admittedly don't find his examples all that persuasive, probably because I'm already biased to only consider beings that can feel pleasure and suffering.)

What's the "so what"? Gwern:

One of the most difficult aspects of any theory of moral progress is explaining why moral progress happens when it does, in such apparently random non-linear jumps. (Historical economics has a similar problem with the Industrial Revolution & Great Divergence.) These jumps do not seem to correspond to simply how many philosophers are thinking about ethics.
As we have already seen, the straightforward picture of ever more inclusive ethics relies on cherry-picking if it covers more than, say, the past 5 centuries; and if we are honest enough to say that moral progress isn’t clear before then, we face the new question of explaining why things changed then and not at any point previous in the 2500 years of Western philosophy, which included many great figures who worked hard on moral philosophy such as Plato or Aristotle.
It is also troubling how much morality & religion seems to be correlated with biological factors. Even if we do not go as far as Julian Jaynes’s theories of gods as auditory hallucinations, there are still many curious correlations floating around.

Pablo @ 2024-06-24T16:48 (+6)

Hi Mo. I'm unsure if you've seen it, but Gwern’s article was discussed here.

Mo Putera @ 2024-06-25T02:33 (+3)

I hadn't, thanks for the pointer Pablo.

Mo Putera @ 2024-01-31T11:11 (+23)

As someone predisposed to like modeling, the key takeaway I got from Justin Sandefur's Asterisk essay PEPFAR and the Costs of Cost-Benefit Analysis was this corrective reminder – emphasis mine, focusing on what changed my mind:

Second, economists were stuck in an austerity mindset, in which global health funding priorities were zero-sum: $300 for a course of HIV drugs means fewer bed nets to fight malaria. But these trade-offs rarely materialized. The total budget envelope for global public health in the 2000s was not fixed. PEPFAR raised new money. That money was probably not fungible across policy alternatives. Instead, the Bush White House was able to sell a dramatic increase in America’s foreign aid budget by demonstrating that several billion dollars could, realistically, halt an epidemic that was killing more people than any other disease in the world.
...
A broader lesson here, perhaps, is about getting counterfactuals right. In comparative cost-effectiveness analysis, the counterfactual to AIDS treatment is the best possible alternative use of that money to save lives. In practice, the actual alternative might simply be the status quo, no PEPFAR, and a 0.1% reduction in the fiscal year 2004 federal budget. Economists are often pessimistic about the prospects of big additional spending, not out of any deep knowledge of the budgeting process, but because holding that variable fixed makes analyzing the problem more tractable. In reality, there are lots of free variables.

More detail:

Economists’ standard optimization framework is to start with a fixed budget and allocate money across competing alternatives. At a high-level, this is also how the global development community (specifically OECD donors) tends to operate: foreign aid commitments are made as a proportion of national income, entirely divorced from specific policy goals. PEPFAR started with the goal instead: Set it, persuade key players it can be done, and ask for the money to do it.
Bush didn’t think like an economist. He was apparently allergic to measuring foreign aid in terms of dollars spent. Instead, the White House would start with health targets and solve for a budget, not vice versa. ... Economists are trained to look for trade-offs. This is good intellectual discipline. Pursuing “Investment A” means forgoing “Investment B.” But in many real-world cases, it’s not at all obvious that the realistic alternative to big new spending proposals is similar levels of big new spending on some better program. The realistic counterfactual might be nothing at all.
In retrospect, it seems clear that economists were far too quick to accept the total foreign aid budget envelope as a fixed constraint. The size of that budget, as PEPFAR would demonstrate, was very much up for debate.
When Bush pitched $15 billion over five years in his State of the Union, he noted that $10 billion would be funded by money that had not yet been promised. And indeed, 2003 marked a clear breaking point in the history of American foreign aid. In real-dollar terms, aid spending had been essentially flat for half a century at around $20 billion a year. By the end of Bush’s presidency, between PEPFAR and massive contracts for Iraq reconstruction, that number hovered around $35 billion. And it has stayed there since.
Compared to normal development spending, $15 billion may have sounded like a lot, but exactly one sentence after announcing that number in his State of the Union address, Bush pivoted to the case for invading Iraq, a war that would eventually cost America something in the region of $3 trillion — not to mention thousands of American and hundreds of thousands of Iraqi lives. Money was not a real constraint.

Tangentially, I suspect this sort of attitude (Iraq invasion notwithstanding) would naturally arise out of a definite optimism mindset (that essay by Dan Wang is incidentally a great read; his follow-up is more comprehensive and clearly argued, but I prefer the original for inspiration). It seems to me that Justin has this mindset as well, cf. his analogy to climate change in comparing economists' carbon taxes and cap-and-trade schemes vs progressive activists pushing for green tech investment to bend the cost curve. He concludes:

You don’t have to give up on cost-effectiveness or utilitarianism altogether to recognize that these frameworks led economists astray on PEPFAR — and probably some other topics too. Economists got PEPFAR wrong analytically, not emotionally, and continue to make the same analytical mistakes in numerous domains. Contrary to the tenets of the simple, static, comparative cost-effectiveness analysis, cost curves can sometimes be bent, some interventions scale more easily than others, and real-world evidence of feasibility and efficacy can sometimes render budget constraints extremely malleable. Over 20 years later, with $100 billion dollars appropriated under both Democratic and Republican administrations, and millions of lives saved, it’s hard to argue a different foreign aid program would’ve garnered more support, scaled so effectively, and done more good. It’s not that trade-offs don’t exist. We just got the counterfactual wrong.

Aside from his climate change example above, I'd be curious to know what other domains economists are making analytical mistakes in w.r.t. cost-benefit modeling, since I'm probably predisposed to making the same kinds of mistakes.

Mo Putera @ 2025-03-15T13:48 (+2)

From the PEPFAR report, the same thing Justin Sandefur mentioned above:

When PEPFAR was announced, many economists thought it would not be cost-effective to treat AIDS. They were wrong. Their initial concern was understandable: when PEPFAR began in 2004, first-line antiretroviral therapy cost about $1,000 per year, so treating everyone with HIV would cost tens of billions of dollars. Many experts worried that PEPFAR would lead to funding cuts for other highly cost-effective aid efforts. But the pessimistic forecasts didn’t come true. First, the Bush Administration decided to fund PEPFAR on top of existing aid efforts. Second, the massive increase in funding for antiretroviral drugs created demand that helped drive competition and innovation, and supercharged an existing race to develop cheaper, more effective generic drugs. Today, a year of first-line antiretroviral medication costs about $60.

As a result, PEPFAR has saved between 7.5 and 30 million lives, at a cost between $1,500 and $10,000 per life saved, and has also prevented at least 5.5 million babies from being born with HIV. It has also gotten more cost-effective each year as medication costs decline, doing more and more on a budget that has been declining in real dollars since 2009.

Some other quotes that stood out to me:

PEPFAR was conceived of and launched during a time of widespread skepticism about foreign aid. It had become clear that lots of foreign aid was corrupt, wasteful, or unhelpful. But the thesis behind PEPFAR was that health interventions might succeed where larger development interventions often had not. Accountability for health interventions is easier—we can tell whether or not a drug has reached a patient. Results are more measurable. And while development interventions are often premised on ideological claims that may come and go, or on theories of investment that may not hold up, health interventions have only the simple thesis that health is better than sickness and life is better than death.
In the present day, PEPFAR has been successfully handing off responsibilities in partner nations, targeting 70% of funding through local organizations including partner country governments. Some countries, including Botswana and South Africa, have successfully transitioned to funding a majority of their own HIV efforts, with PEPFAR now playing a smaller supporting role.
PEPFAR requires very good accounting controls, with every expenditure documented and demonstrated to be in line with program requirements. For this report, we read three recent Office of the Inspector General audits of PEPFAR program recipients. The three audits we reviewed found undocumented expense rates ranging from 0% to 2% of program expenses, and they demanded repayment of every dollar unaccounted for.
Quoting Goldsmith, Horiuchi, and Wood 2014, “PEPFAR has, indeed, positively affected how publics in recipient countries regard the US … what types of aid, under what conditions, might be effective in influencing foreign public opinion about the donor. Specifically, our theory is that foreign aid that is targeted, sustained, effective, and visible is more likely to affect mass opinion. … as Goldsmith and Horiuchi (2012) show, changes in public opinion within Country B about Country A can influence Country B’s foreign policy behavior toward Country A.” (emphasis ours).

Mo Putera @ 2025-12-24T08:41 (+22)

Reread Patrick McKenzie (patio11)'s inspirational oral history of VaccinateCA and thought to pull out a few quotes for my own edification. (Patrick posted about this on the forum a while back; that's worth reading too.)

The following is what it looks like to bake triage into org decision-making from the top down:

We had an internal culture of counting the passage of time from Day 0, the day (in California) we started working on the project. We made the first calls and published our first vaccine availability on Day 1. I instituted this little meme mostly to keep up the perception of urgency among everyone.
We repeated a mantra: Every day matters. Every dose matters.
Where other orgs would say, ‘Yeah I think we can have a meeting about that this coming Monday,’ I would say, ‘It is Day 4. On what day do you expect this to ship?’ and if told you would have your first meeting on Day 8, would ask, ‘Is there a reason that meeting could not be on Day 4 so that this could ship no later than Day 5?’
I started every meeting and status report to the team by reminding them what Day it was. Our internal stats dashboard had a counter of what Day it was. I had a whiteboard in my apartment showing what Day it was. I wrote that every morning as soon as I woke up, and updated the other two numbers right before I went to sleep. Those were: the number of locations we had published to Californians where they could currently get the vaccine, and the number we knew about elsewhere across the United States with the vaccine.
The latter was zero at this point, of course. I brushed my teeth, wrote my emails, ate my meals, did media interviews, called my family, negotiated with funders, and said my prayers with the zero where I could see it.
A photo of the non-computer version of VaccinateCA’s dashboard, taken on Day 39. It shows VaccinateCA’s then-current understanding of where to find the vaccine: 1,025 sites in California and 0 sites outside of California. Image by Patrick McKenzie.

I think in absolute terms plenty of orgs do this, Patrick just so happens to be a good writer. But in relative terms it's quite rare, and very meaningful to see, especially for folks like me with a bit of mission orientation. Also this:

After the workday was over and pharmacies stopped answering their phones, the workday began again immediately, as much of our engineering team turned off their day job computers and logged on to Discord to digest what we had learned. We worked into the night, and not infrequently through it, to be ready for 9:30am the next morning.
It was often a brutal crunch. I told the team to take care of themselves and keep an eye on one another. The mission wouldn’t be served by anyone ending up in the hospital. But subject to that constraint, we worked like men and women possessed, because people were dying.

On entrepreneurship:

Part of entrepreneurship is having a vision of something that is possible and figuring out what is necessary to bring it into the world. A cynic would say that the world has a secret: Building things is not actually possible, because different organizations have different timelines allowing access to different resources, and it is impossible to correctly sequence things to satisfy all the requirements in order to build anything. An entrepreneur would tell the cynic a secret in return: You can carefully titrate the amount of truth to various parties to dissolve these deadlocks.
Your donor-advised fund won’t let you donate unless we’re a 501(c)(3)? Well, you’d donate if we were a 501(c)(3), right? Great. We’re applying for approval as a 501(c)(3) from the IRS. Can I put you down for $25,000? Dear IRS examiners: I have a written commitment from a charitable allocator for a $25,000 donation contingent on 501(c)(3) status. As you are aware, IRS procedure says that this qualifies for expedited processing. Oh, yes, government actor whose cooperation we need, we’re a nonprofit. Look at this official paperwork from Delaware. It says that the State of Delaware is officially aware that I say we’re a nonprofit. Not good enough? Our 501(c)(3) status? The IRS is busy approving it, on an expedited basis.

This is somewhat reminiscent of what Scott Alexander wrote about a very different person, although what Patrick calls "carefully titrating the amount of truth to various parties" Scott outright labeled "blatant lies"; my takeaway is that it's possible to do a more ethical version of the description below:

I started the book with the question: what exactly do real estate developers do? They don’t design buildings; they hire an architect for that part. They don’t construct the buildings; they hire a construction company for that part. They don’t manage the buildings; they hire a management company for that part. They’re not even the capitalist who funds the whole thing; they get a loan from a bank for that. So what do they do? Why don’t you or I take out a $100 million loan from a bank, hire a company to build a $100 million skyscraper, and then rent it out for somewhat more than $100 million and become rich?
As best I can tell, the developer’s job is coordination. This often means blatant lies. The usual process goes like this: the bank would be happy to lend you the money as long as you have guaranteed renters. The renters would be happy to sign up as long as you show them a design. The architect would be happy to design the building as long as you tell them what the government’s allowing. The government would be happy to give you your permit as long as you have a construction company lined up. And the construction company would be happy to sign on with you as long as you have the money from the bank in your pocket. Or some kind of complicated multi-step catch-22 like that. The solution – or at least Trump’s solution – is to tell everybody that all the other players have agreed and the deal is completely done except for their signature. The trick is to lie to the right people in the right order, so that by the time somebody checks to see whether they’ve been conned, you actually do have the signatures you told them that you had. The whole thing sounds very stressful.

But I digress. Relatedly:

That’s life in a start-up: trying to create enough impact very quickly to convince people to give you more resources, while understanding that the default case is running out of resources, and, by the way, everything is broken all the time.

On do-gooder precocity:

We saw peer projects sprout up in many states, with varying levels of effort and success. Many credited us as an inspiration. One peer project was ILVaccine.org, a project of Eli Coustan. We were working with him for a while before I learned he was in middle school. When I later blanked on his name and asked someone about the public health infrastructure coordinator who was a middle school student I was asked to be more specific.

On ownership and accountability, a case study:

Our scripts instructed callers to take down notes from pharmacists as to how to get an appointment for doses they had, when they had them. A Rite Aid pharmacy in San Bernardino asked our caller to sign up for an appointment at the county health department’s website. Our caller, who had been calling into San Bernardino frequently and had seen that website frequently, remarked that he had seen no Rite Aid listed as a possible vaccination location.
The pharmacist then swore into the telephone, hung up, and immediately called the county health department.
I want you to visualize the operations of county health offices during the middle of the pandemic. A stressed staff are busy coordinating a logistical challenge larger than any they’ve faced in their careers. Their phones are ringing off the hook. The consultancy that won the bid finally delivered the freaking appointment website, thank god. It is crappy and barely works but at least it is finally here. You just have to download all of the email attachments from the pharmacy chain corporate offices, maybe fix a few in Excel because those jokers can’t read clear instructions, then upload them into the administrative side of the portal, and finally people can register for appointments to get the doses sitting in pharmacy freezers.
Rite Aid’s data never made it into the system. Maybe Rite Aid forgot to send it and nobody followed up. Maybe it got eaten by the county’s spam filter. Maybe a public health worker with a million things to do did 999,999.
There were 13 Rite-Aids in San Bernardino county. None of them, despite being in possession of the most desirable object in the world, had received a single appointment. No pharmacist, with years of training in healthcare, noticed this before we told them.
Why would they? Every pharmacy has lots of tiny glass vials and bottles of pills and satchels of powder. Patients were coming in and getting healthcare. It was no one’s job to check that any particular vials got distributed quickly. Pharmacists are not pharmaceutical sales representatives; they do not generate demand. Pharmacists service appointments and prescriptions, deliver healthcare, and go on to the next patient. If you walked up to the counter or called in and asked about Covid appointments, they’d tell you to book one with the county and move on to the next customer. Just another day at the pharmacy.
You might object and say that it must have been someone’s job to actually get those doses injected. Someone who worked . . . at the White House? Okay, no, but at the CDC? Okay, no, but at the California Office of the Governor? Okay, no, but at the county health department? Okay, no, county health departments do not track individual SKU inventory levels at individual pharmacies, that’s actually not a thing. OK, then, Rite Aid – some logistics manager at Rite Aid should have opened a spreadsheet, seen an SKU like #DJFKJDF3285325 with 50 doses available out of 50 shipped at a location in San Bernardino, and immediately said, ‘Oh, #$*#(%. That drug being in supply is equivalent to a life-threatening medical emergency. I will now get out my emergency procedures binder.’ Nope, that is also not a reasonable expectation.
Each of these organizations wants someone else to be responsible for catching errors like this, and they want them to be effective at doing so. They want, and the nation wants, an organization to be accountable for delivering the vaccine.
VaccinateCA considered this bug, and anything else that kept vaccines in freezers while patients were still waiting, to be our problem.
This problem was fixed because a caller from VaccinateCA thought to say, ‘Wait, I notice that I am confused’. It was fixed within about half an hour of being noticed. We estimate more than 500 doses were quickly taken out of freezers, thawed, and injected into waiting arms. Those arms were often attached to people who had been refreshing the county website every few minutes hoping new appointments would finally open up.
This was early and dramatic evidence to me that California was benefiting from having an organization that felt itself accountable for delivering the vaccine.

On how much of Patrick's job in the early days as CEO was bringing in funding:

VaccinateCA had an extremely effective team, as good as any I have ever had the privilege of working with. They were instrumental in almost everything I did and probably could have done almost all of it without me. The main thing I uniquely brought to the table, and spent a lot of my cycles on, was finding the money.
Our earliest funding source was a prepaid debit card I spun up and posted in our public Discord, with $20,000 of my own savings on it. That would cover servers and software and similar for a while. That was not going to be sufficient to get a proper nonprofit with a paid staff off the ground, and I knew we would both need that and need to be read as that to get cooperation from some quarters.
I called in favors and plead our case up and down the tech industry, and scraped together about $1.2 million in funding.
This was below what I initially thought I could reasonably raise, and below what I thought we likely needed. For better or worse, it would have been a lot easier if I had pitched it as: ‘Just make a small angel investment in a promising technology company whose CEO thinks his job is burning through investor dollars as quickly as possible while driving the total addressable market to zero. You won’t make any money on it, but think of the story.’
But while that tech company would probably have been well funded, it would have smelled like a tech company to potential partners. To accelerate shots in arms we urgently needed the cooperation of people who, if confronted with the proposition ‘Big Tech is bringing about the end of constitutional democracy so that it can gather more of your data to sell’, would like that tweet from their iPhone. I have a different point of view, but debating would not have put shots in arms.
We were approximately the most privileged nascent nonprofit imaginable in terms of access to funding, and given that it directly unlocked our ability to help people find the vaccine, I don’t want to complain too much about the process of getting it. I will record, for the benefit of future charitable founders, that probably half of my time from Day 8 to Day 160 was spent chasing funding, dealing with funders and the nonprofit industrial complex, pitching (and pitching and pitching and pitching) large pots of tech money earmarked for pandemic response, filing required reports with funders and the government, and diligently accounting for every penny spent.
It was a bit of a culture shock coming from the technology industry. Tech isn’t exactly profligate, but it certainly empowers twentysomething engineers to spend thousands of dollars by typing a command into their terminals. An engineer who fumble-fingers a command and spends ten times what they expected to is told to type more carefully next time.

On the advantages private individuals and organizations have over official initiatives:

The government of the United States is an intrinsically political entity. We were formally nonpartisan (and even better, as a 501(c)(3) nonprofit, had to be). Informally, to quote a memo I wrote early on, we would do a deal with the devil himself if it got one more patient one more dose. We didn’t need to worry about compromising anyone’s reelection chances by being too maniacally focused on shots in arms to consider the big picture. We had no responsibilities to allies in our party, like not overshadowing their efforts, because we had no party and, for that matter, no shadow.
I knew that many political actors wanted to hoard the facts of their operations so that they could claim credit for them. I just didn’t particularly care. If an actor had a dose available to the public, it was going to be publicized anywhere we could cause it to be.

(It's hard to convey how much I like and appreciate that last paragraph.)

On funder-nonprofit misalignment:

Some potential funders were in after an emailed discussion that could fit in a tweet. Some were in after a single call.
Some potential funders had expectations that were misaligned with us. VaccinateCA was always designed as a rapid-response project that would spin up, cover for an urgent gap in US infrastructure, and then spin down once the work was done. We explained this at length while emphasizing that we wanted to work with funders who could reach a decision quickly. Every day mattered. Every dose matters.
We had some interactions where we were put through weeks or months of grant writing, which sounds like ‘turn in a paper and wait for it to be graded’ and is more ‘schedule sufficient meetings with sufficient backing documentation to buy a company for $20 million’. The funder eventually passed on us, saying they worried we would create no institution with enduring value after the pandemic.
I don’t begrudge anyone’s choices of how to spend their money, particularly charitably earmarked money. I will point out that the gap in expectations between a grant-reviewing team keen on institution building and a nonprofit with an urgent unmet need is a very, very common story in nonprofit fundraising.
Many pots of money have preferences with regard to how they allocate, and those preferences change with the seasons. Have I mentioned that health equity was all the rage in California in 2021? I put on my best face to funders, explained that the system of siloing vaccine information benefited only people who were professionally competent at navigating the American healthcare bureaucracy. I suggested that publishing vaccine locations to a website and Google and every other place we could think of was an improvement over that status quo. I didn’t engage with debates about how, and this was made absolutely explicit in some conversations – perhaps saving lives but failing to save lives in preferred demographic ratios would be considered worse than not engaging in the project at all.

(I'll be upfront that despite having spent a couple of my formative years in California, my bias leans so far in Patrick's direction that it'd probably be useful for me to hear out the strongest counterargument, especially in the context of triage.)

Vasco Grilo🔸 @ 2025-12-24T19:35 (+2)

Thanks for the relevant quotes, Mo!

Mo Putera @ 2025-11-19T08:39 (+22)

It's mind-blowing to me that AMF's immediate funding gap is $462M for 2027-29. That's 56,000-154,000 lives (mostly under-5 children) at $3-8k per life saved, maybe fewer going forward due to evolving resistance to insecticides, but it wouldn't change the bottom line that this seems to be a gargantuan ball dropped. Last time AMF's immediate funding gap was over $300M for 2024-26, so it's grown roughly 50%(!) this time round. Both times the main culprit was the same, the Global Fund's funding replenishment shortfall vs target, which affects programmatic planning in countries. I'd like to think we're collectively doing our part (e.g. last year GiveWell directed $150M to AMF, more than to any other charity, which by their reckoning is expected to save ~27k lives over the next 1-2 years), but it's still nuts to me that such a longstanding high-profile "shovel-ready" giving opportunity as AMF can still have such a big and growing gap!
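For anyone who wants to check the ballpark, here is a minimal sketch of the arithmetic, assuming the lives figure is simply the funding gap divided by the cost per life saved. The function and variable names are illustrative, and the previous gap is entered as exactly $300M even though it was "over $300M", so the growth figure is an upper bound.

```python
def lives_saved_range(funding_usd, cost_per_life_low_usd, cost_per_life_high_usd):
    """Return (fewest, most) lives a sum could save for a cost-per-life range."""
    fewest = funding_usd / cost_per_life_high_usd  # expensive end of the range
    most = funding_usd / cost_per_life_low_usd     # cheap end of the range
    return fewest, most


gap_2027_29_usd = 462e6   # AMF's stated immediate funding gap
gap_2024_26_usd = 300e6   # assumption: "over $300M" entered as exactly $300M

fewest, most = lives_saved_range(gap_2027_29_usd, 3_000, 8_000)
growth = gap_2027_29_usd / gap_2024_26_usd - 1

print(f"Lives at stake: {fewest:,.0f} to {most:,.0f}")
print(f"Growth in gap: at most {growth:.0%}")
```

The low end lands slightly above the figure quoted above, presumably because of rounding or a slightly different cost-per-life input.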

NickLaing @ 2025-11-19T14:53 (+22)

This doesn't surprise me so much - they've got a huge amount of experience and expertise and have fantastic in-country distribution networks organized to distribute a lot of nets. There's a strong element of rinse and repeat when it comes even to nationwide net distribution.

Personally I'm probably not in favor of increasing AMF's funding too much more than it is currently at, because I think that countries need to start integrating net buying into their national health budgets on a much bigger scale - the norm of donations buying nets needs to slowly shift to governments, so I think the role of AMF should probably be slowly reducing rather than increasing. I think we should really have reached "peak donor net funding" by now and there should be even more of a push than there already is for governments to pay for their own nets. Nets need to be a normal part of government health spending, given that it's a regular intervention that needs to happen every few years and one of the most cost-effective interventions governments can do for their people.

The systems of paying big extra allowances to government workers that are an unfortunate part of many vertical-ish programs like these also need to be wound down. Net distribution needs to be normalized.

We've seen what happens when USAID HIV and malaria funding falls apart: stockouts and a scramble to cover the funding gap. There was endless talk of countries increasingly funding their own HIV systems but in most countries little action was taken and the US didn't withdraw much funding to force their hand. Part of this funding gap, as AMF said, is even related to the US not funding the Global Fund. I think there's a risk of this kind of scramble happening with net distribution funding, for example if GiveWell decided that other options might be more cost effective, or more likely if their Open Phil funding dried up quickly for some reason.

Mihkel Viires 🔹 @ 2025-11-23T05:37 (+3)

It does seem necessary to get governments to spend more of their own money on health, indeed. Do you think it would make sense to fund charities to try to convince governments to invest more in health (perhaps by also helping them increase their tax revenues, via increasing tax collection efficiency)?

NickLaing @ 2025-11-23T07:16 (+9)

I think the solution in this case is to make clear plans with government then slowly defund the activity. Poor governments that weren't going to fund something of their own accord anyway usually won't front up until the external funding actually reduces.

"Working with government" has been the vogue thing for charities, and especially national government aid orgs (like USAID) for decades. There have been endless attempts in the vein you suggest both to support governments to spend more on health, and to allocate money better within the health budget - with no clear evidence that it works. Although shifting government spending from low impact to high impact areas seems attractive, I don't see any reason that it would work well in future when so many have failed in the past.

GiveWell recently gave a big grant along the lines you are thinking, which I largely disagree with (although I've softened a little on it):

https://forum.effectivealtruism.org/posts/t8QRuMfetCbeAkyFu/technical-support-units-a-dubious-givewell-grant

On the taxes front it's a big debate. Personally I think there's very little correlation between increased tax take per se and increased spending on health. If you look at African countries that are spending more and doing better in health like Liberia and Rwanda, they are spending higher percentages of their GDP on health and smarter, not taking more tax than other similar countries.

Then there are cases like Botswana where they spent their diamond money well on healthcare, but that's not from taxes.

Obviously when a country develops, then health care gets better but that's another story. The "growth people" will tell us to focus on growth and not sweat things like tax take and health allocation, but the jury is out as to how much charities / external actors can influence that either.

Mo Putera @ 2025-07-11T05:22 (+22)

GiveWell did their first "lookbacks" (reviews of past grants) to see if they've met initial expectations and what they could learn from them:

Lookbacks compare what we thought would happen before making a grant to what we think happened after at least some of the grant’s activities have been completed and we’ve conducted follow-up research. While we can’t know everything about a grant’s true impact, we can learn a lot by talking to grantees and external stakeholders, reviewing program data, and updating our research. We then create a new cost-effectiveness analysis with this updated information and compare it to our original estimates.

(While I'm very glad they did so with their usual high quality and rigor, I'm also confused why they hadn't started doing this earlier, given that "okay, but did we really help as much as we think we would've? Let's check?" feels like such a basic M&E / ops-y question. I'm obviously missing something trivial here, but also I find it hard to buy "limited org capacity"-type explanations for GW in particular given total funding moved, how long they've worked, their leading role in the grantmaking ecosystem etc)

Their lookbacks led to substantial changes vs original estimates, in New Incentives' case driven by large drops in cost per child enrolled ("we think this is due to economies of scale, efficiency efforts by New Incentives, and the devaluation of the Nigerian naira, but we haven’t prioritized a deep assessment of drivers of cost changes") and in HKI's case driven by vitamin A deficiency rates in Nigeria being lower and counterfactual coverage rates higher than originally estimated:

Bar chart showing the change in expected deaths averted. For New Incentives, the estimate increased from 17,000 to 27,000. For Helen Keller Intl, the estimate decreased from 2,000 to 450.

Karthik Tadepalli @ 2025-07-11T17:17 (+9)

I'm obviously missing something trivial here, but also I find it hard to buy "limited org capacity"-type explanations for GW in particular given total funding moved, how long they've worked, their leading role in the grantmaking ecosystem etc)

This should be very easy for you to buy! The opportunity cost of lookbacks is investigating new grants. It's not obvious that lookbacks are the right way to spend limited research capacity. Worth remembering that GW only has around 30 researchers and makes grants in a lot of areas. And while they are a leading EA grantmaker, it's only recently that their giving has scaled up to being a notable player in the total development ecosystem.

Ian Turner @ 2025-07-11T21:14 (+4)

I suspect there is a confusion of terminology here, and also perhaps some loss of institutional knowledge. GiveWell did post-hoc analyses starting in 2011 of their 2009 and 2010 recommendations to donate to VillageReach, but these were not technically "grants", but rather "charity recommendations", so I guess wouldn't be considered a "grant lookback".

In recent years GiveWell shifted from a charity recommendation model to a more direct grantmaking model, so this could be the first reviews of grants under that new model.

NickLaing @ 2025-07-11T15:56 (+4)

Yep I agree!

I've done a quick sanity check on the New Incentives numbers and they don't seem quite plausible, but it was fast and I could be plain wrong.

https://forum.effectivealtruism.org/posts/FxAtFMRnJZ2dbLBhA/sanity-check-givewell-s-new-incentives-estimate-seems

I would also like to see OpenPhil look back at a bunch of their "hits based" grants. They've done a decent amount of them and I think we should be able to get some idea about whether the approach is working as planned. It wouldn't have to be too detailed. They could even do something a bit loose, like categorising them into maybe 4 buckets like .....

1. Miss
2. Probable miss
3. Some benefit
4. Home Run hit successful!

Or similar

Mo Putera @ 2025-11-26T12:42 (+20)

I really like Bob Fischer's point #4 from deep within the comment threads of his recent post and thought to share it more widely, seemed like wise advice to me:

FWIW, my general orientation to most of the debates about these kinds of theoretical issues is that they should nudge your thinking but not drive it. What should drive your thinking is just: "Suffering is bad. Do something about it." So, yes, the numbers count. Yes, update your strategy based on the odds of making a difference. Yes, care about the counterfactual and, all else equal, put your efforts in the places that others ignore. But for most people in most circumstances, they should look at their opportunity set, choose the best thing they think they can sweat and bleed over for years, and then get to work. Don't worry too much about whether you've chosen the optimal cause, whether you're vulnerable to complex cluelessness, or whether one of your several stated reasons for action might lead to paralysis, because the consensus on all these issues will change 300 times over the course of a few years.

Toby Tremlett🔹 @ 2025-11-27T08:39 (+12)

+1! I'd add that we care about being right as a group, not being right as each individual. I don't think the most efficient distribution of resources looks like each individual spending years on their own cause prioritisation, making drastic career switches every year or so etc...

Vasco Grilo🔸 @ 2025-11-27T10:14 (+11)

Thanks for sharing, Mo. I liked that point to understand @Bob Fischer's general orientation better. At the same time, I did not find it that insightful. I think it makes a point while providing very little argument for it, and I do not seem to agree with the sentiment about the impact of moral views on cause prioritisation. It makes sense to have 4 years with an impact of 0 throughout a career of 44 years to increase the impact of the remaining 40 years (= 44 - 4) by more than 10 % (= 4/40). In this case, the impact would not be 0 "in most circumstances" (40/44 = 90.9 % > 50 %). So I very much agree with a literal interpretation of Bob's statement. However, I feel like it conveys that moral views and cause prioritisation are less important than they actually are.
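The break-even reasoning here can be sketched as follows, under the simplifying assumption that impact per year is constant (normalised to 1) and that the prioritisation years contribute nothing directly. The function name and the normalisation are illustrative, not part of the original argument.

```python
def breakeven_uplift(career_years, prioritisation_years):
    """Fractional increase in yearly impact needed over the remaining years
    to make up for spending `prioritisation_years` at zero impact."""
    remaining_years = career_years - prioritisation_years
    # Baseline: career_years * 1. With prioritisation: remaining_years * (1 + x).
    # Setting them equal gives x = prioritisation_years / remaining_years.
    return prioritisation_years / remaining_years


uplift = breakeven_uplift(career_years=44, prioritisation_years=4)
share_of_career_with_impact = (44 - 4) / 44

print(f"Break-even uplift: {uplift:.1%}")
print(f"Share of career with nonzero impact: {share_of_career_with_impact:.1%}")
```

Any uplift above the break-even value makes the 4 years of prioritisation worthwhile under these assumptions.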

NickLaing @ 2025-11-30T09:35 (+4)

Yeah I loved this a lot as well, interestingly I was thinking of quoting it for a quick take too.

Mo Putera @ 2024-03-26T09:07 (+17)

Why did India's happiness ratings consistently drop so much over time even as its GDP per capita rose?

Epistemic status: confused. Haven't looked into this for more than a few minutes

My friend recently alerted me to an observation that puzzled him: this dynamic chart from Our World in Data's happiness and life satisfaction article showing how India's self-reported life satisfaction dropped an astounding 1.20 points (4.97 to 3.78) from 2011 to 2021, even as its GDP per capita rose 51% (I$4,374 to I$6,592 in 2017 prices):

(I included China for comparison to illustrate the sort of trajectory I expected to see for India.)

The sliding year scale on OWID's chart shows how this drop has been consistent and worsening over the years. This picture hasn't changed much recently: the most recent 2024 World Happiness Report reports a 4.05 rating averaged over the 3-year window 2021-23, only slightly above the 2021 rating.

A -1.20 point drop is huge. For context, it's 10x(!) larger than the effect of doubling income at +0.12 LS points (Clarke et al 2018 p199, via HLI's report), and compares to major negative life events like widowhood and extended unemployment:

The effect of life events on life satisfaction
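As a quick check on the figures above, here is a small sketch that recomputes the life satisfaction change, the GDP per capita growth, and the comparison with the income-doubling effect. The inputs are the numbers quoted in this post; the variable names are illustrative.

```python
# Figures quoted above (OWID chart and Clarke et al 2018 via HLI's report)
ls_2011, ls_2021 = 4.97, 3.78        # India's self-reported life satisfaction
gdp_2011, gdp_2021 = 4_374, 6_592    # GDP per capita, I$ in 2017 prices
doubling_income_effect = 0.12        # LS points gained from doubling income

ls_change = ls_2021 - ls_2011
gdp_growth = gdp_2021 / gdp_2011 - 1
drop_vs_doubling = abs(ls_change) / doubling_income_effect

print(f"LS change: {ls_change:+.2f} points")
print(f"GDP per capita growth: {gdp_growth:+.0%}")
print(f"Drop is about {drop_vs_doubling:.0f}x the income-doubling effect")
```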

Given India's ~1.4 billion population, such a large drop is alarming: ~5 billion LS-years lost since 2011, very roughly ballparking. For context, and keeping in mind that LS-years and DALYs aren't the same thing, the entire world's DALY burden is ~2.5 billion DALYs p.a.

But – again caveating with my lack of familiarity with the literature and extremely cursory look into this – I haven't seen any writeup look into this, which makes me wonder if it's not a 'real issue'? For instance, the 2021 WHR just says

Since 2006-08, world well-being has been static, but life expectancy increased by nearly four years up to 2017-19 (we shall come to 2020 later). The rate of progress differed a lot across regions. The biggest improvements in life expectancy were in the former Soviet Union, in Asia, and (the greatest) in Sub-Saharan Africa. And these were the regions that had the biggest increases in WELLBYs. In Asia, the exception is South Asia, where India has experienced a remarkable fall in Well-being which more than outweighs its improved life expectancy.

That's it: no elaboration, no footnotes, nothing.

So what am I missing? What's going on here?

A quick search turned up this WEF article (based on Ipsos data and research, not the WHR's Gallup World Poll, so take it with a grain of salt) pointing to

increased internet access -> pressure to portray airbrushed lives on social media & a feeling that 'their lives have become meaningless'
covid-19 mitigation-induced isolation curtailing activities that improve wellbeing (employment, socializing, going to school, exercising and accessing health services)
urban migration to seek work -> traffic congestion, noise and pollution, demanding bosses -> less sleep and exercise -> higher anxiety and worsening health

But I'm not sure these factors are differential (i.e. that they, for instance, happen much more in India than elsewhere s.t. it explains the wellbeing vs development trajectory difference over 2011-24)?

James Herbert @ 2024-03-26T10:03 (+10)

Interesting! I think figure 2.1 here provides a partial answer. According to the FAQ:

"the sub-bars show the estimated extent to which each of the six factors (levels of GDP, life expectancy, generosity, social support, freedom, and corruption) is estimated to contribute to making life evaluations higher in each country than in Dystopia. Dystopia is a hypothetical country with values equal to the world’s lowest national averages for each of the six factors (see FAQs: What is Dystopia?). The sub-bars have no impact on the total score reported for each country but are just a way of explaining the implications of the model estimated in Table 2.1. People often ask why some countries rank higher than others—the sub-bars (including the residuals, which show what is not explained) attempt to answer that question."

India seems to score very low on social support, compared to similarly ranked countries.

I did some googling and found this, which shows the sub-factors over time for India. Looks like social support declined a lot, but is now increasing again.

I haven't checked whether it declined more than in other countries and, if it has, I'm not sure why it has.

Mo Putera @ 2024-03-26T17:22 (+14)

Thank you for the pointer!

Your second link helped me refine my line of questioning / confusion. You're right that social support declined a lot, but the sum of the six key variables (GDP per capita, etc) still mostly trended upwards over time, huge covid dip aside, which is what I'd expect in the India development success story.

It's the dystopia residual that keeps dropping, from 2.275 - 1.83 = 0.445 in 2015 (i.e. Indians reported 0.445 points higher life satisfaction than you'd predict using the model) to 0.979 - 1.83 = -0.85, an absolute plummeting of life satisfaction across a sizeable fraction of the world population, that's for some reason not explained by the six key variables. Hm...
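The subtraction above can be sketched as a tiny helper, assuming (as in this comment) that the unexplained part is the reported "dystopia + residual" value minus the Dystopia baseline of 1.83. The function name and the "later" label are illustrative, since the year of the second figure isn't stated.

```python
DYSTOPIA_BASELINE = 1.83  # Dystopia's life evaluation, as used in the comment above


def unexplained_residual(dystopia_plus_residual):
    """Life satisfaction not accounted for by the six key variables."""
    return dystopia_plus_residual - DYSTOPIA_BASELINE


residual_2015 = unexplained_residual(2.275)
residual_later = unexplained_residual(0.979)
swing = residual_later - residual_2015

print(f"2015 residual:  {residual_2015:+.3f}")
print(f"Later residual: {residual_later:+.3f}")
print(f"Swing:          {swing:+.3f}")
```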

(please don't feel obliged to respond – I appreciate the link!)

mcnificat @ 2024-03-26T19:56 (+2)

Could this be related to the rising level of inequality in happiness levels in Asia? (See the graph on page 44 of the WHR2024). It can be assumed that the benefits of GDP growth are not evenly distributed, and increasing inequalities trigger frustration and a decrease in well-being in the majority of the population (since to a certain extent, the sense of welfare is relative).

This is how Our World in Data explains a similar phenomenon in the US: "Income inequality in the US is exceptionally high and has been on the rise in the last four decades, with incomes for the median household growing much more slowly than incomes for the top 10%. As a result, trends in aggregate life satisfaction should not be seen as paradoxical: the income and standard of living of the typical US citizen have not grown much in the last couple of decades."

Mo Putera @ 2024-03-27T04:48 (+1)

Yeah rising inequality is a good guess, thank you – the OWID chart also shows the US experiencing the same trajectory direction as India (declining average LS despite rising GDP per capita). I suppose one way to test this hypothesis is to see if China had inequality rise significantly as well in the 2011-23 period, since it had the expected LS-and-GDP-trending-up trajectory. Probably a weak test due to potential confounders...

Mo Putera @ 2025-07-15T04:59 (+15)

This passage from David Roodman's essay Appeal to Me: First Trial of a “Replication Opinion” resonated:

When we draw on research, we vet it in rare depth (as does GiveWell, from which we spun off). I have sometimes spent months replicating and reanalyzing a key study—checking for bugs in the computer code, thinking about how I would run the numbers differently and how I would interpret the results. This interface between research and practice might seem like a picture of harmony, since researchers want their work to guide decision-making for the public good and decision-makers like Open Philanthropy want to receive such guidance.
Yet I have come to see how cultural misunderstandings prevail at this interface. From my side, what the academy does and what I and most of the public think it does are not the same. There are two problems. First, about half the time I reanalyze a study, I find that there are important bugs in the code, or that adding more data makes the mathematical finding go away, or that there’s a compelling alternative explanation for the results. (Caveat: most of my experience is with non-randomized studies.) Second, when I send my critical findings to the journal that peer-reviewed and published the original research, the editors usually don’t seem interested (recent exception). Seeing the ivory tower as a bastion of truth-seeking, I used to be surprised. I understand now that, because of how the academy works, in particular, because of how the individuals within academia respond to incentives beyond their control, we consumers of research are sometimes more truth-seeking than the producers.

I had a similar realisation towards the end of my studies which was a key factor in persuading me to not pursue academia. Also I've mentioned this before, but it surprised me how much more these kinds of details mattered in my experience in industry.

Skipping over to his recap of the specific case he looked into:

To recap:
Two economists performed a quantitative analysis of a clever, novel question.
It underwent peer review.
It was published in one of the top journals in economics. Its data and computer code were posted online, per the journal’s policy.
Another researcher promptly responded that the analysis contains errors (such as computing average daytime temperature with respect to Greenwich time rather than local time), and that it could have been done on a much larger data set (for 1990 to ~2019 instead of 2000–04). These changes make the headline findings go away.
After behind-the-scenes back and forth among the disputants and editors, the journal published the comment and rejoinder.
These new articles confused even an expert.
An outsider (me) delved into the debate and found that it’s actually a pretty easy call.
If you score the journal on whether it successfully illuminated its readership as to the truth, then I think it is kind of 0 for 2. ...
That said, AEJ Applied did support dialogue between economists that eventually brought the truth out. In particular, by requiring public posting of data and code (an area where this journal and its siblings have been pioneers), it facilitated rapid scrutiny.
Still, it bears emphasizing: For quality assurance, the data sharing was much more valuable than the peer review. And, whether for lack of time or reluctance to take sides, the journal’s handling of the dispute obscured the truth.
My purpose in examining this example is not to call down a thunderbolt on anyone, from the Olympian heights of a funding body. It is rather to use a concrete story to illustrate the larger patterns I mentioned earlier. Despite having undergone peer review, many published studies in the social sciences and epidemiology do not withstand close scrutiny. When they are challenged, journal editors have a hard time managing the debate in a way that produces more light than heat.
I have critiqued papers about the impact of foreign aid, microcredit, foreign aid, deworming, malaria eradication, foreign aid, geomagnetic storm risk, incarceration, schooling, more schooling, broadband, foreign aid, malnutrition, …. Many of those critiques I have submitted to journals, usually only to receive polite rejections. I obviously lack objectivity. But it has struck me as strange that, in these instances, we on the outside of academia seem more concerned about getting to the truth than those on the inside.

Mo Putera @ 2025-03-29T10:05 (+15)

From Rachel Glennerster's old J-PAL blog post, a classic worth resharing: "charge for bednets or distribute them for free?"

In 2000 there was an intense argument about whether malarial insecticide-treated bednets (ITNs) should be given out for free. Some argued that charging for bednets would massively reduce take-up by the poor. Others argued that if people don’t pay for something, they don’t value it and are less likely to use it. It was an evidence-free argument at the time.
Then, a series of studies in many countries testing many different preventative health products showed that even a small increase in price led to a sharp decline in product take-up. Pricing did not help target the product to those who needed it most, and people were not more likely to use a product if they paid for it. This cleared the way for a massive increase in free bednet distribution (Dupas 2011 and Kremer and Glennerster 2011).

There was a dramatic increase in malaria bednet coverage between 2000 and 2015 in sub-Saharan Africa. At the same time, there was a massive fall in the number of malarial cases. In Nature, Bhatt and colleagues estimate that the vast majority of the decline in malarial cases is due to the increase in ITNs. They estimate there were 450 million fewer cases of malaria due to ITNs and four million fewer deaths due to ITNs. The lesson here is that testing an important policy-relevant idea can have as much impact on peoples’ lives as testing a specific program.

Mo Putera @ 2025-05-06T03:46 (+13)

I appreciated reading these passages from Hannah Ritchie's Children in rich countries are much less likely to die than a few decades ago, but we rarely hear about this progress over at Our World in Data, for both the data and the shift in perspective:

Countries in the European Union, Japan, South Korea, the United Kingdom — the list goes on — have made childhood much safer in my own 30-year lifetime.¹ It’s just something we rarely hear about. I also don’t think that this is a “solved problem”; it is still too common for parents to see their children die, and there’s a lot more that we can do to save their lives. ...
It’s only when we look at the relative reduction in child mortality that we see that rich countries have also made impressive progress.

I think it’s important to highlight this point for two reasons.
First, the idea that progress on health has stalled (or even regressed) in rich countries is, I think, a common one. I’ve previously held that view myself. But it’s not true: improved treatments and vaccinations developed by scientists, dedicated care from doctors, midwives, and nurses, health policies developed by governments, and parents' choices have made things much safer for children even in the world’s richest countries. These efforts were not for nothing: they’ve given kids a future and spared many families the pain of losing a child.
Second, child mortality in rich countries is not a “solved problem”. 23,000 children still die in the United States every year. That’s around 50 times more than the number who die from natural disasters.³ And more than the total number of homicides.⁴ No one would say that murders in the US are a “solved problem”.

Mo Putera @ 2025-02-24T07:30 (+12)

This is a top-level comment compiling scattered notes on quantifying benefits / costs and related things, for my own benefit. Future notes will be comments under this one.

The purpose of this compilation is to essentially fact-post / perpetual-draft my way towards a more nuts-and-bolts understanding of the "what are the most effective interventions?" question, which is a different starting point than the usual "what are the most cost-effective?" one, a perspective shift primarily spurred by Justin Sandefur's case study on PEPFAR and a desire to better internalise what "big EA" might look like (as opposed to taking a scarcity mindset as given), albeit in a lazy undirected way: these notes are from references I encounter in the course of doing other work.

Past relevant notes:

Mo Putera @ 2025-02-24T07:47 (+2)

In their 2024 results report, the Global Fund partnership claims to have saved 65 million lives from dying of AIDS, TB and malaria alone, in addition to other benefits like morbidity reductions from those diseases, lower infant and maternal mortality and fewer deaths from acute trauma and other conditions.

The 65M lives saved figure caught my eye, since they have "only" disbursed $63 billion so far between 2002 and end of 2023, for an implied cost-effectiveness of ~$1,000 per life saved, about 5x better than the GiveWell bar. On the one hand, I doubt their assessment is as rigorous as GiveWell's; on the other hand, even a crude 90% discount (far more than any of the top charities are subject to, AFAICT on a quick skim) still yields $10k per life saved, surprisingly close to the GW bar despite (1) disbursing billions per year (2) across such a wide range of programs.
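For anyone who wants to redo the back-of-envelope arithmetic, here is a minimal sketch. The inputs are the two figures quoted above; the function name and the `credited_fraction` parameter are my own, and the 90% discount is modelled simply as crediting 10% of the claimed lives.

```python
def cost_per_life(disbursed_usd: float, lives_claimed: float,
                  credited_fraction: float = 1.0) -> float:
    """Dollars disbursed per life saved, crediting only a fraction of claimed lives."""
    return disbursed_usd / (lives_claimed * credited_fraction)


DISBURSED_USD = 63e9   # disbursed 2002 to end of 2023, per the comment above
LIVES_CLAIMED = 65e6   # lives saved claimed in the 2024 results report

headline = cost_per_life(DISBURSED_USD, LIVES_CLAIMED)
# A crude 90% discount: credit only 10% of the claimed lives.
discounted = cost_per_life(DISBURSED_USD, LIVES_CLAIMED, credited_fraction=0.10)

print(f"headline: ${headline:,.0f} per life; after discount: ${discounted:,.0f} per life")
```

The headline figure comes out a little under $1,000 and the discounted one a little under $10,000, matching the rounded numbers in the text.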

Mo Putera @ 2025-01-13T04:03 (+11)

Many heads are more utilitarian than one by Anita Keshmirian et al is an interesting paper I found via Gwern's site. Gwern's summary of the key points:

Collective consensual judgments made via group interactions were more utilitarian than individual judgments.
Group discussion did not change the individual judgments indicating a normative conformity effect.
Individuals consented to a group judgment that they did not necessarily buy into personally.
Collectives were less stressed than individuals after responding to moral dilemmas.
Interactions reduced aversive emotions (e.g. stress) associated with violation of moral norms.

Abstract:

Moral judgments have a very prominent social nature, and in everyday life, they are continually shaped by discussions with others. Psychological investigations of these judgments, however, have rarely addressed the impact of social interactions.
To examine the role of social interaction on moral judgments within small groups, we had groups of 4 to 5 participants judge moral dilemmas first individually and privately, then collectively and interactively, and finally individually a second time. We employed both real-life and sacrificial moral dilemmas in which the character’s action or inaction violated a moral principle to benefit the greatest number of people. Participants decided if these utilitarian decisions were morally acceptable or not.
In Experiment 1, we found that collective judgments in face-to-face interactions were more utilitarian than the statistical aggregate of their members compared to both first and second individual judgments. This observation supported the hypothesis that deliberation and consensus within a group transiently reduce the emotional burden of norm violation.
In Experiment 2, we tested this hypothesis more directly: measuring participants’ state anxiety in addition to their moral judgments before, during, and after online interactions, we found again that collectives were more utilitarian than those of individuals and that state anxiety level was reduced during and after social interaction.
The utilitarian boost in collective moral judgments is probably due to the reduction of stress in the social setting.

I wonder if this means that individual EAs might find EA principles more emotionally challenging than group-level surveys might suggest. It also seems a bit concerning that group judgments may naturally skew utilitarian simply by virtue of being groups, rather than through improved moral reasoning (and I say this as someone for whom utilitarianism is the largest "party" in my moral parliament).

Mo Putera @ 2023-09-26T09:56 (+9)

I'm curious what people who're more familiar with infinite ethics think of Manheim & Sandberg's What is the upper limit of value?, in particular where they discuss infinite ethics (emphasis mine):

Bostrom’s discussion of infinite ethics is premised on the moral relevance of physically inaccessible value. That is, it assumes that aggregative utilitarianism is over the full universe, rather than the accessible universe. This requires certain assumptions about the universe, as well as being premised on a variant of the incomparability argument that we dismissed above, but has an additional response which is possible, presaged earlier. Namely, we can argue that this does not pose a problem for ethical decision-making even using aggregative ethics, because the consequences of any ethical decision can have only a finite (difference in) value. This is because the value of a moral decision relates only to the impact of that decision. Anything outside of the influenced universe is not affected, and the arguments above show that the difference any decision makes is finite.

I first read their paper a few years ago and found their arguments for the finiteness of value persuasive, as well as their collectively-exhaustive responses in section 4 to possible objections. So ever since then I've been admittedly confused by claims that the problems of infinite ethics still warrant concern w.r.t. ethical decision-making (e.g. I don't really buy Joe Carlsmith's arguments for acknowledging that infinities matter in this context, same for Toby Ord's discussion in a recent 80K podcast). What am I missing?

David Mathers🔸 @ 2025-01-04T22:34 (+4)

I haven't read the paper, but a simple objection is that you're never going to be certain your actions only have finite effects, because you should only assign credence 0 to contradictions. (I don't actually know the argument for the latter, but some philosophers believe it.) So you have to deal with the very, very small but not literally 0 chance that your actions will have an infinitely good/bad outcome because your current theories of how the universe works are wrong. However, anything with a chance of bringing about an infinitely good or bad outcome has an infinite expected value or an undefined one. So unless all expected values are undefined (which brings its own problems) you have to deal with infinite expected values, which is enough to cause trouble.
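A toy illustration of the "infinite or undefined" point, using IEEE floating-point infinity as a stand-in for an infinitely good or bad outcome. The probability and the finite payoff are made-up numbers, and `expected_value` is a hypothetical helper, not anything from the paper.

```python
import math


def expected_value(lottery):
    """Sum of probability * payoff over (probability, payoff) pairs."""
    return sum(p * v for p, v in lottery)


TINY = 1e-30  # hypothetical credence that current physics is wrong in the relevant way

# An ordinary finite outcome, plus a tiny chance of an infinitely good one.
ev_infinite = expected_value([(1 - TINY, 10.0), (TINY, math.inf)])

# Tiny chances of both an infinitely good and an infinitely bad outcome.
ev_undefined = expected_value(
    [(1 - 2 * TINY, 10.0), (TINY, math.inf), (TINY, -math.inf)]
)

print(ev_infinite)   # positive infinity: the tiny-probability branch swamps everything
print(ev_undefined)  # nan: infinity minus infinity is undefined
```

However small `TINY` is made, as long as it is not exactly zero the first result stays infinite and the second stays undefined, which is the shape of the objection.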

Mo Putera @ 2025-01-06T11:38 (+5)

Manheim and Sandberg address your objection in the paper persuasively (to me personally), so let me quote them, since directly addressing these arguments might change my mind. @MichaelStJules I'd be keen to get your take on this as well. (I'm not quoting the footnotes, even though they were key to persuading me too.)

Section 4.1, "Rejecting Physics":

4.1.1 Pessimistic Meta-induction and expectations of falsification
The pessimistic meta-induction warns that since many past successful scientific theories were found to be false, we have no reason to expect that our currently successful theories are approximately true. Hence, for example, the above constraints on information processing are not guaranteed to imply finitude. Indeed, many of them are based on information physics that is weakly understood and liable to be updated in new directions. If physics in our universe does, in fact, allow for access to infinite matter, energy, time, or computation through some as-yet-undiscovered loophole, it would undermine the central claim to finitude.
This criticism cannot be refuted, but there are two reasons to be at least somewhat skeptical. First, scientific progress is not typically revisionist, but rather aggregative. Even the scientific revolutions of Newton, then Einstein, did not eliminate gravity, but rather explained it further. While we should regard the scientific input to our argument as tentative, the fallibility argument merely shows that science will likely change. It does not show that it will change in the direction of allowing infinite storage. Second, past results in physics have increasingly found strict bounds on the range of physical phenomena rather than unbounding them. Classical mechanics allow for far more forms of dynamics than relativistic mechanics, and quantum mechanics strongly constrain what can be known and manipulated on small scales.
While all of these arguments in defense of physics are strong evidence that it is correct, it is reasonable to assign a very small but non-zero value to the possibility that the laws of physics allow for infinities. In that case, any claimed infinities based on a claim of incorrect physics can only provide conditional infinities. And those conditional infinities may be irrelevant to our decisionmaking, for various reasons.
4.1.2 Boltzmann Brains, Decisions, and the indefinite long-term
One specific possible consideration for an infinity is that after the heat-death of the universe there will be an indefinitely long period where Boltzmann brains can be created from random fluctuations. Such brains are isomorphic to thinking human brains, and in the infinite long-term, an infinite number of such brains might exist [34]. If such brains are morally relevant, this seems to provide a value infinity.
We argue that even if these brains have moral value, it is by construction impossible to affect their state, or the distribution of their states. This makes their value largely irrelevant to decision-making, with one caveat. That is, if a decision-maker believes that these brains have positive or negative moral value, it could influence decisions about whether decisions that could (or would intentionally) destroy space-time, for instance, by causing a false-vacuum collapse. Such an action would be a positive or negative decision, depending on whether the future value of a non-collapsed universe is otherwise positive or negative. Similar and related implications exist depending on whether a post-collapse universe itself has a positive or negative moral value.
Despite the caveat, however, a corresponding (and less limited) argument can be made about decisionmaking for other proposed infinities that cannot be affected. For example, inaccessible portions of the universe, beyond the reachable light-cone, cannot be causally influenced. As long as we maintain that we care about the causal impacts of decisions, they are irrelevant to decisionmaking.

Section 4.2.4 more directly addresses the objection I think. (Unfortunately the copy-pasting doesn't preserve the mathematical formatting, so perhaps it'd be clearer to just look at page 12 of their paper; in particular I've simplified their notation for $1 in 2020 to just $1):

4.2.4 Bounding Probabilities
As noted above, any act considered by a rational decision maker, whether consequentialist or otherwise, is about preferences over a necessarily finite number of possible decisions. This means that if we restrict a decision-maker or ethical system to finite, non-zero probabilities relating to finite value assigned to each end state, we end up with only finite achievable value. The question is whether probabilities can in fact be bounded in this way.
We imagine Robert, faced with a choice between getting $1 with certainty, and getting $100 billion with some probability. Given that there are two choices, Robert assigns utility in proportion to the value of the outcome weighted by the probability. If the probability is low enough, yet he chooses the option, it implies that the value must be correspondingly high.
As a first argument, imagine Robert rationally believes there is a probability of 10^−100 of receiving the second option, and despite the lower expected dollar value, chooses it. This implies that he values receiving $100 billion at approximately 10^100x the value of receiving $1. While this preference is strange, it is valid, and can be used to illustrate why Bayesians should not consider infinitesimal probabilities valid.
To show this, we ask what would be needed for Robert to be convinced this unlikely event occurred. Clearly, Robert would need evidence, and given the incredibly low prior probability, the evidence would need to be stupendously strong. If someone showed Robert that his bank balance was now $100 billion higher, that would provide some evidence for the claim—but on its own, a bank statement can be fabricated, or in error. This means the provided evidence is not nearly enough to convince him that the event occurred. In fact, with such a low prior probability, it seems plausible that Robert could have everyone he knows agree that it occurred, see newspaper articles about the fact, and so on, and given the low prior odds assigned, still not be convinced. Of course, in the case that the event happened, the likelihood of getting all of that evidence will be much higher, causing him to update towards thinking it occurred.
A repeatable experiment which generates uncorrelated evidence could provide far more evidence over time, but complete lack of correlation seems implausible; checking the bank account balance twice gives almost no more evidence than checking it once. And as discussed in the appendix, even granting the possibility of such evidence generation, the amount possible is still bounded by available time, and therefore finite.
Practically, perhaps the combination of evidence reaches odds of 10^50:1 that the new money exists versus that it does not. Despite this, if he truly assigned the initially implausibly low probability, any feasible update would not be enough to make the event, receiving the larger sum, be a feasible contender for what Robert should conclude. Not only that, but we posit that a rational decision maker should know, beforehand, that he cannot ever conclude that the second case occurs.
If he is, in fact, a rational decision maker, it seems strange to the point of absurdity for him to choose something he can never believe occurred, over the alternative of a certain small gain.
Generally, then, if an outcome is possible, at some point a rational observer must be able to be convinced, by aggregating evidence, that it occurred. Because evidence is a function of physical reality, the possible evidence is bounded, just as value itself is limited by physical constraints. We suggest (generously) that the strength of this evidence is limited to odds of the number of possible quantum states of the visible universe — a huge but finite value — to 1. If the prior probability assigned to an outcome is too low to allow for a decision maker to conclude it has occurred given any possible universe, no matter what improbable observations occur, we claim the assigned probability is not meaningful for decision making. As with the bound on lexicographic preferences, this bound allows for an immensely large assignment of value, even inconceivably so, but it is again still finite.
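The Robert example can be worked through with exact rational arithmetic. This is a sketch of the standard odds form of Bayes' rule (posterior odds = prior odds × likelihood ratio) applied to the two numbers in the quoted passage, the 10^−100 prior and the 10^50:1 evidence; the variable names are mine.

```python
from fractions import Fraction

# Numbers taken from the quoted passage.
prior = Fraction(1, 10**100)   # Robert's prior probability of the payout
likelihood_ratio = 10**50      # combined evidence: odds of 10^50 to 1 in favour

# Odds form of Bayes' rule: posterior odds = prior odds * likelihood ratio.
prior_odds = prior / (1 - prior)
posterior_odds = prior_odds * likelihood_ratio
posterior = posterior_odds / (1 + posterior_odds)

# Evidence strength Robert would need just to reach even (1:1) odds.
ratio_needed_for_even_odds = 1 / prior_odds

print(f"posterior is about {float(posterior):.3e}")
print(f"ratio needed for even odds has {len(str(int(ratio_needed_for_even_odds)))} digits")
```

Even after evidence at 10^50:1, the posterior is still about 10^−50, so Robert remains effectively certain the payout did not happen. Reaching even odds would take evidence of roughly 10^100:1, and the paper's claim is that physically obtainable evidence is bounded, so sufficiently small priors can never be overcome.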

Mo Putera @ 2024-12-30T12:34 (+3)

Just noting for my own future edification this LW exchange between David Manheim (who argues that infinite ethics is irrelevant to actual decisions, per paper above) and Joe Carlsmith (who argues the opposite, per essay above), which only increased my conviction that Manheim and Anders Sandberg were right.

FWIW here's Claude Sonnet 3.5 attempting to first steelman Carlsmith's essay and then being neutral about which stance is more right:

Steelman of Carlsmith's key arguments:
Even if infinite value/disvalue isn't currently accessible, we should have non-zero credence in possibilities that would make it accessible (e.g. discovering our physics is wrong in specific ways). Given expected value theory, a non-zero credence in infinite value should dominate our calculations.
We face decisions with potentially infinite consequences even under current physics through acausal influence - if we accept certain decision theories and cosmological models suggesting infinite copies of us exist, our choices could influence an infinite number of copies.
Even if we can't access infinities in practice, ethics shouldn't fall silent when considering infinite cases. We have strong intuitions about choices like "infinite heaven vs infinite hell" that a complete ethical theory needs to address.
The mathematical/philosophical challenges infinite ethics raises reveal important limitations in our ethical frameworks that we should take seriously, even if just to better understand finite ethics.
Manheim/Sandberg's key counterarguments:
Under our best physics, the accessible universe is fundamentally finite in space, time, and computational capacity.
Any decision theory allowing infinite value requires rejecting either current physics, rational preferences, or established decision theory in problematic ways.
We can represent supposedly "infinite" values (like lexicographic preferences) with very large but finite numbers while preserving all decision-relevant distinctions.
Ethics should focus on actionable choices that make a difference to actual moral patients in our universe.
My analysis: While Carlsmith makes interesting theoretical points, Manheim/Sandberg seem more right for practical purposes. The key considerations:
Their physics arguments about fundamental finitude seem solid and would need extraordinary evidence to overturn.
Even if we assign some credence to infinite possibilities, we can't operationalize that credence in a way that helps real decision-making.
The practical ethical questions we face (e.g. existential risk, animal welfare, global poverty) don't require solving infinite ethics.
Time spent on infinite ethics likely trades off against work on pressing finite problems where we can make concrete progress.
So while infinite ethics raises fascinating theoretical questions, I lean towards Manheim/Sandberg's view that it's not practically relevant for "doing good better." That said, I appreciate Carlsmith's rigorous exploration of the theoretical challenges, which may yield insights for finite ethics.
The prudent path seems to be: Note the theoretical challenges infinite ethics raises, remain humble about our ethical frameworks, but focus our efforts on tractable finite problems where we can clearly help actual moral patients.

MichaelStJules @ 2024-12-30T14:20 (+4)

If you maximize expected value, you should be taking expected values through small probabilities, including that we have the physics wrong or that things could go on forever (or without hard upper bound) temporally. Unless you can be 100% certain that there are no infinities, your expected values will be infinite or undefined. And there are, I think, hypotheses that can't be ruled out and that could involve infinite affectable value.

In response to Carl Shulman on acausal influence, David Manheim said to renormalize. I'm sympathetic and would probably agree with doing something similar, but the devil is in the details. There may be no very uniquely principled way to do this, and some things can still break down, e.g. you get actions that are morally incomparable.

Mo Putera @ 2025-01-04T16:38 (+3)

And there are, I think, hypotheses that can't be ruled out and that could involve infinite affectable value.

This is my crux, I think. I have yet to find a single persuasive example of an ethical decision I might face for which incorporating infinite ethics considerations suggests a different course of action. I don't remember if Carlsmith's essay provided any such examples; if it did I likely did not find them persuasive, since I skimmed it with this focus in mind. I interpreted Manheim & Sandberg's paper to say that I likely wouldn't find any such examples if I kept looking.

MichaelStJules @ 2025-01-04T17:28 (+4)

You could want to do acausal trades and cooperate with agents causally disconnected from you. You'll expect that those who reason (sufficiently) similarly would do the same in return, and that you would cooperate would be evidence for them cooperating and make it more likely.

If you were difference-making risk averse locally, e.g. you don't care about making a huge difference with very very tiny probability, by taking acausal influence into account, you should be (possibly much) less difference-making risk averse, according to Wilkinson.

Mo Putera @ 2025-01-06T11:47 (+3)

I don't see why acausal trade makes infinite ethics decision-relevant for essentially the reasons Manheim & Sandberg discuss in Section 4.5 – acausal trade alone doesn't imply infinite value; footnote 41's "In mainstream cosmological theories, there is a single universe, and the extent can be large but finite even when considering the unreachable portion (e.g. in closed topologies). In that case, these alternative decision theories are useful for interaction with unreachable beings, or as ways to interact with powerful predictors, but still do not lead to infinities"; physical limits on information storage and computation would still apply to any acausal coordination.

I'll look into Wilkinson's paper, thanks.

MichaelStJules @ 2025-01-06T17:34 (+4)

They aren't asserting that the whole universe, including the unreachable portion, is finite in extent with certainty. They're just saying that it's possible, and they also note infinite is possible too in the sentence after which that footnote follows.

Even if you think a universe with infinite spatial extent is very unlikely, you should still be entertaining the possibility. If there's a chance it's infinite and you can have infinite impact (before renormalizing), a risk neutral expected value reasoner should wager on that.

FWIW, I'm sympathetic to their arguments in that section against expected value maximization, or that at least undermine the arguments for it. I'm not totally convinced of expected value maximization myself.

However, that doesn't give a positive case for ignoring these infinities. I find infinite acausal impacts not too unlikely, personally: that acausal influence is possible seems more likely than not, and that the universe is infinite in spatial extent (and in the right way to be influenced infinitely acausally) seems not too unlikely.

But I am optimistic about renormalization.

Mo Putera @ 2023-10-07T06:40 (+3)

Sandberg's recent 80K podcast interview transcript has this quote:

Rob Wiblin: OK, so the argument is something like valuing is a process that requires information to be encoded, and information to be processed — and there are just maximum limits on how much information can be encoded and processed given a particular amount of mass and given a finite amount of mass and energy. So that ultimately is going to set the limit on how much valuing can be done physically in our universe. No matter what things we create, no matter what minds we generate, there’s going to be some finite limit there. That’s basically it?
Anders Sandberg: That’s it. In some sense, this is kind of trivial. I think some readers would no doubt feel almost cheated, because they wanted to know that metaphysical limit for value, and we can’t say anything about that. But it seems very likely that if value has to have to do with some entity that is doing the valuing, then there is always going to be this limit — especially since the universe is inconveniently organised in such a way that we can’t get hold of infinite computational power, as far as we know.

Vasco Grilo🔸 @ 2025-01-04T20:21 (+2)

Hi Mo,

I believe the effects of one's actions decay to practically 0 after at most around 100 years, so I do not think it matters whether the theoretically affectable universe is infinite or not. Even if it was, one could simply use limits to figure out which action is best.

Mo Putera @ 2025-01-06T11:17 (+3)

Thanks Vasco. While I agree with what I interpret to be your actionable takeaway (to ethically act as if our actions' consequences are finitely circumscribed in time and space), I don't see where your confidence comes from that the effects of one's actions decay to practically 0 after at most around 100 years, especially given that longtermists explicitly seek and focus on such actions. I'm guessing you have a writeup on the forum elaborating on your reasoning, in which case would you mind linking to it?

Vasco Grilo🔸 @ 2025-01-06T14:28 (+3)

My post Reducing the nearterm risk of human extinction is not astronomically cost-effective? is somewhat related, but it does not empirically analyse how fast effects decay over time. Uncertainty over time and Bayesian updating is the best analysis on this I am aware of. I have just updated the comment I had left there to explain my claim that effects decay to practically 0 after at most 100 years.

Mo Putera @ 2025-01-07T12:35 (+3)

Much appreciated, thanks again Vasco.

Mo Putera @ 2024-11-30T10:50 (+7)

Pretty funny CGD blog post by Victoria Fan and Rachel Bonnifield: If the Global Health Donors Were Your Parents: A (Whimsical) Comparative Perspective. Quoting at length (with some reformatting):

Navigating the global health funding landscape can be confusing even for global health veterans; there are scores of donors and multilateral funding mechanisms, each with its own particular structure, personality, and philosophy. For the uninitiated, PEPFAR, GAVI, PMI, WHO, the Global Fund, UNITAID, and the Gates Foundation can all appear obscure and intimidating. But if your head is spinning from acronym-induced vertigo, fear not! We are here to help you make sense of it all. How, you ask? With a clear method for donor identification: comparing the donors to your parents. So what would happen if the donors were your parents and you asked them for a new car?
PEPFAR: Ok, we’ll buy you a new car, but we’re going with you to the dealership and it must be American-made. At least one seat must be devoted to abstinence and the delay of sexual debut. Before you drive the car, you must promise not to support prostitution. Each quarter, you must report how many miles you’ve driven with how many passengers, with a target of 1000 passenger-miles per month.
President’s Malaria Initiative: We’ve made it very clear that we only support four proven, cost-effective interventions for child rearing: food, clothing, health care, and education. What, do you think money in the Malaria family just grows on trees? Just because HIV/AIDS has a shiny new car doesn’t mean we can afford it.
UNITAID: We’ve identified pediatric vehicles as a niche market which is currently underserved by the major transport providers. By buying cars for you and all our other children, we are helping to create a pediatric automotive market with new and superior transportation commodities. Prior to our innovative entry into the pediatric vehicle market, most of our potential beneficiaries were getting around using lower-quality forms of transportation, such as bicycles, buses, and walking.
GAVI: We will purchase and deliver a car for you from a particular GAVI-approved dealership. However, you must co-finance the purchase with wages from your part-time job. Gas and insurance will require separate applications.
WHO: Sorry, we haven’t had a car budget in ten years. But we DO have a new set of guidelines on best practices for safe car driving, and a box full of old carfax vehicle reports that you’re welcome to look at any time. Please let us know right away if you experience any engine trouble; regular and reliable reporting allows us to maintain an up-to-date transmission failure surveillance system. And don’t forget to celebrate Vehicle Safety Day on May 11!
Gates Foundation: Of course, darling, we gave your boarding school plenty of money to buy a car. And since we’re on the Board, we’ll make sure they buy the right car. And you can drive it any time you want…as long as one of us is in the passenger seat to make sure you’re going the right way.
Global Fund: We’ve reviewed your proposal for a Range Rover and according to Consumer Reports it is a technically capable car for city driving. Here is a $70,000 check for you to go and buy the Range Rover, as discussed in your proposal.

Mo Putera @ 2023-12-03T11:42 (+7)

The following table is from Scott Alexander's post, which you should check out for the sources and (many, many) caveats.

This table can’t tell you what your ethical duties are. I'm concerned it will make some people feel like whatever they do is just a drop in the bucket - all you have to do is spend 11,000 hours without air conditioning, and you'll have saved the same amount of carbon an F-35 burns on one airstrike! But I think the most important thing it could convince you of is that if you were previously planning on letting yourself be miserable to save carbon, you should buy carbon offsets instead. Instead of boiling yourself alive all summer, spend between $0.04 and $2.50 an hour to offset your air conditioning use.

Mo Putera @ 2024-04-26T11:59 (+6)

This WHO press release was a good reminder of the power of immunization – a new study forthcoming in The Lancet reports that (liberally quoting / paraphrasing the release)

global immunization efforts have saved an estimated 154 million lives over the past 50 years, 146 million of them children under 5 and 101 million of them infants
for each life saved through immunization, an average of 66 years of full health were gained – with a total of 10.2 billion full health years gained over the five decades
measles vaccination accounted for 60% of the lives saved due to immunization, and will likely remain the top contributor in the future
vaccination against 14 diseases has directly contributed to reducing infant deaths by 40% globally, and by more than 50% in the African Region
- the 14 diseases: diphtheria, Haemophilus influenzae type B, hepatitis B, Japanese encephalitis, measles, meningitis A, pertussis, invasive pneumococcal disease, polio, rotavirus, rubella, tetanus, tuberculosis, and yellow fever
fewer than 5% of infants globally had access to routine immunization when the Expanded Programme on Immunization (EPI) was launched 50 years ago in 1974 by the World Health Assembly; today 84% of infants are protected with 3 doses of the vaccine against diphtheria, tetanus and pertussis (DTP) – the global marker for immunization coverage
there's still a lot to be done – for instance, 67 million children missed out on one or more vaccines during the pandemic years
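
The headline figures above hang together, as a quick back-of-envelope check suggests. This is only a sketch using the numbers quoted from the release; the variable names are my own, and the assumption is simply that the release rounded to one decimal place.

```python
# Figures quoted from the WHO release above; variable names are illustrative.
lives_saved = 154e6        # lives saved by immunization over 50 years
children_under_5 = 146e6   # of which children under 5
infants = 101e6            # of which infants
years_per_life = 66        # average years of full health gained per life saved

# Total full-health years gained, to compare with the quoted 10.2 billion.
total_years = lives_saved * years_per_life

# Shares of lives saved that were young children and infants.
share_under_5 = children_under_5 / lives_saved
share_infants = infants / lives_saved

print(f"Full-health years gained: {total_years / 1e9:.1f} billion")
print(f"Share under 5: {share_under_5:.1%}")
print(f"Share infants: {share_infants:.1%}")
```

So 154 million lives at 66 years each does reproduce the quoted 10.2 billion full-health years, and roughly nineteen in twenty of the lives saved were children under 5.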

Mo Putera @ 2024-05-12T05:02 (+2)

Great OWID charts for this:

Mo Putera @ 2024-01-31T04:11 (+6)

One of the more surprising things I learned from Karen Levy's 80K podcast interview on misaligned incentives in global development was how her experience directly contradicted a stereotype I had about for-profits vs nonprofits:

Karen Levy: When I did Y Combinator, I expected it to be a really competitive environment: here you are in the private sector and it’s all about competition. And I was blown away by the level of collaboration that existed in that community — and frankly, in comparison to the nonprofit world, which can be competitive. People compete for funding, and so very often we’re fighting over slices of the same pie. Whereas the Y Combinator model is like, “We’re making the pie bigger. It’s getting bigger for everybody.”

My assumption had been that the opposite was true.

Mo Putera @ 2023-12-11T09:25 (+6)

I like John Salter's post on schlep blindness in EA (inspired by Paul Graham's eponymous essay), whose key takeaway is

Pay close attention to ideas that repel other people for non-impact related reasons, but not you. If you can get obsessed about something important that most people find horribly boring, you're uniquely well placed to make a big impact.

Unfortunately it's bereft of concrete examples. The closest to a shortlist he shares is in this comment:

Horrible career moves e.g. investigating the corrupt practices of powerful EAs / Orgs
Boring to most people e.g. compiling lists and data
Low status outside EA e.g. welfare of animals nobody cares about (e.g. shrimp)
Low status within EA e.g. global mental health
Living in relatively low quality of living areas e.g. fieldwork in many African countries

(I disagree with some of these; e.g. the first bullet seems contradicted by the propensity for forum drama on adjacent topics, and as someone who likes compiling lists and data I don't actually see much low-hanging fruit for me to contribute here due to the work of e.g. Hamish)

I'd be keen to learn other examples. He does give this advice to brainstorm examples:

What work do you wish someone else would do?

although in my case it's not useful because I either just end up doing it (or trying, failing, and learning why), or discover that it's already been done better than I could (e.g. Rethink Priorities' new CCM).

That said, I still think the original takeaway is a useful reminder.

Mo Putera @ 2023-09-27T17:47 (+6)

I just learned about Tom Frieden via Vadim Albinsky's writeup Resolve to Save Lives Trans Fat Program for Founders Pledge. His impact in sheer lives saved is astounding, and I'm embarrassed I didn't know about him before:

The CEO of RTSL, Tom Frieden, likely prevented tens of millions of deaths by creating an international tobacco control initiative in a prior role that may have been much more cost effective than most of our top recommended charities. ...
We believe that by leveraging his influence with governments, and the relatively low cost of advocating for regulations to improve health, Tom Frieden has the potential to again save a vast number of lives at a low cost.

How many more? Albinsky estimates:

RTSL is aiming to save 94 million lives over 25 years by advocating for countries to implement policies to reduce non-communicable diseases. We believe the industrially-produced trans fat elimination program is the most cost-effective of their initiatives. ... Even after very conservative discounts to RTSL’s impact projections we estimate this program to be more cost effective than most of our top global health and development recommendations.

Tangentially, if a "Borlaug" is a billion lives saved, then Frieden's impact is probably on the scale of ~100 milliBorlaugs (to nearest OOM). Bill and Melinda likely have had similar impact. This makes me wonder who else I don't know about who's done ~100 milliBorlaugs of good.

(It's arguably unfair to wholly attribute all those lives saved to Frieden, and I am honestly unsure what credit attribution makes most sense, but applying the same logic to Borlaug you can no longer really say he saved a billion lives.)
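
A rough sketch of the milliBorlaug conversion, for what it's worth. The only input taken from the quotes is RTSL's 94 million target; the helper names are made up for illustration, and "nearest OOM" is assumed to mean rounding in log10 space, which is one convention among several.

```python
import math

BORLAUG = 1e9  # one "Borlaug" = a billion lives saved, as defined above


def to_milliborlaugs(lives_saved: float) -> float:
    """Convert a number of lives saved into milliBorlaugs."""
    return lives_saved / BORLAUG * 1000


def nearest_order_of_magnitude(x: float) -> int:
    """Round a positive number to the nearest power of ten in log10 space (assumed convention)."""
    return 10 ** round(math.log10(x))


# RTSL's stated 25-year target, quoted above; this is a goal, not realized impact.
rtsl_target_mb = to_milliborlaugs(94e6)

print(f"{rtsl_target_mb:.0f} milliBorlaugs")
print(f"Nearest order of magnitude: {nearest_order_of_magnitude(rtsl_target_mb)} milliBorlaugs")
```

Under this convention anything below about 31.6 milliBorlaugs (10^1.5) rounds down to 10, so the "~100" figure depends on the tens of millions in question being toward the upper end.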

Mo Putera @ 2025-04-24T08:19 (+5)

Thought these quotes from Holden's old (2011) GW blog posts were thought-provoking, unsure to what extent I agree. In In defense of the streetlight effect he argued that

If we focus evaluations on what can be evaluated well, is there a risk that we’ll also focus on executing programs that can be evaluated well? Yes and no.
Some programs may be so obviously beneficial that they are good investments even without high-quality evaluations available; in these cases we should execute such programs and not evaluate them.
But when it comes to programs where evaluation seems both necessary and infeasible, I think it’s fair to simply de-emphasize these sorts of programs, even if they might be helpful and even if they address important problems. This reflects my basic attitude toward aid as “supplementing people’s efforts to address their own problems” rather than “taking responsibility for every problem clients face, whether or not such problems are tractable to outside donors.” I think there are some problems that outside donors can be very helpful on and others that they’re not well suited to helping on; thus, “helping with the most important problem” and “helping as much as possible” are not at all the same to me.

(I appreciate the bolded part, especially as something baked into GW's approach and top recs by $ moved.)

That last link is to The most important problem may not be the best charitable cause. Quote that caught my eye:

[Project AK-47's emotionally appealing pitch to donors is] an extreme example of a style of argument common to nonprofits: point to a problem so large and severe (and the world has many such problems) that donors immediately focus on that problem – feeling compelled to give to the organization working on addressing it – without giving equal attention to the proposed solution, how much it costs, and how likely it is to work. ...
Many of the donors we hear from are passionately committed to fighting global warming because it’s the “most pressing problem,” or to a particular disease because it affected them personally – even while freely admitting that they know nothing about the most promising potential solutions. I ask these donors to consider the experience related by William Easterly:
I am among the many who have tried hard to find the answer to the question of what the end of poverty requires of foreign aid. I realized only belatedly that I was asking the question backward … the right way around [is]: What can foreign aid do for poor people? (White Man’s Burden pg 11)
As a single human being, your powers are limited. As a donor, you’re even more limited – you’re not giving your talent or your creativity, just your money. This creates a fundamentally different challenge from identifying the problem you care most about, and can lead to a completely different answer.
In my case: I would rather close the achievement gap than fight developing-world disease, but my giving goes to the latter because it’s a problem that I can do much more to address.
The truth is that you may not be able to do anything to help address the root causes of poverty or cure cancer or solve the global energy crisis.* But you probably can save a life, and insisting on giving to the “biggest problem” could be passing up that chance.

Mo Putera @ 2025-03-17T05:56 (+5)

#9 of Santi Ruiz's 50 thoughts on DOGE over at Statecraft caught my eye:

Information silos are crazier than ever.
For example, I’ve been privy to two parallel, heated debates about foreign aid over the past half-decade. People who work in foreign development (especially effective altruists) have engaged in a battle about the efficacy of various forms of foreign aid: what works best, what works less well, what doesn’t work at all, and how we can know.
Meanwhile, right-wingers have spent much of the last decade (since the summer of 2020 in particular) documenting how deeply embedded left-wing NGOs are in many federal (and local) funding programs, and developing a critique of that tight federal/NGO linkage.
[A reader points out that some of the criticism of American foreign aid on the right is older, and comes from a broader critique about the liberal world order — I think that’s also true.]
But neither debate exhibits much awareness of the other at all, with very negative consequences. The DOGE team has axed the most effective and efficient programs at USAID, and forced out the chief economist, who was brought in to oversee a more aggressive push toward efficiency. It does not appear to be interested in engaging with what we know about more or less effective humanitarian aid.
And people in the NGO class were completely blindsided by the animosity the Trump administration had toward them and the speed at which many of their contracts would be torn up.

This makes me wonder whether there's something like a "targeted de-siloing intervention" to prevent risks like the above. Asking LLMs yielded a variety of ideas, none all that insightful or promising.

My own not-that-useful observation is that this is yet another instance of the point John Nerst made in Partial Derivatives and Partial Narratives: narratives can be both correct and seem totally contradictory, the way partial derivatives of a multivariable function (representing reality, too complicated for human minds to grok) can be both correct and seem nothing alike, so that asking "what's the real story here?" is a bit like asking "what's the real derivative of f(x,y,z) = 4x^2 y – y^z?".
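
To make the analogy concrete, here are the three partial derivatives of that example function, worked out by me rather than taken from Nerst's post (assuming y > 0 so that y^z and ln y are defined):

```latex
f(x, y, z) = 4x^{2}y - y^{z}

\frac{\partial f}{\partial x} = 8xy
\qquad
\frac{\partial f}{\partial y} = 4x^{2} - z\,y^{z-1}
\qquad
\frac{\partial f}{\partial z} = -\,y^{z}\ln y
```

Each is a correct description of how f changes along one direction, and none looks much like the others, which is the sense in which there is no single "real derivative".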

Quoting Nerst's square-and-circle example for more intuition-building:

Imagine that the world was just a set of dots like this picture:

What happens then? Say Alice is told (or, because of psychological predilections, personal experiences or self interest, is more likely to internalize) that the world is a square (left picture), while Bob is told it’s a circle (right picture).
Alice and Bob now have differing ideas about which dots are the important ones, which are expressions of something fundamental (signal) and which are just isolated incidents (noise). They will be interested in and eager to talk about the dots that make up their preferred shape.
When Alice talks about any dot in the square she’s actually talking about the square and other dots are Beside The Point. When Bob talks about a dot that makes up the circle, he’s just ranting about some insignificant dot. If Alice is feeling uncharitable she might think Bob’s just talking about irrelevant dots because he doesn’t want to talk about the square. Bob thinks that Alice taking a great interest in dots in the square and dismissing dots in the circle is hypocritical.
Note that they don’t have to disagree on which dots exist or where they are. Savage political fights can happen without any factual disagreement or fundamental value difference.
There are of course more examples than capitalism. Like nature vs. nurture. “People’s behavior are the result of socialization that works to perpetuate power structures” and “people’s behavior are the result of biological impulses and instincts” are both partial truths. But the full truth is not “in the middle” but on another plane entirely.
“History is determined by the actions of individuals” vs. “history is determined by large scale economic and technological forces.”
“Art subverts the audience’s unexamined preconceptions” vs. “art is the creation of transcendent beauty.”
“Sex is about satisfying basic, impersonal appetites” vs. “sex is an act of intimacy and an expression of love.”
“Ethics is about maximizing happiness” vs. “ethics is about doing your duty”.
“Science works by accumulating knowledge about the world, asymptotically approaching perfect correctness” vs. “science works by replacing one paradigm with another in a series of revolutions.”
“Moral rules and norms are symmetrical, exactly the same for everyone” vs. “Moral rules are there to protect the weak, favoring them over the strong.”

I suspect Nerst is sadly right that "savage political fights can happen without any factual disagreement or fundamental value difference", which makes me bearish on the effectiveness of potential interventions involving better awareness-raising of correct factual information for preventing or mitigating risks like the dismantling of USAID, but I'd love to be proven wrong here.

Mo Putera @ 2023-11-27T12:09 (+5)

Some notes from trying out Rethink Priorities' new cross-cause cost-effectiveness model (CCM) from their post, for personal reference:

Cost-effectiveness in DALYs per $1k (90% CI) / % of simulation results with positive outcomes - negative outcomes - no effects / alternative weightings of cost-eff under different risk aversion profiles and weighting schemes in weighted DALYs per $1k, min to max values

GHD:
- US govt GHD: 1 (range: 0.85 - 1.22) / 100% positive / risk 1 - 1
- Cash: 1.7 (range 1.1 - 2.5) / 100% positive / risk 1 - 2
- GW bar: 21 (range: 11 - 42) / 100% positive / risk 16 - 21 (OP bar has ~similar figures)
- Good intervention (per OP & GW): 39 (range: 15 - 67) / 100% positive / risk 31 - 39
AW - generic interventions:
- Black soldier fly: 5.6 (range: 95% below 11.4) / 16% positive, 84% no effect / risk 0 - 6
- Shrimp: 7.8 (range: 95% below 8.0) / 19% positive, 81% no effect / risk 0 - 8
- Carp: 36 (range: 95% below 145) / 31% positive, 69% no effect / risk 2 - 36
- Chicken: 719 (range: 95% below 2,100) / 81% positive, 19% no effect / risk 221 - 717
x-risk:
- Portfolio of biorisk projects ($15-30M budget, 60% chance no effect, 70% effect is positive): 132 (middle 99.9% of expected utility is 0) / >99.9% no effect / risk 0 - 132
- Nanotech safety megaproject ($10-30M budget, 90% chance no effect, 70% effect is positive): 73 (middle 99.9% of EU is 0) / >99.9% no effect / risk -10 - 73
- AI misalignment megaproject ($8-28B budget, 97.3% chance no effect, 70% effect is positive): 154 (middle 99.9% of EU is 27, 99% is 0) / >99.6% no effect / risk -56 - 154
Some things that jumped out at me (caveating that I don't work in any of these areas):
- I'm a little surprised that only chicken campaigns are modeled as clearly higher EV (OOM-wise) than GHD interventions considered good by GW & OP's lights, while interventions for other nonhuman animals fall short
- I'm also surprised that chickens > all other nonhuman animals on both EV and p(+ve simulation outcome). There's some discussion indicating that cage-free work is now much lower EV than previously, although I'm not sure if it changes the takeaway (and in any case funding prioritization shouldn't be purely EV-based)
- I'm surprised yet again that a >$10B AI misalignment megaproject is modeled as having no effect in >99.6% of simuls. I probably hadn't internalized the 'hits' in 'hits-based giving' as well as I should, since my earlier gut intuition (based on no data whatsoever) was that a near-Manhattan-scale megaproject would surely have some effect in >10% of possible worlds
- I didn't expect the model to say chickens > misaligned AI, unsafe nanotech and biorisk from a risk-neutral EV perspective. That said, the x-risk inputs are in some sense just placeholders, so I don't put much weight in this

In any case, I'd be curious to see how the CCM is taken into consideration by funders and other stakeholders going forward.
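
One way to see how the "hits" drive these numbers: an expected value that sits almost entirely in a sliver of simulations implies a very large average payoff within that sliver. The sketch below uses only the figures noted above; the function name is made up, and treating "no effect" as exactly zero (and lumping positive and negative hits into one net average) is a simplifying assumption of mine, not something taken from the CCM itself.

```python
def implied_conditional_mean(expected_value: float, share_with_effect: float) -> float:
    """Average net outcome among simulations with any effect, assuming all others are exactly zero."""
    return expected_value / share_with_effect


# AI misalignment megaproject: EV of 154 DALYs per $1k, >99.6% of simulations with no effect.
# 0.004 is therefore an upper bound on the share with an effect, so the result is a lower bound.
ai_conditional = implied_conditional_mean(154, 0.004)

# Chicken campaigns for comparison: EV of 719 DALYs per $1k, 81% of simulations positive.
chicken_conditional = implied_conditional_mean(719, 0.81)

print(f"AI megaproject, given any effect: at least {ai_conditional:,.0f} DALYs per $1k")
print(f"Chicken campaigns, given an effect: about {chicken_conditional:,.0f} DALYs per $1k")
```

On these assumptions the AI megaproject's rare hits would have to average tens of thousands of DALYs per $1k, versus under a thousand for a typical successful chicken campaign.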

Mo Putera @ 2025-12-26T10:44 (+4)

From the article The Environmentalists Making Forest Fires Worse I learned about Denise Boggs, who seems to be a case study in what very effective altruistic execution looks like when it isn't grounded in evidence. First, adverse impact:

From 2010 to 2024, California endured more than 8000 wildfires a year that burned on average just under 1.1 million acres of land. In total, over that period, more than 16 million acres burned in California—about 16 percent of the total landmass of the state. These fires burned through endangered species’ habitats, killing millions of wild animals, and threatening endangerment of species like the long-toed salamander and dozens more.
The smoke produced by California’s fires also costs lives. Between 2008 and 2018, PM2.5 pollution from wildfires was responsible for between 52,480 and 55,710 premature deaths in California alone. But the smoke from California’s fires does not stop at the state border. In 2020, about 28,000 premature deaths were attributable to wildfire smoke across the United States, with the majority occurring in western States.

The cause:

To combat wildfires before they happen, the U.S. Forest Service (USFS), county conservation boards, and other stakeholders implement fuels reduction projects that can reduce excess dry wood and shrubs, and clear smaller vegetation that allows fires to grow faster and reach into the canopy of forests. Fuels reduction approaches like mechanical thinning and prescribed burns have proven to be effective mitigation strategies to reduce the damage from wildfires on ecosystems and to help firefighters stop fires.
Yet, a small but loud environmentalist minority opposes fuels reduction, instead claiming that California’s forests must be left untouched. They use outdated environmental laws like the National Environmental Policy Act (NEPA), Endangered Species Act, Federal Land Policy and Management Act, and National Forest Management Act in courts to delay, and sometimes cancel, projects that would mitigate the wildfires that destroy the ecosystems they claim to protect, and threaten tens of thousands of lives.
During the period that about a sixth of California’s forests were going up in flames, one single group was busy suing the USFS 24 times. That group, Conservation Congress, was responsible for just under two fifths of the USFS’s NEPA-related lawsuits that were decided in federal circuit or appeals courts in California from 2010 to 2024, and spent $2 million on those lawsuits and 5 more in other western States.

Conservation Congress isn't just one group, it's one person, Denise Boggs:

What’s most remarkable about Conservation Congress is not their ability to single-handedly hamstring dozens of USFS projects, but that they are, in fact, single-handed: the organization effectively is just one person: Denise Boggs of Great Falls, Montana.
A long-time forest activist and veteran of the California “timber wars,” Boggs has taken the USFS to the mat on countless occasions, often coming up the loser. But she is determined. Boggs believes that the USFS, in bed with logging companies, is using fuels reduction programs and other fire management to create “loopholes big enough to drive logging trucks through.” It is Boggs’ mission to close those loopholes and save the northern spotted owl.

There are other such special interest groups, although they're much larger; the article names the Center for Biological Diversity (100+ staff) and the Sierra Club (700+!). In the case of Boggs, her mind-blowing cost-effectiveness, if not sign/direction, seems to be a testament to sheer tenacity + blinding focus + a theory of change that leverages US environmental law (NEPA litigation in particular) letting small groups delay or stop big federal projects with lots of local support (structural pendulum overswing from Jane Jacobs vs Bob Moses?):

The Center for Biological Diversity and the Sierra Club are national non-profits that advocate for and act on behalf of a specific ideological framework that places the abstract entity of “the environment” over all else.
While the Sierra Club has a much longer history—the organization was founded in 1892 by legendary environmentalist and conservationist John Muir—the rest of these non-profits are relatively new projects. CBD was founded in the 1990s by a group of northern spotted owl biologists who sought to protect the species at all costs. Conservation Congress, Alliance for the Wild Rockies, and Native Ecosystems Council are all post-turn-of-the-21st-century organizations founded by activists who grew up—ideologically speaking—during the environmental protests of the late 20th century.
With the exception of the Sierra Club, which has grown beyond just conservation and preservation, these groups are single-issue groups—protect endangered species, no matter their niche, or lack thereof, and ignore everything else.

(In different circles, there is another name for this kind of thing.)

I really like the outdoors; some of my most cherished youthful memories are of multiday hikes across the very SoCal wilderness Boggs' Conservation Congress aims to preserve. But man, this ain't it. 52,480 to 55,710 premature deaths between 2008-18 due to PM2.5 pollution from wildfires in California alone and 28,000 premature deaths in 2020 alone(!) attributable to wildfire smoke across the United States and 16%(!) of California's entire landmass ravaged by wildfires killing millions of wild animals and causing hundreds of billions of dollars in damage ain't it. Gargantuan impact and cost-effectiveness but negative sign.

Evidence grounding matters a lot to ensure positive impact sign, and sometimes naively counterintuitive interventions like "regular targeted forest-thinning and burning are better than leaving forests untouched" are actually correct, and sometimes the evidence says that interventions need to change with the times (especially policy prescriptions) so you shouldn't get wedded to them let alone build ideologies and tie identities to them. Evidence grounding may not be enough however: your frame/prior for interpreting the evidence matters a lot too. The article argues that Boggs is trapped in a bad prior:

Ultimately, groups like Conservation Congress don’t really claim that fire prevention through mechanical thinning and prescribed burns—or other fuels management practices—are not effective. Boggs simply argues that any and all U.S. Forest Service projects are secretly logging projects. Fire management, to these groups, is a cover-up for a corrupt federal agency in bed with timber companies. To save old-growth forests, they must stop all projects all the time, and forests must go untouched.

Mo Putera @ 2025-07-28T12:44 (+4)

Nice table from the paper Epic narratives of the Green Revolution in Brazil, China, and India by Lídia Cabral, Poonam Pandey, and Xiuli Xu (2022):

Mo Putera @ 2025-03-24T09:15 (+4)

Martin Gould's Five insights from farm animal economics over at Open Phil's FAW newsletter points out that (quote) "blocking local factory farms can mean animals are farmed in worse conditions elsewhere":

Consider the UK: Local groups celebrate blocking new chicken farms. But because UK chicken demand keeps growing — it rose 24% from 2012-2022 — the result of fewer new UK chicken farms is just that the UK imports more chicken: it almost doubled its chicken imports over the same time period. While most chicken imported into the UK comes from the EU, where conditions for chickens are similar, a growing share comes from Brazil and Thailand, where regulations are nonexistent. Blocking local farms may slightly reduce demand via higher prices, but it also risks sentencing animals to worse conditions abroad.
The same problem haunts government welfare reforms — stronger standards in one country can just shift production to places with worse standards.

This reminded me of what Will MacAskill wrote in Doing Good Better on anti-sweatshop protests being potentially misguided because the alternative for sweatshop workers is worse (long quote):

... those who protest sweatshops by refusing to buy goods produced in them are making the mistake of failing to consider what would happen otherwise. In developing countries, sweatshop jobs are the good jobs. The alternatives are typically worse, such as backbreaking, low-paid farm labor, scavenging, or unemployment.
A clear indicator that sweatshops provide comparatively good jobs is the great demand for them among people in developing countries. Almost all workers in sweatshops chose to work there, and some go to great lengths to do so. In the early 2000s, nearly four million people from Laos, Cambodia, & Burma immigrated to Thailand to take sweatshop jobs, and many Bolivians risk deportation by illegally entering Brazil in order to work in the sweatshops there. The average earnings of a sweatshop worker in Brazil are $2,000/year — not very much, but $600/year more than the average earnings in Bolivia, where people generally work in agriculture or mining. Similarly, the average earnings among sweatshop workers are: $2/day in Bangladesh, $5.50/day in Cambodia, $7/day in Haiti, and $8/day in India. These wages are tiny, but when compared to the $1.25 a day many citizens of these countries live on, the demand for these jobs seems more understandable.
It’s difficult for us to imagine that people would risk deportation just to work in sweatshops. But that’s because the extremity of global poverty is almost unimaginable.
Among economists, there’s no question that sweatshops benefit those in poor countries and that they are ‘tremendous good news for the world’s poor.’ One said, ‘My concern is not that there are too many sweatshops but that there are too few.’ Low-wage, labor-intensive manufacturing is a stepping-stone that helps an economy based around cash crops develop into an industrialized, rich country. During the Industrial Revolution, for example, Europe and America spent more than 100 years using sweatshop labor, emerging with much higher living standards as a result. It took many decades to pass through this stage because the tech to industrialize was new, and the 20th century has seen countries pass through this stage of development much more rapidly because the tech is already in place. The four East Asian ‘Tiger economies’ — Hong Kong, Singapore, South Korea, and Taiwan — exemplify speedy development, having evolved from very poor, agrarian societies in the early 20th century to manufacturing-oriented sweatshop countries mid-century, and finally emerging as industrialized economic powerhouses in recent decades. Because sweatshops are good for poor countries, if we boycott them, we make people in poor countries worse off.
We should certainly feel outrage and horror at the conditions sweatshop laborers toil under. The correct response, however, is not to give up sweatshop-produced goods in favor of domestically produced goods. The correct response is to try to end the extreme poverty that makes sweatshops desirable places to work in the first place. What about buying products from companies that employ people in poor countries but claim to have higher labor standards, like People Tree, Indigenous, and Kuyichi? By doing this, we would avoid the use of sweatshops, while at the same time providing even better job opportunities for the extreme poor.

This made me wonder about 2 things:

Zooming out: if you buy that the two examples above form a natural category (of "noble intentions misguided by poor reasoning about counterfactuals / second-order effects", say), what other examples are there of such altruistic mistakes that we might be making?
Zooming in: what kind of intervention is analogous to "buy from People Tree" in the FAW context? Is this a promising avenue at all?

I know very little about FAW, but I'd guess the answer to #2 is "not promising" mainly because it isn't what advocates do. Instead, and again quoting from Gould's writeup, they do this:

... advocates are getting smarter about this. They're pushing for laws that tackle both production and imports at once. US states like California have done this — when it banned battery cages, it also banned selling eggs from hens caged anywhere. The EU is considering the same approach. It's a crucial shift: without these import restrictions, both farm bans and welfare reforms risk exporting animal suffering to places with even worse conditions. And advocates have prioritized corporate policies, which avoid this problem, as companies pledge to stop selling products associated with the worst animal suffering (like caged eggs), regardless of where they are produced.

Joseph @ 2025-03-24T22:04 (+5)

Zooming out, regarding other examples of altruistic mistakes that we might be making, I think there are a lot of scenarios in which banning something or making something less appealing in one location is intended to reduce the bad thing, but actually just ends up shifting the thing elsewhere, where there are even fewer regulations.

One critique of the United States's drug policy is that it doesn't halt the production or trade of dangerous drugs, but simply pushes it elsewhere (the balloon effect).
When a jurisdiction bans chicken farmers from using small cages (such as California's Proposition 2 from 2008), it might just shift production elsewhere.
Regarding immigration from Mexico to the USA, Bill Clinton implemented Operation Gatekeeper to discourage illegal immigration into the USA near Tijuana. But it actually just caused immigrants to shift from crossing the border in one place to crossing in a different place. It also may have increased the number of illegal immigrants in the USA, because previously people came and left cyclically, but with stricter border control people instead came and stayed. While we could certainly argue that this isn't altruistic, the general idea of taking action to reduce/halt a behavior actually resulting in that behavior continuing elsewhere applies.
More mundane: when a parent doesn't want their child to engage in a particular behavior (smoking cigarettes, having sex, drinking alcohol, etc.), the child will often just do it away from the home in a more dangerous context. My vague impression is that teenagers with parents who ban sexual activity tend to have less access to contraception and worse health outcomes (although I haven't read the research on this).
A little bit different, but a classic example of this kind of "poor reasoning about second order effects" is the cobra effect (or any similar incentive for extermination).
Welfare traps

Mo Putera @ 2025-03-25T04:03 (+2)

Thank you, this is exactly the kind of list of examples I was looking for.

Mo Putera @ 2024-12-12T04:55 (+4)

I like Austin Vernon's idea for scaling CO2 direct air capture to 40 billion tons per year, i.e. matching our current annual CO2 emissions, using (extreme versions of) well-understood industrial processes.

The proposed solution may not be the cheapest out there. Other ideas like ocean seeding or olivine weathering might be less expensive. But most of the science is understood, and it can scale quickly. I'd guess 100,000 workers could build enough sites to capture our 40 billion tons goal in a decade. The capital expenditure rate would be between $1 trillion and $5 trillion yearly, or 1% to 5% of global GDP. That cost and deployment speed take doomer scenarios off the table. Say something scary like melting permafrost threatens runaway warming. You can target the area with a few years of sulfur cooling while a tiny portion of the global economy builds carbon capture devices. It is nothing like a wartime mobilization.
The most disruptive aspect would be energy usage. We'd need to ramp output up at double-digit rates because each ton of CO2 requires 2-3 MWh of energy for removal. Thankfully low-grade heat is easy to come by. There is enough energy near coal mines in Wyoming or natural gas fields in SW Pennsylvania at less than $5/MWh. Other places might use solar, hydro, or geothermal steam if they lack fossil fuel reserves. The key is to put the facilities at the energy sources instead of trying to move the energy. Cheap energy makes the operating costs <1% of global GDP. Many clean energy proponents have fretted about how to keep fossil fuel reserves in the ground. Burning them to run carbon capture equipment kills two birds with one stone!
The takeaway is that we could completely turn around the carbon dioxide problem within a few years with a similar spending rate as rich world COVID relief. There won't be a scenario where we've waited too long to act.
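
As a rough sanity check on the quoted energy figures, here is a minimal back-of-envelope sketch. The capture target, MWh per ton, and $/MWh come from the quote; the global GDP figure of about $100 trillion/year is my own assumed round number (the quote equates $1 trillion with 1% of GDP), and the function names are just illustrative.

```python
# Sanity check of the energy figures quoted from Vernon's post.
# ASSUMPTION (not in the post): global GDP of roughly $100 trillion/year,
# chosen because the quote equates $1 trillion with 1% of global GDP.
TONS_CO2_PER_YEAR = 40e9        # capture target quoted above
MWH_PER_TON_RANGE = (2.0, 3.0)  # energy per ton of CO2 removed, quoted above
USD_PER_MWH = 5.0               # quoted upper bound on energy price
GLOBAL_GDP_USD = 100e12         # assumed round number, see note above


def annual_energy_twh(mwh_per_ton: float) -> float:
    """Energy needed per year, in terawatt-hours (1 TWh = 1e6 MWh)."""
    return TONS_CO2_PER_YEAR * mwh_per_ton / 1e6


def energy_cost_share_of_gdp(mwh_per_ton: float) -> float:
    """Yearly energy bill as a fraction of the assumed global GDP."""
    cost_usd = TONS_CO2_PER_YEAR * mwh_per_ton * USD_PER_MWH
    return cost_usd / GLOBAL_GDP_USD


for mwh in MWH_PER_TON_RANGE:
    print(f"{mwh} MWh/ton: {annual_energy_twh(mwh):,.0f} TWh/year, "
          f"{energy_cost_share_of_gdp(mwh):.1%} of GDP")
```

Under these assumptions the energy requirement is 80,000 to 120,000 TWh/year and the energy bill alone is about 0.4% to 0.6% of GDP, which is at least consistent with the quoted "<1% of global GDP" for operating costs.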

I am admittedly perhaps biased to want moonshots like Vernon's idea to work, and for society at large to be able to coordinate and act on the required scale, after seeing these depressing charts from Assessing the costs of historical inaction on climate change:

Mo Putera @ 2024-06-25T10:37 (+4)

Epistemic status: public attempt at self-deconfusion & not just stopping at knee-jerk skepticism

The recently published Cost-effectiveness of interventions for HIV/AIDS, malaria, syphilis, and tuberculosis in 128 countries: a meta-regression analysis (so recent it's listed as being published next month), in my understanding, aims to fill country-specific gaps in CEAs for all interventions in all countries for HIV/AIDS, malaria, syphilis, and tuberculosis, to help national decision-makers allocate resources effectively – to a first approximation I think of it as "like the DCP3 but at country granularity and for Global Fund-focused programs". They do this by predicting ICERs, IQRs, and 95% UIs in US$/DALY using the meta-regression parameters obtained from analysing ICERs published for these interventions (more here).

AFAICT their methodology and execution seem superb, so I was keen to see their results:

Antenatal syphilis screening ranks as the lowest median ICER in 81 (63%) of 128 countries, with median ICERs ranging from $3 (IQR 2–4) per DALY averted in Equatorial Guinea to $3473 (2244–5222) in Ukraine.

At risk of being overly skeptical: $3 per DALY averted is >30x better than Open Phil's 1,000x bar of $100 per DALY, which is roughly GW top charity level, and OP have said GW top charities are hard to beat, especially with a direct intervention like antenatal syphilis screening. It makes me wonder how much credence to put in the study's findings for actual resource allocation decisions (esp. Figure 4 ranking top interventions at country granularity). Also:

Specifically re: antenatal syphilis screening, CE/AIM's report on screening + treating antenatal syphilis estimates $81 per DALY; I'm hard-pressed to believe that removing treatment improves cost-eff >1 OOM
I'm reminded of the time GW found 5 separate spreadsheet errors in a DCP2 estimate of soil-transmitted-helminth (STH) treatment that together misleadingly 'improved' its cost-effectiveness ~100-fold from $326.43 per DALY (correct output) to just $3.41 (wrong, and coincidentally in the ballpark of the estimate above that triggered my skepticism)
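
To make the size of these gaps explicit, here is a small sketch that just recomputes the ratios from the dollar-per-DALY figures cited above. Nothing here comes from the study's own code; the constant and function names are mine.

```python
import math

# All dollar-per-DALY figures below are the ones cited in this post.
STUDY_BEST_ICER = 3.0           # antenatal syphilis screening, Equatorial Guinea
OPEN_PHIL_BAR = 100.0           # the ~$100/DALY bar as stated above
CE_AIM_SCREEN_AND_TREAT = 81.0  # CE/AIM estimate for screening + treatment
DCP2_CORRECTED = 326.43         # STH treatment, corrected output
DCP2_ERRONEOUS = 3.41           # STH treatment, output with spreadsheet errors


def fold(worse: float, better: float) -> float:
    """How many times more cost-effective `better` is than `worse`."""
    return worse / better


def orders_of_magnitude(worse: float, better: float) -> float:
    """The same gap expressed on a log10 scale."""
    return math.log10(fold(worse, better))


print(f"vs Open Phil bar: {fold(OPEN_PHIL_BAR, STUDY_BEST_ICER):.1f}x")
print(f"vs CE/AIM estimate: {fold(CE_AIM_SCREEN_AND_TREAT, STUDY_BEST_ICER):.0f}x "
      f"({orders_of_magnitude(CE_AIM_SCREEN_AND_TREAT, STUDY_BEST_ICER):.2f} OOM)")
print(f"DCP2 error factor: {fold(DCP2_CORRECTED, DCP2_ERRONEOUS):.1f}x")
```

So the $3 figure is about 33x better than the $100 bar, 27x (roughly 1.4 OOM) better than the CE/AIM estimate, and the DCP2 errors amounted to a factor of about 96.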

So how should I think about and use their findings given what seems like reasonable grounds for skepticism, if I'm primarily interested in helping decision-makers help people better? Scattered thoughts to defend the study / push back on my nitpicking above:

even if imperfect – and I'm not confident in my skepticism above – they clearly improve substantially upon the previous state of affairs (CEA gaps everywhere at country-disease-intervention level granularity; expert opinion not lending itself to country-specific predictions; case-by-case methods often being unsuccessful)
their recommendations seem reasonably hedged, not naively maximalist: they include 95% uncertainty intervals; they clearly say "cost-effectiveness... should not be the only criterion... [consider also] enhancing equity and providing financial risk protection"
even a naively maximalist recommendation ("first fund lowest-ICER intervention, then 2nd-lowest, ... until funds run out") doesn't seem unreasonable in this context – essentially countries would end up funding more antenatal syphilis screening, intermittent preventive treatment of malaria in pregnant women and infants, and chemotherapy for drug-susceptible TB (just from eyeballing Figure 4)
I interpret what they're trying to do as not so much "here are the ICER league tables, use them", but shifting decision-makers' approach to resource allocation from needing a single threshold for all healthcare funding decisions to (quoting them) "ICERs ranked in country-specific league tables", and in the long run this perspective shift seems useful to "bake into" decision-making processes, even if the specific figures in this specific study aren't necessarily the most accurate and shouldn't be taken at face value
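
For concreteness, the naively maximalist rule mentioned above ("first fund lowest-ICER intervention, then 2nd-lowest, ... until funds run out") can be sketched as a greedy allocation. This is only an illustration: the interventions "A", "B", "C", their ICERs, and the budget are made-up placeholders, and the per-intervention spending cap is my own assumption (the study does not specify one) so that the budget doesn't all go to the top-ranked item.

```python
from dataclasses import dataclass


@dataclass
class Intervention:
    name: str
    icer_usd_per_daly: float  # lower is better
    max_spend_usd: float      # ASSUMED cap on how much funding it can absorb


def fund_by_league_table(interventions, budget_usd):
    """Greedy rule: fund in ascending ICER order until the budget runs out.

    Returns a list of (name, spend_usd, dalys_averted) tuples.
    """
    allocations = []
    remaining = budget_usd
    for item in sorted(interventions, key=lambda i: i.icer_usd_per_daly):
        if remaining <= 0:
            break
        spend = min(item.max_spend_usd, remaining)
        allocations.append((item.name, spend, spend / item.icer_usd_per_daly))
        remaining -= spend
    return allocations


# Hypothetical league table, not taken from the study.
league_table = [
    Intervention("B", icer_usd_per_daly=50.0, max_spend_usd=2000.0),
    Intervention("C", icer_usd_per_daly=200.0, max_spend_usd=5000.0),
    Intervention("A", icer_usd_per_daly=10.0, max_spend_usd=1000.0),
]
for name, spend, dalys in fund_by_league_table(league_table, budget_usd=2500.0):
    print(f"{name}: ${spend:,.0f} -> {dalys:,.0f} DALYs averted")
```

The sketch also shows why the rule is only as good as the ICERs fed into it: a single erroneously low estimate jumps to the front of the queue and absorbs funds first.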

That said, I do wonder if the authors could have done a bit better, like

cautioning against naively taking the best cost-eff estimates at face value, instead of suggesting "Funds could be first spent on the intervention that has the lowest ICER. Following that, other interventions could be funded in order of their ICER rankings, as long as there are available funds"
spot-checking some of (not all) the top cost-eff ICERs that went into their meta-regression analysis to get a sense of their credibility, especially those which feed into their main recommendations, like GW did above with the DCP2 estimate for STH treatment
extracting qualitative proxies for decision-maker guidance from an analysis of the main drivers behind the substantial ranking differences in intervention ICERs across economic and epidemiological contexts (eg "we should expect antenatal syphilis screening to be substantially less cost-effective in our context due to factors XYZ, let's look at other interventions instead" – what would a short useful list of XYZ look like?), instead of just saying "we found the rankings differ substantially"

Jason @ 2024-06-25T14:31 (+2)

The positive spin is that someone got funded to do this kind of big-picture analysis and got it published in The Lancet.

There were 1,792 potential country-intervention pairs (although it is not immediately clear if they did all 1,792 pairs). So I don't think most reasonable readers would view these findings as substitutes for a more in-depth, country-specific analysis on the potentially promising intervention. They did publish at least some data for each intervention, although maybe it isn't enough to poke at each of the country-intervention pairs.

Mo Putera @ 2024-03-14T03:55 (+4)

[Question] How should we think about the decision relevance of models estimating p(doom)?

(Epistemic status: confused & dissatisfied by what I've seen published, but haven't spent more than a few hours looking. Question motivated by Open Philanthropy's AI Worldviews Contest; this comment thread asking how OP updated reminded me of my dissatisfaction. I've asked this before on LW but got no response; curious to retry, hence repost)

To illustrate what I mean, switching from p(doom) to timelines:

The recent post AGI Timelines in Governance: Different Strategies for Different Timeframes was useful to me in pushing back against Miles Brundage's argument that "timeline discourse might be overrated", by showing how choice of actions (in particular in the AI governance context) really does depend on whether we think that AGI will be developed in ~5-10 years or after that.
A separate takeaway of mine is that decision-relevant estimation "granularity" need not be that fine-grained, and in fact is not relevant beyond simply "before or after ~2030" (again in the AI governance context).
Finally, that post was useful to me in simply concretely specifying which actions are influenced by timelines estimates.

Question: Is there something like this for p(doom) estimates? More specifically, following the above points as pushback against the strawman(?) that "p(doom) discourse, including rigorous modeling of it, is overrated":

What concrete high-level actions do most alignment researchers agree are influenced by p(doom) estimates, and would benefit from more rigorous modeling (vs just best guesses, even by top researchers e.g. Paul Christiano's views)?
What's the right level of granularity for estimating p(doom) from a decision-relevant perspective? Is it just a single bit ("below or above some threshold X%") like estimating timelines for AI governance strategy, or OOM (e.g. 0.1% vs 1% vs 10% vs >50%), or something else?
- I suppose the easy answer is "the granularity depends on who's deciding, what decisions need making, in what contexts", but I'm in the dark as to concrete examples of those parameters (granularity i.e. thresholds, contexts, key actors, decisions)
- e.g. reading Joe Carlsmith's personal update from ~5% to >10% I'm unsure if this changes his recommendations at all, or even his conclusion – he writes that "my main point here, though, isn't the specific numbers... [but rather that] there is a disturbingly substantive risk that we (or our children) live to see humanity as a whole permanently and involuntarily disempowered by AI systems we’ve lost control over", which would've been true for both 5% and 10%

Or is this whole line of questioning simply misguided or irrelevant?

Some writings I've seen gesturing in this direction:

harsimony's argument that Precise P(doom) isn't very important for prioritization or strategy ("identifying exactly where P(doom) lies in the 1%-99% range doesn't change priorities much") amounts to the 'single bit granularity' answer
- Carl Shulman disagrees, but his comment (while answering my 1st bullet point) isn't clear in the way the different AI gov strategies for different timelines post is, so I'm still left in the dark – to (simplistically) illustrate with a randomly-chosen example from his reply and making up numbers, I'm looking for statements like "p(doom) < 2% implies we should race for AGI with less concern about catastrophic unintended AI action, p(doom) > 10% implies we definitely shouldn't, and p(doom) between 2-10% implies reserving this option for last-ditch attempts", which he doesn't provide
Froolow's attempted dissolution of AI risk (which takes Joe Carlsmith's model and adds parameter uncertainty – inspired by Sandberg et al's Dissolving the Fermi paradox – to argue that low-risk worlds are more likely than non-systematised intuition alone would suggest)
- Froolow's modeling is useful to me for making concrete recommendations for funders, e.g. (1) "prepare at least 2 strategies for the possibility that we live in one of a high-risk or low-risk world instead of preparing for a middling-ish risk", (2) "devote significantly more resources to identifying whether we live in a high-risk or low-risk world", (3) "reallocate resources away from macro-level questions like 'What is the overall risk of AI catastrophe?' towards AI risk microdynamics like 'What is the probability that humanity could stop an AI with access to nontrivial resources from taking over the world?'", (4) "When funding outreach / explanations of AI Risk, it seems likely it would be more convincing to focus on why this step would be hard than to focus on e.g. the probability that AI will be invented this century (which mostly Non-Experts don’t disagree with)". I haven't really seen any other p(doom) model do this, which I find confusing
I'm encouraged by the long-term vision of the MTAIR project "to convert our hypothesis map into a quantitative model that can be used to calculate decision-relevant probability estimates", so I suppose another easy answer to my question is just "wait for MTAIR", but I'm wondering if there's a more useful answer to the "current SOTA" than this. To illustrate, here's (a notional version of) how MTAIR can help with decision analysis, cribbed from that introduction post:

This question was mainly motivated by my attempt to figure out what to make of people's widely-varying p(doom) estimates, e.g. in the appendix section of Apart Research's website, beyond simply "there is no consensus on p(doom)". I suppose one can argue that rigorous p(doom) modeling helps reduce disagreement on intuition-driven estimates by clarifying cruxes or deconfusing concepts, thereby improving confidence and coordination on what to do, but in practice I'm unsure if this is the case (reading e.g. the public discussion around the p(doom) modeling by Carlsmith, Froolow, etc), so I'm not sure I buy this argument, hence my asking for concrete examples.

Mo Putera @ 2024-08-28T08:55 (+3)

(Attention conservation notice: rambling in public)

A striking throwaway remark, given its context:

There is remarkably little evidence that evidence-based medicine leads to better health outcomes for patients, though this is absence of (good) evidence rather than (good) evidence of absence of effect.

It's striking given that this comes from this book on Thailand’s Health Intervention and Technology Assessment Program (HITAP) (ch 1 pg 22), albeit perhaps understandable given the authors' stance that evidence is necessary but not sufficient to determine the best course of action (to treat a patient, to design a social insurance scheme, etc), which seems completely unobjectionable.

That said, I did wonder about the first half of the quoted throwaway remark, so I asked Elicit; its top-4 paper summary is

Evidence-based medicine (EBM) has been shown to improve patient outcomes and healthcare efficiency. A study in a Spanish hospital found that an EBP unit had lower mortality rates (6.27% vs 7.75%) and shorter lengths of stay (6.01 vs 8.46 days) compared to standard practice (Emparanza et al., 2015). EBM can reduce clinical uncertainty, leading to better patient outcomes, improved population health, and reduced costs (Molony & Samuels, 2012). The implementation of EBM is expected to enhance the quality of care as part of healthcare reform initiatives (Hughes, 2011). Additionally, EBM has paralleled the growth of patient empowerment, supporting informed decision-making by integrating the best available research with individual patient values and concerns (Hendler, 2004). While challenges remain in translating EBM principles for public consumption, its adoption has the potential to significantly improve healthcare delivery and patient outcomes.

although the summary didn't include these papers it listed in the top 10

Bahtsevani et al 2004's systematic review (weak evidence of limited findings)
Every-Palmer & Howick 2014's paper with these dramatic sentences in their abstract:
- "In this paper we suggest that EBM's potential for improving patients' health care has been thwarted by bias in the choice of hypotheses tested, manipulation of study design and selective publication."
- "Evidence for these flaws is clearest in industry-funded studies. We argue EBM's indiscriminate acceptance of industry-generated 'evidence' is akin to letting politicians count their own votes. Given that most intervention studies are industry funded, this is a serious problem for the overall evidence base. Clinical decisions based on such evidence are likely to be misinformed, with patients given less effective, harmful or more expensive treatments."
- "More investment in independent research is urgently required. Independent bodies, informed democratically, need to set research priorities. We also propose that evidence rating schemes are formally modified so research with conflict of interest bias is explicitly downgraded in value."
Shaw et al 2007's dramatically-titled Why Evidence Based Medicine May Be Bad for You and Your Patients ("This review argues that the basis of EBM is so deeply flawed that in many cases it cannot usefully inform clinical practice, reflected in fact by the current majority outcome of most trials as “no-blood,” or no result")

With the proviso that I'm a layperson w.r.t. medicine and healthcare, and that I didn't ask Elicit further questions or really dig further into this at all — I find myself mostly unmoved by these papers & reviews, while the younger me of (say) a decade ago would've epistemically panicked. Partly it's that they aren't really contra "using evidence to inform medicine" per se: to oversimplify a bit, Bahtsevani et al recommend more evidence generation, Every-Palmer & Howick recommend less industry-biased evidence generation, and Shaw et al argue that other less legible-than-RCT types of evidence should occupy more mindshare than they did back in '07 (there's a loose parallel here to the more recent growth vs randomista debate in dev econ). Partly it's that I suspect there's some talking past each other, which only becomes clear when one digs into the nuts-and-bolts. Partly it's that I think the general underlying ethos of "using evidence to inform medicine" is a lot more robust than any particular instantiation of it (e.g. using only empirical data from systematic reviews of RCTs), sort of like how cluster thinking > sequence thinking for decision-making, or like how foxes have strong views weakly held (side note: in that essay's framing I used to be a hedgehog, hopefully I'm now more fox than degenerate cactus). Partly it's that I've "seen this before" with other topics, cf. Scott Alexander's many deep dives. Maybe I'm just getting old...

NickLaing @ 2024-08-28T13:42 (+2)

I haven't looked in detail, but my quick comment would be that these studies seem to basically be comparing extremely careful following of evidence based medicine, vs. "normal medical practise" which is like 90%+ based on evidence anyway. Standard medical training and registered medical practise in most of the world closely follows the evidence - it would be very difficult (maybe impossible) to practise "outside" of the evidence. So not finding a huge difference between these 2 ways of practising isn't so surprising.

Mo Putera @ 2024-03-31T08:35 (+3)

The 1,000-ton rule is Richard Parncutt's suggestion for reframing the political message of the severity of global warming in particularly vivid human rights terms; it says that someone in the next century or two is prematurely killed every time humanity burns 1,000 tons of carbon.

I came across this paper while (in the spirit of Nuno's suggestion) trying to figure out the 'moral cost of climate change' so to speak, driven by my annoyance that e.g. climate charity BOTECs reported $ per ton of CO2-eq averted in contrast to (say) the $ per death averted bottomline of GHW charities, since I don't intrinsically care to avert CO2-equivalent emissions the way I do about averting deaths. (To be clear, I understand why the BOTECs do so and would do the same for work; this is for my own moral clarity.)

Parncutt's derivation is simple: burning a trillion tons of carbon will cause ~2 °C of anthropogenic global warming, which will in turn cause 1 - 10 million premature deaths a year "for a period of several centuries", something like this:

Modelling the rise in global mean surface temperature (GMST) as a function of carbon burned is already very hard; Parncutt doesn't try to model premature deaths as a function of GMST but just makes a semi-quantitative order-of-magnitude estimation anchored extensively at the lower and upper ends to various catastrophic outcomes discussed in the literature on climate change, and assumes a lognormal distribution around a billion future deaths with a 10x range for worst-vs-best case scenario, which over time looks 'very approximately' like this:

The lower line represents deaths due to poverty without AGW. As the negative effect of AGW overtakes the positive effect of development, the death rate will increase, as shown by the upper line. In a more accurate model, the upper line might be concave upward on the left (exponential increase) and concave downward on the right (approaching a peak).

Based on the 1,000-ton rule, Pearce & Parncutt suggest the 'millilife' as "an accessible unit of measure for carbon footprints that is easy to understand and may be used to set energy policy to help accelerate carbon emissions reductions". A millilife is a measure of intrinsic value defined to be 1/1000th of a human life; the 1,000-ton rule says that burning a ton of fossil carbon destroys a millilife. This lets Pearce & Parncutt make statements like these, at an individual level (all emphasis mine):

For example in Canada, which has some of the highest yearly carbon emissions per capita in the world at around 19 tons of CO2 or 5 tons of carbon per person, roughly 5 millilives are sacrificed by an average person each year. As the average Canadian lives to be about 80, he/she sacrifices about 400 millilives (0.4 human lives) in the course of his/her lifetime, in exchange for a carbon-intensive lifestyle

and

... an average future AGW-victim in a developing country will lose half of a lifetime or 30–40 life-years, as most victims will be either very young or very old. If the average climate victim loses 35 life-years (or 13,000 life-days), a millilife corresponds to 13 days.
Stated in another way: if a person is responsible for burning a ton of fossil carbon by flying to another continent and back, they effectively steal 13 days from the life of a future poor person living in the developing world. If the traveler takes 1000 such trips, they are responsible for the death of a future person.

and for "large-scale energy decisions":

... the Adani Carmichael coalmine in Queensland, Australia, is currently under construction and producing coal since 2021. Despite massive protests over several years, it will be the biggest coalmine ever. Its reserves are up to 4 billion tons of coal, or 3 billion tons of carbon. If all of that was burned, the 1000-tonne rule says it would cause the premature deaths of 3 million future people. Given that the 1000-tonne rule is only an order-of-magnitude estimate, the number of caused deaths will lie between one million and 10 million. ... Many of those who will die are already living as children in the Global South; burning Carmichael coal will cause their future deaths with a high probability. Should energy policy allow that to occur?
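
The arithmetic behind these quoted statements is simple enough to reproduce. Below is a minimal sketch, assuming only the 1,000-ton rule as stated (1,000 tons of carbon per premature death, 1,000 millilives per life) plus the standard 12/44 mass fraction of carbon in CO2; the function names are mine, not from the paper.

```python
TONS_CARBON_PER_DEATH = 1000.0  # the 1,000-ton rule
MILLILIVES_PER_LIFE = 1000.0    # definition of a millilife
CARBON_FRACTION_OF_CO2 = 12.0 / 44.0  # atomic mass of C over molar mass of CO2
DAYS_PER_YEAR = 365.25


def deaths(tons_carbon: float) -> float:
    """Premature deaths implied by the rule for a given mass of carbon burned."""
    return tons_carbon / TONS_CARBON_PER_DEATH


def millilives(tons_carbon: float) -> float:
    """The same quantity expressed in millilives (1 ton of carbon = 1 millilife)."""
    return deaths(tons_carbon) * MILLILIVES_PER_LIFE


def days_per_millilife(life_years_lost_per_victim: float) -> float:
    """Life-days represented by one millilife."""
    return life_years_lost_per_victim * DAYS_PER_YEAR / MILLILIVES_PER_LIFE


canada_carbon_per_year = 19.0 * CARBON_FRACTION_OF_CO2  # 19 t CO2 per person
print(f"Canada: {millilives(canada_carbon_per_year):.1f} millilives/year, "
      f"{millilives(canada_carbon_per_year) * 80:.0f} over 80 years")
print(f"One millilife = {days_per_millilife(35):.1f} days")
print(f"Carmichael: {deaths(3e9):,.0f} deaths")
```

This gives about 5.2 millilives per year for the average Canadian (roughly 415 over 80 years, which the paper rounds to 400), about 12.8 days per millilife (rounded to 13), and 3 million deaths for Carmichael's 3 billion tons of carbon.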

Pearce & Parncutt then use the 1,000-ton rule and millilife to make various suggestions. Here's one:

Under what circumstances might a government ban or outlaw an entire corporation or industry, considered a legal entity or person—for example, the entire global coal industry? ...
Ideally, a company should not cause any human deaths at all. If it does, those deaths should be justifiable in terms of improvements to the quality of life of others. For example, a company that builds a bridge might reasonably risk a future collapse that would kill 100 people with a probability of 1%. In that case, the company accepts that on average one future person will be killed as a result of the construction of the bridge. It may be reasonable to claim that the improved quality of life for thousands or millions of people who cross the bridge justifies the human cost.
Fossil fuel industries are causing far more future deaths than that, raising the question of the point at which the law should intervene. As a first step to solving this problem, it has been proposed a rather high threshold (generous toward the corporations) is appropriate. A company does not have the right to exist if its net impact on human life (e.g., a company/industry might make products that save lives like medicine but do kill a small fraction of users) is such that it kills more people than it employs. This requirement for a company’s existence is thus:
Number of future premature deaths/year < Number of full-time employees (1)
This criterion can be applied to an entire industry. If the industry kills more people than it employs, then primary rights (life) are being sacrificed for secondary rights (jobs or profits) and the net benefit to humankind is negative. If an industry is not able to satisfy Equation (1), it should be closed down by the government.
... the coal industry kills people by polluting the air that they breathe. ... In the U.S., about 52,000 human lives are sacrificed per year to provide coal-fired electricity. ... In the U.S., coal employed 51,795 people in 2016. Since the number of people killed is greater than the number employed, the U.S. coal industry does not satisfy Equation (1) and should be closed down. This conservative conclusion does not include future deaths caused by climate change due to burning coal.
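
Equation (1) is just a comparison, so it is easy to check against the quoted U.S. coal figures. A minimal sketch, using only the two numbers from the quote (the function name is mine):

```python
def satisfies_equation_1(premature_deaths_per_year: float,
                         full_time_employees: float) -> bool:
    """Pearce & Parncutt's criterion: deaths/year must be below employee count."""
    return premature_deaths_per_year < full_time_employees


US_COAL_DEATHS_PER_YEAR = 52_000  # "about 52,000", air pollution only
US_COAL_EMPLOYEES_2016 = 51_795

print(satisfies_equation_1(US_COAL_DEATHS_PER_YEAR, US_COAL_EMPLOYEES_2016))
print(f"deaths / employees = {US_COAL_DEATHS_PER_YEAR / US_COAL_EMPLOYEES_2016:.3f}")
```

On the quoted numbers the criterion fails, but only barely (a ratio of about 1.004), so with air-pollution deaths alone the verdict depends on how precise the "about 52,000" estimate is.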

One more energy policy suggestion (there's many more in the paper):

Applying asset forfeiture laws (also referred to as asset seizure) to manslaughter caused by AGW. These laws enable the confiscation of assets by the U.S. government as a type of criminal-justice financial obligation that applies to the proceeds of crime. Essentially, if criminals profit from the results of unlawful activity, the profits (assets) are confiscated by the authorities.
This is not only a law in the U.S. but is in place throughout the world. For example, in Canada, Part XII.2 of the Criminal Code, provides a national forfeiture régime for property arising from the commission of indictable offenses. Similarly, ‘Son of Sam laws’ could also apply to carbon emissions. In the U.S., Son of Sam laws refer to laws designed to keep criminals from profiting from the notoriety of their crimes and often authorize the state to seize funds earned by the criminals to be used to compensate the criminal’s victims.
If that logic of asset forfeiture is applied to fossil fuel company investors who profit from carbon-emission-related manslaughter, taxes could be set on fossil fuel profits, dividends, and capital gains at 100% and the resultant tax revenue could be used for energy efficiency and renewable energy projects or to help shield the poor from the most severe impacts of AGW. ...
Such AGW-focused asset forfeiture laws would also apply to fossil fuel company executive compensation packages. Energy policy research has shown that it is possible to align energy executive compensation with careful calibration of incentive equations such that the harmful effects of emissions can be prevented through incentive pay. Executives who were compensated without these safeguards in place would have their incomes seized the same as other criminals benefiting materially from manslaughter.

I have no (defensible) opinion on these suggestions; curious to know what anyone thinks.

Mo Putera @ 2023-12-22T08:41 (+3)

Notes from Ozy Brennan's On capabilitarianism

Martha Nussbaum's first-draft list of central capabilities (for humans)
- Life
- Bodily health
- Bodily integrity
- Senses, Imagination, and Thought
- Emotions
- Practical reason
- Affiliation
- Other species
- Play
- Control over political & material environment
The Five Freedoms (for animals)
- Freedom from hunger and thirst
- Freedom from discomfort
- Freedom from pain, injury, and disease
- Freedom to express normal behavior
- Freedom from fear and distress

Mo Putera @ 2024-01-30T11:40 (+2)

I thought I had mostly internalized the heavy-tailed worldview from a life-guiding perspective, but reading Ben Kuhn's searching for outliers made me realize I hadn't. So here are some summarized reminders for posterity:

Key idea: lots of important things in life are generated by multiplicative processes, resulting in heavy-tailed distributions – jobs, employees / colleagues, ideas, romantic relationships, success in business / investing / philanthropy, how useful it is to try new activities
Decision relevance to living better, i.e. what Ben thinks I should do differently:
- Getting lots of samples improves outcomes a lot, so draw as many samples as possible
- Trust the process and push through the demotivation of super-high failure rates (instead of taking them as evidence that the process is bad)
- But don't just trust any process; it must have 2 parts: (1) a good way to tell if a candidate is an outlier ("maybe amazing" below) (2) a good way to draw samples
- Optimize less, draw samples more (for a certain type of person)
- Filter for "maybe amazing", not "probably good", as they have different traits
- Filter for "ruling in" candidates, not "ruling out" (e.g. in dating)
- Cultivate an abundance mindset to help reject more candidates early on (to find 99.9th percentile not just 90th)
- Think ahead about what outliers look like, to avoid accidentally rejecting 99.9th percentile candidates out of miscalibration, by asking others based on their experience
My reservations about Ben's advice, despite thinking it's mostly sound and idea-generating:
- "Stick with the process through super-high failure rates instead of taking them as evidence that the process is bad" feels uncomfortably close to protecting a belief from falsification
- Filtering for "maybe amazing", not "probably good" makes me uncomfortable because I'm not risk-neutral (e.g. in RP's CCM I'm probably closest to "difference-making risk-weighted expected utility = low to moderate risk aversion", which for instance assesses RP's default AI risk misalignment megaproject as resulting in, not averting, 300+ DALYs per $1k)
- Unlike Ben, I'm a relatively young person in a middle-income country, and the abundance mindset feels privileged (i.e. not as much runway to try and fail)
So maybe a precursor / enabling activity for the "sample more" approach above is "more runway-building": money, leisure time, free attention & health, proximity to opportunities(?)
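
To make the "draw more samples" point above concrete, here is a minimal simulation sketch. It is my own illustration, not something from Ben's post: the lognormal and normal distributions, their parameters, and the function name `median_best_of_n` are all assumptions chosen just to contrast a heavy-tailed process with a thin-tailed one.

```python
import random
import statistics


def median_best_of_n(n, draw, trials=2000, seed=0):
    """Median, over many trials, of the best value among n draws.

    `draw` is a function taking a random.Random and returning one sample.
    """
    rng = random.Random(seed)
    bests = [max(draw(rng) for _ in range(n)) for _ in range(trials)]
    return statistics.median(bests)


def heavy(rng):
    # Hypothetical heavy-tailed outcome (lognormal, e.g. "value of a job").
    return rng.lognormvariate(0.0, 1.5)


def thin(rng):
    # Hypothetical thin-tailed outcome (normal) for comparison.
    return rng.gauss(1.0, 0.3)


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, round(median_best_of_n(n, heavy), 2),
              round(median_best_of_n(n, thin), 2))
```

Under these assumed parameters, the best of 100 heavy-tailed draws should be many times better than a single draw, while the best of 100 thin-tailed draws is only modestly better. That is the sense in which sampling more matters much more when outcomes are heavy-tailed.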

Mo Putera @ 2024-05-21T04:00 (+1)

From Richard Y Chappell's post Theory-Driven Applied Ethics, answering "what is there for the applied ethicist to do, that could be philosophically interesting?", emphasis mine:

A better option may be to appeal to mid-level principles likely to be shared by a wide range of moral theories. Indeed, I think much of the best work in applied ethics can be understood along these lines. The mid-level principles may be supported by vivid thought experiments (e.g. Thomson’s violinist, or Singer’s pond), but these hypothetical scenarios are taken to be practically illuminating precisely because they support mid-level principles (supporting bodily autonomy, or duties of beneficence) that we can then apply generally, including to real-life cases.
The feasibility of this principled approach to applied ethics creates an opening for a valuable (non-trivial) form of theory-driven applied ethics. Indeed, I think Singer’s famous argument is a perfect example of this. For while Singer in no way assumes utilitarianism in his famous argument for duties of beneficence, I don’t think it’s a coincidence that the originator of this argument was a utilitarian. Different moral theories shape our moral perspectives in ways that make different factors more or less salient to us. (Beneficence is much more central to utilitarianism, even if other theories ought to be on board with it too.)
So one fruitful way to do theory-driven applied ethics is to think about what important moral insights tend to be overlooked by conventional morality. That was basically my approach to pandemic ethics: to those who think along broadly utilitarian lines, it’s predictable that people are going to be way too reluctant to approve superficially “risky” actions (like variolation or challenge trials) even when inaction would be riskier. And when these interventions are entirely voluntary—and the alternative of exposure to greater status quo risks is not—you can construct powerful theory-neutral arguments in their favour. These arguments don’t need to assume utilitarianism. Still, it’s not a coincidence that a utilitarian would notice the problem and come up with such arguments.
Another form of theory-driven applied ethics is to just do normative ethics directed at confused applied ethicists. For example, it’s commonplace for people to object that medical resource allocation that seeks to maximize quality-adjusted life years (QALYs) is “objectionably discriminatory” against the elderly and disabled, as a matter of principle. But, as I argue in my paper, Against 'Saving Lives': Equal Concern and Differential Impact, this objection is deeply confused. There is nothing “objectionably discriminatory” about preferring to bestow 50 extra life-years to one person over a mere 5 life-years to another. The former is a vastly greater benefit, and if we are to count everyone equally, we should always prefer greater benefits over lesser ones. It’s in fact the opposing view, which treats all life-saving interventions as equal, which fails to give equal weight to the interests of those who have so much more at stake.

Two asides:

This seems broadly correct (at least for someone who shares my biases); e.g. even in pure math John von Neumann warned:

As a mathematical discipline travels far from its empirical source, or still more, if it is a second and third generation only indirectly inspired by ideas coming from "reality" it is beset with very grave dangers. It becomes more and more purely aestheticizing, more and more purely l'art pour l'art. This need not be bad, if the field is surrounded by correlated subjects, which still have closer empirical connections, or if the discipline is under the influence of men with an exceptionally well-developed taste. But there is a grave danger that the subject will develop along the line of least resistance, that the stream, so far from its source, will separate into a multitude of insignificant branches, and that the discipline will become a disorganized mass of details and complexities. In other words, at a great distance from its empirical source, or after much "abstract" inbreeding, a mathematical subject is in danger of degeneration. ... In any event, whenever this stage is reached, the only remedy seems to me to be the rejuvenating return to the source: the re-injection of more or less directly empirical ideas.

This makes me wonder if it would be fruitful to look at & somehow incorporate mid-level principles into decision-relevant cost-effectiveness analyses that attempt to incorporate moral uncertainty, e.g. HLI's app or Rethink's CCM. (This is not at all a fleshed-out thought, to be clear)

Mo Putera @ 2024-05-13T10:34 (+1)

The following is a collection of long quotes from Ozy Brennan's post On John Woolman (which I stumbled upon via Aaron Gertler) that spoke to me. Woolman was clearly what David Chapman would call mission-oriented with respect to meaning of and purpose in life; Chapman argues instead for what he calls "enjoyable usefulness", which is I think healthier in ~every way ... it just doesn't resonate. All bolded text is my own emphasis, not Ozy's.

As a child, Woolman experienced a moment of moral awakening: ... [anecdote]
This anecdote epitomizes the two driving forces of John Woolman’s personality: deep compassion and the refusal to ever cut himself a moment of slack. You might say “it was just a bird”; you might say “come on, Woolman, what were you? Ten?” Woolman never thought like that. It was wrong to kill; he had killed; that was all there was to say about it.
When Woolman was a teenager, the general feeling among Quakers was that they were soft, self-indulgent, not like the strong and courageous Quakers of previous generations, unlikely to run off to Massachusetts to preach the Word if the Puritans decided once again to torture Quakers for their beliefs, etc. Woolman interpreted this literally. He spent his teenage years being like “I am depraved, I am evil, I have not once provoked anyone into whipping me to death, I don’t even want to be whipped to death.”
As a teenager, Woolman fell in with a bad crowd and committed some sins. What kind of sins? I don’t know. Sins. He's not telling us:
“I hastened toward destruction,” he writes. “While I meditate on the gulf toward which I travelled … I weep; mine eye runneth down with water.”
In actuality, Woolman’s corrupting friends were all... Quakers who happened to be somewhat less strict than he was. We have his friends' diaries and none of them remarked on any particular sins committed in this period. Biographers have speculated that Woolman was part of a book group and perhaps the great sin he was reproaching himself for was reading nonreligious books. He may also have been reproaching himself for swimming, skating, riding in sleighs, or drinking tea.
Woolman is so batshit about his teenage wrongdoing that many readers have speculated about the existence of different, non-Quaker friends who were doing all the sins. However, we have no historical evidence of him having other friends, and we have a fuckton of historical evidence of Woolman being extremely hard on himself about minor failings (or “failings”).
Most people who are Like That as teenagers grow out of it. Woolman didn’t. He once said something dumb in Weekly Meeting and then spent three weeks in a severe depression about it. He never listened to nonreligious music, read fiction or newspapers, or went to plays. He once stormed down to a tavern to tell the tavern owner that celebrating Christmas was sinful.

... if Woolman were just an 18th century neurotic, no one would remember him. We care about him because of his attitude about slavery.
When Woolman was 21, his employer asked him to write a bill of sale for an enslaved woman. Woolman knew it was wrong. But his employer told him to and he was scared of being fired. Both Woolman’s employer and the purchaser were Quakers themselves, so surely if they were okay with it it was okay. Woolman told both his master and the purchaser that he thought that Christians shouldn't own enslaved people, but he wrote the bill.
After he wrote the bill of sale Woolman lost his inner peace and never really recovered it. He spent the rest of his life struggling with guilt and self-hatred. He saw himself as selfish and morally deficient. ...
Woolman worked enough to support himself, but the primary project of his life was ending slavery. He wrote pamphlet after pamphlet making the case that slavery was morally wrong and unbiblical. He traveled across America making speeches to Quaker Meetings urging them to oppose slavery. He talked individually with slaveowners, both Quaker and not, which many people criticized him for; it was “singular”, and singular was not okay. ...
It is difficult to overstate how much John Woolman hated doing anti-slavery activism. For the last decade of his life, in which he did most of his anti-slavery activities, he was clearly severely depressed. ... Partially, he hated the process of traveling: the harshness of life on the road; being away from his family; the risk of bringing home smallpox, which terrified him.
But mostly it was the task being asked of Woolman that filled him with grief. Woolman was naturally "gentle, self-deprecating, and humble in his address", but he felt called to harshly condemn slaveowning Quakers. All he wanted was to be able to have friendly conversations with people who were nice to him. But instead, he felt, God had called him to be an Old Testament prophet, thundering about God’s judgment and the need for repentance. ...
Woolman craved approval from other Quakers. But even Quakers personally opposed to slavery often thought that Woolman was making too big a deal about it. There were other important issues. Woolman should chill. His singleminded focus on ending slavery was singular, and being singular was prideful. Isn’t the real sin how different Woolman’s abolitionism made him from everyone else?
Sometimes he persuaded individual people to free their slaves, but successes were few and far between. Mostly, he gave speeches and wrote pamphlets as eloquently as he could, and then his audience went “huh, food for thought” and went home and beat the people they’d enslaved. Nothing he did had any discernible effect.
... Woolman spent much of his time feeling like a failure. If he were better, if he followed God’s will more closely, if he were kinder and more persuasive and more self-sacrificing, then maybe someone would have lived free who now would die a slave, because Woolman wasn’t good enough.

The modern version of this is probably what Thomas Kwa wrote about here:

I think that many people new to EA have heard that multipliers like these exist, but don't really internalize that all of these multipliers stack multiplicatively. ... If she misses one of these multipliers, say the last one, ... Ana is losing out on 90% of her potential impact, consigning literally millions of chickens to an existence worse than death. To get more than 50% of her maximum possible impact, Ana must hit every single multiplier. This is one way that reality is unforgiving.

From one perspective, Woolman was too hard on himself about his relatively tangential connection to slavery. From another perspective, he is one of a tiny number of people in the eighteenth century who has a remotely reasonable response to causing a person to be in bondage when they could have been free. Everyone else flinched away from the scale of the suffering they caused; Woolman looked at it straight. Everyone else thought of slaves as property; Woolman alone understood they were people.
Some people’s high moral standards might result in unproductive self-flagellation and the refusal to take actions because they might do something wrong. But Woolman derived strength and determination from his high moral standards. When he failed, he regretted his actions and did his best to change them. At night he might beg God to fucking call someone else, but the next morning he picked up his walking stick and kept going.
And the thing he was doing mattered. Quaker abolitionism wasn’t inevitable; it was the result of hard work by specific people, of whom Woolman was one of the most prominent. If Woolman were less hard on himself, many hundreds if not thousands of free people would instead have been owned things that could be beaten or raped or murdered with as little consequence as I experience from breaking a laptop.

An aside (doubling as warning) on mission orientation, quoting Tanner Greer's Questing for Transcendence:

... out of the lands I’ve lived and roles I’ve donned, none blaze in my memory like the two years I spent as a missionary for the Church of Jesus Christ. It is a shame that few who review my resume ask about that time; more interesting experiences were packed into those few mission years than in the rest of the lot combined. ... I doubt I shall ever experience anything like it again. I cannot value its worth. I learned more of humanity’s crooked timbers in the two years I lived as missionary than in all the years before and all the years since.
Attempting to communicate what missionary life is like to those who have not experienced it themselves is difficult. ... Yet there is one segment of society that seems to get it. In the years since my service, I have been surprised to find that the one group of people who consistently understands my experience are soldiers. In many ways a Mormon missionary is asked to live something like a soldier... [they] spend years doing a job which is not so much a job as it is an all-encompassing way of life.
The last point is the one most salient to this essay. It is part of the reason both many ex-missionaries (known as “RMs” or “Return Missionaries” in Mormon lingo) and many veterans have such trouble adapting to life when they return to their homes. ... Many RMs report a sense of loss and aimlessness upon returning to “the real world.” They suddenly find themselves in a society that is disgustingly self-centered, a world where there is nothing to sacrifice or plan for except one’s own advancement. For the past two years there was a purpose behind everything they did, a purpose whose scope far transcended their individual concerns. They had given everything—“heart, might, mind and strength“—to this work, and now they are expected to go back to racking up rewards points on their credit card? How could they?
The soldier understands this question. He understands how strange and wonderful life can be when every decision is imbued with terrible meaning. Things which have no particular valence in the civilian sphere are a matter of life or death for the soldier. Mundane aspects of mundane jobs (say, those of the former vehicle mechanic) take on special meaning. A direct line can be drawn between everything he does—laying out a sandbag, turning off a light, operating a radio—and the ability of his team to accomplish their mission. Choice of food, training, and exercise before combat can make the difference between the life and death of a soldier’s comrades in combat. For good or for ill, it is through small decisions like these that great things come to pass.
In this sense the life of the soldier is not really his own. His decisions ripple. His mistakes multiply. The mission demands strict attention to things that are of no consequence in normal life. So much depends on him, yet so little is for him.
This sounds like a burden. In some ways it is. But in other ways it is a gift. Now, and for as long as he is part of the force, even his smallest actions have a significance he could never otherwise hope for. He does not live a normal life. He lives with power and purpose—that rare power and purpose given only to those whose lives are not their own.
... It is an exhilarating way to live.
This sort of life is not restricted to soldiers and missionaries. Terrorists obviously experience a similar sort of commitment. So do dissidents, revolutionaries, reformers, abolitionists, and so forth. What matters here is conviction and cause. If the cause is great enough, and the need for service so pressing, then many of the other things—obedience, discipline, exhaustion, consecration, hierarchy, and separation from ordinary life—soon follow. It is no accident that great transformations in history are sprung from groups of people living in just this way. Humanity is both at its most heroic and its most horrifying when questing for transcendence.

Mo Putera @ 2024-03-22T03:49 (+1)

Michael Dickens' 2016 post Evaluation Frameworks (or: When Importance / Neglectedness / Tractability Doesn't Apply) makes the following point I think is useful to keep in mind as a corrective:

INT has its uses, but I believe many people over-apply it.
Generally speaking (with some exceptions), people don’t choose between causes, they choose between interventions. That is, they don’t prioritize broad focus areas like global poverty or immigration reform. Instead, they choose to support specific interventions such as distributing deworming treatments or lobbying to pass an immigration bill. The INT framework doesn’t apply to interventions as well as it does to causes. In short, cause areas correspond to problems, and interventions correspond to solutions; INT assesses problems, not solutions.

(aside: Michael Plant makes the same point in chapters 5 & 6 of his PhD thesis as per Edo Arad's post, using it as a starting point to develop a systematic cause prio approach he called 'cause mapping')

In most cases, we can try to directly assess the true marginal impact of investing in an intervention. These assessments will never be perfectly accurate, but they generally seem to tell us more than INT does. ...
How can we estimate an intervention’s impact more directly? To develop a better framework, let’s start with the final result we want and work backward to see how to get it.

Dickens' post has more; the framework they end up with is this:

which (somewhat less practically, they note) could be fine-grained further:

I also appreciated that Dickens actually used this framework to guide their giving decision (more details in their post).

Mo Putera @ 2024-01-26T10:02 (+1)

Just came across Max Dalton's 2014 writeup Estimating the cost-effectiveness of research into neglected diseases, part of Owen Cotton-Barratt's project on estimating cost-effectiveness of research and similar activities. Some things that stood out to me:

High-level takeaways
- ~100x 95% CI range (mostly from estimates of total current funding to date, and difficulty of continuing with research), so figures below can't really argue for change in priorities so much as compel further research
  - This uncertainty is a lower bound, including only statistical uncertainty and not model uncertainty
- Differing returns to research are largely driven by disease burden size, so look at diarrheal diseases, malaria, hookworm, ascariasis, trichuriasis, lymphatic filariasis, meningitis, typhoid, and salmonella – i.e. nothing too surprising
Estimated figures:
- 13.9 DALYs/$1k for the sector as a whole (vs ~20 DALYs/$1k for GWWC top charities back in 2014), 95% CI 1.43-130 DALYs/$1k
- Median estimates: diarrheal disease e.g. cholera and dysentery 121 DALYs/$1k, salmonella infections 74 DALYs/$1k, worms ~50 DALYs/$1k, leprosy 0.058 DALYs/$1k
- Most of the top diseases have ~100x 95% CI range, except salmonella whose range is ~3,000x(!)
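
As a quick sanity check on the "~100x" figure, here is a small arithmetic sketch using the sector-wide interval quoted above (1.43 to 130 DALYs/$1k). The helper name `ci_summary` is hypothetical, and comparing the geometric midpoint to the 13.9 point estimate assumes the underlying distribution is roughly lognormal, which is my assumption rather than something I'm attributing to the writeup.

```python
import math


def ci_summary(low, high):
    """Return (high/low ratio, geometric midpoint) of a positive interval."""
    return high / low, math.sqrt(low * high)


if __name__ == "__main__":
    # Sector-wide 95% CI bounds from the notes above, in DALYs per $1k.
    ratio, geo_mid = ci_summary(1.43, 130.0)
    print(f"range ratio: {ratio:.0f}x, geometric midpoint: {geo_mid:.1f} DALYs/$1k")
```

The ratio comes out at roughly 91x, in line with "~100x", and the geometric midpoint is roughly 13.6 DALYs/$1k, close to the 13.9 point estimate, which is what one would expect if the estimate is approximately lognormal.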
References
- Sources & calculations for estimates above
- G-FINDER survey of research funding for neglected diseases
- Cotton-Barratt's essay deriving the simple equation for calculating the estimates above

Mo Putera @ 2023-12-20T15:19 (+1)

List of charities providing humanitarian assistance in the Israel-Hamas war mentioned in response to this request, for posterity and ease of reference:

Physicians for Human Rights Israel (2022 Impact Report, response to the current crises)
Al Mezan Centre for Human Rights
Palestinian Centre for Human Rights
Charity Navigator's list has 18 charities, of which only Global Empowerment Mission Inc. is rated on 'impact & results'