Thoughts on the "Meta Trap"

By Rohin Shah @ 2016-12-20T21:36 (+10)

Cross-posted to my blog. Thanks to Ajeya Cotra and Jeff Kaufman for feedback on a draft of this post. Any remaining errors are my own. Comments on this post may be copied to my blog.

Last year, Peter Hurford wrote a post titled ‘EA risks falling into a "meta trap". But we can avoid it.’ Ben Todd wrote a followup that clarified a few points. I have a few more meta traps to add to the list.

What do I mean by "meta"?

In the original two posts, I believe "meta" means "promoting effective altruism in abstract, with the hope that people do good object level projects in the future". This does not include cause prioritization research, which is typically also lumped under "meta". In this post, I'll use "meta" in much the same way -- I'm talking about work that is trying to promote effective altruism in the abstract and/or fundraising for effective charities. Example organizations include most of the Centre for Effective Altruism (Giving What We Can, EA Outreach, EA Global, Chapters team), 80,000 Hours, most of .impact, Raising for Effective Giving, Charity Science, Center for Applied Rationality, Envision, Students for High Impact Charity and local EA groups. However, it does not include GiveWell or the Foundational Research Institute.

What is this post not about?

I am not suggesting that we should put less money into meta; actually I think most meta-charities are doing great work and should get more money than they have. I am more worried that in the future, meta organizations will be funded more than they should be because of leverage ratio considerations that don't take into account all of the potential downsides.

Potential Biases

Most of the work I do falls under "meta"—I help coordinate across local EA groups, and I run a local group myself. As a result, I've become pretty familiar with the landscape of certain meta organizations, which means that I've thought much more about meta than other cause areas. I expect that if I spent more time looking into other cause areas, I would find issues with them as well that I don't currently know about.

I have become most familiar with local EA groups, the CEA chapters team, Giving What We Can, and 80,000 Hours, so most of my examples focus on those organizations. I'm not trying to single them out or suggest that they suffer most from the issues I'm highlighting—it's just easiest for me to use them as examples since I know them well.

Summary of considerations raised before

In Peter Hurford’s original post, there were several points made:

Meta Orgs Risk Not Actually Having an Impact. Since meta organizations are often several levels removed from direct impact, if even one of the "chains of impact" between levels fails to materialize, the meta org will not have any impact.
Meta Orgs Risk Curling In On Themselves. This is the failure mode where meta organizations are optimized to spread the meta-movement but fail to actually have object-level impact.
You Can’t Use Meta as an Excuse for Cause Indecisiveness. Donating to meta organizations allows you to skip the work of deciding which object-level cause is best, which is not good for the health of the movement.
At Some Point, You Have to Stop Being Meta. This is the worry that we do too much meta work and don't get around to doing object-level work soon enough.
Sometimes, Well Executed Object-Level Action is What Best Grows the Movement. For example, GiveWell built an excellent research product without focusing on outreach, but still ended up growing the movement.

It seems to me that meta trap #4 is simply a consequence of several other traps -- there are many meta traps that will cause us to overestimate the value of meta work, and as a result we may do more meta work than is warranted, and not do enough object-level work. So, I will be ignoring meta trap #4 for the rest of this post.

I'm also not worried about meta trap #2, because I think that meta organizations will always have a “chain of impact” that bottoms out with object-level impact. (Either you start a meta organization that helps an object-level organization, or you start a meta organization that helps some other organization that inductively eventually helps an object-level organization.) To fall into meta trap #2, at some point in the chain one of the organizations would have to change direction fairly drastically to not have any object-level impact at all. However, the magnitude of the object-level impact may be much smaller than we expect. In particular, this "chain of impact" is what causes meta trap #1.

The Chain of Impact

Let's take the example of the Chapters team at the Centre for Effective Altruism (CEA), which is one of the most meta organizations -- they are aimed at helping and coordinating local EA groups, which are helping to spread EA, which is both encouraging more donations to effective charities and encouraging more effective career choice, which ultimately causes object-level impact. At each layer, there are several good metrics which are pretty clear indicators that object-level impact is happening:

The Chapters team could use "number of large local EA groups" or "total GWWC pledges/significant plan changes from local groups"
Each local group could use "number of GWWC pledges", "number of significant plan changes", and "number of graduated members working at EA organizations"
GWWC and 80,000 Hours themselves have metrics to value a pledge and a significant plan change, respectively.
The object-level charities that GWWC recommends have their own metrics to evaluate their object-level impact.

Like I said above, this “chain of impact” makes it pretty unlikely that meta organizations will end up curling in on themselves and become detached from the object level. However, a long chain does mean that there is a high risk of having no impact (meta trap #1). In fact, I would generalize meta trap #1:

Meta Trap #1. Meta orgs amplify bad things too

The whole point of meta organizations is to have large leverage ratios through a "chain of impact", which is usually operationalized using a "chain of metrics". However, this sort of chain can lead to other amplifications as well:

Meta Trap #1a. Probability of not having an impact

This is the point made in the original post -- the more meta an organization is, the more likely that one of the links in the chain fails and you don't have any impact at all.

One counterargument is that often, meta organizations help many organizations one level below them (eg. GWWC). In this case, it is extremely unlikely that none of these organizations have any impact, and so the risk is limited to just the risk that the meta organization itself fails at introducing enough additional efficiency to justify its costs.

Meta Trap #1b. Overestimating impact

It is plausible that organizations systematically overestimate their impact (something like the overconfidence effect). If most or all of the organizations along the chain overestimate their impact, then the organization at the top of the chain will have a vastly overestimated object-level impact. Note that if an organization phrases its impact in terms of its impact on the organization directly below them in the chain, this does not apply to that number. It only applies to the total estimated object-level impact of the organization.

You can get similar problems with selection effects, where the people who start meta organizations are more likely to think that meta organizations are worthwhile. Each selection effect leads to more overconfidence in the approach, and once again the negative effects of the bias grow as you go further up the chain.

Meta Trap #1c. Issues with metrics

A commonly noted issue with metrics is that an organization will optimize to do well on its metrics, rather than what we actually care about. In this way, badly chosen metrics can make us far less effective.

As a concrete example, let's look at the CEA Chapters Team chain of metrics:

One of many plausible metrics for the CEA Chapters team is "number of large local groups". (I'm not sure what metrics they actually use.) They may focus on getting smaller local groups to grow, but this may result in local groups having large speaker events and counting vaguely interested students as "members" that results in a classification of "large" even though not much has actually changed with that local group.
Local groups themselves often use "number of GWWC pledges" as a metric. They may start to promote the pledge very widely with external incentives (eg. free food for event attendees, followed by social pressure to sign the pledge). As a result, more students may take the pledge, but they may be much more likely to drop out quickly.
GWWC has three metrics on its home page -- number of pledges, amount of money already donated, and amount of money pledged. They may preferentially build a community for pledge takers who already make money, since they will donate sooner and increase the amount of money already donated. Students may lose motivation due to the lack of community and so would forget about the pledge, and wouldn’t donate when the time came. GWWC would still be happy to have such pledges, because they increase both the number of pledges and the amount of money pledged.
The various object-level organizations that GWWC members donate to could have similar problems. For example, perhaps the Against Malaria Foundation would focus on areas where bednets can be bought and distributed cheaply, giving a better “cost per net” metric, rather than spending slightly more to distribute nets in regions with much higher malaria burdens, where the money would save more lives.

If all of these were true, you'd have to question whether the CEA Chapters team is having much of an impact at all. On the other hand, even though the metrics for AMF lead to suboptimal behavior, you can still be quite confident that they have a significant impact.

I don't think that these are true, but I could believe weaker versions (in particular, I worry that GWWC pledges from local groups are not as good as the average GWWC pledge).

In addition to the 5 traps that Peter Hurford mentions, I believe there are other meta traps that tend to make us overestimate the impact of meta work:

Meta Trap #6. Marginal impact may be much lower than average impact.

The GiveWell top charities generally focus on implementing a single intervention. As a result, we can roughly expect that the marginal impact of a dollar donated is about the same as the average impact of a dollar, since both dollars go to the same intervention. It's still likely lower (for example, AMF may start funding bednets in areas with lower malaria burden, reducing marginal impact), but not that much lower.

However, meta organizations typically have many distinct activities for the same goal. These activities can have very different cost-effectiveness. The marginal dollar will typically fund the activity with the lowest (estimated) cost-effectiveness, and so will likely be significantly less impactful than the average dollar.

Note that this assumes that the activities are not symbiotic -- that is, the argument only works if stopping one of the activities would not significantly affect the cost-effectiveness of other activities. As a concrete example of such activities in a meta organization, see this comment about all the activities that get pledges for GWWC.

The numbers that are typically publicized are for average impact. For example, on average, GWWC claims a 104:1 multiplier, or 6:1 for their pessimistic calculation, and claims that the average pledge is worth $73,000. On average the cost to 80,000 Hours of a significant plan change is £1,667. (These numbers are outdated and will probably be replaced by newer ones soon.) What about the marginal impact of the next dollar? I have no idea, except that it's probably quite a bit worse than the average. I would not be surprised if the marginal dollar didn't even achieve a 1:1 multiplier under GWWC's assumptions for their pessimistic impact calculation. (But their pessimistic impact calculation is really pessimistic.) To be fair to meta organizations, marginal impact is a lot harder to assess than average impact. I myself focus on average impact when making the case for local EA groups because I actually have a reasonable estimate for those numbers.

One counterargument against this general argument is that each additional activity further shares fixed costs, making everything more cost-effective. For example, GWWC has to maintain a website regardless of which activities it runs. If it adds another activity to get more pledges that relies on the website, then the cost of maintaining the website is spread over the new activity as well, making the other activities more cost effective. Similar considerations could apply to office space costs, legal fees, etc. I would guess that this is much less important than the inherent difference in cost effectiveness of different activities.

Meta Trap #7. Meta suffers more from coordination problems.

As the movement grows and more people and meta organizations become a part of it, it becomes more important to consider the group as a whole, rather than the impact of just your actions. This 80,000 Hours post applies this idea to five different situations.

Especially in the area of promoting effective altruism in the abstract, there are a lot of organizations working toward the same goal and targeting the same people. For example, perhaps Alice learns about EA from a local EA group, goes to a CFAR workshop, starts a company because of 80,000 Hours and takes the Founder's Pledge and GWWC pledge. We now have five different organizations that can each claim credit for Alice's impact, not to mention Alice herself. In addition, from a "single player counterfactual analysis", it is reasonable for all the organizations to attribute nearly all the impact to themselves -- if Alice was talking to these organizations while making her decision, each organization could separately conclude that Alice would not have made these life changes without them, and so counterfactually they get all the credit. (And this could be a reasonable conclusion by each organization.) However, the total impact caused would then be smaller than the sum of the impacts each organization thinks they had.

Imagine that Alice will now have an additional $2,000 of impact, and each organization spent $1,000 to accomplish this. Then each organization would (correctly) claim a leverage ratio of 2:1, but the aggregate outcome is that we spent $5,000 to get $2,000 of benefit, which is clearly suboptimal. These numbers are completely made up for pedagogical purposes and not meant to be actual estimates. In reality, even in this scenario I suspect that the ratio would be better than 1:1, though it would be smaller than the ratio each organization would compute for itself.

Note that the recent changes at CEA have helped with this problem, but it still matters.

Meta Trap #8. Counterfactuals are harder to assess.

It's very unclear what would happen in the absence of meta organizations -- I would expect the EA movement to grow anyway simply by spreading through word of mouth, but I don't know how much. If Giving What We Can didn’t exist, perhaps EA would grow at the same rate and EAs would publicize their donations on the EA Hub. If 80,000 Hours didn’t exist, perhaps some people would still make effective career choices by talking to other EAs about their careers. It is hard to properly estimate the counterfactual for impact calculations -- for example, GWWC asks pledge takers to self-report their counterfactual donations, which is fraught with uncertainties and biases, and as far as I know 80,000 Hours does not try to estimate the impact that a person would have had before their plan change.

This isn't a trap in and of itself -- it becomes a trap when you combine it with biases that lead us to overestimate how much counterfactual impact we have by projecting the counterfactual as worse than it actually would have been. We should care about this for the same reasons that we care about robustness of evidence.

I think that with most object-level causes this is less of an issue. When RCTs are conducted, they eliminate the problem, at least in theory (though you do run into problems when trying to generalize from RCTs to other environments). I think that this is a problem in far future areas (would the existential risk have happened, or would it have been solved anyway?), but people are aware of the problem and tackle it (research into the probabilities of various existential risks, looking for particularly neglected existential risks such as AI risk). I haven't seen anything similar for meta organizations.

What should we do?

Like I mentioned before, I’m not worried about meta traps #2 and #4. I agree that meta traps #3 (cause indecisiveness) and #5 (object-level action as a meta strategy) are important but I don't have concrete suggestions for them, other than making sure that people are aware of them.

For meta trap #1, I agree with Peter's suggestion: The more steps away from impact an EA plan is, the more additional scrutiny it should get. In addition, I like 80,000 Hours’ policy of publishing some specific, individual significant plan changes that they have caused. Looking at the details of individual cases makes it clear what sort of impact the organization has underneath all the metrics, and ideally would directly show the object-level impact that the organization is causing (even if the magnitude is unclear). It seems to me that most meta organizations can do some version of this.

I worry most about meta trap #6 (marginal vs. average impact). It also applies to animal welfare organizations, but I'm less worried there because Animal Charity Evaluators (ACE) does a good job of thinking about that consideration deeply, creating cost-effectiveness estimates for each activity, and basing its recommendations on that. We could create "Meta Charity Evaluators", but I'm not confident that this is actually worthwhile, given that there are relatively few meta organizations, and not much funding flows to them. However, this is similar to the case for ACE, so there's some reason to do this. This could also help with meta trap #7 (coordination problems), if it took on coordination as an explicit goal.

I would guess that we don't need to do anything about meta trap #8 (hard counterfactuals) now. I think most meta organizations have fairly strong cases for large counterfactual impact. I would guess that we could make good progress by research into what the counterfactuals are, but again since there is not much funding for meta organizations now, this does not seem particularly valuable.

undefined @ 2016-12-21T01:23 (+14)

I'll add two more potential traps. There's overlap with some of the existing ones but I think these are worth mentioning on their own.

9) Object level work may contribute more learning value.

I think it's plausible that the community will learn more if it's more focused on object level work. There are several plausible mechanisms. For example (not comprehensive): object level work might have better feedback loops, object level work may build broader networks that can be used for learning about specific causes, or developing an expert inside view on an area may be the best way to improve your modelling of the world. (Think about liberal arts colleges' claim that it's worth having a major even if your educational goals are broad "critical thinking" skills.)

I'm eliding here over lots of open questions about how to model the learning of a community. For example: is it more efficient for communities to learn by their current members learning or by recruiting new members with preexisting knowledge/skills?

I don't have an answer to this question but when I think about it I try to take the perspective of a hypothetical EA community ten years from now and ask whether it would prefer to primarily be made up of people with ten years' experience working on meta causes or a biologist, a computer scientist, a lawyer, etc. . .

10) The most valuable types of capital may be "cause specific"

I suppose (9) is a subset of (10). But it may be that it's important to invest today on capital that will pay off tomorrow. (E.G. See 80k on career capital.) And cause specific opportunities may be better developed (and have higher returns) than meta ones. So, learning value aside, it may be valuable for EA to have lots of people who invested in graduate degrees or building professional networks. But these types of opportunities may sometimes require you to do object level work.

undefined @ 2016-12-21T21:45 (+3)

These traps are fairly compelling.

My broader claim would be that if we had a model where most of the activities that can usefully be augmented will come from folks with: i) great expertise in one of several fields ii) excellent epistemics iii) low risk aversion then the movement would de-prioritize grassroots meta, and change its emphasis, while upweighting direct activities and subfield-specific meta.

undefined @ 2016-12-23T12:00 (+1)

9) seems pretty compelling to me. To use some analogies from the business world: it wouldn't make sense for a company to hire lots of people before it had a business model figured out, or run a big marketing campaign while its product was still being developed. Sometimes it feels to me like EA is doing those things. (But maybe that's just because I am less satisfied with the current EA "business model"/"product" than most people.)

undefined @ 2016-12-20T22:51 (+7)

Hi Rohin,

I agree these are good concerns.

I partially address 7 and point 8 in the post that you link to: https://80000hours.org/2016/02/the-value-of-coordination/#attributing-impact Note that it is possible for the credit to sum to more than 100%.

I discuss point 6 here: https://80000hours.org/2015/11/stop-talking-about-declining-returns-in-small-organisations/

Issues with how to assess impact, metrics etc. are discussed in-depth in the organisation's impact evaluations.

What I'm keen to see is a detailed case arguing that these are actually problems, rather than just pointing out that they might be problems. This would help us improve.

Just to clarify, you'd like to see funding to meta-charities increase, so don't think these worries are actually sufficient to warrant a move back to first order charities?

Cheers,

Ben

PS. One other small thing – it's odd to class GiveWell as not meta, but 80k as meta. I often think of 80k as the GiveWell of career choice. Just as GiveWell does research into which charities are most effective and publicises it, we do research into which career strategies are most effective and publicise it.

undefined @ 2016-12-21T07:51 (+2)

Note that it is possible for the credit to sum to more than 100%.

Yes, I agree that this is possible (this is why I said it could be "a reasonable conclusion by each organization"). My point is that because of this phenomenon, you can have the pathological case where from a global perspective, the impact does not justify the costs, even though the impact does justify the costs from the perspective of every organization.

I discuss point 6 here

Yeah, I agree that potential economies of scale are much greater than diminishing marginal returns, and I should have mentioned that. Mea culpa.

Issues with how to assess impact, metrics etc. are discussed in-depth in the organisation's impact evaluations.

My impression is that organizations acknowledge that there are issues, but the issues remain. I'll write up an example with GWWC soon.

Just to clarify, you'd like to see funding to meta-charities increase, so don't think these worries are actually sufficient to warrant a move back to first order charities?

That's correct.

PS. One other small thing – it's odd to class GiveWell as not meta, but 80k as meta. I often think of 80k as the GiveWell of career choice. Just as GiveWell does research into which charities are most effective and publicises it, we do research into which career strategies are most effective and publicise it.

I agree that 80k's research product is not meta the way I've defined it. However, 80k does a lot of publicity and outreach that GiveWell for the most part does not do. For example: the career workshops, the 80K newsletter, the recent 80K book, the TedX talks, the online ads, the flashy website that has popups for the mailing list. To my knowledge, of that list GiveWell only has online ads.

undefined @ 2016-12-21T10:19 (+3)

What I'm keen to see is a detailed case arguing that these are actually problems, rather than just pointing out that they might be problems. This would help us improve.

I've got a speculative one for GWWC, and a more concrete one for chapter seeding.

GWWC pledges: I've mentioned that I don't worry about traps #2 and #4, and traps #3 and #5 don't apply to a specific organization, so I'll skip those.

Meta Trap #1a. Probability of not having an impact

I don't think this is a problem for GWWC.

Meta Trap #1b. Overestimating impact

Here are some potential ways that GWWC could be overestimating impact:

They assume that the rate people drop out of the pledge is constant, but I would guess that the rate increases for the first ~10 years and then drops. In addition, I would guess that the average 2016 pledge taker is less committed than the average 2015 pledge taker (more numbers suggests lower quality), so that would also increase the rate.
In one section, instead of computing a weighted average, they compute the average weight and then multiply it by the total donations, for reasons unclear to me. Fixing this could cause the impact to go up or down. I mentioned this to GWWC and so far haven't gotten a definitive answer.
Counterfactuals are self-reported and so could be systematically wrong. GWWC's response to this ameliorated my concerns but I would still guess that they are biased in GWWC's favor. For example, all of the pledges from EA Berkeley members would have happened as long as some sort of pledge existed. You may credit GWWC with some impact since we needed it to exist, but at the very least it wouldn't be marginal impact. However, I think (I'm guessing) that some of our members assigned counterfactual impact to GWWC.
Their time discounting could be too small (since meta trap #5 suggests a larger time discount).

Now since GWWC talks about their impact in terms of their impact on the organizations directly beneath them on the chain, you don't see any amplification of overestimates. However, consider the case of local EA groups. They could be overestimating their impact too:

They assume that a GWWC pledge is worth $73,000, which is what GWWC says, but the average local group pledge may be worse than the average GWWC pledge, because the members are not as committed to effectiveness, or they are more likely to drop out due to lack of community. (I say "may be worse" because I don't know what the average GWWC pledge looks like, it may turn out there are similar problems there.)
They may simply overcount the number of members who have taken the pledge. (I have had at least one student say that they took the pledge, but their name wasn't on the website, and many students say they will take the pledge but then don't.)

Meta Trap #1c. Issues with metrics

I do think that local groups sometimes sacrifice quality of pledges for number of pledges when they shouldn't.
It actually is the case that student pledge takers from EA Berkeley have basically no interaction with the GWWC community. I don't know why this is or if its normal for pledge takers in general. Sometimes I worry that GWWC spends too much time promoting their top charities since that would improve their metrics, but I have no real reason for thinking that this is actually happening.

Meta Trap #6. Marginal impact may be much lower than average impact.

My original post explained how this would be the case for GWWC. I agree though that economies of scale will probably dominate for some time.

Meta Trap #7. Meta suffers more from coordination problems.

I think local EA groups and GWWC both take credit for pledges originating from local groups. (It depends on what those pledge takers self-reported as the counterfactual.) If they came from an 80,000 Hours career workshop, then we now have three organizations claiming the impact.

There's also a good Facebook thread about this. I forgot about it when writing the post.

Meta Trap #8. Counterfactuals are harder to assess.

I've mentioned this above -- GWWC uses self-reported counterfactuals. If you agree that you should penalize expected value estimates if the evidence is not robust, then I think you should do the same here.

Here's the second example:

There was an effort to "seed" local EA groups, and in the impact evaluation we see "each hour of staff time generated between $187 and $1270 USD per hour."

First problem: The entire point of this activity was to get other people to start a local EA group, but the time spent by these other people aren't included as costs (meta trap #7, kind of). Those other people would probably have to put in ~20 person hours per group to get this impact. If you include these hours as costs, then the estimated cost-effectiveness becomes something more like $167-890.

Second problem: I would bet money at even odds that the pessimistic estimate for cold-email seeding was too optimistic (meta trap #1b). (I'm not questioning the counterfactual rate, I'm questioning the number of chapters that "went silent".)

Taking these two into account, I think that the chapter seeding was probably worth ~$200 per hour. Now if GWWC itself is too optimistic in its impact calculation (as I think it is), this falls even further (meta trap #1b), and this seems just barely worthwhile.

That said, there are other benefits that aren't incorporated into the calculation (both for GWWC pledges in general and chapter seeding). So overall it still seems like it was worthwhile, but it's not nearly as exciting as it initially seemed.

undefined @ 2016-12-21T17:51 (+2)

These are all reasonable concerns. I can't speak for the details of the two estimates you mention, though my impression is that the points listed have probably already been considered by the people making the estimates. Though you could easily differ from them in your judgement calls.

With LEAN not including the costs of the chapter heads, they might have just decided that the costs of this time are low. Typically, in these estimates, people are trying to work out something like GiveWell dollars in vs. GiveWell dollars out. If a chapter head wouldn't have worked on an EA project or earned to give to GiveWell charities otherwise, then the opportunity cost of their time could be small when measured in GiveWell dollars. In practice, it seems like much chapter time comes out of other leisure activities.

With 80k, we ask people taking the pledge whether they would have taken it if 80k never existed, and only count people who say "probably not". These people might still be biased in our favor, but on the other hand, there's people we've influenced but were pushed over the edge by another org. We don't count these people towards our impact, even though we made it easier for the other org.

(We also don't count people who were influenced by us indirectly, so don't know they were influenced)

Zooming out a bit, ultimately what we do is make people more likely to pledge.

Here's a toy model.

At time 0, you have 3 people.
Amy has a 10% chance of taking the pledge
Bob has a 80% chance
Carla has a 90% chance

80k shows them a workshop, which makes the 10% more likely to take it, so at time 1, the probabilities are:

Amy: 20%
Bob: 90%
Carla: 100% -> she actually takes it

Then GWWC shows them a talk, which has the same effect. So at time 2:

Amy: 30%
Bob: 100% -> actually takes it
Carla: 100% (overdetermined)

Given current methods, 80k gets zero impact. Although they got Carla to pledge, Carla tells them she would have taken it otherwise due to GWWC, which is true.

GWWC counts both Carla and Bob as new pledgers in their total, but when they ask them how much they would have donated otherwise, Carla says zero (80k had already persuaded her) and Bob probably gives a high number too (~90%), because he was already close to doing it. So this reduces GWWC's estimate of the counterfactual value per pledge. In total, GWWC adds 10% of the value of Bob's donations to their estimates of counterfactual money moved.

This is pessimistic for 80k, because without 80k, GWWC wouldn't have persuaded Bob, but this isn't added to our impact.

It's also a bit pessimistic for GWWC, because none of their effect on Amy is measured, even though they've made it easier for other organisations to persuade her.

In either case, what's actually happening is that 80k is adding 30% of probability points and GWWC 20% of probability points. The current method of asking people what they would have done otherwise is a rough approximation for this, but it can both overcounts and undercounts what's really going on.

undefined @ 2016-12-21T20:24 (+2)

Re: Leisure time. I think I would have probably either taken another class, gotten a part-time paying job as a TA, or done technical research with a professor if I weren't leading EAB (which took ~10 hours of my time each week). I'm not positive how representative this is across the board, but I think this is likely true of at least some other chapter leaders, and more likely to be true of the most dedicated (who probably produce a disproportionate amount of the value of student groups).

undefined @ 2016-12-22T21:08 (+1)

Hmm, my comment about this was lost.

On second thoughts, "leisure time" isn't quite what I meant. I more thought that it would come out of other extracurriculars (e.g. chess society).

Anyway, I think there's 3 main types of cost:

Immediate impact you could have had doing something else e.g. part-time job and donating the proceeds.
Better career capital you could have gained otherwise. I think this is probably the bigger issue. However, I also think running a local group is among the best options for career capital while a student, especially if you're into EA. So it's plausible the op cost is near zero. If you want to do research and give up doing a research project though, it could be pretty significant.
More fun you could have had elsewhere. This could be significant on a personal level, but it wouldn't be a big factor in a calculation measured in terms of GiveWell dollars.

undefined @ 2016-12-21T20:12 (+1)

In practice, it seems like much chapter time comes out of other leisure activities.

I strongly disagree.

I can't speak for the details of the two estimates you mention, though my impression is that the points listed have probably already been considered by the people making the estimates.

This is exactly why I focused on general high-level meta traps. I can give several plausible ways in which the meta traps may be happening, but it's very hard to actually prove that it is indeed happening without being on the inside. If GWWC has an issue where it is optimizing metrics instead of good done, there is no way for me to tell since all I can see are its metrics. If GWWC has an issue with overestimating their impact, I could suggest plausible ways that this happens, but they are obviously in a better position to estimate their impact and so the obvious response is "they've probably thought of that". To have some hard evidence, I would need to talk to lots of individual pledge takers, or at least see the data that GWWC has about them. I don't expect to be better than GWWC at estimating counterfactuals (and I don't have the data to do so), so I can't show that there's a better way to assess counterfactuals. To show that coordination problems actually lead to double-counting impact, I would need to do a comparative analysis of data from local groups, GWWC and 80k that I do not have.

There is one point that I can justify further. It's my impression that meta orgs consistently don't take into account the time spent by other people/groups, so I wouldn't call that one a judgment call. Some more examples:

CEA lists "Hosted eight EAGx conferences" as one of their key accomplishments, but as far as I can tell don't consider the costs to the people who ran the conferences, which can be huge. And there's no way that you could expect this to come out of leisure time.
I don't know how 80k considers the impact of their career workshops, but I would bet money that they don't take into account the costs to the local group that hosts the workshop.

(We also don't count people who were influenced by us indirectly, so don't know they were influenced)

Yes, I agree that there is impact that isn't counted by these calculations, but I expect this is the case with most activities (with perhaps the exception of global poverty, where most of the impacts have been studied and so the "uncounted" impact is probably low).

Here's a toy model.

The main issue is that I don't expect that people are performing these sorts of counterfactual analyses when reporting outcomes. It's a little hard for me to imagine what "90% chance" means so it's hard for me to predict what would happen in this scenario, but your analysis seems reasonable. (I still worry that Bob would attribute most or all of the impact to GWWC rather than just 10%.)

However, I think this is mostly because you've chosen a very small effect size. Under this model, it's impossible for 80k to ever have impact -- people will only say they "probably wouldn't" have taken the GWWC pledge if they started under 50%, but if they started under 50%, 80k could never get them to 100%. Of course this model will undercount impact.

Consider instead the case where a general member of a local group comes to a workshop and takes the GWWC pledge on the spot (which I think happens not infrequently?). The local group has done the job of finding the member and introducing her to EA, maybe raising the probability to 30%. 80K would count the full impact of that pledge, and the local group would probably also count a decent portion of that impact.

More generally, my model is that there are many sources that lead to someone taking the GWWC pledge (80k, the local group, online materials from various orgs), and a simple counterfactual analysis would lead to every such source getting nearly 100% of the credit, and based on how questions are phrased I think it is likely that people are actually attributing impact this way. Again, I can't tell without looking at data. (One example would be to look at what impact EA Berkeley members attribute to GWWC.)

rohinmshah @ 2021-12-14T18:50 (+6)

Apparently this post has been nominated for the review! And here I thought almost no one had read it and liked it.

Reading through it again 5 years later, I feel pretty happy with this post. It's clear about what it is and isn't saying (in particular, it explicitly disclaims the argument that meta should get less money), and is careful in its use of arguments (e.g. trap #8 specifically mentions that counterfactuals being hard isn't a trap until you combine it with a bias towards worse counterfactuals). I still agree that all of the traps mentioned here are worth keeping in mind when working on "meta".

The biggest critique of this post is that it doesn't demonstrate that any of these traps actually happen(ed) in practice. It has several examples, but most are of the form "such-and-such bad thing could be happening, I can't tell from the outside". This comment makes some more speculative claims about what bad things actually happened, but they are speculative and the response mentions that they were probably already taken into account.

I think this does in fact make the post less valuable than it otherwise could be. Nonetheless, I still find the post important, because it's the closest we get to criticism of "meta" work. In theory, we could have better criticisms from people who are actually doing the work themselves, who can say more definitively whether in practice there are cases of these "traps", but in practice I have not seen such critiques.

If I were rewriting this post today, I'd make a few changes:

Stop saying "meta". I don't know what I'd replace it with, but "meta" is too ambiguous and easily misunderstood. "Promotion traps" came up as a suggestion in the comments; that seems reasonable.
Focus on properties of the work. Instead of having a single type of work called "meta" and then talking about various traps, it seems better to talk about specific traps and say what kinds of work that trap applies to. For example, trap #1 applies to work that has a long chain of impact, whereas trap #6 and trap #7 apply whenever there are multiple things optimizing the same outcome. It happens that what I called "meta" work in 2016 satisfied both of these properties, but the analysis would be stronger if I had just talked about the properties and then noted that "meta" work has both of these properties. (This would also make it easier to talk about which traps apply to which pieces of work, rather than arguing about whether GiveWell and 80K count as "meta" work.)
Note positives of "meta". This post is straightforwardly about the negatives; it would have been good to acknowledge the positives as well (which I had probably been taking as background knowledge), or at least say that I'm only focusing on negatives because the positives are more widely known.
Note the possibility of increasing marginal returns. Trap #6 is implicitly arguing for diminishing marginal returns, but there's a strong case for increasing marginal returns instead. I don't currently think this is quite as clear as it sounds in the comments -- you want to distinguish between exponential-growth-that-would-have-happened-anyway vs. exponential-growth-that-happens-as-a-result-of-future-work -- but I think it is more likely increasing than decreasing.
Emphasize taking the perspective of "EA as a whole". Trap #7 is centrally about how individually rational actions by orgs can be irrational from the perspective of an "EA superagent" trying to coordinate all of EA. Ideally I would have added motivation for why "what an EA superagent should do" was the appropriate perspective, rather than "what an EA org should do", "what in individual should do", or "what humanity as a whole should do".

Some miscellaneous thoughts:

The clearest critique of "meta"-in-practice I gave in this post is that cost effectiveness analyses often don't take into account the costs incurred by people outside of the "meta" org. I think this critique has become stronger over time. As a small example, people working on improving the AI safety pipeline often ask for an hour or two of my time (which I value quite highly); I doubt these costs are making it into cost effectiveness analyses.
Some commenters questioned whether there is more knowledge of positive arguments for "meta" work vs. negative arguments. The fact that enough people read this post for it to be nominated for the review is giving me some pause, but despite no longer working in "meta", I do still have the occasional conversation where a "meta" person seems more aware of the positive arguments than the negative ones, but not the other way around, so I continue to think that the positives are better known than the negatives.

undefined @ 2016-12-22T17:03 (+2)

Hypothesis: there's lots of good, informal meta work to be done, like telling convincing your aunt to donate to GiveWell rather than Heiffer International or your company to do a cash fundraiser rather than a canned food drive. But the marginal returns diminish real quickly: once you've convinced all the relatives that are amenable, it is really hard to convince the holdouts or find new relatives. But the remaining work isn't just lower expected value, it has much slower, more ambiguous feedback loops, so it's easy to miss the transition.

Object level work is hard, and there are few opportunities to do it part time. Part time meta work is easy to find and sometimes very high value. My hypothesis is that when people think about doing direct work full time, these facts conspire to make meta work the default choice. In fact full time meta work is the most difficult thing, because of poor feedback loops, and the easiest to be actively harmful with, because you risk damaging the reputation of EA or charity as a whole.

I think we need to flip the default so that people look to object work, not meta, when they have exhausted their personal low hanging fruit.

undefined @ 2016-12-21T08:55 (+2)

… but people are aware of the problem and tackle it (research into the probabilities of various existential risks, looking for particularly neglected existential risks such as AI risk). I haven't seen anything similar for meta organizations.

I’m closest to the EA Foundation and know that their strategy rests to a great part on focusing on hard-to-quantify high risk–high return projects because these are likely to be neglected. I don’t know if other metaorganizations are doing something similar, but it is possible.

Imagine that Alice will now have an additional $2,000 of impact, and each organization spent $1,000 to accomplish this. Then each organization would (correctly) claim a leverage ratio of 2:1, but the aggregate outcome is that we spent $5,000 to get $2,000 of benefit, which is clearly suboptimal. These numbers are completely made up for pedagogical purposes and not meant to be actual estimates. In reality, even in this scenario I suspect that the ratio would be better than 1:1, though it would be smaller than the ratio each organization would compute for itself.

Yes. Good point and another reason fund ratios are silly (and possibly toxic). The other one is this one. I’ve written an article on a dangerous phenomenon that has been limiting the work in some cause areas that is also related to this attribution problem.

undefined @ 2016-12-21T01:46 (+2)

Re#6: The only object-level cause discussed is global poverty and health interventions. However, other object-level causes seem much more structurally similar to meta-level work. For instance, this description would seem to hold true of much of the work on X-risk:

[They] typically have many distinct activities for the same goal. These activities can have very different cost-effectiveness. The marginal dollar will typically fund the activity with the lowest (estimated) cost-effectiveness, and so will likely be significantly less impactful than the average dollar

Hence insofar as this is an issue (though see Rob's and Ben's comments) it's not unique to meta-level work.

Re #7: Much of the impact from current work on X-risk plausibly derives from getting more valuable actors involved. Since there are several players in the EA X-risk space, this means that it may be hard to estimate which EA X-risk org caused more valuable actors to get involved in X-risk, just like it may be hard to estimate which EA meta-org caused EA movement growth. Thus this problem doesn't seem to be unique to meta-orgs. (Also, I agree with Ben that one would like to see detailed case arguing that these are actually problems, rather than just pointing out that they might be problems.)

This points to the fact that much of the work within object-level causes is "meta" in the sense that it concerns getting more people involved, rather than in doing direct work. However, it is not "meta" in the sense used in this post. (Ben discussed this distinction in his reply to Hurford - see his remark on 'second level meta'.)

Generally, I think that the discussion on "meta" vs "object-level" work would gain from more precise definitions and more conceptual clarity. I'm currently working on that.

Re#8

I think that with most object-level causes this is less of an issue. When RCTs are conducted, they eliminate the problem, at least in theory (though you do run into problems when trying to generalize from RCTs to other environments).

I don't understand why global poverty and health RCTs (which I suppose is what you refer to) would make a difference. What you're discussing is whether someone's investment makes a difference, or whether what they're trying to do would have occurred anyway. For instance, whether them donating to AMF leads to less people dying from malaria. I think that's plausibly the case, but the question of RCTs vs other kinds of evidence - e.g. observational studies - seems orthogonal to that issue.

I think that this is a problem in far future areas (would the existential risk have happened, or would it have been solved anyway?), but people are aware of the problem and tackle it (research into the probabilities of various existential risks, looking for particularly neglected existential risks such as AI risk).

Current neglectedness of an existential risk is not necessarily a good guide to future neglectedness. Hence focussing on currently neglected risks does not guarantee that you have a large counterfactual impact.

I'm currently looking into the issue of future investment into X-risk and there doesn't seem to be that much research done on it, so it's not clear to me that people have tackled this problem. It's generally very difficult.

It thus seems to me the situation regarding X-risk is quite analogous to that regarding meta-work on this score, too, but I am not sure I have understood your argument.

Re#3 (which you support, though you don't comment on it further):

To the contrary, meta-work can be a wise choice in face of uncertainty of what the best cause is. Meta-work is supposed to give you resources which can be flexibly allocated across a range of causes. This means that if we're uncertain of what object-level cause is the best, meta-work might be our best choice (whereas being sure what the best cause is is a reason to work on that cause instead of doing meta-level work).

One of the more ingenious aspects of effective altruism is that it is fits an uncertain world so well. If the world were easy to predict, there would be less of a need for a movement which can shift cause as we gather more evidence of what the top cause is. However, that is not the world we're living in, as, e.g. the literature on forecasting shows.

Given what we know about human overconfidence, I think there is more reason to be worried that people are overconfident about their estimates of the relative marginal expected value of object-level causes, than that they withhold judgement of what object-level cause is best for too long.

undefined @ 2016-12-20T23:22 (+2)

"On average the cost to 80,000 Hours of a significant plan change is £1,667."

80,000 Hours' average cost per plan change this year was more like £300. It's a good example of increasing returns to scale/quality being a more important factor than shrinking opportunities - the marginal cost is way lower than the long term average.

The cost of getting an extra plan change by running additional workshops is about the same, which suggests the short term marginal is about the same as the short-term average.

undefined @ 2016-12-21T08:25 (+1)

80,000 Hours' average cost per plan change this year was more like £300.

Yeah, I was looking for this year's numbers because I figured it would have gone below £1000. But that's impressively low, damn. One more point to economies of scale.

The cost of getting an extra plan change by running additional workshops is about the same, which suggests the short term marginal is about the same as the short-term average.

What drove the improvement to £300?

undefined @ 2016-12-21T20:33 (+1)

The online career guide and workshop keep getting better, and we've become more skilled at promoting them. The fixed costs of producing it all are now spread over a much larger number of readers (~1 million a year).