Open Phil EA/LT Survey 2020: Introduction & Summary of Takeaways
By Eli Rose @ 2021-08-19T01:01 (+91)
Introduction
Between September and November 2020, Open Philanthropy ran a survey of ~200 people, most of whom were involved in or interested in longtermist work. Our aim was to get a better understanding of a) these people and b) the extent to which various community-building interventions have or haven’t been important to getting them involved and helping them have more impact. Ultimately, the goal was to make better-informed grantmaking decisions to support these people and increase their number. This project was done as part of our work in our EA community-building program area.
In contrast to the 2020 EA Survey run by Rethink Priorities, our survey was invitation-only. Invites were sent mostly to people we, or our advisors in various longtermist focus areas, knew and thought were likely to have careers with particularly high expected altruistic value (though our judgments here are very rough and we expect them to have missed a lot of such people; see more about this); as a result, it was taken by about 10x fewer people than the 2020 EA Survey. It was also more in-depth: we asked each respondent a lot of questions, including questions about their life stories and about specific organizations. And, since our goal was to get fairly up-to-date information about what interventions are affecting people in this space, we focused on people who we thought were likely to have been influenced fairly recently (if they were influenced at all).
This sequence of posts gives an overview of our survey and presents our findings. “I” throughout refers to me, Eli Rose. I did this work in my capacity as a Program Associate at Open Phil. “We” refers to me and Claire Zabel. Claire leads Open Phil’s grantmaking in this area; she project-managed this work.
We’re sharing this report only with the goal of informing others who are interested in our findings; it doesn’t include any updates on Open Phil grantmaking policy.
There are seven posts in this sequence:
- Introduction & Takeaways, the post you are currently reading, which will go on to briefly discuss what we think are the main takeaways from this survey and provide links to later posts which describe them in more detail.
- Methodology, which describes how we conducted the survey.
- Respondent Info, which presents information about what cause areas our respondents are involved in and what types of jobs they have or seem on track to have, as well as info about their educational backgrounds and self-reported level of engagement in EA.
- What Helped and Hindered Our Respondents, which describes the different ways we asked, in this survey, the basic question “What was important in your journey towards longtermist priority work?”, how those different questions gave rise to the different quantitative metrics, and how we recommend interpreting those metrics (which inform some of the results we state here).
- EA Groups, which describes results about how different types of EA meetup groups seem to be having impact on our respondents.
- How Our Respondents First Learned About EA/EA-Adjacent Ideas, which describes results about how and when our respondents first came into contact with EA, rationalist, etc. ideas.
- Other Findings, which discusses findings from miscellaneous other questions we asked in the survey.
You can read this whole sequence as a single Google Doc here, if you prefer having everything in one place. Feel free to comment either on the doc or on the Forum. Comments on the Forum are likely better for substantive discussion, while comments on the doc may be better for quick, localized questions.
This public version of the report doesn’t include all the data we gathered or all the analysis we did. It’s intended primarily to convey some of what we think are the most important takeaways (and how we arrived at them), rather than to transparently report our complete findings in a way that would allow readers to do their own analysis on top of them. For example, we show only a portion of our full numerical results about which organizations, pieces of content, etc. seemed to have had the most positive impact on our respondents (see here), since for various reasons we didn’t want to publicly rank these items. We also decided to omit some results because they didn’t contribute to our main takeaways, because it seemed too difficult to share them while maintaining our respondents’ privacy, or because we thought there were other costs involved in sharing them. These results are included in a non-public version of the report. If you think having access to this non-public version might help you in your work, you can get in contact with us (eli.rose@openphilanthropy.org or claire@openphilanthropy.org) and we can discuss what else we might be able to share from it (though we can’t guarantee anything).
Methodological Notes and Limitations
Our survey has a time bias, such that it’s better at catching impacts which happened in some years than in others. We tried to bias towards surveying people who entered longtermist priority work fairly recently (because we care most about the current set of processes that cause this to happen), so we tend to miss impacts that happened to people who have been around for a longer time, and hence miss a disproportionate amount of impacts that happened a long time ago. On the other end, because we tried to survey people that our advisors knew pretty well, we tended to survey fewer people who got involved in longtermist priority work very recently, because these people had had less time to become well-known by our advisors. See Who We Surveyed for more details on this process. Overall, though we don’t have clean data on this, I think most of the impacts recorded in this survey happened between 2015 and 2018. So keep that in mind when interpreting our results, especially our results concerning organizations or pieces of content which have had most of their impact outside that window.
We don’t do significance testing or much other statistical analysis in this report; instead, we mostly look at means and medians, make graphs, and present qualitative data. Our sense was that this approach was the best one for our dataset, where the n on any question was at maximum 217 (the number of respondents) and often lower, though we’re open to suggestions about ways to apply statistical analysis that might help us learn more. (As non-experts in statistics, we’re more enthusiastic about simple and transparent methods.)
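To make that concrete, here is a minimal, purely hypothetical sketch of the kind of simple summary described above (the column names and numbers are invented for illustration and aren’t the survey’s actual data or schema):

```python
import pandas as pd

# Hypothetical long-format data: one row per (respondent, item) impact rating.
# These column names and values are invented for illustration only.
ratings = pd.DataFrame({
    "respondent_id": [1, 1, 2, 2, 3, 3],
    "item": ["Org A", "Book B", "Org A", "Group C", "Book B", "Group C"],
    "impact_score": [3, 1, 2, 0, 2, 1],
})

# Simple, transparent summary per item: mean, median, and number of raters.
summary = (
    ratings.groupby("item")["impact_score"]
    .agg(["mean", "median", "count"])
    .sort_values("mean", ascending=False)
)
print(summary)
```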
Relatedly, this report often presents results that are based on fairly small sample sizes. The results would be more robust if sample size were larger. But the type of person we’re trying to study (quite-promising, “recent”-ish people influenced by EA/EA-adjacent ideas and doing longtermist priority work; see here for more detail) is, as far as we know, rare enough in the world that there just isn’t that much data to get, and rare enough that what we got represents a substantial fraction of the total available. Also, since the average survey respondent seems to us to represent a lot of expected altruistic value, something that e.g. just 5 survey respondents say meaningfully affected them seems worth knowing about — even apart from the extent to which it suggests a larger trend.
Another big limitation of our survey is that it’s primarily data about success stories. It captures very little about people who started down the path to doing longtermist priority work but then veered off of it, or did not start down the path but could have under different conditions. Consequently, we expect it to reveal less (though not zero) information about the extent to which the influences we studied are repelling people from EA/EA-adjacent ideas and communities — since people who were wholly repelled wouldn’t have ended up in our respondent pool (though people who were partially repelled might have).
Sometimes, we’ll compare our results to results from the 2020 EA Survey. That survey’s results weren’t yet published at the time I was writing up this report, but Rethink Priorities shared some unpublished data with us for use in comparisons. I’ll link to the public results when possible.
Summary of Takeaways
In this section, I’ve attempted to summarize what I think are the most important insights from this project. If you have limited time, this is a good section to read first.
Reasonable people who are all looking at our raw data might disagree about some of our takeaways. This section is focused on expressing what we think we’ve learned, while the rest of the report focuses more on explaining how we came to think we’ve learned it. Each takeaway links to a section in a later post which provides more elaboration about what the claim means and how we arrived at it.
Some of the takeaways mention “impact on our respondents.” When I say that something “has a lot of impact on our respondents,” I’m talking about answers our respondents gave to survey questions that asked those respondents what they think most increased their expected positive impact on the world. At a high level, X “having a lot of impact on our respondents” means that, according to these answers, the total increase in our respondents’ expected positive impact on the world due to X was high. There are some more subtleties (we asked these questions in a few different ways, which gave rise to a few different “impact metrics” or ways of determining total impact on our respondents) — see this post for details.
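Before the takeaways themselves, here is a purely illustrative sketch of what aggregating such answers into category-level shares could look like (the items, categories, and point values below are invented and don’t reflect our actual metric definitions or results):

```python
import pandas as pd

# Invented example: total "impact points" attributed to each item across respondents.
# Items, categories, and values are hypothetical, not survey results.
items = pd.DataFrame({
    "item": ["Org A", "Org B", "Book C", "Group D"],
    "category": ["organization", "organization", "piece of content", "meetup group"],
    "impact_points": [120.0, 80.0, 60.0, 40.0],
})

# Share of total reported impact captured by each category.
category_share = (
    items.groupby("category")["impact_points"].sum()
    / items["impact_points"].sum()
)
print(category_share.round(2))
```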
- Before doing this survey, we would have thought it plausible that a small number of community-building efforts would have had >100x more impact on our respondents than the rest of the community-building efforts. But according to the data we gathered, this doesn’t look like the situation. The distribution of impact across efforts looks substantially flatter, though there are still relatively large differences. (see more.)
- Among the 34 EA/EA-adjacent items we explicitly asked about, organizations got 60-70% of the total impact reported by our respondents, pieces of content (books, blog posts, etc.) got 20-30%, and meetup groups/discussion platforms (which includes university and local or national EA groups) got 10-20%. (see more.)
- When “interactions with specific people” (focused on direct interactions, e.g. conversations, not e.g. reading something a person wrote) is additionally introduced into the above categories, the credit breakdown is roughly 40% organizations, 30% specific people, 15% pieces of content, and 10% meetup groups/discussion platforms. This is according to just one metric based on a question about the few largest impacts on each respondent. (see more.)
- There are some individuals who, through these types of interactions (e.g. giving advice, connecting others), have had an impact-according-to-this-metric competitive with whole organizations or the whole works of significant authors. But keep in mind that this is a conclusion reached only by one metric; see here for more thoughts on how to interpret such conclusions.
- Among the organizations, pieces of content, and meetup groups/discussion platforms we explicitly asked about, the ones that seemed to have been most impactful on our respondents robustly, across multiple metrics, were 80,000 Hours, CEA, local and university EA groups, FHI, Nick Bostrom’s works, Eliezer Yudkowsky’s works, and Peter Singer’s works. (see more.)
- A corollary of this is that the impact on our respondents of some individuals’ written work looks to have been on par with, or greater than, that of whole organizations.
- OpenAI, CFAR, and LessWrong seemed to occupy a “second tier” of impact; they looked to have had as much impact as, or more than, the items above according to some metrics (and/or some versions of those metrics), but less according to others or other versions.
- There was a pattern where organizations, pieces of content, and meetup groups/discussion platforms associated with rationalist ideas tended to get negative responses more often than non-rationalist ones, though there were exceptions. Compared with other items, respondents more frequently said both that they felt negatively about these organizations etc. and that these organizations etc. had negatively impacted them. This is distinct from the question of what had the most net impact on our respondents (some of these things had fairly high net impact, regardless). (see more.)
- Among EA groups, university groups looked to have had more impact than non-university groups. University groups got between 65 and 75% of the total impact from all groups. (see more.)
- University groups at top-ranked universities looked to have been very impactful. The Oxford, Cambridge, Stanford, and Harvard groups alone got between 40% and 55% of the total impact from all groups. (see more.)
- Impact was very unevenly distributed among groups at top-15 universities. Although this is confounded by some groups being more recently founded than others, it suggests that there might currently be (or might have been in the past) room to have significantly more impact at some universities like Princeton and MIT. Oxford, Cambridge, Stanford, and Harvard combined got the vast majority of the impact from groups at top-15 universities, according to the impact points metric. (see more.)
- On average, our respondents reported first hearing about EA/EA-adjacent ideas when they were 20 years old, and 9% of them said they first heard about the ideas when they were 16 or younger. (see more.)
- Respondents, on average, said that they thought the best age for them personally to have heard about EA/EA-adjacent ideas was age 16, in contrast to the average age that they actually did first hear about these ideas, which was 20. (see more.)
- See other data from the survey about early outreach such as high-school outreach.
Note that we see our results from this survey as inputs into our models about what is having an impact — not as a wholesale replacement for them. They are part of the many types of evidence we take into account when doing grantmaking.
Jonas Vollmer @ 2021-08-19T15:50 (+30)
I think it's really cool that you're making this available publicly, thanks a lot for doing this!
MarkusAnderljung @ 2021-08-19T21:53 (+6)
Came here to say the same thing :)
reallyeli @ 2021-08-20T03:41 (+2)
Ah, glad this seems valuable! : )
David_Moss @ 2021-08-19T10:41 (+24)
We don’t do significance testing or much other statistical analysis in this report... Our sense was that this approach was the best one for our dataset, where the n on any question was at maximum 217 (the number of respondents) and often lower, though we’re open to suggestions about ways to apply statistical analysis that might help us learn more.
Because you have so much within-subjects data (i.e. multiple datapoints from the same respondent), you will actually be much better powered than you might expect with ~200 respondents. For example, if you asked each respondent to rate 10 things (and for some questions you actually asked them to rate many more) you'd have 2000 datapoints and be better able to account for individual differences.
You might, separately, be concerned about the small sample size meaning that your sample is not representative of the population you are interested in: but as you observe here, it looks like you actually managed to sample a very large proportion of the population you were interested in (it was just a small population).
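To make that suggestion concrete, here’s a minimal sketch of the kind of within-subjects analysis this points at (a random-intercept mixed model; all column names and values are made up for illustration):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format ratings: each respondent rates several items.
# Column names and values are invented for illustration only.
data = pd.DataFrame({
    "respondent_id": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    "item": ["A", "B", "C"] * 4,
    "rating": [3, 1, 2, 2, 0, 1, 3, 2, 2, 1, 1, 0],
})

# Random intercept per respondent: comparisons between items then draw on the
# within-respondent structure of the data, not just the number of respondents.
model = smf.mixedlm("rating ~ C(item)", data, groups=data["respondent_id"])
result = model.fit()
print(result.summary())
```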