$5k challenge to quantify the impact of 80,000 hours' top career paths
By NunoSempere @ 2022-09-23T11:32 (+126)
Motivation
80,000 hours has identified a number of promising career paths. They have a fair amount of analysis behind their recommendations, and in particular, they have a list of top ten priority paths.
However, 80,000 hours doesn’t quite[1] have quantitative estimates of these paths' value. Although their usefulness would not be guaranteed, quantitative estimates could make it clearer:
- how valuable their top career paths are relative to each other
- how valuable their top career paths are relative to options further down their list
- at which level of personal fit one should switch between different career paths[2]
- where the expected impact is coming from, and which variables we are most uncertain about
- eventually, whether certain opportunities are valuable in themselves or for the value of information or career capital that they provide
- etc.
The Prize
Following up on the $1,000 Squiggle Experimentation Challenge and the Forecasting Innovation Prize we are offering a prize of $5k for quantitative estimates of the value of 80,000 hours' top 10 career paths.
Rules
Step 1: Make a public post online between now and December 1, 2022. Posts on the EA Forum (link posts are fine) are encouraged.
Step 2: Complete this submission form.
Further details
- Participants can use units or strategies of their choice—these might be QALYs, percentage points of reduction in existential risk, basis points of the future, basis points of existential risk reduced, career-dependent units, etc. Contestants could also use some other method, like relative values, estimating proxies, or some original option.
- We are specifically looking for quantitative estimates that attempt to estimate some magnitude reasonably close to the real world, similar to the units above[3]. So for example, assigning valuations from 0 to 5 stars would not fulfil the requirements of the contest, but estimates in terms of the units above would qualify.
- Participants are free to estimate the value of one, several, or all ten career paths.
- Participants are free to use whatever tool or language they want to produce these estimates. Some possible tooling might be: Excel, Squiggle, Guesstimate, probabilistic languages or libraries (e.g., Turing.jl, PyMC3, Stan), Causal, working directly in a popular programming language, etc.
- Participants can provide point estimates of impact, but they are encouraged to provide their estimates as distributions instead.
- Participants are free to estimate the impact of a marginal person, of a marginal person with a good fit, the average value, etc. Participants are welcome to provide both average and marginal value—for example, they could provide a function which provides an estimate of marginal value at different levels of labor and capital.
We provide some examples of possible rough submissions in an appendix. We are also happy to comment on estimation strategies: feel free to leave a comment on this post or to send a message to Nuño Sempere using the EA forum message functionality.
Judging
The judges will be Nuño Sempere, Eli Lifland, Alex Lawsen and Sam Nolan. They will judge in their personal capacities, and their stances do not represent those of their organizations.
Judges will estimate the quality and value of the entries, and we will distribute the prize amount of $5k[4] in proportion to an equally weighted aggregate of those subjective estimates[5].
To reduce our operational burden, we are looking to send out around three to five prizes. If there are more than five submissions, we plan to implement a lottery system. For example, a participant who would have won $100 would instead get a 10% chance of receiving $1k.
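The lottery conversion described above preserves each participant's expected value. A minimal Python sketch (the function name and the $1k ticket threshold are illustrative assumptions, not part of the official rules):

```python
import random

def to_lottery(payout, ticket_value=1000):
    """Convert a small proportional payout into a lottery ticket of equal
    expected value: e.g., $100 becomes a 10% chance of winning $1,000."""
    if payout >= ticket_value:
        return payout  # large prizes are paid out directly
    win_probability = payout / ticket_value
    return ticket_value if random.random() < win_probability else 0
```

A participant owed $100 thus receives $1,000 with probability 0.1 and nothing otherwise, keeping the expected payout at $100 while reducing the number of transfers.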
Acknowledgements
This contest is a project of the Quantified Uncertainty Research Institute, which is providing the contest funds and administration. Thanks in advance to Eli Lifland, Alex Lawsen and Sam Nolan for their good judgments. Thanks to Ozzie Gooen for comments and suggestions.
Appendix: Example models
Example I: "Founder of new projects tackling top problems"
The following is a crude example estimate for the career path of Founder of new projects tackling top problems, written in Squiggle.
// Operational challenges
chanceOfAttainingTrustAndGettingFunding = beta(5, 30) // t(0.05 to 0.3, 1)
chanceOfGettingAnOrganizationOffTheGround = beta(10, 10) // t(0.3 to 0.8, 1)

// Estimate of impact
yearlyOrganizationFunding = mx(50k to 500k, 200k to 10M, 5M to 50M, [0.65, 0.25, 0.05])
giveDirectlyValueOfQALYsPerDollar = 1/(160 to 2700) // taken from one of Sam Nolan's estimates: <https://observablehq.com/@hazelfire/givewells-givedirectly-cost-effectiveness-analysis>
organizationValueMultiplier = mx(
  [0.1 to 1, 1 to 8, 8 to 80, 80 to 320, 320 to 500k],
  [4, 8, 4, 2, 1]
)
// very roughly inspired by: https://forum.effectivealtruism.org/posts/GzmJ2uiTx4gYhpcQK/effectiveness-is-a-conjunction-of-multipliers
shapleyMultiplier = 0.2 to 0.5
lifetimeOfOrganization = mx(2 to 7, 5 to 50)

// Aggregate
totalValueOfEntrepreneurshipInQALYs = chanceOfAttainingTrustAndGettingFunding *
  chanceOfGettingAnOrganizationOffTheGround *
  yearlyOrganizationFunding *
  giveDirectlyValueOfQALYsPerDollar *
  organizationValueMultiplier *
  lifetimeOfOrganization *
  shapleyMultiplier

// Aggregate with maximums
t(dist, max) = truncateRight(dist, max)
totalValueOfEntrepreneurshipInQALYsWithMaxs =
  chanceOfAttainingTrustAndGettingFunding *
  chanceOfGettingAnOrganizationOffTheGround *
  t(yearlyOrganizationFunding, 500M) *
  giveDirectlyValueOfQALYsPerDollar *
  t(organizationValueMultiplier, 10M) * // the overall estimate is really sensitive to this maximum
  lifetimeOfOrganization *
  t(shapleyMultiplier, 1)

// Display
{
  totalValueOfEntrepreneurshipInQALYsWithMaxs:
    totalValueOfEntrepreneurshipInQALYsWithMaxs
}
On its own, the estimate might be too opaque, so it is best accompanied by an explanation of the estimation strategy. In this case, the strategy is:
- To estimate the chance of getting funding and then getting an organization off the ground
- This is based on subjective guesses. Perhaps Charity Entrepreneurship, or EA Funds if it kept data, could provide better estimates
- To estimate the value that an organization produces. This is the weakest part of the model, and it would be better if it were based on specific steps. Instead, we are using more of a "black box" model, and estimating:
- The funding that the organization would receive
- The QALYs per dollar that a reference organization—GiveDirectly—produces, taken from Sam Nolan’s estimate thereof.
- The advantage over GiveDirectly that the new organization would have. We are getting this estimate from this EA forum post.
- To estimate some other factors to go from the above to the total output, again based on pretty subjective estimates:
- The lifetime of the organization
- The "Shapley multiplier" penalizes efforts which require more people. In this case, we are saying that the founder gets between 20% and 50% of the impact.
We also have to make sure that not only the 90% confidence intervals but also the overall shape of the estimates is correct. For this reason, we include a step that truncates some of them.
As mentioned, a key input of the model is the impact multiplier over GiveDirectly, but this is based on black-box reasoning. This could be a point of improvement: for example, we could instead estimate how many QALYs, or what percentage of the future, a speculative area like AI safety research is worth.
Example II: Value of global health charities
There are various distributional models of global health charities in the EA forum that participants may want to take some inspiration from, e.g.:
- Dan Wahl’s estimate of the cost-effectiveness of LEEP
- Sam Nolan’s cost-effectiveness models
The advantage of these is that they can be pretty clean. The disadvantage is that they come from a different cause area.
Example III: Value of the Centre for the Governance of AI
Here, I give an estimate for the value of the Centre for the Governance of AI (GovAI) in terms of basis points of existential risk reduced. It might serve as a source of inspiration. One disadvantage is that it only considers one particular pathway to impact that GovAI might have, and it doesn't consider other pathways that might be more important—e.g., field-building.
Example IV: Value of ALLFED
Historically, one of the few longtermist organizations that has attempted to estimate its own impact quantitatively is ALLFED. A past estimate of theirs can be seen here. My sense is that the numeric estimates might have been on the optimistic side (some alternative numbers here). But the estimation strategy of dividing their influence and impact depending on different steps might be something to take inspiration from.
- ^
80,000 hours, when thinking about their own impact, internally use "discounted impact-adjusted peak year" (DIPY). But this seems like a fairly coarse unit.
- ^
This is actually more nuanced. There might be some frustration about people quickly/naïvely jumping to whatever cause or sub-cause has the best apparent marginal value at each point in time rather than committing to something. But this might be counterproductive if people have more impact staying in one place, or if impact is a combination of people working on different areas. For a specific example, suppose that impact is a Cobb–Douglas function of work in different areas, and that there are some coordination inefficiencies. Then focusing on attaining the optimal proportion of people in each area might be better than aiming to estimate marginal values through time.
- ^
The criterion isn't exactly to have a unit such that 2x that unit is twice as good. For example, percentage reductions of existential/catastrophic risk in the presence of several such risks aren't additive, but we would accept such estimates. Similarly, relative values can only be translated to magnitudes in an "additive" unit with a bit of work, but we would also accept such estimates.
- ^
Having a fixed pot is slightly less elegant than deciding beforehand on an amount to reward for a given level of quality, but the latter would come with added operational burden and budget uncertainty.
- ^
For example, if we get two submissions and we estimate the first one to be twice as valuable as the second one, the first submission would receive $3.33k and the second would receive $1.67k. Instead, if the first submission's individual estimates were twice as valuable, but also twice as many in number as those of the second submission, the first one would receive $4k and the second one would receive $1k.
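The payout rule in this footnote can be sketched as follows (the function name is mine):

```python
def allocate_pot(value_estimates, pot=5000.0):
    """Split a fixed pot in proportion to the judges' aggregated value
    estimates for each submission."""
    total = sum(value_estimates)
    return [pot * v / total for v in value_estimates]

# Two submissions, the first twice as valuable: roughly $3,333 and $1,667.
# First worth 2x per estimate with 2x as many estimates: a 4:1 split.
```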
merilalama @ 2023-03-02T13:20 (+8)
FYI, the lottery system described above as a means of distributing prizes is illegal in the US (states have a monopoly on random lotteries, unfortunately)
Basically, if a payment system involves three features together, it will likely be illegal:
- Consideration (e.g. you need to spend money or time to be considered for the prize)
- Chance
- Prize (cash or anything of value)
The typical way around this is to get rid of consideration and give an alternative means of entry, e.g. “you can also enter by emailing this email” or similar. This is why promotions often have the “no purchase necessary” language in the US. There are ways around this, but typically you need to apply for a license.
[Credits to Abraham Rowe for catching this.]
merilalama @ 2023-03-02T20:43 (+1)
Also, to be clear, QURI didn't end up using this system to distribute prizes.
Ardenlk @ 2022-10-09T12:06 (+4)
Is it right that all submissions will receive some (expected) part of the pot?
NunoSempere @ 2022-10-09T21:37 (+8)
Not if their expected value as estimated by the judges is negative or zero. But essentially yes.
Arepo @ 2022-11-09T20:11 (+3)
This seems like a really cool idea. I'm interested to hear how this element of it works out, and whether you decide to use a similar approach in future.
It's presumably not worth it this time around, but if you do do it again, I as a potential entrant would be really interested in some kind of live tracking of the number of current entrants, and if possible other analytics about quality of existing submissions - basically anything that would help people gauge the EV of submitting.
NunoSempere @ 2022-11-10T13:24 (+3)
Only two submissions I know of now, one of which is a draft. Still seems like it has high EV.
Ardenlk @ 2022-10-09T22:04 (+1)
cool thanks!
James Ozden @ 2022-09-23T13:56 (+4)
Step 1: Make a public post online between now and November 1, 2022. Posts on the EA Forum (link posts are fine) are encouraged.
Step 2: Complete this submission form. Submissions are due December 1, 2022.
This doesn't make sense - people need over a month to submit their public post via your submission form? Or one of the dates is wrong?
NunoSempere @ 2022-09-23T14:29 (+4)
Thanks, fixed. December 1 is the correct date.
Katy Moore @ 2022-10-12T14:15 (+1)
Just noting the Airtable form still says October 1 too :)
NunoSempere @ 2022-10-12T15:43 (+2)
Thanks, changed as well.
NunoSempere @ 2022-10-29T14:08 (+3)
Heads up that I only know of one future submission (Vasco Grilo below), and no one has actually sent a submission yet. Seems like applying has decently high expected value.
Vasco Grilo @ 2022-10-24T14:16 (+1)
Thanks for the challenge!
We are also happy to comment on estimation strategies: feel free to leave a comment on this post or to send a message to Nuño Sempere using the EA forum message functionality.
Comments on this strategy to quantify the cost-effectiveness of operations management in high-impact organisations are welcome.
NunoSempere @ 2022-10-29T14:06 (+3)
Left a few comments on your doc, thanks!