Impactful Forecasting Prize Results and Reflections

By elifland, Misha_Yagudin @ 2022-03-29T16:16 (+40)

The Impactful Forecasting Prize contest ran from February 4 to March 11, 2022. Below we announce the results and share some reflections. Please share any feedback with us in the comments or privately.

Results

Thank you to all participants and congratulations to the winners! The writeups below are from Eli’s perspective, as the winners were from submissions he primarily judged. 


$1,500: RyanBeck for a writeup on whether genetic engineering will raise IQ by >= 10 points by 2050


I appreciated Ryan’s literature review, which was densely packed with helpful background information, and his overall forecast seemed reasonable, though I found the way the scenarios were presented a little unintuitive. I updated my forecast from 56% to 50% after reading it, with significantly greater resilience [1] than before. Kudos to Ryan for releasing an updated version of his writeup in the Metaculus Journal.


$1,000: qassiov for a writeup on whether synthetic biological weapons will infect 100 people by 2030


I liked qassiov’s decomposition into non-state actors, state actor accidents, and state actor intentional usage. I also liked their attention to detail on the resolution criteria, e.g. noticing that selective breeding would not count for a pathogen to be considered synthetic. qassiov is skeptical that technology will lower the barrier to entry much in the next 8 years, but I am more uncertain; I’m not sure how much to trust the source they cited here, and it may be out of date. I updated from 41% to 35%.

$800: FJehn for a writeup on when carbon capture will cost <$50/ton


FJehn presented a brief but compelling case that scientists are much more pessimistic than a few companies about bringing carbon capture costs down. This updated me from a median of 2039 to 2055, albeit still with somewhat low resilience.


$700: rodeoflagellum for a writeup on how many gene-edited babies will be born by 2030


I found some of the background information in the writeup helpful. The reminder that gene editing to treat diseases is more widely supported than editing for enhancement was useful, with the caveat that the WHO is still fairly critical of the former. The approach of decomposing into scenarios was interesting, and thinking about the framing made me narrow the right tail of my distribution.

Reflections

Level of interest

The quantity of submissions was lower than we hoped for: we received 13 submissions from 8 unique forecasters, while we had aimed for 100 submissions from 25 forecasters. On the other hand, the average quality of submissions was higher than we expected. Forecasters may have preferred to either spend lots of time to have a good shot at winning a prize, or not participate at all.


Another possible reason for lower quantity is the focus of the forecasting community on the Ukraine Conflict (e.g. on Metaculus) throughout much of the time the prize was open for submissions. To the extent this shift happened, much of it was likely deserved, e.g. we were heartened to see predictions on Metaculus alerting at least one Ukrainian observer to evacuate.


Some ideas for how we could have gotten more submissions, in case it’s helpful for organizers of future prizes:

  1. Be ambitious in who you reach out to for promotion, and do it early. We reached out to Scott Alexander and were mentioned in an ACX open thread, but we waited until over halfway through the submission window to try. We likely should have reached out to more people with large audiences, earlier.
  2. Launching near the end of the month and/or running the competition for at least ~2 months seems preferable due to newsletter timing at the beginning of each month. We launched on Feb 4, then were linked in several newsletters in early March, giving people little notice before the March 11 deadline.


Let us know if you have other ideas for how we could have done better, especially if you considered submitting but didn’t. Help us improve our hypotheses on the barriers to more people forecasting, both in general and for our contest! Feel free to share either in the comments or in this private and optionally anonymous form.

Evaluating submissions

Evaluating submissions took more time than we had budgeted for, in part because the average length of submissions was longer than expected, but also because, as we started judging, we realized the evaluation criteria needed to be refined.[2]
 

We originally planned to judge submissions mostly based on how much they changed our best-guess forecast, but we realized that this doesn’t capture all we care about. It also may have created misaligned incentives, in particular the incentive to compose a one-sided writeup to convince the judge to update as much as possible in one direction.[3]

Reflecting on the value of a forecast writeup, we identified 3 components of value:

  1. Improving the community’s or judge’s best guess forecast
    1. We approximated this by dividing the extent to which the judge’s best-guess forecast shifted by the initial resilience of the judge’s forecast.
  2. Increasing the resilience of the forecast: When making decisions based off of a forecast, it’s helpful to have more confidence in the answer, even if the further research needed to gain that confidence doesn’t shift the best-guess forecast further.
    1. We subtracted the judge’s estimated initial resilience from their updated resilience.
  3. Providing a foundation for others to build off of: High-quality comments, especially those that provide background research, clean decompositions, and other reusable components, are useful in that future forecasts can build off of them without as much effort.
    1. We approximated this with a subjective quality rating.

 

To rank submissions, we took into account these 3 components with approximately equal weight.
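
For concreteness, a scoring rule along these lines could look like the minimal sketch below. The 0–1 scales for resilience and quality, the use of a simple average to implement equal weighting, and the function name are illustrative assumptions for the sake of the example, not the exact procedure we followed.

```python
def score_submission(initial_forecast, updated_forecast,
                     initial_resilience, updated_resilience,
                     quality_rating):
    """Rough sketch of combining the three components with equal weight.

    Forecasts are probabilities in [0, 1]; resilience and quality are
    subjective ratings in [0, 1]. These scales are illustrative
    assumptions, not the exact scheme used in the contest.
    """
    # 1. Best-guess improvement: size of the judge's forecast shift,
    #    divided by the initial resilience of the judge's forecast.
    shift_component = (abs(updated_forecast - initial_forecast)
                       / max(initial_resilience, 0.01))

    # 2. Resilience gain: updated resilience minus initial resilience.
    resilience_component = updated_resilience - initial_resilience

    # 3. Foundation for others: subjective quality rating of the writeup.
    quality_component = quality_rating

    # Approximately equal weight on each component.
    return (shift_component + resilience_component + quality_component) / 3
```

In practice our ratings were more holistic than a single formula like this, but the sketch captures how the three components entered the ranking.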

 

  1. ^

    A forecast is more resilient if I’d expect it to move less with more research. See https://forum.effectivealtruism.org/posts/m65R6pAAvd99BNEZL/use-resilience-instead-of-imprecision-to-communicate.

  2. ^

    And of course also due to the ever-present planning fallacy.

  3. ^

    We didn’t necessarily observe this during the contest, though it would be hard to know if it occurred.


Ryan Beck @ 2022-03-29T17:12 (+15)

This was a cool contest, thanks for running it! In my view there's a lot of value in doing this. Doing a deep dive into polygenic selection for IQ was something I had wanted to do for quite a while and your contest motivated me to finally sit down and actually do it and to write it up in a way that would be potentially useful to others.

I think your initial criteria of how much a writeup changed your minds may have played a role in fewer than expected entries as well. Your forecasts on the set of questions seemed very reasonable and my own forecasts were pretty similar on the ones I had forecasted, so I didn't feel that I had much to contribute in terms of the contest criteria for most of them.

Hopefully that's helpful feedback to you or anyone else looking to run a contest like this in the future!

Misha_Yagudin @ 2022-03-29T17:37 (+3)

Yes, thank you; that makes sense and is very helpful!

elifland @ 2022-03-29T17:52 (+2)

Thanks for sharing Ryan, and that makes sense in terms of another unintended consequence of our judging criteria; good to know for future contests.

Ryan Beck @ 2022-03-29T19:18 (+1)

No problem!

Also if you're interested in elaborating about why my scenarios were unintuitive I'd appreciate the feedback, but if not no worries!

elifland @ 2022-03-31T14:50 (+2)

At first I thought the scenarios were separate so they would be combined with an OR to get an overall probability, which then made me confused when you looked at only scenario 1 for determining your probability for technological feasibility.

I was also confused about why you assigned 30% to polygenic scores reaching 80% predictive power in Scenario 2 while assigning 80% to reaching saturation at 40% predictive power in Scenario 1. When I read "80% to reach saturation at 40% predictive power", I read this as "capping out at around 40%", which would only leave a maximum of 20% for scenarios with much greater than 40%.

Finally, I was a little confused about where the likelihood of iterated embryo selection fit into your scenarios; this seems highly relevant/important and is maybe implicitly accounted for in e.g. "Must be able to generate 100 embryos to select from"? But could be good to make more explicit.

Ryan Beck @ 2022-03-31T18:34 (+2)

These are good points and helpful, thanks! I agree I wasn't clear about treating the scenarios as exclusive in the initial comment; I think I made that a little clearer in the follow-up.

when I read 80% to reach saturation at 40% predictive power I read this as "capping out at around 40%" which would only leave a maximum of 20% for scenarios with much greater than 40%?

Ah I think I see how that's confusing. My use of the term saturation probably confuses things too much. My understanding is saturation is the likely maximum that could be explained with current approaches, so my forecast was an 80% chance we get to the 40% "saturation" level, but I think there's a decent chance our technology/understanding advances so that more than the saturation can be explained, and I gave a 30% chance that we reach 80% predictive power.

That's a good point about iterated embryo selection, I totally neglected that. My initial thought is it would probably overlap a lot with the scenarios I used, but I should have given that more thought and discussed it in my comment.

FJehn @ 2022-03-29T17:57 (+11)

Oh wow, I did not really enter to win anything. I just participated because I thought the idea was really cool and it gave me a good opportunity to dive into a variety of topics. A pleasant surprise :)

I am a bit surprised by how few people participated. If I remember correctly, 4/13 submissions were by me. I talked about this prize with several people and all seemed eager to participate, but apparently they didn't. So, I am not sure if the lack of forecasters is due to too little promotion (though more would probably have helped as well). Seems like there is a large gap between "I like the idea of this" and really sitting down and participating.

However, I hope you do something like this again, as it helped me a lot to have a selection of meaningful questions to do forecasts on.  Just opening Metaculus can be a bit overwhelming. 

If this happens again, I'll try to hold the people that like the idea accountable, so that they do a forecast and not only think about it. 

qassiov @ 2022-03-31T17:54 (+6)

Just saw this, thank you! I'm honoured to be listed alongside these insightful forecasters — and thank you for running it. I found this competition very fun and motivational, and think this kind of thing works well for fixing some incentive problems. 

I can't say why more people didn't submit, but I can say what did help prompt me to submit two rationales:

Finally, I expect another reason the timing may have led to fewer entries is that Future Fund applications were due 10 days later (that seems like a long time, but I know that its approaching deadline was on my mind when I submitted my rationales).

jwithing @ 2022-03-29T22:37 (+5)

I didn’t realize this was going on—I gotta check these forums more often! I love the transparency provided in your reflections here.

Anthony Repetto @ 2022-03-31T12:44 (+1)

There are a few blind-spots with this competition, and I am sitting where they intersect:

  1. The Ethical Demand to Stop Bad Predictions - If I have a prediction that "this plane will crash", then it is wrong of me to make money off of that catastrophe, by allowing it to happen. I would be an accomplice. So, if I predict a negative outcome, I will attempt to prevent that outcome. Necessarily, my success would make my prediction false, yet it would be false for the wrong reason.
  2. Who Creates and Chooses the Topics - Metaculus is generating the topics, and then your team is curating them. This leaves out everyone who has a unique prediction. [For example: "Brick-laying robots will cut house construction costs, which will ripple out quickly due to 'comps', the way real estate is valued. That's when real estate investors stop buying; it's a falling knife. As a result, most Americans will be underwater on their mortgage in a decade." Where do I tell people that prediction? How does it pass your curation?]

It's a troublesome sign when the boss of the company says "Here is THE problem - how do you solve it?" Better when the boss asks "What's the problem?" I don't see forecasting succeeding at its aim if it fails to be receptive to unique outsider predictions, or if those are lost in a pile due to curation by tiny groups. And waiting for bad things to happen just to make a few bucks sounds like becoming a henchman to me. Who wants to avert calamity, instead?