Getting Actual Value from “Info Value”: Example from a Failed Experiment
By Nikola, tlevin, aL, NickGabs @ 2023-01-26T17:48 (+63)
TL;DR: Experiments that generate info value should be shared between EA groups. Publication bias makes it more likely that groups repeat each other’s mistakes. Doing things “for the info value” produces info value only insofar as 1) you actually try to do them well and 2) you process and share the information they generate. As an example, we present a failed experiment from Fall 2022, when we changed the format of our Precipice Reading Group; attendance was very low, we think because the structure felt impersonal and unaccountable. This, among other experiences, has taught us that personalization and accountability are very important for the success of a program. We invite other group organizers (or anyone else who does things “for the info value”) to share potentially helpful results of failed experiments in the comments.
How “Info Value” Experiments Lose Their Potential Value
EA group organizers often consciously change a program’s setup, or run an unusual program, “for the info value” – that is, they might not have high expectations that it will succeed, but “at least we’ll know if something like this works.” It is very valuable to try new things; EA groups have explored very little of the space of field-building programs, and good discoveries can be scaled up!
But we want to raise two caveats:
- Running things “for the info value” is often paired with “80-20’ing,” where organizers run low-overhead versions of programs and don’t sweat the details. But if you actually expect a program to be impactful, you might not want to “80-20 it” – you might want to make the operations smooth, pay lots of attention to participants, and so on. This means that even while piloting a program, you get much worse info value if the program is understaffed or otherwise badly run. If the program works anyway, that can indeed signal to organizers that an even better version could be great, but often the program goes poorly and fails to tell you whether a well-executed version would have worked. (We think the Precipice group described below executed everything except the format reasonably well, so it doesn’t fit in this category, but we have seen this failure mode a few times, including in our own programs.)
- People seem much more likely to post on the forum about their groups’ successful experiments than their failed ones, which probably results in lots of wasted, duplicated effort. It’s understandable not to want to write a whole post about a failed experiment, though, so we’d like to invite group organizers to share, in the comments, experiments that didn’t work but generated useful information!
Example: Precipice Reading Group Format Change
Harvard organizers have run Precipice reading groups every semester since Fall 2021. The format was similar to an introductory fellowship’s: participants were assigned to cohorts, each with a designated discussion leader, and met at set times and locations. There wasn’t an extensive application, just a sign-up form, and everyone who signed up was automatically admitted to the program.
Dropoff rates were pretty high: ~60% of signups came to the first meeting, and as the semester went on, only around 15% of signups actually finished the reading group. This kind of dropoff seems common across reading groups and book clubs; for example, a remote summer book club hosted by the Harvard Data Science Initiative went from around 15-20 participants to around 5-7 regulars. For comparison, in the Fall 2022 Arete Fellowship (which had a more formal syllabus and application process), ~90% of admitted students attended the first meeting and ~75% attended four or more of the seven weeks.
In the Fall semester of 2022, we experimented with the meeting format: we held big meetings on Mondays and Tuesdays where we’d provide pizza, and people could show up to either meeting based on their availability. People weren’t “assigned” to either day and could switch days from week to week if they wanted. Not much else changed. In theory, this made attendance easier for participants, and it dramatically reduced organizer overhead. We advertised the program moderately widely and got about 50 signups.
But the format bombed. Only ~20% of signups came to the first week, and while dropoff after that was lower (the low initial turnout effectively filtered for the people who were most excited), only ~10% of the initial 50 signups – about five people – finished the reading group.
This taught us a valuable lesson: people will not show up to things if they don’t feel expected to be there or held accountable when they aren’t. If specific people are assigned to specific groups, they know their absence will be noticed; in a huge melting pot of people, individual absences mostly go unnoticed. And, in general, people respond to signals that a program is well-organized and has lots of individual attention to go around. We do not recommend this format for future reading groups, including its lack of selectivity. Scheduling also matters: surprisingly many people were not free on either day, and we suspect that having people pick a time that specifically works for them makes them more likely to actually show up at that time.