What should the norms around privacy and evaluation in the EA community be?

By NunoSempere @ 2021-06-16T17:31 (+66)

Recently, I've been thinking about this question in the context of this post: 2018-2019 Long Term Future Fund Grantees: How did they do?. I was considering the options of:

  1. Publishing all evaluations, including the negative ones.
  2. Publishing all evaluations except the most embarrassing ones, plus the aggregate summary.
  3. Publishing only the positive evaluations, plus the aggregate summary.
  4. Publishing only two evaluations (one about someone whose permission I asked, and another about a fairly public figure), plus the aggregate summary.
  5. Publishing only the summary.

In the end, I decided to go with option 4, as it seemed the least risky. More open options have the drawback that they might ruffle some feathers and make people feel uncomfortable. Repeating my rationale on the post:

some people will perceive evaluations as unwelcome, unfair, stressful, an infringement of their desire to be left alone, etc. Researchers who didn’t produce an output despite getting a grant might feel bad about it, and a public negative review might make them feel worse, or have other people treat them poorly. This seems undesirable because I imagine that most grantees were taking risky bets with a high expected value, even if they failed in the end, as opposed to being malicious in some way. Additionally, my evaluations are fairly speculative, and a wrong evaluation might be disproportionately harmful to the person the mistake is about.

On the other hand, making evaluations public is more informative for readers, who may acquire better models of reality if the evaluations are correct, or be able to point out flaws if the evaluation has some errors.

I'd also be curious about whether evaluators generally should or shouldn't give the people and organizations being evaluated the chance to respond before publication. On the one hand, the two perspectives side by side might produce more accurate impressions, but on the other hand, it really adds a lot of overhead. On the third hand, the organizations being evaluated also don't generally point to their criticisms in their promotional material (as argued in example 5 here). I remember reading some discussion about this in EA Forum comments, but can't find it.

Lastly, it seems that evaluations of public figures and organizations are generally "fair game", whether positive or negative. Though I'd be interested in more nuanced considerations, if they exist.

I'd be curious to hear your thoughts and perspectives.


Larks @ 2021-06-16T18:45 (+46)

I'd also be curious about whether evaluators generally should or shouldn't give the people and organizations being evaluated the chance to respond before publication. 

My experience is that it is generally good to share a draft, because organisations can be very touchy about irrelevant details that you don't really care much about and are happy to correct. If you don't give them this opportunity they will be annoyed and your credibility will be reduced when the truth comes out, even if it doesn't have any real logical bearing on your conclusions. This doesn't protect you against different people in the org having different views on the draft, and some objecting afterwards, but it should get you most of the way there.

On the other hand it is a little harder if you want to be anonymous, perhaps because you are afraid of retribution, and you're definitely right that it adds a lot of time cost.

I don't think there's any obligation to print their response in the main text, however. If you think their objections are valid, you should adjust your conclusions; if they are specious, let them duke it out in the comment section. You could include them inline, but I wouldn't feel obliged to quote verbatim. Something like this would seem perfectly responsible to me:

Organisation X said they were going to research ways to prevent famines using new crop varieties, but seem to lack scientific expertise. In an email they disputed this, pointing to their head of research, Dr Wesley Stadtler, but all his publications are in low quality journals and unrelated fields.

This allows readers to see the other POV, assuming you summarise it fairly, without giving them excessive space on the page or the last word.

I agree that any organisation that is soliciting funds or similar from the public is fair game. It's unclear to me to what extent this also applies to those which solicit money from a fund like the LTFF, which is itself primarily dependent on soliciting money from the public.

Aaron Gertler @ 2021-06-17T05:21 (+8)

...your credibility will be reduced when the truth comes out, even if it doesn't have any real logical bearing on your conclusions.

I've had this happen to me before, and it was annoying...

...but I still think that it's appropriate for people to reduce their trust in my conclusions if I'm getting "irrelevant details" wrong. If an author makes errors that I happen to notice, I'm going to raise my estimate for how many errors they've made that I didn't notice, or wouldn't be capable of noticing. (If a statistics paper gets enough basic facts wrong, I'm going to be more suspicious of the math, even if I lack the skills to fact-check that part.) 

This extends to the author's conclusion; the irrelevant details aren't discrediting, but they are credibility-reducing.

(For what it's worth, if someone finds that I've gotten several details wrong in something I've written, that's probably a sign that I wrote it too quickly, didn't check it with other people, or was in some other condition that also reduced the strength of my reasoning.)

NunoSempere @ 2021-06-17T07:54 (+4)

...but I still think that it's appropriate for people to reduce their trust in my conclusions if I'm getting "irrelevant details" wrong. If an author makes errors that I happen to notice, I'm going to raise my estimate for how many errors they've made that I didn't notice

This makes sense, but I don't think this is bad. In particular, I'm unsure about my own error rate, and maybe I do want to let people estimate my unknown-error rate as a function of my "irrelevant details" error rate.

Aaron Gertler @ 2021-06-17T08:24 (+2)

This makes sense, but I don't think this is bad. 

I also don't think it's bad. Did I imply that I thought it was bad for people to update in this way? (I might be misunderstanding what you meant.)

NunoSempere @ 2021-06-17T09:54 (+2)

Did I imply that I thought it was bad for people to update in this way?

Reading it again, you didn't.

Ozzie Gooen @ 2021-06-17T00:13 (+8)

Just brainstorming:
I imagine we could eventually have infrastructure for dealing with such situations better. 

Right now this sort of work requires:
 

  • Figuring out who in the organization is a good fit for asking about this.
  • Finding their email address.
  • Emailing them.
  • If they don't respond, trying to figure out how long you should wait until you post anyway.
  • If they do respond and it becomes a thread, figuring out where to cut things off.
  • If you're anonymous, setting up an extra email account.

Ideally it might be nice to have policies and infrastructure for such work. For example:

  1. Codified practices and norms for responses. Organizations can specify which person is responsible and what their email address is. They also commit to responding within some timeframe.
  2. Services for responses. Maybe there's a middleman who knows the people at the orgs and could help do some of the grunt work of routing signals back and forth.

MichaelA @ 2021-06-17T08:16 (+8)

I think part of the problems you point to (though not all) could be fixed by simple tweaks to the initial email: say when you plan to post if you don't get a response (include that in bold), and say something to indicate how much back-and-forth you're ok with / how much time you're able or willing to invest in that (basically to set expectations). 

I think you could also email anyone in the org out of the set of people whose email address you can quickly find and whose role/position sounds somewhat appropriate, and ask them to forward it to someone else if that's better.

Linch @ 2021-06-16T20:08 (+8)

My experience is that it is generally good to share a draft, because organisations can be very touchy about irrelevant details that you don't really care much about and are happy to correct. If you don't give them this opportunity they will be annoyed and your credibility will be reduced when the truth comes out, even if it doesn't have any real logical bearing on your conclusions.

To defend the side of the organizations a little, one reason for this is that they may have fairly different threat models from you/evaluators. 
 
A concrete example in our community recently is the Scott Alexander/New York Times kerfuffle, where the seemingly irrelevant detail of Scott's real last name was actually critical (in a way that the NYT journalist didn't understand, or chose not to understand) to maintaining his job as a psychiatrist within an institution. There was a similar example with Naomi Wu, iirc.

A much more minor example is that I've noticed Peter (and others) usually being somewhat touchy and quick to correct people about any misrepresentations related to how much they pay employees (e.g., see here). I don't think his correction at all altered Ben_West's core point, but from the perspective of leading a growing organization, having correct public numbers on how much new employees are paid may be pretty important for hiring. 
 

Larks @ 2021-06-17T01:17 (+4)

Yup, I agree with that, and am typically happy to make such requested changes. 

Nathan Young @ 2021-06-16T22:41 (+14)

I guess you could ask people to veto their evaluation being published and, if they do, describe only whether it was positive or negative, but not who it was about or what it was.

NunoSempere @ 2021-06-17T08:32 (+3)

I find the simplicity of this appealing.

Aaron Gertler @ 2021-06-17T05:34 (+13)

I think it depends on how much information you have.

If the extent of your evaluation is a quick search for public info, and you don't find much, I think the responsible conclusion is "it's unclear what happened" rather than "something went wrong". I think this holds even for projects that obviously should have public outputs if they've gone well. If someone got a grant to publish a book, and there's no book, that might look like a failure -- but they also might have been diagnosed with cancer, or gotten a sudden offer for a promising job that left them with no time to write. (In the latter case, I'd hope they would give the grant back, but that's something a quick search probably wouldn't find.)

(That said, it still seems good to describe the search you did, just so future evaluators have something more to work with.)

On the other hand, if you've spoken to the person who got the grant, and they showed you their very best results, and you're fairly sure you aren't missing any critical information, it seems fine to publish a negative evaluation in almost every case (I say "almost" because this is a complicated question and possible exceptions abound).

Depending on the depth of your search and the nature of the projects (haven't read your post closely yet), I could see any of 1-5 being what I would do in your place.

NunoSempere @ 2021-06-17T08:31 (+6)

If the extent of your evaluation is a quick search for public info, and you don't find much, I think the responsible conclusion is "it's unclear what happened" rather than "something went wrong". I think this holds even for projects that obviously should have public outputs if they've gone well.

So to push back against this: suppose that you have four initial probabilities (legibly good, silently good, legibly bad, silently bad). Then you also have a ratio (legibly good + silently good) : (legibly bad + silently bad). 

Now if you learn that the project was not legibly good or legibly bad, then you update to (silently good, silently bad). The thing is, I expect this ratio silently good : silently bad to be different from the original (legibly good + silently good) : (legibly bad + silently bad), because I expect that most projects, when they fail, do so silently, but that a large portion of successes have a post written about them. 

For an intuition pump, suppose that none of the projects from the LTFF had any information to be found online about them. Then this would probably be an update downwards. But what's true about the aggregate seems also true probabilistically about the individual projects.

So overall, because I disagree that the "Bayesian" conclusion is uncertainty, I do see a tension between the thing to do to maintain social harmony and the thing to do if one wants to transmit a maximal amount of information. I think this is particularly the case "for projects that obviously should have public outputs if they've gone well".
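
To make the update concrete, here is a minimal sketch with made-up numbers; the four probabilities below are assumptions chosen purely for illustration, not estimates about any actual grants:

```python
# Illustrative priors over the four outcomes (made-up numbers, not estimates
# about any actual LTFF grants).
p_legibly_good = 0.35
p_silently_good = 0.15
p_legibly_bad = 0.10
p_silently_bad = 0.40

# Prior odds of "good" before looking for public outputs.
prior_odds = (p_legibly_good + p_silently_good) / (p_legibly_bad + p_silently_bad)
print(prior_odds)  # 1.0, i.e. a 50% chance the project went well

# Observation: a search turns up nothing, which rules out the "legible" outcomes.
# Conditioning on silence leaves only the silent outcomes.
posterior_odds = p_silently_good / p_silently_bad
print(posterior_odds)  # 0.375, i.e. roughly a 27% chance the project went well

# Because (by assumption) successes are more often legible than failures,
# silence shifts the odds towards failure rather than leaving them unchanged.
```

The particular numbers are arbitrary; the point is only that the posterior after observing silence depends on how legibility correlates with success, so silence is not a neutral observation.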

But then you also have other things, like:

  • Some areas (like independent research on foundational topics) might be much, much more illegible than others (e.g., organizing a conference)
  • Doing this kind of update might incentivize people to go into more legible areas
  • An error rate changes things in complicated ways. In particular, maybe the error rate in the evaluation increases the more negative the evaluation is (though I think that the opposite is perhaps more likely). This would depend on your prior about how good most interventions are.
  • ...

Aaron Gertler @ 2021-06-17T08:51 (+4)

I was too vague in my response here: By "the responsible conclusion", I mean something like "what seems like a good norm for discussing an individual project" rather than "what you should conclude in your own mind". 

I agree on silent success vs. silent failure and would update in the same way you would upon seeing silence from a project where I expected a legible output. 

If the book isn't published in my example, it seems more likely that some mundane thing went poorly (e.g. book wasn't good enough to publish) than that the author got cancer or found a higher-impact opportunity. But if I were reporting an evaluation, I would still write something more like "I couldn't find information on this, and I'm not sure what happened" than "I couldn't find information on this, and the grant probably failed". 

(Of course, I'm more likely to assume and write about genuine failure based on certain factors: a bigger grant, a bigger team, a higher expectancy of a legible result, etc. If EA Funds makes a $1m grant to CFAR to share their work with the world, and CFAR's website has vanished three years later, I wouldn't be shy about evaluating that grant.)

I'm more comfortable drawing judgments about an overall grant round. If there are ten grants, and seven of them are "no info, not sure what happened", that seems like strong evidence that most of the grants didn't work out, even if I'm not past the threshold of calling any individual grant a failure. I could see writing something like: "I couldn't find information on seven of the ten grants where I expected to see results; while I'm not sure what happened in any given case, this represents much less public output than I expected, and I've updated negatively about the expected impact of the fund's average grant as a result."

(Not that I'm saying an average grant necessarily should have a legible positive impact; hits-based giving is a thing. But all else being equal, more silence is a bad sign.)

MichaelA @ 2021-06-16T20:11 (+6)

I think this is a good question, and I'm interested to see more people's answers.

One perhaps obvious point is that I think it depends to a substantial extent on how many people/organisations are mentioned in a similar way in a similar place at the same time, since that affects how much people are or feel "singled out". E.g., if you published a post with an unusually candid/detailed evaluation of 20 people/orgs, this could indeed create some degree of harm/discomfort/anger - but if you publish the same post with only 1 of those evaluations (randomly selected), that would increase the chance that that person/org is harmed, made uncomfortable, or made angry. It makes it harder for them to hide in the crowd and makes the post look more like a hit piece, even if what you say about that person/org is the same in both instances.

(Your linked post avoided that issue because the evaluations you published weren't randomly selected, but rather were ones that were less likely than average to cause issues when published.)

RobertHarling @ 2021-06-21T19:57 (+5)

I think I would have some worry that if external evaluations of individual grant recipients became common, this could discourage people from applying for grants in the future, for fear of being negatively judged should the project not work out. 

Potential grant recipients might worry that external evaluators may not have all the information about their project or the grant maker's reasoning for awarding the grant. This lack of information could then lead to unfair or incorrect evaluations. This would be more of a risk if it became common for people to write low-quality evaluations that are weakly reasoned, uncharitable, or don't respect privacy. I'm unsure whether it would be easy to encourage high-quality evaluations (such as your own) without also increasing the risk of low-quality evaluations. 

The risk of discouraging grant applications would probably be greater for more speculative funds such as the Long Term Future Fund (LTFF), as it's easier for projects to not work out and look like wasted funds to uninformed outsiders.

There could also be an opposite risk: that by seeking to discourage low-quality evaluations, we discourage people too much from evaluating and criticizing work. It might be useful to establish key principles that enable people to write respectful and useful evaluations, even with limited knowledge or time.

I'm unsure where the right trade-off between usefully evaluating projects and not discouraging grant applications lies. Thank you for your review of the LTFF recipients and for posting this question; I found both really interesting. 

Vhanon @ 2021-06-20T08:54 (+3)

On the other hand, making evaluations public is more informative for readers, who may acquire better models of reality if the evaluations are correct, 

I am in agreement. Let me note, though, that people can still get a good model of reality even if they do not know the names of the people involved.

If evaluations did not contain the names of the subjects, do you think it would still be easy for readers to connect the evaluation to the organisations being evaluated? Perhaps you could frame the evaluation so that the links are not clear. 

or be able to point out flaws if the evaluation has some errors.

Although this is the reviewer's responsibility, it would be nice to have extra help indeed. (Is this your goal?) However, the quality of feedback you receive is linked to the amount of information you share, and specific organisation details might be important here. Perhaps you could share the detailed information with a limited set of interested people, while asking them to sign a confidentiality agreement.

I'd also be curious about whether evaluators generally should or shouldn't give the people and organizations being evaluated the chance to respond before publication.

Would that make the reviewers change their minds?

If there is a specific issue the reviewer is worried about, I believe the reviewer can query the organisation directly.

If it is a more general issue, it is likely to be something the reviewer needs to do further research on. The reviewer probably does not have enough time to carry out the needed research, and a rushed evaluation does not help. 

Nonetheless, it is important to give the organisations an opportunity to give post-evaluation feedback, so that the reviewer has a chance to address the general issue before the next round of reviews.

Furthermore, let's not forget that one of the evaluation criteria is the ability of the applicants to introduce the problem, describe their plans clearly, and address risks and contingencies. If something big is missing, it is generally a sign that the applicant needs a bit more time to complete the idea, and the reviewer should probably advise waiting for the next round.