Request for comments: EA Projects evaluation platform

By Jan_Kulveit @ 2019-03-20T22:36 (+34)

Edit: It is likely there will be a second version of this proposal, modified based on the feedback and comments.

The effective altruism community has a great resource - its members, motivated to improve the world. Within the community, there are many ideas floating around, and entrepreneurially-minded people keen to execute on them. As the community grows, we get more effective altruists with different skills, yet in other ways it becomes harder to start projects. It’s hard to know who to trust, and hard to evaluate which project ideas are excellent , which are probably good, and which are too risky for their estimated return.

We should be concerned about this: the effective altruism brand has a significant value, and bad projects can have repercussions for both the perception of the movement and the whole community. On the other hand, if good projects are not started, we miss out on value, and miss opportunities to develop new leaders and managers. Moreover, inefficiencies in this space can cause resentment and confusion among people who really want to do good and have lots of talent to contribute.

There's also a danger that as a community we get stuck on the old core problems, because funders and researchers trust certain groups to do certain things, but lack the capacity to vet new and riskier ideas, and to figure out which new projects should form. Overall, effective altruism struggles to use its greatest resource - effective people. Also, while we talk about “cause X”, currently new causes may struggle to even get serious attention.

One idea to address this problem, proposed independently at various times by me and several others, is to create a platform which provides scalable feedback on project ideas. If it works, it could become an efficient way to separate signal from noise and spread trust as our community grows. In the best case, such a platform could help alleviate some of the bottlenecks the EA community faces, harness more talent and energy than we are currently able to do, and make it easier for us to make investments in smaller, more uncertain projects with high potential upside.

As discussed in a previous post, What to do with people, I see creating new network-structures and extending existing ones as one possible way to scale. Currently, effective altruists use different approaches to get feedback on project proposals depending on where they are situated in the network: there is no ready-made solution that works for them all.

For effective altruists in the core of the network, the best method is often just to share a google doc with a few relevant people. Outside the core, the situation is quite different, and it may be difficult to get informative and honest feedback. For example, since applications outnumber available budget slots, by design most grant applications for new projects are rejected; practical and legal constraints mean that these rejections usually come without much feedback, which can make it difficult to improve the proposals. (See also EA is vetting-constrained)

For all of these reasons, I want to start an EA projects evaluation platform. For people with a project idea, the platform will provide independent feedback on the idea, and an estimate of the resources needed to start the project. In a separate process, the platform would also provide feedback on projects further in their life, evaluating team and idea fit. For funders, it can provide an independent source of analysis.

What follows is a proposal for such a platform. I’m interested in feedback and suggestions for improvement: the plan is to launch a cheap experimental run of the evaluation process in approximately two weeks. I’m also looking for volunteer evaluators.

Evaluation process

Project ideas will get evaluated in a multi-step process:

1a. Screening for infohazards, proposals outside of the scope of effective altruism, or otherwise obviously unsuitable proposals (ca 15m / project)

1b. Peer review in a debate framework. Two referees will write evaluations, one focusing on the possible negatives, costs and problems of the proposal; and the other on the benefits. Both referees will also suggest what kind of resources a team attempting the project should have. (2-5h / analyst / project)

1c. Both the proposal and the reviews will get anonymously published on the EA forum, gathering public feedback for about one week. This step will also allow back-and-forth communication with the project initiator.

1d. A panel will rate the proposal, utilizing the information gathered in phases b. and c., highlighting which part of the analysis they consider particularly important. (90m / project)

1e. In case of disagreement among the panel, the question will get escalated and discussed with some of the more senior people in the field.

1f. The results will get published, probably both on the EA projects platform website, and on the forum.

In a possible second stage, if a team forms around a project idea, it will go through similar evaluation, focusing on the fit between the team and the idea, possibly with the additional step of a panel of forecasters predicting the success probability and expected impact of the project over several time horizons.

Currently, the plan is to run a limited test of the viability of the approach, on a batch of 10 project ideas, going through steps 1a-f.

Why this particular evaluation process

The most bottlenecked resource for evaluations, apart from structure, is likely the time of experts. This process is designed to utilize the time of experts in a more leveraged way, utilize the inputs from the broader community, and also to promote high-quality discussion on the EA forum. (Currently, problematic project proposals posted on the forum often attract downvotes, but rarely detailed feedback.)

Having two “opposing” reviews attempts to avoid the social costs of not being nice: by having clear roles, everyone will understand that writing an analysis which tries to find flaws and problems was part of the job. Also, it can provoke higher quality public discussion.

Splitting steps b.,c. and d. is motivated by the fact that mapping arguments is a different task than judging them.

Project ideas are on a spectrum where some are relatively robust to the choice of team, while the impact of other projects may mostly depend on the quality of the team, including the sign of the impact. By splitting the evaluation of ideas from the evaluation of (idea+team), it should be possible to communicate opinions like “this is a good idea, but you are likely not the best team to try it” with more nuance.

Overall the design space of possible evaluation processes is large, and I believe it may just be easier to run an experiment and iterate. Based on the results, it should be relatively easy to make some steps from 1.a-e simpler, omit them altogether, or make them more rigorous. Also the stage 2 process can be designed based on the stage 1 results.

Evaluators

I’m looking looking for 5-8 volunteer analysts, who will write the reviews for the second step (1b) of the process. The role is suitable for people with similar skills to generalist research analyst at OpenPhil, such as:

Expected time-commitment is about 15-20h for the first run of evaluations, and if the project continues, about 15-20h per month. The work will mostly happen online in a small team, communicating on Slack. There isn’t any remuneration, but I hope there will be events like a dinner during EA Global, or similar opportunities to meet.

Good reasons to volunteer

you want to help with alleviating an important bottleneck in the EA project ecosystem
the work experience should be useful if you are considering working as a grant evaluator, analyst, or similar

Bad reasons to volunteer

you feel some specific project by you or your friends was undeservedly rejected by existing grant-making organizations, and you want help the project

Strong reason not to volunteer

there is a high chance you will flake out from voluntary work even if you commit to do it

If you want to join, please send your linkedin/CV and a short paragraph-long description of your involvement with effective altruism to eaprojectsorg@gmail.com

Projects

In the first trial, I’d like to test the viability of the process on about 10 project ideas. You may want to propose a project idea either where you would be interested in running the project or in cases where you would want someone else to lead the project, with you helping e.g. via advice or funding. At present, it probably isn’t very useful to propose projects you don’t plan to support in some significant way.

It is important to understand that the evaluations absolutely do not come with any promise of funding. I would expect the evaluations may help project ideas which come out with positive feedback from the process, because funders or EAs earning to give or potential volunteers or co-founders may pick up the signal. Negative feedback may help with improving the projects, or having realistic expectations about necessary resources. There is also value in bad projects not happening, and negative feedback can help people to move on from dead-end projects to more valuable things.

Also it should be clear that the project evaluations will not constitute any official “seal of approval” - this is a test run of volunteer project and has not been formally endorsed by any particular organization.

I’d like to thank Max Daniel, Rose Hadshar, Ozzie Gooen, Max Dalton, Owen Cotton-Barratt, Oliver Habryka, Harri Besceli, Ryan Carey, Jah Ying Chung and others for helpful comments and discussions on the topic.

RyanCarey @ 2019-03-21T12:11 (+27)

I'm a big fan of the idea of having a new EA projects evaluation pipeline. Since I view this as an important idea, I think it's important to get the plan to the strongest point that it can be. From my perspective, there are only a smallish number of essential elements for this sort of plan. It needs a submissions form, a detailed RFP, some funders, and some evaluators. Instead, we don't yet have these (e.g. detail re desired projects, consultation with funders). But then I'm confused about some of the other things that are emphasised: large initial scale, a process for recruiting volunteer-evaluators, and fairly rigid evaluation procedures. I think the fundamentals of the idea are strong enough that this still has a chance of working, but I'd much prefer to see the idea advanced in its strongest possible form. My previous comments on this draft are pretty similar to Oliver's, and here are some of the main ones:

This makes sense to me as an overall idea. I think this is the sort of project where if you do it badly, it might dissuade others from trying the same. So I think it is worth getting some feedback on this from other evaluators (BERI/Brendon Wong). It would also probably be useful to get feedback from 1-2 funders (maybe Matt Wage? Maybe someone from OpenPhil?), so that you can get some information about whether they think your evaluation process would be of interest to them, or what might make it so. It could also be useful to have unofficial advisors.

I predict the process could be refined significantly with ~3 projects.

You only need a couple of volunteers and you know perhaps half of the best candidates, so for the purpose of a pilot, did you consider just asking a couple of people you know to do it?

I think you should provide a ~800 word request for proposals. Then you can give a much more detailed description of who you want to apply. e.g. just longtermist projects? How does this differ from the scope of EA grants, BERI, OpenPhil, etc etc? Is it sufficient to apply with just an idea? Do you need a team? A proof of concept? etc etc etc.

This would be strengthened somewhat by already having obtained the evaluators, but this may not be important.

John_Maxwell_IV @ 2019-03-22T05:13 (+15)

Since I view this as an important idea, I think it's important to get the plan to the strongest point that it can be.

It's also important not to let the perfect be the enemy of the good. Seems to me like people are always proposing volunteer-lead projects like this and most of them never get off the ground. Remember this is just a pilot.

I think this is the sort of project where if you do it badly, it might dissuade others from trying the same.

The empirical reality of the EA project landscape seems to be that EAs keep stumbling on the same project ideas over and over with little awareness of what has been proposed or attempted in the past. If this post goes like the typical project proposal post, nothing will come of it, it will soon be forgotten, and 6 months later someone will independently come up with a similar idea and write a similar post (which will meet a similar fate).

John_Maxwell_IV @ 2019-03-27T06:11 (+9)

As a concrete example of this "same project ideas over and over with little awareness of what has been proposed or attempted in the past" thing, https://lets-fund.org is a fairly recent push in the "fund fledgling EA projects" area which seems to have a decent amount of momentum behind it relative to the typical volunteer-lead EA project. What are the important differences between Let's Fund and what Jan is working on? I'm not sure. But Let's Fund hasn't hit the $75k target for their first project, even though it's been ~5 months since their launch.

The EA Hotel is another recent push in the "fund fledgling EA projects" area which is struggling to fundraise. Again, loads of momentum relative to the typical grassroots EA project--they've bought a property and it's full of EAs. What are the relative advantages & disadvantages of the EA Hotel, Let's Fund, and Jan's thing? How about compared with EA Funds? Again, I'm not sure. But I do wonder if we'd be better off with "more wood behind fewer arrows", so to speak.

Habryka @ 2019-03-20T23:57 (+24)

Peer review in a debate framework. Two referees will write evaluations, one focusing on the possible negatives, costs and problems of the proposal; and the other on the benefits. Both referees will also suggest what kind of resources a team attempting the project should have. (2-5h / analyst / project)

I thought about this stage a bunch since Jan shared me on the original draft, and I don't expect a system like this to provide useful evaluations.

When I have had good conversations about potential projects with people in EA, it's very rarely the case that the assessments of people could be easily summarized by two lists of the form "possible negative consequences" and "possible benefits". In practice I think people will have models that will output a net-positive impact or a net-negative impact, depending on certain facts that they have uncertainty about, and understanding those cruxes and uncertainties is the key thing in understanding whether a project will be worth working on. I tried for 15 minutes to write a list of just the negative consequences I expect a project to have, and failed to do so, because most of the potential negative consequences are entwined with the potential benefits of the project in a way that doesn't really make it easy to just focus on the negatives or the positives.

John_Maxwell_IV @ 2019-03-21T22:51 (+10)

This is true, but "possible negative consequences" and "possible benefits" are good brainstorming prompts. Especially if someone has a bias towards one or the other, telling them to use both prompts can help even things out.

Habryka @ 2019-03-22T02:41 (+1)

I agree. I think having reviewers that use those as prompts for their evaluations and their feedback seems like a decent choice. But I wouldn't want to force one person to only use that one prompt.

Jan_Kulveit @ 2019-03-21T04:09 (+9)

It is very easy to replace this stage with e.g. just two reviews.

Some of the arguments for the contradictory version

the point of this stage is not to produce EV estimate, but to map the space of costs, benefits, and considerations
it is easier to be biased in a defined way than unbiased
it removes part of the problem with social incentives

Some arguments against it are

such adversarial setups for truth-seeking are uncommon outside of judicial process
it may contribute to unnecessary polarization
the splitting may feel unnatural

dpiepgrass @ 2019-03-24T01:13 (+3)

Based on Habryka's point, what if "stage 1b" allowed the two reviewers to come to their own conclusions according to their own biases, and then at the end, each reviewer is asked to give an initial impression as to whether it's fund-worthy (I suppose this means its EV is equal to or greater than typical GiveWell charity) or not (EV may be positive, but not high enough).

This impression doesn't need to be published to anyone if, as you say, the point of this stage is not to produce an EV estimate. But whenever both reviewers come to the same conclusion (whether positive or not), a third reviewer is asked to review it too, to potentially point out details the first reviewers missed.

Now, if all three reviewers give a thumbs down, I'm inclined to think ... the applicant should be notified and suggested to go back to the drawing board? If it's just two, well, maybe that's okay, maybe EV will be decidedly good upon closer analysis.

I think reviewers need to be able (and encouraged) to ask questions of the applicant, as applications are likely to be have some points that are fuzzy or hard to understand. It isn't just that some proposals are written by people with poor communication skills; I think this will be a particular problem with ambitious projects whose vision is hard to articulate. Perhaps the Q&As can be appended to the application when it becomes public? But personally, as an applicant, I would be very interested to edit the original proposal to clarify points at the location where they are first made.

And perhaps proposals will need to be rate-limited to discourage certain individuals from wasting too much reviewer time?

agdfoster @ 2019-03-22T16:48 (+14)

I regularly simplify my evaluations into pros and cons lists and find them surprisingly good. Open Phil’s format essentially boils down into description, pros, cons, risks.

Check out kialo. It allows really nice nested pro con lists built of claims and supporting claims +discussion around those claims. It’s not perfect but we use it for keeping track of logic for our early stage evals / for taking a step back on evals we have gotten too in the weeds with

Habryka @ 2019-03-22T18:21 (+13)

Do you have an example of an Open Phil report that boils down to that format? This would be an update for me. I tried my best to randomly sample reports from their website (by randomly clicking around), and the three I ended up looking at didn't seem to follow that structure at all:

Ben Pace @ 2019-03-22T22:49 (+14)

I imagined Alex was talking about the grant reports, which are normally built around “case for the grant” and “risks”. Example: https://www.openphilanthropy.org/giving/grants/georgetown-university-center-security-and-emerging-technology

Habryka @ 2019-03-22T23:46 (+4)

Ah, yes. I didn't end up clicking on any of those, but I agree, that does update me a bit.

I think the biggest crux that I have is that I expect that it would be almost impossible for one person to only write the "risk" section of that article, and a separate person to only write the "case" section of the article, since they are usually framed in contrast to one another. Presenting things in the form of pro/con is very different from generating things in the form of pro/con.

Jan_Kulveit @ 2019-03-26T00:54 (+2)

My impression is you have in mind something different than what was intended in the proposal.

What I imagined was 'priming' the argument-mappers with prompts like

Imagine this projects fails. How?
Imagine this project works, but has some unintended bad consequences. What they are?
What would be a strong reason not to associate this project with the EA movement?

(and the opposites). When writing their texts the two people would be communicating and looking at the arguments from both sides.

The hope is this would produce more complete argument map. One way to think about it, is each person is 'responsible' for the pro/con section, trying to make sure it captures as much important considerations as possible.

It seems quite natural for people to think about arguments in this way, with "sides" (sometimes even single authors expose complex arguments in the "dialogue" way).

There are possible benefits - related to why 'debate' style is used in justice

It levels the playing field in interesting ways (when compared to public debate on the forum). In the public debate, what "counts" is not just arguments, but also discussion and social skills, status of participants, moods and emotions of the audience, and similar factor. The proposed format would mean both the positives and negatives have "advocates" ideally of "similar debate strength" (anonymous volunteer). This is very different from a public forum discussion, where all kinds of "elephant in the brain" biases may influence participants and bias judgements.
It removes some of the social costs and pains associated with project discussions. Idea authors may get discouraged by negative feedback, downvotes/karma, or similar.

Also, just looking at how discussions on the forum look now, it seems in practice it is easy for people to look at things from positive or negative perspectives: certainly I have seen arguments structured like (several different ways how something fails + why is it too costly if it succeeded + speculation what harm it may cause anyway).

Overall: in my words, I'm not sure whether your view is 'in the space of argument-mapping, noting in the vicinity of debate, will work - at least when done by humans and applied to real problems'. Or 'there are options in this space which are bad' - where I agree something like bullet-pointed lists of positives and negatives where the people writing them would not communicate seems bad.

John_Maxwell_IV @ 2019-03-22T05:31 (+12)

It seems like Jan is getting a lot of critical feedback, so I just want to say, big ups to you Jan for spearheading this. Perhaps it'd be useful to schedule a Skype call with Habryka, RyanCarey, or others to try & hash out points of disagreement.

The point of a pilot project is to gather information, but if information already exists in the heads of community members, a pilot could just be an expensive way of re-gathering that info. The ideal pilot might be something that is controversial among the most knowledgable people in the community, with some optimistic and some pessimistic, because that way we're gathering informative experimental data.

Jan_Kulveit @ 2019-03-20T23:26 (+10)

To make the discussions more useful, I'll try to briefly recapitulate parts of the discussions and conversations I had about this topic in private or via comments in the draft version. (I'm often coalescing several views into more general claim)

There seems to be some disagreement about how rigorous and structured the evaluations should be - you can imagine a scale where on one side you have just unstructured discussion on the forum, and on the opposite side you have "due diligence", multiple evaluators writing detailed reviews, panel of forecasters, and so on.

My argument is: unstructured discussion on the forum is something we already have, and often the feedback project ideas get is just a few bits from voting, plus a few quick comments. Also the prevailing sentiment of comments is sometimes at odds with expert views or models likely used by funders, which may cause some bad surprises. That is too "light". The process proposed here is closer to the "heavy" end of the scale. My reason is it seems easier to tune the "rigour" parameter down than up, and trying it on a small batch has higher learning value.

Another possible point of discussion is whether the evaluation system would work better if it was tied to some source of funding. My general intuition is this would create more complex incentives, but generally I don't know and I'm looking for comments.

Some people expressed uncertainty if there is a need for such system. Some because they believe that there aren't many good project ideas or projects (especially unfunded ones). Others expressed uncertainty if there is a need for such system, because they feel proposed projects are almost all good, there are almost no dangerous project ideas, and even small funders can choose easily. I don't have good data, but I would hope having largely public evaluations could at least help everyone to be better calibrated. Also, when comparing the "EA startup ecosystem" with the normal startup ecosystem, it seems we are often lacking what is provided by lead investors, incubators or mentors.

Habryka @ 2019-03-20T23:48 (+19)

(Here are some of the comments that I left on the draft version of this proposal that I was sent, split out over multiple comments to allow independent voting):

[Compared to an open setup where any reviewer can leave feedback on any project in an open setting like a forum thread] An individual reviewer and a board is much less likely to notice problems with a proposal, and a one-way publishing setup is much more likely to cause people to implement bad projects than a setup where people are actively trying to coordinate work on a proposal in a consolidated thread. In your setup the information is published in at least two locations, and the proposal is finalized and no longer subject to additional input from evaluators, which seems like it would very likely cause people to just take an idea and run with it, without consulting with others whether they are a good fit for the project.

Setting up an EA Forum thread with good moderation would take a lot less than 20 hours.

You are also planning to spend over 40 hours in evaluating projects for the first phase, which is quite costly, and you are talking about hiring people part-time. [...]

From my perspective, having this be in the open makes it a lot easier for me and other funders in the space to evaluate whether the process is going well, whether it is useful, or whether it is actively clogging up the EA funding and evaluation space. Doing this in distinct stages, and with most of the process being opaque, makes it much harder to figure out the costs of this, and the broader impact it has on the EA community, moving the expected value of this into the net-negative.

I also think that we already have far too many random EA websites that are trying to do a specialized thing, and not doing it super well. The whole thing being on a separate website will introduce a lot of trivial inconveniences into the process that could be avoided by just having all of it directly on the EA Forum.

To put this into perspective, you are requesting about a total of 75h - 160h of volunteer labor for just this first round of evaluations, not counting the input from the panel which will presumably have to include already very busy people who can judge proposals. That in itself is almost a full month of work, and you are depleting a limited resource of the goodwill of effective-altruists in doing this, and an even more limited resource of the most busy people to help with initiatives like this.

Jan_Kulveit @ 2019-03-21T00:07 (+1)

As I've already explained in the draft, I'm still very confused by what

An individual reviewer and a board is much less likely to notice problems with a proposal than a broad discussion with many people contributing would ...

should imply for the proposal. Do you suggest that steps 1b. 1d. 1e. are useless or harmful, and having just the forum discussion is superior?

The time of evaluators is definitely definitely definitely not free, and if you treat them as free then you end up exactly in the kind of situation that everyone is complaining about. Please respect those people's time.

Generally I think this is quite strange misrepresentation of how I do value people's time and attention. Also I'm not sure if you assume the time people spend arguing on fora is basically free or does not count, because it is unstructured.

Generally almost all of the process is open, so I don't see what should be changed. If the complain is the process has stages instead of unstructured discussion, and this makes it less understandable for you, I don't see why.

Habryka @ 2019-03-21T00:44 (+20)

My overall sense of this is that I can imagine this process working out, but the first round of this should ideally just be run by you and some friends of yours, and should not require 100+ hours of volunteer time. My expectation is that after you spend 10 hours trying to actually follow this process, with just one or two projects, on your own or with some friends, that you will find that large parts of it won't work as you expected and that the process you designed is a lot too rigid to produce useful evaluations.

Habryka @ 2019-03-21T00:33 (+13)

As I've already explained in the draft, I'm still very confused by what [...] should imply for the proposal. Do you suggest that steps 1b. 1d. 1e. are useless or harmful, and having just the forum discussion is superior?

I am suggesting that they are probably mostly superfluous, but more importantly, I am suggesting that a process that tries to separate the public discussion into a single stage, that is timeboxed at only a week, will prevent most of the value of public discussion, because there will be value from repeated back and forth at multiple stages in this process, and in particular value from integrating the step of finding a team for a project with the process of evaluating a proposal.

To give you an example, I expect that someone will have an idea for a project that is somewhat complicated, and will write an application trying their best to explain it. I expect for the majority of projects the evaluators will misunderstand what the project is about (something I repeatedly experienced for project proposals on the LTF-Fund), and will then spend 2-5 hours writing a negative evaluation for a project that nobody thought was a good idea. The original person who proposed the project will then comment during the public discussion stage and try to clarify their idea, but since this process currently assigns most of the time for the evaluators and board members in the evaluation stage, there won't be any real way in which he can cause the evaluators to reevaluate the proposal, since the whole process is done in batches and the evaluators only have that many hours set aside (and they already spend 2-5 hours on writing an evaluation of the proposal).

On the other hand, if the evaluators are expected to instead participate mostly in a back-and-forth discussion over the course of a week, or maybe multiple weeks, then I think most likely the evaluators would comment with some initial negative impressions of the project which would probably be written in 5-10 minutes. The person writing the proposal would respond and clarify, and then the evaluator would ask multiple clarifying questions until they have a good sense of the proposal. Ideally, the person putting in the proposal would also be the person interested in working on it, and so this back-and-forth would also allow the evaluator to determine whether this person is a good fit for the project, and allow other people to volunteer their time to participate and help with the project. The thread itself would serve as the location for other people to find interesting projects to work on, and to get up to speed on who is working on what projects.

---

I also think that assigning two evaluators to each project is a lot worse than assigning evaluators in general and allowing them to chime in when they have pre-existing models for projects. I expect that if they don't have pre-existing models in the domain that a project is in, an evaluator will find it almost impossible to write anything useful about that project, without spending many hours building basic expertise in that domain. This again suggests a setup where you have an open pool of proposals, and a group of evaluators who freely choose which projects to comment on, instead of being assigned individual projects.

Jan_Kulveit @ 2019-03-21T01:12 (+4)

I don't understand why you assume the proposal is intended as something very rigid, where e.g. if we find the proposed project is hard to understand, nobody would ask for clarification, or why you assume the 2-5h is some dogma. The back-and-forth exchange could also add to 2-5h.

With assigning two evaluators to each project you are just assuming the evaluators would have no say in what to work on, which is nowhere in the proposal.

Sorry but can you for a moment imagine also some good interpretation of the proposed schema, instead of just weak-manning every other paragraph?

Habryka @ 2019-03-21T01:39 (+18)

I am sorry for appearing to be weak-manning you. I think you are trying to solve a bunch of important problems that I also think are really important to work on, which is probably why I care so much about solving them properly and have so many detailed opinions about how to solve them. While I do think we have strong differences in opinion on this specific proposal, we probably both agree on a really large fraction of important issues in this domain, and I don't want to discourage you from working in this domain, even if I do think this specific proposal is a bad idea.

Back to the object level: I think as I understand the process, the stages have to necessarily be very rigid because they require the coordination of 5+ volunteers, a board, and a set of researchers in the community, each of which will have a narrow set of responsibilities like writing a single evaluation or having meetings that need to happen at a specific point in time.

I think coordinating that number of people gives naturally rise to very rigid structures (I think even coordinating a group of 5 full-time staff is hard, and the amount of structure goes up drastically as individuals can spend less time), and your post explicitly says that step 1.c, is the step in which you expect back and forth with the person who proposed the project, making me think that you do not expect back and forth before that stage. And if you do expect back-and-forth before that stage, then I think it's important that you figure out a way to make that as easy as possible, and given the difficulty of coordinating large numbers of people, I think if you don't explicitly plan for making it easy, it won't happen and won't be easy.

Jan_Kulveit @ 2019-03-21T02:07 (+3)

I don't see why continuous coordination of a team of about 6 people on slack would be very rigid, or why people would have very narrow responsibilities.

For the panel, having some defined meeting and evaluating several projects at once seems time and energy conserving, especially when compared to the same set of people watching the forum often, being manipulated by karma, being in a way forced to reply to many bad comments, etc.

Habryka @ 2019-03-21T00:38 (+12)

Generally almost all of the process is open, so I don't see what should be changed. If the complain is the process has stages instead of unstructured discussion, and this makes it less understandable for you, I don't see why.

One part of the process that is not open is the way the evaluators are writing their proposals, which is as I understand it where the majority of person-time is being spent. It also seems that all the evaluations are going to be published in one big batch, making it so that feedback on the evaluation process would take until the complete next grant round to be acted on, which is presumably multiple months into the future.

The other process that is not open are these two stages:

1d. A panel will rate the proposal, utilizing the information gathered in phases b. and c., highlighting which part of the analysis they consider particularly important. (90m / project)

1e. In case of disagreement among the panel, the question will get escalated and discussed with some of the more senior people in the field.

I expect the time of the panel, as well as the time of the more senior people in the field are the most valuable resources that could be wasted by this process, and the current process gives very little insight into whether that time is well-spent or not. In a simple public forum setup, it would be easy to see whether the overall process is working, and whether the contributions of top people are making a significant difference.

Jan_Kulveit @ 2019-03-21T00:50 (+2)

With the first part, I'm not sure what would you imagine as the alternative - having access to evaluators google drive so you can count how much time they spent writing? The time estimate is something like an estimate how much it can take for volunteer evaluators - if all you need is in the order of 5m you are either really fast or not explaining your decisions.

I expect much more time of experts will be wasted in forum discussions you propose.

Habryka @ 2019-03-21T01:27 (+8)

I think in a forum discussion, it's relatively easy to see how much someone is participating in the discussion, and to get a sense of how much time they spent on stuff. I am not super confident that less time would be wasted in the forum discussions I am proposing, but I am confident that I and others would notice if lots of people's time was wasted, which is something I am not at all confident about for your proposal and which strongly limits the downside for the forum case.

Jan_Kulveit @ 2019-03-21T01:56 (+3)

On the contrary: on slack, it is relatively easy to see the upper bound of attention spent. On the forum, you should look not on just the time spent to write comments, but also on the time and attention of people not posting. I would be quite interested how much time for example CEA+FHI+GPI employees spend reading the forum, in aggregate (I guess you can technically count this.)

Habryka @ 2019-03-21T02:04 (+4)

*nods* I do agree that you, as the person organizing the project, will have some sense of how much time has been spent, but I think it won't be super easy for you to communicate that knowledge, and it won't by default help other people get better at estimating the time spent on things like this. It also requires everyone watching to trust you to accurately report those numbers, which I do think I do, but I don't think everyone necessarily has reason to.

I do think on Slack you also have to take into account the time of all the people not posting, and while I do think that there will be more time spent just reading and not writing on the EA Forum, I generally think the time spent reading is usually worth it for people individually (and importantly people are under no commitment to read things on the EA Forum, whereas the volunteers involved here would have a commitment to their role, making it more likely that it will turn out to be net-negative for them, though I recognize that there are some caveats where sometimes there are controversial topics that cause a lot of people to pay attention to make sure that nothing explodes).

Habryka @ 2019-03-21T00:41 (+1)

Nevermind.

Habryka @ 2019-03-21T00:10 (+15)

To respond more concretely to the "due diligence" vs. unstructured discussion section, which I think refers to some discussion me and Jan had on the Google doc he shared:

I think the thing I would like to see is something that is just a bit closer towards structured discussion than what we currently have on the forum. I think there doesn't currently exist anything like an "EA Forum Project discussion thread" and in particular not one that has any kind of process like

"One suggestion for a project per top-level comment. If you are interested in working on a project, I will edit the top-level post to reflect that you are interested in working on it. If you want to leave a comment anonymously, please use this form."

I think adding a tiny bit of process like this will cause there to be valuable discussion, will actually be better at causing good projects to be funded and for teams to start working on it, and is much less effort to set up than the process you are proposing here.

I am also worried that this process, even though it is already 7 stages long and involves at least 10 people, only covers less than half of the actual pipeline towards causing people to work on projects. I know that you want to explicitly separate the evaluation of projects from the evaluation of teams working on those project, but I don't think you can easily do that.

I think 90% of the time whether a project is good or bad depends on the team that wants to work on it, which is something that you strongly see reflected in the startup and investment world. It's extremely rare for a VC to fund or even evaluate a project without knowing what team is working on it, and I think you will find that any evaluation that doesn't include the part of matching up the teams with the projects will find that that part will quickly block any progress on this.

aarongertler @ 2019-03-21T07:10 (+16)

I share Habryka's concern for the complexity of the project; each step clearly has a useful purpose, but it's still the case that adding more steps to a process will tend to make it harder to finish that process in a reasonable amount of time. I think this system could work, but I also like the idea of running a quick, informal test of a simpler system to see what happens.

Habryka, if you create the "discussion thread" you've referenced here, I will commit to leaving at least one comment on every project idea; this seems like a really good way to test the capabilities of the Forum as a place where projects can be evaluated.

(It would be nice if participants shared a Google Doc or something similar for each of their ideas, since leaving in-line comments is much better than writing a long comment with many different points, but I'm not sure about the best way to turn "comments on a doc" into something that's also visible on the Forum.)

Habryka @ 2019-03-21T22:03 (+10)

I am currently quite busy, so only 50% on me finding the time to do it, but I will seriously look into making the time for this. I am also happy to chat to anyone else who wants to do this, and help out both with setting it up, and to participate in the thread.

Jan_Kulveit @ 2019-03-21T22:55 (+7)

FWIW, part of my motivation for the design, was

1. there may be projects, mostly in long-term, x-risk, meta- and outreach spaces, which are very negative, but not in an obvious way

2. there may be ideas, mostly in long-term and x-risk, which are infohazard

The problem with 1. is most of the EV can be caused by just one project, with large negative impact, where the downside is not obvious to notice.

It seems to me standard startup thinking does not apply here, because startups generally can not go way bellow zero.

I also do not trust arbitrary set of forum users to handle this well.

Overall I believe the very lightweight unstructured processes are trading some gain in speed and convenience in most cases for some decreased robustness in worst cases.

In general I would feel much better if the simple system you want to try would avoid projects in long-term, x-risk, meta-, outreach, localization, and "searching for cause X" areas.

Habryka @ 2019-03-20T23:40 (+13)

Here are some of the comments that I left on the draft version of this proposal that I was sent (split out over multiple comments to allow independent voting):

I continue to think that just having an open discussion thread, with reviewers participating in the discussion with optional private threads, will result in a lot more good than this.

Based on my experience with the LTF-Fund, I expect 90% of the time there will be one specific person who you need a 5 minute judgement from to judge a project, much more than you need a 2-5h evaluation. This makes an open setup where all evaluators can see all applications and provide input on the things they are particularly suited to contribute to a lot more valuable than an assignment process.

A simple invite-only, or fully open, discussion is also much easier to test than a more elaborate evaluation system, and I think you are overestimating the risk from infohazards and PR risk after some initial screening.

I do think it is important to allow reviewers to be completely anonymous when participating in the discussion.

After some more discussion:

My perspective is more that the simple intervention of:

"Create an EA Forum projects thread, with some incentive for people to leave reviews of projects"

should be tried before you do something as complicated as this. I agree that the resulting incentives can be messy, but I expect we will get a lot more data and information on what is important than we would by spending 20-50 hours of competent-person time on producing reviews plus setting up a vetting process, plus setting up a website, plus setting up a panel, plus setting up an infohazard policy before we try the obvious solution to the problem that takes 5 hours to implement.

[...]

I am pretty excited about someone just trying to create and moderate a good EA Forum thread, and it seems pretty plausible to me that the LTF fund would be open to putting something in the $20k ballpark into incentives for that

Jan_Kulveit @ 2019-03-21T00:52 (+7)

I would be curious about you model why the open discussion we currently have does not work well - like here, where user nonzerosum proposed a project, the post was heavily downvoted (at some point to negative karma) without substantial discussion of the problems. I don't think the fact that I read the post after three days and wrote some basic critical argument is a good evidence for an individual reviewer and a board is much less likely to notice problems with a proposal than a broad discussion with many people contributing would.

Also when you are making these two claims

Setting up an EA Forum thread with good moderation would take a lot less than 20 hours.

...

I am pretty excited about someone just trying to create and moderate a good EA Forum thread, and it seems pretty plausible to me that the LTF fund would be open to putting something in the $20k ballpark into incentives for that

at the same time I would guess it probably needs more explanation from you or other LTF managers.

Generally I'm in favour of solutions which are quite likely to work as opposed to solutions which look cheap but are IMO likely worse.

I also don't see how complex discussion on the forum with the high quality reviews you imagine would cost 5 hours. Unless, of course, the time and attention of the people who are posting and commenting on the forum does not count. If this is the case, I strongly disagree. The forum is actually quite costly in terms of time, attention, and also emotional impacts on people trying to participate.

Habryka @ 2019-03-21T01:23 (+33)

I also don't see how complex discussion on the forum with the high quality reviews you imagine would cost 5 hours.

I think an initial version of the process, in which you plus maybe one or two close collaborators, would play the role of evaluators and participate in an EA Forum thread, would take less than 5 hours to set up and less than 15 hours of time to actually execute and write reviews on, and I think would give you significant evidence about what kind of evaluations will be valuable and what the current bottlenecks in this space are.

I would be curious about you model why the open discussion we currently have does not work well - like here, where user nonzerosum proposed a project, the post was heavily downvoted (at some point to negative karma) without substantial discussion of the problems.

I think that post is actually a good example of why a multi-stage process like this will cause a lot of problems. I think the best thing for nonzerosum to do would have been to create a short comment or post, maybe two to three paragraphs, in which he explained the basic idea of a donor list. At this point, he would have not been super invested in it, and I think if he had posted only a short document, people would have reacted with openness and told him that there has been a pretty long history of people trying to make lots of EA donor coordination platforms, and that there are significant problems with unilateralist curse-like problems. I think the downvotes and negative reaction came primarily from people perceiving him to be prematurely charging ahead with a project.

I do think you need some additional incentive for people to actually write up their thoughts in addition to just voting on stuff, which is why a volunteer evaluator group, or maybe some kind of financial incentive, or maybe just some kind of modifications to the forum software (which I recognize is not something you can easily do but which I have affordances for), is a good idea. But I do think you want to be very hesitant to batch the reviews too much, because as I mentioned elsewhere in the thread, there is a lot of value from fast feedback loops in this evaluation process, as well as allowing experts in different domains to chime in with their thoughts.

And we did see exactly that. I think the best comment (next to yours) on that post is Ben West's comment and Aaron Gertler's comments that were both written relatively soon after the post was written (and I think would have been written even if you hadn't written yours) and concisely explained the problems with the proposal. I don't think a delay of 2-3 days is that bad, and overall I think nonzerosum successfully received the feedback that the project needed. I do think I would like to ensure that people proposing projects feel less punished by doing so, but I think that can easily be achieved by establishing a space in which there is common knowledge that a lot of proposals will be bad and have problems, and that a proposal being proposed in that space does not mean that everyone has to be scared that someone will rush ahead with that proposal and potentially cause a lot of damage.

If I understood your setup correctly, it would have potentially taken multiple weeks for nonzerosum to get feedback on their proposal, and the response would have come in the form of an evaluation that took multiple hours to write, which I don't think would have benefited anyone in this situation.

Raemon @ 2019-03-23T02:11 (+9)

Mild formatting note: I found the introduction a bit long as well as mostly containing information I already knew.

I'm not sure how to navigate "accessible" vs "avoid wasting people's time". But I think you could have replaced the introduction with a couple bullet links, like:

....

Building off of:

– What to Do With People?

– Can EA copy Teach For America?

– EA is Vetting Constrained

The EA community has plenty of money and people, but is bottlenecked on a way to scalably evaluate new projects. I'd like to start a platform that:

provides feedback on early stage projects
estimates what resources would be necessary to start a given project
further in the project's life-cycle, evaluating the team and idea fit.

...

Or, something like that (I happen to like bullet points, but in any case seems like it should be possible to cut the opening few paragraphs into a few lines)

oliverbramford @ 2019-03-27T18:39 (+7)

Meta-suggestion: In-person, professionally facilitated small workshop, sponsored and hosted by CEA, to build-consensus around a solution to the EA project bottleneck - with a view to CEA owning the project.

There are a range of carefully-considers, well-informed, and somewhat divergent perspectives on how to solve the EA project bottleneck. At the same time, getting the best version possible of a solution to the EA project bottleneck is likely to be very high value; a sub-optimal version may represent a large counterfactual loss of value.

As an important and complex piece of EA infrastructure, this seems to be a good fit for CEA to own. CEA is well-placed to lend legitimacy and seed funding to such a project, so that it has the ongoing human and financial resources and credibility to be done right.

It also seems quite likely that an appropriate - optimally impactful - solution to this problem would entail work beyond a project evaluation platform (e.g. a system to source project ideas from domain experts; effective measures to dampen overly-risk projects). This kind of 'scope creep' would be hard for an independent project to fulfil, but much easier for CEA to execute, given their network and authority.

sudhanshu_kasewa @ 2019-03-22T00:08 (+6)

Reposting my message to Jan after reading a draft of this post. I feel like I’m *stepping into it* with this, but I’m keen to get some feedback of my own on the thoughts I’m expressing here.

Epistemic status: confident about some things, not about all of them.

Emotional state: Turbulent/conflicted, triggering what I label as fear-of-rejection as I don’t think this confirms to many unstated standards; overruled, because I must try and be prepared to fail, for I cannot always learn in a vaccuum.

Why piggy-back off this post: Because 1) it’s related, 2) my thoughts are only partially developed, 3) I did not spend longer than an hour on this, so it doesn’t really warrant its own post, but 4) I think it can add some more dimensions to the discussions at hand.

Please be kind in your feedback. I prefer directness to euphemism, and explanations in simpler words rather than in shibboleths.

Edited for typos and minor improvements.

-----------

Hey Jan!

Thanks for leading the charge on the EA project platform project. I agree with your motivation, that community members can benefit from frequent, high-resolution, nuanced feedback, on their ideas (project or otherwise). I've left some comments / edits on your doc, but feel free to ignore them.

Also feel free to ignore the rest of this, but I thought I'd put down a few thoughts anyway:

I think separating project evaluation from team evaluation is a good idea; however, doing a project well runs deeper than just being able to select the right team for the job. It's also about assigning/communicating the right responsibilities to the team members, along with how they can expect those to change over time as the project grows. Some people might be good at managing a project when it's a tiny team, but may not be cut out to be CEO when there are 20 (or maybe 50) people involved, and should be prepared to take on roles better suited to their gifts as a project grows. This, I feel, is a difficult message to communicate, especially to eager young effective altruists who can see that most of their community heroes/leaders are still in or barely out of their 20s; this seems like a form of survivor bias coupled with EA being a genuinely young movement populated with generally young people ; many who got into EA in their teens 5 or 10 years ago are now in positions of prestige, and this is (I feel) an aspirational but ultimately untenable dream for most of today's new EAs.
I think one of your reasons for having step 1b in place is to generate some substance around a project to encourage engagement for the community once it is released on a platform / forum. Unfortunately, it is also the bit that takes up the most amount of time from a very small pool of highly skilled volunteers, and is ultimately the same bottleneck faced by grant makers and analysts everywhere. If we find another form of promoting engagement, would this step be necessary?
Continuing from above, could the following system work? I know you said the design space of interventions is large, but here goes anyway:

1) On EA forum, set up a separate space where project ideas can be posted anonymously; this is phase 1. Basic forum moderation (for info hazards, extremely low quality content) applies here.

2a) Each new post starts out with the following tags: Needs Critics , Needs Steelmen , Needs Engagement

2b) Community members can then (anonymously) either critique, steelman, or otherwise review the proposal ; karma scores for the reviewer can be used to signal the weight of the review (and the karma scores can be bucketed to protect their anonymity)

2c) The community can also engage with the proposal, critiques, steelmen with up/down votes or karma or whatever other mechanism makes sense

2d) As certain \sigma (reviewer karma + multiplier*review net votes) thresholds are crossed, the proposal loses its Needs Critics / Needs Steelmen / Needs Engagement tags as appropriate. Not having any more of those tags means that the proposal is now ready to be evaluated to progress to a real project. A net of Steelman Score - Critic Score or something can signal its viability straight out the gate.

3a) Projects that successfully manage to lose all three tags automatically progress to phase 2, where projects solicit funding, mentorship, and team members. Once here, the proposer is deanonymised, and reviewers have the option to become deanonymised. The proposal is now tagged with Proposal Needs Revision, Needs Sponsors, Needs Mentorship, Needs Team Members

3b) The proposer revises the proposal in light of phase 1, and adds a budget, an estimate of initial team roles + specifics (skills / hours / location / rate). This can be commented on and revised, but can only be approved by (insert set of chosen experts here). Experts can also remove the proposal from this stage with a quick review, much like an Area Chair (e.g. by noting that the net score from phase 1 was too low, indicating the proposal requires significant rework). This is the only place where expert intervention is required, by which time the proposal has received a lot of feedback and has gone through some iterations. After an expert approves, the proposal now loses its Needs Proposal Revision tag, the proposal is for all intents frozen.

3c) Members with karma > some threshold can volunteer (unpaid) to be mentors, in the expectation that they provide guidance to the project of up to some X hours a week ; a project cannot lose this tag till it has acquired at least Y mentors.

3d) All members can choose to donate towards / sponsor a project via some charity vehicle (CEA?) under the condition that their money will be returned, or their pledge won't be due, until the required budget is not exceeded... I'm unfamiliar with how something like this can work in practice though.

3e) Members can (secretly) submit their candidature for the roles advertised in the proposal, along with CVs ; proposer, mentors, and high-value sponsors (for some specification of high-value) can vet, interview, select, and secure team members from this pool, and then a mentor can remove the Needs Team Members tag.

3f) Once a project loses all four tags in phase 2, it has everything it needs to launch. Team members are revealed, proposer (with mentor / CEA / local EA Chapter support) sets up the new organization, the charity vehicle transfers the money to the organization, that then officially hires the staff, and the party gets started.

4) Between mentors, high-value sponsors, and proposer, they submit some manner of updates on the progress of the project on the forum every Z months.

Essentially the above looks like an almost entirely open, community driven exercise, but it doesn't resolve how we get engagement (in steps 2b, 2c) in the first place. I don't have an answer to that question, but I think requesting the community to be Steelmen or Critics will signal that we need more than up/downvotes on this particular item. LessWrong has a very strong commenter base, so I suspect the denizens of the EA Forum could rise up to the task.

Of course, after having written this all up, I can see how implementing this on a platform might take a little more time. I'm not a web developer, so I'm not sure how easy it is to implement all the above logic flows (e.g. when is someone anonymous v/s when are they not?), but I estimate it might take a week (between process designers and the coders) to draw out (and iterate on) all the logic and pathways, two weeks to implement in code, and another week to test.

Sorry for all this, I hope it wasn't complete waste of your time. If I've overstayed my welcome in your mindspace, or overstepped in my role as 'someone random on the internet giving me feedback', I deeply apologise. Know that writing all this down was cathartic for me, so in the worst case, I did it selfishly, but not maliciously.

Jan_Kulveit @ 2019-03-22T00:30 (+10)

Thanks Sundanshu! Sorry for not replying sooner, I was a bit overwhelmed by some of the negative feedback in the comments.

I don't think step 1b. has the same bottleneck as current grant evaluator face, because it is less dependent on good judgement.

With your proposal, I think part of it may work, I would be worried about other parts. With step 2b I would fear nobody would feel responsible for producing the content.

With 3a or any automatic steps like that, what does that lack is some sort of (reasonably) trusted expert judgement. In my view this is actually the most critical step in case of x-risk, long-term, meta-, and similarly difficult to evaluate proposals.

Overall

I'm sceptical the karma or similar automated system is good for tracking what is actually important here
I see some beauty in automation, but I don't see it applied here in the right places

Jan_Kulveit @ 2019-03-22T02:30 (+4)

Summary impressions so far: object-level

It seems many would much prefer expediency in median project cases to robustness and safety in rare low frequency possibly large negative impact cases. I do not think this is the right approach, when the intention is also to evaluate long-term oriented, x-risk, meta-, cause-X, or highly ambitious projects.
I'm afraid there is some confusion about project failure modes. I'm more worried about projects which would be successful in having a team, working successfully in some sense, changing the world, but achieving large negative impact in the end.
I feel sad about the repeated claims the proposal is rigid, costly or large-scale. If something would not work in practice it could be easily changed. Spending something like 5h of time on a project idea which likely was result of much longer deliberation and which may lead to thousands hours of work seems reasonable. Paradoxically, just the discussion about whether the project is costly or not likely already had higher cost than what setting the whole proposed infrastructure for the project + phases 1a,1d,1c would cost.

Meta:

I will .not have time to participate in the discussion in next few days. Thanks for the comments so far.

yhoiseth @ 2019-03-27T10:21 (+1)

In a possible second stage, if a team forms around a project idea, it will go through similar evaluation, focusing on the fit between the team and the idea, possibly with the additional step of a panel of forecasters predicting the success probability and expected impact of the project over several time horizons.

I'd just like to mention I've co-founded Empiricast, which would be helpful in this step. The software is already being used by some EAs. I'd be happy to answer any questions or discuss further how we can help.