Discussion about AI Safety funding (FB transcript)

By Akash @ 2023-04-30T19:05 (+104)

Kat Woods recently wrote a Facebook post about Nonlinear's new funding program.

This led to a discussion (in the comments section) about funding norms, the current funding bar, concerns about lowering the bar, and concerns about the current (relatively centralized) funding situation.

I'm posting a few of the comments below. I'm hoping this might promote more discussion about the funding landscape. Such discussion could be especially valuable right now, given that:

- Many people are starting to get interested in AI safety (including people who are not from the EA/rationalist communities).
- AGI timeline estimates have generally shortened.
- Investment in overall AI development is increasing quickly.
- There may be opportunities to spend large amounts of money in the upcoming year (e.g., scalable career transition grant programs, regranting programs, 2024 US elections, AI governance/policy infrastructure, public campaigns for AI safety).
- Many ideas with high potential upside also have noteworthy downside risks (phrased less vaguely, I think that among governance/policy/comms projects that have high potential upside, >50% also have non-trivial downside risks).
- We might see pretty big changes in the funding landscape over the next 6-24 months:
  - New funders appear to be getting interested in AI safety.
  - Governments are getting interested in AI safety.
  - Major tech companies may decide to invest more resources into AI safety.

Selected comments from FB thread

Note: I've made some editorial decisions to keep this post relatively short. Bolding is added by me. See the full thread here. Also, as usual, statements from individuals don't necessarily reflect the views of their employers.

Kat Woods (Nonlinear)

I often talk to dejected people who say they tried to get EA funding and were rejected.
And what I want to do is to give them a rousing speech about how being rejected by one funder doesn't mean that their idea is bad or that their personal qualities are bad.
The evaluation process is noisy. Even the best funders make mistakes. They might just have a different world model or value system than you. They might have been hangry while reading your application.
And that to succeed, you'll have to ask a ton of people, and get a ton of rejections, but that's OK, because you only need a handful of yeses.

(Kat then describes the new funding program from Nonlinear. TLDR: People submit an application that can then be reviewed by a network of 50+ funders.)

Claire Zabel (Program Officer at Open Philanthropy)

Claire's comment:

(Claire quoting Kat:) The evaluation process is noisy. Even the best funders make mistakes. They might just have a different world model or value system than you. They might have been hangry while reading your application.
(Claire's response): That's true. It's also possible the project they are applying for is harmful, but if they apply to enough funders, eventually someone will fund the harmful project (unilateralist's curse). In my experience as a grantmaker, a substantial fraction (though certainly very far from all) of rejected applications in the longtermist space seem harmful in expectation, not just "not cost-effective enough."
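A minimal sketch of why the unilateralist's curse worsens as more funders see an application. It rests on a deliberately simple assumption that is not in the thread: each funder independently has the same small probability `p_yes` of approving a given harmful project. Under that assumption the chance that at least one of `n` funders approves is 1 - (1 - p_yes)^n. The function name and the 5% figure are hypothetical.

```python
def prob_at_least_one_yes(p_yes: float, n_funders: int) -> float:
    """Probability that at least one of n independent funders approves.

    Hypothetical model: every funder approves independently with the same
    probability p_yes. Real funders' judgments are correlated.
    """
    if not 0.0 <= p_yes <= 1.0:
        raise ValueError("p_yes must be a probability")
    if n_funders < 0:
        raise ValueError("n_funders must be non-negative")
    # Complement of "every single funder says no".
    return 1.0 - (1.0 - p_yes) ** n_funders


if __name__ == "__main__":
    # Hypothetical harmful project that each funder approves 5% of the time.
    for n in (1, 5, 20, 50):
        print(n, round(prob_at_least_one_yes(0.05, n), 3))
```

With a 5% per-funder approval chance, the probability that someone funds the project comes out at roughly 5%, 23%, 64%, and 92% for 1, 5, 20, and 50 funders. Correlated judgments, or the shared discussion channels Kat proposes in her reply, would pull these numbers down.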

Selected portions of Kat's response to Claire:

1. We’re probably going to be setting up channels where funders can discuss applicants. This way if there are concerns about net negativity, other funders considering it can see that. This might even lead to less unilateralist curse because if lots of funders think that the idea is net negative, others will be able to see that, instead of the status quo, where it’s hard to know what other funders think of an application.

2. All these donors were giving anyways, with all the possibilities of the unilateralist’s curse. This just gives them more / better options to choose from. From this alone, it actually might lead to fewer net-negative projects being funded because smaller funders have access to better options.

4. Big EA funders are also composed of fallible humans who might also miss large downside risk projects. This system could help unearth downside risks that they hadn’t thought of. There’s also the possibility of false alarms, where they thought something was net negative when it wasn’t.

Given how hard it is to evaluate ideas/talent in AI safety, I think we get better outcomes if we treat it less like bridge-building and more like assessing startups. Except harder! At least YCombinator finds out years later if any of their bets worked. With AI safety, we can’t even agree if Eliezer or Paul are net positive!

Larissa's response to Claire:

Over time I've become somewhat skeptical of people who talk about the harm from other people's projects in this way. It seems like it is used as an argument to centralize decision making to a small group and one that at this point I'm not sure has a strong enough track record. In the EA movement building space, the people I heard this from the most are the people who've now themselves caused the most harm to the EA movement. I think it's plausible that a similar thing is true in the AI space.

Kerry's response to Claire:

Historically, this kind of argument has been weaponized to centralize funding in the hands of Open Phil and Open Phil-aligned groups. I think it's important that funding on AI-related topics not be centralized in this way, as Open Phil is a major supporter of AGI danger labs via support for OpenAI and, more recently, the strong connections to Anthropic.
Let's inform donors about the risk that grants could cause harm and trust them to make sensible decisions on that basis.

Caleb Parikh (Executive Director of EA Funds)

I’d be interested in hearing examples of good AI safety projects that failed to get funding in the last year. I think the current funders are able to fund things down to the point where a good amount of things being passed on are net negative by their lights or have pretty low upside.

Kat's response to Caleb:

The most common examples are people who get funding but aren't fully funded. This happens all the time. This alone means that the Nonlinear Network can add value.
I think the projects where the funders feel there isn’t a ton of upside should, pretty straightforwardly, still have other people considering funding them. The big funders will often be wrong. Not because they aren’t amazing at their jobs (which I think they are), but because of the nature of the field. Successful investors miss opportunities all the time, and we should expect the nonprofit world to be worse at this because of even worse feedback loops, different goals, and a very inefficient market.
And even for the people who do think that an idea is net negative - how confident are they in that? You’d have to be quite confident that something is bad to think that other people shouldn’t even be able to think for themselves about the idea. That level of confidence in a field like AI safety seems unwarranted.
Especially given that if you asked 100 informed, smart, and value aligned EAs, you’d rarely get over 50% of people thinking it’s net negative. It’s really hard to get EAs to agree on anything! And for most of the ideas that some people think are net negative, you’d have a huge percentage of EAs thinking it’s net positive.
Because evaluating impact is *hard*. We should expect to be wrong most of the time. And if that’s the case, it’s better to harness the collective wisdom of a larger number of EAs, to have a ton of uncorrelated minds trying different strategies and debating and trying to seek truth. To not be overly confident in our own ability to evaluate talent/ideas, and to encourage a thriving marketplace of EA ideas.

Thomas Larsen's response to Caleb (note: Thomas is a grant evaluator at EA Funds):

Ways to turn $$$ into impact that aren't happening:
1. Funding CAIS more (to pay their people more, to hire more people, etc.)
2. Funding another evals org
3. Creating a new regranting program
4. Increasing independent alignment researcher salary to like $150k/year (depending on location) to enable better time-money tradeoffs.
5. Just decreasing the funding bar for passive applications: a year ago the funding bar was lower, and there are grants that would have been funded (and are confidently net positive EV) yet are below the current bar
Seems to me that if EA has $10B in the bank, and timelines are 10 years, it's not unreasonable to spend $1B/year right now, and my guess is we currently spend ~$100-200M/year.
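Thomas's arithmetic, written out under a simplifying assumption that his comment does not state: the money is spent evenly over the timeline, ignoring investment returns and new donations. On that reading, his estimate of current spending is 10% to 20% of the even-spend rate, i.e., 5 to 10 times below it.

```latex
\[
\frac{\$10\,\text{B}}{10\ \text{yr}} = \$1\,\text{B/yr},
\qquad
\frac{\$0.1\,\text{B/yr}}{\$1\,\text{B/yr}} = 10\%,
\qquad
\frac{\$0.2\,\text{B/yr}}{\$1\,\text{B/yr}} = 20\%.
\]
```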

Akash (that's me)

Lots of the comments so far seem to be about the funding bar; I think there's also a lot to be said about barriers to applying, missed opportunities, and the role of active grantmaking.
For instance, I got the sense that many of the valuable FTX regrants were grants that major funders would have funded. So sometimes people would say things like "this isn't counterfactual because LTFF would've just funded it."
But in many cases, the grants *were* counterfactual, because the grantee would've never thought to apply. The regranting program did lower the bar for funding, but it also created a culture of active grantmaking, proactively finding opportunities, and having people feel like it was their responsibility to find ways to turn money into impact.
My impression is that LTFF/OP spends a rather small fraction of time on active grantmaking. I don't have enough context to be confident that this is a mistake, but I wouldn't be surprised if most of the value being "left on the table" was actually due to lack of active grantmaking (as opposed to, e.g., the funding bar being too high).
Things LTFF/OP could do about this:
1. Have more programs/applications that appeal to particular audiences (e.g., "mech interp fund", "career transition fund", "AIS Hub Travel Grant").
2. Regranting program
3. More public statements around what kind of things they are interested in funding, especially stuff that lots of people might not know about. I think there's a lot of Curse of Knowledge going on, where many grantees don't know that they're allowed to apply for X.
4. Hiring someone friendly/social/positive-vibesy to lead active grantmaking. Their role would be to go around talking to people and helping them brainstorm ways they could turn money into impact.
5. Have shorter forms where people can express interest. People find applying to LTFF/OP burdensome, no matter how much people try to say it's supposed to be unburdensome. Luke's recent "interest form for AI governance" seems like a good template IMO.

Note: Since Kat's post is public, I didn't ask for permission to post people's comments on LessWrong. I think this is the right policy, but feel free to DM me if you disagree.

Chris Leong @ 2023-04-30T20:15 (+24)

I suspect creating funds with clearer guidelines for what would be considered reasonable would help. For example, many people might not apply for an AIS Hub Travel Grant to the Bay Area because when you start looking into how expensive finding somewhere to stay in the Bay Area is, it starts to look ridiculous.

Ryan Kidd @ 2023-05-02T05:07 (+21)

Copying over the Facebook comments I just made.

Response to Kat, intended as a devil's advocate stance:

As Tyler said, funders can already query other funders regarding projects they think might have been rejected. I think the unilateralist's curse argument holds if the funding platform has at least one risk-averse big spender. I'm particularly scared about random entrepreneurs with typical entrepreneurial risk tolerance entering this space and throwing money at projects without concern for downsides, not about e.g. Open Phil + LTFF + Longview accessing a central database (though such a database should be administered by those orgs, probably).
I'm very open to hearing solutions to the risk of a miscommunication-induced unilateralist's curse. I think a better solution than a centralized funding database would be a centralized query database, where any risk-averse funders can submit a request for information to a trusted third party, who knows every proposal that was rejected by all parties and can connect the prospective funders with the funders who rejected the proposal for more information. This reduces the chances that potentially risk-tolerant funders get pinged with every grant proposal but increases the chances that risk-averse funders request information that might help them reject too-risky proposals. I know it's complicated, but it seems like a much better mechanism design if one is risk-averse.
It seems pretty unlikely that small projects will get noticed or critiqued on the EA Forum, but low-quality small projects might be bad en masse. Future Fund gave a lot of money to projects and people that got low visibility but might have contributed to "mass movement building concerns" around "diluted epistemics", "counterfactually driving the wheel of AI hype and progress," and "burning bridges for effective outreach."
Open-sourcing funding analysis is a trade-off between false positives and false negatives for downside risk. Currently, I'm much more convinced that Open Phil and other funders are catching the large-downside projects than I am that an open-source model would avoid them. False alarms seem safer than downside risk to me too, but this might be because I have a particularly low opinion of entrepreneurial risk tolerance and feel particularly concerned about "doing movement building right" (happy to discuss MATS' role in this, btw).
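Ryan's centralized query database can be sketched as a tiny in-memory broker. This is an illustrative sketch under assumptions that the comment does not state: funders reliably report their rejections to the broker, proposals carry a shared ID, and the class and method names (`QueryBroker`, `record_rejection`, `request_info`) are hypothetical. The design point it illustrates is that information flows only on request. Risk-tolerant funders are never pinged with every proposal, while risk-averse funders can learn whom to ask.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class QueryBroker:
    """Hypothetical trusted third party that knows every reported rejection.

    Proposals are never broadcast. A funder considering a proposal asks the
    broker, which only says which other funders rejected it, so the asker
    can contact them directly for details.
    """

    def __init__(self) -> None:
        # proposal_id -> funders who rejected that proposal
        self._rejections: DefaultDict[str, Set[str]] = defaultdict(set)

    def record_rejection(self, proposal_id: str, funder: str) -> None:
        self._rejections[proposal_id].add(funder)

    def request_info(self, proposal_id: str, asking_funder: str) -> List[str]:
        """Return the other funders who rejected proposal_id, sorted by name."""
        # .get() avoids creating an empty entry for unknown proposals.
        rejecters = self._rejections.get(proposal_id, set())
        return sorted(f for f in rejecters if f != asking_funder)


if __name__ == "__main__":
    broker = QueryBroker()
    broker.record_rejection("proposal-42", "Funder A")
    broker.record_rejection("proposal-42", "Funder B")
    # Funder C is considering proposal-42 and asks whom to talk to.
    print(broker.request_info("proposal-42", "Funder C"))  # ['Funder A', 'Funder B']
```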

A few key background claims:

AI safety is hard to do right, even for the experts. Doing it wrong but looking successful at it just makes AI products more marketable but doesn't avert AGI tail-risks (the scary ones).
The market doesn't solve AI risk by default and probably makes it worse, even if composed of (ineffectively) altruistic entrepreneurs. Silicon Valley's optimism bias can be antithetical to a “security mindset.” “Deploy MVP + iterate” fails if we have to get it right on the first real try. Market forces cannot distinguish between AI “saints” and “sycophants” unaided.
Big AI x-risk funders are generally anchoring on the "sign-value" of impact rather than "penny-pinching" when rejecting projects. Projects might sometimes get submaximal funding because the risk/benefit ratio increases with scale.

Larks @ 2023-05-01T16:37 (+17)

We’re probably going to be setting up channels where funders can discuss applicants. This way if there are concerns about net negativity, other funders considering it can see that. This might even lead to less unilateralist curse because if lots of funders think that the idea is net negative, others will be able to see that, instead of the status quo, where it’s hard to know what other funders think of an application.

I'm not convinced this is a very satisfactory mechanism. It can be very hard to share negative feedback, especially as this is often very personal (e.g. the project idea is good but the applicant is untrustworthy, or they would mess it up very badly and spoil the water for another attempt, etc.) and with a sufficiently large pool of funders viewing your written evaluation the risk that the applicant is informed could become unacceptably high.

MichaelStJules @ 2023-05-01T18:31 (+4)

Why is it hard to share negative feedback? Would it be bad in expectation if the applicant is informed?

With more funders, it could be closer to anonymous. Even some internal discussion could be made anonymous if there are concerns about leaks.

Ofer @ 2023-05-02T07:04 (+4)

The local incentives people face often discourage publicly giving negative feedback that may cause an applicant to not get funding. ("I would gain nothing from giving negative feedback, and that person might hate me.")

Jason @ 2023-05-01T01:32 (+9)

Asking people in the funding network to discuss applications they think are net negative seems like a big ask. At present, I assume that each grantmaker has a pass/fail evaluation function in connection to their own funding bar.

I'm sure this is simplified/imprecise/overquantified, but their thought process might look something like this: As the grantmaker evaluates each application, they implicitly develop a point estimate with a confidence interval (let's say 90% CI for this example). As the grantmaker spends more time on evaluation, the CI shrinks but may remain quite broad. The grantmaker will prematurely abort their evaluation process if the entire CI is below their funding bar at any time. At that point, the risk of erroneously rejecting the application is low enough that devoting further resources to the evaluation is not a good use of their time from the perspective of their grantmaking program.

However, the grantmaker may often reach the conclusion that the entire 90% CI is below the funding bar before they investigate deeply enough to reach a sufficiently firm conclusion that the project is net-negative that they would feel comfortable voicing that. Understandably, many grantmakers will want to be fairly confident that an application is net-negative before making a semi-public statement that could unilaterally torpedo the applicant and/or application.
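Jason's stopping rule, and the gap it leaves, can be sketched as follows. The intervals are assumed inputs in arbitrary hypothetical "value units" (a real grantmaker does not compute them explicitly), zero marks "no effect either way", and the function name is illustrative. The sketch contrasts two tests: the evaluator stops once the CI's upper bound falls below the funding bar, which can happen well before the upper bound falls below zero.

```python
from typing import Optional, Sequence, Tuple

# (low, high) bounds of a grantmaker's 90% CI on a project's value, in
# hypothetical "value units"; values below 0 mean net negative.
Interval = Tuple[float, float]


def evaluate(snapshots: Sequence[Interval], funding_bar: float) -> Tuple[Optional[int], bool]:
    """Apply the stopping rule to successive CI snapshots (one per unit of effort).

    Returns (stop_step, confident_net_negative):
      stop_step: first step where the whole CI is below the funding bar, so the
        grantmaker stops evaluating and rejects; None if that never happens.
      confident_net_negative: True only if, at that step, the whole CI is also
        below zero, i.e. the grantmaker could comfortably call it net negative.
    """
    for step, (_low, high) in enumerate(snapshots):
        if high < funding_bar:
            return step, high < 0.0
    return None, False


if __name__ == "__main__":
    bar = 10.0
    # The CI narrows with effort but still straddles zero when it clears the bar.
    narrowing = [(-20.0, 40.0), (-10.0, 15.0), (-6.0, 8.0)]
    print(evaluate(narrowing, bar))  # (2, False): rejected, not shown net negative
```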

As can be seen in some of the comments above, a grantmaker who calls out a grant to others is also exposing themselves to reputational risk, which (being human) is another reason they might feel the need to mitigate by conducting additional evaluation on a grant they already decided to reject. ^[1] Plus there's the work of writing up a rationale in enough detail to mitigate reputational risk and actually persuade others. Therefore, it seems that asking grantmakers to identifiably communicate their net-negativity findings would be asking them to do a meaningful amount of extra work. That would detract to some extent from the time they spend evaluating grants they might actually make.

I would be more inclined to rely on a more robust process than mere communication. If we assume all funders are "informed, smart, and value aligned EAs,"^[2] then a random three-member screening panel could be an option. Conditional on all three members selecting "best estimate is net harmful,"^[3] what are the odds that asking 100 "informed, smart, and value aligned EAs" would yield a majority finding the proposal net positive?^[4] Probably low enough to justify refusing to send the application on to the funding network. That will lead to erroneous rejections, but that risk has to be weighed against the probability that the proposal truly is net harmful in expectation and that forwarding it onward risks a unilateralist funding it.
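Jason's "probably low enough" can be given a rough number with a small Bayesian calculation. This sketch depends on assumptions that the comment does not make: each panelist is an independent random draw from a large pool of evaluators (standing in for the 100 EAs), verdicts are binary, and the prior on the fraction q of the pool that would call the proposal net harmful is uniform. With k "net harmful" votes out of n, the posterior on q is then proportional to q^k (1 - q)^(n - k). The odds Jason asks about correspond to P(q < 0.5). The function name and grid size are hypothetical choices.

```python
def prob_majority_positive(harmful_votes: int, panel_size: int, grid: int = 100_000) -> float:
    """P(fraction of pool judging "net harmful" < 0.5 | panel votes).

    Hypothetical assumptions: independent panelists drawn from a large pool,
    binary verdicts, and a uniform prior on the harmful fraction q. The
    posterior is proportional to q**k * (1 - q)**(n - k); it is integrated
    numerically with the midpoint rule.
    """
    if not 0 <= harmful_votes <= panel_size:
        raise ValueError("harmful_votes must be between 0 and panel_size")
    k, n = harmful_votes, panel_size
    below_half = 0.0
    total = 0.0
    for i in range(grid):
        q = (i + 0.5) / grid  # midpoint of the i-th grid cell
        weight = q**k * (1.0 - q) ** (n - k)
        total += weight
        if q < 0.5:
            below_half += weight
    return below_half / total


if __name__ == "__main__":
    print(round(prob_majority_positive(3, 3), 4))  # unanimous "net harmful": 0.0625
    print(round(prob_majority_positive(2, 3), 4))  # 2-1 split: 0.3125
```

Under these assumptions a unanimous panel leaves a 1-in-16 (6.25%) chance that a majority of the pool would rate the proposal net positive, while a 2-1 split leaves 5/16 (31.25%). That is one way to see why Jason's footnote sends 2-1 splits to a second panel. A more skeptical or more optimistic prior would shift both numbers.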

Of course, one could use a randomish-sample-of-evaluators method in other ways, such as asking each funder who evaluates an application to anonymously rate it as likely net beneficial, likely net negative, or decline to opine. The current balance could be displayed with the application, and if the balance ever reveals a high enough probability that a majority of the whole group would vote net harmful, the proposal is removed. That has a lower chance than a screening panel of removing a majority-net-harmful application; maybe the unilateralist will be one of the first few evaluators. But it at least mitigates some of the disadvantages of expecting grantmakers who think the application is net negative to take extra time away from their own grants and incur personal reputational risk for the collective good of mitigating unilateralist risk.
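A minimal sketch of the rolling anonymous rating Jason describes. None of its details come from the comment. It assumes binary verdicts with abstentions ignored, raters treated as independent random draws from the whole group, a uniform prior over the fraction q of the group that would vote "net harmful", and a hypothetical removal threshold of 0.9 on P(q > 0.5). The function names and the threshold are illustrative only.

```python
from typing import Iterable, Optional

# Hypothetical removal threshold on P(majority of the whole group says "harmful").
REMOVAL_THRESHOLD = 0.9


def prob_majority_harmful(harmful: int, beneficial: int, grid: int = 20_000) -> float:
    """P(q > 0.5 | votes), where q is the group's "net harmful" fraction.

    Assumes a uniform prior on q and independently drawn raters, so the
    posterior is proportional to q**harmful * (1 - q)**beneficial.
    Integrated numerically with the midpoint rule.
    """
    above = total = 0.0
    for i in range(grid):
        q = (i + 0.5) / grid
        weight = q**harmful * (1.0 - q) ** beneficial
        total += weight
        if q > 0.5:
            above += weight
    return above / total


def first_removal(votes: Iterable[str], threshold: float = REMOVAL_THRESHOLD) -> Optional[int]:
    """Return the index of the vote that triggers removal, or None.

    Each vote is "beneficial", "harmful", or "abstain"; abstentions are ignored.
    """
    harmful = beneficial = 0
    for i, vote in enumerate(votes):
        if vote == "harmful":
            harmful += 1
        elif vote == "beneficial":
            beneficial += 1
        elif vote != "abstain":
            raise ValueError(f"unknown vote: {vote!r}")
        if prob_majority_harmful(harmful, beneficial) > threshold:
            return i
    return None


if __name__ == "__main__":
    print(first_removal(["harmful", "harmful", "harmful"]))  # 2
    print(first_removal(["beneficial", "harmful", "harmful", "harmful"]))  # None
```

With this hypothetical threshold, three unanimous "harmful" ratings remove the proposal. A single early "beneficial" rating followed by three "harmful" ones does not, because the posterior probability of a harmful majority is then only about 0.81. As Jason notes, this is weaker than a screening panel, and the random-draw assumption fails if the first raters are not representative.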

[1] This is not inconsistent with the idea that a substantial fraction of applications are overall harmful in expectation. It is much easier to be confident on a shallow review that (say) about 25% of all applications one sees would be net harmful than to be confident on a shallow review that a specific application would be net harmful.
[2] If not all funders would qualify, that raises other concerns.
[3] A 2-1 split in either direction could go to a second panel (although experience might reveal that a 2-1 split ultimately led to a certain outcome in a high percentage of cases).
[4] Moreover, erroneously rejecting a proposal that 51 or even 60 of these EAs would find net positive would be at most a fairly minor error. (I express no opinion here as to whether a percentage of evaluators somewhat under 50% concluding that an application was net harmful should also justify its rejection.)

Sanjay @ 2023-05-12T17:59 (+5)

Seeing this prompted me to post a draft document that I've been meaning to finalise for a while: https://forum.effectivealtruism.org/posts/oLkoKCo3cDzuc996G/coattailing-and-funging-to-learn-strategies-for-non-expert