FLI report: Policymaking in the Pause

By Zach Stein-Perlman @ 2023-04-15T17:01 (+29)

This is a crosspost, probably from LessWrong. Try viewing it there.

trevor1 @ 2023-04-15T18:21 (+1)

Shouldn't #6 be at the top? It seems to be one of the most critical factors for getting larger numbers of hardcore-quantitative people (e.g. polymaths, string theorists) to go into AI safety research. It will probably be one of those people, not anyone else, who makes a really big discovery. 

There are already several orders of magnitude more hardcore-quant people dithering about on comparatively pointless problems like solving physics, and most of what we have so far comes from the tiny fraction of hardcore-quant people who ended up in AI safety research.

Geoffrey Miller @ 2023-04-15T19:28 (+4)

I think an extremely dangerous failure mode for AI safety research would be to prioritize 'hardcore-quantitative people' trying to solve AI alignment using clever technical tricks, without understanding much about the 8 billion ordinary humans they're trying to align AIs with. 

If behavioral scientists aren't involved in AI alignment research -- including as skeptics about whether 'alignment' is even possible -- it's quite likely that whatever 'alignment solution' the hardcore-quantitative people invent will be brittle, inadequate, and risky.

trevor1 @ 2023-04-15T19:50 (+3)

I agree that behavioral science might be important for creating non-brittle alignment, and I am very, very bullish on behavioral science being critically valuable for all sorts of factors related to AI alignment support, AI macrostrategy, and AI governance (including but not limited to neurofeedback). I think behavioral science is currently the core crux deciding AI alignment outcomes, and that it will be the main factor determining whether enough quant people end up going into alignment. In fact, I think the behavioral scientists will be remembered as the superstars, and the quant people as interchangeable.

However, the overwhelming impression I get from the current ML paradigm is that we're largely stuck with black-box neural networks, and that these are extremely difficult and dangerous to align at all; they have a systematic tendency to generate insurmountably complex spaghetti code that is unaligned by default. I'm not an expert here -- I specialized in a handful of critical elements of AI macrostrategy -- but what I've seen so far indicates that the neural-network "spaghetti code" is much harder to work with than the human-alignment elements. I'm not strongly attached to this view, though.

Geoffrey Miller @ 2023-04-15T19:52 (+2)

Trevor -- yep, reasonable points.  Alignment might be impossible, but it might be extra impossible with complex black box networks with hundreds of billions of parameters & very poor interpretability tools.