7 traps that (we think) new alignment researchers often fall into
By Akash @ 2022-09-27T23:13 (+73)
Alex HT @ 2022-09-28T14:42 (+18)
I claim that you can get near the frontier of alignment knowledge in ~6 months to a year.
How do you think people should do this?
Geoffrey Miller @ 2022-09-28T18:38 (+14)
Akash - very nice post, and helpful for (relative) AI alignment newbies like me.
I would add that many AI alignment experts (and many EAs, actually) seem to assume that everyone getting interested in alignment is in their early 20s and doesn't know much about anything, with no expertise in any other domain.
This might often be true, but there are also some people who get interested in alignment who have already had successful careers in other fields, and who can bring new interdisciplinary perspectives that alignment research might lack. Such people might be a lot less likely to fall into traps 3, 4, 5, and 6 that you mention. But they might fall into other kinds of traps that you don't mention, such as thinking 'If only these alignment kids understood my pet field X as well as I do, most of their misconceptions would evaporate and alignment research would progress 5x faster....' (I've been guilty of this on occasion).
Guy Raveh @ 2022-09-28T12:50 (+11)
From my perspective, new and useful innovations in the past, especially in new fields, came from people with a wide and deep education and skillset that takes years to build, and from fragmented research where no one is necessarily thinking about a very high-level terminal goal.
How sure are you that advice like "don't pursue proxy goals" or "don't spend years getting a degree" is useful for generating a productive field of AI alignment research, and not just for generating people who are vaguely similar to existing researchers who are thought of as successful? Or who can engage with existing research but will struggle to step outside its box?
After all:
- Many existing researchers who have made interesting and important contributions do have PhDs,
- And it doesn't seem like we're anywhere close to "solving alignment," so we don't actually know that being able to engage with existing research without a much broader understanding is really that useful.
Yonatan Cale @ 2022-09-28T17:12 (+3)
I like this: it's a higher-resolution description of what I think of as "not staying stuck in other people's boxes", or at least part of it. I plan to try avoiding these traps in the next few months and liveblog about it, so that people can correct my mistakes early on rather than me continuing to do the wrong things for a long time. Any chance you'd like to follow?
Sharmake @ 2022-09-28T01:02 (+2)
B. They lose sight of the terminal goal. The real goal is not to skill-up in ML. The real goal is not to replicate the results of a paper. The real goal is not even to “solve inner alignment.” The real goal is to not die & not lose the value of the far-future.
I'd argue that if they solved inner alignment completely, then the rest of the alignment problem would become far easier, if not trivial, to solve.