Newsletter for Alignment Research: The ML Safety Updates

By Esben Kran, Thomas Steinthal, Sabrina Zaki, Apart Research @ 2022-10-22T16:17 (+30)

Introducing the ML Safety Updates

TL;DR: We present a new AI safety update series in podcast, YouTube, and newsletter format, released weekly to help you stay updated on alignment and ML safety research and get exposed to AI safety opportunities. Read the latest newsletter further down in this post.

Our motivations for this are two-fold: our newsletters summarize the past week’s research, covering both alignment and safety-relevant AI work, and they promote opportunities in the AI safety space.

For the past 7 weeks, we have released these updates as a YouTube video series summarizing novel AI and ML safety research in 4-6 minutes. This week, we have also released them in podcast and newsletter format, and future updates will also be posted to LessWrong. Subscribe here.

The case for an AI safety update series

There are already a few amazing newsletter-like resources on AI safety. However, the ones that exist are either biased towards specific topics or have not been kept up to date over the past year. See our summary below.

Additionally, there are several update feeds for alignment research.

Do share if you think there are any major channels we missed in the update feeds sheet and the research update channels sheet.

Risks & uncertainties

  1. We misrepresent someone’s research or perspective in AI safety. This is a real risk since these updates are produced on a weekly schedule, which limits the time available for careful review.
  2. The research we summarize plays into an existing paradigm and limits the creation of new ideas in AI safety research.
  3. Wrong representation of AI safety in the alignment updates leads to stigmatization of AI safety by the ML community and public actors.
  4. We stop updating the weekly newsletter, and our past existence prevents people from making a new one.

Risk mitigation

  1. We are very open to feedback and will keep a link to our manuscript in the description so you can comment on it and we can add any corrections (you can also find the manuscript via the link further down).
  2. We will consciously look beyond the traditional channels of AI safety to find further resources every week. Additionally, we won’t disregard an article just because it doesn’t have the right amount of karma.
  3. We will strive to optimize the feedback mechanisms from the community to ensure that we integrate with, rather than separate from, the machine learning field.
  4. We will report publicly if we stop making the episodes and call for others to take over the task. If this is not possible, we will be very public about the complete shutdown of the series so others can fill the gap.

Feedback

Give anonymous feedback on the series here, or leave a comment here or on YouTube. You’re also very welcome to contact us at operations@apartresearch.com or book a meeting here.

Please do reach out to us or comment in the manuscript doc if you think we misrepresented an article, opinion or perspective during an update.

Subscribe to our newsletters here, listen to the podcast here (Spotify), watch the YouTube videos here and read the newsletters here.

Thank you very much to Alexander Briand for in-depth feedback on this post.

This week’s ML Safety Update

This week, we look at counterarguments to the basic case for why AI is an existential risk to humanity, examine arguments that strong AI might come very soon, and share interesting papers.

But first a small note: You can now subscribe to our newsletter and listen to these updates in your favorite podcasting app. Check out newsletter.apartresearch.com and podcast.apartresearch.com.

Today is October 20th and this is the ML Safety Progress Update!

AI X-risk counterarguments

Existential risk from AI does not seem overwhelmingly likely, according to Katja Grace from AI Impacts. She writes a long article arguing against the major perspectives on how AI can become very dangerous, and notes that there is enough uncertainty that AI safety remains a relevant concern despite the relatively low chance of catastrophe.

Her counterarguments go against the three main cases for why superintelligent AI will become an existential risk: 1) Superhuman AI systems will be goal-directed, 2) goal-directed AI systems’ goals will be bad, and 3) superhuman AI will overpower humans.

Her counterarguments for why AI systems might not be goal-directed are that many highly functional systems can be “pseudo-agents”: models that don’t pursue utility maximization but instead satisfy a range of sub-goals. Additionally, the bar for the kind of goal-directedness that would pose a risk is extremely high.

Her arguments for why goal-directed AI systems’ goals might not be bad are that: 1) even evil humans broadly correspond to human values, and slight deviations from the optimal policy seem alright; 2) AI might just learn the correct thing from its dataset, since humans also seem to get their behavior from the diverse training data of the world; 3) deep learning seems very good at learning fuzzy things from data, and values may be learnable in much the same way as generating faces (we don’t see generated faces without noses, for example); and 4) AIs that learn short-term goals can be highly functional while having a low chance of optimizing for dangerous, long-term goals such as power-seeking.

Superhuman AI might also not overpower humans since: 1) a genius human in the Stone Age would have a much harder time getting to space than a human of average intelligence today, which shows that intelligence is a more nuanced concept than we often take it to be; 2) AI might not be better than human-AI combinations; 3) AI will need our trust to take over critical infrastructure; 4) many properties other than intelligence seem highly relevant; 5) many goals do not end in taking over the universe; 6) intelligence feedback loops could have many speeds, and you need a lot of confidence that they will be fast to say they lead to doom; and 7) key concepts in the literature are quite vague, meaning that we lack an understanding of how they will lead to existential risk.

Erik Jenner and Johannes Treutlein respond to her counterarguments. Their main point is that there is good evidence the capability gap between AI and humans will be large, and that we would need Grace’s slightly aligned AI to help us reach a state where we do not build much more capable and more misaligned systems.

Comprehensive AI Services (CAIS)

A relevant text to mention in relation to these arguments is Eric Drexler’s attempt at reframing superintelligence into something more realistic in an economic world. Here, he uses the term “AI services” to describe systems that solve single, economically relevant tasks. The “comprehensive” in Comprehensive AI Services is what we usually call “general”. The main point is that we will see a lot of highly capable but specialized AI before we get a monolithic artificial general intelligence. We recommend reading the report if you have the time.

Strong AGI coming soon

At the opposite end of the spectrum from Grace, Porby shares why they think AGI will arrive within the next 20 years, with convincing arguments about 1) how easy the problem of intelligence is, 2) how immature current machine learning is, 3) how quickly we’ll reach the level of hardware needed, and 4) how we cannot look at current AI systems to predict future abilities.

Other news

Opportunities

And now, diving into the many opportunities available to everyone interested in learning about and doing more ML safety research!

This has been the ML Safety Progress Update and we look forward to seeing you next week!