Announcing the Introduction to ML Safety Course

By TW123, Dan H, Oliver Z @ 2022-08-06T02:50 (+136)

TLDR

We're announcing a new course designed to introduce students with a background in machine learning to the most relevant concepts in empirical ML-based AI safety. The course is available publicly here.

Background

AI safety is a small but rapidly growing field, and both younger and more experienced researchers are interested in contributing. However, interest in the field is not enough: researchers are unlikely to make much progress until they understand existing work, which is very difficult if they are simply presented with a list of posts and papers to read. As such, there is a need for curated AI safety curricula that can get new potential researchers up to speed.

Richard Ngo’s AGI Safety Fundamentals filled a huge hole in AI safety education, giving hundreds of people a better understanding of the landscape. In our view, it is the best resource for anyone looking for a conceptual overview of AI safety.

However, until now there has been no course that aims to introduce students to empirical, machine learning-based AI safety research, which we believe is a crucial part of the field. Nor has there been a course structured the way a university course usually is, complete with lectures, readings, and assignments; this structure makes it more likely that it could be taught at a university. Lastly, and perhaps most importantly, most existing resources assume that the reader has higher-than-average openness to AI x-risk. If we are to onboard more machine learning researchers, this should not be taken for granted.

In this post, we present a new, publicly-available course that Dan Hendrycks has been working on for the last eight months: Introduction to ML Safety. The course is a project of the Center for AI Safety.

Introduction to ML Safety

Philosophy

The purpose of Introduction to ML Safety is to introduce people familiar with machine learning and deep learning to the latest directions in empirical ML safety research and explain existential risk considerations. Our hope is that the course can serve as the default for ML researchers interested in doing work relevant to AI safety, as well as undergraduates who are interested in beginning research in empirical ML safety. The course could also potentially be taught at universities by faculty interested in teaching it.

The course contains research areas that many reasonable people concerned about AI x-risk think are valuable, though we exclude those that don’t (yet) have an empirical ML component, as they aren’t really in scope for the course. Most of the areas in the course are also covered in Open Problems in AI X-Risk.

The course is still very much in beta, and we will make improvements over the coming year. Part of the improvements will be based on feedback from students in the ML Safety Scholars summer program.

Content

The course is divided into seven sections, covered below. Each section has lectures, readings, assignments, and (in progress) course notes. Below we present descriptions of each lecture, as well as a link to the YouTube video. The slides, assignments, and notes can be found on the course website.

Background

The background section introduces the course and also gives an overview of deep learning concepts that are relevant to AI safety work. It includes the following lectures:

This section includes a written assignment and a programming assignment designed to help students review deep learning concepts.

Hazard Analysis

In Complex Systems for AI Safety, we discussed the systems view of safety. It's unclear to what extent AI safety resembles particular other safety problems, such as making cars, planes, or software programs safer. The systems view of safety provides general, abstract safety lessons that have been applicable across many different industries. Many of these industries, such as information security and the defense community, must contend with powerful adversarial actors, not unlike AI safety. The systems view of safety thus provides a good starting point for thinking about AI safety. The hazard analysis section of the course discusses foundational systems safety concepts and applies them to AI safety. It includes the following lectures:

This section includes a written assignment where students test their knowledge of the section.

Robustness

In Open Problems in AI X-Risk, we covered the relevance of robustness to AI safety. Robustness focuses on ensuring models behave acceptably when exposed to abnormal, unforeseen, unusual, highly impactful, or adversarial events. We expect future AI systems will encounter such events frequently. This section includes the following lectures:

This section includes a written assignment where students test their knowledge of the section, and a programming assignment where students implement various methods in adversarial robustness.

Monitoring

We also covered monitoring in Open Problems in AI X-Risk, which we define as research that reduces exposure to hazards as much as possible and allows their identification before they grow. The course covers monitoring in more depth, and includes the following lectures:

This section includes a written assignment where students test their knowledge of the section, and two programming assignments: one focused on anomaly detection, and the other on Trojan detection.

Alignment

We also cover alignment, which has varying definitions but which we define as reducing inherent model hazards: hazards that result from models (explicitly or operationally) pursuing the wrong goals. The course covers the following areas, which are also covered in Open Problems in AI X-Risk.

This section includes a written assignment where students test their knowledge of the section, and a programming assignment where students use transparency tools to identify inconsistencies in language models trained to model ethics.

Systemic Safety

In addition to directly reducing hazards from AI systems, there are several ways that AI can be used to make the world better equipped to handle the development of AI by improving sociotechnical factors like decision-making ability and safety culture. This section covers a few such areas, which are also covered in Open Problems in AI X-Risk.

Additional Existential Risk Discussion

As is typical for a topics course, the last section covers the broader importance of the concepts covered earlier: namely, existential risk and possible existential hazards. We also cover strategies for tractably reducing existential risk, following Pragmatic AI Safety and X-Risk Analysis For AI Research.

This section includes a final reflection assignment in which students review the course; notably, it encourages students to evaluate AI safety arguments for themselves.

Next Steps

All course content is available online, so anyone can work through it on their own. The course is currently being trialed by the students in ML Safety Scholars, who are providing valuable feedback.

We are interested in running additional formal versions of this course in the future. If you have the operational capacity to run this course virtually, or are interested in running it at your university, please let us know!

If you notice bugs in the lectures and/or assignments, you can message any of us or email info@centerforaisafety.org.

Acknowledgements

Dan Hendrycks would like to thank Oliver Zhang, Rui Wang, Jason Ding, Steven Basart, Thomas Woodside, Nathaniel Li, and Joshua Clymer for helping with the design of the course, and the students in ML Safety Scholars for testing the course.


doroff @ 2022-08-12T10:01 (+5)

Is there a forum or a chat where I can talk to others who take the course?

Guy Raveh @ 2022-08-08T15:21 (+3)

Looks great! Thanks for making and posting it.

I think this is exactly the sort of resource I need to get further into AI safety.

node @ 2022-08-19T20:07 (+2)

Suggestion:

Add an interest list where people can register to get information for future formats of the course.

There seems to be significant interest to 'take' this course beyond self study. I think that there are also many who could be interested in running it, provided sufficient participation. Removing uncertainty in the latter should make commitments and funding more likely.

ThomasW @ 2022-08-19T22:58 (+4)

I have another post planned in a few weeks, in which I will probably include something like this. If you haven't already seen, we made a post about the ML safety component of the course here (though this doesn't answer the question about a formal program). We are already going to be running a version of it in the fall at some universities, but if anyone else is interested imminently in running it, please DM me here!