What is the role of Bayesian ML for AI alignment/safety?

By mariushobbhahn @ 2022-01-11T08:07 (+39)

I started a Ph.D. in Bayesian ML because I thought it was a relevant approach to AI safety. Currently, I think that this is unlikely and other paths within ML are more promising. 

The main purpose of this post is to spark a discussion, get feedback and collect promising paths within Bayesian ML. Before that, I want to give a very short overview of why I changed my mind. 

If there are relevant projects in the space of Bayesian ML for AI safety, I would be very keen to hear about them. Please share them in the comments.

For this post, I'll call anything Bayesian ML that uses Bayes' theorem in the context of ML or tries to estimate probability distributions rather than point estimates. 

Why I changed my mind

I was bullish on Bayesian ML for AI safety because:

  1. The Bayesian framework is very powerful in general. It can be used to describe a lot of phenomena in all kinds of disciplines, e.g. neuroscience, economics, and medicine. Thus, I thought that such a general framework might be scaled up to something AGI-like and that I should understand it better.
  2. Quantified uncertainty seemed relevant for AI alignment. This might be through quantifying what a model doesn’t know or implementing constraints, e.g. through choosing specific priors.

I changed my mind because:

  1. It currently looks like transformative AI (TAI) will come from really large NNs. Making NNs Bayesian is usually quite computationally expensive, especially for large models. Furthermore, it looks like the current Bayesian techniques are still quite flawed, i.e. the quality of their uncertainty estimates isn’t that great. At least that’s the impression I’ve gotten after working with them for about a year.
  2. As a consequence, when relevant changes in AI come around, the Bayesian version is usually years behind. This is way too long for safety-critical applications and thus not very useful.
  3. I’m not sure quantified uncertainty matters that much for alignment. Assume you have an AGI that is unaligned and you can quantify that in some way. This still doesn’t solve the underlying problem that the AGI is not aligned.

I still think that Bayesian ML might be really useful in other aspects and I hope that Bayesian techniques will be increasingly used for statistical analysis in economics, neuroscience, medicine, etc. I’m just not sure it matters for alignment. 

Possibly relevant projects

This is a shortlist off the top of my head. I’m not an expert on most of these topics, so I might misrepresent them. Feel free to correct me and add further ones. 

“Knowing what we don’t know” & Out-of-distribution detection 

There are a ton of papers that use Bayesian techniques to quantify predictive uncertainty in neural networks. One of my Ph.D. projects is focused on such a technique and my impression is that they work OKish, at least in classification settings. However, many simple non-Bayesian techniques such as ensembles yield results of similar or even better quality. Thus, I’m not sure the Bayesian formalism adds anything of value, given that it is usually much more expensive. 
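To make the ensemble baseline concrete, here is a minimal sketch of how predictive uncertainty is commonly read off an ensemble: average the members' softmax outputs and take the entropy of the average. The function name, array layout and toy numbers are illustrative assumptions, not taken from any specific paper.

```python
import numpy as np

def ensemble_predictive_entropy(member_probs):
    """Average class probabilities over ensemble members and score uncertainty.

    member_probs: array of shape (n_members, n_inputs, n_classes) holding
    each member's softmax output (hypothetical layout for this sketch).
    Returns the averaged probabilities and their entropy per input.
    """
    member_probs = np.asarray(member_probs, dtype=float)
    mean_probs = member_probs.mean(axis=0)
    # Small constant avoids log(0) for classes with zero probability.
    entropy = -np.sum(mean_probs * np.log(mean_probs + 1e-12), axis=-1)
    return mean_probs, entropy

# Toy example: two members, two inputs, two classes.
# The members agree on input 0 and disagree on input 1.
probs = np.array([
    [[0.9, 0.1], [0.9, 0.1]],
    [[0.9, 0.1], [0.1, 0.9]],
])
mean_probs, entropy = ensemble_predictive_entropy(probs)
print(entropy)  # the second entry (disagreement) is the larger one
```

Disagreement between members shows up as higher entropy, which is the signal typically thresholded for out-of-distribution detection.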

Reward uncertainty

In the context of RL, the reward is usually given as a point estimate. Using distributions for the reward instead of point estimates might make the agent more robust and thus less prone to errors. Possibly this is a path to reduce misalignment. 
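As a rough illustration of what a reward distribution could buy, the sketch below keeps a Gaussian posterior over each action's mean reward and ranks actions by a pessimistic lower bound instead of the point estimate. The conjugate normal model with known noise variance, the pessimism factor `k`, and all names and numbers are assumptions made for this example only.

```python
import numpy as np

def posterior_reward(observations, prior_mean=0.0, prior_var=1.0, noise_var=1.0):
    """Normal-normal conjugate update for the mean reward of one action."""
    obs = np.asarray(observations, dtype=float)
    n = obs.size
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + obs.sum() / noise_var)
    return post_mean, post_var

def pessimistic_value(post_mean, post_var, k=1.0):
    """Lower bound on the reward: posterior mean minus k standard deviations."""
    return post_mean - k * np.sqrt(post_var)

# Action A: well explored, moderate reward. Action B: one lucky observation.
rewards_a = [1.0] * 20
rewards_b = [1.5]

value_a = pessimistic_value(*posterior_reward(rewards_a))
value_b = pessimistic_value(*posterior_reward(rewards_b))

print(np.mean(rewards_a), np.mean(rewards_b))  # point estimates favour B
print(value_a, value_b)                        # the pessimistic bound favours A
```

The point estimate prefers the barely explored action, while the uncertainty-aware score prefers the one whose reward is actually well supported by data.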

Learning from human preferences

To align AI systems, it is important to efficiently give them feedback on their actions or beliefs. Many current systems that implement learning from human preferences either are conceptually Bayesian or use probabilistic models such as Gaussian Processes. 

Constraining models

In the Bayesian framework, priors incorporate previous knowledge. We can model safety constraints as strong priors, e.g. punishing the model heavily for crossing a certain threshold. While this is nice in theory, it still doesn’t solve the problem of mapping the real world into mathematical constraints, i.e. we still need to map “don’t do the bad thing” into a probability distribution. 
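A minimal sketch of a prior acting as a penalty, assuming the simplest possible case: linear regression with a zero-mean Gaussian prior on the weights, where the MAP estimate coincides with ridge regression and a tighter prior punishes large weights more heavily. This only shows the mechanism; it does nothing to solve the harder problem of expressing a real safety constraint as a prior. The function name and data are hypothetical.

```python
import numpy as np

def map_weights(X, y, prior_var, noise_var=1.0):
    """MAP weights for linear regression with prior w ~ N(0, prior_var * I).

    Equivalent to ridge regression with penalty noise_var / prior_var.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    lam = noise_var / prior_var
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

# Synthetic data generated from known weights plus a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=50)

w_weak = map_weights(X, y, prior_var=100.0)    # broad prior: data dominates
w_strong = map_weights(X, y, prior_var=0.001)  # tight prior: weights pinned near 0

print(np.linalg.norm(w_weak), np.linalg.norm(w_strong))
```

With the broad prior the estimate tracks the data; with the tight prior the weights are held close to zero regardless of what the data says.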

Causality

Some people believe that causal modeling is the missing piece for AGI. Most causal modeling approaches use techniques that are related to Bayesian ML. I’m not sure yet how realistic causal modeling is but I think there are some insights that the AI safety community is currently overlooking. A post on this topic is in the making and will be published soon. 

I need your help

I’m trying to pivot to projects more related to AI safety within my Ph.D. For this, I want to better understand which kinds of things are relevant. If you have ideas, papers, blog posts, etc. that could be helpful, don’t hesitate to comment. 


 


Sebastian_Farquhar @ 2022-01-11T16:50 (+20)

I began my PhD with a focus on Bayesian deep learning with exactly the same reasoning as you. I also share your doubts about the relevance of BDL to long-term safety. I have two clusters of thoughts: some reasons why BDL might be worth pursuing regardless, and alternative approaches.

Considerations about BDL and important safety research:

Big picture, I think intellectual diversity among AGI safety researchers is good, Bayesian inference is important and fundamental, and lots of people glom on to whatever the latest hot thing is (currently LLMs), leading to rapid saturation.

So what is interesting to work on? I'm currently thinking about two main things:

There are other things that are important, and I agree that OOD detection is also important (and I'm working on a conceptual paper on this, rather than a detection method specifically). If you'd like to speak about any of this stuff I'm happy to talk. You can reach me at sebastian.farquhar@cs.ox.ac.uk

mariushobbhahn @ 2022-01-11T17:33 (+6)

Wow. That was really insightful. 

I can confirm that Philipp is a great supervisor! I also don't plan on chasing the next best thing but want to understand ways to combine Bayesian ML with AI safety/alignment relevant things. 

I'll write you a mail soon!

markvdw @ 2022-01-11T10:01 (+11)

I totally see where you're coming from. I would tend to agree that Bayesian inference doesn't seem to be that useful right now. Visible and exciting leaps are being made by large neural networks, like you say. Bayesian inference on the other hand just works OKish, and faces stiff competition in benchmarks from non-Bayesian methods.

In my opinion, the biggest reasons why Bayesian inference isn't making much of an impact at the moment are:

I also think that in AI alignment, the biggest problem is figuring out the right question to ask. Currently, observable failure cases in the current simple test settings are rare*, which makes it hard to do concrete technical work regardless of whether you choose to use the Bayesian paradigm.

The thought experiments that motivate AI/ML safety are often longer term, and embedded in larger systems (like society). One place where I do think people have a concrete idea of problems that need solving in AI/ML is social science! I have seen interesting points being made, e.g. at FAccT [1], about how you should set up your AI/ML system if you want to avoid certain consequences when deploying in society.

This is where I would focus, if I were to want to work on good outcomes of AI/ML that have impact right now.

Now, I still work on Bayesian ML (although not directly on AI safety). Why do I do this when I agree that Bayesian inference doesn't have much to offer right now? Well, because I think there are reasons to believe that Bayesian inference will have new abilities to offer deep learning (not just uncertainty) in the long term. We just need to work on them! I may turn out to be wrong, but I do still believe that my research direction should be explored in case it turns out to be right. Big companies seem to have large model design under control anyway.

What should you work on? This is difficult. Large model research is difficult to do outside of a few organisations, and requires a lot of engineering effort. You could try to join such an organisation, you could try to investigate properties of trained models, you could try and make these models more widely available, or try to find smaller scale test set-ups where you can probe properties (this is difficult if scale is intrinsically necessary).

I do believe that a great way to get impact right now is to work on questions about deploying ML, and where we want to apply it in society.

You could also remember that your career is longer than a PhD, and work on technical problems now, even if they're not related to AI safety. If you want to work on Bayesian ML, or just ML that quantifies uncertainty in a way that matters (right now), I would work on problems with a decision making element in them. Places where the data-gathering<->decision loop is closed.

Self-driving cars are a cool example, but again difficult to work on outside specific organisations. Bayesian Optimisation or experimental design in industrial settings is a cool small-scale example of where uncertainty modelling really matters. I think smaller scale and lower dimensional applied ML problems are undervalued in the ML community. They're hard to find (it may require talking to people in industry), but I think they are numerous. And what's interesting is that research can make the difference between a poor solution and a great solution.

I'm curious to hear other people's thoughts on this as well.

 

[1] https://facctconference.org/

 

* Correct me if I'm wrong on this. If the failure cases are not rare, then my advice would be to pick one, and try to solve it in a tool agnostic way.

Vanessa @ 2022-01-17T14:38 (+10)

Quantified uncertainty might be fairly important for alignment, since there is a class of approaches that rely on confidence thresholds to avoid catastrophic errors (1, 2, 3). What might also be important is the ability to explicitly control your prior in order to encode assumptions such as those needed for value learning (but maybe there are ways to do it with other methods).

PabloAMC @ 2022-01-11T11:42 (+7)

A similar idea is, by the way, discussed in a post by Jaime Sevilla on the limits of causal discovery: https://towardsdatascience.com/the-limits-of-graphical-causal-discovery-92d92aed54d6

Related to your causality comment above, two days ago I submitted a research proposal on Causal Representation Learning for AI Safety. You may want to see it here: https://www.lesswrong.com/posts/5BkEoJFEqQEWy9GcL/an-open-philanthropy-grant-proposal-causal-representation

MaxRa @ 2022-01-11T11:04 (+7)

Yeah, difficult question... some random thoughts, not sure how helpful: