Reflections on the PIBBSS Fellowship 2022

By nora, particlemania @ 2022-12-11T22:03 (+69)

Cross-posted from LessWrong and the Alignment Forum

Last summer, we ran the first iteration of the PIBBSS Summer Research Fellowship. In this post, we share some reflections on how the program went.

Note that this post deals mostly with high-level reflections and isn’t maximally comprehensive. It primarily focusses on information we think might be relevant for other people and initiatives in this space. We also do not go into specific research outputs produced by fellows within the scope of this post. Further, there are some details that we may not cover in this post for privacy reasons.

How to navigate this post:

  1. If you know what PIBBSS is and want to directly jump to our reflections, go to sections "Overview of main updates" and "Main successes and failures".
  2. If you want a bit more context first, check out "About PIBBSS" for a brief description of PIBBSS’s overall mission; and "Some key facts about the fellowship", if you want a quick overview of the program design.
  3. The appendix contains a more detailed discussion of the portfolio of research projects hosted by the fellowship program (appendix 1), and a summary of the research retreats (appendix 2).

About PIBBSS

PIBBSS (Principles of Intelligent Behavior in Biological and Social Systems) aims to facilitate research studying parallels between intelligent behavior in natural and artificial systems, and to leverage these insights towards the goal of building safe and aligned AI.

To this end, we organized a 3-month Summer Research Fellowship bringing together scholars with graduate-level research experience (or equivalent) from a wide range of relevant disciplines to work on research projects under the mentorship of experienced AI alignment researchers. The disciplines of interest included fields as diverse as the brain sciences; evolutionary biology, systems biology and ecology; statistical mechanics and complex systems studies; economic, legal and political theory; philosophy of science; and more.

We think this broad approach -- the PIBBSS bet -- is a valuable frontier for expanding scientific and philosophical enquiry on AI risk and the alignment problem. In particular, it aspires to bring more empirical and conceptual grounding to thinking about advanced AI systems. It can do so by drawing on the understanding that different disciplines already possess about intelligent and complex behavior, while also remaining vigilant about the disanalogies that might exist between natural systems and candidate AI designs.

Furthermore, bringing diverse epistemic competencies to bear upon the problem puts us in a better position to identify neglected challenges and opportunities in alignment research. While we certainly recognize that familiarity with ML research is an important part of being able to make significant progress in the field, we also think that familiarity with a large variety of intelligent systems and models of intelligent behavior constitutes an underserved epistemic resource. It can provide novel research surface area, help assess current research frontiers, de- (and re-)construct the AI risk problem, help conceive of novel alternatives in the design space, and more.

This makes interdisciplinary and transdisciplinary research endeavors valuable, especially given that they are otherwise likely to be neglected due to inferential and disciplinary distances. That said, we are skeptical of "interdisciplinarity for the sake of it", but consider it exciting insofar as it explores specific research bets or has specific generative motivations for why X is interesting.

For more information on PIBBSS, see this introduction post, this discussion of the epistemic bet, our research map (currently undergoing a significant update, to be released soon), and these notes on our motivations and scope.

Some key facts about the fellowship program

The original value proposition of the fellowship

This is how we thought of the fellowship’s value propositions prior to running it: 

Overview of main updates

Main successes and failures

Successes

Overall, we believe that most of the value of the program came from researchers gaining a better understanding of research directions and the AI risk problem. Some of this value manifests as concrete research outputs, some of it as effects on fellows’ future trajectories.

Failures and/or shortcomings

A brief note on future plans

Overall, we believe the PIBBSS summer research fellowship (or a close variation of it) is worth running again. We applied for funding to do so. 

The key dimensions of improvement we are envisioning for the 2023 fellowship are: 

More tentatively, we might explore ways of running (part of) the program in a (more) mentor-less fashion. While we think this is hard to do well, we also think it is attractive for several reasons, mainly because mentorship is scarce in the field. Some potential avenues of exploration include: 


Beyond the format of the summer research fellowship, we tentatively think the following (rough) formats are worth further thought. Note that we are not saying these programs are, all things considered, worthwhile, but that, given our experience, these are three directions that may be worth exploring further. 

  1. A reading group/introduction course to AI risk/alignment suitable for natural and social science demographics. We are considering further developing the pre-fellowship reading group, and experimenting with whether it might be worth running it (also) outside of the fellowship context. 
  2. An ~affiliate program, targeted at people who are already able to pursue (PIBBSS-style) research independently. Such a program would likely be longer (e.g. 6 months or more) and focus on providing more tailored support towards affiliates developing their own (novel) PIBBSS-style research directions.
  3. A range of research workshops/retreats/conferences aimed at specific domains or domain interfaces (within the scope of PIBBSS research interests), aiming to e.g. test or develop specific research bets (e.g., complex systems in interpretability, ALife in agent foundations, predictive processing) and/or create Schelling points for specific demographics (e.g., brain sciences in AI alignment). 

PIBBSS is interested in exploring these, or other, avenues further. If you have feedback or ideas, or are interested in collaborating, feel encouraged to reach out to us (contact@pibbss.ai).

We want to thank… 

For invaluable help in making the program a success, we want to thank our fellow organizing team members Anna Gadjdova and Cara Selvarajah; and several other people who contributed to the different parts of this endeavor, including Amrit Sidhu-Brar, Gavin Leech, Adam Shimi, Sahil Kulshrestha, Nandi Schoots, Tan Zhi Xuan, Tomáš Gavenčiak, Jan Kulveit, Mihaly Barasz, Max Daniel, Owen Cotton-Barratt, Patrick Butlin, John Wentworth, Andrew Critch, Vojta Kovarik, Lewis Hamilton, Rose Hadshar, Steve Byrnes, Damon Sasi, Raymond Douglas, Radim Lacina, Jan Pieter Snoeji, Cullen O’Keefe, Guillaume Corlouer, Elizabeth Garrett, Kristie Barnett, František Drahota, Antonín Kanát, Karin Neumannova, Jiri Nadvornik (and anyone else we might have forgotten to mention here - our sincere apologies!). Of course, we are also most grateful to all our mentors and fellows.

 

Appendix 1: Reflections on Portfolio of Research Bets

In this section, we will discuss our reflections on the portfolio of research bets that fellows worked on, which are distributed across a range of “target domains”, in particular: 

  1. Agent Foundations,
  2. Alignment of Complex Systems,
  3. Digital Minds and Brain-inspired Alignment, 
  4. Prosaic Alignment (Foundations),
  5. Socio-technical ML Ethics,
  6. Experimental and Applied Prosaic Alignment.

We will focus the discussion on aspects like the theory of impact for different target domains and the tractability of insight transfer. The discussion will aim to abstract away from fellow- or project-specific factors. Note that we shall skip the discussion of specific projects and other details here in the public post.

TL;DR: At a high level, projects aimed towards i) Agent Foundations, ii) Alignment of Complex Systems, and iii) Digital Minds and Brain-inspired Alignment most consistently made valuable progress. Projects aimed at iv) Prosaic Alignment faced the largest challenges. Specifically, they seem to require building new vocabulary and frameworks to assist in the epistemic transfer and would have benefited from fellows having more familiarity with concepts in technical ML, which we were insufficiently able to provide through our pedagogical efforts. We believe this constitutes an important dimension of improvement.

1. Agent Foundations (25–30%) [AF] 


2. Alignment of Complex Systems (20–25%) [CS]


3. Digital Minds (and Brain-inspired alignment research; 5–10%) [DM] 

Discussion of AF, CS and DM: 

The above three target domains (i.e. Agent Foundations, Alignment of Complex Systems, and Digital Minds) are similar insofar as they all pursue conceptual foundations of intelligent systems, even if they approach the problem from slightly different starting positions and with different methodologies. Together, these three bundles accounted for about 50–55% of projects, and roughly 50% of them were successful in terms of generating research momentum. This makes it meaningful to pay attention to two other similarities between them: a) the overlapping vocabularies and interests with respective neighboring disciplines, and b) the degree of separation (or indirectness) in their theory of impact.

The object-level interests in AF, CS, and DM mostly have the same type signature as the questions that motivate researchers in the respective scientific and philosophical disciplines (such as decision theory, information theory, complex systems, cognitive science, etc.). This also means that interdisciplinary dialogue can be conducted relatively smoothly, owing to shared conceptual vocabulary and ontologies. Consequently, we can interpret the motivational nudges provided by PIBBSS here as a gentle selection pressure towards the alignment-relevance of which specific questions get investigated.

At the same time, the (alignment-relevant) impact of research progress here is mostly indirect, coming from better foresight of AI behavior and as an input to future specification and/or interpretability research (see discussion in Rice and Manheim 2022[6]). This places an important high-level constraint on the value derived here.
 

4. Prosaic Alignment Foundations (25–35%) [PF] 

Discussion of PF: 

While some PF projects did succeed in finding promising research momentum, there was higher variance in the tractability of progress. This bundle also generated meaningfully lower ex-post excitement among mentors (compared to the rest of the portfolio), and caused us to update significantly about the epistemic hardness of transferring insights from other disciplines.

Unlike AF+CS+DM discussed above, the interdisciplinary transfer of insights towards prosaic alignment seems to involve building entirely new vocabularies and ontologies to a significantly higher degree. For example, transferring insights from evolutionary theory towards understanding any particular phenomenon of relevance in deep learning seems to be bottlenecked by the absence of a sufficiently rich understanding of the relevant isomorphisms. Indeed, the projects in this bundle that did find good research momentum were strongly correlated with fellows' prior ML familiarity.

However, given the potential for high-value insights from bets like this, we think it is worth further exploring ways of building relevant ML familiarity for fellows, such that they can contribute more efficiently and constructively. At the very least, we intend to add pedagogical elements familiarizing fellows with machine learning algorithms and with ML interpretability, in addition to improving the pedagogical elements on information theory and RL theory from the previous iteration.

 

5. Socio-technical ML Ethics (10%) [ME] 

6. Experimental and Applied Prosaic Alignment  (5–10%) [EP] 

Appendix 2: Retreats Summary

We organized two retreats during the fellowship program for familiarization with AI risk and the alignment problem, facilitation of cross-cohort dialogue, and other benefits of in-person research gatherings. Both retreats had a mix of structured and unstructured parts: the structured parts included talks, invited speakers, and sessions directed at research planning and orientation, while the unstructured parts included discussions and breakout sessions. A small sample of recurring themes in the unstructured parts included deconfusing and conceptualizing consequentialist cognition, mechanizing goal-orientedness, the role of representations in cognition, and distinguishing assistive behavior from manipulative behavior.

The first retreat was organized at a venue outside Oxford, at the beginning of the summer, and included sessions on different topics in:

  1. Introduction to AI Risk (e.g. intro to instrumental convergence, systems-theoretic introduction to misalignment problems, security mindset, pluralism in risk models, talk on tech company singularities, etc.)
  2. Epistemology of Alignment Research (e.g. proxy awareness in reasoning about future systems, epistemic strategies in alignment, epistemological vigilance, recognizing analogies vs. disanalogies, etc.)
  3. Mathematical Methods in Machine Intelligence (e.g. intro lecture on information theory, intro lecture on RL theory, talk on the telephone theorem and the natural abstractions hypothesis, etc.)
  4. Overview of Research Frontiers and AI Design Space (e.g. session on reasoning across varying levels of goal-orientedness, capabilities and alignment in different AI systems, and the PIBBSS interdisciplinary research map.)

The second retreat was organized near Prague a few weeks before the formal end of the fellowship, and was scheduled adjacent to the Human-Aligned AI Summer School (HAAISS) 2022. It included fellows presenting research updates and seeking feedback, some talks continuing the themes from the previous retreat (e.g. why alignment problems contain some hard parts, problematizing consequentialist cognition and second-person ethics, etc.), and practising double crux on scientific disagreements (such as whether there are qualitative differences in the role of representations in human and cellular cognition).

  1. ^

    Close to the end of the fellowship program, Peter Eckersley, one of our mentors - as well as a mentor and admired friend of people involved in PIBBSS - passed away. We mourn this loss and are grateful for his participation in our program.

  2. ^

    Here is an explanation of how deep reading groups work. We were very happy with how the format suited our purposes. Kudos to Sahil Kulshrestha for suggesting and facilitating the format!

  3. ^

    By “primary” sources of value, we mean those values that ~directly manifest in the world. By “secondary” values we mean things that are valuable in that they aid in generating (more) primary value (in the future). We can also think of secondary values as “commons” produced by the program.

  4. ^

    6 out of 20 fellows had negligible prior exposure to AI risk and alignment; 10 out of 20 had prior awareness but lacked exposure to technical AI-risk discussions; 4 out of 20 had prior technical exposure to AI risk. 

  5. ^

    Some numbers about our application process:

    - Stage 1:    121 applied,
    - Stage 2:    ~60 were invited for work tasks,
    - Stage 3:    ~40 were invited for interviews,
    - Final number of offers accepted:    20 

  6. ^

    Issa Rice and David Manheim (2022), Arguments about Highly Reliable Agent Designs as a Useful Path to Artificial Intelligence Safety. https://arxiv.org/abs/2201.02950

  7. ^

    Nora Ammann (2022), Epistemic Artifacts of (conceptual) AI alignment research. https://www.alignmentforum.org/s/4WiyAJ2Y7Fuyz8RtM/p/CewHdaAjEvG3bpc6C

    The write-up proposes an identification of “four categories of epistemic artifacts we may hope to retrieve from conceptual AI alignment research: a) conceptual de-confusion, b) identifying and specifying risk scenarios, c) characterizing target behavior, and d) formalizing alignment strategies/proposals.”


NunoSempere @ 2022-12-12T14:19 (+3)
  • Research progress: Out of 20 fellows, we find that at least 6–10 made interesting progress on promising research programs.
    • A non-comprehensive sample of research progress we were particularly excited about includes work on intrinsic reward-shaping in brains, a dynamical systems perspective on goal-oriented behavior and relativistic agency, or an investigation into how robust humans are to being corrupted or mind-hacked by future AI systems. 

I didn't quite get why you didn't link to these.

nora @ 2022-12-20T12:30 (+3)

We did not consider the discussion of specific research projects to be within the scope of this post. As mentioned in the beginning, we tried to cover as much as we could that would be relevant to other field builders and related audiences.

It primarily focusses on information we think might be relevant for other people and initiatives in this space. We also do not go into specific research outputs produced by fellows within the scope of this post

There are a few reasons why it made sense this way.

As discussed in other parts of this post, a lot of the research output has not yet been published. Some teams did publicly share their work (as an example, one of the two teams that worked on "dynamical systems perspective on goal-oriented behavior and relativistic agency" posted their updates on the Alignment Forum: [1] and [2], which we hugely appreciate), some have submitted their manuscripts to academic venues, and several others have not yet published. This has been for various reasons, e.g. because they are continuing the project and waiting to publish only at some further level of maturity, because of (info) hazard considerations and sanity checks, because of preferences over the format of research output they'd want to pursue and working towards that, or because the project was primarily directed at informing the mentor's research and may not involve an explicit public output.

From our end, while we might hold preferences for certain insights to flow outwards more efficiently, we also wanted to defer decisions about the form and content of research outputs to the shared judgement between fellows and their respective mentors. 

Note that in some cases this absence of public communication so far is fairly justifiable, especially for promising projects that became long-term collaborations. 

(Fwiw, as we mention in this post, we have also gained a better understanding of how to facilitate outward communication without constraining such research autonomy, which we will take into account in future.)

There are also other reasons why a detailed evaluation of projects is difficult to do based on partial outputs and mentor-specific inside-view motivations. In light of all this, we decided for this reflections post to be a high-level abstraction, and not to include either a Research Showcase or a detailed Portfolio Evaluation. Based on what we understand right now, this seems like a reasonable decision.

At the same time, if a project evaluator or somebody in a related capacity wishes to take a look at a more detailed evaluation report, we'd be open to discussing that (under some info-sharing constraints) and would be happy to hear from you at contact@pibbss.ai.

MaxRa @ 2022-12-12T15:29 (+3)

I understood that there isn't something to link to yet, based on this from the shortcomings section:

Concrete output: While we are fairly happy with the research progress, we think this insufficiently translated into communicable research output within the timeframe of the fellowship. Accordingly, we believe that a, if not “the”, main dimension of improvement for the fellowship lies in providing more structural support encouraging more and faster communicable output to capture the research/epistemic progress that has been generated. 

DorisGrant @ 2022-12-17T13:19 (+1)

Thanks for sharing it :)