Stampy's AI Safety Info - New Distillations #3 [May 2023]

By markov @ 2023-06-06T14:27 (+10)

This is a linkpost to https://aisafety.info/?state=95LE_967I_85E4_8AER_8AEQ_6939_9358_94D9_935A_8FJZ_6411_6222_8NYD_6410_8QH5_90PK_8VOT_8V5J_8517_8KGQ_

Hey! This is another update from the distillers at the AI Safety Info website (and its more playful clone Stampy).

Here are some of the answers that we wrote up over the last month (May 2023). As always, let us know in the comments if there are any questions you would like to see answered.

The list below redirects to individual links, and the collective URL above renders all of the answers in the list on one page at once.

How can I help tree

People from all manner of backgrounds want to contribute to AI safety, so we added an FAQ with advice on the various ways to help.

Here is the unified link for all the answers in the "How can I help" tree, if you wish to navigate to aisafety.info directly. See Steven's post for the full list of individual questions and details.

Other Distillations

Developer Updates

There are also a couple of cool new features that the website's developers worked on. We now have a tags page, which lets readers navigate the answers by the tags they are most interested in. There is also a glossary function, which automatically creates a short definition popup whenever an answer uses a jargon term that the reader might not be familiar with. Here is a sample answer that shows how this functionality works. Try hovering over links such as "agent", "s-risk", and "Goodhart".

This is all still a work in progress, but it is exciting to share everything that the people on the Stampy project have been working on.

Cross-posted to LessWrong: https://www.lesswrong.com/posts/fFtkQCasMjeEpMdhp/stampy-s-ai-safety-info-new-distillations-3-may-2023

Chris Leong @ 2023-06-06T19:52 (+2)

I always get confused about the difference between superposition and polysemanticity. It would be great if the article clarified this.

markov @ 2023-06-07T21:29 (+1)

It's always hard to find a balance between brevity and de-jargonification in the answers while also touching on every topic that people might be confused about, mainly because we expect people with a wide range of technical and non-technical backgrounds to interact with the answers. This means that we try to limit each answer to the essentials required for understanding the topic.

All that being said, I also don't really know whether there is a major difference between polysemanticity and superposition. I am also confused about whether polysemantic vs. monosemantic neurons refer to the same underlying concept as disentangled vs. distributed representations, because all these concepts sound like they are describing the same thing. I took note of your comment in the thread about that particular answer, and will get back to it when I learn more.

If you have any resources, posts, or papers that point out that they are indeed different, let me know and I'll write up a new answer, something like "What is the difference between polysemantic neurons and superposition?" or "What is the difference between solving polysemanticity in neurons and disentangling latent space representations?"

Alternatively, if you come across an awesome explanation and feel like writing it up, you could just edit the Stampy docs on superposition or polysemanticity yourself.