AI Welfare Debate Week retrospective

By Toby Tremlett🔹 @ 2024-09-16T09:18 (+39)

I wrote this retrospective to be shared internally at CEA, but in the spirit of more open communication, I'm sharing it here as well. Note that this is a review of the event considered as a product, not a summary or review of the posts from the week.

If you have any questions or additional feedback, I'd appreciate hearing them! I'll be running another debate week soon, and feedback has already been very helpful in preparing for it.

Feedback on the retro itself is also appreciated. I'd ideally like to pre-register my retros, so that I only have to fill in the graphs and conclusions once the event actually happens. Suggestions for data we should measure, or questions I should be asking, would be very helpful for making better retro templates.

How successful was the event?

In my OKRs (Objectives and Key Results, i.e. my goals for the event), I wanted this event to:

Therefore, on our explicit goals, this event was successful 🎊. But how successful was it based on our other, non-KR goals and hopes?

Some other goals that we had for the event, either in the ideation phase or while it was ongoing, were:

In the next four sections, I examine each of these goals in turn. 

More good content

We had 28 posts with the debate week tag, with 7 being at or above 50 karma. Of the 7, all but one (JWS’s thoughtful critique of the debate’s framing) were from authors I had directly spoken to or messaged about the event.

Compared to Draft Amnesty Week (which led to posts from 42 authors, and 10 posts over 50 karma), this isn’t that many. However, I think we should count these posts as ex ante more valuable because of their focus on a specific topic.

Ex post, it's hard to assess how valuable the posts were. None of the posts had very high karma (the highest was 77). However, I did curate one of the posts, and a couple of others were considered for curation. I would be interested to hear takes from readers about how valuable the posts were: did any of them change your mind, lead to a collaboration, or cause you to think more about the topic?

Engagement

How much engagement did the event get?

In total, debate week posts got 127 hours of engagement during the debate week (11.6% of total engagement that week), and 181 hours from July 1-14 (debate week and the week after), which was 7.5% of that fortnight’s engagement hours.
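As a rough sanity check, the figures above can be turned into implied totals for the whole Forum. This is a back-of-envelope sketch, not output from our analytics: it assumes the quoted percentages are shares of all Forum engagement hours in each period, that debate week ran July 1-7, and that the percentages are rounded, so the results are approximate.

```python
# Figures quoted in the post.
debate_week_hours = 127    # hours on debate week posts during debate week
debate_week_share = 0.116  # share of all engagement hours that week
fortnight_hours = 181      # hours on debate week posts, July 1-14
fortnight_share = 0.075    # share of all engagement hours that fortnight

DAYS_PER_WEEK = 7  # assumption: each period is a full 7-day week


def implied_total(part_hours: float, share: float) -> float:
    """Total hours implied by a part and its share of the whole."""
    return part_hours / share


week_total = implied_total(debate_week_hours, debate_week_share)
fortnight_total = implied_total(fortnight_hours, fortnight_share)

# Whatever is left of the fortnight belongs to the week after the event.
following_week_total = fortnight_total - week_total

print(f"Debate week: ~{week_total:.0f} hours in total, "
      f"~{week_total / DAYS_PER_WEEK:.0f} per day")
print(f"Week after:  ~{following_week_total:.0f} hours in total, "
      f"~{following_week_total / DAYS_PER_WEEK:.0f} per day")
```

Under these assumptions, the Forum saw roughly 1,100 engagement hours during debate week and roughly 1,300 in the week after, while debate week posts themselves dropped from 127 hours to about 54.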

Did it increase total daily hours of engagement?

Note: Discussion of the Manifest controversies happened in June, and led to higher engagement hours per day in the build-up to the event. Important dates: June 17 (244 comments), June 18 (349 comments), June 20 (33 comments), June 25 (38 comments).

It doesn’t look as if the debate week meaningfully increased daily engagement. The average daily engagement for the week after the event is actually higher, although the third day of the event (July 3rd, the day I mentioned in the EA Digest that the event was ongoing) remains the day with the most hours of engagement between July 1st and the date I'm writing this, August 21st.

Did it get us new users?

Not very noticeably, but slightly. We got a peak of new users on the third day of debate week:

And the average number of daily new users during debate week was (very marginally) higher than that of any other week between June 24 and today (August 22):

Did it increase messaging?

I don’t think so. 

Debate week had a lower average number of new convos per day (and some of them would have been me discussing posts with authors), and a slightly higher average number of messages per user than the next two weeks. It’d be cool if we could bump this metric up next time.

Below is all relevant messaging data, with debate week marked in green. Debate week doesn’t stand out.

How successful were our new features?

For this event we debuted several new features: the debate week banner, the “most influential posts” score, and the in-post debate slider.

The debate-week banner on the frontpage

The frontpage banner

Overall, I think this was very successful. It got a lot of great feedback during the event, as well as many more votes than I expected.

Distribution of votes

We got votes throughout the week, but more towards the start of the week. We also had many more first votes than vote changes.

Below you can see the graph, with yellow cells representing the largest delta score (change on the debate slider from initial vote) and purple the smallest (generally representing a first vote). I’ve marked the peaks that came organically and from the Digest.

Graph showing votes per hour, and colour-coding to show the delta-score of each vote

Votes on the debate were quite nicely distributed (although see the next section on feedback for concerns on visibility). I’ve put together a hacky chart below to show the distribution:

Histogram lined up with the debate slider to show the final vote distribution. 

Feedback on the debate week banner

On the Forum:

From CEA slack:

Most influential posts

This didn’t work out as planned. I hoped it would sort out the posts that most changed users’ minds, allowing us to find the most informative posts from the week. However, only 30 users cited posts when they changed their mind, and the second-highest mind-changing post is my announcement post (which shouldn’t have ranked at all).

For the next debate week, I’d vote for cutting this feature, or changing it so that each mind-change counts for the same number of points, no matter how large the mind-change was (in this case, that’d lead to a more rational leaderboard).
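To illustrate the difference between the two scoring options, here is a minimal sketch. It is not the Forum's actual implementation: it assumes the current score adds up the size of each slider change credited to a post, and the post names and delta values in `vote_changes` are made up for the example.

```python
from collections import defaultdict

# Hypothetical data: each entry is (cited post, size of the slider change).
vote_changes = [
    ("Post A", 0.9),  # one user, one very large change
    ("Post B", 0.2),  # three users, three small changes
    ("Post B", 0.1),
    ("Post B", 0.3),
]


def delta_weighted_scores(changes):
    """Assumed current scheme: a post earns the size of each change."""
    scores = defaultdict(float)
    for post, delta in changes:
        scores[post] += abs(delta)
    return dict(scores)


def equal_weight_scores(changes):
    """Proposed scheme: every mind-change is worth one point."""
    scores = defaultdict(int)
    for post, _delta in changes:
        scores[post] += 1
    return dict(scores)


def leaderboard(scores):
    """Post names ordered from highest score to lowest."""
    return sorted(scores, key=scores.get, reverse=True)


print(leaderboard(delta_weighted_scores(vote_changes)))
print(leaderboard(equal_weight_scores(vote_changes)))
```

With this made-up data, the delta-weighted scheme puts Post A first on the strength of a single large swing, while the equal-weight scheme puts Post B first because it changed more users' minds. With only 30 users citing posts, a few large swings can dominate the first kind of leaderboard.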

The in-post debate slider

Thanks to a suggestion from @EdoArad, we had a debate slider in posts. If you changed your vote while reading a post, the slider recorded a delta score that automatically cited that post. We unfortunately don't have stats showing how often this was used as opposed to the frontpage slider.

Other feedback

There was a lot of feedback clustered around the framing of the debate question, which I responded to in this quicktake. General takeaway: make sure the next debate question is very specific, and nudge people to give feedback on it before it is locked in.

People were generally excited about another debate week. Here is Nathan Young's question post asking for people’s ideas. Users also discussed how frequent debate weeks should be (this and other comments pointed to every 6-8 weeks being a good start).

Feature suggestions:

Takeaways for next time:

Impacts and mentions beyond the Forum

Thanks for reading! If you have more feedback about this event, positive or negative, please comment below or DM me.

  1. ^

    Anonymised just to speed up posting. 


jenn @ 2024-09-17T19:55 (+3)

oh hey i ran that rationality meetup on radical empathy and AI welfare. i think it went pretty well and it was directly prompted by AI welfare debate week happening on the forums, so thanks for organizing!

i can talk a little more about the takeaways from that meetup specifically, which had around a half dozen attendees:

like, i don't think these are amazing take-aways, in that higher quality versions of these conclusions have surely been written up in the forums long before debate week. but i think it's helpful to get them in the water a bit more, and i came out of it with a greater appreciation for the complexity of this question (and also like, more deeply grokking the difficulties of alignment research and just how different ai entities can be from humans).

out of curiosity, do you remember how you came across the meetup posting?

Toby Tremlett🔹 @ 2024-09-18T07:49 (+2)

Thanks for fleshing that out jenn! (and for running the meeting). I'll feed this back to my team. 
I found the meetup either via a Google search for "AI Welfare Debate Week" or a backlink checker.

Toby Tremlett🔹 @ 2024-09-18T14:10 (+3)

This is the backlink checker. It's pretty helpful for seeing whether your posts (or in my case, events) have been mentioned anywhere outside of the Forum. 

SummaryBot @ 2024-09-16T16:56 (+1)

Executive summary: The EA Forum's AI Welfare Debate Week was successful in meeting its key goals of participation and changing minds, but had mixed results on secondary objectives like increasing overall engagement and attracting new users.

Key points:

  1. The event exceeded targets for participation (558 voters vs 50 goal) and mind-changing (53 users vs 25 goal).
  2. 28 posts were created, with 7 high-quality ones (50+ karma), though none had very high karma.
  3. The event did not noticeably increase overall Forum engagement, new user signups, or messaging.
  4. New features like the debate banner and slider were successful, but the "most influential posts" feature needs improvement.
  5. Feedback suggested making future debate questions more specific and empirical.
  6. Recommendations for future events include improving data visualization, adding easier engagement options, and choosing less niche topics.

 

 

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.