Some of My Current Impressions Entering AI Safety

By Phib @ 2023-03-28T05:18 (+5)

Hello,


I have been engaged with EA for about 4 years, first at university and then in ops. I am now trying to contribute to AGI alignment non-technically, and I am learning about it so I can be the best support possible.

I am in that phase of emotionally confronting the seemingly likely drastic changes of the next few decades (should I even save for retirement?), so please excuse the existential crisis peeking out from behind this post.


Quick Sanity Check:

AI is powerful (AI is >human in narrow applications)

AI is becoming more powerful generally, exponentially (this may not continue)

AI will likely become more powerful than humans.

This is potentially disastrous to humans.


Current Considerations:

I'm kinda hedging my future here on 'this may not continue [at the current rate; maybe it's actually pretty hard to get the G in AGI]' and on current alignment plans (strong-arming by big companies, eventually strong-arming by global powers) working out. Or maybe the superintelligent AGI is more chill than we expect.

I'm unsure what I am doing with this post. I think I want to comment on my own anxieties, thoughts, and aspirations (trying to think with a growth mindset here, c'mon). I also think, with a strong personal bias from my own perspective, that more and more of the EA space is converging toward AI Safety as AI converges toward AGI, and this makes sense (go figure).

Couple of things I'm considering here:

  1. Aligned AGI could be the most incredible tool for human wellbeing! Heck yeah, a superintelligence that eliminates suffering but, like, in a really cool aligned way (seems like the most defining feature of "the long reflection" isn't a lack of existential risk, but rather this superintelligence assisting us).
  2. Aligned AGI seems like a really good solution to existential risks. I have an image in my head of a hand reaching from above to pluck the toy warhead out of the infant's rapidly descending arm.

I can appreciate why someone would want to accelerate AGI, considering the upsides of it working out really well (to my mind near infinite, in line with its capabilities). It seems like it'll be really competent at working out the 'best way of providing best results', rendering quite a few of our hundred-year(+) plans obsolete. In fact, in an ideal (aligned and good) situation, we might be accelerating capabilities as the best means of solving quite a few societal problems, if not all of them (sorry to make light, I'm coping, but x-risks seem to be competing these days).

I have quite a bit of uncertainty about all of the above and this was written in a couple of hours to be posted, but genuinely this is affecting (sometimes terrifying) me quite a bit (I was like 50% serious about the saving for retirement thing, considering both positive and negative outcomes). I have further views (of course) on the field but nothing I really want to share right now.


More on me:

I am thinking about how to best contribute to this space as a non-technical person, with concrete paths available in my current role to perform optimal ops, write/communicate about these ideas, and just channel more resources and talent at the problem (and be smart about it).

I think my exponential model is informed by 3 points in time: TalktoTransformer in a college class (oh neat), ChatGPT (woah), and GPT-4. Oh, and also Metaculus/peeps, and a belief that LLMs (specifically, language as a key to general intelligence) are outperforming expectations in learning and capabilities.

(BTW I was notorious for throwing wooden blocks as an infant).

Thanks for reading and potentially steelmanning.