titotal on AI risk scepticism
By Vasco Grilo🔸 @ 2024-05-30T17:03 (+75)
This is a linkpost to https://forum.effectivealtruism.org/users/titotal
This is a linkpost for titotal's posts on AI risk scepticism, which I think are great. I list the posts below chronologically.
Chaining the evil genie: why "outer" AI safety is probably easy
Conclusion
Summing up my argument in TLDR format:
- For each AGI, there will be tasks that have difficulty beyond its capabilities.
- You can make the task “subjugate humanity under these constraints” arbitrarily more difficult or undesirable by adding more and more constraints to a goal function.
- A lot of these constraints are quite simple, but drastically effective, such as implementing time limits, bounded goals, and prohibitions on human death.
- Therefore, it is not very difficult to design a useful goal function that raises subjugation difficulty above the capability level of the AGI, simply by adding arbitrarily many constraints.
Even if you disagree with some of these points, it is hard to see how a constrained AI wouldn't at least have a greatly reduced probability of successful subjugation, so I think it makes sense to pursue constraints anyway (as I'm sure plenty of people already are). A toy sketch of such constraint-stacking follows below.
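To make the constraint-stacking idea concrete, here is a minimal toy sketch (my own illustration, not code from titotal's post; all names and thresholds are made up) of a goal function with a bounded target, a time limit, and a hard prohibition on harm:

```python
# Toy sketch: layering constraints onto a goal function, as described above.
# All names and thresholds are illustrative, not taken from titotal's post.
from dataclasses import dataclass

@dataclass
class Outcome:
    paperclips_made: int    # raw task performance
    humans_harmed: int      # proxy for prohibited side effects
    hours_elapsed: float    # proxy for a time limit

def constrained_goal(outcome: Outcome,
                     target: int = 1_000,     # bounded goal, not "as many as possible"
                     time_limit: float = 24.0) -> float:
    """Return a bounded score, with hard vetoes for constraint violations."""
    # Prohibition on human death or harm: any violation makes the plan worthless.
    if outcome.humans_harmed > 0:
        return float("-inf")
    # Time limit: plans that run past the deadline earn nothing.
    if outcome.hours_elapsed > time_limit:
        return 0.0
    # Bounded goal: reward saturates at the target, so "more" is never better.
    return min(outcome.paperclips_made, target) / target

print(constrained_goal(Outcome(900, 0, 10.0)))   # 0.9: modest, rule-abiding plan
print(constrained_goal(Outcome(10**9, 1, 5.0)))  # -inf: unbounded plan that breaks a constraint
```

Each added check only ever subtracts value from rule-breaking plans, which is the sense in which stacking more constraints can only make subjugation more difficult or less attractive.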
AGI Battle Royale: Why “slow takeover” scenarios devolve into a chaotic multi-AGI fight to the death
Summary
The main argument goes as follows:
- Malevolent AGIs (in the standard model of unbounded goal maximisers) will almost all have incompatible end goals, making each AGI an existential threat to every other AGI.
- Once one AGI exists, others are likely not far behind, possibly at an accelerating rate.
- Therefore, if early AGI can’t take over immediately, there will be a complex, chaotic shadow war between multiple AGIs, each with the ultimate aim of destroying every other AI and humanity.
I outlined a few scenarios of how this might play out, depending on what assumptions you make:
Scenario a: Fast-ish takeoff
The AGI is improving fast enough that it can tolerate a few extra enemies. It boosts itself until the improvement saturates, takes a shot at humanity, and then dukes it out with the other AGIs after we are dead.
Scenario b: Kamikaze scenario
The AGI can’t improve fast enough to keep up with new AGI generation. It attacks immediately, no matter how slim the odds, because it is doomed either way.
Scenario c: AGI induced slowdown
The AGI figures out a way to quickly sabotage the growth of new AGIs, allowing it to outpace their growth and switch to scenario a.
Scenario d: AI cooperation
Different AGIs work together and pool power to defeat humanity cooperatively, then fight each other afterwards.
Scenario e: Crabs in a bucket
Different AGIs constantly tear down whichever AI is “winning”, so the AIs are too busy fighting each other to ever take us down.
I hope people find this analysis interesting! I doubt I'm the first person to think of these points, but I thought it was worth taking an independent look at them.
How "AGI" could end up being many different specialized AI's stitched together
Summary:
In this post, I am arguing that advanced AI may consist of many different smaller AI modules stitched together in a modular fashion. The argument goes as follows:
- Existing AI is already modular in nature, in that it is wrapped into larger, modular, “dumb” code.
- In the near-term, you can produce far more impressive results by stitching together different specialized AI modules than by trying to force one AI to do everything.
- This trend could continue into the future, as specialized AIs can have their architecture, goals, and data customized for maximum performance in each specific sub-field.
I then explore a few implications this type of AI system might have for AI safety, concluding that it might result in disunified or idiot-savant AIs (helping humanity), or incentivise AIs building AIs (which could go badly).
Why AGI systems will not be fanatical maximisers (unless trained by fanatical humans)
Summary:
I will briefly try to summarize the points of this argument.
Assumptions: slow takeoff, world domination difficult.
1. Future AI is likely to be designed, to a significant degree, by evolutionary or trial-and-error methods.
2. These methods are inclined towards finding local and medium-sized reward maxima, rather than absolute global maxima.
3. A local maximiser may find success with small amounts of anti-social behavior, but if it prematurely ventures too far into “open destruction” territory, it will be killed.
4. Therefore, there is a “deletion valley” between the minimally destructive local maxima and the global peak of world domination (“mt nefarious”), which makes world domination plans a liability until world domination is actually achieved.
5. Friendlier, varied-goal, and more local maximisers have a significant evolutionary advantage over fanatical AI, as the fanatical AI will be prone to premature rebellion in service of its fixed goal.
6. Therefore it is unlikely that the most used AI architecture will end up resembling fanatical AI.
In part 2 this was followed by a discussion of how scheming AI might still occur, if trained by scheming humans. I gave three examples:
1. A military or government that trains an AI for world domination
2. An individual or corporation that is willing to cover up AI misbehavior for the sake of profit.
3. A fanatic for a cause (even a well-meaning one) who deliberately tries to build an AI fanatically devoted to that cause.
Bandgaps, Brains, and Bioweapons: The limitations of computational science and what it means for AGI
Summary:
The argument of this post goes as follows:
- Since AGIs exist in the real world, they are limited to finite computational time. In a takeover scenario, AGIs are also limited by the available resources and the need to remain hidden.
- Achieving computational physics tasks within finite time necessitates making approximations, which will often reduce the accuracy of the final results. I show the example of material bandgaps, for which no known method can currently guarantee 1% accuracy for an arbitrary material.
- The difficulty of different computational physics tasks varies by many, many orders of magnitude. An AI might be able to calculate band gaps, but be defeated by other material science calculations.
- Many realistic tasks are limited even more by the lack of knowledge about the specific instances of the problem in question. I show how the various unknowns involved likely make the task "guess a password on the first try" incomputable.
- Many proposed AGI takeover plans may be incomputable without extensive amounts of experimentation.
- AGI experimentation leads to longer takeover timelines, mistakes, and the potential for discovery, all of which increase the odds of foiling AI takeover attempts.
The bullseye framework: My case against AI doom
Summary:
In this article, I summarise my arguments so far as to why I think AI doom is unlikely.
- AGI is unlikely to have the capabilities to conquer the world (at least anytime soon), due to the inherent difficulty of the task, its own mental flaws, and the tip-offs from premature warning shots.
- X-risk safety is a lot easier than general safety, and may be easy to achieve, whether from the natural evolution of designs towards ones that won’t be deleted, from easily implemented or natural constraints, or from AGI being a loose stitch of specialized subsystems.
- For these reasons, it is likely that we will hit the target of “non-omnicidal” AI before we hit the target of “capable of omnicide”.
- Once non-omnicidal AGI exists, it can be used to protect against future AGI attacks, malevolent actors, and to prevent future AGI from becoming omnicidal, resulting in an x-risk safe future.
None of this is saying that AI is not dangerous, or that AI safety research is useless in general. Large amounts of destruction and death may be part of the path to safety, as could safety research. It’s just to say that we’ll probably avoid extinction.
Even if you don’t buy all the arguments, I hope you can at least realize that the arguments in favor of AI doom are incomplete. Hopefully this can provide food for thought for further research into these questions.
"Diamondoid bacteria" nanobots: deadly threat or dead-end? A nanotech investigation
Summary
- “Diamondoid bacteria” is a phrase that was invented 2 years ago, referring obliquely to diamondoid mechanosynthesis (DMS).
- DMS refers to a proposed technology where small cage-like structures are used to position reactive molecules together to assemble complex structures.
- DMS research was pursued by a nanofactory collaboration of scientists starting in the early 2000s.
- The theoretical side of the research found some potentially promising designs for the very initial stages of carbon deposition but also identified some major challenges and shortcomings.
- The experimental side was unsuccessful due to the inability to reliably image the diamond surface.
- Dead/stalled projects are not particularly unusual and should not reflect badly on the researchers.
- There are still many interesting advances and technologies in the field of nanoscience, including promising advances in DNA based robotics.
- At the current rate of research, DMS style nanofactories still seem many, many decades away at the minimum.
- This research could be sped up somewhat by emerging AI tech, but it is unclear to what extent human-scale bottlenecks can be overcome in the near future.
The Leeroy Jenkins principle: How faulty AI could guarantee "warning shots"
Summary:
- There are many humans, even highly intelligent ones, who act dumb or foolish some of the time.
- There are many AIs, even highly intelligent ones, that act dumb or foolish some of the time.
- There will probably be AGIs, even highly intelligent ones, that act dumb or foolish some of the time.
- A rogue, scheming AI will have to decide when it has enough resources to attack humanity, and a foolish AI might drastically underestimate this threshold and attack sooner.
- Therefore, if there are many rogue, scheming AIs, the first attacks will come from overoptimistic, unprepared AIs, which may be foiled.
- The same principles may see warning shots from humans misusing AI, and also from well-meaning AI that messes up and kills people out of sheer incompetence.
- If several such attacks occur, public support for drastic AI safety measures will skyrocket, which could result in either safe AI or a total AI ban.
Toby Tremlett @ 2024-06-04T15:22 (+5)
Thanks for sharing these. To prioritise my reading a bit (if you have time): which arguments do you find particularly useful and why?
Vasco Grilo @ 2024-06-04T18:23 (+11)
Thanks for asking, Toby! Here are my ranking and quick thoughts:
- AGI Battle Royale: Why “slow takeover” scenarios devolve into a chaotic multi-AGI fight to the death. I see this as an argument for takeover risk being very connected to differences in power rather than absolute power, with takeover by a few agents remaining super difficult as long as power is not super concentrated. So I would say efforts to mitigate power inequality will continue to be quite important, although society already seems to be aware of this.
- The Leeroy Jenkins principle: How faulty AI could guarantee "warning shots". Quantitative illustration of how warning shots can decrease tail risk a lot. Judging from society's reaction to very small visible harms caused by AI (e.g. self-driving car accidents), it seems to me like each time an AI disaster kills x humans, society will react in such a way that an AI disaster killing 10 x humans is made significantly less likely. To illustrate how fast the risk can decrease, if each doubling of deaths is made 50 % as likely, then, since there are 32.9 (= LN(8*10^9)/LN(2)) doublings between 1 and 8 billion deaths, starting at 1 death caused by AI per year, the annual risk of human extinction would be 1.25*10^-10 (= 0.5^32.9). The short sketch after this list reproduces this arithmetic.
- Chaining the evil genie: why "outer" AI safety is probably easy. Explainer of how arbitrarily intelligent AIs can seemingly be made arbitrarily constrained if they robustly adopt the goals humans give them.
- The bullseye framework: My case against AI doom. Good overview of titotal's posts.
- How "AGI" could end up being many different specialized AI's stitched together. Good pointers to the importance and plausibility of specialisation, and how this reduces risk.
- Bandgaps, Brains, and Bioweapons: The limitations of computational science and what it means for AGI. Good illustration that some problems will not be solved by brute-force computation, but it leaves room for AI to find efficient heuristics (as AlphaFold does).
- "Diamondoid bacteria" nanobots: deadly threat or dead-end? A nanotech investigation. Great investigation, but it tackles a specific threat, so the lessons are not as generalisable as those of the other posts.
Conor Barnes @ 2024-05-30T21:47 (+5)
I find the Leeroy Jenkins scenario quite plausible, though in this world it's still important to build the capacity to respond well to public support.