What do you mean by ‘alignment is solvable in principle’?
By Remmelt @ 2025-01-17T15:03 (+10)
Remmelt @ 2025-01-17T16:40 (+3)
Here's how I specify terms in the claim:
- AGI is a set of artificial components, connected physically and/or by information signals over time, that in aggregate sense and act autonomously over many domains.
- 'artificial' as configured out of a (hard) substrate that can be standardised to process inputs into outputs consistently (in contrast to what our organic parts can do).
- 'autonomously' as continuing to operate without needing humans (or any other species that shares a common ancestor with humans).
- Alignment is, at a minimum, the control of the AGI's components (as modified over time) so that they do not, with probability above some guaranteeable high floor, propagate effects that cause the extinction of humans.
- Control is the implementation of one or more feedback loops through which the AGI's effects are detected, modelled, simulated, compared to a reference, and corrected (see the sketch after this list).
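To make the 'control' definition concrete, here is a minimal sketch of one such feedback loop in Python. Everything in it (the `Effect` type, the thresholds, the function names) is a hypothetical illustration of the detect → model → simulate → compare → correct structure, not a proposed alignment mechanism.

```python
# Minimal sketch of the feedback loop named in the 'control' definition above.
# All names, units, and thresholds are hypothetical placeholders for illustration.

from dataclasses import dataclass


@dataclass
class Effect:
    """An observed effect propagated by the AGI's components."""
    domain: str
    magnitude: float  # modelled impact, in arbitrary units


def detect(sensor_readings: list[dict]) -> list[Effect]:
    """Detect: turn raw sensor readings into candidate effects."""
    return [Effect(r["domain"], r["value"]) for r in sensor_readings]


def model_and_simulate(effects: list[Effect], horizon: int) -> list[Effect]:
    """Model + simulate: project each effect forward over a time horizon.

    Assumes, purely for illustration, that magnitudes scale linearly with the horizon.
    """
    return [Effect(e.domain, e.magnitude * horizon) for e in effects]


def compare(projected: list[Effect], reference_ceiling: float) -> list[Effect]:
    """Compare: return the projected effects that exceed the reference ceiling."""
    return [e for e in projected if e.magnitude > reference_ceiling]


def correct(violations: list[Effect]) -> None:
    """Correct: apply some corrective action for each out-of-reference effect."""
    for v in violations:
        print(f"correcting effect in domain {v.domain!r} (projected {v.magnitude})")


def control_loop(sensor_readings: list[dict], horizon: int = 10,
                 reference_ceiling: float = 1.0) -> None:
    """One pass of the detect -> model/simulate -> compare -> correct loop."""
    effects = detect(sensor_readings)
    projected = model_and_simulate(effects, horizon)
    violations = compare(projected, reference_ceiling)
    correct(violations)


if __name__ == "__main__":
    control_loop([{"domain": "power grid", "value": 0.3},
                  {"domain": "biosphere", "value": 0.02}])
```

The point is only the loop structure: each stage feeds the next, and the correction step closes the loop by acting back on the components whose effects were detected.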