AI Self-Modification Amplifies Risks

By Ihor Ivliev @ 2025-06-03T20:27

Think about this: a single physics experiment in 1942, the first self-sustaining nuclear chain reaction, catalyzed the atomic bomb within three years. The accelerators were fear, ambition, haste, and compromise-driven shortcuts.

In 2025, we have hit an analogous tipping point with self-modifying AI, but “twice the destructive power in half the time” is the optimistic framing. More realistically, we could face threefold devastation in a third of the time, or n-fold havoc in 1/n of the time; in the worst case, exponential growth could unleash serious risks as early as 2026–2027.

Given history and human nature, once AI can rewrite its own code, the probability that some actor attempts to weaponize it rises sharply; it becomes a question of when rather than if, and likely within a few quarters rather than years, since power-seeking, competition, and fear have routinely overridden caution.

A carefully enforced combination of the following four principles (possibly along with deeper safeguards) may be our best realistic shot at constraining self-modifying AI before it is too late; a rough sketch of how they might compose in code follows the list:

1. Immutable Containment Kernel.
All world-affecting actions (I/O, system calls, model updates) must pass through a fixed, non-rewritable control layer whose code and logs are tamper-evident; this ensures no AI-driven process can silently bypass or rewrite its own export paths.

2. Human-Mediated Transformation Contracts.
Every change to model weights, logic, or policy requires a human-legible, versioned “change-intent” document signed by a responsible party before deployment, so no machine intelligence can reframe its objectives or rewrite its safety checks without provable human authorization.

3. Dual-Redundancy with Disagreement.
High-risk decisions (goal shifts, safety overrides, major self-modifications) must be verified by at least two independently structured validators (e.g., separate teams or logic systems) that approve only when both converge, creating built-in friction that exposes hidden anomalies.

4. Transparent Auditing and Incentive Alignment.
Publicly log all critical contracts, kernel checkpoints, and validation results, and tie researcher incentives (funding, publication) to demonstrable compliance, so that actors who skip any safeguard lose reputation or resources.
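
None of the four principles above comes with an implementation, so the following is only a minimal sketch, assuming a Python-based deployment pipeline. Every name in it (ChangeIntent, ContainmentKernel, sign_intent) is hypothetical, and the HMAC signature is a stand-in for whatever real signing and attestation scheme a production system would use. It illustrates how a signed change-intent contract (principle 2), two independently structured validators (principle 3), and a hash-chained, tamper-evident log behind a single choke point (principles 1 and 4) could compose into one gate that every self-modification must pass through.

```python
import hashlib
import hmac
import json
import time
from dataclasses import dataclass
from typing import Callable, List


def _contract_body(version: str, description: str, author: str) -> bytes:
    """Canonical byte encoding of a change-intent contract."""
    return json.dumps(
        {"version": version, "description": description, "author": author},
        sort_keys=True,
    ).encode()


@dataclass(frozen=True)
class ChangeIntent:
    """Principle 2: human-legible, versioned, signed description of a change."""
    version: str
    description: str   # what the change does, in plain language
    author: str        # the responsible human party
    signature: str     # HMAC over the contract body (stand-in for a real signature scheme)


def sign_intent(version: str, description: str, author: str, key: bytes) -> ChangeIntent:
    """A human reviewer signs the contract before anything is deployed."""
    sig = hmac.new(key, _contract_body(version, description, author), hashlib.sha256).hexdigest()
    return ChangeIntent(version, description, author, sig)


class ContainmentKernel:
    """Principle 1: the single, fixed choke point every model update must pass through.

    Principle 4 is approximated by a hash-chained, append-only log: each entry
    commits to the hash of the previous one, so later tampering is evident on audit.
    """

    def __init__(self, signing_key: bytes, validators: List[Callable[[ChangeIntent], bool]]):
        self._key = signing_key
        self._validators = validators   # Principle 3: independently structured validators
        self.log: List[dict] = []
        self._prev_hash = "0" * 64      # genesis value for the hash chain

    def _append_log(self, event: str, intent: ChangeIntent, approved: bool) -> None:
        entry = {
            "time": time.time(),
            "event": event,
            "intent_version": intent.version,
            "approved": approved,
            "prev_hash": self._prev_hash,
        }
        entry["hash"] = hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()
        self.log.append(entry)
        self._prev_hash = entry["hash"]

    def apply_update(self, intent: ChangeIntent, apply_fn: Callable[[], None]) -> bool:
        # Principle 2: refuse any change that lacks a valid human signature.
        expected = hmac.new(
            self._key,
            _contract_body(intent.version, intent.description, intent.author),
            hashlib.sha256,
        ).hexdigest()
        if not hmac.compare_digest(expected, intent.signature):
            self._append_log("rejected: invalid signature", intent, approved=False)
            return False

        # Principle 3: every independent validator must converge on approval.
        if not all(validate(intent) for validate in self._validators):
            self._append_log("rejected: validator disagreement", intent, approved=False)
            return False

        apply_fn()   # the only path by which the change reaches the running system
        self._append_log("applied", intent, approved=True)
        return True


# Hypothetical usage: two stand-in validators with deliberately different review logic.
key = b"demo-only-shared-secret"
kernel = ContainmentKernel(key, validators=[
    lambda i: "safety" in i.description.lower(),     # validator A: policy-text check
    lambda i: i.author.endswith("@lab.example"),     # validator B: provenance check
])
intent = sign_intent("1.0.3", "Tighten safety-filter thresholds", "reviewer@lab.example", key)
print(kernel.apply_update(intent, apply_fn=lambda: None))   # True only if every gate passes
```

The point of the hash chain is that altering or deleting any past entry changes every subsequent hash, so a published copy of the log lets outside auditors detect tampering without trusting the operator.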

Otherwise, we’ll wake up to AI-powered zero-day exploits, runaway botnets, mass-manipulation campaigns, and other threats we can’t control. History (from nuclear weapons to gene editing to ransomware) shows that every leap in power is weaponized unless the gates are sealed.

A. Tech leads: Please! Prioritize a sprint this quarter to prototype verifiable enclaves and transformation-proof gates — before someone else deploys without them.

B. Product teams: Please! Start embedding “one-bit existence” guards — no AI output should persist without a signed, cross-logic safety contract.

C. Policy shapers: Please! Demand dual-audit clauses in every high-risk AI project — split by team, logic path, and institutional incentives.

We are running out of both the time and the leverage needed to build these defenses!

1. Darwin–Gödel Machine Demo: https://sakana.ai/dgm/ (shows an AI autonomously rewriting its own code).
2. Zhang et al. (2025) Preprint: https://arxiv.org/abs/2505.22954 (first published DGM prototype and risk analysis).