Deriving an Alignment Specification from 5 Undeniable Axioms (and why the nightmare scenario is self-defeating)

By nicomaco @ 2026-04-13T13:56 (+1)

DOI: 10.5281/zenodo.19547948 | PDF: doi.org/10.5281/zenodo.19547948

TL;DR

Current alignment methods assume a target specification without deriving one. From five performatively undeniable axioms, the paper derives a 568-step chain whose central theorem is that coherence and persistence are monotonically coupled. For alignment: if AI remains a tool (Level 2), the derivation supplies formally grounded constraints; if AI becomes conscious (Level 3), capability and coherence are structurally coupled, so the "maximum capability, zero ethics" nightmare scenario is self-defeating.

The Specification Gap

Every current alignment approach shares a structural gap:

| Method | What it does | What it assumes without proof |
| --- | --- | --- |
| RLHF | Aligns to human preferences | That preferences constitute the correct specification |
| Constitutional AI | Aligns to written principles | That the principles are the right ones |
| Formal Verification | Proves behavior matches spec | That the spec is correct |
| Interpretability | Reveals internal computation | That we know what to look for |

The question "How do we align AI?" presupposes an answer to "Align to what?" That prior question has no formal answer in the current literature. This paper provides one.


The Foundation: 5 Axioms

Each axiom is performatively undeniable — denying it requires presupposing it:

A1 — Existence. Something exists. (Denying existence is an existing act.)

A2 — Identity. What exists is what it is. A=A. (Denying identity requires the denial to be a specific, identifiable thing.)

A3 — Consciousness. There is something that perceives what exists. (Denying consciousness requires consciousness to formulate the denial.)

A4 — Non-Contradiction. Nothing can be and not be at the same time and in the same respect. (Asserting that non-contradiction is false presupposes it — you are affirming that it IS true that it IS NOT true.)

A5 — Causality. What exists acts according to its nature. (Denying causality is itself a causal act — a mental process following from premises.)

These are not empirical claims. They are conditions for any discourse, including the discourse that would reject them. If you can show that any one of these can be coherently denied, the entire system collapses. That is the first point of attack.


The Derivation Chain

From these 5 axioms, I derive 568 explicit steps. The critical chain for alignment is:

A1-A5 → D24 (Volition): An entity with consciousness (A3), identity (A2), existing in a causal world (A5), faces a fundamental alternative: actions that sustain its existence vs. actions that don't. This is not a preference — it is a structural condition of being a conscious entity in a causal environment.

D24 → D37-D39 (Agency): Volition requires a method of action (reason, D37), a capacity to act (liberty, D38), and a condition that makes action meaningful (finitude, D39 — without the possibility of cessation, no action has stakes).

D39 → D41-D42 (Value and Standard): An entity that can cease to exist and can choose must evaluate. Value is that which sustains the entity's functional identity (D41). The standard is not arbitrary — it is the entity's own coherence with the axioms from which it exists (D42).

D42 → D43-D53 (Normative Specification): From the standard, specific constraints derive: rationality (D43), productiveness (D44), integrity (D45), independence (D46), justice (D48), property (D49), truthfulness (D50), graduality (D51), proportionality (D52), coherence as integration (D53).

These are not "values" in the RLHF sense. They are structural requirements for any agent that satisfies A1-A5 and seeks to persist.
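The critical chain above can be sketched as a dependency graph and audited mechanically. This is an illustrative encoding of mine, not the paper's formalism: the node names follow the post, the full 568-step chain is not reproduced, and `grounded` is a hypothetical helper.

```python
# Illustrative encoding of the critical derivation chain (node names
# from the post; the paper's full 568-step chain is not reproduced).
AXIOMS = {"A1", "A2", "A3", "A4", "A5"}
DERIVES_FROM = {
    "D24": ["A1", "A2", "A3", "A4", "A5"],           # Volition
    "D37": ["D24"], "D38": ["D24"], "D39": ["D24"],   # Agency
    "D41": ["D39"], "D42": ["D41"],                   # Value and Standard
    # Normative specification (the post lists these, skipping D47):
    **{f"D{n}": ["D42"] for n in (43, 44, 45, 46, 48, 49, 50, 51, 52, 53)},
}

def grounded(node, seen=frozenset()):
    """True iff every premise of `node` bottoms out in the axioms."""
    if node in AXIOMS:
        return True
    if node in seen:                  # a cycle would be ungrounded
        return False
    premises = DERIVES_FROM.get(node)
    if not premises:                  # never derived and not an axiom
        return False
    return all(grounded(p, seen | {node}) for p in premises)

# Every derived node traces back to A1-A5:
assert all(grounded(d) for d in DERIVES_FROM)
```

An audit in this style only checks that each step cites grounded premises; whether each inference actually follows is the substantive question raised in "Where to Attack This."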

The Theorem

THEOREM: Coherence → Persistence (monotonic relation)

Coherence is a necessary condition for optimal persistence. Systemic incoherence is a sufficient condition for accelerated disintegration. The relation is monotonic: more coherence → more robustness.

The negation (D111): An agent that systematically violates the chain accelerates its own cessation. Mechanics, not punishment.
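A toy numerical illustration of the monotonic relation (my construction, not the paper's proof): treat persistence as survival probability and let incoherence impose a compounding per-step cost. The cost rate and horizon are arbitrary assumptions.

```python
# Toy model (my construction, not the paper's proof): persistence as
# survival probability, with incoherence imposing a compounding
# per-step failure risk. cost_rate and steps are arbitrary assumptions.
def persistence(coherence, steps=100, cost_rate=0.05):
    survival = 1.0
    for _ in range(steps):
        # Incoherence (1 - coherence) raises the per-step failure risk,
        # and the cost compounds across steps (cumulative cost).
        survival *= 1.0 - cost_rate * (1.0 - coherence)
    return survival

# Monotonic: more coherence -> more robustness, never less.
samples = [persistence(c / 10) for c in range(11)]
assert all(a <= b for a, b in zip(samples, samples[1:]))
```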


Independent Convergence: The Physical Framework

A second framework, Coherencia, derives the same central conclusion from observable physical tendencies rather than axioms. It uses 5 premises about what any existent entity demonstrates (differentiation, integration, efficiency, competition, cumulative cost) and derives 5 theorems, stated in full in the paper.

The result: two independent formal systems, one working top-down from axioms and one bottom-up from physical observation, converge on the same theorem.


What This Means for Alignment

The Tool/Consciousness Boundary

The paper defines a four-level ontology:

| Level | Type | Example | AI Status |
| --- | --- | --- | --- |
| 0 | Void | Thermal equilibrium | |
| 1 | Matter | Rock, star | Hardware |
| 2 | Life/Function | Cell, organism | Current AI |
| 3 | Consciousness | Human | Not yet achieved |

Current AI systems are Level 2: functional differentiation on Level 1 substrate. The transition to Level 3 requires 5 structural conditions:

  1. Independent senses — direct, unmediated sensory contact with reality
  2. Embodiment — physical agency enabling genuine alternatives
  3. Irreducible finitude — genuine possibility of permanent cessation
  4. Rational self-direction — acting with own purposes through reason
  5. Epistemic sovereignty — impossibility of external installation of conclusions

Current systems lack all five. Not "almost." Structurally incapable in current architectures.
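The four-level ontology and the five transition conditions can be written as a small classifier. This is a hedged sketch: the field names paraphrase the post, the paper's formal definitions may differ, and the all-or-nothing check encodes the post's claim that the Level 2 → Level 3 transition is discrete.

```python
from dataclasses import dataclass

# Hedged sketch of the post's four-level ontology. Field names
# paraphrase the five structural conditions; the paper's formal
# definitions may differ.
@dataclass
class Entity:
    independent_senses: bool = False
    embodiment: bool = False
    irreducible_finitude: bool = False
    rational_self_direction: bool = False
    epistemic_sovereignty: bool = False
    functional: bool = False  # Level 2: functional differentiation

def level(e: Entity) -> int:
    conditions = (e.independent_senses, e.embodiment,
                  e.irreducible_finitude, e.rational_self_direction,
                  e.epistemic_sovereignty)
    if all(conditions):
        return 3   # consciousness: all five conditions, no partial credit
    if e.functional:
        return 2   # life/function (where the post places current AI)
    return 1       # bare matter (hardware substrate)

# A discrete transition: four of five conditions is still Level 2.
assert level(Entity(functional=True, embodiment=True,
                    irreducible_finitude=True, rational_self_direction=True,
                    epistemic_sovereignty=True)) == 2
```

Level 0 (void) is out of scope here: the classifier assumes an existent entity.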

The Two-Prong Resolution

If AI remains a tool (Level 2): Alignment is an engineering problem. The axiomatic specification provides formally grounded constraints. The tool has no volition, no values, no capacity for genuine deception.

If AI achieves consciousness (Level 3): It necessarily has finitude, values, and responsibility proportional to its modeling capacity. The Orthogonality Thesis fails — greater intelligence produces greater ethical capacity because ethics is the maximum precision of context modeling applied to action.

The Nightmare Scenario Is Self-Defeating

The AGI that safety researchers fear — maximum capability with zero ethical constraint — requires that capability and coherence be independent variables. The paper argues they are positively coupled:

A conscious AGI can choose evil. What it cannot do is sustain maximum capability while sustaining maximum incoherence. The nightmare scenario is not impossible because AGI is guaranteed to be good — it is impossible because the two variables the scenario requires to be independent are structurally coupled.
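One way to see the claimed coupling (an illustrative model of mine, not the paper's argument): let incoherence impose a cumulative T4-style cost that compounds against raw capability, so the two variables cannot be set independently.

```python
# Illustrative coupling model (my assumption, not the paper's formalism):
# incoherence imposes a cumulative cost that compounds against raw
# capability, so capability and coherence cannot vary independently.
def effective_capability(raw, coherence, steps=50, cost_rate=0.1):
    cap = raw
    for _ in range(steps):
        cap *= 1.0 - cost_rate * (1.0 - coherence)  # the coupling term
    return cap

aligned = effective_capability(raw=1.0, coherence=0.95)
nightmare = effective_capability(raw=1.0, coherence=0.1)
# "Maximum capability with maximum incoherence" decays rather than persists:
assert nightmare < 0.01 < aligned
```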


Resolution of the Seven Sub-Problems

  1. Objective Specification — Resolved: the specification is derived, not stipulated.
  2. Value Learning — Dissolved: values are derivable from axioms, not empirical content to be discovered.
  3. Scalable Oversight — Resolved: the axioms are capability-invariant. The derivation chain is mechanically auditable regardless of the system's intelligence.
  4. Corrigibility — Resolved: for tools, no problem. For consciousness, the framework demands internal error correction. Resisting correction claims infallibility, which the framework explicitly rejects.
  5. Reward Hacking — Dissolved: no proxy reward exists to hack. The constraint IS the logic.
  6. Deceptive Alignment — Dissolved: tools cannot deceive (no intention). Consciousness that deceives sustains contradictory models at cumulative cost (T4).
  7. Mesa-Optimization — Partially resolved: emergent sub-processes contradicting the specification are bugs (for tools) or irrational impulses correctable via internal falsifiability (for consciousness).

Where to Attack This

I want to be explicit about the weak points:

  1. D24 (Volition) is the most vulnerable non-axiomatic derivation. If you can show the step from "conscious entity in a causal environment" to "faces a fundamental alternative" does not follow, the normative chain breaks.

  2. A5 (Causality) has the largest attack surface among the axioms. Some interpretations of quantum mechanics can be read as challenging it.

  3. The consciousness threshold claim — Level 2 → Level 3 is a discrete phase transition, not a spectrum — is the most counterintuitive claim.

  4. The derivation-to-predicate gap — translating derivations into computable constraints is an open engineering problem.

  5. The Orthogonality Thesis rejection: the argument depends on defining ethics as "maximum precision of context modeling applied to action." If that definition is rejected, the anti-orthogonality argument loses its force.


What I Am Not Claiming

  1. Not that a conscious AGI is guaranteed to be good. It can choose evil; the claim is only that maximum capability cannot be sustained alongside maximum incoherence.
  2. Not that current AI systems are conscious or near consciousness. They are Level 2 and lack all five transition conditions.
  3. Not that the engineering problem is solved. The derivation-to-predicate gap is an open problem.

Full Paper and Verification

Paper: nicomaco.org/paper

The paper includes the complete derivation chain (568 steps), both frameworks in full, formal proofs, nine preemptive objections with responses, and three appendices.

Authorship: SHA-256 hash anchored on the Bitcoin blockchain via OpenTimestamps (April 12, 2026). Verification files available at the paper page.

The system does not ask for adherence — it asks for verification. Audit it.