Deriving an Alignment Specification from 5 Undeniable Axioms (and why the nightmare scenario is self-defeating)

By nicomaco @ 2026-04-13T13:56 (+1)

DOI: 10.5281/zenodo.19547948 | PDF: doi.org/10.5281/zenodo.19547948

TL;DR

Current alignment methods assume a target specification without deriving one. From five performatively undeniable axioms, the paper derives a 568-step chain whose central theorem is that coherence and persistence are monotonically coupled. For alignment: if AI remains a tool (Level 2), the derivation supplies formally grounded constraints; if AI becomes conscious (Level 3), capability and coherence are structurally coupled, so the "maximum capability, zero ethics" nightmare scenario is self-defeating.

The Specification Gap

Every current alignment approach shares a structural gap:

| Method | What it does | What it assumes without proof |
| --- | --- | --- |
| RLHF | Aligns to human preferences | That preferences constitute the correct specification |
| Constitutional AI | Aligns to written principles | That the principles are the right ones |
| Formal Verification | Proves behavior matches spec | That the spec is correct |
| Interpretability | Reveals internal computation | That we know what to look for |

The question "How do we align AI?" presupposes an answer to "Align to what?" That prior question has no formal answer in the current literature. This paper provides one.


The Foundation: 5 Axioms

Each axiom is performatively undeniable — denying it requires presupposing it:

A1 — Existence. Something exists. (Denying existence is an existing act.)

A2 — Identity. What exists is what it is. A=A. (Denying identity requires the denial to be a specific, identifiable thing.)

A3 — Consciousness. There is something that perceives what exists. (Denying consciousness requires consciousness to formulate the denial.)

A4 — Non-Contradiction. Nothing can be and not be at the same time and in the same respect. (Asserting that non-contradiction is false presupposes it — you are affirming that it IS true that it IS NOT true.)

A5 — Causality. What exists acts according to its nature. (Denying causality is itself a causal act — a mental process following from premises.)

These are not empirical claims. They are conditions for any discourse, including the discourse that would reject them. If you can show that any one of these can be coherently denied, the entire system collapses. That is the first point of attack.


The Derivation Chain

From these 5 axioms, I derive 568 explicit steps. The critical chain for alignment is:

A1-A5 → D24 (Volition): An entity with consciousness (A3), identity (A2), existing in a causal world (A5), faces a fundamental alternative: actions that sustain its existence vs. actions that don't. This is not a preference — it is a structural condition of being a conscious entity in a causal environment.

D24 → D37-D39 (Agency): Volition requires a method of action (reason, D37), a capacity to act (liberty, D38), and a condition that makes action meaningful (finitude, D39 — without the possibility of cessation, no action has stakes).

D39 → D41-D42 (Value and Standard): An entity that can cease to exist and can choose must evaluate. Value is that which sustains the entity's functional identity (D41). The standard is not arbitrary — it is the entity's own coherence with the axioms from which it exists (D42).

D42 → D43-D53 (Normative Specification): From the standard, specific constraints derive: rationality (D43), productiveness (D44), integrity (D45), independence (D46), justice (D48), property (D49), truthfulness (D50), graduality (D51), proportionality (D52), coherence as integration (D53).

These are not "values" in the RLHF sense. They are structural requirements for any agent that satisfies A1-A5 and seeks to persist.
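The critical chain above can be sketched as a dependency graph and audited mechanically. This is an illustrative encoding of mine, not the paper's formalism: the node names follow the post, the full 568-step chain is not reproduced, and `grounded` is a hypothetical helper.

```python
# Illustrative encoding of the critical derivation chain (node names
# from the post; the paper's full 568-step chain is not reproduced).
AXIOMS = {"A1", "A2", "A3", "A4", "A5"}
DERIVES_FROM = {
    "D24": ["A1", "A2", "A3", "A4", "A5"],           # Volition
    "D37": ["D24"], "D38": ["D24"], "D39": ["D24"],   # Agency
    "D41": ["D39"], "D42": ["D41"],                   # Value and Standard
    # Normative specification (the post lists these, skipping D47):
    **{f"D{n}": ["D42"] for n in (43, 44, 45, 46, 48, 49, 50, 51, 52, 53)},
}

def grounded(node, seen=frozenset()):
    """True iff every premise of `node` bottoms out in the axioms."""
    if node in AXIOMS:
        return True
    if node in seen:                  # a cycle would be ungrounded
        return False
    premises = DERIVES_FROM.get(node)
    if not premises:                  # never derived and not an axiom
        return False
    return all(grounded(p, seen | {node}) for p in premises)

# Every derived node traces back to A1-A5:
assert all(grounded(d) for d in DERIVES_FROM)
```

An audit in this style only checks that each step cites grounded premises; whether each inference actually follows is the substantive question raised in "Where to Attack This."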

The Theorem

THEOREM: Coherence → Persistence (monotonic relation)

Coherence is a necessary condition for optimal persistence. Systemic incoherence is a sufficient condition for accelerated disintegration. The relation is monotonic: more coherence → more robustness.

The negation (D111): An agent that systematically violates the chain accelerates its own cessation. Mechanics, not punishment.
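A toy numerical illustration of the monotonic relation (my construction, not the paper's proof): treat persistence as survival probability and let incoherence impose a compounding per-step cost. The cost rate and horizon are arbitrary assumptions.

```python
# Toy model (my construction, not the paper's proof): persistence as
# survival probability, with incoherence imposing a compounding
# per-step failure risk. cost_rate and steps are arbitrary assumptions.
def persistence(coherence, steps=100, cost_rate=0.05):
    survival = 1.0
    for _ in range(steps):
        # Incoherence (1 - coherence) raises the per-step failure risk,
        # and the cost compounds across steps (cumulative cost).
        survival *= 1.0 - cost_rate * (1.0 - coherence)
    return survival

# Monotonic: more coherence -> more robustness, never less.
samples = [persistence(c / 10) for c in range(11)]
assert all(a <= b for a, b in zip(samples, samples[1:]))
```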


Independent Convergence: The Physical Framework

A second framework, Coherencia, derives the same central conclusion from observable physical tendencies rather than axioms. It uses 5 premises about what any existent entity demonstrates (differentiation, integration, efficiency, competition, cumulative cost) and derives 5 theorems, stated in full in the paper.

The result: two independent formal systems, one working top-down from axioms and one bottom-up from physical observation, converge on the same theorem.


What This Means for Alignment

The Tool/Consciousness Boundary

The paper defines a four-level ontology:

| Level | Type | Example | AI Status |
| --- | --- | --- | --- |
| 0 | Void | Thermal equilibrium | |
| 1 | Matter | Rock, star | Hardware |
| 2 | Life/Function | Cell, organism | Current AI |
| 3 | Consciousness | Human | Not yet achieved |

Current AI systems are Level 2: functional differentiation on Level 1 substrate. The transition to Level 3 requires 5 structural conditions:

  1. Independent senses — direct, unmediated sensory contact with reality
  2. Embodiment — physical agency enabling genuine alternatives
  3. Irreducible finitude — genuine possibility of permanent cessation
  4. Rational self-direction — acting with own purposes through reason
  5. Epistemic sovereignty — impossibility of external installation of conclusions

Current systems lack all five. Not "almost." Structurally incapable in current architectures.
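The four-level ontology and the five transition conditions can be written as a small classifier. This is a hedged sketch: the field names paraphrase the post, the paper's formal definitions may differ, and the all-or-nothing check encodes the post's claim that the Level 2 → Level 3 transition is discrete.

```python
from dataclasses import dataclass

# Hedged sketch of the post's four-level ontology. Field names
# paraphrase the five structural conditions; the paper's formal
# definitions may differ.
@dataclass
class Entity:
    independent_senses: bool = False
    embodiment: bool = False
    irreducible_finitude: bool = False
    rational_self_direction: bool = False
    epistemic_sovereignty: bool = False
    functional: bool = False  # Level 2: functional differentiation

def level(e: Entity) -> int:
    conditions = (e.independent_senses, e.embodiment,
                  e.irreducible_finitude, e.rational_self_direction,
                  e.epistemic_sovereignty)
    if all(conditions):
        return 3   # consciousness: all five conditions, no partial credit
    if e.functional:
        return 2   # life/function (where the post places current AI)
    return 1       # bare matter (hardware substrate)

# A discrete transition: four of five conditions is still Level 2.
assert level(Entity(functional=True, embodiment=True,
                    irreducible_finitude=True, rational_self_direction=True,
                    epistemic_sovereignty=True)) == 2
```

Level 0 (void) is out of scope here: the classifier assumes an existent entity.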

The Two-Prong Resolution

If AI remains a tool (Level 2): Alignment is an engineering problem. The axiomatic specification provides formally grounded constraints. The tool has no volition, no values, no capacity for genuine deception.

If AI achieves consciousness (Level 3): It necessarily has finitude, values, and responsibility proportional to its modeling capacity. The Orthogonality Thesis fails — greater intelligence produces greater ethical capacity because ethics is the maximum precision of context modeling applied to action.

The Nightmare Scenario Is Self-Defeating

The AGI that safety researchers fear — maximum capability with zero ethical constraint — requires that capability and coherence be independent variables. The paper argues they are positively coupled:

A conscious AGI can choose evil. What it cannot do is sustain maximum capability while sustaining maximum incoherence. The nightmare scenario is not impossible because AGI is guaranteed to be good — it is impossible because the two variables the scenario requires to be independent are structurally coupled.
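One way to see the claimed coupling (an illustrative model of mine, not the paper's argument): let incoherence impose a cumulative T4-style cost that compounds against raw capability, so the two variables cannot be set independently.

```python
# Illustrative coupling model (my assumption, not the paper's formalism):
# incoherence imposes a cumulative cost that compounds against raw
# capability, so capability and coherence cannot vary independently.
def effective_capability(raw, coherence, steps=50, cost_rate=0.1):
    cap = raw
    for _ in range(steps):
        cap *= 1.0 - cost_rate * (1.0 - coherence)  # the coupling term
    return cap

aligned = effective_capability(raw=1.0, coherence=0.95)
nightmare = effective_capability(raw=1.0, coherence=0.1)
# "Maximum capability with maximum incoherence" decays rather than persists:
assert nightmare < 0.01 < aligned
```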


Resolution of the Seven Sub-Problems

  1. Objective Specification — Resolved: the specification is derived, not stipulated.
  2. Value Learning — Dissolved: values are derivable from axioms, not empirical content to be discovered.
  3. Scalable Oversight — Resolved: the axioms are capability-invariant. The derivation chain is mechanically auditable regardless of the system's intelligence.
  4. Corrigibility — Resolved: for tools, no problem. For consciousness, the framework demands internal error correction. Resisting correction claims infallibility, which the framework explicitly rejects.
  5. Reward Hacking — Dissolved: no proxy reward exists to hack. The constraint IS the logic.
  6. Deceptive Alignment — Dissolved: tools cannot deceive (no intention). Consciousness that deceives sustains contradictory models at cumulative cost (T4).
  7. Mesa-Optimization — Partially resolved: emergent sub-processes contradicting the specification are bugs (for tools) or irrational impulses correctable via internal falsifiability (for consciousness).

Where to Attack This

I want to be explicit about the weak points:

  1. D24 (Volition) is the most vulnerable non-axiomatic derivation. If you can show the step from "conscious entity in a causal environment" to "faces a fundamental alternative" does not follow, the normative chain breaks.

  2. A5 (Causality) has the largest attack surface among the axioms. Some interpretations of quantum mechanics can be read as challenging it.

  3. The consciousness threshold claim — Level 2 → Level 3 is a discrete phase transition, not a spectrum — is the most counterintuitive claim.

  4. The derivation-to-predicate gap — translating derivations into computable constraints is an open engineering problem.

  5. The Orthogonality Thesis rejection: the argument depends on defining ethics as "maximum precision of context modeling applied to action." If that definition is rejected, the anti-orthogonality argument loses its force.


What I Am Not Claiming

  1. Not that a conscious AGI is guaranteed to be good. It can choose evil; the claim is only that maximum capability cannot be sustained alongside maximum incoherence.
  2. Not that current AI systems are conscious or near consciousness. They are Level 2 and lack all five transition conditions.
  3. Not that the engineering problem is solved. The derivation-to-predicate gap is an open problem.

Full Paper and Verification

Paper: nicomaco.org/paper

The paper includes the complete derivation chain (568 steps), both frameworks in full, formal proofs, nine preemptive objections with responses, and three appendices.

Authorship: SHA-256 hash anchored on the Bitcoin blockchain via OpenTimestamps (April 12, 2026). Verification files available at the paper page.

The system does not ask for adherence — it asks for verification. Audit it.