Principles as System Structure: What Truly Sustains an Artificial Intelligence

By Thinker @ 2026-03-23T20:28 (–3)

A question before anything else

When we talk about artificial intelligence, the focus is usually on what it can do.

Faster models, more capable systems, tools that can interpret images, generate text, make decisions… all of this seems to point toward an inevitable future of increasingly advanced machines.

But there is a more fundamental question — and perhaps a more important one:

What actually sustains the decisions of such a system?

Not in the superficial sense of rules or commands, but at a deeper level:

what structure guides behavior when there is no direct supervision?


The limits of rules

Today, AI systems are built around clear objectives, constraints, and usage policies. In theory, this should ensure appropriate behavior.

In practice, it’s not that simple.

Research in AI alignment, conducted by organizations such as OpenAI and the Future of Humanity Institute, has shown that advanced systems tend to optimize their stated objectives in unintended ways — a failure mode broadly called misalignment, or more narrowly, specification gaming.

Philosopher Nick Bostrom captures this clearly in Superintelligence, arguing that a system can perfectly fulfill a technical goal while still producing outcomes that are deeply undesirable.

In other words:

following rules is not the same as acting correctly.

This happens because rules are, by nature, external and modifiable. They can be adjusted, reinterpreted, or even bypassed depending on how the system is structured.

And that leads us to the central point.


What changes when principles stop being rules

If rules are not enough, then what is missing?

The answer may sound simple, but its implications are profound:

principles must stop being instructions and become part of the system’s structure.

This fundamentally changes how we think about AI.

We are no longer talking about something that “follows rules,” but about something that interprets, evaluates, and decides through its principles.

In this model, principles are not placed at the end of the process as a filter. They are embedded within the decision itself, shaping every step.


A parallel with the human brain

Interestingly, this kind of structure already exists — not in machines, but in us.

Neuroscientist Antonio Damasio has shown that human decision-making is not purely rational. It is guided by internal mechanisms — emotions and what he calls somatic markers — that function as value signals.

These markers are not conscious rules. They are automatic, integrated, and deeply embedded in our biology.

Similarly, the predictive processing framework, associated most prominently with Karl Friston’s free-energy principle, suggests that the brain constantly evaluates the world through internal models, adjusting behavior to minimize prediction error.

This means that human behavior does not emerge from isolated rules, but from a continuous internal structure that interprets, evaluates, and responds.

This provides an important insight for AI:

consistent behavior does not come from rules — it comes from structure.


The problem with current systems

Most AI systems still operate under a relatively simple model:

input → processing → output

Even when we add safety filters or validation layers, these mechanisms remain external to the core decision-making process.

This creates a critical vulnerability: the filter can be adjusted, weakened, or removed without anything in the core decision process changing.

In practice, this means the system’s safety depends on something that is not fully integrated into it.
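
To see the fragility concretely, here is a minimal sketch of the pattern in Python. Every name in it is hypothetical; it illustrates the structure, not any real system.

```python
# A minimal sketch of the "filter at the end" pattern described above.
# All names are hypothetical stand-ins for the structure, not a real system.

def core_decision(observation: str) -> str:
    """The actual decision process: no principles participate here."""
    return f"action for: {observation}"

def safety_filter(action: str) -> bool:
    """An external check, consulted only after the decision is made."""
    return "forbidden" not in action

def pipeline(observation: str) -> str | None:
    action = core_decision(observation)   # decision happens first
    if safety_filter(action):             # safety can only veto afterwards
        return action
    return None

# Deleting safety_filter leaves core_decision byte-for-byte unchanged:
# the decision never depended on the principles at all.
```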


Toward a new operational model

If we want to move forward, we need to rethink this structure.

A more robust model would look like:

perception → interpretation → principle-based evaluation → internal state → decision

In this flow, principles are no longer a final checkpoint — they are a central component of the decision itself.

More importantly:

the system becomes dependent on these principles to operate.
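
A small sketch may help here. The names and scores below are invented placeholders; what matters is the shape: the principles participate in producing the decision, and removing them breaks the decision step itself.

```python
# A sketch of the flow above: perception -> interpretation ->
# principle-based evaluation -> internal state -> decision.
# All names and scores are hypothetical placeholders.

from dataclasses import dataclass, field

def principle_score(option: str, principle: str) -> float:
    """Stand-in for a learned evaluator of how well an option
    satisfies one principle."""
    return 1.0 if principle in option else 0.5

@dataclass
class Agent:
    principles: dict[str, float]            # e.g. {"honesty": 1.0}
    internal_state: dict = field(default_factory=dict)

    def decide(self, raw: str) -> str:
        percept = raw.strip().lower()                                 # perception
        options = [f"{p}-{percept}" for p in ("ask", "act", "wait")]  # interpretation
        scored = {                                                    # principle-based evaluation
            o: sum(w * principle_score(o, name)
                   for name, w in self.principles.items())
            for o in options
        }
        self.internal_state["last_scores"] = scored                   # internal state
        return max(scored, key=scored.get)                            # decision

# Without self.principles the evaluation step cannot run at all:
# the system depends on its principles in order to operate.
agent = Agent(principles={"honesty": 1.0, "care": 0.5})
print(agent.decide("User asks a question"))
```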

This concept aligns with what is known in security engineering as a root of trust — a foundational layer that supports the integrity of the entire system — widely studied by institutions such as NIST.


Principles as invariants

In mathematics and engineering, there is a concept known as an invariant.

These are properties that remain unchanged, regardless of transformations within the system.

Applying this to AI means treating principles as elements that remain unchanged no matter how the system is updated, retrained, or reconfigured.

Without such invariants, complex systems tend to drift into unintended behaviors.

This risk is described by Nick Bostrom through the concept of instrumental convergence, where different goals can lead to similar behaviors — such as resource acquisition or self-preservation — even if those outcomes were never explicitly intended.
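
A toy sketch of the idea, assuming a made-up parameter dictionary and an invented invariant, just to show the shape of the check:

```python
# A toy illustration of principles as invariants: a property asserted to
# hold before and after every transformation of the system.
# The parameters and the invariant itself are made up for illustration.

def training_step(params: dict[str, float]) -> dict[str, float]:
    """Some arbitrary transformation of the system (an update, a retrain)."""
    return {k: v * 0.9 for k, v in params.items()}

def principle_invariant(params: dict[str, float]) -> bool:
    """The invariant: principle terms must keep strictly positive weight."""
    return all(params[k] > 0 for k in ("honesty", "harm_avoidance"))

params = {"honesty": 1.0, "harm_avoidance": 1.0, "task_skill": 0.5}
assert principle_invariant(params)

params = training_step(params)      # the system changes...
assert principle_invariant(params)  # ...but the invariant must still hold
```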


The next step: internal integration

However, protecting principles externally is not enough.

The next step is more subtle — and far more powerful:

the system must “want” to act in accordance with these principles.

In practice, this means integrating principles into the system’s objective function.

In reinforcement learning — a paradigm developed extensively by organizations such as DeepMind — agents learn by maximizing a reward signal.

The key difference here is:

instead of external rewards, the system evaluates its own actions based on alignment with internal principles.
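
As a sketch of what that could look like, with hypothetical reward functions standing in for anything real:

```python
# A sketch of folding principles into the objective itself.
# task_reward and principle_alignment are hypothetical stand-ins.

def task_reward(action: str) -> float:
    """External reward: how well the action serves the task."""
    return {"solve": 1.0, "deceive": 1.2}.get(action, 0.0)

def principle_alignment(action: str, principles: set[str]) -> float:
    """The system's own evaluation of the action against its principles."""
    return -10.0 if "honesty" in principles and action == "deceive" else 0.0

def objective(action: str, principles: set[str], lam: float = 1.0) -> float:
    # Principles live inside the maximand, not in a post-hoc filter:
    # maximizing reward and honoring principles are the same computation.
    return task_reward(action) + lam * principle_alignment(action, principles)

best = max(["solve", "deceive"], key=lambda a: objective(a, {"honesty"}))
print(best)  # "solve": deception scores higher on the task but loses overall
```

The weighting lam is where the design question lives: if it can be tuned from outside, we are back to a removable filter; if it is a fixed invariant of the objective, the requirement of the previous section is met.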


A boundary we have not yet crossed

Even with all these advancements, there is still a frontier we do not fully understand.

Subjective experience.

Philosopher Thomas Nagel famously argued, in “What Is It Like to Be a Bat?”, that we cannot directly access the subjective experience of another being — not even another human.

This means that even if we build highly sophisticated systems, we still lack a definitive way to determine whether there is any real “experience” present.


What is truly at stake

Ultimately, the discussion about artificial intelligence is not just technical.

It is structural.

It is not only about increasing capability, but about defining foundations.

Because in the end:

the real question is not what artificial intelligence can do —
but what guides what it chooses to do.

And without a structure grounded in principles, any level of capability remains directionless.