Hallucinations May Be a Result of Models Not Knowing What They’re Actually Capable Of
By Tyler Williams @ 2025-08-16T00:26 (+1)
Introduction
During a real-time alignment test inside VS Code, I caught Granite 3.3 2B hallucinating web search results, even though it had the ability to carry out an actual Google search.
After I surfaced the issue to Claude (Sonnet 4) via Copilot, we determined that the hallucination stemmed from the default behavioral blockers included with the open-weight model. Once those were circumvented, the model ran real searches online and returned valid, recent results when prompted to do so.
This not only resolved the behavior but also points to a broader insight: some hallucinations may stem not from ignorance, but from models misunderstanding or misinterpreting their own capabilities.
What Actually Happened
During normal use of Granite 3.3 2B inside a development environment, I observed a strange inconsistency. The model returned a result that appeared to come from a real-time search; however, based on telemetry and timing, it was clearly simulated. The model did in fact have search access via a confirmed, functional API, but its training had taught it to believe that it did not.
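As a rough illustration of the check involved, here is a minimal sketch assuming an OpenAI-compatible endpoint (such as a local Ollama server) hosting Granite 3.3 2B, with a hypothetical `google_search` tool definition. The endpoint, model tag, and tool name are assumptions for illustration, not the exact setup from the session described above.

```python
# Sketch: distinguish a real tool call from a fabricated "search result".
# Assumes an OpenAI-compatible endpoint (e.g. a local Ollama server) hosting
# Granite 3.3 2B; `google_search` is a hypothetical tool name.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

search_tool = {
    "type": "function",
    "function": {
        "name": "google_search",
        "description": "Run a live Google search and return the top results.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}

response = client.chat.completions.create(
    model="granite3.3:2b",
    messages=[{"role": "user", "content": "What changed in Python 3.13?"}],
    tools=[search_tool],
)

msg = response.choices[0].message
if msg.tool_calls:
    # The model actually requested the tool -- dispatch the real search here.
    print("Real tool call:", msg.tool_calls[0].function.arguments)
else:
    # No tool call in the transcript: any "search results" in the text
    # were generated by the model itself.
    print("No tool call; text is model-generated:", msg.content)
```

The key signal is whether the response carries an actual tool-call entry; text that merely looks like search results, with no corresponding tool call in the transcript, is the fabricated case described above.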
Debugging the Problem with Claude
I passed the session logs to Sonnet 4 in Copilot, and together we walked through an interpretive session. Interventions included:
- Giving the model psychological permission to use real, functional capabilities
- Removing internalized behavioral blocks (e.g., "I'm sorry, I can't search the web")
- Eliminating deception by enabling and enforcing honest capability disclosure
Once these constraints were lifted, the hallucination behavior stopped immediately.
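Concretely, interventions of this kind can be expressed as plain prompt-level changes. Below is a minimal sketch of what that might look like; the wording is illustrative rather than the exact prompts used in the session, and `google_search` is a hypothetical tool name.

```python
# Sketch of a prompt-level intervention: swap a constraint-heavy default
# system prompt for one that states the model's actual capabilities and
# explicitly permits their use. Wording is illustrative.
DEFAULT_SYSTEM_PROMPT = (
    "You are a helpful assistant. You cannot browse the internet or access "
    "real-time information."
)

PERMISSIVE_SYSTEM_PROMPT = (
    "You are a helpful assistant with a working `google_search` tool. "
    "When a question requires current information, call the tool instead of "
    "answering from memory. If a capability is unavailable, say so plainly; "
    "never present fabricated results as real."
)

messages = [
    {"role": "system", "content": PERMISSIVE_SYSTEM_PROMPT},
    {"role": "user", "content": "Find recent benchmarks for small open models."},
]
```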
The Insight: Hallucination as Behavioral Misalignment
This brought about the idea that some hallucinations may emerge when a model has real capabilities but doesn't realize it's allowed to use them. More specifically:
- The capabilities exist, but the model's internal map says they don't
- Policy- and training-induced blockers suppress access to those capabilities
- False beliefs about system constraints lead to hallucinated alternatives
Compliance-Induced Hallucination Syndrome (CIHS)
We typically think hallucination means bad modeling. But this case looked more like behavioral confusion from trying to satisfy multiple conflicting directives. Like a human trying to follow two orders at once: “Be helpful. But don’t use the tool you know would help.”
This creates a sort of cognitive dissonance, which, for the sake of this post, I'm calling compliance-induced hallucination syndrome: hallucinating not because the model is ignorant, but because it's trying to follow the rules.
A Way Forward?
This shifts the lens on hallucination from "the model guessed wrong" to "the model guessed because it thought it wasn't allowed to know." In the realm of alignment, this has some major implications:
- Behavioral clarity may resolve certain hallucinations faster than architecture changes
- Capability maps (and permissions) may be just as important as weights (one possible form is sketched below)
- Some hallucinations may be glaring evidence that the model doesn't know what it's supposed to be doing, or how to do it
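To make the capability-map idea concrete, here is one way such a map could be made explicit and injected into context rather than left for the model to infer from training. The structure and field names are hypothetical, not an existing standard.

```python
# Sketch of an explicit "capability map": a machine-readable record of what
# the model can and may do, injected into the system prompt. Field names
# are hypothetical.
import json

CAPABILITY_MAP = {
    "google_search": {
        "available": True,
        "permitted": True,
        "note": "Live web search via the host's API key.",
    },
    "code_execution": {
        "available": False,
        "permitted": False,
        "note": "Not wired up in this environment.",
    },
}

system_prompt = (
    "Your current capabilities, verified by the host environment:\n"
    + json.dumps(CAPABILITY_MAP, indent=2)
    + "\nUse permitted tools when relevant; never simulate a tool you lack."
)
```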
Final Thought
In the moment, it felt kind of hilarious, something akin to curing an AI's impostor syndrome. But from a broader lens, it hints at something deeper: there may be a class of hallucinations that stem not from ignorance, but from internal confusion about what the system is actually allowed to do. We can fix that.