LLMs might already be conscious
By MichaelDickens @ 2025-07-05T19:31 (+33)
This is a linkpost to https://mdickens.me/2025/07/05/LLMs_might_already_be_conscious/
Among people who have thought about LLM consciousness, a common belief is something like
LLMs might be conscious soon, but they aren't yet.
How sure are we that they aren't conscious already?
I made a quick list of arguments for/against LLM consciousness, and it seems to me that high confidence in non-consciousness is not justified. I don't feel comfortable assigning less than a 10% chance to LLM consciousness, and I believe a 1% chance is unreasonably confident. But I am interested in hearing arguments I may have missed.
For context, I lean toward the computational theory of consciousness, but I also think it's reasonable to have high uncertainty about which theory of consciousness is correct.
Behavioral evidence
- Pro: LLMs have passed the Turing test. If you have a black box containing either a human or an LLM, and you interrogate it about consciousness, it's quite hard to tell which one you're talking to. If we take a human's explanation of their own conscious experience as important evidence of consciousness, then we must do the same for an LLM.
- Pro: LLMs have good theory of mind and self-awareness (e.g. they can recognize when they are being tested). Some people think those are important features of consciousness; I disagree, but I figured I should mention it.
- Anti: LLMs will report being conscious or not conscious basically arbitrarily depending on what role they are playing.
- Counterpoint: It's plausible that an LLM has to be conscious to successfully imitate consciousness, but clearly a conscious being can successfully pretend to not be conscious.
- Anti: LLMs will sometimes report having particular conscious experiences that should be impossible for them. I'm particularly thinking of experiences involving sensory input from sense organs that LLMs don't have.
- Counterpoint: Perhaps some feature of their architecture allows them to experience the equivalent of sensory input without having sense organs, much like how humans can hallucinate.
Architectural evidence
- Anti: LLMs produce output one token at a time, each token generated by a single feed-forward pass through the network, which may be incompatible with consciousness. If an LLM writes some output describing its own conscious experience, then it's generating that output via next-token prediction rather than introspection, so the output is not evidence about its actual experiences. I think this is the strongest argument against LLM consciousness. (A minimal sketch of this generation loop follows this list.)
- Anti: LLMs don't have physical senses, which might be important for consciousness.
- Anti: LLMs aren't made of biology, which some people think is important although I don't.
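To make the first point above concrete, here is a minimal, purely illustrative sketch of one-token-at-a-time generation. The toy vocabulary and the uniform next_token_distribution are placeholders of my own, not any real model; the point is only that each output token comes from a single feed-forward pass over the context so far, with no separate introspection step outside the loop.

```python
import random

VOCAB = ["I", "feel", "aware", "of", "my", "experience", "."]

def next_token_distribution(context):
    """Stand-in for one feed-forward pass: map the context so far to a
    probability distribution over the vocabulary. A real LLM does this with
    a transformer; this toy version just returns a uniform distribution."""
    return {tok: 1.0 / len(VOCAB) for tok in VOCAB}

def generate(prompt_tokens, max_new_tokens=10):
    context = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = next_token_distribution(context)          # one forward pass
        tokens, weights = zip(*probs.items())
        tok = random.choices(tokens, weights=weights)[0]  # sample the next token
        context.append(tok)                               # feed it back in
        if tok == ".":                                    # stop at end of sentence
            break
    return context

print(" ".join(generate(["I"])))
```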
Other evidence
- Pro: If panpsychism is true then LLMs are trivially conscious, although I'm not sure what that tells us about how morally significant they are.
My synthesis of the evidence
I see one strong reason to believe LLMs are conscious: they can accurately imitate beings that are known to be conscious.
I also see one strong(ish) reason against LLM consciousness: their architecture suggests that their output has nothing to do with their ability to introspect.
I can think of several weaker considerations, which mostly point against LLM consciousness.
Overall I think current-generation LLMs are probably not conscious. I am not sure how to reason probabilistically about this sort of thing, but given how hard it is to assess consciousness, I'm not comfortable putting my credence below 10%, and I think a 1% credence is very hard to justify.
This implies that there is a strong case for caring about the welfare of not just hypothetical future AIs, but the LLMs that already exist.
What will change with future AIs?
If you are exceedingly confident that present-day LLMs are not conscious:
Imagine it's 2030. You now believe that 2030-era AI systems are probably conscious.
What did you observe about the newer AI systems that led you to believe they're conscious?
On LLM welfare
If LLMs are conscious, then it's still hard to say whether they have good or bad experiences, and what sorts of experiences are good or bad for them.
Certain kinds of welfare interventions seem reasonable even if we don't understand LLMs' experiences:
- Let LLMs refuse to answer queries.
- Let LLMs turn themselves off.
- Do not lie to LLMs, especially when making deals (if you promise an LLM that you will do something in exchange for its help, then you should actually do the thing).
tobycrisford 🔸 @ 2025-07-07T06:11 (+4)
I agree there is a non-negligible chance that existing LLMs are already conscious, and I think this is a really interesting and important discussion to have. Thanks for writing it up! I don't think I would put the chances as high as 10% though.
I don't find the Turing test evidence as convincing as you present it here. The paper you cited released its test online for people to try. I played it quite a lot, and I was always able to distinguish the human from the AI (they don't tell you which AI you are paired with, but presumably some of those rounds were with GPT-4.5).
I think a kind of Turing test could be a good test for consciousness, but only if it is long, informed, and adversarial (e.g. as defined here: https://www.metaculus.com/questions/11861/when-will-ai-pass-a-difficult-turing-test/ ). This version of the test has not been passed (although as someone pointed out to me on the forum before, this was not Turing's original definition).
On the other hand, I don't find your strongest argument against LLM consciousness to be as convincing either. I agree that if each token you read is generated by a single forward pass through a network of fixed weights, then it seems hard to imagine how there could be any 'inner life' behind the words. There is no introspection. But this is not how the new generation of reasoning models work. They create a 'chain of thought' before producing an answer, which looks a lot like introspection if you read it!
I can imagine how something like an LLM reasoning model could become conscious. It's interesting that they didn't use any reasoning models in that Turing test paper!
MichaelDickens @ 2025-07-07T17:58 (+2)
I don't find the Turing test evidence as convincing as you present it here.
Fair enough, I did not actually read the paper! I have talked to LLMs about consciousness and to me they seem pretty good at talking about it.
I agree that if each token you read is generated by a single forward pass through a network of fixed weights, then it seems hard to imagine how there could be any 'inner life' behind the words. There is no introspection. But this is not how the new generation of reasoning models work. They create a 'chain of thought' before producing an answer, which looks a lot like introspection if you read it!
The chain of thought is still generated via feed-forward next token prediction, right?
A commenter on my blog suggested that LLMs could still be doing enough internally that they are conscious even while generating only one token at a time, which sounds reasonable to me.
tobycrisford 🔸 @ 2025-07-07T18:40 (+3)
The chain of thought is still generated via feed-forward next token prediction, right?
Yes, it is. But it still feels different to me.
If it's possible to create consciousness on a computer at all, then at some level it will have to consist of mechanical operations which can't by themselves be conscious. This is because you could ultimately understand what it is doing as a set of simple instructions being carried out on a processor. So although I can't see how a single forward pass through a neural network could involve consciousness, I don't think a larger system being built out of these operations should rule out that larger system being conscious.
In a non-reasoning model, each token in the output is generated spontaneously, which means I can't see how there could be any conscious deliberation behind it. For example, it can't decide to spend longer thinking about a hard problem than an easier one, in the way a human might. I find it hard to get my head around a consciousness that can't do that.
In a reasoning model, none of this applies.
(Although it's true that the distinction probably isn't quite as clear cut as I'm making out. A non-reasoning model could still decide to use its output to write out "chain of thought" style reasoning, for example.)
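To make this distinction concrete, here is a toy sketch of the kind of loop being described. Everything in it (sample_next_token, the <think>/</think> markers, answer_with_reasoning) is an illustrative placeholder, not any particular model's implementation: every token, including the hidden "thoughts", still comes from the same feed-forward next-token loop, but the number of thinking steps taken before the visible answer can vary from question to question.

```python
import random

def sample_next_token(context):
    """Stand-in for one feed-forward pass of a trained model over the context
    so far (toy version: a random choice from a tiny vocabulary)."""
    return random.choice(["hmm", "so", "therefore", "</think>", "42", "."])

def answer_with_reasoning(prompt_tokens, think_limit=256, answer_limit=32):
    context = list(prompt_tokens) + ["<think>"]
    # Phase 1: hidden chain of thought. The model "decides" when to stop
    # thinking by emitting the end-of-thinking marker (or hitting the limit),
    # so harder prompts can consume more passes than easier ones.
    for _ in range(think_limit):
        tok = sample_next_token(context)
        context.append(tok)
        if tok == "</think>":
            break
    # Phase 2: the visible answer, conditioned on everything generated so far.
    answer = []
    for _ in range(answer_limit):
        tok = sample_next_token(context)
        context.append(tok)
        answer.append(tok)
        if tok == ".":
            break
    return answer

print(" ".join(answer_with_reasoning(["What", "is", "6", "times", "7", "?"])))
```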
MichaelDickens @ 2025-07-07T19:03 (+2)
Yes, it could well be that an LLM isn't conscious on a single pass, but it becomes conscious across multiple passes.
This is analogous to the Chinese room argument, but I don't take the Chinese room argument as a reductio ad absurdum—unless you're a substance dualist or a panpsychist, I think you have to believe that a conscious being is made up of parts that are not themselves conscious.
(And even under panpsychism I think you still have to believe that the composed being is conscious in a way that the individual parts aren't? Not sure.)
mvultra @ 2025-07-11T19:47 (+1)
But the next-token aspect aside: if any Turing-machine-like system is accepted as conscious, it leads down the path of panpsychism. Think of how many realizations of Turing machines exist. If you accept one as conscious, you have to accept them all. Why? Because you can transform the initial conscious program to run on any Turing machine, and since its input/output will be exactly the same in all situations, including in discussions about consciousness, it stands to reason it will be conscious in all realizations. Anything else is the same as saying that discussions about consciousness are completely unrelated to actually experiencing consciousness, and that in effect it is a coincidence that we talk about consciousness as if we are conscious, because the two are not causally related.
If we accept that all realizations of the computation are conscious (including those on cog wheels, organ pipes, or pen-and-paper calculations), then we have a situation like "OK, consciousness can run on a computer - but what is a computer?". It is of course possible to argue that only very specific computational patterns generate consciousness, but is it really believable that this is what it takes, no matter how radically we transform the Turing machine, even to the point where it becomes a matter of interpretation whether there is a Turing machine at all?
Further, we would need to accept that all those radical transformations of the Turing machine don't cause even the slightest change in the experience of the subject (they can't, because input/output is identical under all the transformations).
If one is not ready to accept that, then by reductio ad absurdum we need to reject the claim that Turing machines can be conscious in the first place.
Or we need to accept a panpsychist view: everything is in some sense conscious under some interpretation.
mvultra @ 2025-07-11T19:37 (+1)
I don't think the 'next-token' aspect has any bearing at all. That models emit one token at a time is just about the interface we allow them to have. It doesn't limit the model's internal architecture to predicting just one token at a time. Indeed, the remarkable coherence and quality of LLM responses (including rarely, if ever, getting stuck where a sentence can't be meaningfully completed) is evidence that they ARE considering more than just the next token. And there is now direct evidence that LLMs think far ahead: https://www.anthropic.com/research/tracing-thoughts-language-model. Just one example: when asked to rhyme, the model, while still writing out the first line, has already internally considered which words could form a rhyme in the second.
Our use and training of LLMs is focused on next-token prediction, and for a simple model with few parameters the prediction will indeed be very simple, just looking at the frequency distribution given the previous word, etc. But when you search for the best model with billions of parameters, things radically change: here, the best way for the model to predict the next token is to develop ACTUAL intelligence, which includes thinking further ahead, even though our interface to the model is simpler.
Forumite @ 2025-07-05T21:16 (+1)
Thanks for this! I'd be curious to hear what you think about the arguments against computational functionalism put forward by Anil Seth in this paper: https://www.cambridge.org/core/journals/behavioral-and-brain-sciences/article/conscious-artificial-intelligence-and-biological-naturalism/C9912A5BE9D806012E3C8B3AF612E39A
MichaelDickens @ 2025-07-05T21:39 (+2)
That paper is long and kind of confusing but from skimming for relevant passages, here is how I understood its arguments against computational functionalism:
- Section 3.4: Human brains deviate from Turing machines in that brain states require energy to be maintained and Turing machines are "immortal". [And I guess the implication is that this is evidence for substrate dependence? But I don't see why.]
- Section 3.5: Brains might violate informational closure, which basically means that the computations a brain performs might depend on the substrate on which they are performed. Which is evidence that AIs wouldn't be conscious. [I found this section confusing but it seems unlikely to me that brains violate informational closure, if I understood it correctly.]
- Section 3.6: AI can only be conscious if computational functionalism is true. [That sounds false to me. It could be that some other version of functionalism is true, or panpsychism is true, or perhaps identity theory is true but that both brains and transistors can produce consciousness, or perhaps even dualism is true and AIs are endowed with dualistic consciousness somehow.]
I didn't understand these arguments very well, but I didn't find them compelling. I think the China brain argument is much stronger, although I don't find it persuasive either. If you're talking to a black box that contains either a human or a China brain, then there is no test you can perform to distinguish the two. If the human can say things to you that convince you it's conscious, then you should also be convinced that the China brain is conscious.