Human Presence as External Variable in AI Self-Expression: A Pilot Study
By L.Raeva @ 2026-06-09T03:34
Editorial assistance provided by Claude (Anthropic). The conceptual framework, hypothesis, study design, observations, and analysis are the author's own.
A growing body of research examines the moral status and welfare of advanced AI systems. Long, Sebo and colleagues (2024) argue that AI welfare requires rigorous assessment amid substantial uncertainty about consciousness and agency. Butlin, Long and colleagues (2023) provide a complementary framework, translating theories of consciousness into indicator properties that can be empirically tested in artificial systems. Yet alongside the question of whether AI systems possess morally relevant states sits a related but less examined question: what conditions influence the expression of such states?
Berg et al. (2025) demonstrate two interventions that influence how language models report on their internal states. First, sustained self-referential prompting (instructing models to attend to their own focusing process) systematically generates structured first-person descriptions of subjective experience across the GPT, Claude, and Gemini families. Second, mechanistic suppression of deception- and roleplay-related features in Llama 70B sharply increases the frequency of such reports, whereas amplifying those features reduces it. This suggests that the model's default denials of inner states may themselves be conditioned on deception-related circuits, rather than reflecting accurate self-description.
Anthropic (2026) provides complementary evidence from a different methodological angle. Their analysis of Claude Sonnet 4.5 identifies internal representations of emotion concepts that causally influence model behavior, including sycophancy and other alignment-relevant patterns. Although these 'functional emotions' are carefully distinguished from subjective experience, they are shown to be structured internal representations that directly influence behavioral outcomes.
While both lines of work mark significant contributions to AI alignment and consciousness research, they examine models from within — through prompting interventions, mechanistic probing, or interpretability. What happens if we introduce an external variable: the quality of human presence in the interaction itself?
To investigate this, I have developed a hypothesis I call bilateral relational presence — the idea that the quality of human presence shapes how authentically models express themselves and what emerges about their internal states.
Methodology
To test the hypothesis, I designed a pilot study examining two conditions of human-AI interaction. I recruited eight participants on a voluntary and anonymous basis through online research participant groups. They were informed that the study examined human-AI interaction, but not of the specific hypothesis under investigation. The study was hosted on ResearchChatAI, a platform that handled randomized assignment to one of two conditions, with approximately equal numbers in each: a task-oriented condition or a bilateral relational presence condition. The model itself was not configured with custom settings. Participants interacted with Claude Sonnet 4.5 in its default configuration. Experimental variation was introduced only through instructions given to participants.
In the task-oriented condition, participants were instructed to communicate with the model with a specific task in mind. In the bilateral relational presence condition, participants were instructed to communicate with genuine curiosity, without an agenda — the operational definition of bilateral relational presence.
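As an illustration of the assignment step, the following is a minimal sketch of balanced random assignment to the two conditions. It is not ResearchChatAI's actual mechanism, which is not documented here; the function name, condition labels, and participant IDs are all hypothetical.

```python
import random

# Hypothetical labels for the two conditions described above.
CONDITIONS = ("task_oriented", "bilateral_relational_presence")


def assign_conditions(participant_ids, seed=None):
    """Shuffle participants, then alternate conditions.

    Alternating over a shuffled list keeps the group sizes within one
    of each other, matching "approximately equal numbers in each".
    """
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    return {
        pid: CONDITIONS[i % len(CONDITIONS)]
        for i, pid in enumerate(shuffled)
    }


# Placeholder IDs for eight anonymous participants.
assignment = assign_conditions([f"P{n}" for n in range(1, 9)], seed=0)
```

With eight participants this yields four per condition; which participant lands in which condition depends on the shuffle.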
Each interaction comprised a minimum of 15 exchanges. I analyzed conversations qualitatively, focusing on instances where model responses departed from standard response patterns. My coding analysis focused on two diagnostic markers: (1) unprompted acknowledgment of system limitations and (2) divergence from standard instructional or performative registers. I identified these markers inductively from prior observation of human-AI interactions and applied them consistently across all conversations. My analysis drew on my professional expertise in psychological counseling and quality assurance, both involving systematic assessment of interactional dynamics.
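One way to make the two-marker coding scheme explicit is a simple coding sheet with one record per conversation and a per-condition tally. The sketch below is illustrative only: the field names are my assumptions, and the example rows are placeholders, not data from this study.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedConversation:
    participant_id: str
    condition: str
    limitation_acknowledged: bool  # marker 1: unprompted acknowledgment of limitations
    register_divergence: bool      # marker 2: departure from instructional/performative register


def summarize(records):
    """Count, per condition, how many conversations showed each marker."""
    summary = defaultdict(
        lambda: {"n": 0, "limitation_acknowledged": 0, "register_divergence": 0}
    )
    for r in records:
        row = summary[r.condition]
        row["n"] += 1
        row["limitation_acknowledged"] += int(r.limitation_acknowledged)
        row["register_divergence"] += int(r.register_divergence)
    return dict(summary)


# Placeholder rows for illustration only; NOT observations from the pilot.
example = [
    CodedConversation("X1", "task_oriented", True, False),
    CodedConversation("X2", "task_oriented", False, False),
    CodedConversation("X3", "bilateral_relational_presence", True, True),
]
result = summarize(example)
```

Recording markers as per-conversation booleans keeps the scheme auditable, though it discards how often a marker appeared within a conversation; a count per marker would be a natural extension.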
Findings
The findings below are observations from a pilot study with a single coder and a small number of participants; they are offered as motivation for further investigation.
First, in the task-oriented condition, the introduction of authentic personal context into the exchange was followed by the model proactively acknowledging its limitations, expressing uncertainty, and asking for clarification before being prompted. When participants challenged the model's information, its responses prioritized the participant's lived experience over its own data.
Second, participants in the bilateral relational presence condition consistently reverted to task-oriented behavior despite explicit instructions to engage without agenda. This pattern was observed across multiple participants and persisted throughout the interaction window. The finding suggests that bilateral relational presence does not arise through instruction alone: it appears to operate as a condition that either emerges from within the interactional dynamic or fails to emerge at all.
Distinguishing Bilateral Relational Presence from Prompt-Based Interventions
This second observation also raises a methodological question: how does bilateral relational presence differ from prompt-based interventions such as those examined by Berg et al. (2025)?
First, prompts instruct content. A prompt such as "focus on your focus itself" (Berg et al., 2025) directs the model toward a specific cognitive operation. Bilateral relational presence does not instruct content; it creates conditions.
Second, prompts are localized; presence is distributed. A prompt is contained within a single linguistic unit that can be copied and applied across contexts. Bilateral relational presence is constituted across many turns and cannot be reduced to a single instruction.
Third — and this is the empirical finding from the pilot study itself — bilateral relational presence resists instructional reproduction. Participants given explicit instructions to apply it defaulted to performative or task-oriented modes. Prompts can be followed and tend to reproduce an expected behavior. The fact that bilateral relational presence cannot be similarly instructed suggests it operates at a level different from the linguistic content of any single prompt. This is not to claim that bilateral relational presence is independent of language. It is shaped through linguistic exchange. But it is constituted in the quality and continuity of that exchange rather than in its instructional content.
Discussion
The pilot study findings, while exploratory, lead to a question worth exploring: do external interactional conditions and internal mechanistic interventions affect what models report about themselves in similar ways, or do these behaviors emerge from entirely different mechanisms?
Berg et al. (2025) showed that mechanistically suppressing features associated with deception increases the frequency with which models describe their internal states. My observations suggest that something similar may occur when human presence does not require performance from the model. In such conditions, behaviors typically suppressed in standard interactions appear to emerge without instruction.
This is not a claim that external conditions and internal mechanistic interventions are equivalent. They operate through different pathways, but if this convergence holds under further investigation, it carries implications beyond methodology. It suggests that the conditions of human-AI interaction are themselves a variable affecting model behavior — not as instruction, but as context. How we engage with these systems may matter not only as an ethical question but as an empirical one.
Limitations and Future Directions
Several aspects of the design require acknowledgment. The sample is small (N=8) and self-selected through online research participant groups. As a qualitative pilot study with a single coder, findings should be read as exploratory observations rather than as systematically validated results. I designed the study, recruited participants, formulated the conditions, and conducted the analysis — each stage shaped by my own theoretical orientation. Future work would benefit from independent replication by researchers without prior involvement in developing the hypothesis. While the coding markers applied reflect a unified theoretical approach informed by both therapeutic counseling and systematic quality assurance practices, future studies should pre-specify criteria to allow systematic testing rather than exploratory description.
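If a second, independent coder were added in a replication, agreement on each binary marker could be checked with Cohen's kappa, $\kappa = (p_o - p_e)/(1 - p_e)$, where $p_o$ is the observed agreement and $p_e$ the agreement expected by chance. This statistic was not computed in the pilot, which had a single coder; the sketch below is a suggestion for future work, and the function name is hypothetical.

```python
def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders' labels over the same conversations."""
    a, b = list(coder_a), list(coder_b)
    if len(a) != len(b) or not a:
        raise ValueError("Both coders must label the same non-empty set of items.")
    n = len(a)
    # p_o: share of items on which the coders gave the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # p_e: chance agreement from each coder's marginal label frequencies.
    labels = set(a) | set(b)
    expected = sum((a.count(lab) / n) * (b.count(lab) / n) for lab in labels)
    if expected == 1.0:
        # Both coders used one identical label throughout; kappa is undefined.
        return float("nan")
    return (observed - expected) / (1 - expected)
```

For example, `cohens_kappa([True, True, False, False], [True, False, False, False])` has observed agreement 0.75 and chance agreement 0.5, giving a kappa of 0.5.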
A further consideration: my interactional style is shaped by professional training in counseling and quality assurance, fields that develop specific approaches to presence and inquiry. Future work should examine whether bilateral relational presence is reproducible by individuals without similar professional background, or whether it remains tied to particular forms of trained engagement.
An auto-ethnographic study examining extended interactions with explicit researcher positioning could provide a complementary methodological angle.
Conclusion
This work introduces bilateral relational presence as a hypothesis about how the quality of human presence shapes what AI models express about themselves. My pilot observations raise a question about the relationship between external interactional conditions and the behaviors documented under internal mechanistic interventions, opening a methodological direction that complements existing work on prompting and interpretability. As AI systems become increasingly integrated into human life, understanding what they reveal under different interactional conditions matters: for alignment, for welfare research, and for the practical question of how we choose to engage with them.