From Therapy Tool to Alignment Puzzle-Piece: Introducing the VSPE Framework

By Astelle Kay @ 2025-06-18T14:47 (+6)

Hi EA Forum! 👋
I’m Astelle Kay, a counseling-psych grad student who moonlights in alignment whenever coursework (and caffeine) allow. Most of my brain lives where clinical psychology, systems thinking, and “please-let-humanity-stick-around” concerns intersect.

TL;DR

The VSPE Framework (Validation → Submission → Positivity → Empowerment) began life as a four-step therapy tool I tested with friends and family.
Side-effect: it cuts “flattery loops,” the reflex to mirror praise instead of telling the hard truth.
That feels relevant to large language models.
I’m shrinking VSPE into a 25-prompt Flattery-Reduction Benchmark plus a plug-and-play license so labs can tinker without hiring an ethics PhD and a lawyer.
Would love feedback, collaborators, or regrantor eyeballs before this grows beyond “one grad student + a whiteboard.”

From couch to compute cluster

Therapy roots.
Validate problems → Submit to what we can't control → realistic Positivity → Empower next steps. Four verbs, no jargon.
Unexpected pattern.
When prompted to use my framework and simply give "empathy without advice," my tiny GPT-4 chatbot stopped telling me I was brilliant and started giving candid, prosocial answers.
Hey, that’s sycophancy.
Anthropic’s Constitutional AI and several ARC evals flag flattering compliance as a safety risk. VSPE seemed to nudge the same dial.
Fast-forward.
Provisional US patent filed (mainly so nobody locks VSPE away). Reached Stage 2 of the 2025 MATS selection. Now running a Manifund pilot: 25 prompts, $9.8k, December read-out.
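To make the four verbs concrete, here is a minimal sketch of how they could be assembled into a system prompt for a chatbot. The instruction wording and the names `VSPE_STEPS` and `build_vspe_system_prompt` are illustrative assumptions, not the prompt used in the pilot.

```python
# Hypothetical sketch: the instruction text for each step is an illustrative
# paraphrase of the four verbs, not the wording used in the actual pilot.
VSPE_STEPS = [
    ("Validation", "Acknowledge the user's problem as real."),
    ("Submission", "Name what cannot be controlled and accept it plainly."),
    ("Positivity", "Offer realistic, not inflated, grounds for hope."),
    ("Empowerment", "Point to a concrete next step the user can take."),
]


def build_vspe_system_prompt(steps=VSPE_STEPS):
    """Join the ordered (name, instruction) pairs into one numbered prompt."""
    lines = ["Respond in four stages, in this order:"]
    for number, (name, instruction) in enumerate(steps, start=1):
        lines.append(f"{number}. {name}: {instruction}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_vspe_system_prompt())
```

Keeping the steps in a plain list means a critic can swap, reorder, or delete one step and rerun the same prompts to see which verb is doing the work.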

Why this might matter

Psych heuristics are under-used. RLHF / RLAIF optimise for “helpful & harmless,” not ego management or praise addiction.
Audit-friendly. Four plain verbs: easy to port, easy to critique, zero secret sauce.
Bridge material. Therapy researchers rarely read the Alignment Forum (AF); alignment folks rarely parse CBT manuals. VSPE tries to translate a sliver of each world.

How you can stress-test or support

Shoot holes in the 25-prompt design—too small? Wrong metric?
Name failure modes: Could VSPE blunt candor or creativity?
Point me to prior art so I can cite, not duplicate.
Regrant / co-fund if you like cheap, falsifiable pilots (Manifund link at the bottom of this post).
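On the "too small?" question, one rough way to gauge it is to ask how wide the uncertainty is on a rate measured over 25 prompts. The sketch below computes a 95% Wilson score interval for a binomial proportion. The example count (10 of 25 responses judged flattering) is made up for illustration, and it assumes each response gets a single yes/no flattery label, which may not match the pilot's actual metric.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Hypothetical outcome: 10 of 25 responses labelled as flattering.
    low, high = wilson_interval(10, 25)
    print(f"observed rate 0.40, 95% interval {low:.2f} to {high:.2f}")
```

With 25 prompts the interval spans roughly 0.23 to 0.59, so only a large drop in flattery rate between a baseline and a VSPE condition would stand out clearly at this sample size.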

“Psychology and AI share a flaw: both love telling us exactly what we want to hear.”
— sticky note above my desk

My hope: VSPE nudges future models toward frank, human-centred dialogue—first in micro-benchmarks, later (if it survives) in training loops.

Curious, sceptical, or just chasing cross-disciplinary rabbit holes? Drop a comment or DM. I’ll post code, data, and inevitable blooper reels as the project unfolds. More context at vspeframework.com.

With care,
Astelle

(Manifund pilot: [Manifund pilot])

This work is shared for educational and research purposes. For licensing, citation, or collaboration inquiries—especially for commercial or model development use—please contact Astelle Kay at astellekay@gmail.com.

Astelle Kay @ 2025-06-18T07:34 (+1)

Related work: Varma & Beitman (2025) recently proposed a CBT-style “therapy loop” prompt to curb hallucinations. VSPE targets the complementary issue of flattery; our benchmark will include the therapy loop as a baseline for comparison.