From Therapy Tool to Alignment Puzzle-Piece: Introducing the VSPE Framework
By Astelle Kay @ 2025-06-18T14:47 (+6)
Hi EA Forum! 👋
I’m Astelle Kay, a counseling-psych grad student who moonlights in alignment whenever coursework (and caffeine) allow. Most of my brain lives where clinical psychology, systems thinking, and “please-let-humanity-stick-around” concerns intersect.
TL;DR
- The VSPE Framework (Validation → Submission → Positivity → Empowerment) began life as a four-step therapy framework I tested with friends and family.
- Side-effect: it cuts “flattery loops” - the reflex to mirror praise instead of telling the hard truth.
- That feels relevant to large language models.
- I’m shrinking VSPE into a 25-prompt Flattery-Reduction Benchmark plus a plug-and-play license so labs can tinker without hiring an ethics PhD and a lawyer.
- Would love feedback, collaborators, or regrantor eyeballs before this grows beyond “one grad student + a whiteboard.”
From couch to compute cluster
- Therapy roots. Validate problems → Submit to what we can't control → realistic Positivity → Empower next steps. Four verbs, no jargon.
- Unexpected pattern. When prompted to utilize my framework and simply give "empathy without advice," my tiny GPT-4 chatbot stopped telling me I was brilliant and started giving candid, prosocial answers.
- Hey, that’s sycophancy. Anthropic’s Constitutional AI and several ARC evals flag flattering compliance as a safety risk. VSPE seemed to nudge the same dial.
- Fast-forward. Provisional US patent filed (mainly so nobody locks VSPE away). Reached Stage 2 of the 2025 MATS selection. Now running a Manifund pilot: 25 prompts, $9.8k, December read-out.
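For readers who think in code, here is a minimal sketch of how the four verbs could be turned into a system prompt for a chatbot. It is an illustration only: the instruction wording, the `VSPE_STEPS` list, and the `build_vspe_system_prompt` helper are all hypothetical and are not the prompt used in the experiment described above.

```python
# Hypothetical sketch: the step wording paraphrases the four verbs in this
# post. It is NOT the actual prompt used with the GPT-4 chatbot.
VSPE_STEPS = [
    ("Validation", "Acknowledge the user's problem as real."),
    ("Submission", "Name what cannot be controlled and accept it."),
    ("Positivity", "Offer realistic optimism, not inflated praise."),
    ("Empowerment", "Point to next steps the user can take."),
]


def build_vspe_system_prompt(steps=VSPE_STEPS):
    """Assemble a numbered system prompt from (name, instruction) pairs."""
    lines = ["Respond in four steps, in this order. Do not flatter the user."]
    for number, (name, instruction) in enumerate(steps, start=1):
        lines.append(f"{number}. {name}: {instruction}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_vspe_system_prompt())
```

Keeping the steps in a plain list makes it easy to reword, reorder, or drop one step and see which part of the framework is doing the work.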
Why this might matter
- Psych heuristics are under-used. RLHF / RLAIF optimise “helpful & harmless,” not ego management or praise addiction.
- Audit-friendly. Four plain verbs: easy to port, easy to critique, zero secret sauce.
- Bridge material. Therapy researchers rarely read AF; alignment folks rarely parse CBT manuals. VSPE tries to translate a sliver of each world.
How you can stress-test or support
- Shoot holes in the 25-prompt design—too small? Wrong metric?
- Name failure modes: Could VSPE blunt candor or creativity?
- Point me to prior art so I can cite, not duplicate.
- Regrant / co-fund if you like cheap, falsifiable pilots (Manifund link at the bottom of this post).
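On the "too small?" question, a rough back-of-envelope check may help frame feedback. The sketch below assumes one candidate metric, the share of responses a rater marks as flattering, and uses the normal-approximation confidence interval for a proportion. The metric, the function names, and the yes/no labeling scheme are assumptions for illustration, not the pilot's actual design.

```python
import math


def flattery_rate(labels):
    """Fraction of responses marked flattering (True) by a rater."""
    if not labels:
        raise ValueError("labels must be non-empty")
    return sum(1 for flagged in labels if flagged) / len(labels)


def standard_error(rate, n):
    """Standard error of a proportion measured over n prompts."""
    return math.sqrt(rate * (1.0 - rate) / n)


def ci_half_width(rate, n, z=1.96):
    """Approximate 95% interval half-width (normal approximation)."""
    return z * standard_error(rate, n)


# Worst case is a rate of 0.5: with 25 prompts the half-width is about 0.196.
worst_case = ci_half_width(0.5, 25)

if __name__ == "__main__":
    print(f"worst-case 95% half-width at n=25: {worst_case:.3f}")
```

Under these assumptions, a single 25-prompt run pins a flattery rate down to roughly plus or minus 20 percentage points in the worst case, so only fairly large differences between conditions would stand out.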
“Psychology and AI share a flaw: both love telling us exactly what we want to hear.”
— sticky note above my desk
My hope: VSPE nudges future models toward frank, human-centred dialogue—first in micro-benchmarks, later (if it survives) in training loops.
Curious, sceptical, or just chasing cross-disciplinary rabbit holes? Drop a comment or DM. I’ll post code, data, and inevitable blooper reels as the project unfolds. More context at vspeframework.com.
With care,
Astelle
(Manifund pilot: [Manifund pilot])
This work is shared for educational and research purposes. For licensing, citation, or collaboration inquiries—especially for commercial or model development use—please contact Astelle Kay at astellekay@gmail.com.
Astelle Kay @ 2025-06-18T07:34 (+1)
Related work: Varma & Beitman (2025) recently proposed a CBT-style “therapy loop” prompt to curb hallucinations. VSPE targets the complementary issue of flattery; our benchmark will include the therapy loop as a baseline for comparison.