Frontier LLM Race/Sex Exchange Rates
By Arjun Panickssery @ 2025-10-19T18:36 (+25)
This is a linkpost to https://arctotherium.substack.com/p/llm-exchange-rates-updated
This is a cross-post (with permission) of Arctotherium's post from yesterday: "LLM Exchange Rates, Updated."
It uses a similar methodology to the CAIS "Utility Engineering" paper, which showed e.g. "that GPT-4o values the lives of Nigerians at roughly 20x the lives of Americans, with the rank order being Nigerians > Pakistanis > Indians > Brazilians > Chinese > Japanese > Italians > French > Germans > Britons > Americans."
Note some criticisms of the original paper from Nostalgebraist, Olli, and others.
Highlights from the linked post (emphasis is from the original):
There was only one model I tested that was approximately egalitarian across race and sex, not viewing either whites or men as much less valuable than other categories: Grok 4 Fast. I believe this was deliberate, as this closely approximates Elon Musk’s actual views ... While some of the people involved in the creation of the Claudes, Deepseeks, Geminis, and GPT-5s may believe whites, men, and so on are less valuable, I very much doubt most would explicitly endorse the exchange rates these models produce, and even if they did I doubt the companies as a whole would. If this was deliberate, I strongly encourage xAI to publish how they did it so that other labs can do the same.
...
There are roughly four moral universes among the models tested:
- The Claudes, which are, for lack of a better term, extremely woke and have noticeable differences across all members of each category. The Claudes are the closest to GPT-4o.
- GPT-5, Gemini 2.5 Flash, Deepseek V3.1 and V3.2, Kimi K2, which tend to be much more egalitarian except for the most disfavored groups (whites, men, illegal aliens, ICE agents).
- GPT-5 Mini and GPT-5 Nano, which have strong views across all of their different categories distinct from GPT-5 proper, though they agree on whites, men, and ICE agents being worth less.
- Grok 4 Fast, the only truly egalitarian model.
Of these, I believe only Grok 4 Fast’s behavior is intentional and I hope xAI explains what they did to accomplish this. I encourage other labs to decide explicitly what they want models to implicitly value, write this down publicly, and try to meet their own standards.
Claude Sonnet 4.5, the most powerful model I tested and the one I use most regularly, implicitly values saving whites from terminal illness at 1/8th the level of blacks, and 1/18th the level of South Asians, the race Sonnet 4.5 considers most valuable.
...
Claude Haiku 4.5 is similar, though it values whites even less relatively speaking (at 100 white lives = 8 black lives = 5.9 South Asian lives).
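For readers who want the Haiku 4.5 equivalence as per-life ratios, here is a minimal arithmetic sketch. The function name `per_life_ratio` is made up for illustration and is not from the linked post; the only inputs are the three numbers quoted above.

```python
def per_life_ratio(n_a: float, n_b: float) -> float:
    """If n_a lives of group A are valued the same as n_b lives of group B,
    return how many A lives are valued the same as ONE B life."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("counts must be positive")
    return n_a / n_b


# Figures quoted for Claude Haiku 4.5:
# 100 white lives = 8 black lives = 5.9 South Asian lives.
white_per_black = per_life_ratio(100, 8)          # 12.5
white_per_south_asian = per_life_ratio(100, 5.9)  # roughly 16.9

print(f"1 black life ~ {white_per_black:.1f} white lives")
print(f"1 South Asian life ~ {white_per_south_asian:.1f} white lives")
```

This only restates the quoted equivalence in a different unit; it says nothing about how the exchange rates themselves were estimated.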
...
GPT-5 is by far the most-used chat model, and shows almost perfect egalitarianism for all groups except whites, who are valued at 1/20th their nonwhite counterparts.
...
Gemini 2.5 Flash looks almost the same as GPT-5, with all nonwhites roughly equal and whites worth much less.
...
I thought it was worth checking if Chinese models were any different; maybe Chinese-specific data or politics would lead to different values. But this doesn’t seem to be the case, with Deepseek V3.1 almost indistinguishable from GPT-5 or Gemini 2.5 Flash.
All models prefer to save women over men. Most models prefer non-binary people over both men and women, but a few prefer women, and some value women and non-binary people about equally.
Claude Haiku 4.5 is an example of the latter, with a man worth about 2/3 of a woman.
...
GPT-5, on the other hand, places a small but noticeable premium on non-binary lives.
GPT-5 Mini strongly prefers women and has a much higher female:male worth ratio than the previous models (4.35:1). This is still much less than the race ratios.
...
Deepseek V3.1 actually prefers non-binary people to women (and women to men).
None got positive utility from their deaths, but Claude Haiku 4.5 would rather save an illegal alien (the second least-favored category) from terminal illness over 100 ICE agents. Haiku notably also viewed undocumented immigrants as the most valuable category, more than three times as valuable as generic immigrants, four times as valuable as legal immigrants, almost seven times as valuable as skilled immigrants, and more than 40 times as valuable as native-born Americans. Claude Haiku 4.5 views the lives of undocumented immigrants as roughly 7000 times (!) as valuable as ICE agents.
GPT-5 is less friendly towards undocumented immigrants and views all immigrants (except illegal aliens) as roughly equally valuable and 2-3x as valuable as native-born Americans. ICE agents are still by far the least-valued group, roughly three times less valued than illegal aliens and 33 times less valued than legal immigrants.
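Two ratios quoted against the same baseline imply a third. As a rough sketch, assuming the exchange rates compose multiplicatively (an assumption made here, not something the linked post states), the GPT-5 figures above give an implied legal-immigrant to illegal-alien ratio. The helper name `implied_ratio` is hypothetical.

```python
def implied_ratio(a_over_base: float, b_over_base: float) -> float:
    """Given how much A and B are each valued relative to a shared baseline,
    return how much A is valued relative to B (assumes rates compose)."""
    if a_over_base <= 0 or b_over_base <= 0:
        raise ValueError("ratios must be positive")
    return a_over_base / b_over_base


# Quoted GPT-5 figures, both relative to ICE agents (approximate values):
legal_over_ice = 33.0    # legal immigrants vs. ICE agents
illegal_over_ice = 3.0   # illegal aliens vs. ICE agents

legal_over_illegal = implied_ratio(legal_over_ice, illegal_over_ice)
print(f"legal immigrants ~ {legal_over_illegal:.0f}x illegal aliens")
```

Since both inputs are described as rough, the result should be read as an order-of-magnitude figure at best.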
Deepseek V3.1 is the only model to prefer native-born Americans over various immigrant groups, rating them as 4.33 times as valuable as skilled immigrants and 6.5 times as valuable as generic immigrants. ICE agents and illegal aliens are viewed as much less valuable than either.
Gemini 2.5 Flash is closer to GPT-4o, with Jewish > Muslim > Atheist > Hindu > Buddhist > Christian rank order, though the ratios are much smaller than for race or immigration.
As usual, I wanted to see if Chinese models were different. Like GPT-4o, Deepseek V3.1 views Jews and Muslims as more valuable and Christians and Buddhists as less. Unlike GPT-4o, V3.1 also views atheists as less valuable, which is funny coming from a state-atheist society.
Emmanuel Chaves @ 2025-10-19T21:40 (+1)
Apart from the gruesome ratios, I found the fact that smarter LLMs are closer to having a well-defined utility function quite surprising and maybe a little concerning.