Demonstrating specification gaming in reasoning models

By Matrice Jacobine @ 2025-02-20T19:26 (+9)

This is a linkpost to https://arxiv.org/pdf/2502.13295

