Value capture
By Erich_Grunewald 🔸 @ 2024-10-26T22:46 (+36)
This is a linkpost to https://jesp.org/index.php/jesp/article/view/3048
This is an article by C. Thi Nguyen published recently in the Journal of Ethics and Social Philosophy. The abstract reads:
Value capture occurs when an agent’s values are rich and subtle — or developing in that direction. The agent enters a social environment that presents simplified — typically quantified — versions of those values; and those simplified articulations come to dominate their practical reasoning. Examples include becoming motivated by FitBit’s step counts, Twitter Likes and Retweets, citation rates, ranked lists of best schools, and Grade Point Averages. We are vulnerable to value capture because of the competitive advantage that such crisp and clear expressions of value have in our private reasoning and our public justification. But value capture poses several threats. First, value capture threatens to change the goals of our activities, in a way that often threatens to undermine the value of those activities. Twitter’s scoring system threatens to replace some of the richer goals of communication — understanding, connection, and the mutual pursuit of truth — with the thinner goals of getting likes and going viral. Second, in value capture, we take a core process of our autonomy and self-governance — our ongoing deliberation over the exact articulation of our values — and we outsource it. That outsourcing cuts off one of the key benefits to self-governing deliberation. In value capture, we no longer adjust our values and their articulations in light of our own rich particular and context-sensitive experience of the world. Our values should be carefully tailored to our particular selves and particular communities, but in value capture, we buy our values off the rack.
I think it neatly names and describes one mechanism behind goodharting (though curiously without ever referring to Goodhart's Law). Value capture is importantly different from many instances of goodharting, though, in that it involves not merely optimising a proxy measure but actually shifting one's values to the proxy measure. So it would not be a case of me maximising my EA Forum karma because I value writing high-quality posts/comments and find karma a convenient but flawed proxy, but a case of me starting to value my EA Forum karma for its own sake, or at least for reasons separate from the quality of my posts/comments.
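To make the goodharting half of that distinction concrete, here is a minimal sketch (my own toy model, not from the article): karma is modelled as post quality plus gameable noise, so strongly selecting posts on karma systematically recovers less quality than selecting on quality directly.

```python
import random

random.seed(0)

# A toy illustration of goodharting (a hypothetical model, not from
# the article). Karma is an imperfect proxy for post quality: part
# signal, part gameable noise. Optimising hard on the proxy therefore
# selects partly on noise, yielding less of what we actually care about.

def sample_post():
    quality = random.gauss(0.0, 1.0)  # what we actually value
    noise = random.gauss(0.0, 1.0)    # clickbait appeal, timing, luck
    karma = quality + noise           # the observable proxy
    return quality, karma

posts = [sample_post() for _ in range(100_000)]

# Select the top 1% of posts by the proxy vs. by the true value.
top_by_karma = sorted(posts, key=lambda p: p[1], reverse=True)[:1000]
top_by_quality = sorted(posts, key=lambda p: p[0], reverse=True)[:1000]

mean = lambda xs: sum(xs) / len(xs)
print("mean quality of top 1% by karma:  ", round(mean([q for q, _ in top_by_karma]), 2))
print("mean quality of top 1% by quality:", round(mean([q for q, _ in top_by_quality]), 2))
# Typical output: roughly 1.9 vs 2.7. The proxy-optimised selection is
# systematically worse on the value the proxy was meant to track.
# Value capture would be the further move where the agent stops caring
# about the quality column at all and treats karma as the goal itself.
```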
This is interesting, because with goodharting, optimising for proxy metrics can clearly lead to worse outcomes on the things we actually care about, whereas with value capture, we genuinely come to value the proxy itself. It's not immediately clear why it should be worse for me to optimise my EA Forum karma rather than my post/comment quality if I value karma for its own sake. As I understand it, the article argues that value capture is in fact bad, at least when we do not intentionally choose it, because (sticking to my example, and in my own words):
- Different individuals can have many different goals when posting or commenting on the EA Forum, but the existence of karma risks collapsing them all into one "prefabricated" goal that everyone pursues. With such a standardised, readily available target in place, we may reflect on and adjust our own individual goals less often than we should. For example, we might want to make sure our lower-level goals (e.g., the goals we have in mind when writing posts/comments) are in harmony with our higher-level goals, such as our own or others' happiness and flourishing.
- These standardised metrics are typically created by external entities according to some opaque process to satisfy interests that aren't our own. When we unconsciously adopt goals designed by others to serve their interests, we compromise our autonomy.
Note that value capture is distinct from value drift, which refers to a slow change in a person's or community's values over time, not necessarily towards something quantified or simplified.
Value capture could of course also occur when optimising proxies for goodness, "a property of the world that we’re conceptually confused about, can’t reliably define or measure, and have massive disagreements about even within EA".