Robust longterm comparisons
By Toby_Ord @ 2024-05-15T15:07 (+45)
(Cross-posted from http://www.tobyord.com/writing/robust-longterm-comparisons )
The choice of discount rate is crucially important when comparing options that could affect our entire future. Except when it isn’t. Can we tease out a class of comparisons that everyone can agree on regardless of their views on discounting?
Some of the actions we can take today may have longterm effects — permanent changes to humanity’s longterm trajectory. For example, we may take risks that could lead to human extinction. Or we might irreversibly destroy parts of our environment, creating permanent reductions in the quality of life.
Evaluating and comparing such effects is usually extremely sensitive to what economists call the pure rate of time preference, denoted ρ. This is a way of encapsulating how much less we should value a benefit simply because it occurs at a later time. There are other components of the overall discount rate that adjust for the fact that an extra dollar is worth less when people are richer, that later benefits may be less likely to occur — or that the entire society may have ceased to exist by then. But the pure rate of time preference is the amount by which we should discount future benefits even after all those things have been accounted for.
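(For readers who want this spelled out: in the standard Ramsey decomposition the overall consumption discount rate is roughly r = ρ + η·g, plus a term for the hazard rate of catastrophe, where g is the growth rate of consumption and η measures how quickly the marginal value of consumption falls as people get richer; the symbols r, η and g are the usual textbook ones and play no further role here. Everything that follows concerns only ρ.)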
Most attempts to evaluate or compare options with longterm effects get caught up in intractable disagreements about ρ. Philosophers almost uniformly think ρ should be set to zero, with any bias towards the present being seen as unfair. That is my usual approach, and I’ve developed a framework for making longterm comparisons without any pure time preference. While some prominent economists agree that ρ should be zero, the default in economic analysis is to use a higher rate, such as 1% per year.
The difference between a rate of 0% and 1% is small for most things economists evaluate, where the time horizon is a generation or less. But it makes a world of difference to the value of longterm effects. For example, ρ = 1% implies that a stream of damages starting in 500 years’ time and lasting a billion years is less bad than a single year of such damages today. So when you see a big disagreement on how to make a tradeoff between, say, economic benefits and existential risk, you can almost always pinpoint the source to a disagreement about ρ.
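As a quick back-of-the-envelope check of that claim (a minimal sketch assuming a constant unit damage stream and continuous discounting; the numbers are purely illustrative):

```python
import math

rho = 0.01  # pure rate of time preference: 1% per year, continuous discounting

def discounted_damages(start, length, rho):
    """Present value of a constant unit damage stream running from `start` to
    `start + length` years from now: the integral of e^(-rho * t) over that span."""
    return (math.exp(-rho * start) - math.exp(-rho * (start + length))) / rho

one_year_now = discounted_damages(0, 1, rho)              # ~0.995
billion_years_later = discounted_damages(500, 1e9, rho)   # ~0.674 -- judged less bad

print(one_year_now, billion_years_later)
```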
This is why it was so surprising to read Charles Jones’s recent paper: ‘The AI Dilemma: Growth versus Existential Risk’. In his examination of whether and when the economic gains from developing advanced AI could outweigh the resulting existential risk, the rate of pure time preference just cancels out. The value of ρ plays no role in his primary model. There were many other results in the paper, but it was this detail that grabbed my attention.
Here was a question about trading off risk of human extinction against improved economic consumption that economists and philosophers might actually be able to agree on. After all, better than picking the correct level of ρ and deriving the correct conclusion, only for half the readers to ignore your findings, is conducting the analysis in a way that is not only correct, but that everyone else can see is correct too.
Might we be able to generalise this happy result further?
- Is there a broader range of long run effects in which the discount rate still cancels out?
- Are there other disputed parameters (empirical or normative) that also cancel out in those cases?
What I found is that this can indeed be greatly generalised, creating a domain in which we can robustly compare long run effects — where the comparisons are completely unaffected by different assumptions about discounting.
Let’s start by considering a basic model where u(t) represents the ‘instantaneous utility’ or ‘flow utility’ of a representative person at time t. Now let n(t) represent the number of people alive at time t and let d(t) be the discount factor for time t. This discount factor is another way of expressing pure time preference. A constant rate of pure time preference ρ corresponds to a discount factor that drops exponentially over time: d(t) = e^(−ρt). But d(t) need not drop exponentially — indeed it could be any function at all. So could u(t) and n(t). The only constraints are that they are all integrable functions and that the integral below converges to a finite value.
On this model, we’ll say the value of the entire longterm future is:

V = ∫₀^∞ d(t) n(t) u(t) dt
(This equation assumes the total view in population ethics, where we add up everyone’s utility, but we’ll see later that this can be relaxed.)
Now suppose that we have the possibility of improving the quality of life, from u(t) to some other curve v(t), without altering n(t) or d(t). And let’s make a single substantive assumption: that this improvement is a rescaling of the original pattern of flow utility: v(t) = k·u(t) for some scaling factor k. How does this change the value of the future?

∫₀^∞ d(t) n(t) v(t) dt = ∫₀^∞ d(t) n(t) k·u(t) dt = k ∫₀^∞ d(t) n(t) u(t) dt = k·V
So on this model, scaling up a curve of utility over time simply leads to scaling up the discounted total value under that curve.
What about the value of extinction? We can model extinction here by n(t) going to zero. If so, the value of the integral from that point on falls to zero. (Similarly, other kinds of existential catastrophe could be modelled as u(t) going to zero.)
Now let’s consider the expected value of developing a risky technology, where we have a probability p of surviving the development process and scaling up all future utility by a factor of k, but otherwise we go extinct:

EV = p·k·V
This expected value still depends on the discount function d(t) because V depends on d(t). But what if we ask about the decision boundary between when the expected value of taking the risk (p·k·V) is better or worse than the value of the status quo (V)? The boundary occurs when they are equal:

p·k·V = V
So:

p = 1/k
This decision boundary for comparing whether it is worth taking on a risk of extinction to make a lasting improvement to our quality of life has no dependence on the discount function d(t). Nor does it depend on the population curve n(t). And because it doesn’t depend on the population curve, the decision doesn’t depend on whether we weight time periods by their populations or not. It is thus at the same place for either of the two most commonly used versions of population ethics within economics: the time integral of total flow utility and the time integral of average flow utility.
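As a sanity check, here is a minimal numerical sketch of that independence (the horizon, curves and rates below are arbitrary illustrative choices, not part of the model): whatever discount function and population curve we plug in, the break-even survival probability comes out at 1/k.

```python
import numpy as np

t = np.linspace(0.0, 1000.0, 100_001)  # finite horizon standing in for the integral to infinity
dt = t[1] - t[0]

def value(d, n, u):
    """Discounted total value: the integral of d(t) * n(t) * u(t) dt (simple Riemann sum)."""
    return float(np.sum(d * n * u) * dt)

u = 1.0 + 0.1 * np.sin(t / 50.0)  # an arbitrary status-quo flow-utility curve
k = 10.0                          # the improved curve is v(t) = k * u(t)

discount_functions = {
    "rho = 0%":   np.ones_like(t),
    "rho = 1%":   np.exp(-0.01 * t),
    "hyperbolic": 1.0 / (1.0 + 0.05 * t),
}
population_curves = {
    "constant":  np.full_like(t, 8e9),
    "growing":   8e9 * np.exp(0.002 * t),
    "shrinking": 8e9 * np.exp(-0.001 * t),
}

for d_name, d in discount_functions.items():
    for n_name, n in population_curves.items():
        V = value(d, n, u)        # status quo
        kV = value(d, n, k * u)   # world with the scaled-up utility curve
        # The gamble breaks even when p * kV = V, i.e. at p = V / kV -- always 1/k.
        print(f"{d_name:>10s} | {n_name:>9s} | break-even p = {V / kV:.4f} (1/k = {1 / k:.4f})")
```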
And one can generalise even further.
This model assumed that all the extinction risk (if any) happens immediately. But we might instead want to allow for any pattern of risk occurring over time. We can do this via a survival curve, where the chance of surviving at least until t is denoted s(t). This can be any (integrable) non-increasing function that starts at 1. If so, then the expected value of the status quo goes from:

V = ∫₀^∞ d(t) n(t) u(t) dt
to

E[V] = ∫₀^∞ s(t) d(t) n(t) u(t) dt
And this has simply placed another multiplicative factor inside the integral. So as long as the choice we are considering doesn’t alter the pattern of existential risk in the future, the argument above still goes through. Thus the decision boundary is independent of the future pattern of extinction risk (if that is unchanged by the decision in question).
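Spelling that out: with the survival curve in place, the gamble breaks even when

p ∫₀^∞ s(t) d(t) n(t) k·u(t) dt = ∫₀^∞ s(t) d(t) n(t) u(t) dt

and since the left-hand integral is just k times the right-hand one, the boundary is once again p = 1/k, whatever s(t) happens to be.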
Jones’s model has more economic detail than this, but ultimately it is a special case of the above. He considers only constant discount rates (i.e. exponential discount functions), assumes no further risk beyond the initial moment, and assumes that the representative flow utility of the status quo, u, is constant. He considers the possibility of it changing to some other, higher constant level v, which can be considered a scaled-up version of u, where v = k·u.
So the argument above generalises Jones’s class of cases where comparisons of longterm effects are independent of discounting in the following ways:
- constant u and v → u and v may vary over time
- constant ρ → time-varying ρ (so long as value converges)
- constant population growth rate → exogenous time-varying population growth
- total view of population ethics → either total or time integral of average
- no further extinction risk → exogenous time-varying extinction risk
It is worth noting that Jones’s model is addressing the longterm balance of costs and benefits of advanced AI via a question like this:
if we could either get the benefits of advanced AI at some risk to humanity, or never develop it at all, which would be best?
This is an important question, and one where (interestingly) the way we discount may not matter. On his model it roughly boils down to this: it would be worth reducing humanity’s survival probability from 100% down to 1/k whenever we can thereby scale up the representative utility by a factor of k.
In some ways, this is obvious — being willing to risk a 50% chance of death to get some higher quality of life is arguably just what it means for that new quality level to be twice the old one. But its implications may nonetheless surprise. After all, it is quite believable that some technologies could make life 10 times better, but a little disconcerting that it would be worth a 90% chance of human extinction to reach them.
One observation that makes this implication a little less surprising is to note that there may be ways to reach such transformative technologies for a lesser price. Just because it may be worth a million dollars to you to get a bottle of water when dying of thirst, doesn’t mean it is a good deal when there is also a shop selling bottles of water for a dollar apiece. Those people who are leading the concern about existential risk from AI are not typically arguing that we should forgo developing it altogether, but that there is a lot to be gained by developing it more slowly and carefully. If this reduces the risk even a little, it could be worth quite a lengthy delay to the stream of benefits. Of course this question of how to trade years of delay with probability of existential risk does depend on how you discount. Alas.
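To make that dependence concrete (a rough illustration under simplifying assumptions, with an otherwise constant world): if the permanent scaling-up of utility arrives after a delay of Δ years rather than immediately, then with a constant ρ > 0 the gain is multiplied by e^(−ρΔ), so at ρ = 1% a ten-year delay forfeits roughly 10% of it, whereas with ρ = 0 and a finite horizon of T years the gain is multiplied by (T − Δ)/T, a negligible loss when T is astronomically large. How much risk reduction a given delay is worth therefore swings enormously with ρ.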
In my own framework on longterm trajectories of humanity, I call anything that linearly scales up the entire curve of instantaneous value over time an enhancement. And I showed that, like the value of reducing extinction risk, the value of an enhancement scales in direct proportion to the entire value of the future, which makes comparisons between risk reduction and enhancements particularly easy (much as we’ve seen here). But in that framework, there was an explicit assumption of no pure time preference, so I had no cause to notice how ρ (or equivalently, d(t)) completely cancels out of the equations. So this is a nice addition to the theory of how to compare enhancements to risk reduction.
One might gloss the key result about robustness to discounting procedure as follows:
When weighing the benefits of permanently scaling up quality of life against a risk of extinction, the choice of discounting procedure makes no difference — nor does the population growth rate or subsequent pattern of extinction risk (so long as these remain the same).
In cases like these, discounting scales down the magnitude of the future benefits and of the costs in precisely the same way, but leaves unchanged the point at which the benefits and costs balance. It can thus make a vast difference to evaluations of future trajectories, but no difference at all to comparisons.
I hope that this region of robustness to the choice of discounting might serve as an island of agreement between people studying these questions, even when they come from very different traditions regarding valuing the future. Moreover, the fact that the comparison is robust to the very uncertain questions of the population size and survival curve for humanity across aeons to come, shows that at least in some cases we can still compare longterm futures despite our deep uncertainty about how the future may unfold.
NunoSempere @ 2024-05-15T15:26 (+9)
You could generalize a bit further by looking at the behavior of
- The integral of the ratio of the world under two interventions, or ∫₀^∞ u(t)/v(t) dt. This integral could have a value even if the integral of each intervention is indefinite.
- The ratio of progressively longer integrals under two interventions, or lim_{T→∞} (∫₀^T u(t) dt) / (∫₀^T v(t) dt). This could likewise have a value even if lim_{T→∞} ∫₀^T u(t) dt isn't defined.
Toby_Ord @ 2024-05-15T15:50 (+2)
I agree!
I didn't want to get too distracted with these complications in the piece, but I'm sympathetic to these and other approaches to avoid the technical issue of divergent integrals of value when studying longterm effects.
In the case in question (where u(t) always equals k v(t)) we get an even stronger constraint: the ratio of progressively longer integrals doesn't just limit to a constant, but equals a constant.
There are some issues that come up with these approaches though. One is that they are all tacitly assuming that comparing things at a time is the right comparison. But suppose (contra my assumptions in the post) that population was always half as high in one outcome as the other. Then it may be doing worse at any time, but still have all the same people eventually come into existence and be equally good for all of them. Issues like this where the ratio depends on what variable is being integrated over don't come up in the convergent integral cases.
All that said, integrating to infinity in economic modelling is presumably not to be taken literally, and for any finite time horizon — no matter how mindbendingly large — my result that the discounting function doesn't matter holds (even if the infinite integral were to diverge).
Ryan Greenblatt @ 2024-05-15T15:50 (+4)
One key issue with this model is that I expect that the majority of x-risk from my perspective doesn't correspond to extinction and instead corresponds to some undesirable group ending up with control over the long run future (either AIs seizing control (AI takeover) or undesirable human groups).
So, I would reject:
We can model extinction here by n(t) going to zero.
You might be able to recover things by supposing n(t) gets transformed by some constant multiple on x-risk maybe?
(Further, even if AI takeover does result in extinction there will probably still be some value due to acausal trade and potentially some value due to the AI's preferences.)
(Regardless, I expect that if you think the singularity is plausible, the effects of discounting are more complex because we could very plausibly have >10^20 experience years per year within 5 years of the singularity due to e.g. building a Dyson sphere around the sun. If we just look at AI takeover, ignore (acausal) trade, and assume for simplicity that AI preferences have no value, then it is likely that the vast, vast majority of value is contingent on retaining human control. If we allow for acausal trade, then the discount rates of the AI will also be important to determine how much trade should happen.)
(Separately, pure temporal discounting seems pretty insane and incoherent with my view of how the universe works.)