The last era of human mistakes

By Owen Cotton-Barratt @ 2024-07-24T09:56 (+23)

This is a linkpost to https://strangecities.substack.com/p/the-last-era-of-human-mistakes

Suppose we had to take moves in a high-stakes chess game, with thousands of lives at stake. We wouldn't just find a good chess player and ask them to play carefully. We would consult a computer. It would be deeply irresponsible to do otherwise. Computers are better than humans at chess, and more reliable. 

We'd probably still keep some good chess players in the loop, to try to catch possible computer error. (Similarly we still have pilots for planes, even though the autopilot is often safer.) But by consulting the computer we'd remove the opportunity for humans to make a certain type of high-stakes mistake.

A lot of the high-stakes decisions people make today don't look like chess, or flying a plane. They happen in domains where computers are much worse than humans.

But that's a contingent fact about our technology level. If we had sufficiently good AI systems, they could catch and prevent significant human errors in whichever domains we wanted them to.

In such a world, I think that they would come to be employed for just about all suitable and important decisions. If some actors didn’t take advice from AI systems, I would expect them to lose power over time to actors who did. And if public institutions were making consequential decisions, I expect that it would (eventually) be seen as deeply irresponsible not to consult computers.

In this world, humans could still be responsible for taking decisions (with advice). And humans might keep closer to sole responsibility for some decisions, perhaps including deciding what, ultimately, is valued. And many less consequential decisions, but still potentially large at the scale of an individual’s life (such as who to marry, where to live, or whether to have children), might be deliberately kept under human control.[1]

Such a world might still collapse. It might face external challenges which were just too difficult. But it would not fail because of anything we would parse as foolish errors.

In many ways I’m not so interested in that era. It feels out of reach. Not that we won’t get there, but that there’s no prospect for us to help the people of that era to navigate it better.

My attention is drawn, instead, to the period before it. This is a time when AI will (I expect) be advancing rapidly. Important decisions may be made in a hurry. And while automation-of-advice will be on the up, it seems like wildly unprecedented situations will be among the hardest things to automate good advice for. We might think of it as the last era of consequential human mistakes[2].

Can we do anything to help people navigate those? I honestly don’t know. It feels very difficult (given the difficulty at our remove in even identifying the challenges properly). But it doesn’t feel obviously impossible.

What will this era look like?

Perhaps AI progress is blisteringly fast and we move from something like the world of today straight to a world where human mistakes don’t matter. But I doubt it.

On my mainline picture of things, this era — the final one in which human incompetence (and hence human competence) really matters — might look something like this:

That's enough predictions that I'm probably wrong in some of the particulars. But I think the broad brush stroke picture is decently likely.

Central challenges to be borne by humans

What kind of challenges will people actually face at these times?

This is difficult to be particularly confident about. But here are some thoughts:

Trying to help at far remove

Even if we have some sense of their challenges and desire to help — what can we do? A central difficulty is that, however much we can get a sense of their challenges, their own sense of the challenges will be much better. It is inefficient for us to focus too much on specific scenarios[3]. A related issue is that they will have better tools than we do — some work we might want to do could by then be automated.  

I don't know how to think about this systematically, so I may well be missing things. But for now, there are three strategies which seem to me to have some promise — one about helping the future players to act wisely, and two about helping to get the gameboard in a good position.

First, deepening understanding of foundational matters. Having a good grounding in the basics (both theoretical and empirical) seems like it's helpful for understanding all sorts of situations. We have some disadvantage from distance of not knowing which areas of foundations are most relevant, but the space of possible foundations is much much smaller than the space of possible applications, and we can make some educated guesses. In this case that means analysis of the nature of AI, of the senses in which different actors might have values, of the basic dynamics of game theory or bargaining in cases with partial information and partially defined preferences, and so forth. It seems to me like although we have models of all of these things, our models don't always feel like they're capturing all the important things. I wouldn't be surprised if improvements in these foundations were possible, were helpful, and were counterfactual (through the relevant moments).

Second, power seeking on behalf of values one likes. This can include trying to shape the values of various actors, or trying to empower actors with desirable values. Honestly I'm pretty nervous about this one, because (1) it's so common and human for people to delude themselves into thinking that their values are superior, even when they're not, and (2) society has good memetic immune responses against various types of power seeking, so it can be easy for this to backfire. But it definitely is a strategy which can work at this distance, and it has some types of robustness (it doesn’t rely on second-guessing future actors, but is just about setting the gameboard up well). I feel relatively less worried about versions of this which are focused on fundamental values like cooperativeness and a commitment to moral reflection and truth-seeking, and more worried about versions predicated on particular object-level views about which values are correct. 

Third, differential technological development. It seems quite possible that the position people are in will depend in various ways on the state of technologies. Work which facilitates desirable technological pathways coming sooner relative to less desirable ones seems like a good lever. This can include (as e.g. in the cases of AI alignment and control) work laying the groundwork for future automation of research, including conceptual work helping to inform what things, exactly, are good to automate. Differential technological development, as well as being a strategy in its own right (aiming to positively influence the tech available during the last era of human mistakes), can also be a tactic in service of the two other strategies above, e.g. perhaps by differentially advancing research which helps us to think clearly about big novel issues.

What to make of this

Framing in terms of the last era of human mistakes feels to me like it’s capturing some important dynamics (although it may be confused about others). I feel glad to have found the perspective, and to get to interrogate it. It helps to remind me how strange the future will be. And it seems like it provides some seeds which I may later find helpful for my thinking.

At the same time, as of the time of writing I’m not sure how much this perspective will help. It shifts my view of things, but it doesn’t make it very transparent what to do. Still, I felt like there was enough here to be worth sharing. If other people find the perspective useful, or not-useful, I’d be interested to hear about that.

  1. ^

    Or not — there are possible futures where humans are removed from decision loops altogether.

  2. ^

    I've sometimes heard this period, or something close to it, called “crunch time”. I mildly dislike that name because although it points to the importance of the period it sort of obscures the mechanisms via which it's important.

  3. ^

    Although it often seems to be very productive to explore specific scenarios, to help keep general thinking grounded.


Chris Leong @ 2024-07-24T15:50 (+4)

It would be useful to have a term along the lines of outcome lock-in to describe situations where the future is out of human hands.

That said, this is more of a spectrum than a dichotomy. As we outsource more decisions to AI, outcomes become more locked in and, as you note, we may never completely eliminate the human in the loop.

Nonetheless, this seems like a useful concept for thinking about what the future might look like.

Ben Millwood @ 2024-07-24T21:39 (+2)

I think the word "lock-in" can be confusing here. I usually think of "lock-in" as worrying about a future where things stop improving, or a particular value system or set of goals gets permanent supremacy. If this is what we mean, then I don't think "the future is out of human hands" is sufficient for lock-in, because the future could continue to be dynamic or uncertain or getting better or worse, with AIs facing new and unique challenges and rising to them or failing to rise to them. Whatever story humans have set in motion is "locked in" in the sense that we can no longer influence it, but not in the sense that it'll necessarily have a stable state of affairs persist for those who exist in it. Maybe it's clearer to think of humans being "locked out" here, while AIs continue to have influence.

Owen Cotton-Barratt @ 2024-07-24T19:27 (+2)

I think there's maybe a useful distinction to make between future-out-of-human-hands (what this post was about, where human incompetence no longer matters) and future-out-of-human-control (where humans can no longer in any meaningful sense choose what happens).

SummaryBot @ 2024-07-24T13:27 (+1)

Executive summary: As AI capabilities advance, we are approaching a final era where human mistakes matter greatly before entering an era where AI systems prevent most consequential human errors, raising important questions about how to navigate this transition period.

Key points:

  1. An era is coming where AI will advise on most important decisions, preventing many human errors.
  2. The transition period before this era - the "last era of human mistakes" - will be critical and challenging to navigate.
  3. Key challenges will include setting up the "gameboard" well (players, power distribution, social equilibrium, technology).
  4. Potential strategies to help from our current vantage point: 
    a) Deepening understanding of foundational matters 
    b) Power-seeking on behalf of desirable values (with caution) 
    c) Differential technological development
  5. This framing highlights how strange the future may be, but doesn't provide clear actionable guidance.

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.