Owen Cotton-Barratt's Quick takes

By Owen Cotton-Barratt @ 2024-05-18T21:32 (+8)

Owen Cotton-Barratt @ 2025-04-30T15:29 (+26)

I wanted to share some insights from my reflection on my mistakes around attraction/power dynamics — especially something about the shape of the blindspots I had. My hope is that this might help to avert cases of other people causing harm in similar ways.

I don’t know for sure how helpful this will be; and I’m not making a bid for people to read it (I understand if people prefer not to hear more from me on this); but for those who want to look, I’ve put a couple of pages of material here.

Owen Cotton-Barratt @ 2024-10-12T23:49 (+25)

I think Leif Wenar's "Open Letter to Young EAs" has significant flaws, but also has a lot going for it, and I would seriously recommend it to people who want to think about the ideal shape of EA.

I went through the letter making annotations about the bits I thought were good or bad. If you want to see my annotated version, you can do that here. If you want to be able to comment, let me know and I'll quite likely be happy to grant you permission (I didn't want to set it to "anyone with the link can comment" for fear of it getting overwhelmed).

MichaelDickens @ 2024-10-13T01:50 (+9)

As with ~all criticisms of EA, this open letter doesn't have any concrete description of what would be better than EA. Like just once, I would like to see a criticism say, "You shouldn't donate to GiveWell top charities, instead you should donate to X, and here is my cost-effectiveness analysis."

The only proposal I saw was (paraphrased) "EA should be about getting teenagers excited to be effectively altruistic." Ok, the movement-building arm of EA already does that. What is your proposal for what those teenagers should then actually do?

Owen Cotton-Barratt @ 2024-10-13T02:14 (+2)

I mean it kind of has the proposal that they each need to work that out for themselves. (I think this is mistaken, and not the place I found the letter valuable.)

Owen Cotton-Barratt @ 2025-03-20T22:31 (+18)

It seems like "what can we actually do to make the future better (if we have a future)?" is a question that keeps on coming up for people in the debate week.

I've thought about some things related to this, and thought it might be worth pulling some of those threads together (with apologies for leaving it kind of abstract). Roughly speaking, I think that:

- ~Optimal futures flow from having a good reflective process steering things
- It's sort of a race to have a good process steering before a bad process
  - Averting AI takeover and averting human takeover are both ways to avoid the bad process thing (although of course it's possible to have a takeover still lead to a good process)
- We're going to need higher powered epistemic+coordination tech to build the good process
  - But note that these tools are also very useful for avoiding falling into extinction or other bad trajectories, so this activity doesn't cleanly fall out on either side of the "make the future better" vs "make there be a future" debate

There are some other activities which might help make the future better without doing so much to increase the chance of having a future, e.g.:

- Try to propagate "good" values (I first wrote "enlightenment" instead of "good", since I think the truth-seeking element is especially important for ending up somewhere good; but others may differ), to make it more likely that they're well-represented in whatever entities end up steering
- Work to anticipate and reduce the risk of worst-case futures (e.g. by cutting off the types of process that might lead there)

However, these activities don't (to me) seem as high leverage for improving the future as the more mixed-purpose activities.

Owen Cotton-Barratt @ 2024-05-18T21:32 (+15)

Most possible goals for AI systems are concerned with process as well as outcomes.

People talking about possible AI goals sometimes seem to assume something like "most goals are basically about outcomes, not how you get there". I'm not entirely sure where this idea comes from, and I think it's wrong. The space of goals which are allowed to be concerned with process is much higher-dimensional than the space of goals which are just about outcomes, so I'd expect that on most reasonable senses of "most", process can have a look-in.
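To make the dimensionality point concrete, here is a rough toy count. Everything in it is an assumption made for illustration rather than part of the argument above: a "goal" is modelled as a real-valued utility function, the world has a finite number of states, and a "process" is just the sequence of states visited over a fixed horizon. The function names are hypothetical.

```python
# Toy model (an illustrative assumption, not from the original post):
# a goal is a real-valued utility function, so the dimension of a space
# of goals equals the number of distinct things the utility may depend on.


def outcome_goal_dimension(num_states: int) -> int:
    """Dimension of the space of utilities that depend only on the final state."""
    return num_states


def process_goal_dimension(num_states: int, horizon: int) -> int:
    """Dimension of the space of utilities that may depend on the whole
    trajectory, i.e. on every one of the `horizon` states visited."""
    return num_states ** horizon


if __name__ == "__main__":
    num_states = 3
    for horizon in (1, 2, 5, 10):
        print(
            f"horizon={horizon}: "
            f"outcome-only={outcome_goal_dimension(num_states)}, "
            f"process-allowed={process_goal_dimension(num_states, horizon)}"
        )
```

In this sketch the outcome-only goals form a small subspace of the process-allowed goals, and the gap grows exponentially with the horizon. Real goal spaces are of course not this tidy, so this should be read as intuition rather than proof.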

What's the interaction with instrumental convergence? (I'm asking because vibe-wise it seems like instrumental convergence is associated with an assumption that goals won't be concerned with process.)

- Process-concerned goals could undermine instrumental convergence (since some process-concerned goals could be fundamentally opposed to some of the things that would otherwise get converged-to), but many process-concerned goals won't
- Since instrumental convergence is basically about power-seeking, there's an evolutionary argument that you should expect the systems which end up with most power to have the power-seeking behaviours
  - I actually think there are a couple of ways for this argument to fail:
    1. If at some point you get a singleton, there's now no evolutionary pressure on its goals (beyond some minimum required to stay a singleton)
    2. A social environment can punish power-seeking, so that power-seeking behaviour is not the most effective way to arrive at power
       - (There are some complications to this I won't get into here)
  - But even if it doesn't fail, it pushes towards things which have Omohundro's basic AI drives (and so pushes away from process-concerned goals which could preclude those), without pushing all the way to purely outcome-concerned goals

In general I strongly expect humans to try to instil goals that are concerned with process as well as outcomes. Even if that goes wrong, I mostly expect them to end up with something which has incorrect preferences about process, not something that doesn't care about process.

How could you get to purely outcome-concerned goals? I basically think this should be expected only if someone makes a deliberate choice to aim for that (though that might be possible via self-modification; the set of goals that would choose to self-modify to be purely outcome-concerned may be significantly bigger than the set of purely outcome-concerned goals). Overall I think purely outcome-concerned goals (or almost purely outcome-concerned goals) are a concern, and worth further consideration, but I really don't think they should be treated as a default.

Owen Cotton-Barratt @ 2024-06-21T11:00 (+11)

Just a prompt to say that if you've been kicking around an idea of possible relevance to the essay competition on the automation of wisdom and philosophy, now might be the moment to consider writing it up -- entries are due in three weeks.