Is Optimal Reflection Competitive with Extinction Risk Reduction? - Requesting Reviewers
By Jordan Arel @ 2025-06-29T05:13 (+9)
…and you will tell them the story of how once you found the Artifact that gave you mastery of the universe, and you refused to take more than about three minutes figuring out what to use it for, because that would have been annoying.
~Scott Alexander, “In The Balance”
Disclaimer:
The current version of this essay is for review purposes only. While I think there is a reasonable chance this essay has at least some important findings, I do not endorse any of the conclusions at present.
If you would like to give detailed feedback, please message me and I will send you links to the Google docs.
Optimal Reflection In 1 Page
Optimal Reflection (OR) is defined as the highest expected value process for comprehensively evaluating all crucial considerations and strategizing about humanity’s future. Potential OR candidates include “The Long Reflection,” “Coherent Extrapolated Volition,” and “Good Reflective Governance.”
This essay compares the expected value of marginal work on trajectory change via Optimal Reflection and Extinction Security via AI safety. There were several important and surprising findings:
- According to a preliminary mathematical model, Optimal Reflection work is plausibly five times higher expected value than AI safety work on the current margin.
- Optimal Reflection may be even more valuable for reducing post-alignment existential risk than for trajectory change – however, these effects are considered out of scope and ignored in this essay.
- The expected value of Optimal Reflection work seems to vastly exceed that of all other “trajectory change” work (work designed to increase the value of the future by methods other than preventing extinction).
- Human-led suboptimal value lock-in may become a threat before or around the same time AI takeover risk materializes, meaning Optimal Reflection is potentially just as urgent as AI safety.
- There may be a small window of opportunity to lock in existential security and optimal reflection; this positive lock-in is essential for ensuring a safe and positive long-term future.
- If we discover that successful Optimal Reflection increases the value of the future by an infinite amount, it would only be slightly more valuable relative to AI safety than if we discover it increases the value of the future by a factor of 10.
- The deciding factor for which cause area is more valuable is marginal tractability, driven by the much greater neglectedness of Optimal Reflection, which leads to far more low-hanging fruit and much higher information value for early interventions.
Acknowledgments
A huge thank you to Toby Tremlett and the EA forum team for putting on the “Existential Choices Debate Week,” which this essay was written for – sorry it’s a bit late! I have been interested in this topic for a long time, think it’s extremely important, and appreciate the nudge to finally write up my thoughts.
This essay could not exist in its current form without the background work of William MacAskill, Nick Bostrom, Nick Beckstead, Toby Ord, Eliezer Yudkowsky, Owen Cotton-Barratt, Hilary Greaves, and surely numerous unnamed heroes. My deepest gratitude to these luminaries for their pioneering groundwork.
Optimal Reflection In 8 Pages
The purpose of this essay is to:
- Propose Optimal Reflection as a target for effective “trajectory change” work – work designed to increase the value of the future by methods other than preventing extinction[1]
- Compare the expected value of marginal work on Optimal Reflection vs. Extinction Security[2] by offering heuristics and a mathematical framework to determine whether Optimal Reflection work plausibly offers equal or higher expected value.
- Argue that marginal tractability (how easy it is to have an impact on the current margin) is the key deciding factor for whether Optimal Reflection work is plausibly competitive with Extinction Security work
What is Optimal Reflection? Why is it important now?
Briefly put,
Optimal Reflection (OR) is the optimally feasible, effective, and actionable reflective process for considering all crucial considerations and strategizing about humanity’s future.
“Optimal” refers to the process which maximizes expected value by optimally balancing three subfactors:
- Feasibility of the reflective process
- Effectiveness of the process at creating a strategy which achieves the best possible future across all crucial considerations
- Actionability of the strategy created by the process
“Reflection” here means comprehensively researching and understanding all crucial considerations, their interactions, and implications for strategy.
Examples of Optimal Reflection candidates include “The Long Reflection,” “Coherent Extrapolated Volition,” and “Good Reflective Governance.”
To give a brief flavor of what these processes concretely look like:
- The Long Reflection, proposed by William MacAskill and Toby Ord, is a hypothetical era lasting anywhere from centuries to millions of years. It would occur after we have achieved robust Existential Security, but before we lock in any irrevocable civilization‐shaping decisions. During this pause, humanity would deliberately slow or defer further high-stakes technological and social transformations so it can deepen its understanding of moral philosophy, collective values, and long-term goals under conditions of heightened epistemic rigor, broad participation, and global coordination. The aim is to ensure that when we eventually resume rapid progress, our most powerful capabilities are steered by thoroughly considered, widely endorsed principles that maximize humanity’s prospects across the vast future.
- Coherent Extrapolated Volition (CEV), proposed by Eliezer Yudkowsky, is a hypothetical method for aligning powerful AI systems with human values by having AI predict and implement the actions humanity would collectively desire “if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.” Rather than directly using current human preferences—which can be inconsistent, short-sighted, or poorly informed—CEV aims to extrapolate what humanity’s idealized consensus would look like. This process seeks coherence by converging toward a unified, stable set of goals that reflect deep, broadly acceptable human interests.
- Good Reflective Governance, proposed by Owen Cotton-Barratt, emphasizes governance systems that continuously reflect on their goals, methods, and outcomes to ensure they are aligned with deeply held and well-informed values. This concept involves systematically improving decision-making by critically examining past decisions, anticipating future challenges, and fostering institutions capable of learning, adapting, and evolving over long timescales. Effective reflective governance seeks not merely to execute established plans, but to develop robust structures for ongoing self-assessment, enabling societies to consistently steer towards better long-term outcomes.
These are all processes which seek to provide ample strategic reflection so that humanity can wisely navigate its future. Optimal Reflection may occur before and/or as humanity progresses forward. It may use human intelligence, AI, or a combination of both to perform reflection. It may include both theoretical, first-principles research and empirical, real-world science. It may span many areas such as economics, philosophy, mechanism design, game theory, systems theory, political science, psychology, sociology, history, futurology, machine learning, computer science, physics, and mathematics. Optimal Reflection seeks to answer questions such as:
- What should we do with the deep future?
- What is most valuable to humanity, and how can we best pursue it?
- How can we resolve or otherwise deal with moral and value uncertainty?
- Is there a deeper “why,” either intrinsic or extrinsic to humans, something that is verifiably right or good?
- What is important to know about the universe and how it works to make the best decisions possible?
- How can we intelligently evaluate how to move forward so as not to pass by significant value?
- What is the most practical and effective way of answering all these questions, creating a strategy, and acting on the results?
What separates Optimal Reflection from “narrow” long-term trajectory change cause areas such as artificial sentience, space governance, and suffering risks is that Optimal Reflection seeks to achieve “universal trajectory change”: it attempts to definitively act correctly on all crucial considerations simultaneously, making successful Optimal Reflection exceedingly robust.
Surprisingly, it seems likely that Optimal Reflection greatly dominates all other trajectory change work in expectation. This is due to complex cluelessness and the likely somewhat multiplicative nature of the very large number of known and unknown crucial considerations, many of which may affect orders of magnitude worth of value.
The main reason that Optimal Reflection seems important now is that it seems quite possible a “lock-in” event will occur in the near future, and Optimal Reflection seems like the best strategy to ensure that if lock-in occurs it is intentional and positive. A lock-in event is an event which causes the diversity or quality of possible futures to permanently narrow. One such lock-in event is human extinction itself; if humanity goes extinct, it seems highly unlikely to come back later. Lock-in can be intentional or unintentional, positive or negative.[3]
A primary reason lock-in may occur soon is that as humanity becomes increasingly powerful through advanced technology, some actors, or humanity as a whole, may become powerful enough to create some mechanism which permanently locks in certain values or preferred conditions, foreclosing all other possible futures.
Alternatively, lock-in may occur because with the extremely rapid progress of technology, humanity may unknowingly take certain paths which are “path-dependent,” i.e. paths which are naturally extremely unlikely to be reversed.
The long-term value of work on both Extinction Security and Optimal Reflection depends on achieving lock-in. Without lock-in, any temporary successes are likely to be reversed over the deep future, ultimately leading to extinction or suboptimal value lock-in.
Optimal Reflection differs from other types of value lock-in because it doesn't commit to specific values but rather establishes a process for helping humanity coordinate to determine what values it wants to collectively enact, based on comprehensive evidence and the best achievable reflective process.
It seems that we may be in a fleeting window of opportunity for ensuring humanity is able to achieve its highest potential, as currently few people are trying to influence how advanced technology is used or what we do with the deep future, but this will soon change. This may even be the last such window of opportunity to achieve the best future possible, before some other future is permanently locked in.
Advanced[4] AI seems to be the primary technology which could make lock-in possible in the near future. In just a few years, the best AI chatbots have gone from barely being able to string words together to performing at or beyond human PhD level on a range of questions in science, math, and machine learning.
Indeed, the incredibly rapid advancement of AI seems to be not only the leading extinction risk, but also the primary factor enabling Optimal Reflection. While in normal times it would seem unreasonable to attempt to learn everything important there is to know about the universe and use the results to achieve the best possible future, this does not seem quite so harebrained with god-like AI seemingly right around the corner.
Of course, if humanity goes extinct first due to its inability to align[5] advanced AI, it will not be possible to do any good with advanced AI, meaning Optimal Reflection only matters conditional on the success of AI alignment (or some other AI safety strategy).
It seems like a majority of experts now agree that most extinction risk comes from AI, with estimates ranging from a few percent to 90% or more, and with the median risk estimate seemingly in the low tens of percent. Timelines for when most risk will materialize often fall in the 2-10 year timeframe. This makes AI safety an incredibly pressing priority.
While many have assumed decisions about AI's values can be postponed until after AI control is solved, a human-led value lock-in could plausibly become a threat before AI takeover. This is because a leading powerful group of humans could use its existing strategic advantages combined with moderately powerful AI to lock in its values, whereas AI would have to build up power from scratch.
Because advanced AI seems to be the leading extinction risk, this essay will evaluate whether marginal work on Optimal Reflection plausibly offers equal or higher expected value than marginal work on AI safety, as a simplified approximation for Extinction Security.
Heuristics
Optimal Reflection work seems plausibly competitive with extinction risk work if Optimal Reflection:
- Is currently 33% or less likely and Extinction Security is 44% or more likely
- Will at least 10X the value of the future if successful
- Is at least as tractable on the current margin as Extinction Security
- Does not significantly increase extinction risk
One way to understand these heuristics is that they are the values at which marginal Optimal Reflection work and Extinction Security work are exactly equally valuable.
Counterintuitively, holding all other heuristics equal, even if successful Optimal Reflection could increase the future's value by an infinite amount (rather than just 10X), this would only make Optimal Reflection work about 4/3 times as valuable (33% more valuable) than Extinction Security work. This surprisingly small difference occurs because, all other things being equal, increasing the chances of Extinction Security automatically increases the likelihood of Optimal Reflection – we need to survive in order to reflect – and so as Optimal Reflection becomes more valuable, so does Extinction Security.
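To make the arithmetic behind these heuristics concrete, here is a minimal two-variable sketch. It is my own reconstruction of how the numbers fit together, offered under stated assumptions rather than as the essay's full model: let s be the probability of achieving Extinction Security, r the probability of successful Optimal Reflection conditional on survival, and m the factor by which successful reflection multiplies the value of the future (normalizing a surviving-but-unreflected future to 1).

```latex
% Minimal reconstruction (my own assumption; the essay's full model may differ).
% s = P(Extinction Security), r = P(successful Optimal Reflection | survival),
% m = value multiplier if reflection succeeds (surviving, unreflected future = 1).
\begin{align*}
V &= s\,\bigl[r\,m + (1 - r)\bigr], &
\frac{\partial V}{\partial s} &= r\,m + (1 - r), &
\frac{\partial V}{\partial r} &= s\,(m - 1). \\[4pt]
% At the heuristic values s = 0.44, r = 0.33, m = 10, the marginal values roughly match:
\frac{\partial V}{\partial s} &= 0.33 \cdot 10 + 0.67 \approx 3.97, &
\frac{\partial V}{\partial r} &= 0.44 \cdot 9 \approx 3.96. \\[4pt]
% As m grows without bound, reflection's marginal advantage is capped at s/r:
\frac{\partial V / \partial r}{\partial V / \partial s}
  &= \frac{s\,(m - 1)}{r\,m + (1 - r)}
  \longrightarrow \frac{s}{r} = \frac{0.44}{0.33} = \frac{4}{3}.
\end{align*}
```

On this sketch, a marginal unit of progress on either variable is worth roughly the same at the heuristic values, which is where the break-even numbers above come from; even letting the value multiplier go to infinity only raises reflection's edge to s/r = 4/3.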
Heuristic #4 only requires that Optimal Reflection not significantly increase extinction risk. However, it's plausible that Optimal Reflection is actually essential for reducing future existential risks, particularly in a "vulnerable world" scenario. Its comprehensive foresight gives it the unique ability to identify and navigate complex, post-alignment existential threats from (near) future technologies. This may even be the primary benefit of Optimal Reflection, though such benefits are considered out of scope and ignored in this essay.
The decisive factor for comparing Optimal Reflection and Extinction Security work appears to be marginal tractability. Because success in both cause areas is exceedingly important for the value of the future, which one is more impactful on the current margin hinges on which is easier to make progress on.
Marginal tractability seems to hinge on a few key factors.
The most important of these factors seems to be the much greater neglectedness of Optimal Reflection (the essay very roughly estimates 1,200 people working on AI safety vs. 100 people or fewer working on Optimal Reflection), leading to many low-hanging-fruit opportunities and very high information value for early Optimal Reflection work.
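One rough way to see why this neglectedness gap matters so much, offered here as an illustrative assumption rather than as a model taken from the essay, is to suppose returns to labor in each field are roughly logarithmic in field size, so that the marginal worker's contribution scales inversely with the number of people already working:

```latex
% Illustrative assumption (not from the essay): logarithmic returns to field size n,
% so the marginal worker's contribution is proportional to 1/n,
% holding the total importance k of each field roughly comparable.
\begin{align*}
\frac{\mathrm{d}}{\mathrm{d}n}\bigl(k \ln n\bigr) &= \frac{k}{n} \\[4pt]
\frac{\text{marginal Optimal Reflection worker}}{\text{marginal AI safety worker}}
  &\approx \frac{k/100}{k/1200} = 12
\end{align*}
```

Under this very rough assumption, the difference in field size alone yields an order-of-magnitude marginal advantage, which is why tractability, rather than the size of reflection's upside (capped at 4/3 above), ends up doing the decisive work in the comparison.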
There are likely numerous very basic things the field could do to improve its likelihood of success, such as creating at least one organization fully dedicated to research, advocacy, or field-building around Optimal Reflection. Many such interventions could be attempted to see which gain traction, and doing so may reveal highly valuable information about how tractable the field is, which is itself incredibly valuable.
It is also important that, despite this neglectedness, there seems to be a potentially large workforce motivated to work on Optimal Reflection: in the March 2025 Existential Choices Debate Week on the EA Forum, both the mean and median responses barely favored “increasing the value of the futures where we survive” over “reducing the chance of our extinction” as more important on the current margin.
The biggest challenge for Optimal Reflection may be achieving buy-in from decision-makers. Indeed, governance work seems to be where Optimal Reflection most lags behind AI safety: while AI safety has a number of concrete, realistic proposals to present to policymakers, Optimal Reflection will need a significant amount of work to design such proposals.
Fortunately, AI labs seem to have values closely aligned with those behind Optimal Reflection, including deeply understanding the universe, creating massive public benefit, preventing existential risk, and even achieving utopian futures. The cooperation of AI labs could be crucial both for technical Optimal Reflection process design and for advocacy to governments.
For achieving buy-in, it is important to understand that Optimal Reflection has several sub-processes:
- Meta-reflection: Researching or designing seed reflection, deep reflection, or other aspects of Optimal Reflection
- Seed reflection: A process or state of humanity that reliably evolves into deep reflection (early-stage Optimal Reflection)
- Deep reflection: A process directly researching all crucial considerations and formulating strategy (end-stage Optimal Reflection)
- Ongoing reflection: A reflection process which considers crucial considerations as they arise
By starting with meta-reflection work designing a seed reflection process, which then only gradually evolves into deep reflection, Optimal Reflection becomes much easier than it would be if we started by directly trying to implement or even design deep reflection.
Again, crucially for tractability, advanced AI can be used both to help design and to help perform seed reflection and deep reflection, making technical Optimal Reflection work quite tractable, as long as there is the will to do so.
A factor which makes this political will seem achievable is the extreme abundance advanced AI will enable. The vast scale of the deep future makes it possible to guarantee everyone on Earth the experience of virtually any utopia they choose until the end of time, using only a trivial amount of cosmic resources.
Such an “existential compromise” could include a guarantee that this possibility is left open for people who choose it, while Optimal Reflection is also pursued, with a restriction that leaving the solar system, or perhaps the galaxy, is strictly prohibited until Optimal Reflection is complete and people can collectively make an informed decision on what to do with our cosmic endowment.
In the meantime, perhaps the most politically feasible short-term target likely to lead to effective and actionable reflection is a state of humanity which is both extremely good in and of itself and highly likely to evolve into Optimal Reflection – a global form of seed reflection (what I believe William MacAskill calls “Viatopia,” and the subject of my unpublished book, which I called “Paths To Utopia”).
In this world we could use the abundant resources and intelligence provided by advanced AI to keep people far beyond satisfied, and at the same time prevent catastrophes via guardrails, continue/accelerate various forms of progress, and allow humanity to naturally converge on Optimal Reflection.
There are various robustly good processes we could implement to gently encourage society to converge on a world which is more happy, free, epistemically attuned, and morally wise. At the same time, security processes can block off paths which are most dangerous or likely to cause harm.
This may mean slowing down many applications of advanced AI (unpredictable, harmful, and dangerous applications) while speeding up others (health, education, wise governance, material abundance, and other public-benefit applications). While there will certainly be some pushback on any restrictions, it seems that most people don’t actually want terrifyingly rapid AI progress across all areas, and that upon reflection people will support prioritizing wisdom and robustly good applications while slowing down potentially catastrophic ones, so that we do not use our exceptional new intelligence to do bad or dangerous things that cannot be reversed.
This need not require the world to become a homogeneous global entity. Rather, it could be a “metatopia” in which different societies experiment with numerous different mechanisms, allow people to move freely among them, and cross-pollinate whichever mechanisms are most effective at increasing epistemic and moral progress.
This essay will begin by more precisely defining Optimal Reflection and its subcomponents and by analyzing the heuristics. It will then examine each of these heuristics in detail and lay out a more detailed, though preliminary, mathematical model for comparing the expected value of marginal work on Optimal Reflection and Extinction Security via AI safety. Finally, it will briefly examine considerations and possible paths for Optimal Reflection work.
Again, this preliminary summary version of the essay is for review purposes only.
If you would like to give detailed feedback, please message me and I will send you links to the Google docs.
- ^
In Nick Beckstead’s initial usage, trajectory change includes extinction prevention, but I have followed William MacAskill’s usage in “What We Owe The Future,” in which trajectory change refers to all trajectory changes other than extinction reduction.
- ^
“Extinction Security” is a state in which humanity has achieved guaranteed permanent safety from avoidable extinction. I use "Extinction Security" instead of "Existential Security" because Existential Security, by its definition of avoiding ALL reduction of future value, should logically include upside risks—ensuring positive outcomes happen, not just preventing negative ones. Failing to achieve potential upside (e.g. through moral stagnation, coordination failures, or flawed governance and decision processes) itself constitutes a catastrophic loss of future value. Since I'm focusing exclusively on extinction prevention via AI safety rather than this broader concept, the terms "extinction risk" and "Extinction Security" are more accurate.
- ^
- Intentional lock-in
  - Power or value lock-in by some decisively advantaged party
  - Some group rushing to be the first to settle space at near light-speed[6]
- Unintentional lock-in
  - Some new technology being discovered and spreading throughout society with no way to put knowledge of it back in Pandora’s box
  - Lock-in due to path-dependent value, technological, or political trajectories that are nearly impossible to reverse
- Positive lock-in
  - Discovery and dissemination of important beneficial moral or scientific truths
  - Lock-in of objectively correct values, or of values universally agreed on after exhaustive reflection
- Negative lock-in
  - Extinction
  - Humanity becoming entrenched in an economic system which causes sentient AI to suffer on an astronomical scale, with no way of coordinating to stop the suffering within that system
- ^
Which in this essay could refer inclusively to certain advanced narrow AI applications, AGI, or ASI.
- ^
“Aligned” in this essay means “intent aligned”: AI that intends to do what we ask it to do. One could also substitute “under control,” meaning we are able to get the AI to reliably do what we want without it taking over.
- ^
Especially if space is defense-biased (link)
SummaryBot @ 2025-06-30T13:45 (+1)
Executive summary: This exploratory essay proposes that marginal work on “Optimal Reflection”—humanity’s deliberate effort to figure out what it should do with the deep future—may be more valuable than AI safety work aimed at preventing extinction, due to its greater neglectedness, potentially high tractability, and essential role in avoiding suboptimal value lock-in; however, the author does not currently endorse any conclusions and seeks feedback on this early-stage model.
Key points:
- Definition and significance of Optimal Reflection (OR): OR refers to the most effective, feasible, and actionable process for evaluating all crucial considerations about humanity’s future—examples include the Long Reflection, Coherent Extrapolated Volition, and Good Reflective Governance.
- Preliminary model suggests OR may be ~5x higher EV than AI safety work: Despite the overwhelming focus on extinction prevention, early modeling indicates OR might yield more expected value on the current margin due to its neglectedness and high potential upside.
- OR is especially crucial due to the risk of “value lock-in”: As humanity approaches the capability to irreversibly shape the future (e.g., via advanced AI), a small window remains to positively “lock in” the conditions for future flourishing; OR helps ensure this lock-in is intentional, inclusive, and wisely chosen.
- Marginal tractability is the decisive crux: While both cause areas are vital, OR may currently offer more low-hanging fruit due to a smaller existing field (rough estimate: <100 people working on OR vs. ~1,200 in AI safety) and a large, motivated potential workforce.
- Risks and dependencies acknowledged: The essay stresses that OR only matters conditional on surviving AI-related extinction risks, and political buy-in—especially from AI labs and policymakers—may be harder to secure for OR than for AI safety.
- Author is seeking feedback and emphasizes tentativeness: The post is part of Existential Choices Debate Week, and the author is explicit about the speculative and preliminary nature of the claims, encouraging critical engagement from readers.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.