Winners of the Essay competition on the Automation of Wisdom and Philosophy

By Owen Cotton-Barratt, AI Impacts @ 2024-10-29T00:02

This is a linkpost to https://blog.aiimpacts.org/p/winners-of-the-essay-competition

We’re delighted to announce the winners of the Essay competition on the Automation of Wisdom and Philosophy.

Overview

The competition attracted 90 entries in total (only one of which was obviously just the work of an LLM!), taking a wide variety of angles on the topic. The judges awarded the top four prizes as follows:

Additionally, the judges awarded ten runner-up prizes of $500 each. These adjustments to the prize schedule were made to better reflect the judges’ assessments; see the footnote for details.[1]

Many of the essays that did not get prizes still had something valuable to recommend them, and we are very grateful for the deep thought participants put into engaging with these potentially crucial topics.

In the rest of this post, we’ll link to all fourteen of the prize-winning entries and provide judges’ commentary on them. We have made a point of including critical commentary as well as praise. In our view this is an essential part of the competition: it helps to draw collective attention to the ideas people have presented and (we hope) to advance the discourse on what makes for valuable work.

Judge introductions

Andreas Stuhlmüller (AS) — Hi, I'm CEO & cofounder of Elicit, an AI company working on scaling up high-quality reasoning, starting with science. I've been interested in how AI can differentially advance wisdom for a long time, and (pre-LLMs) founded the non-profit Ought to work on that topic.

Linh Chi Nguyen (CN) — Hi, I have been thinking about long-term risk from AI besides alignment for a few years. I landed on thinking a bunch about AIs’ decision theories, which is what I’m currently doing. In that process, I’ve thought a bit more generally about the philosophical capabilities and inclinations of AI systems and I’m excited about more work in this area.

Bradford Saad (BS) — Hi, I’m a philosopher and senior research fellow at Oxford’s Global Priorities Institute. Most of my past work is in philosophy of mind. But for the past few years I have been working on AI and catastrophic risks, including the risk that future AI agents will cause catastrophes as a result of making philosophical errors. (I’ll add that my comments are not as developed or well put as I’d like, and that, with apologies, I’ve only managed to write comments for some of the winning entries in time for the announcement.)

David Manley (DM) — Hi, I’m an Associate Professor of Philosophy at the University of Michigan, Ann Arbor. I’ve worked in philosophical semantics, ontology, epistemology, and global priorities research. 

Due to timing issues, Wei Dai withdrew from the formal judging.

Top Prize Winners

Note that after the winners had been chosen, we provided them with feedback from the judges and gave them an opportunity to revise their submissions before this announcement. The judges’ comments are generally based on the original submitted versions.

Wisdom, amortised optimisation, and AI — Rudolf Laine — $7,000

This was submitted as one essay and split into three parts during the revision process to improve readability:

Summary

A lot of wisdom is about mental qualities shaped more by amortised optimisation than direct optimisation. Direct optimisation is solving problems by searching through alternatives on the fly, while amortised optimisation is solving a problem by applying past data and computation. LLMs are products of amortised optimisation, but can be used for direct optimisation.
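To make the distinction concrete, here is a toy sketch in Python. It is our own illustration, not Laine’s formalism: the quadratic objective, the candidate grid, and the function names are all hypothetical. Direct optimisation searches the candidates afresh for each problem; amortised optimisation spends computation up front on past problems and then answers new ones by recalling the nearest stored solution.

```python
def objective(x: float, target: float) -> float:
    # Hypothetical toy problem: squared distance of x from the target.
    return (x - target) ** 2


def direct_optimise(target: float, candidates: list[float]) -> float:
    # Direct optimisation: search through the alternatives on the fly.
    return min(candidates, key=lambda x: objective(x, target))


def build_amortised_solver(past_targets: list[float], candidates: list[float]):
    # Amortised optimisation: pay the search cost once, on past problems,
    # and store the solutions for later reuse.
    table = {t: direct_optimise(t, candidates) for t in past_targets}

    def solve(target: float) -> float:
        # No search at query time: reuse the answer to the most similar past problem.
        nearest = min(table, key=lambda t: abs(t - target))
        return table[nearest]

    return solve


candidates = [i / 10 for i in range(101)]  # 0.0, 0.1, ..., 10.0
solve = build_amortised_solver([1.0, 3.0, 5.0], candidates)
print(direct_optimise(3.14, candidates))  # 3.1: fresh search finds the best candidate
print(solve(3.14))  # 3.0: close, because a similar problem was seen before
print(solve(9.0))   # 5.0: far off, because the new problem lies outside past experience
```

The last line loosely mirrors the point that amortised optimisation is most useful when conditions change slowly: the stored answers only help when new problems resemble old ones.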

Amortised optimisation matters more when growth and change are slow, so its relative importance will likely decline with the growth and change caused by AI. However, a model suggests it's hard for amortised optimisation to entirely decline in relevance.

Human amortised optimisation is enabled by things like cultural evolution and written knowledge. Advanced AIs may weaken human cultural evolution but may themselves be very good at it due to their low copying costs. AIs might help humans distil insights from written knowledge, but may ultimately replace the need for it.

A benefit of wisdom is avoiding large-scale, or "strategic", errors. Good strategic planning often relies on amortised optimisation. If we need to get strategy right about novel things like AI, we could use AIs to help with simulation. A specific type of strategy error that seems hard to solve with AI is wrong paradigms.

BS’s comments

This entry was one of my nominations for a top prize. It persuaded me that the contrast between direct and amortised optimization does a lot of work in explaining the contrast between intelligence and wisdom, and that the former distinction merits more consideration from the macrostrategic perspective of how to make the evolution of AI go well. The essay also contains many points that I found plausible and worthy of further reflection. I wrote some notes in an attempt to distill ideas that stood out to me in the submitted version, but I would recommend that people read the essay rather than the notes.

CN’s comments

Tl;dr: Interesting ideas analysed in a way that makes it easy to trust the author, but somewhat lacking in a coherent, convincing, and important narrative.

I think of this text as having three parts that don’t quite follow the way the author divided the text. The first one offers a characterisation of wisdom. The second theoretically discusses the future of wisdom, including a growth model and considering cultural evolution. The third one discusses concrete and practical ways in which LLMs might contribute to wisdom.

All parts were interesting, although I thought they weren’t very well integrated, so that after reading it was difficult to remember the contents of the text. I think with a little work, the text might read more like a coherent piece or, failing that, might benefit from being split. I thought the first two sections were clearly stronger than the last section. [Editor’s note: it has now been split along these lines.]

I would like to highlight that I perceived the author to take an appropriate epistemic stance in this text. At virtually all points in the text when I thought “what about this objection?” or “this doesn’t seem quite true because of X”, the text addressed my concern in one of the following paragraphs. Sometimes, the text didn’t really provide an answer and instead just noted the concern. While that’s obviously worse than providing an answer, I really appreciated that the author acknowledged potential weaknesses of his conclusions.

That said, the entry didn’t offer a big original idea with a “big if true and plausibly true” feel (which, to be clear, is a very high bar, and I do think that the text contributes many interesting thoughts). I also have some reservations about this type of highly conceptual, foundational work on philosophy and wisdom since the path to impact is relatively long.

Overall, I think the text merits a top prize for what I consider its virtues in scholarship and the ideas that it contains.

DM’s comments

This is a very thought-provoking set of posts, about which I have several concerns. (The TL;DR is that if you read these, you should also read “Tentatively against making AIs ‘wise’” as a counterbalance.) What follows is a summary of my notes on the original submission, in which these posts formed a single paper and in which Laine was making a bolder claim about the connection between amortized optimization (AO) and wisdom.

We are told that a lot of wisdom is amortized optimization. We might wonder: how often do instances of wisdom involve AO, and when they do, how much of the wisdom is attributable to AO? Surely being produced by AO isn’t sufficient for a process or heuristic to be wise; Laine admits as much.

I would go further and argue that what distinguishes wise judgments from unwise ones, irrespective of how much AO is involved, has more to do with aspects of reasoning that fall on the other side of Laine’s conceptual divide. At least, so I would argue if we’re using “wise” in any normative sense, any sense in which it seems obvious that we want to make people and AIs wiser than they are. Indeed, I’m more confident that legibly good epistemics are crucial for this normative sense of wisdom than I am that there is any deep connection between wisdom and amortized optimization, per se. 

Take Laine’s example of cultural evolution. Crucially, enacting whatever practices have been handed down to us by our forebears is only sometimes wise; the fact that it involves AO doesn’t distinguish between cases where it’s done wisely and those in which it isn’t. In the best kind of case, the human consciously knows that they are benefiting from many generations of (say) successful arrow-making, and that they are unlikely to do better by designing a new method from the ground up. In the worst kind of case, the human blindly and superstitiously mimics cultural practices they’ve inherited, leading to harmful cultural inertia. 

In short, the wisest use of cultural evolution will be clear-eyed about why it makes sense to rely on it; that is, it will verify the use of AO using legible and deliberate cognition. And often, consciously rejecting a more AO-style process in favor of a more transparent one will be the wisest path.

(Note also that Laine’s examples involve cases where a practice or artifact has been optimized for something beneficial, like survival. But often selection processes aren’t even optimized for anything beneficial to begin with. For example, many memes succeed not by benefiting the host, but because they happen to be good at spreading. I doubt we want to count the use of an AO artifact as wise just because it happens to be beneficial. Adopting the output of an AO process without any reflection on the process that produced it or on its current applicability is often very unwise, even if one gets lucky.)

These points matter not just for the verbal dispute about what constitutes “wisdom”, but for the practical question of what kinds of cognitive features we should imbue our AIs with. For example, Laine characterizes wisdom in part by illegibility; but do we want to create systems whose apparently deep insights people trust despite the lack of any transparent lines of reasoning behind them? As Delaney’s submission notes, insofar as we characterize “wisdom” by mystery, enigma, or illegibility, we may want to actively avoid building systems that are wise in this sense.

Towards the operationalization of philosophy & wisdom — Thane Ruthenis — $6,000

Summary

I provide candidate definitions for philosophy and wisdom, relate them to intuitive examples of philosophical and wise reasoning, and offer a tentative formalization of both disciplines. The motivation for this is my belief that their proper operationalization is the bottleneck both to scaling up the work done in these domains (i.e., creating an ecosystem), and to automatizing them.

I operationalize philosophy as “the process of deriving novel ontologies”, further concretized as “deriving some assumptions using which reality-as-a-whole could be decomposed into domains that could be studied separately”. I point out the similarity of this definition to John Wentworth’s operationalization of natural abstractions, from which I build the tentative formal model. In addition, I link philosophical reasoning to conceptual/qualitative/non-paradigmatic research, arguing that they’re implemented using the same cognitive algorithms. Counterweighting that, I define philosophy-as-a-discipline as a special case of this reasoning, focused on decomposing the “dataset” represented by the sum of all of our experiences of the world.

I operationalize wisdom as meta-level cognitive heuristics that iterate on object-level heuristics for planning/inference, predicting the real-world consequences of a policy which employs said object-level heuristics. I provide a framework of agency in which that is well-specified as “inversions of inversions of environmental causality”.

I close things off with a discussion of whether AIs would be wise/philosophical (arguing yes), and what options my frameworks offer regarding scaling up or automatizing these kinds of reasoning.

DM’s comments

There is a great deal that I liked about this paper, though I think it would be better construed as offering “operationalizations” not of philosophy and wisdom but of one characteristic thing philosophers (and often other theorists) do, and one characteristic thing wise people do. The two conceptual tasks that are operationalized here, viz., coming up with “ontologies” in Ruthenis’s sense and applying good meta-heuristics, are important enough to consider in their own right, even if they overlap only partially with the notions of philosophy and wisdom respectively.

Among the “other theorists” I have in mind are theoretical physicists, who are actively engaged with the question of which ontology best characterizes the domain of physics (e.g. whether strings should be a fundamental part of it); and while there is also a thriving field of the philosophy of physics, that area seems best described as pursuing meta-questions about the relationship between these competing ontologies and the reality behind them, especially in the case where we may have reached an in-principle limit to our ability to empirically distinguish them. In other words, the borderline between physics and philosophy of physics in fact has to do with the presence or absence of in-principle empirical evidence, not with whether there are competing ontologies at issue. Likewise, I would say, for other domains where philosophy interacts with the sciences: for example, the question of how best to carve up cognitive traits most usefully to predict and explain behavior or pathology is well within the ambit of psychology, whereas the philosophy of mind pursues questions even further from the empirical frontier.

In characterizing the probabilistic independence criterion for ontologies, I wish Ruthenis had explained how this relates to more traditional articulations of theoretical desiderata in the philosophy of science, especially explanatory and predictive power. I was also unsure how the model is supposed to apply in a case where a given higher-level (functional) state (such as being a tree, to use Ruthenis’s example) has multiple possible lower-level realizations that are all mutually inconsistent (specifying, among other things, exactly how many leaves are on a tree), since the latter will yield probabilities of zero for P(Li | L \ Li ∧ H).

I agree that the notion of meta-heuristics is an important aspect of wisdom and that it’s importantly distinct from “outside-view” reasoning. The discussion of wisdom overlaps somewhat with Laine’s submission, making the point that wisdom is not explicit knowledge but is stored as “learned instincts, patterns of behavior… cultural norms” that are crystallized through both biological and cultural evolution. I would argue that Ruthenis’s analysis cuts more finely here, helping to explain why not everything crystallized in this way fits into the “wisdom” category (why is something like GPT-1, which is made of amortized optimization, not actually wise?), and also, conversely, why not everything that is wise has been crystallized in this way. (“Wisdom can nevertheless be inferred ‘manually’... purely from the domain’s object-level model, given enough effort and computational resources.”)

CN’s comments

Tl;dr: Thought-provoking, many interesting ideas, but lacking in analysis and scholarship. [Editor’s note: the final version was edited to address some of these issues, but I don’t know how much the judges would still be concerned about them.]

I thought the text was very ambitious in a good way and made a lot of points that made me think. For example, the definition of wisdom made me wonder whether it was right and go through examples and potential counterexamples. So, from that perspective, I really liked the text and appreciate its ideas.

That said, the quality of writing and analysis could have been improved a lot. Many of the thoughts, while very interesting, felt merely stated instead of argued for. In particular, many claims were confidently stated as conclusions when I didn’t think they followed from previous arguments. (I also couldn’t help but notice that virtually all examples were drawn from the LessWrong memesphere, which isn’t a problem per se, but adds to the sense that the author isn’t necessarily aware of the gaps in argumentation.) I also would have liked the text to engage more with reasons to doubt its own conclusions (or at least not state them as confidently in places where I thought it was unwarranted).

I also would have liked it if the text had tried to situate itself more in the existing literature, as I’m sure that many of the text’s ideas have already been discussed elsewhere. The text is also hard to follow at times, so I would have benefitted from more and clearer explanations. I also have some reservations about this type of highly conceptual, foundational work on philosophy and wisdom since the path to impact is relatively long.

As we are highlighting this text by giving it a top prize, I would like to share that I wouldn’t want people to read it without applying a strong critical lens. I think some texts are valuable because they give you a lot of food for thought. Other texts are valuable because they are rigorous, do a lot of the work of critical thinking for you, and make you trust the author’s conclusion. This text definitely falls into the former and not the latter category.

With all that said, I overall think the text makes a lot of intellectual contributions, some of which have a “potentially big if (something in this vicinity is) true and plausibly (something in this vicinity is) true” feel, and merits a top prize.

BS’s comments

I found this to be one of the more thought-provoking entries. I liked that it made ambitious positive proposals.

To echo some of Chi’s comments: because this entry is receiving a top prize, I also want to encourage readers to approach it as food for thought and with a strong critical lens. To that end, I’ll mention a few somewhat critical reflections:

Essays on training wise AI systems — Chris Leong — $4,000

These were submitted as part of a single essay, which centrally argued for a particular approach to training wise AI systems. In response to judge feedback it was split into two relatively separate parts, and the emphasis on the imitation learning approach was lessened.

Summary

The first essay compares the advantages and disadvantages of four “obvious” approaches to producing wise AI systems:

The second essay introduces and examines the idea of a “wisdom explosion”, in analogy to an intelligence explosion. It argues that it is plausible that a wisdom explosion could be achieved, and that it looks potentially favourable to pursue from a differential technology development perspective.

CN’s comments

Tl;dr: Gives an overview of the “obvious” approaches to training for wisdom and considerations around them.

The text aims to convince the reader of the merits of the imitation learning approach to training AIs to be wise. To this end, it contrasts it with other possible “obvious” approaches. I found the resulting overview of “obvious” approaches and the considerations around them to be the more valuable part of this text compared to the arguments in favour of the imitation learning approach. I can imagine this being a handy resource to look at when thinking about how to train wisdom: as a starting point, as a refresher, and as a way to double-check that one hasn’t forgotten anything important.

That said, I thought the analysis of the different proposals was reasonable but didn’t stand out to me, which also contributes to my not being convinced that the imitation learning approach is in fact the best approach. This might have been better if the entry had been written out in prose: I personally find that bullet points often hide gaps in argumentation that I notice when I try to write them out. (This might also be completely idiosyncratic to me.)

Overall, I think this text deserves a top prize for doing solid, potentially useful work.

BS’s comments

This entry was one of my nominations for a top prize.  It looks at a number of different proposals for making AI wise. I thought it did an exceptionally good job of considering a range of relevant proposals, thinking about obstacles to implementation, and making a sustained effort to weigh up pros and cons of rival approaches.  Although I think some of the proposals this entry discusses merit further consideration, I’d especially like to see more work in this area that follows this one in considering a range of rival proposals, obstacles to implementation, and pros and cons.

Should we just be building more datasets? — Gabriel Recchia — $3,000

Summary

Building out datasets and evaluations focused on 'wisdom-relevant skills' could significantly contribute to our ability to make progress on scalable oversight and avoid large-scale errors more broadly. Specifically, I suggest that curating datasets for use as fine-tuning corpora or prompts to elicit latent model capabilities in areas such as error detection, task decomposition, cumulative error recognition, and identification of misleading statements could be a particularly worthwhile project to attempt. These datasets should cover a range of domains to test whether these skills can be elicited in a general way. This kind of work is relatively accessible, not requiring insider access to unpublished frontier models, specialized ML experience, or extensive resources.

CN’s comments

Tl;dr: Makes the case for a direction I think is promising, but not a lot of novelty or “meat”.

This is a short, pragmatic and relatively simple submission pointing out that we might just want to do the obvious thing and generate training data on things we consider important for wisdom. I have a lot of sympathy for concise, “not trying to be fancy” proposals that, if implemented, could be very impactful. I also think the text did a good job explaining for which domains additional training data is and isn’t helpful.

That said, I thought the intellectual contribution of this text could have been bigger. First, the text leaves out the analysis of whether wisdom is a domain for which additional training data is helpful. Second, how successful a potential implementation would be comes down largely to the details, i.e. what exactly to put in the training data. While the text offers some preliminary thoughts on this, I think they are not enough to meaningfully guide a project trying to implement the proposal, so any implementers would basically have to think about this from scratch. [Editor’s note: the final version of the essay does include more ideas along these lines, and also links to a practical project building datasets.] This seems particularly relevant given that the basic idea of the proposal isn’t very original.

(To be clear, I think it’s much better to be unoriginal and impactful than original and unlikely to ever be implemented or help. That’s part of the charm of this submission! But when evaluating this in terms of intellectual contribution, I would have liked it to either be novel or more fleshed out.)

Overall, I think this text deserves a top prize and I liked it for what I think it is: A brief high-level proposal for a promising if not very original agenda.

BS’s comments

This entry identifies some LLM wisdom-relevant skills that could plausibly be improved with suitable datasets using existing methods. I see this entry as belonging to the category of saying sensible, relatively straightforward stuff that’s worth saying rather than to the category of trying to offer theoretical insights. On the current margin, I think work in this area that aims to contribute via the former category is more to be encouraged than work that aims to contribute via the latter. So I liked that this entry nicely exemplified a category in which I’d like to see more work. I also liked that it identified a task (building datasets) that could plausibly lead to quality improvements in automated philosophy and which is a task that philosophers are already in a position to help with.

Runner-up prizes

Designing Artificial Wisdom: The Wise Workflow Research Organization — Jordan Arel

This was the judges’ favourite of a series of four relatively standalone essays the author wrote on artificial wisdom. The other three are:

Summary

Even simple workflows can greatly enhance the performance of LLMs, so artificially wise workflows seem like a promising candidate for greatly increasing Artificial Wisdom (AW).

This piece outlines the idea of introducing workflows into a research organization which works on various topics related to AI Safety, existential risk & existential security, longtermism, and artificial wisdom. Such an organization could make progressing the field of artificial wisdom one of their primary goals, and as workflows become more powerful they could automate an increasing fraction of work within the organization.

Essentially, the research organization, whose goal is to increase human wisdom around existential risk, acts as scaffolding on which to bootstrap artificial wisdom.

Such a system would be unusually interpretable since all reasoning is done in natural language except that of the base model. When the organization develops improved ideas about existential security factors and projects to achieve these factors, they could themselves incubate these projects, or pass them on to incubators to make sure the wisdom does not go to waste.

CN’s comments

I like the series for what it shows about the author thinking from the ground up and at a high level about all the relevant aspects of the automation of philosophy and wisdom. I thought the scheme presented here was reasonable although fairly vague and, at its level of abstraction, not very novel.

Philosophy's Digital Future — Richard Yetter Chappell

Summary

I suggest that a future "PhilAI" may (at a minimum) be well-suited to mapping out the philosophical literature, situating (e.g.) every paper in the PhilPapers database according to its philosophical contributions and citation networks. The resulting "PhilMap" would show us, at a glance, where the main “fault lines” lie in a debate, and which objections remain unanswered. This opens up new ways to allocate professional esteem: incentivizing philosophers to plug a genuine gap in the literature (or to generate entirely new branches), and not just whatever they can sneak past referees. I further argue that this could combine well with a "Publish then Filter" model (replacing pre-publication gatekeeping with post-publication review) of academic production.

BS’s comments

I liked that this entry made concrete suggestions for using AI to improve information flow across the field of philosophy and that these suggestions were grounded in knowledge of how the field currently operates. 

Regardless of whether one is sold on the particular proposals in this entry, I think its proposals encourage optimism about the availability of options for accelerating philosophical progress, options that can be uncovered without theoretical insights just by reflecting on how to apply existing tools across the field. My sense is that there’s a lot of room for improvement within academic philosophy when it comes to coordination and experimenting with concrete proposals for, say, improving the publication system. So, I’d also be particularly excited for future work on using automation to improve the ecosystem of academic philosophy to happen alongside work geared at promoting the requisite sorts of coordination/experimentation for implementing or testing promising proposals.

CN’s comments

I thought this presented cool product ideas that would improve philosophy and that I’d be excited for someone to pursue. That said, I don’t think it was the most relevant to making sure AGI goes well.

Synthetic Socrates and the Philosophers of the Future — Jimmy Alfonso Licon

Summary

Many philosophers prize finding deep, important philosophical truths such as the nature of right and wrong, the ability to make free choices, and so on. Perhaps, then, it would be better for such philosophers to outsource the search for such truths to entities that are better equipped for the task: artificial philosophers. This suggestion may initially appear absurd, until we realize that outsourcing tasks has been the norm throughout human history. To the extent such philosophers care about discovering deep philosophical truths, they have a reason to aid in the creation of artificial philosophers who will eventually, in many respects, do philosophy better than even the best human philosopher who ever lived or who will live.

CN’s comments

I thought this was a well-written piece of communication that I’m happy to have out there and shared with people. That said, I don’t think it will offer any new considerations to the typical person who knows about our competition.

BS’s comments

I take the central idea of this piece to be that philosophers who care about the discovery of deep philosophical truths have reason to aid in the creation of artificial philosophers rather than simply trying to discover such truths. I’d be delighted for more academic philosophers to engage with this idea. 

The Web of Belief — Paal Fredrik S. Kvarberg

Summary

In this essay, I present a method for using technological innovations to improve rational belief formation and wise decision-making in an explainable manner. I assume a view of rationality in which beliefs are evaluated according to norms of intelligibility, accuracy and consistency. These norms can be quantified in terms of logical relations between beliefs. I argue that Bayesian networks are ideal tools for representing beliefs and their logical interconnections, facilitating belief evaluation and revision. Bayesian networks can be daunting for beginners, but new methods and technologies have the potential to make their application feasible for non-experts. AI technologies, in particular, have the potential to support or automate several steps in the construction and updating of Bayesian networks for reasoning in an explainable way. One of these steps consists of relating empirical evidence to theoretical and decision-relevant propositions. The result of using these methods and technologies would be an AI-powered inference engine we can query to see the rational support, empirical or otherwise, of key premises to arguments that bear on important practical decisions. With these technological innovations, decision support systems based on Bayesian networks may represent belief structures and improve our understanding, judgement, and decision-making.
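As a rough illustration of how a Bayesian network lets one query the support that evidence gives a proposition, here is a minimal Python sketch. It is not Kvarberg’s implementation: the network shape (one proposition H with two pieces of evidence E1 and E2, each depending only on H) and all of the probabilities are hypothetical, chosen only to show inference by enumeration.

```python
from itertools import product

# Hypothetical toy network: proposition H, with evidence E1 and E2 that each
# depend only on H (so E1 and E2 are conditionally independent given H).
P_H = 0.3
# P(evidence is true | H is true/false); all numbers are made up.
P_E_GIVEN_H = {"E1": {True: 0.8, False: 0.2}, "E2": {True: 0.6, False: 0.3}}


def joint(h: bool, e1: bool, e2: bool) -> float:
    # Joint probability factorises along the network: P(H) * P(E1|H) * P(E2|H).
    p = P_H if h else 1 - P_H
    for name, value in (("E1", e1), ("E2", e2)):
        p_true = P_E_GIVEN_H[name][h]
        p *= p_true if value else 1 - p_true
    return p


def posterior_h(observed: dict) -> float:
    # P(H | observed) by enumeration: sum the joint over every assignment
    # consistent with the observations, then normalise.
    num = den = 0.0
    for h, e1, e2 in product([True, False], repeat=3):
        assignment = {"E1": e1, "E2": e2}
        if any(assignment[k] != v for k, v in observed.items()):
            continue
        p = joint(h, e1, e2)
        den += p
        if h:
            num += p
    return num / den


print(round(posterior_h({}), 3))                        # prior: 0.3
print(round(posterior_h({"E1": True}), 3))              # 0.24 / 0.38
print(round(posterior_h({"E1": True, "E2": True}), 3))  # 0.144 / 0.186
```

Enumeration is only practical for tiny networks. Realistic belief networks would presumably need a dedicated inference library, and, as the summary suggests, AI assistance in building the network itself.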

CN’s comments

I thought this presented a potentially cool tool that I might be excited to see someone work on. I also really appreciate that the author already implemented prototypes and ran experiments with them. Very cool. That said, I am sceptical that this tool will ultimately have a big impact (but would love for reality to convince me otherwise).

Wise AI support for government decision-making — Ashwin Acharya and Michaelah Gertz-Billingsley

Summary

Governments are behind the curve on adopting advanced AI, in part due to concerns about its reliability. Suppose that developers gradually solve the issues that make AIs poor advisors — issues like confabulation, systemic bias, misrepresentation of their own thought processes, and limited general reasoning skills. What would make smart AI systems a good or poor fit as government advisors?

We recommend the development of wise decision support systems that answer the questions people _should_ have asked, not just the ones they did.

We recommend that government-advising institutions (e.g. contracting orgs like RAND) begin developing such systems, scaling them up over time as they acquire data and AI becomes more capable. Initially, AI systems might mostly summarize and clarify the views of human experts; later, they may guide and reframe conversations; eventually, they may largely _constitute_ the expert conversation that informs government decision-making on a topic.

CN’s comments

I thought this post highlighted a very important area, decision-making in governments, and argued for something that I think is clearly true: governments should implement (cheap) tools to improve their belief-formation and decision-making. I also liked that it discussed the availability of data. That said, I didn’t think the text offered either very novel ideas or a proposal concrete enough to make it meaningfully easier for groups to implement.

Machines and moral judgement — Jacob Sparks

Summary

Creating AI that is both generally intelligent and good requires building machines capable of making moral judgments, a task that presents philosophical and technical difficulties.

Current AI systems and much of what people imagine when they talk about “moral machines” may make morally significant or morally correct decisions, but they lack true moral reasoning.

Moral reasoning allows for reflective distance from one’s motivations and from any description under which one might act. It seems to challenge the distinction between beliefs and desires and to sit uncomfortably with a naturalistic worldview.

While there are promising tools in reinforcement learning for building agents that make moral judgments, we need more work on the philosophical puzzles raised by moral judgment and to rethink core RL concepts like action, reward, value and uncertainty.

CN’s comments

I thought the discussion of the peculiarities of morality, and hence of the potential challenges of implementing it in AI systems, was well done, if not novel. That said, I really strongly disagree with its last section, “But How”, where the author discusses how particular AI training techniques could help; I thought that was by far the weakest part of the essay. In particular, I thought the connection the author draws between uncertainty and moral reflection is dubious, and I disagree with the reasons given for concluding that RL is more promising than supervised finetuning. I think the text could have been improved with simple changes to the framing. (E.g. “Supervised finetuning has these flaws. Perhaps RL is better? Here are some reasons to think so. But, unfortunately, RL also has some flaws. So, maybe not.”) [Editor’s note: The framing was adjusted in response to this feedback.]

The purpose of philosophical AI will be: to orient ourselves in thinking — Maximilian Noichl

Summary

In this essay I will suggest a lower bound for the impact that artificial intelligence systems can have on the automation of philosophy. Specifically, I will argue that while there is reasonable skepticism about whether LLM-based systems sufficiently similar to the best ones available right now will, in the short to medium term, be able to independently produce philosophy at a level of quality and creativity that makes them interesting to us, they are clearly able to solve medium-complexity language tasks in a way that makes them useful for structuring and integrating the contemporary philosophical landscape, allowing for novel and interesting ways to orient ourselves in thinking.

CN’s comments

I really liked that this entry just went ahead and actually tried to implement a prototype of a tool that can help with automating Philosophy. I also think the type of tool they prototyped could become really useful. (In a way, it’s a first implementation of the improvements to organising academic literature that another entry proposed.)

BS’s comments

In a pilot study, the author of this entry used automation tools to map some literature on the philosophy of AI. I was happy to see an actual instance of automating something philosophical, that the instance was of a sort that could plausibly be useful if well-implemented at scale, and that the instance could be seen as a sort of proof of concept of a proposal in another entry (“Philosophy's Digital Future”) to use AI to map philosophical literature.

Cross-context deduction: on the capability necessary for LLM-philosophers — Rio Popper and Clem von Stengel

Summary

In this paper, we define a capability we call ‘cross-context deduction’, and we argue that cross-context deduction is required for large language models (LLMs) to be able to do philosophy well. First, we parse out several related conceptions of inferential reasoning, including cross-context deduction. Then, we argue that cross-context deduction is likely to be the most difficult of these reasoning capabilities for language models and that it is particularly useful in philosophy. Finally, we suggest benchmarks to evaluate cross-context deduction in LLMs and possible training regimes that might improve performance on tasks involving cross-context deduction. Overall, this paper takes an initial step towards discerning the best strategy to scalably employ LLMs to do philosophy.

CN’s comments

I really enjoyed the entry from an ML perspective and also thought it was very easy to read. I also really like the general approach of “Take some fairly well-defined types of capabilities that we can differentiate in the context of ML, think about which ones are the kinds of capabilities we want to differentially improve, and suggest promising ways of doing so based on the existing ML literature.” That said, I’m unfortunately very unsure that boosting what the authors call cross-context deduction* would in fact be a good idea. While I thought their discussion of cross-context deduction* and Philosophy was really interesting, my current guess is that succeeding at boosting cross-context deduction* would be net bad, i.e. it would advance dangerous capabilities more than is justified by the boost to Philosophy.

(*We had some discussion about whether they defined deduction correctly, but I think they successfully managed to point at a kind of capability that is, for practical purposes, distinct enough.)

DM’s comments

I have three main concerns about this paper. 

First, it’s unclear what the authors mean by “deduction”: they introduce it both as reasoning from universals to particulars (e.g. on pp. 2 and 6) and, more commonly, as reasoning in which premises “necessarily imply” the conclusion. But these are very different things. “Socrates is human, therefore something is human” is a deductive inference (in the latter sense) that moves from particular to universal. Conversely, we infer from universal to particular in an uncertain way all the time, e.g. when applying statistics to a case.

In what follows, I’ll assume that by “deduction” the authors really mean “the type of reasoning that preserves truth with certainty”. (Another alternative is that they mean “inference by way of the application of logical laws”, but in that case the category depends on the choice of a logical system.) But why think of this as a type of reasoning, beyond the fact that there’s an old tradition of doing so? Maybe instead we should think of this as a specific epistemic feature that happens to attend some instances of various types of reasoning, individuated by something like an underlying cognitive procedure. That is, perhaps a better taxonomy will distinguish between semantic reasoning, statistical reasoning, visual reasoning, conditionalization, etc., each of which allows for various degrees of epistemic distance between premises and conclusion, including, in the limit case, the preservation of certainty.

My second concern is that no convincing case is made that there is anything especially deductive about philosophical reasoning, or that there is anything especially philosophical about deductive reasoning (cross-context or otherwise), in either of the senses of “deduction” conflated in the paper. Indeed, paradigmatically philosophical arguments can usually be presented either in a deductive form or an inductive one. The deductive version will have more tendentious premises of which we are uncertain; the inductive version will have more certain premises but won’t entail the conclusion. 

Third, the paper uses “context” in “cross-context” in two very different ways: applied to humans it seems to mean roughly a domain of inquiry or a source of information, but applied to LLMs it seems to literally mean their “context window” (the input text supplied by the user). But there is no clear relationship between these two things aside from the fact that the word “context” is used for both of them. So even if philosophical reasoning by humans is often “cross-context” in the former sense, it doesn’t follow that there’s any special relationship to “cross-context” reasoning by LLMs in the latter sense.  

Evolutionary perspectives on AI values — Maria Avramidou

Summary

Addressing the question of value drift is crucial for deciding the extent to which AI should be automated. AI has the potential to develop capabilities that, if misaligned with human values, could present significant risks to humanity. The decision to automate AI depends heavily on our ability to trust that these systems will reliably maintain this alignment. Evolutionary theory provides a valuable framework for predicting how AI values might change, highlighting the role of environmental incentives and selection pressures in shaping AI behavior. Current discussions often focus on the inevitability of AI developing selfish values due to these pressures. I analyse these arguments and provide a case against these claims by drawing examples from evolution.

CN’s comments

This text, while somewhat off-topic in my view, highlights value drift as an important and neglected area. I think that’s valuable, and it does a good job of characterising value drift alongside what the author calls value specification and value monitoring (which I would loosely translate into outer and inner alignment). I also think it’s reasonable to discuss evolution in this context. That said, I thought the text missed an opportunity to really analyse to what extent the evolutionary analogy is relevant in the AI context (and perhaps at different stages in the AI context, e.g. training and post-deployment) and what other mechanics might be at play. The text failed to convince me that its evolutionary arguments have bearing on AI and, in my view, unfortunately doesn’t seem to try to do so.

BS’s comments

(I recused myself from judging this entry because I had worked with the author on related work.)

Tentatively against making AIs 'wise' — Oscar Delaney

Summary

CN’s comments

I liked that the entry questions the premise of the competition. I also think it makes a reasonable, thought-provoking point in a concise and clear way. A more complete version of this text might, for example, have tried to reason about what the competition hosts find appealing about wisdom and whether some wisdom-related property would be beneficial to promote in AI, or have addressed a potential trade-off between the author’s concern and wise AI. (Or, more generally, have tried to turn the competition premise into its most interesting version.)

Concluding thoughts

We found it fascinating and informative to read the competition entries and then hear other people’s perspectives on them. We hope that many readers will too. The greatest success of this competition will be if it sparks further thought and engagement, and helps these nascent fields to find their feet a little earlier than they might otherwise have done.

Here are some closing thoughts from our judges:

BS’s thoughts

Views I had before reading contest entries:

How reading contest entries affected my views:

Some takes on future work in these areas:

DM’s thoughts

Reading and thinking about these submissions, it strikes me that there are two kinds of goals for which one might want to try to automate philosophy and/or wisdom. One is the goal of helping to improve human thinking: perhaps AIs can guide us to understand our values better, help us achieve new philosophical insights, or help us reason better about how to build better institutions or future AIs. Importantly, a system that supports this goal wouldn’t need to be wise or philosophical itself: consider, for example, the AI that Yetter Chapell envisions, which simply provides a map of philosophical contributions, or the systems described by Acharya and Gertz-Billingsley, which support government decision-making. I was impressed by the variety and practicality of the suggestions for making progress towards the goal of improving human reasoning. 

A very different goal is that of imbuing the AIs themselves with wisdom, and/or having them arrive at philosophical conclusions on our behalf. Having read these submissions, I am both more optimistic that this can be achieved, and hesitant about whether we want to achieve it. Consider, for example, Delaney’s point that many features we associate with wisdom are really not the sort of features with which we’d want to imbue AIs, at least at first. (Laine likewise places “illegibility” on the “wisdom” side of the “intelligence/wisdom” divide.) So perhaps wisdom isn’t the right concept to focus on, after all. Likewise, if an AI is trained to be very capable at philosophical reasoning, we will probably want to inspect its whole chain of reasoning rather than trust its philosophical hunches. And the more legible we can make such a system, the more its output will actually function simultaneously to improve human reasoning. 

As much as I would love to see more progress made in philosophy generally, I do think the most pressing problem for which we need the help of AI is the problem of alignment itself. Here my guess is that systems aimed at supporting human reasoning could be quite helpful; but for the reasons just given, I’m less optimistic that it would be very useful to aim at making AIs “wise”, or even good at philosophy per se, as a stepping stone towards alignment. One kind of exception might be systems trained to reason carefully (but also legibly!) about human values, in a way that might ultimately help us design systems to reflect them.

AS’s thoughts

My reaction after reading the submissions: "Wow, this area really needs a lot of work". The gap between what's there and what seems possible feels large.

Ideally work in this area:

I'd say the essays have about 1-1.5 of these three features on average (and I'm grateful that they do). Different essays have different features, so this doesn't seem to be an intrinsic limit of the domain.

I’m pretty sure you could take any one of the essays, score it along the three dimensions, and then develop it further along the dimensions where it's lacking.

This means that there is already one clear recipe for further work: significantly improve what's there along one dimension, turning it into something quite different in the process. Beyond this basic recipe, there's comparing, combining, and systematizing ideas across different essays. And finally, the original competition announcement poses excellent questions that none of the essays seem to directly address, especially in the Thinking ahead and Ecosystems sections.

As part of my work on Elicit, I regularly ask myself "What kind of reasoning is most worth advancing, and how?", and I know other founders whose roadmaps are open to arguments & evidence. Of course, good thinking on automating wisdom would likely have many diffuse positive effects, but I wanted to flag this one path to impact that feels especially real to me, and makes me personally very excited to see more work in this area.

Acknowledgements

Thanks to AI Impacts for hosting this competition, and to the donors who allowed it to happen. And of course, a big thanks to the people who took up the challenge and wrote entries for the competition — this would all have been nothing without them.

 

  1.

    After in-depth judge discussions, there was more consensus on which entries were the top four than on the proper ordering among them. We therefore decided to split a $20,000 pool for top prizes among these four, according to average judge views.

    As there were a good number of entries that judges felt could be deserving of prizes, the remaining prize money was split into $500 runner-up prizes, to allow for more total prizes.

    Although the competition announcement specified “category” prizes, in practice the categories did not seem very natural: many entries lay in multiple categories, and for many entries different people would make different assessments about which categories they lay in. As all of the categories were represented in the top four, we decided to move away from the idea of category prizes.

    The principles that were important to us in reworking the prize schedule were:

    • To reflect judge sentiment
    • To have the same total amount of prize-money
    • To keep the eventual prize amounts reasonably straightforward
    • To make things broadly similar in how evenly the money was distributed
    • Not to break any commitments that people might reasonably have counted on (like “if I have the best entry in category X, I’ll get at least $2,000”)

L Rudolf L @ 2024-10-29T13:38 (+6)

I've now posted my entries on LessWrong:

I'd also like to really thank the judges for their feedback. It's a great luxury to be able to read many pages of thoughtful, probing questions about your work. I made several revisions & additions (and also split the entire thing into parts) in response to feedback, which I think improved the finished sequence a lot, and wish I had had the time to engage even more with the feedback.

Chris Leong @ 2024-10-29T05:50 (+3)

Just wanted to mention that, if anyone liked my submissions (3rd prize, An Overview of “Obvious” Approaches to Training Wise AI Advisors; Some Preliminary Notes on the Promise of a Wisdom Explosion), I'll be running a project related to this work as part of AI Safety Camp. Join me if you want to help innovate a new paradigm in AI safety.