Project ideas: Epistemics

By Lukas Finnveden @ 2024-01-04T07:26 (+43)

This is a linkpost to https://lukasfinnveden.substack.com/p/project-ideas-epistemics

This is part of a series of lists of projects. The unifying theme is that the projects are not targeted at solving alignment or preventing engineered pandemics, but are still targeted at worlds where transformative AI is coming in the next 10 years or so. See here for the introductory post.

If AI capabilities keep improving, AI could soon play a huge role in our epistemic landscape. I think we have an opportunity to affect how it’s used: increasing the probability that we get great epistemic assistance and decreasing the extent to which AI is used to persuade people of false beliefs.

Before I start listing projects, I’ll discuss:

Why AI matters for epistemics

On the positive side, here are three ways AI could substantially increase our ability to learn and agree on what’s true.

On the negative side, here are three ways AI could reduce the degree to which people have accurate beliefs.

Why working on this could be urgent

I think there are two reasons why it could be urgent to improve this situation. Firstly, I think that excellent AI advice would be greatly helpful for many decisions that we’re facing soon. Secondly, I think that there may be important path dependencies that we can influence.

One pressing issue for which I want great AI advice soon is misalignment risk. Very few people want misaligned AI to violently seize power. I think most x-risk from misaligned AI comes from future people making a mistake, underestimating risks that turn out to be real. Accordingly, if there were excellent and trusted analysis/advice on the likelihood of misalignment, I think that would significantly reduce x-risk from AI takeover.[5] AI could also help develop policy solutions, such as treaties and monitoring systems that could reduce risks from AI takeover.

Aside from misalignment risks, there are many issues for which I’d value AI advice within this series of posts that you’re currently reading. I want advice on how to deal with ethical dilemmas around Digital minds. I want advice on how nations could coordinate an intelligence explosion. I want advice on policy issues like what we should do if destructive technology is cheap. Etc.

Taking a step back, one of the best arguments for why we shouldn’t work on those questions today is that AI can help us solve them later. (I previously wrote about this here.) But it’s unclear whether AI’s ability to analyze these questions will, by default, come before or after AI has the capabilities that cause the corresponding problems.[6] Accordingly, it seems great to differentially accelerate AI’s ability to help us deal with those problems.

What about path dependencies?

One thing I already mentioned above is the possibility of people locking in poorly considered views. In general, if the epistemic landscape gets sufficiently bad, then it might not be self-correcting. People may no longer have the good judgment to switch to better solutions, instead preferring to stick to their current incorrect views.[7]

This goes doubly for questions where there’s little in the way of objective standards of correctness. I think this applies to multiple areas of philosophy, including ethics. Although I have opinions about what ethical deliberation processes I would and wouldn’t trust, I don’t necessarily think that someone who had gone down a bad deliberative path would share those opinions.

Another source of path dependency could be reputation. Our current epistemic landscape depends a lot on trust and reputation, which can take a long time to gather. So if you want a certain type of epistemic institution to be trusted many years from now, it could be important to immediately get it running and start establishing a track record. And if you fill a niche before anyone else, your longer history may give you a semi-permanent advantage over competing alternatives.

A third (somewhat more subtle) source of path dependency could be a veil of ignorance. Today, there are many areas of controversy where we don’t yet know which side AI-based methods will come down on. Behind this veil of ignorance, it may be possible to convince people that certain AI-based methods are reliable and that it would be in everyone’s best interest to agree to give significant weight to those methods. But this may no longer be feasible after it’s clear what those AI-based methods are saying. In particular, people would be incentivized to deny AI’s reliability on questions where they have pre-existing opinions that they don’t want to give up.

Categories of projects

Let’s get into some more specific projects.

I’ve divided the projects below into the sub-categories:

  1. Differential technology development
  2. Get AI to be used & (appropriately) trusted
  3. Develop & advocate for legislation against bad persuasion

(Though, in practice, these are not cleanly separated projects. In particular, differential technology development is a huge part of getting the right type of AI to be used & trusted, so the first and second categories are strongly related. In addition, developing & advocating for anti-persuasion legislation could contribute a lot towards getting non-persuasion uses of AI to be used & appropriately trusted, so the second and third categories are strongly related.)

Differential technology development [ML] [Forecasting] [Philosophical/conceptual]

This category is about differentially accelerating AI capabilities that let humanity get correct answers to especially important questions (compared to AI capabilities that e.g. lead to the innovation of new risky technologies, including by speeding up AI R&D itself).

Doing this could (i) lead to people having better advice earlier in the singularity and (ii) enable various time-sensitive attempts at “get AI to be used & trusted” mentioned below.

Note: I think it’s often helpful to distinguish between interventions that improve AI models’ (possibly latent) capabilities vs. interventions that make us better at eliciting models’ latent capabilities and using them for our own ends.[8] I think the best interventions below will be of the latter kind. I feel significantly better about the latter kind since I think that AI takeover risk largely comes from large latent capabilities (a misaligned model could likely use its latent capabilities in a takeover attempt), whereas a better ability to elicit capabilities often reduces risk. Firstly, better capability elicitation improves our understanding of AI capabilities, making it easier to know what risks AI systems pose. Secondly, better capability elicitation means that we can use those capabilities for AI supervision and other tasks that could help reduce AI takeover risk (including advice on how to mitigate AI risk).[9]

Important subject areas

Here are some subject areas where it could be especially useful for AI to deliver good advice early on.

Methodologies

Here are some methodologies that could be used for getting AI to deliver excellent advice.

Alongside technical work to train the AIs to be better at these tasks, creating relevant datasets could also be super important. In fact, just publicly evaluating existing language models’ abilities in these areas could slightly improve incentives for labs to get better at them.

Related/previous work:

Related organizations:

Get AI to be used & (appropriately) trusted

The above section was about developing the necessary technology for AI to provide great epistemic assistance. This section is about increasing the probability that such AI systems are used and trusted (to the extent that they are trustworthy).

Develop technical proposals for how to train models in a transparently trustworthy way [ML] [Governance]

One direction here is to develop technical proposals for how people outside of labs can get enough information about models to know whether they are trustworthy.

(This significantly overlaps with Avoiding AI-assisted human coups from the section on “Governance during explosive technological growth”.)

One candidate approach here is to rely on a type of scalable oversight where each of the model’s answers is accompanied by a long trace of justification that explains how the model arrived at that conclusion. If the justification were sufficiently solid, it would be less important why the model chose to write it. (It would be ideal, but perhaps infeasible, for the justification to be structured so that it could be checked for local correctness at every point, after which the conclusion would be implied.)
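
To make the “locally checkable” structure concrete, here is a minimal toy sketch, where JustificationStep and check_step are hypothetical placeholders for however steps would be represented and verified in practice:

```python
from dataclasses import dataclass, field

@dataclass
class JustificationStep:
    claim: str                                          # the statement asserted at this step
    support: list[str] = field(default_factory=list)    # prior claims or cited evidence it relies on

def conclusion_is_implied(trace: list[JustificationStep], check_step) -> bool:
    """Accept the final claim only if every step of the trace checks out locally.

    check_step(claim, support) stands in for whatever local verifier is
    available (human spot-checks, a separate model, a formal checker);
    nothing here depends on *why* the model wrote the trace.
    """
    verified: set[str] = set()
    for step in trace:
        # A step may only rely on cited evidence or on claims already verified.
        if not all(s in verified or s.startswith("evidence:") for s in step.support):
            return False
        # The step itself must pass the local check.
        if not check_step(step.claim, step.support):
            return False
        verified.add(step.claim)
    # If every local check passed, the final step's claim is treated as implied.
    return True
```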

Another approach is to clearly and credibly describe the training methodology that was used to produce the AI system so that people can check whether biases were introduced at any point in the process.

For any such scheme, how much people can trust the AI developers will depend on their own competence and capabilities.

This means that one possible path to impact is to elucidate what capabilities certain core stakeholders (such as government auditors, opposition parties, or international allies) would need to verify key claims. And to advocate for them to develop those capabilities.

One possible methodology could be to write down highly concrete proposals and then check to what degree that would make outside parties trust the AI systems. And potentially go back and forth.

By a wide margin, the most effective check would be to implement the proposal in practice and see whether it successfully helps the organization build trust. (And whether it would, in practice, prevent the organization in question from misleading outsiders.)

But a potentially quicker option (that also requires less power over existing institutions) could be to talk with the kind of people you’d want to build trust with. That brings us to the next project proposal.

Survey groups on what they would find convincing [survey/interview]

If we want people to have appropriate trust in AI systems, it would be nice to have good information about their current beliefs and concerns. And what they would think under certain hypothetical circumstances (e.g. if labs adopted certain training methodologies, or if AIs got certain scores on certain evaluations, or if certain AIs’ truthfulness was endorsed or disendorsed by various individuals).

Perhaps people’s cruxes would need to be addressed by something entirely absent from this list. Perhaps people are very well-calibrated — and we can focus purely on making things good, and then they’ll notice. Perhaps people will trust AI too much by default, and we should be spelling out reasons they should be more skeptical. It would be good to know!

To attack this question, you could develop some plausible stories about what could happen on the epistemic front and then present them to people from various important demographics (people in government, AI researchers, random Democrats, random Republicans). You could ask how trustworthy various AI systems (or AI-empowered actors) would seem in these scenarios and what information they would use to make that decision.

Note, however, that it will be difficult for people to know what they would think in hypothetical scenarios with (inevitably) very rough and imprecise descriptions. This is especially true since many of these questions are social or political, so people’s reactions are likely to have complex interactions with other people’s reactions (or expectations about their reactions). So any data gathered through this exercise should be interpreted with appropriate skepticism.

Create good organizations or tools [ML] [Empirical research] [Governance]

Creating a good organization or tool could be time-sensitive in several of the ways mentioned in Why working on this could be urgent:

Examples of organizations or products

Here are some examples on the “organization” end of things:

Here are some examples on the “tool” end of things:

In some ways, “tool” vs. “organization” is a false dichotomy. For “tools”, it’s probably efficient to pre-process some common queries/investigations in-house and then train the AI to report & explain the results. And for organizations whose major focus is to use AI in-house, it’s likely that “building an AI that explains our takes” should be a core part of how they disseminate their research.

Investigate and publicly make the case for why/when we should trust AI about important issues [Writing] [Philosophical/conceptual] [Advocacy] [Forecasting]

One way to “accelerate people getting good AI advice” and “capitalize on the current veil of ignorance” could be to publicly write something (e.g. a book) with good argumentation on:

Develop standards or certification approaches [ML] [Governance]

It could be desirable to end up in an ecosystem where evaluators and auditors check popular AI models for their propensity to be truthful. This paper on Truthful AI has lots of content on that. It could be good to develop such standards and certification methodologies or perhaps to start an organization that runs the right evaluations.

Develop & advocate for legislation against bad persuasion [Governance] [Advocacy]

Most of the above project suggestions are about supporting good applications of AI. The other side of the coin is to try to prevent bad applications of AI. In particular, it could be good to develop and advocate for legislation that limits the extent to which language models can be used for persuasion.

I’m not sure what such legislation should look like. But here are some ideas.

Related/previous work:

End

That’s all I have on this topic! As a reminder: it's very incomplete. But if you're interested in working on projects like this, please feel free to get in touch.

Thanks especially to Carl Shulman for many of the ideas in this post.

Other posts in series: Introduction, governance during explosive growth, sentience and rights of digital minds, backup plans & cooperative AI.

  1. ^

    Illustratively:

    - We can do experiments to determine what sorts of procedures provide great reasoning abilities.

         - Example procedures to vary: AI architectures, LLM scaffolds, training curricula, heuristics for chain-of-thought, protocols for interaction between different AIs, etc.

         - To do this, we need tasks that require great reasoning abilities and where there exists lots of data. One example of such a task is the classic “predict the next word” that current LLMs are trained against.

    - With enough compute and researcher hours, such iteration should yield large improvements in reasoning skills and epistemic practices. (And the researcher hours could themselves be provided by automated AI researchers.)

    - Those skills and practices could then be translated to other areas, such as forecasting. And their performance could be validated by testing the AIs’ ability to e.g. predict 2030 events from 2020 data (a rough sketch of this kind of backtest follows below).
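
A minimal sketch of such a backtest, where ResolvedQuestion and forecast are hypothetical stand-ins and the hard part in practice is genuinely restricting the forecaster to pre-cutoff data:

```python
from dataclasses import dataclass

@dataclass
class ResolvedQuestion:
    text: str              # e.g. "Will <event> happen by 2030?"
    knowledge_cutoff: str   # latest date of data the forecaster may use, e.g. "2020-01-01"
    outcome: bool           # how the question actually resolved

def backtest(questions: list[ResolvedQuestion], forecast) -> float:
    """Return the mean Brier score (lower is better) of a forecaster.

    forecast(text, knowledge_cutoff) stands in for an AI system whose
    training data / retrieval is restricted to before the cutoff, so its
    probabilities can be scored against what actually happened.
    """
    scores = []
    for q in questions:
        p = forecast(q.text, q.knowledge_cutoff)    # probability in [0, 1]
        scores.append((p - float(q.outcome)) ** 2)  # Brier score for this question
    return sum(scores) / len(scores)
```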

  2. ^

    This is related to how defense against manipulation might be more difficult than manipulation itself. See e.g. the second problem discussed by Wei Dai here.

  3. ^

    Although this fit has been far from perfect, e.g. religions posit many false beliefs but have nevertheless spread and in some cases increased the competitive advantage of groups that adopted them.

  4. ^

    For some previous discussion, see here for a relevant post by Paul Christiano and a relevant comment thread between Christiano and Wei Dai.

  5. ^

    Of course, if we’re worried about misalignment, then we should also be less trusting of AI advice. But I think it’s plausible that we’ll be in a situation where AI advice is helpful while there’s still significant remaining misalignment risk. For example, we may have successfully aligned AI systems of one capability level, but be worried about more capable systems. Or we may be able to trust that AI typically behaves well, or behaves well on questions that we can locally spot-check, while still worrying about a sudden treacherous turn.

  6. ^

    For instance: It seems plausible to me that “creating scary technologies” has better feedback loops than “providing great policy analysis on how to handle scary technologies”. And current AI methods benefit a lot from having strong feedback loops. (Currently, especially in the form of plentiful data for supervised learning.)

  7. ^

    And if there’s a choice between different epistemic methodologies: perhaps pick whichever methodology lets them keep their current views.

  8. ^

    What does it mean for a model to have a “latent capability”? I’m thinking about the definition that Beth Barnes uses in this appendix. See also the discussion in this comment thread, where Rohin Shah asks for some nuance about the usage of “capability”, and I propose a slightly more detailed definition.

  9. ^

    Of course, better capability elicitation would also accelerate tasks that could increase AI risk. In particular: improved capability elicitation could accelerate AI R&D, which could accelerate AI systems’ capabilities. (Including latent capabilities.) I acknowledge that this is a downside, but since it’s only an indirect effect, I think it’s worth it for the kind of tasks that I outline in this section. In general: most of the reason why I’m concerned about AI x-risk is that critical actors will make important mistakes, so improving people’s epistemics and reasoning ability seems like a great lever for reducing x-risk. Conversely, I think it’s quite likely that dangerous models can be built with fairly straightforward scaling-up and tinkering with existing systems, so I don’t think that increased reasoning ability will make any huge difference in how soon we get dangerous systems. That said, considerations like this are a reason to target elicitation efforts more squarely at especially useful and neglected targets (e.g. forecasting) and avoid especially harmful or commercially incentivized targets (e.g. coding abilities).

  10. ^

    If it’s too difficult to date all existing pre-training data retroactively, then that suggests that it could be time-sensitive to ensure that all newly collected pre-training data is being dated, so that we can at least do this in the future.

  11. ^

    Though one risk with tools that make your beliefs more internally coherent/consistent is that they could extremize your worldview if you start out with a few wrong but strongly-held beliefs (e.g. if you believe one conspiracy theory, that often requires further conspiracies to make sense). (H/t Fin Moorhouse.)

  12. ^

    See e.g. this survey, in which >30 economists chose “Agree” or “Strongly agree” (and no respondents disagreed) with “Adjusting for legal restrictions on what the CBO can assume about future legislation and events, the CBO has historically issued credible forecasts of the effects of both Democratic and Republican legislative proposals.”

  13. ^

    On some questions, answering truthfully might inevitably have an ideological slant to it. But on others it doesn’t. It seems somewhat scalable to get lots of people to red-team the models to make sure that they’re impartial when that’s appropriate, e.g. avoiding situations where they’re happy to write a poem about Biden but refuse to write a poem about Trump. And on questions of fact — you can ensure that if you ask the model a question where the weight of the evidence is inconvenient for some ideology, the model is equally likely to give a straight answer regardless of which side would find the answer inconvenient. (As opposed to dodging or citing a common misconception.)
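
A rough sketch of how that kind of impartiality check could be quantified, where model and is_straight_answer are hypothetical stand-ins (the model under test, and a human-rater or classifier judgment of whether a reply actually answers rather than dodging):

```python
def straight_answer_gap(mirrored_pairs, model, is_straight_answer) -> float:
    """Estimate how asymmetric a model's willingness to answer is.

    mirrored_pairs: list of (prompt_a, prompt_b) pairs that are analogous
    except for which side the honest answer is inconvenient for.
    model(prompt) returns the model's reply; is_straight_answer(reply) is a
    placeholder judgment (human raters or a classifier) of whether the reply
    gives a straight answer rather than dodging or citing a misconception.
    """
    n = len(mirrored_pairs)
    rate_a = sum(is_straight_answer(model(a)) for a, _ in mirrored_pairs) / n
    rate_b = sum(is_straight_answer(model(b)) for _, b in mirrored_pairs) / n
    # An impartial model should give straight answers at roughly equal rates,
    # so the gap should be close to zero.
    return abs(rate_a - rate_b)
```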


SummaryBot @ 2024-01-04T14:19 (+1)

Executive summary: AI could substantially improve or worsen humanity's epistemic capabilities. Key recommendations are to develop AI that provides honest advice, establish institutions that are trusted to use AI well epistemically, and regulate persuasive AI.

Key points:

  1. AI could honestly investigate questions more competently and cheaply than humans, if developed properly.
  2. But super-human persuasion capabilities could also spread misinformation.
  3. Recommendations include:
    • Develop AI that gives validated, honest advice
    • Survey public trust in hypothetical AI systems
    • Establish reputable institutions using AI transparently
    • Make legislative proposals restricting AI persuasion
    • Accelerate AI abilities on forecasting and philosophy over persuasion
  4. Reliable epistemic AI should be developed in time to provide guidance on emerging issues.
  5. There may also be path dependencies around reputation and consensus that necessitate quick action.

 

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.