AI safety needs to scale, and here's how you can do it

By Esben Kran @ 2024-02-02T07:17 (+32)

This is a linkpost to https://apartresearch.com/post/ai-safety-needs-to-scale-and-heres-how-you-can-do-it

AI development attracts more than $67 billion in yearly investments, contrasting sharply with the $250 million allocated to AI safety (see appendix). This gap suggests there's a large opportunity for AI safety to tap into the commercial market. The big question then is, how do you close that gap?

In this post, we aim to outline the for-profit AI safety opportunities within four key domains:

  1. Guaranteeing the public benefit and reliability of AI when deployed.
  2. Enhancing the interpretability and monitoring of AI systems.
  3. Improving the alignment of user intent at the model level.
  4. Developing protections against future AI risk scenarios using AI.

At Apart, we're genuinely excited about what for-profit AI security might achieve. Our experience working alongside EntrepreneurFirst on AI security entrepreneurship hackathons, combined with discussions with founders, advisors, and former venture capitalists, highlights the promise of the field. With that said, let's dive into the details.

Safety in deployment

Problems related to ensuring that deployed AI is reliable, protected against misuse, and safe for users, companies, and nation states. Related fields include dangerous capability evaluation, control, and cybersecurity.

Deployment safety is crucial to ensure AI systems function safely and effectively without misuse. Security also meshes well with commercial opportunities and building capability in this domain can scale strong security teams to solve future safety challenges. If you are interested in non-commercial research, we also suggest looking into governmental research bodies, such as the ones in UK, EU, and US.

Concrete problems for AI deployment

Examples

Interpretability and oversight

Problems related to monitoring for incidents, retracing AI reasoning, and understanding of AI capability. Related to the research in interpretability, scalable oversight, and model evaluation.

It is without a doubt that more oversight and understanding of AI models will help us make the technology more secure (despite criticism). Given the computation needs of modern interpretability methods along with AI transparency compliance, interpretability and oversight appears to be well-aligned with commercial interests.

Problems in AI transparency and monitoring

Examples

Alignment

Problems related to user intent alignment, misalignment risks of agentic systems, and user manipulation. Related fields include red teaming, causal incentives, and activation steering.

Alignment is an important part of ensuring both model reliability along with further security against deceptive and rogue AI systems. This objective aligns closely with commercial interests, specifically the need to improve model alignment, minimize unnecessary hallucinations, and strengthen overall reliability.

Approaches to commercial alignment

Examples

AI Defense

Problems related to systemic risks of AI, improved cyber offense capabilities, and difficulty of auditing. Related fields include cybersecurity, infrastructure development, and multi-agent security.

AI defense is critical to safeguard infrastructure and governments against the risks posed by artificial general intelligence and other AI applications. Given the magnitude of this challenge, supporting solutions on a comparable scale aligns well with commercial structures.

Concrete societal-scale needs for AI defense

Examples

Wrapping up

We are optimistic about the potential of AI safety in commercial domains and hope that this can serve as an inspiration for future projects. We do not expect this overview to be exhaustive, and warmly invite you to share your thoughts in the comments. Please reach out via email if you're interested in developing AI safety solutions or considering investments in these types of projects.

Apart is an AI safety research organization that incubates research teams through AI safety research hackathons and fellowships. You can stay up-to-date with our work via our updates.

Appendix: Risks of commercial AI safety

Here are a few risks to be aware of when launching commercial AI safety initiatives:

Appendix: Further examples

Appendix: Funding calculations

CategorySub-categoryAmountSource
CapabilitiesFunding$66.8 billionTop Global AI Markets (trade.gov)
AI safetyOpenPhil50MAn Overview of the AI Safety Funding Situation — LessWrong
AI safetyGovernment127MIntroducing the AI Safety Institute - GOV.UK (www.gov.uk)
AI safetyCommercial32MAn Overview of the AI Safety Funding Situation — LessWrong
AI safetySFF30MAn Overview of the AI Safety Funding Situation — LessWrong
AI safetyAcademia + NSF11MAn Overview of the AI Safety Funding Situation — LessWrong
AI safetyPotential60.2BCapabilities minus all AI safety

tailcalled @ 2024-02-02T19:12 (+2)

Doesn't the $67 billion number cited for capabilities include a substantial amount of work being put into reliability, security, censorship, monitoring, data protection, interpretability, oversight, clean dataset development, alignment method refinement, and security? At least anecdotally the AI work I see at my non-alignment-related job mainly falls under these sorts of things.

Esben Kran @ 2024-02-06T02:58 (+1)

You are completely right. My main point is that the field of AI safety is under-utilizing commercial markets while commercial AI indeed prioritizes reliability and security to a healthy level.