Our repository compiles 10–20 examples of biosecurity statements and practices from developers of biological design tools and AI models. We have analysed and grouped these examples according to how different developers address biosecurity and dual-use risk concerns.
Target Audience:
This repository is designed for developers working in computational biology research who recognise the dual-use risks associated with AI tools but may be unsure how to publish statements addressing those risks, both for tool users and for the general public.
Methodology:
We began by gathering existing statements and practices from organisations building AI biological design tools. We then reviewed these statements and thematically identified common features, including explicit acknowledgement of dual-use risks.
Following feedback from Tessa and Max, the repository has been updated with further relevant categories and emergent themes to improve readability and user-friendliness. These categories now cover how users interact with the tool or model, access management, tool subcategories, and more.
Intended Impact:
We hope that this repository can help to:
Establish coherent norms for mitigating biological design tool risks
Provide templates and inspiration for new tool developers
Contribute to the standardisation of biosecurity practices across the field
How to Use:
The first tab provides an overview of all included tools and their corresponding categories. The extractions tab contains the exact wording of the original biosecurity statements, including the paragraphs we referenced when categorising each tool.
The section below lists the categories and values used in the repository, for those who are interested.
Feedback:
If you know developers of AI biological design tools or model developers who may benefit from this resource, we would appreciate your help in sharing this with them.
We also welcome feedback on the repository itself, as well as suggestions on any additional categories, important tools or practices we may have missed.
Acknowledgements:
Team: Shiying He, PT Nhean
Conceptualisation and feedback: Tessa Alexanian and Max Langenkamp
Origin
The author/developer’s host institution, not their nationality
Organization
Recorded separately from the model name
User’s interaction with the model
Choose ≥1 value
Local execution: the model runs directly on a user’s device or local infrastructure, without requiring external servers or internet access.
API access: the model is accessed through a web-based platform or API, usually hosted by the developer or a third party (e.g. querying a pathogen prediction model via a cloud-based API)
Secure server: the model is hosted and operated on a controlled server with strong security and access restrictions to limit unauthorized or unsafe use (e.g. requiring user authentication and monitoring)
Not reported: how users interact with the model is unclear, for example because the model has only been released to a small group of scientists for testing (e.g. Google AI co-scientist)
Paywall
Choose 1 value
Yes, with free trials
Yes, no free trials
Yes, but with a free version
No
Not reported: the presence of a paywall is unclear because the model has only been released to a small group of scientists for testing (e.g. Google AI co-scientist)
Access Management
Choose ≥1 value
Authentication: user identity is verified before access is granted
Open weights: model parameters are publicly available for download and use
Open source: model code is publicly accessible and modifiable
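To illustrate how these access-related categories fit together, here is a minimal sketch of how one tool’s metadata could be encoded in code. The enums, field names, and example values below are our own illustration and are not part of the repository itself.

```python
from dataclasses import dataclass
from enum import Enum


class Interaction(Enum):
    """User's interaction with the model (choose one or more values)."""
    LOCAL_EXECUTION = "local execution"
    API_ACCESS = "API access"
    SECURE_SERVER = "secure server"
    NOT_REPORTED = "not reported"


class Paywall(Enum):
    """Paywall (choose exactly one value)."""
    YES_WITH_FREE_TRIALS = "yes, with free trials"
    YES_NO_FREE_TRIALS = "yes, no free trials"
    YES_WITH_FREE_VERSION = "yes, but with a free version"
    NO = "no"
    NOT_REPORTED = "not reported"


class AccessManagement(Enum):
    """Access management measures (choose one or more values)."""
    AUTHENTICATION = "authentication"
    OPEN_WEIGHTS = "open weights"
    OPEN_SOURCE = "open source"


@dataclass
class AccessMetadata:
    """Access-related fields for one tool entry (illustrative only)."""
    origin: str                               # host institution, not nationality
    organization: str                         # recorded separately from the model name
    interaction: set[Interaction]             # one or more values
    paywall: Paywall                          # exactly one value
    access_management: set[AccessManagement]  # one or more values


# Hypothetical entry; all values are invented for illustration.
example = AccessMetadata(
    origin="United Kingdom",
    organization="Example Bio Lab",
    interaction={Interaction.API_ACCESS, Interaction.SECURE_SERVER},
    paywall=Paywall.YES_WITH_FREE_VERSION,
    access_management={AccessManagement.AUTHENTICATION},
)
```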
Tool Subcategory
Protein design tools: tools that can predict the sequences of proteins with specified structural and/or functional properties (e.g. binding to a given target).
Example user input: 3D protein structure
Output: Amino acid sequences
Small biomolecule design tools: tools that can predict molecular structures with specific profiles (e.g. generating a drug that provokes a desired biological response and maintains acceptable pharmacokinetic properties)
Example user input: ligand structure, target molecule structure or class, and desired property
Output: molecular structure
Pathogen property prediction: tools that can predict or detect features of a pathogen such as propensity for zoonotic spillover, host tropism, likelihood of infecting humans, virulence, etc.
Example user input: genome sequences
Output: zoonotic spillover prediction score or classification (e.g. high risk)
Host-pathogen interaction prediction tools: tools that can predict the protein-protein interactions between a given host and pathogenic agent (e.g. predicting likelihood of antibody escape for viral mutations, exploitation of host mechanisms, or the virus’ entry mechanism into host cells).
Example user input: host protein sequences, viral protein sequences
Output: Likely interactions between host and viral proteins
Viral vector design tools: tools that can predict the amino acid sequences of virus capsids with the aim of optimizing them as delivery vectors (e.g. capsids capable of assembling and packaging their own genomes, with low immunogenicity)
Example user input: target capsid amino acid segment for mutation
Output: amino acid sequences
Immunological system modelling: tools that artificially replicate a component of the human immune system with the aim of predicting immune responses (e.g. predicting T-cell receptor epitope recognition)
Example user input: amino acid sequence of TCR CDR3 region and the epitope
Output: likely TCR recognition of an epitope; COVID-19 clinical outcome prediction
Experimental design/simulation and autonomous tools: tools that can generate and simulate designs given a predefined objective and predict experimental outcomes, as well as tools that can conduct experiments (including physical tests, modelling, or data mining) without human intervention.
Example user input: experimental workflow and variables, and laboratory automation equipment
Output: optimized methods or variables, simulated experimental data, or experimental data
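Read together, the example inputs and outputs above describe rough interface shapes for each tool subcategory. The sketch below restates a few of them as hypothetical type signatures; the class and function names are ours for illustration and do not correspond to any specific tool.

```python
from typing import Protocol


class ProteinDesignTool(Protocol):
    """Protein design: 3D structure in, candidate amino acid sequences out."""

    def design(self, target_structure_pdb: str) -> list[str]:
        ...


class PathogenPropertyPredictor(Protocol):
    """Pathogen property prediction: genome sequence in, spillover risk score out."""

    def predict_spillover_risk(self, genome_sequence: str) -> float:
        ...


class HostPathogenInteractionPredictor(Protocol):
    """Host-pathogen interaction prediction: host and viral protein sequences in,
    likely protein-protein interactions (with a confidence score) out."""

    def predict_interactions(
        self,
        host_protein_sequences: list[str],
        viral_protein_sequences: list[str],
    ) -> list[tuple[str, str, float]]:
        ...
```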
Biosecurity Actions
Choose ≥1 value
Statement acknowledging dual-use risk: a formal recognition that the tool could be misused for harmful biological purposes (e.g. a disclaimer noting potential misuse in a model card for a protein design tool, or an ethics statement).
Expert engagement: involving biosafety, bioethics, and AI experts in the development and review processes (e.g. forming an external advisory board, qualitative interviews with experts to ensure safe use cases).
Curation of training data: using synthetic data, or filtering training data to limit biological data that could enable dangerous capabilities (e.g. excluding pathogen virulence factors from a genomic training dataset).
Evaluation for dual-use risk: assessing models or tools for their potential misuse before release (e.g. red teaming exercises on a biological design tool, or adopting frontier evaluation frameworks)
Adoption or development of governance frameworks: applying internal policies or aligning with external standards to ensure safe development (e.g. adopting the WHO guidance on dual-use research, or creating a lab-specific AI-biosafety protocol)
Community consultations: forums, engagements, discussions or dialogues with potential users and/or stakeholders (e.g. AlphaFold 3: "Building on the external consultations we carried out for AlphaFold 2, we’ve now engaged with more than 50 domain experts, in addition to specialist third parties, across biosecurity, research and industry, to understand the capabilities of successive AlphaFold models and any potential risks. We also participated in community-wide forums and discussions ahead of AlphaFold 3’s launch.")
E.g. Co-Scientist’s trusted tester program to gather feedback from researchers using the tool
Provision of recommendations: provides recommendations for future work, safeguards, or mitigation strategies.
Publication of evaluation tests and/or results: provides additional details on the methodology of evaluations conducted, benchmarks used, and/or evaluation results
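Putting the categories together, a single repository entry could be summarised as a record like the one below. This is a minimal, fully hypothetical sketch: the keys mirror the categories described above, and every value is invented for illustration.

```python
# Hypothetical repository entry; keys mirror the categories above, values are invented.
example_entry = {
    "tool_name": "ExampleFold",
    "organization": "Example Bio Lab",
    "origin": "United Kingdom",                 # host institution, not nationality
    "tool_subcategory": ["protein design tools"],
    "example_user_input": "3D protein structure",
    "output": "amino acid sequences",
    "user_interaction": ["API access"],         # choose >= 1 value
    "paywall": "yes, but with a free version",  # choose 1 value
    "access_management": ["authentication"],    # choose >= 1 value
    "biosecurity_actions": [                    # choose >= 1 value
        "statement acknowledging dual-use risk",
        "expert engagement",
        "evaluation for dual-use risk",
    ],
    # The exact wording from the original statement is recorded in the extractions tab.
    "extraction": "Disclaimer in the model card noting potential misuse ...",
}
```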