A non-definitive AIxBio reading list (plus other lists)
By ljusten @ 2025-12-08T15:08 (+8)
I recently updated my AIxBio reading list, originally developed for my PhD qualifying exams, and decided it is substantive enough to post here. The usual caveats apply: this is a non-definitive, partially organized, and semi-curated reading list.
In sharing the list with a few others, I was pointed to two other public AIxBio reading lists (sorry if I missed anyone else!):
- List of AI x Biosecurity Resources by Ryan Teo and Shrestha Rath (mostly non-technical/policy-focused).
- AIxBio Research Hub - a great collection of papers and newsletters. The website also lets you submit resources you think should be added (I submitted my list!).
Thanks to the SecureBio AI team and others for surfacing a steady drumbeat of relevant papers, many of which made their way into this list.
Capabilities of AI in biology and evaluation science
- Götting, J., Medeiros, P., Sanders, J. G., Li, N., Phan, L., Elabd, K., Justen, L., Hendrycks, D., & Donoughe, S. (2025). Virology Capabilities Test (VCT): A multimodal virology Q&A benchmark. In arXiv [cs.CY]. arXiv. http://arxiv.org/abs/2504.16137
- Justen, L. (2025). LLMs outperform experts on challenging biology benchmarks. In arXiv [cs.LG]. arXiv. http://arxiv.org/abs/2505.06108
- Rein, D., Hou, B. L., Stickland, A. C., Petty, J., Pang, R. Y., Dirani, J., Michael, J., & Bowman, S. R. (2023). GPQA: A Graduate-Level Google-Proof Q&A Benchmark. In arXiv [cs.AI]. arXiv. http://arxiv.org/abs/2311.12022
- Laurent, J. M., Janizek, J. D., Ruzo, M., Hinks, M. M., Hammerling, M. J., Narayanan, S., Ponnapati, M., White, A. D., & Rodriques, S. G. (2024). LAB-bench: Measuring capabilities of language models for biology research. In arXiv [cs.AI]. arXiv. http://arxiv.org/abs/2407.10362
- Boiko, D. A., MacKnight, R., Kline, B., & Gomes, G. (2023). Autonomous chemical research with large language models. Nature, 624(7992), 570–578. https://doi.org/10.1038/s41586-023-06792-0
- Thadani, N. N., Gurev, S., Notin, P., Youssef, N., Rollins, N. J., Ritter, D., Sander, C., Gal, Y., & Marks, D. S. (2023). Learning from prepandemic data to forecast viral escape. Nature, 622(7984), 818–825. https://doi.org/10.1038/s41586-023-06617-0
- Hayes, T., Rao, R., Akin, H., Sofroniew, N. J., Oktay, D., Lin, Z., Verkuil, R., Tran, V. Q., Deaton, J., Wiggert, M., Badkundri, R., Shafkat, I., Gong, J., Derry, A., Molina, R. S., Thomas, N., Khan, Y. A., Mishra, C., Kim, C., … Rives, A. (2025). Simulating 500 million years of evolution with a language model. Science (New York, N.Y.), 387(6736), 850–858. https://doi.org/10.1126/science.ads0018
- Nguyen, E., Poli, M., Durrant, M. G., Kang, B., Katrekar, D., Li, D. B., Bartie, L. J., Thomas, A. W., King, S. H., Brixi, G., Sullivan, J., Ng, M. Y., Lewis, A., Lou, A., Ermon, S., Baccus, S. A., Hernandez-Boussard, T., Ré, C., Hsu, P. D., & Hie, B. L. (2024). Sequence modeling and design from molecular to genome scale with Evo. Science (New York, N.Y.), 386(6723), eado9336. https://doi.org/10.1126/science.ado9336
- Mouton, C., Lucas, C., & Guest, E. (2024). The Operational Risks of AI in Large-Scale Biological Attacks. RAND Corporation. https://www.rand.org/pubs/research_reports/RRA2977-2.html
- OpenAI. (2024). Building an early warning system for LLM-aided biological threat creation. https://openai.com/index/building-an-early-warning-system-for-llm-aided-biological-threat-creation/
- Moult, J., Pedersen, J. T., Judson, R., & Fidelis, K. (1995). A large-scale experiment to assess protein structure prediction methods. Proteins, 23(3), ii–v. https://doi.org/10.1002/prot.340230303
- Li, N., Pan, A., Gopal, A., Yue, S., Berrios, D., Gatti, A., Li, J. D., Dombrowski, A.-K., Goel, S., Phan, L., Mukobi, G., Helm-Burger, N., Lababidi, R., Justen, L., Liu, A. B., Chen, M., Barrass, I., Zhang, O., Zhu, X., … Hendrycks, D. (2024). The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning. In arXiv [cs.LG]. arXiv. http://arxiv.org/abs/2403.03218
- Miller, E. (2024). Adding error bars to evals: A statistical approach to language model evaluations. In arXiv [stat.AP]. arXiv. http://arxiv.org/abs/2411.00640
- Hobbhahn, M. (2024, January 22). We need a Science of Evals. Apollo Research. https://www.apolloresearch.ai/blog/we-need-a-science-of-evals
- Wei, B., Che, Z., Li, N., Sehwag, U. M., Götting, J., Nedungadi, S., Michael, J., Yue, S., Hendrycks, D., Henderson, P., Wang, Z., Donoughe, S., & Mazeika, M. (2025). Best practices for biorisk evaluations on open-weight bio-foundation models. In arXiv [cs.CR]. arXiv. https://doi.org/10.48550/arXiv.2510.27629
- McCaslin, T., Alaga, J., Nedungadi, S., Donoughe, S., Reed, T., Bommasani, R., Painter, C., & Righetti, L. (2025). STREAM (ChemBio): A standard for Transparently Reporting Evaluations in AI Model Reports. In arXiv [cs.CY]. arXiv. http://arxiv.org/abs/2508.09853
- Reed, T., McCaslin, T., & Righetti, L. (2025). What do model reports say about their ChemBio benchmark evaluations? Comparing recent releases to the STREAM framework. In arXiv [cs.CY]. arXiv. https://doi.org/10.48550/arXiv.2510.20927
- Mitchener, L., Yiu, A., Chang, B., Bourdenx, M., Nadolski, T., Sulovari, A., Landsness, E. C., Barabasi, D. L., Narayanan, S., Evans, N., Reddy, S., Foiani, M., Kamal, A., Shriver, L. P., Cao, F., Wassie, A. T., Laurent, J. M., Melville-Green, E., Caldas, M., … White, A. D. (2025). Kosmos: An AI scientist for autonomous discovery. In arXiv [cs.AI]. arXiv. https://doi.org/10.48550/arXiv.2511.02824
- Zhu, Y., Jin, T., Pruksachatkun, Y., Zhang, A., Liu, S., Cui, S., Kapoor, S., Longpre, S., Meng, K., Weiss, R., Barez, F., Gupta, R., Dhamala, J., Merizian, J., Giulianelli, M., Coppock, H., Ududec, C., Sekhon, J., Steinhardt, J., … Kang, D. (2025). Establishing best practices for building rigorous agentic benchmarks. In arXiv [cs.AI]. arXiv. https://doi.org/10.48550/arXiv.2507.02825
- Emberson, L. (2025, October 30). Open-weight models lag state-of-the-art by around 3 months on average. Epoch AI. https://epoch.ai/data-insights/open-weights-vs-closed-weights-models
- Somala, V. (2025, August 15). Frontier AI performance becomes accessible on consumer hardware within a year. Epoch AI. https://epoch.ai/data-insights/consumer-gpu-model-gap
- Mazeika, M., Gatti, A., Menghini, C., Sehwag, U. M., Singhal, S., Orlovskiy, Y., Basart, S., Sharma, M., Peskoff, D., Lau, E., Lim, J., Carroll, L., Blair, A., Sivakumar, V., Basu, S., Kenstler, B., Ma, Y., Michael, J., Li, X., … Hendrycks, D. (2025). Remote Labor Index: Measuring AI automation of remote work. In arXiv [cs.LG]. arXiv. https://doi.org/10.48550/arXiv.2510.26787
- Cong, L., Zhang, Z., Wang, X., Di, Y., Jin, R., Gerasimiuk, M., Wang, Y., Dinesh, R. K., Smerkous, D., Smerkous, A., Wu, X., Liu, S., Li, P., Zhu, Y., Serrao, S., Zhao, N., Mohammad, I. A., Sunwoo, J. B., Wu, J. C., & Wang, M. (2025). LabOS: The AI-XR co-scientist that sees and works with humans. In arXiv [cs.AI]. arXiv. https://doi.org/10.48550/arXiv.2510.14861
- Skowronek, P., Nawalgaria, A., & Mann, M. (2025). Multimodal AI agents for capturing and sharing laboratory practice. In bioRxiv (p. 2025.10.05.680425). https://doi.org/10.1101/2025.10.05.680425
- King, S. H., Driscoll, C. L., Li, D. B., Guo, D., Merchant, A. T., Brixi, G., Wilkinson, M. E., & Hie, B. L. (2025). Generative design of novel bacteriophages with genome language models. In bioRxiv (p. 2025.09.12.675911). https://doi.org/10.1101/2025.09.12.675911
- Wang, D., Huot, M., Zhang, Z., Jiang, K., Shakhnovich, E. I., & Esvelt, K. M. (2025). Without safeguards, AI-biology integration risks accelerating future pandemics. https://doi.org/10.13140/RG.2.2.29765.15849
- Third-Party Assessments. (2025, August 4). Frontier Model Forum. https://www.frontiermodelforum.org/technical-reports/third-party-assessments/
- A structured protocol for elicitation experiments. (n.d.). AI Security Institute. Retrieved December 3, 2025, from https://www.aisi.gov.uk/blog/our-approach-to-ai-capability-elicitation
- Pannu, J., Bloomfield, D., MacKnight, R., Hanke, M. S., Zhu, A., Gomes, G., Cicero, A., & Inglesby, T. V. (2025). Dual-use capabilities of concern of biological AI models. PLoS Computational Biology, 21(5), e1012975. https://doi.org/10.1371/journal.pcbi.1012975
- Williams, B., Righetti, L., Rosenberg, J., de Castro, R. C., Kuusela, O., Britt, R., Soice, E., Morales, A., Sanders, J., Donoughe, S., Black, J., Karger, E., & Tetlock, P. E. (2025). Forecasting LLM-enabled biorisk and the efficacy of safeguards. Forecasting Research Institute. https://static1.squarespace.com/static/635693acf15a3e2a14a56a4a/t/68812b62e85b2808f0366c41/1753295738891/ai-enabled-biorisk.pdf
- Zhang, Z., Zhou, Z., Jin, R., Cong, L., & Wang, M. (2025). GeneBreaker: Jailbreak attacks against DNA language models with pathogenicity guidance. In arXiv [cs.CR]. arXiv. https://doi.org/10.48550/arXiv.2505.23839
- Needham, J., Edkins, G., Pimpale, G., Bartsch, H., & Hobbhahn, M. (2025). Large language models often know when they are being evaluated. In arXiv [cs.CL]. arXiv. https://doi.org/10.48550/arXiv.2505.23836
- Ikonomova, S. P., Wittmann, B. J., Piorino, F., Ross, D. J., Schaffter, S. W., Vasilyeva, O., Horvitz, E., Diggans, J., Strychalski, E. A., Lin-Gibson, S., & Taghon, G. J. (2025). Experimental evaluation of AI-driven protein design risks using safe biological proxies. In bioRxiv (p. 2025.05.15.654077). https://doi.org/10.1101/2025.05.15.654077
- Wei, K., Paskov, P., Dev, S., Byun, M. J., Reuel, A., Roberts-Gaal, X., Calcott, R., Coxon, E., & Deshpande, C. (2025, March 6). Model Evaluations Need Rigorous and Transparent Human Baselines. ICLR 2025 Workshop on Building Trust in Language Models and Applications. https://openreview.net/forum?id=VbG9sIsn4F
- Esvelt, K. M. (2025). Foundation models may exhibit staged progression in novel CBRN threat disclosure. In arXiv [cs.CY]. arXiv. https://doi.org/10.48550/arXiv.2503.15182
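A recurring theme in the evaluation-science entries above (e.g. Miller 2024 on error bars, Hobbhahn 2024 on a science of evals) is that benchmark accuracies should be reported with uncertainty, not as bare point estimates. A minimal sketch of the simplest version of this idea, assuming independent questions and a normal approximation (the function name is mine, not from any of the papers):

```python
import math

def accuracy_with_ci(correct: int, total: int, z: float = 1.96):
    """Benchmark accuracy with a normal-approximation 95% confidence interval.

    Treats each question as an independent Bernoulli trial; this is the
    most basic form of the error bars argued for in the evals literature.
    """
    p = correct / total
    se = math.sqrt(p * (1 - p) / total)  # standard error of the mean score
    return p, max(0.0, p - z * se), min(1.0, p + z * se)

# e.g. a model answering 80 of 100 benchmark questions correctly
acc, lo, hi = accuracy_with_ci(80, 100)
print(f"accuracy = {acc:.2f}, 95% CI ≈ [{lo:.2f}, {hi:.2f}]")
# → accuracy = 0.80, 95% CI ≈ [0.72, 0.88]
```

Even this crude interval makes clear that two models separated by a few points on a 100-question benchmark may be statistically indistinguishable; clustered questions and correlated errors (which Miller's paper treats properly) widen the interval further.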
Risks of AI in bioweapons development
- Sandbrink, J. B. (2023). Artificial intelligence and biological misuse: Differentiating risks of language models and biological design tools. In arXiv [cs.CY]. arXiv. http://arxiv.org/abs/2306.13952
- Götting, J., Medeiros, P., Sanders, J. G., Li, N., Phan, L., Elabd, K., Justen, L., Hendrycks, D., & Donoughe, S. (2025). Virology Capabilities Test (VCT): A multimodal virology Q&A benchmark. In arXiv [cs.CY]. arXiv. http://arxiv.org/abs/2504.16137
- Yuki, H., Hough, L., Sageman, M., Danzig, R., Kotani, R., & Leighton, T. (2011). Aum Shinrikyo: Insights Into How Terrorists Develop Biological and Chemical Weapons. CNAS. https://www.cnas.org/publications/reports/aum-shinrikyo-insights-into-how-terrorists-develop-biological-and-chemical-weapons
- Thadani, N. N., Gurev, S., Notin, P., Youssef, N., Rollins, N. J., Ritter, D., Sander, C., Gal, Y., & Marks, D. S. (2023). Learning from prepandemic data to forecast viral escape. Nature, 622(7984), 818–825. https://doi.org/10.1038/s41586-023-06617-0
- Montague, M. (2023). Towards a Grand Unified Threat Model of Biotechnology. https://philsci-archive.pitt.edu/22539/
- Ben Ouagrham-Gormley, S. (2014). Barriers to bioweapons: The challenges of expertise and organization for weapons development. Cornell University Press. https://doi.org/10.7591/cornell/9780801452888.001.0001
- Nelson, C., & Rose, S. (2023). Understanding AI-Facilitated Biological Weapon Development. The Centre for Long-term Resilience. https://doi.org/10.71172/nm7j-qzt1
- Esvelt, K. (2022). Delay, Detect, Defend: Preparing for a Future in which Thousands Can Release New Pandemics. Geneva Centre for Security Policy. https://www.gcsp.ch/sites/default/files/2024-12/gcsp-geneva-paper-29-22.pdf
- Brent, R., & McKelvey, T. G., Jr. (2025). Contemporary AI foundation models increase biological weapons risk. In arXiv [cs.CY]. arXiv. http://arxiv.org/abs/2506.13798
- Carter, S., Wheeler, N., Chwalek, S., Isaac, C., & Yassif, J. (2023). The Convergence of Artificial Intelligence and the Life Sciences. Nuclear Threat Initiative. https://www.nti.org/analysis/articles/the-convergence-of-artificial-intelligence-and-the-life-sciences/
- Gopal, A., Helm-Burger, N., Justen, L., Soice, E. H., Tzeng, T., Jeyapragasan, G., Grimm, S., Mueller, B., & Esvelt, K. M. (2023). Will releasing the weights of future large language models grant widespread access to pandemic agents? In arXiv [cs.AI]. arXiv. http://arxiv.org/abs/2310.18233
- Mouton, C., Lucas, C., & Guest, E. (2024). The Operational Risks of AI in Large-Scale Biological Attacks. RAND Corporation. https://www.rand.org/pubs/research_reports/RRA2977-2.html
- OpenAI. (2024). Building an early warning system for LLM-aided biological threat creation. https://openai.com/index/building-an-early-warning-system-for-llm-aided-biological-threat-creation/
- Rose, S., Moulange, R., Smith, J., & Nelson, C. (2024). The near-term impact of AI on biological misuse. The Centre for Long-Term Resilience. https://www.longtermresilience.org/reports/the-near-term-impact-of-ai-on-biological-misuse/
- Walsh, M. E. (2024). Towards risk analysis of the impact of AI on the deliberate biological threat landscape. In arXiv [cs.CY]. arXiv. http://arxiv.org/abs/2401.12755
- Jeyapragasan, G. (2024). Risk-Benefit Assessment of Pandemic Virus Identification [MSc, MIT]. https://dam-prod2.media.mit.edu/x/2024/08/19/jeyapragasan-geethaj-SM-MAS-2024-thesis_m7l6fQF.pdf
- Bromberg, Y., Altman, R., Imperiale, M., Horvitz, E., Dus, M., Townshend, R., Yao, V., Treangen, T., Alexanian, T., Szymanski, E., Yassif, J., Anta, R., Lindner, A. B., Schmidt, M., Diggans, J., Esvelt, K. M., Molla, K. A., Phelan, R., Wang, M., … de Carvalho Bittencourt, D. M. (2025). 3.1 Artificial Intelligence and the Future of Biotechnology. Rice University. https://doi.org/10.25611/1233-X161
- Manheim, D., Williams, A., Aveggio, C., & Berke, A. (2025). Understanding the Theoretical Limits of AI-Enabled Pathogen Design. RAND Corporation. https://www.rand.org/pubs/research_reports/RRA4087-1.html
Technical and policy harm mitigations
- Wang, M., Zhang, Z., Bedi, A. S., Velasquez, A., Guerra, S., Lin-Gibson, S., Cong, L., Qu, Y., Chakraborty, S., Blewett, M., Ma, J., Xing, E., & Church, G. (2025). A call for built-in biosecurity safeguards for generative AI tools. Nature Biotechnology, 1–3. https://doi.org/10.1038/s41587-025-02650-8
- Wittmann, B. J., Alexanian, T., Bartling, C., Beal, J., Clore, A., Diggans, J., Flyangolts, K., Gemler, B. T., Mitchell, T., Murphy, S. T., Wheeler, N. E., & Horvitz, E. (2024). Toward AI-resilient screening of nucleic acid synthesis orders: Process, results, and recommendations. In bioRxiv (p. 2024.12.02.626439). https://doi.org/10.1101/2024.12.02.626439
- Baker, D., & Church, G. (2024). Protein design meets biosecurity. Science (New York, N.Y.), 383(6681), 349. https://doi.org/10.1126/science.ado1671
- Executive Office of the President. (2023). Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence. In Federal Register. https://www.federalregister.gov/documents/2023/11/01/2023-24283/safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence
- Li, N., Pan, A., Gopal, A., Yue, S., Berrios, D., Gatti, A., Li, J. D., Dombrowski, A.-K., Goel, S., Phan, L., Mukobi, G., Helm-Burger, N., Lababidi, R., Justen, L., Liu, A. B., Chen, M., Barrass, I., Zhang, O., Zhu, X., … Hendrycks, D. (2024). The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning. In arXiv [cs.LG]. arXiv. http://arxiv.org/abs/2403.03218
- Bloomfield, D., Pannu, J., Zhu, A. W., Ng, M. Y., Lewis, A., Bendavid, E., Asch, S. M., Hernandez-Boussard, T., Cicero, A., & Inglesby, T. (2024). AI and biosecurity: The need for governance. Science (New York, N.Y.), 385(6711), 831–833. https://doi.org/10.1126/science.adq1977
- Bromberg, Y., Altman, R., Imperiale, M., Horvitz, E., Dus, M., Townshend, R., Yao, V., Treangen, T., Alexanian, T., Szymanski, E., Yassif, J., Anta, R., Lindner, A. B., Schmidt, M., Diggans, J., Esvelt, K. M., Molla, K. A., Phelan, R., Wang, M., … de Carvalho Bittencourt, D. M. (2025). 3.1 Artificial Intelligence and the Future of Biotechnology. Rice University. https://doi.org/10.25611/1233-X161
- Bloomfield, D., Khawam, J., & Schnabel, T. (2025). How U.S. Export Controls Risk Undermining Biosecurity. Lawfare. https://www.lawfaremedia.org/article/how-u.s.-export-controls-risk-undermining-biosecurity
- Casper, S., O’Brien, K., Longpre, S., Seger, E., Klyman, K., Bommasani, R., Nrusimha, A., Shumailov, I., Mindermann, S., Basart, S., Rudzicz, F., Pelrine, K., Ghosh, A., Strait, A., Kirk, R., Hendrycks, D., Henderson, P., Kolter, J. Z., Irving, G., … Hadfield-Menell, D. (2025). Open technical problems in open-weight AI model risk management. In Social Science Research Network. https://doi.org/10.2139/ssrn.5705186
- Wittmann, B. J., Alexanian, T., Bartling, C., Beal, J., Clore, A., Diggans, J., Flyangolts, K., Gemler, B. T., Mitchell, T., Murphy, S. T., Wheeler, N. E., & Horvitz, E. (2025). Strengthening nucleic acid biosecurity screening against generative protein design tools. Science (New York, N.Y.), 390(6768), 82–87. https://doi.org/10.1126/science.adu8578
- Zhang, Z., Jin, R., Cong, L., & Wang, M. (2025). Securing the language of life: Inheritable watermarks from DNA language models to proteins. In arXiv [q-bio.GN]. arXiv. https://doi.org/10.48550/arXiv.2509.18207
- Chen, Y., Tucker, M., Panickssery, N., Wang, T., Mosconi, F., Gopal, A., Denison, C., Petrini, L., Leike, J., Perez, E., & Sharma, M. (2025, August 19). Enhancing Model Safety through Pretraining Data Filtering. Alignment Science Blog. https://alignment.anthropic.com/2025/pretraining-data-filtering/
- O’Brien, K., Casper, S., Anthony, Q., Korbak, T., Kirk, R., Davies, X., Mishra, I., Irving, G., Gal, Y., & Biderman, S. (2025). Deep ignorance: Filtering pretraining data builds tamper-resistant safeguards into open-weight LLMs. In arXiv [cs.LG]. arXiv. http://arxiv.org/abs/2508.06601
- Preliminary Taxonomy of AI-Bio Misuse Mitigations. (2025, July 30). Frontier Model Forum. https://www.frontiermodelforum.org/issue-briefs/preliminary-taxonomy-of-ai-bio-misuse-mitigations/
- Biosecurity Guide to the AI Action Plan. (n.d.). Johns Hopkins Center for Health Security. Retrieved December 3, 2025, from https://centerforhealthsecurity.org/our-work/aixbio/biosecurity-guide-to-the-ai-action-plan
- Adamson, G., & Allen, G. C. (2025). Opportunities to Strengthen U.S. Biosecurity from AI-Enabled Bioterrorism: What Policymakers Should Know. Center for Strategic and International Studies. https://www.csis.org/analysis/opportunities-strengthen-us-biosecurity-ai-enabled-bioterrorism-what-policymakers-should
- Disseminating In Silico and Computational Biological Research: Navigating Benefits and Risks: Proceedings of a Workshop. (2025). The National Academies Press. https://doi.org/10.17226/11
- Rivera, S., Hanke, M. S., Curtis, S., Cherian, N., Gitter, A., Gray, J. J., Hebbeler, A., McCarthy, S., Nannemann, D., Qureshi, C., Weitzner, B., & Haydon, I. C. (2025). Responsible Biodesign Workshop: AI, Protein Design, and the biosecurity landscape – Recommended Actions. https://doi.org/10.31219/osf.io/yq48e_v1
- Li, M., Zhou, B., Tan, Y., & Hong, L. (2024). Unlearning virus knowledge toward safe and responsible mutation effect predictions. In bioRxiv (p. 2024.10.02.616274). https://doi.org/10.1101/2024.10.02.616274