Nucleic Acid Observatory Updates, April 2025
By Jeff Kaufman 🔸, mike_mclaren, Will Bradshaw @ 2025-04-15T18:58 (+15)
This is a linkpost to https://naobservatory.org/blog/updates-apr-2025/
We’ve been very busy since our January update, and have a lot to share. As always, if you have questions or see opportunities to collaborate, please let us know; we’re eager to work with others thinking along similar lines.
Wastewater Sequencing
We’ve continued to collaborate with Marc Johnson’s lab at the University of Missouri to sequence wastewater influent. We’ve also begun to ramp up our in-house process, using the NovaSeq X+ sequencers at Broad Clinical Labs. Over the past three months, we’ve sequenced 270B read pairs from thirteen sewersheds, more than all our pre-2025 sequencing combined. This includes the addition of a sewershed in South Florida, through a collaboration with Helena Solo-Gabriele’s lab at the University of Miami, and a new sewershed from the Midwest with significant input from meat processing facilities, through the Johnson lab. We continue to be enthusiastic about collaborating with additional partners here to expand the scope of the monitoring system.
Our ANTI-DOTE contract work for PHC Global (press release) has expanded to three sewersheds, and now includes wastewater from military bases in addition to marine blackwater.
We’ve written up our key conclusions from our fall 2023 collaboration with CDC’s Traveler-based Genomic Surveillance program and Ginkgo Biosecurity, comparing pooled airplane lavatory waste to municipal treatment plant influent. We plan to release a preprint after our collaborators have reviewed it. While this report considers only the pilot sequencing results from last spring, we’ve now received a full set of prepared aliquots from MIT’s BioMicro Center and have sequenced an initial batch of these at Broad Clinical Labs. We’ll make a decision later this quarter about whether to sequence the full set now, and if so, how deeply.
Last winter, we collaborated with Jason Rothman, Katrine Whiteson, and the Southern California Coastal Water Research Project to sequence wastewater from Los Angeles. While the data has been on SRA since December (PRJNA1198001), we’ve recently submitted a data paper for publication describing the data generation process and giving additional metadata. A preprint is available on SSRN.
Pooled Individual Sequencing
Over Q1, we ramped up our swab collection efforts, collecting swabs from 728 people across sixteen collection runs (dashboard). Our goal was to collect enough data to decide whether to scale the program up or shut it down, and based on promising results over Q1 we’ve decided to scale it up. We’ll publish a report explaining our reasoning and our overall findings in mid-Q2.
To effectively process pools of dry swabs, we’ve needed to develop a nucleic acid extraction protocol, which is public on protocols.io along with our existing wastewater protocols.
Air Sampling
Our review on indoor air sampling for detecting viral nucleic acids has now been published in the Journal of Aerosol Science (blog post). This peer-reviewed publication builds upon the preprint we previously announced in May 2024, and includes analysis of public metagenomic sequencing data, a review of sample collection methods, and discussion of implementation strategies. We would be excited to collaborate with others, especially in cases where we could run our pathogen detection algorithms on data collected by partners with air sampling expertise. In particularly promising cases, we could potentially cover sequencing and sample processing costs to make this happen.
Analysis of Sequencing Data
We continued to optimize and professionalize our main reference-based analysis pipeline, mgs-workflow. It’s now even cheaper to run, because we’ve converted it to streaming and dropped several redundant stages.
While we had hoped to have reference-based growth detection (RBGD) in production by the end of Q1, we’ve realized that mgs-workflow needs better handling of rare and ambiguous sequences before we can expect RBGD to perform well. Improving these inputs to RBGD is a high-priority project for us in Q2.
We’ve continued to run our chimera detection system on data as it comes in. The most interesting finding from the last few months has been a cluster of sequencing reads with a junction between cytomegalovirus (CMV) and something synthetic. With help from the Johnson lab we’ve determined these are most likely lab discards from CMV vaccine work.
When we find a suspicious chimeric junction or other unusual feature in the data, we often would like to know more context. In our last update post, we described a new tool for iteratively assembling outward from a seed of interest. This quarter we’ve continued the tool’s development, are now using it internally for investigating chimeric junctions, and have made it open source.
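For readers unfamiliar with the idea, assembling outward from a seed can be illustrated with a small greedy k-mer extension sketch. This is not the NAO's tool, and nothing here reflects its actual design: the function names, the choice of `k`, the `min_count` threshold, and the rule of stopping at any ambiguous branch are all assumptions made for illustration. The sketch also assumes every read is on the same strand, so it skips the reverse-complement handling a real tool would likely need.

```python
from collections import Counter

BASES = "ACGT"


def count_kmers(reads, k):
    """Count every k-mer across all reads (single strand only, an assumption)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def extend_seed(seed, reads, k=5, min_count=2, max_steps=1000):
    """Greedily grow `seed` in both directions, one base at a time.

    At each step, look at the (k - 1) bases at the contig's edge and ask
    which single-base extensions form a k-mer seen at least `min_count`
    times. Extend only when exactly one base qualifies; stop at dead ends
    and at branches. `max_steps` bounds the loop if the reads contain a cycle.
    """
    if k < 2 or len(seed) < k - 1:
        raise ValueError("need k >= 2 and a seed of at least k - 1 bases")
    counts = count_kmers(reads, k)
    contig = seed

    # Extend to the right.
    for _ in range(max_steps):
        context = contig[-(k - 1):]
        options = [b for b in BASES if counts[context + b] >= min_count]
        if len(options) != 1:
            break
        contig += options[0]

    # Extend to the left.
    for _ in range(max_steps):
        context = contig[:k - 1]
        options = [b for b in BASES if counts[b + context] >= min_count]
        if len(options) != 1:
            break
        contig = options[0] + contig

    return contig
```

Stopping at branches keeps the sketch conservative: it returns only sequence that the reads support unambiguously, at the cost of shorter contigs. Iterating, as the post describes, would presumably mean pulling in more reads that overlap the grown contig and repeating.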
We’re also applying this assembly tool to detect threats without relying on viral reference sequences. We have prototyped a pipeline for flagging short sequences with growing abundance, and are experimenting with assembling outward from these flagged sequences and characterizing the resulting contigs. Mike McLaren presented a poster at the Chemical and Biological Defense Gordon Research Conference, describing these efforts to construct a reference-free detection pipeline.
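One simple way to picture "flagging short sequences with growing abundance" is to track each k-mer's share of reads across successive samples and fit a trend to its log abundance. The sketch below does that with an ordinary least-squares slope. It is a hypothetical illustration, not the prototype described above: the function names, the pseudocount, and the `min_slope` and `min_final_count` thresholds are all assumed values.

```python
import math


def log_slope(counts, totals, pseudocount=1.0):
    """Least-squares slope of log relative abundance against sample index.

    `counts[i]` is how often the k-mer was seen in sample i and `totals[i]`
    is that sample's total read count. The pseudocount (an assumed value)
    keeps zero counts from producing log(0). Needs at least two samples.
    """
    n = len(counts)
    if n < 2 or n != len(totals):
        raise ValueError("need matching series of at least two samples")
    ys = [math.log((c + pseudocount) / t) for c, t in zip(counts, totals)]
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def flag_growing(counts_by_kmer, totals, min_slope=0.3, min_final_count=5):
    """Return {kmer: slope} for k-mers whose abundance is trending upward.

    K-mers with fewer than `min_final_count` observations in the latest
    sample are skipped, since trends built on a handful of reads are noisy.
    """
    flagged = {}
    for kmer, counts in counts_by_kmer.items():
        if counts[-1] < min_final_count:
            continue
        slope = log_slope(counts, totals)
        if slope >= min_slope:
            flagged[kmer] = slope
    return flagged
```

Normalizing by total reads matters because sequencing depth varies from run to run: a k-mer whose raw count doubles only because the run was twice as deep should not be flagged. A real system would also need to account for noise and multiple testing, which this sketch ignores.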
We published the third blog post in our series evaluating blood as a potential sample type for biosurveillance, analyzing six public datasets to understand the relative abundance of various viruses in metagenomic sequencing data from whole blood and plasma. While we think a comprehensive system would include blood, our current best guess is that it’s approximately third priority, after wastewater and either nasal swabs or air sampling.
Organizational Updates
We recently hired a Research Scientist, Katherine Stansifer, and a Laboratory Technician, Róisín Floyd-O’Sullivan. Katherine comes to us from Biobot Analytics, where she analyzed wastewater sequencing data and developed pipelines to detect SARS-CoV-2 variants. She’s joined our Near Term Detection team to streamline our handling of incoming data, professionalize our pipelines, and generally help us improve our computational detection capabilities. Róisín also comes to us from Biobot Analytics, where she processed wastewater samples for pathogen detection and was involved in lab operations management. She’s joined our wet lab team to process and sequence samples for our philanthropically funded and contract sequencing, and to improve our lab organization.
We’ve received a substantial grant from Open Philanthropy. The largest portion, $3.03M, is for scaling up our wastewater sequencing to three NovaSeq X 25B runs weekly. It also includes $350k for improving experimental methods and exploring other detection opportunities, and $50k as a reserve for rapid response needs if we find something seriously concerning in the sequencing data.
The NAO was featured in a recent piece in Transformer, discussing efforts to increase societal resilience in the face of risks from AI.
SummaryBot @ 2025-04-16T15:49 (+1)
Executive summary: This detailed update from the Nucleic Acid Observatory (NAO) outlines major expansions in wastewater and pooled individual sequencing, air sampling analysis, and data processing capabilities, emphasizing progress toward scalable biosurveillance systems while acknowledging ongoing technical challenges and exploratory efforts.
Key points:
- Wastewater sequencing has scaled significantly, with over 270 billion read pairs sequenced from thirteen sites—more than all previous years combined—thanks to collaborations with several research labs and support from contracts like ANTI-DOTE.
- Pooled swab collection from individuals has expanded, with promising Q1 results leading to a decision to scale up; a public report is expected in mid Q2 detailing the findings and rationale.
- Indoor air sampling work has resulted in a peer-reviewed publication, and the team is actively seeking collaborations with groups already collecting air samples, potentially offering funding for sequencing and processing.
- Software development continues, with improvements to the main mgs-workflow pipeline and efforts to enhance reference-based growth detection (RBGD) by addressing issues with rare and ambiguous sequences.
- Reference-free threat detection is being prototyped, including tools for identifying and assembling from short sequences with increasing abundance—efforts recently shared at a scientific conference.
- Organizationally, the NAO has grown, adding two experienced staff members from Biobot Analytics and securing a $3.4M grant from Open Philanthropy to support wastewater sequencing scale-up, methodological improvements, and rapid-response readiness.
This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.