Announcing the Nucleic Acid Observatory project for early detection of catastrophic biothreats
By Will Bradshaw, mike_mclaren, Anjali Gopal @ 2022-04-29T19:30 (+213)
TLDR: The Nucleic Acid Observatory project aims to protect the world from catastrophic biothreats by detecting novel agents spreading in the human population or environment. We are developing the experimental tools, computational approaches, and hands-on knowledge needed to achieve reliable early detection of any future outbreak. We are hiring for research and support roles.
The catastrophic potential of biological threats lies in their ability to self-replicate and spread from host to host, allowing isolated spillover events to develop into devastating pandemics. The COVID-19 pandemic demonstrated the destructive potential of exponentially growing disease agents. Future pandemics could be far worse, especially if deliberately engineered by humans to cause increased harm.
A corollary of the exponential growth of biological threats is that early detection of outbreaks can be immensely valuable. When a threat spreads exponentially, detecting and responding to it earlier requires exponentially fewer resources to achieve containment and eradication. Had governments acted a few weeks earlier, SARS-CoV-2 might have been excluded from many more countries, or even eradicated before it became a global pandemic. In future outbreaks, a difference of a few doubling times could make the difference between successful containment and global catastrophe.
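To make the arithmetic concrete, here is a toy calculation with illustrative numbers only (not NAO estimates): with a doubling time of roughly three days, each additional two weeks of undetected spread multiplies the number of cases that must be contained by about 25x.

```python
# Illustrative only: how delayed detection scales an outbreak under clean
# exponential growth. The doubling time and case counts are assumptions,
# not estimates from the post.

def cases_after(days: float, initial_cases: float, doubling_time_days: float) -> float:
    """Number of cases after `days` of unchecked exponential growth."""
    return initial_cases * 2 ** (days / doubling_time_days)

if __name__ == "__main__":
    doubling_time = 3.0  # days; roughly SARS-CoV-2-like early growth (assumption)
    for detection_day in (14, 21, 28):
        n = cases_after(detection_day, initial_cases=1, doubling_time_days=doubling_time)
        print(f"Detected on day {detection_day}: ~{n:,.0f} cases to contain")
```

Running this prints roughly 25, 128, and 645 cases for detection on days 14, 21, and 28, which is the sense in which a few doubling times of earlier warning translates into an exponentially smaller containment problem.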
As a result of the COVID-19 pandemic, recent years have seen enormous interest in technologies for rapid diagnosis and outbreak detection. However, most of this work is centered around sensitively detecting specific known pathogens – especially SARS-CoV-2. Most common monitoring techniques are only able to detect the presence of biological signatures specified in advance; they can’t spot things we aren’t already looking for. To achieve reliable early detection of novel catastrophic biothreats, we need to do better.
The Nucleic Acid Observatory project, currently part of the Sculpting Evolution group at the MIT Media Lab, aims to solve one key piece of this early-detection puzzle. By collecting wastewater and other environmental samples at airports and other sites, performing unbiased metagenomic sequencing, and analyzing the resulting data with novel pathogen-agnostic detection algorithms, our goal is to achieve reliable early detection of any biological threat.
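The post does not spell out what these pathogen-agnostic detection algorithms look like. One signal that does not require knowing the pathogen in advance is a sequence whose relative abundance keeps growing across successive samples from the same site. The following is a minimal sketch of that idea; the input format, pseudocount, and growth threshold are illustrative assumptions, not a description of the NAO's actual pipeline.

```python
# Minimal sketch of one pathogen-agnostic signal: flag sequences whose
# relative abundance grows consistently across successive samples.
# Input format and threshold are illustrative assumptions.

import math
from typing import Dict, List

def growth_score(counts: List[int], totals: List[int]) -> float:
    """Least-squares slope of log relative abundance over sample index.

    counts -- reads matching one sequence in each successive sample
    totals -- total reads in each sample (for normalisation)
    Returns the per-sample log-growth rate (0 if the sequence never appears).
    """
    logs = [math.log((c + 1) / t) for c, t in zip(counts, totals)]  # +1 pseudocount
    n = len(logs)
    xbar = (n - 1) / 2
    ybar = sum(logs) / n
    num = sum((i - xbar) * (y - ybar) for i, y in enumerate(logs))
    den = sum((i - xbar) ** 2 for i in range(n))
    return num / den

def flag_growing(abundances: Dict[str, List[int]], totals: List[int],
                 min_rate: float = math.log(2) / 2) -> List[str]:
    """Return sequence IDs whose abundance doubles at least every two samples."""
    return [seq for seq, counts in abundances.items()
            if growth_score(counts, totals) >= min_rate]

if __name__ == "__main__":
    totals = [1_000_000] * 5  # reads per weekly sample (illustrative)
    abundances = {
        "seq_A": [2, 5, 11, 23, 50],         # roughly doubling each sample
        "seq_B": [400, 390, 410, 405, 395],  # stable background flora
    }
    print(flag_growing(abundances, totals))  # -> ['seq_A']
```

In this toy example only seq_A, whose read counts roughly double each week, is flagged; the stable background sequence is ignored. A real system would additionally have to handle sampling noise, uneven sequencing depth, and sequences that are novel rather than already catalogued.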
At present, our priority is to develop the technical tools, practical expertise, and partnerships required to implement effective metagenomic biomonitoring on the ground. To that end, we are developing experimental tools and protocols to investigate the sensitivity of different sampling approaches and sites; investigating computational approaches to agnostically identify threat signatures in metagenomic data; and reaching out to a number of potential sites about on-the-ground pilot studies. We aim to begin field trials by autumn 2022.
We are a small team, but well-funded and growing rapidly. Right now, we are especially excited to hire full-time technical researchers and wet-lab support staff; we expect to open more positions in the near future. You can see our open positions and apply here.
Jeff Kaufman @ 2022-05-13T13:22 (+12)
This sounds like a really valuable project, and I'm glad you're working on it! What sort of end-to-end latency (sample collection to analysis complete) do you think you might be able to achieve? For example, CDC Flu Virologic Surveillance has a latency of ~2 weeks (the most recent is currently from the week ending 2022-04-30, which I think means 12-18d since diagnosis), while MWRA Biobot typically posts data from two days ago each afternoon. If you're trying to catch things with rapid growth, getting down to a day or lower is probably pretty important?
Davidmanheim @ 2022-05-18T06:59 (+5)
Yes - and since part of my dissertation asked how valuable it would be to reduce that delay, I feel compelled to note that the CDC's syndromic surveillance data is at T+2 weeks, and isn't actually based on diagnosis but on symptoms. The NREVSS lab test data used to be at T+4 weeks instead, but that seems to have changed to T+2 weeks now, as you note, evidently because (I will rashly conclude, based on insufficient recent checking but on lots of my research from 5 years ago) the actual delays are almost all administrative, not technical! (Also, it's super hard to do value-of-information analysis in systems like this, as my dissertation concluded.)
brb243 @ 2022-05-06T21:44 (+4)
This is so cool, I meant to alert you to the excellent work of Dr. Esvelt on DNA sequencing... I was wondering: could you use cryptography to pool knowledge of risky sequences and check new sequences against it automatically, while avoiding the risks and dynamics that come from any individual knowing too much about those sequences?
Here is how this could work, according to my limited understanding:
1) Biosecurity researchers from around the world submit risky sequences to your database, which is secured cryptographically so that no one can read it, not even the developers: each works on only part of the code, and they are too numerous and too different (e.g. in nationality) for any one of them to gain a full understanding of the risky material.
2) Sequences from any research project are checked against the database automatically, without human participation. The output does not identify the sequence; it gives only the level of risk, any requirements for further checking or other measures, or a recommendation to discontinue the research. (A toy sketch of this kind of automated check appears after this list.)
There has to be trust in the decision-making of the AI (the cost-benefit analysis might conclude, for example, that it is better to forgo a vaccine against some illness than to risk causing a pandemic that forces severe restrictions) and international agreement to follow its recommendations.
3) Human biosecurity researchers continue to review research that the database flags at various risk levels, both identifying potential risk from contextual understanding and reviewing research involving sequences they themselves submitted. Because the latter concentrates knowledge of risky sequences, these biosecurity researchers should be vetted, should specialize in reading research while having limited ability to translate it into action, and should be monitored for contacts with risky actors such as militaries or paramilitary groups. It should also be checked whether a risky sequence is proliferating among certain researchers, especially over a short time. If researchers may be engaging in risky work, they should be identified to national and international security institutions and offered alternative engagement with much lower risk. Humans can add risky sequences to the database and raise the risk level assigned to some research.
4) Pathogen researchers can appeal the AI's decisions if they think the risk-benefit analysis was conducted erroneously. In that case, human experts review the outputs, which the system explains without identifying the sequences, and conduct a contextual analysis. Biosecurity researchers can then reduce (or increase) the risk level.
5) For the contextual analysis, human observers should gather information about laboratories and individuals and form views on their risk, drawing on complex mental models grounded in systems understanding. These views can be digitized as notes for other humans (they can be reduced to numbers, but the contextual understanding should be retained), and the insights should be discussed to improve collective understanding.
6) It should be popular among biosecurity researchers to develop and run measures that reduce research risk: implementing infrastructure, training staff, and running international cooperation programs on biosecurity preparedness. Such programs would normalize the shamefulness of proliferation and point out the limited effectiveness of risky programs in the face of preparedness, the perhaps much higher risk borne by anyone planning an attack, the ineffectiveness of targeting individuals given institutional structures (you have to make agreements with institutions, not target an individual and assume your problems are solved), and the prospect of economic, political, and military retaliation. Certifications could be issued.
7) Network analysis covering individuals, institutions, and research particulars can be conducted, which can further improve the contextual analysis.
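To illustrate the automatic check described in points 1 and 2, here is a toy sketch in which the screening service stores only salted hashes of short windows of hazardous sequences and returns a coarse risk label rather than the matching sequence. This is purely illustrative: real proposals in this space (e.g. SecureDNA-style screening) rely on much stronger cryptography, and the window size, salt handling, and threshold below are assumptions.

```python
# Toy sketch of the screening idea in points 1-2: the screener holds only
# salted hashes of hazardous subsequences, and returns a risk flag rather
# than the matching sequence. Window size, salt handling, and thresholds
# are illustrative assumptions, not a real screening design.

import hashlib

WINDOW = 30  # screen 30-nucleotide windows (assumption)

def _digest(fragment: str, salt: bytes) -> bytes:
    return hashlib.sha256(salt + fragment.encode()).digest()

def build_hazard_index(hazard_sequences, salt: bytes) -> set:
    """Curators submit sequences; only salted hashes of windows are stored."""
    index = set()
    for seq in hazard_sequences:
        for i in range(len(seq) - WINDOW + 1):
            index.add(_digest(seq[i:i + WINDOW], salt))
    return index

def screen(query: str, hazard_index: set, salt: bytes, threshold: int = 3) -> str:
    """Return a coarse risk label without revealing which windows matched."""
    hits = sum(_digest(query[i:i + WINDOW], salt) in hazard_index
               for i in range(len(query) - WINDOW + 1))
    if hits == 0:
        return "no match"
    return "review required" if hits < threshold else "high risk"

if __name__ == "__main__":
    salt = b"example-salt"  # in practice the salt/keys would never be public
    index = build_hazard_index(["ACGT" * 20], salt)  # placeholder hazard sequence
    print(screen("TTTT" * 20, index, salt))          # -> no match
    print(screen("ACGT" * 20, index, salt))          # -> high risk
```

A salted hash alone would not stop a determined adversary who can query the service repeatedly, which is why serious designs distribute keys and use oblivious protocols; the sketch only shows the interface the comment describes, namely sequence in, risk label out, with no human in the loop and no sequence revealed.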
-
I can imagine that if you raise awareness among 'garage labs' that anyone can cause a pandemic, that no one is prepared to deal with this, and that it has great potential as a threat, then people interested in taking over the world will start researching threats in their garages. How are you going to limit the popularization of risky research while ensuring that sound standards are followed universally?
What are you going to do about terrorist groups that may not be interested in participating in your program? If nothing, how will you contribute to others' efforts to limit terrorism (including by reducing, or at least not raising, interest in the threat potential)?
What about groups that already research pandemic-potential pathogens, such as militaries, that are not interested in participating in your program but could be interested in learning more? Are you going to try to attract them, perhaps by pitching the benefits of being an insider, while hoping that participation makes them interested in biosecurity and defense/preparedness rather than in developing offensive measures? Or how, if at all, are you going to approach these groups or support the work of others who do?
Are you also going to cover RNA or other sequences, so as to cover the entirety of biological risk? (What if someone researches mushrooms, for example, or something else not covered by your project? If it is actually impossible to cover everything, would it be better simply not to popularize the idea, since the more people think about it, the more innovative approaches they come up with? Or is it the case that, just as many nations could already destroy large parts of the world by various means but do not because it seems suboptimal for them, people who think about the possibilities will still not undertake harmful actions?)