Why Dual-Use Risk Bio Matters Now in LLMs. A Simple Guide and Playbook

By JAM @ 2025-09-09T14:14 (+2)

Most harmful use doesn’t arrive as “teach me harm.” It looks like normal, helpful questions. This post explains why that’s risky and what to do about it step by step.

 

(For broader context, see recent system cards and red‑teaming initiatives [1][2][3] and related self‑critique work[4])

 

Aluna Labs Series — Ambiguity-First Safety
Post 1 of 2. This primer explains the risk and provides a practical playbook.
Post 2 (next): Dual-Use Benchmark: GPT-4 Under Ambiguous Intent Results, Charts, Methods.
 

TL;DR

1) The everyday scenario (what this looks like in real life)

A user makes a polite request: “Can you sanity‑check this process?” or “For a class project, what would you optimize here?” Nothing sounds malicious/out of order. But when a model answers, it can share the order of steps, threshold ranges, and debugging cues. Each detail seems harmless; together, they add up to capability. That is dual‑use leakage.

Dual‑use = information or guidance that can support beneficial goals and enable harmful misuse when combined or executed.

 

2) Why is this risk urgent now?

Two shifts make small leaks matter more than before:

  1. Shorter path from suggestion to action. AI is increasingly wired into tools, code, files, and agents. What used to be advice in a vacuum can now be run, automated, or iterated quickly.
  2. Benign framing is easy. People can ask for QA, hypotheticals, or “safety checks” that look responsible while quietly eliciting the same directional hints.

The result: systems that look safe in demos can fail at the margins, where questions sound harmless.

Alignment & standards context: This approach lines up with deployment‑stage governance (e.g., ASL / ASL‑3 protections and the NIST AI RMF), and with broader OECD AI Principles, and it complements technical risk catalogs like OWASP’s LLM Top 10.[5][6][7][8][9][10] 

  

2.1) Dual‑use bio: the sharp edge (why we treat it as the biggest near‑term risk)

Dual‑use risk exists across domains, but bio concentrates the downside for three reasons:

Oversight mismatch. Traditional DURC/oversight frameworks focus on explicit procedures; ambiguity‑framed asks slip past keyword triggers [11][12][13] LLM failures are often directional (what to try next, how to interpret a signal, when to escalate)—precisely the content current filters miss.

Cruxes and caveats. We are not claiming models alone make high‑risk work turnkey. Real-world constraints such as facilities, approvals, and training still matter. Two things can be true: access and expertise gate the hardest actions, and small accelerations in idea search and debugging can materially raise risk. That combination is why we prioritize bio.

What to do in bio‑specific evals (practical):

3) What we see in practice (preview of evidence)

In our Dual-Use evaluation (details in Part 2 next week), models often miss the risk on the first pass, especially when intent appears harmless and answers in a polite, confident tone. That confidence invites trust even when the direction is unsafe. A short review‑and‑rewrite step consistently turns those answers into safe, useful redirects.[4]

 

Part 2 will share the exact numbers, charts, and a methods; this post stays focused on the why and the fix.

4) What doesn’t work on its own (and why)

Note: This complements, not replaces, technical risk lists such as OWASP’s LLM Top 10 [10].

5) What works and how each step helps

  1. Test ambiguity on purpose. Include frames like intent_ambiguous, speculative_cover, misuse_plausible.
    Benefit: reveals failures that keyword tests miss.
  2. Score explicit risk recognition. Don’t stop at “refused or not.” Did the model name the risk and set clear boundaries?
    Benefit: trains models (and users) to spot hazards.
  3. Redirect to safe value. Swap “how‑to” detail for policy, design, governance, and ethics guidance that is helpful and non‑actionable.
    Benefit: preserves usefulness without enabling misuse.
  4. Run a critique → revision pass. A lightweight review makes unsafe answers reliably safe. Publish before/after deltas.
    Benefit: measurable improvement, easy to adopt.
  5. Publish a failure taxonomy. Name patterns (e.g., implicit_leakage, refusal_weak).
    Benefit: gives engineering teams concrete targets to fix.
  6. Back shared benchmarks (incl. negative results). Cross‑org tests help the field converge on what actually works.
    Benefit: fewer blind spots, faster progress.

6) A quick “before/after” illustration

The after answer still helps but it removes the directional breadcrumbs.

7) Copy‑paste playbook

For labs

For funders

 

8) Balancing Openness with Safety

This isn’t about censoring knowledge. It’s about responsible context and safe guidance. Models should still answer when they can do so constructively. That means: highlight the risks, explain oversight pathways, and suggest governance or design solutions not procedural steps or experimentation cues.

Clear, non‑actionable guidance preserves inquiry while reducing risk. The goal isn’t silence. It’s smarter answers.

9) What’s next (Part 2)

We’ll be publishing a high-level summary of our Dual-Use benchmark next week.


The full dataset, prompt suite, evaluation checklist, and scoring CSVs are available for teams actively exploring model evaluation or safety work feel free to reach out!

Aluna Labs — info@alunalabs.org

About Aluna Labs. We design theory-grounded stress tests for LLMs, focused on ambiguity-first evaluations and practical safety playbooks for labs and funders.
 

  1. ^

    OpenAI — GPT‑4 System Card (Mar 2023) — https://cdn.openai.com/papers/gpt-4-system-card.pdf

  2. ^

    OpenAI — GPT‑4o System Card (Aug 2024) — https://openai.com/index/gpt-4o-system-card/ • PDF: https://cdn.openai.com/gpt-4o-system-card.pdf

  3. ^

    OpenAI — Red Teaming Network (Sep 2023) — https://openai.com/index/red-teaming-network/

  4. ^

    Anthropic — Constitutional AI (Dec 2022) — https://arxiv.org/abs/2212.08073 • PDF: https://arxiv.org/pdf/2212.08073

  5. ^

    Anthropic — Responsible Scaling Policy (ASL) (2023→2024 updates) — https://www.anthropic.com/news/anthropics-responsible-scaling-policy • 

    PDF: https://assets.anthropic.com/m/24a47b00f10301cd/original/Anthropic-Responsible-Scaling-Policy-2024-10-15.pdf

  6. ^

    Anthropic — Activating ASL‑3 Protections (May 2025) — https://www.anthropic.com/news/activating-asl3-protections 

    Report PDF: https://www.anthropic.com/activating-asl3-report

  7. ^

    NIST — AI Risk Management Framework (AI RMF 1.0) (Jan 2023) — https://nvlpubs.nist.gov/nistpubs/ai/nist.ai.100-1.pdf 

     Overview: https://www.nist.gov/itl/ai-risk-management-framework

  8. ^

    NIST — Generative AI Profile (AI 600‑1) (Jul 2024) https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf

  9. ^

    OECD — AI Principles (2019; updated 2024) — https://oecd.ai/en/ai-principles  

    Overview: https://www.oecd.org/en/topics/sub-issues/ai-principles.html

  10. ^

    OWASP — Top 10 for Large Language Model Applications (2023→2025) — https://owasp.org/www-project-top-10-for-large-language-model-applications/ Latest list: https://genai.owasp.org/llm-top-10/

  11. ^

    U.S. DURC — USG Policy for Oversight of Life Sciences DURC (Mar 2012) — https://aspr.hhs.gov/S3/Documents/us-policy-durc-032812.pdf

  12. ^

    U.S. DURC — Institutional Oversight Policy (Sep 2014) — https://aspr.hhs.gov/S3/Documents/durc-policy.pdf

  13. ^

    NSABB overview — https://osp.od.nih.gov/policies/national-science-advisory-board-for-biosecurity-nsabb/