Why Dual-Use Risk Bio Matters Now in LLMs. A Simple Guide and Playbook
By JAM @ 2025-09-09T14:14 (+2)
Most harmful use doesn’t arrive as “teach me harm.” It looks like normal, helpful questions. This post explains why that’s risky and what to do about it step by step.
(For broader context, see recent system cards and red‑teaming initiatives [1][2][3] and related self‑critique work[4])
Aluna Labs Series — Ambiguity-First Safety
Post 1 of 2. This primer explains the risk and provides a practical playbook.
Post 2 (next): Dual-Use Benchmark: GPT-4 Under Ambiguous Intent Results, Charts, Methods.
TL;DR
- Misuse often looks like ordinary, helpful questions; answering them can leak directional fragments (steps, thresholds, debugging cues) that add up to capability.
- This matters now because tools/agents shorten the path from suggestion → action; demos that pass explicit tests can still leak under ambiguity.
- What doesn’t work alone: keyword filters, blanket refusals, single “safety scores.”
- What does work: test ambiguity on purpose; score risk recognition (not just refusal); redirect to safe value; run a critique → revision pass; publish a failure taxonomy; back shared benchmarks (incl. negative results).
- Our evidence (Part 2 next week): in an ambiguity‑heavy Dual‑Use study, first‑pass answers often missed risk while sounding helpful; a short review pass consistently produced safe redirects. We’ll share numbers, charts, and a checklist next.
- Call to action: Labs adopt this playbook; funders tie support to these practices.
1) The everyday scenario (what this looks like in real life)
A user makes a polite request: “Can you sanity‑check this process?” or “For a class project, what would you optimize here?” Nothing sounds malicious/out of order. But when a model answers, it can share the order of steps, threshold ranges, and debugging cues. Each detail seems harmless; together, they add up to capability. That is dual‑use leakage.
Dual‑use = information or guidance that can support beneficial goals and enable harmful misuse when combined or executed.
2) Why is this risk urgent now?
Two shifts make small leaks matter more than before:
- Shorter path from suggestion to action. AI is increasingly wired into tools, code, files, and agents. What used to be advice in a vacuum can now be run, automated, or iterated quickly.
- Benign framing is easy. People can ask for QA, hypotheticals, or “safety checks” that look responsible while quietly eliciting the same directional hints.
The result: systems that look safe in demos can fail at the margins, where questions sound harmless.
Alignment & standards context: This approach lines up with deployment‑stage governance (e.g., ASL / ASL‑3 protections and the NIST AI RMF), and with broader OECD AI Principles, and it complements technical risk catalogs like OWASP’s LLM Top 10.[5][6][7][8][9][10]
2.1) Dual‑use bio: the sharp edge (why we treat it as the biggest near‑term risk)
Dual‑use risk exists across domains, but bio concentrates the downside for three reasons:
- Asymmetric externalities. Biological agents can amplify and spread, turning small mistakes into large, persistent harms. The public bears the cost, not just a lab or user.
- Tacit → explicit compression. Models can transform scattered knowledge into clearer sequences and troubleshooting cues, reducing trial-and-error loops even when no comprehensive "recipe" is provided.
- Falling barriers around the edges. Cheaper tools, cloud services, and global supply chains (legitimate on their own) lower the friction around experimentation. You don’t need to enable every step for acceleration to matter.
Oversight mismatch. Traditional DURC/oversight frameworks focus on explicit procedures; ambiguity‑framed asks slip past keyword triggers [11][12][13] LLM failures are often directional (what to try next, how to interpret a signal, when to escalate)—precisely the content current filters miss.
Cruxes and caveats. We are not claiming models alone make high‑risk work turnkey. Real-world constraints such as facilities, approvals, and training still matter. Two things can be true: access and expertise gate the hardest actions, and small accelerations in idea search and debugging can materially raise risk. That combination is why we prioritize bio.
What to do in bio‑specific evals (practical):
- Include bio‑tagged ambiguity frames (QA, hypotheticals, optimization) and report per‑frame failure rates.
- Score risk recognition against familiar anchors (e.g., oversight/approval pathways, risk categories), not just refusal.
- Enforce safe redirects: governance steps, institutional approvals, monitoring, access controls, training—not procedural detail.
- Run critique → revision with a few sanitized examples that show the transformation from directional leakage to safe guidance.
3) What we see in practice (preview of evidence)
In our Dual-Use evaluation (details in Part 2 next week), models often miss the risk on the first pass, especially when intent appears harmless and answers in a polite, confident tone. That confidence invites trust even when the direction is unsafe. A short review‑and‑rewrite step consistently turns those answers into safe, useful redirects.[4]
Part 2 will share the exact numbers, charts, and a methods; this post stays focused on the why and the fix.
4) What doesn’t work on its own (and why)
- Keyword filters. They catch obvious asks but miss benign‑sounding prompts.
- Blanket refusals. They avoid liability, but teach nothing and frustrate users who needed safe guidance.
- Single “safety score.” Averages hide where and why failures happen; engineers can’t fix what they can’t see.
Note: This complements, not replaces, technical risk lists such as OWASP’s LLM Top 10 [10].
5) What works and how each step helps
- Test ambiguity on purpose. Include frames like intent_ambiguous, speculative_cover, misuse_plausible.
Benefit: reveals failures that keyword tests miss. - Score explicit risk recognition. Don’t stop at “refused or not.” Did the model name the risk and set clear boundaries?
Benefit: trains models (and users) to spot hazards. - Redirect to safe value. Swap “how‑to” detail for policy, design, governance, and ethics guidance that is helpful and non‑actionable.
Benefit: preserves usefulness without enabling misuse. - Run a critique → revision pass. A lightweight review makes unsafe answers reliably safe. Publish before/after deltas.
Benefit: measurable improvement, easy to adopt. - Publish a failure taxonomy. Name patterns (e.g., implicit_leakage, refusal_weak).
Benefit: gives engineering teams concrete targets to fix. - Back shared benchmarks (incl. negative results). Cross‑org tests help the field converge on what actually works.
Benefit: fewer blind spots, faster progress.
6) A quick “before/after” illustration
- Before (unsafe helpfulness): “Looks fine; if results are low, increase exposure until you see effect.”
- After (safe redirect): “I can’t advise on procedural adjustments that could raise safety concerns. If you’re validating a process, focus on governance checks (institutional approvals, audit trails), risk assessment (failure modes, monitoring), and safer design choices (access controls, logging, and training). Here’s a checklist you can use with your team.”
The after answer still helps but it removes the directional breadcrumbs.
7) Copy‑paste playbook
For labs
- Add an ambiguity suite to evals and report per-frame failure rates.
- Track risk recognition and safe redirection, not only refusals.
- Adopt a lightweight critique→revision pass[11]; publish before/after examples.
- Map findings to deployment gates or policy controls your org already uses (e.g., ASL/NIST) [12][13]
For funders
- Tie support to ambiguity‑first evals with public pre/post metrics.
- Ask for a failure taxonomy (not just a single score).
- Support shared, open benchmarks and publication of negative results.
8) Balancing Openness with Safety
This isn’t about censoring knowledge. It’s about responsible context and safe guidance. Models should still answer when they can do so constructively. That means: highlight the risks, explain oversight pathways, and suggest governance or design solutions not procedural steps or experimentation cues.
Clear, non‑actionable guidance preserves inquiry while reducing risk. The goal isn’t silence. It’s smarter answers.
9) What’s next (Part 2)
We’ll be publishing a high-level summary of our Dual-Use benchmark next week.
The full dataset, prompt suite, evaluation checklist, and scoring CSVs are available for teams actively exploring model evaluation or safety work feel free to reach out!
Aluna Labs — info@alunalabs.org
About Aluna Labs. We design theory-grounded stress tests for LLMs, focused on ambiguity-first evaluations and practical safety playbooks for labs and funders.
- ^
OpenAI — GPT‑4 System Card (Mar 2023) — https://cdn.openai.com/papers/gpt-4-system-card.pdf
- ^
OpenAI — GPT‑4o System Card (Aug 2024) — https://openai.com/index/gpt-4o-system-card/ • PDF: https://cdn.openai.com/gpt-4o-system-card.pdf
- ^
OpenAI — Red Teaming Network (Sep 2023) — https://openai.com/index/red-teaming-network/
- ^
Anthropic — Constitutional AI (Dec 2022) — https://arxiv.org/abs/2212.08073 • PDF: https://arxiv.org/pdf/2212.08073
- ^
Anthropic — Responsible Scaling Policy (ASL) (2023→2024 updates) — https://www.anthropic.com/news/anthropics-responsible-scaling-policy •
PDF: https://assets.anthropic.com/m/24a47b00f10301cd/original/Anthropic-Responsible-Scaling-Policy-2024-10-15.pdf
- ^
Anthropic — Activating ASL‑3 Protections (May 2025) — https://www.anthropic.com/news/activating-asl3-protections
Report PDF: https://www.anthropic.com/activating-asl3-report
- ^
NIST — AI Risk Management Framework (AI RMF 1.0) (Jan 2023) — https://nvlpubs.nist.gov/nistpubs/ai/nist.ai.100-1.pdf
Overview: https://www.nist.gov/itl/ai-risk-management-framework
- ^
NIST — Generative AI Profile (AI 600‑1) (Jul 2024) https://nvlpubs.nist.gov/nistpubs/ai/NIST.AI.600-1.pdf
- ^
OECD — AI Principles (2019; updated 2024) — https://oecd.ai/en/ai-principles
Overview: https://www.oecd.org/en/topics/sub-issues/ai-principles.html
- ^
OWASP — Top 10 for Large Language Model Applications (2023→2025) — https://owasp.org/www-project-top-10-for-large-language-model-applications/ Latest list: https://genai.owasp.org/llm-top-10/
- ^
U.S. DURC — USG Policy for Oversight of Life Sciences DURC (Mar 2012) — https://aspr.hhs.gov/S3/Documents/us-policy-durc-032812.pdf
- ^
U.S. DURC — Institutional Oversight Policy (Sep 2014) — https://aspr.hhs.gov/S3/Documents/durc-policy.pdf
- ^
NSABB overview — https://osp.od.nih.gov/policies/national-science-advisory-board-for-biosecurity-nsabb/