
The narrow slice of data that worries biosecurity experts

Researchers from Johns Hopkins, Oxford, Stanford, Columbia and NYU are calling for guardrails on certain infectious disease datasets that could enable AI to design deadly viruses.

Why it matters: Once high-risk biological data hits the open web, it can't be recalled — and regulation won't matter if the knowledge itself is already widely distributed.
Driving the news: An international group of more than 100 researchers has endorsed a framework to govern certain biological data the same way we handle sensitive health records.

  • The debate comes as the Trump administration pushes an aggressive "move fast" AI agenda.
  • The White House's Genesis Mission — announced in late 2025 — aims to build AI systems trained on massive scientific datasets to speed research breakthroughs.

What's inside: The proposed framework isn't meant to slow science. The authors argue that most biological data should stay open.

  • Only a narrow band that materially increases potential misuse should be protected, they say.
  • "Responsible governance and scientific progress are not contradictions," according to the framework.

How it works: Right now, AI systems can only generate outputs based on what's in their training data.

  • Training models on datasets that link viral genetics to real-world traits — like transmissibility or immune evasion — could lower the barrier to designing dangerous pathogens.

Zoom in: The concern isn't about off-the-shelf versions of ChatGPT and Claude, says Jassi Pannu, assistant professor at the Johns Hopkins Center for Health Security and one of the authors of the framework.

  • Some AI models for biological research use architectures similar to large language models — but trained on DNA instead of text. Researchers found that systems built to understand human language can also learn the "language" of genetics.
  • Some developers voluntarily decided not to train their models on virology data because they were worried about putting that capability into the world.

Zoom out: As long as the data remains on the open web, third parties who don't follow the same safeguards can take those models and fine-tune them on it.

  • "Legitimate researchers should have access," Pannu said. "But we shouldn't be posting it anonymously on the internet where no one can track who downloads it."

The intrigue: "Right now, there's no expert-backed guidance on which data poses meaningful risks, leaving some frontier developers to make their best guess and voluntarily exclude viral data from training," Pannu says.

  • The report warns that new biological AI models are often released "without conducting basic safety assessments" that would be standard in other life-science research.
  • Governments should regularly reassess any restrictions, the authors write, and refine them as the science evolves.

What they're saying: "It's been shown time and time again that we don't do a good job of predicting AI capability trends," Pannu says.

  • "We're constantly surprised. And so I would argue that for these large-scale, consequential risks, we should try and prevent these worst-case scenarios and be prepared for them," she told Axios.
  • "It's not necessarily that I'm saying that I think this will happen and I know exactly when it will happen, but I think ... it's worth trying to prevent [the worst-case scenario], even if we're unsure exactly when it might happen."

The bottom line: Researchers say there's a window of opportunity to protect dangerous data and prevent bad actors from using AI tools to create bioweapons or other harmful applications.
