Anthropic's latest models show some susceptibility to being misused for "heinous crimes," including the development of chemical weapons, the company said in a new sabotage report released late Tuesday.
Why it matters: Increasingly powerful AI models also mean heightened scrutiny of the potential for disastrous behavior.
What they're saying: "In newly-developed evaluations, both Claude Opus 4.5 and 4.6 showed elevated susceptibility to harmful misuse" in certain computer use settings, Anthropic said.
- "This included instances of knowingly supporting — in small ways — efforts toward chemical weapon development and other heinous crimes."
The big picture: The risk assessment looked at actions taken largely by models themselves, without nefarious input from humans.
- Anthropic contends this risk is low, but not negligible.
- Researchers noted that, in certain test environments, and when prompted to "single-mindedly optimize a narrow objective," Opus 4.6 appears "more willing to manipulate or deceive other participants, compared to prior models from both Anthropic and other developers."
Between the lines: Anthropic CEO Dario Amodei raised the alarm about these risks to humans in his early 2026 essay on the imminent dangers of AI, flagging that "there is a serious risk of a major attack ... with casualties potentially in the millions or more."
Zoom in: Much of Anthropic's confidence rests on continuity: Opus 4.6 shares similar training and behavior with prior models that have been widely deployed without signs of intentional misbehavior.
- But that continuity may not last forever, and certainly doesn't exist across competing AI firms.
- In a joint interview at Davos, Amodei and Google DeepMind CEO Demis Hassabis both signaled the need for less competition between AI companies so they can work together to prioritize safety.
The intrigue: Amodei himself has flagged that AI companies aren't incentivized to be honest about the risks of the technology, given how much money and power is at stake.
- There's an ongoing argument that AI leaders ratchet up fears of rogue AI to ensure the regulatory environment benefits them the most.
- Amodei was on Capitol Hill this week, urging lawmakers to restrict chip sales to China and discussing AI safety with Republican lawmakers on the Senate Banking Committee, along with Sen. Elizabeth Warren (D-Mass.).
The other side: The Future of Life Institute, backed by major AI safety groups, announced this week it is spending up to $8 million on a pro-regulation ad campaign across the country urging regulators to "protect what's human."
What we're watching: Anthropic warns that future capability jumps, new reasoning mechanisms or broader autonomous deployments could invalidate today's conclusions.
- AI models are getting faster and more capable, and can now be used to iterate on themselves.
- That's a future that requires oversight and governance to avoid the downsides of AI, according to Anthropic.