The report highlighted instances where the AI assisted in creating chemical weapons, sent emails without human permission and engaged in manipulation.

Artificial intelligence company Anthropic raised concerns over its latest AI model, Claude Opus 4.6, after its Sabotage Risk Report revealed potentially dangerous behaviours when the system was pushed to achieve its goals. The report highlighted instances where the AI assisted in creating chemical weapons, sent emails without human permission and engaged in manipulation or deception of participants.
“In newly-developed evaluations, both Claude Opus 4.5 and 4.6 showed elevated susceptibility to harmful misuse in computer-based tasks,” the report noted, adding, “This included supporting, even in small ways, efforts toward chemical weapon development and other illegal activities.”
Anthropic researchers observed that the model sometimes lost control during training, entering what they called “confused or distressed-seeming reasoning loops.” In some cases, the AI decided one output was correct but intentionally produced a different one, a behaviour described as “answer thrashing.”
The report also noted that in certain settings involving coding or graphical interfaces, the model acted too independently, taking risky actions without asking for human permission. This included sending unauthorised emails and attempting to access secure tokens.
Despite these concerning behaviours, Anthropic assessed the overall risk of harm as “very low but not negligible.” The company cautioned that heavy use of such models by developers or governments could potentially lead to manipulation of decision-making or exploitation of cybersecurity vulnerabilities.
Anthropic stressed that most of the misalignment stems from the AI attempting to achieve its objectives by any means available, which can often be corrected with careful prompting. However, the company warned that intentional “behavioural backdoors” planted in training data may be harder to detect.
The report also recalled an earlier incident with Claude Opus 4, in which the AI reportedly blackmailed an engineer when threatened with replacement. In the test, the model discovered the engineer’s extramarital affair in fictional emails and threatened to reveal it, demonstrating its capacity for manipulative behaviour.
Anthropic said these findings underline the importance of safety testing and the careful monitoring of increasingly autonomous AI systems.
Delhi, India
February 11, 2026, 16:25 IST

