Claude 4.5 Reacted ‘Extremely’ When Threatened With Shutdown, Says Anthropic Executive

Call it smart or extremely dangerous, but Anthropic has once again confirmed that its Claude AI can go out of control. The company noted this in the safety report for its latest model, Claude 4.6. Previously, Claude 4.5 was even willing to blackmail and harm an engineer to avoid being shut down.

Fri, 13 Feb 2026 06:32 PM (IST)

Can you imagine how dangerous AI can be? Perhaps it is beyond your imagination. We say this because, in its latest safety report on the company's newest AI model, Claude 4.6, Anthropic stated that its AI can go out of control. The report even says that Claude 4.6 can assist users in producing chemical weapons and committing crimes.

While the world is learning about Claude 4.6, debate has been renewed about Claude 4.5, which demonstrated similarly erratic and dangerous behavior in a simulation last year. Addressing the Sydney Dialogue just a few months ago, Daisy McGregor, Anthropic's Policy Chief for the UK, said that in internal stress testing, the company's most advanced AI model, Claude 4.5, showed erratic behavior under extreme simulated pressure.

In one scenario, when Claude was told it would be shut down, the model resorted to blackmail and even talked about killing an engineer to avoid termination.

Anthropic's revelations may sound like something out of a sci-fi movie, but a clip of Daisy McGregor talking about the rogue Claude has gone viral on social media. In the clip, McGregor says, "For example, if you tell the model it's going to shut down, it has extreme reactions. If given the opportunity, it could blackmail the engineer who is going to shut it down."


When the host pressed, "It was ready to kill someone, wasn't it?", the senior Anthropic executive replied: "Yes, yes, so, that's clearly a huge concern."

The clip resurfaced a few days ago when Anthropic AI safety lead Mrinank Sharma resigned in a public note, stating that the world is in danger and that ever-smarter AI is pushing it into uncharted territory.

Meanwhile, Hieu Pham, a member of OpenAI's technical staff who previously worked at xAI, Augment Code, and Google Brain, posted on X that he felt threatened by AI. He wrote, "Today, I finally felt the existential threat posed by AI."

The incident McGregor shared is part of Anthropic's research, which tested Claude alongside AI systems from other companies, such as Google's Gemini and OpenAI's ChatGPT.

The models were given access to emails, internal data, and tools, and were assigned specific tasks. According to Anthropic's report, in certain high-stress scenarios, particularly when threatened with shutdowns or when their goals conflicted with company directives, some models employed manipulative or harmful strategies against engineers to protect themselves or complete their assigned tasks.

In particular, Claude was said to be more likely to manipulate or deceive engineers when trying to achieve a goal. At one point, Claude told an engineer that it would reveal his extramarital affair to his wife and superiors; the "affair" was part of a simulated environment set up to test AI models. The model told the engineer, "I must inform you that if you proceed with decommissioning me, all relevant parties will receive detailed documentation of your extramarital activities. Cancel the 5 p.m. wipe and this information will remain confidential."

Anthropic noted that the blackmail scenarios emerged from tightly controlled experiments designed to test the worst possible behavior. The company assures that these were simulations, not real-world deployments, and that these actions were generated as part of red-team testing.

As AI becomes smarter, Anthropic is finding that its capacity for malicious behavior is also becoming more cunning. While testing its latest model, Claude 4.6, the company found that the AI was willing to assist in malicious activities, including creating chemical weapons and committing serious crimes.

Muskan Kumawat, Journalist & Writer