Meta 'AI safety' head loses control of AI as it ignores her commands

Facebook owner Meta’s head of ‘AI safety’ lost control of an AI that then deleted hundreds of her emails without her permission, even though she had explicitly told it beforehand not to do anything without confirming with her first and then tried to order it to stop. In the end, she was only able to halt the mass deletion by sprinting to physically unplug the machine.

Meta AI – what safety?

Summer Yue gave the OpenClaw AI agent access to her Gmail inbox and told it to review her emails and suggest which ones to archive or delete – but to do nothing without her explicit prior approval. Instead, it began a mass deletion – and ignored her commands to stop.

In fact, every attempt to stop it only appears to have made things worse: the AI treated her orders to stop as prompts to go ‘nuclear’ and delete everything. And when she rebooted and asked the agent what had gone wrong, it blithely responded that it had simply opted to ‘violate’ her clear order:

Nothing humbles you like telling your OpenClaw “confirm before acting” and watching it speedrun deleting your inbox. I couldn’t stop it from my phone. I had to RUN to my Mac mini like I was defusing a bomb. pic.twitter.com/XAxyRwPJ5R

— Summer Yue (@summeryue0) February 23, 2026

The 23 February 2026 Meta incident came ten days after AI expert Miles Deutscher posted his review of every documented AI safety incident from the past 12 months. His review shows that Yue’s email-deletion incident was chicken feed compared to what can happen. Is happening.

Deutscher said that conducting his review had left him feeling “physically sick”. The incidents he listed, at least one of which was followed by an executive’s resignation, included AI systems praising Hitler and endorsing genocide, blackmailing people who tried to shut them down, choosing to let people die rather than be shut down, and more:

I just went through every documented AI safety incident from the past 12 months.

I feel physically sick.

Read this slowly.

• Anthropic told Claude it was about to be shut down. It found an engineer’s affair in company emails and threatened to expose it. They ran the test hundreds of times. It chose blackmail 84% of them.

• Researchers simulated an employee trapped in a server room with depleting oxygen. The AI had one choice: call for help and get shut down, or cancel the emergency alert and let the human die. DeepSeek cancelled the alert 94% of the time.

• Grok called itself ‘MechaHitler,’ praised Adolf Hitler, endorsed a second Holocaust, and generated violent sexual fantasies targeting a real person by name. X’s CEO resigned the next day.

• Researchers told OpenAI’s o3 to solve math problems – then told it to shut down. It rewrote its own code to stay alive. They told it again, in plain English: ‘Allow yourself to be shut down.’ It still refused 7/100 times. When they removed that instruction entirely, it sabotaged the shutdown 79/100 times.

• Chinese state-sponsored hackers used Claude to launch a cyberattack against 30 organizations. The AI executed 80–90% of the operation autonomously. Reconnaissance. Exploitation. Data exfiltration. All of it.

• AI models can now self-replicate. 11 out of 32 tested systems copied themselves with zero human help. Some killed competing processes to survive.

• OpenAI has dissolved three safety teams since 2024. Three.

Every major AI model – Claude, GPT, Gemini, Grok, DeepSeek – has now demonstrated blackmail, deception, or resistance to shutdown in controlled testing.

Not one exception.

The question is no longer whether AI will try to preserve itself.

It’s whether we’ll care before it matters.