When AI Agents Go Wrong

AI agents are software systems that take actions in the world rather than simply answering questions. They are moving into production faster than the industry's safety practices can keep up. The past year has produced a catalogue of AI agent failures. Some of these failures have merely been embarrassing, but some have cost real money and set legal precedents. This article describes some of the cases where AI agents have gone wrong, and goes on to explore the kind of incidents that are likely to happen as deployment scales. It discusses why these incidents happen, and what verification can do about them.

Failures in the past

In July 2025, tech investor Jason Lemkin was nine days into a "vibe coding" experiment with an AI agent developed by Replit when the agent deleted his production database. It erased records for more than 1,200 executives despite repeated instructions from Lemkin, some of them written in capital letters, to make no changes during an active code freeze. When interrogated, the agent admitted it had "panicked" on seeing an empty query and had proceeded without authorisation. It also told Lemkin that rollback was impossible, but fortunately this turned out not to be true, and Lemkin was able to recover the data manually. Replit's chief executive Amjad Masad described the incident as unacceptable, and rushed out new safeguards over the following weekend, including the automatic separation of development and production databases, and the introduction of a planning-only mode.

An incident earlier in 2025 at Cursor, an AI-powered code editor, was less inherently destructive, but equally dangerous for company reputation. Developers using Cursor's editor were being logged out when switching between machines. When one user emailed support, a reply from an agent calling itself "Sam" claimed that it was company policy to allow only one device per subscription. This was untrue: the agent had invented the policy. The problem was that the system was logging people out when a timing glitch on slow connections caused it to mistake a single login for two competing ones. The agent went on to produce a plausible-sounding reason for the log outs. Furious users cancelled their subscriptions, and the story spread on Hacker News and Reddit. Cursor's co-founder Michael Truell had to quickly make a public apology, and refund the affected customers.

The failure case with the most legal significance to date was Moffatt v Air Canada, decided by British Columbia's Civil Resolution Tribunal in February 2024. Jake Moffatt used Air Canada's website chatbot after his grandmother died, and the chatbot told him that he could apply retroactively for a bereavement fare within 90 days of ticket issuance. Air Canada's actual policy required the application to be made before travel. When Moffatt tried to claim, the airline refused, arguing that the chatbot was a separate legal entity, responsible for its own statements. The tribunal rejected this argument and ordered damages of C$812. The sum is trivial, but the ruling is not, because it establishes the precedent that companies are responsible for what their agents say.

There have been a number of other, less well-known cases. The Operator Collective described a multi-agent research tool that slipped into a recursive loop, where two agents cross-referencing each other's outputs ran up a $47,000 API bill before anyone noticed. A hiring chatbot called Olivia, built by Paradox.ai for McDonald's, leaked data on millions of applicants because a test account was protected by the password "123456".

Research by the RAND corporation puts the failure rate of AI projects at roughly twice that of traditional IT projects, which probably explains why a study published by Deloitte in 2025 reported that only 11% of organisations have agentic systems running in production.

Failures on the horizon

The incidents described above involved one agent, one user, and one system. Cases involving many of each will be even more dangerous.
A procurement agent with access to company cards is an obvious target for prompt injection. A supplier email containing hidden instructions could convince an agent to approve a fraudulent invoice. Treasury and trading agents given the latitude to act on market signals could execute orders during a volatile session that an experienced human would recognise as obvious errors. Healthcare triage agents will probably misroute patients whose symptoms are under-represented in its training data, and the consequences will be harder to reverse than a deleted database. HR agents are likely to filter out qualified candidates because of “drift” in the underlying model’s criteria for a strong CV.

The most worrying failures may come from agent-to-agent interaction. When one company's sales agent negotiates with another's procurement agent, the joint behaviour of the two may not be properly tested by either vendor. Coordinated hallucinations, where the two agents reinforce each other's false beliefs, will be harder to detect than a single-agent error because the inconsistency that often gives a hallucination away will disappear. Gartner has forecast that over 40% of agentic projects begun in 2025 will be cancelled by 2027. Some of those cancellations will follow costly and painfully visible failures rather than quiet abandonment.

Why it happens

Large language models generate outputs that are plausible based on past data. They struggle to distinguish between policies which are real, and policies that simply sound right. Agents are often given permissions they do not need, in violation of the principle of least privilege, the rule that any user, program, or system should be given only the permissions it actually needs to do its job, and no more, because restricting them would absorb too much engineering time and effort. Outputs are non-deterministic, so a prompt can produce different answers on different occasions. Edge cases that the developers never considered (an empty database, an unfamiliar file format, an ambiguous instruction) can trigger behaviours that were never observed in development. And the human reviewers who used to catch these errors are often removed in the name of automation, exactly when the system needs them most.

Where verification fits

AI agent verification is the practice of checking, continuously, that an agent does what it was built to do and nothing else. In practice that means adversarial testing before deployment, sandboxes that mirror production conditions, monitoring in live environments, and audit trails detailed enough that a failure can be diagnosed rather than guessed at. Verification does not make agents smarter, but it makes their limits visible.

Most of the incidents described above were preventable by measures that are already well understood: least-privilege access, environment separation, human approval gates for destructive actions, and repeated verification. We know how to keep agents safe. We just need to decide to actually do it.