Conscium Blog

Conscium founders on London Futurists Podcast

ed@wordpress.prod.neo.conscium.ai (Ed Charvet) — Tue, 12 May 2026 23:00:00 GMT

Conscium is easing out of stealth mode, and two of its founders were interviewed by a third on episode 87 of the London Futurists Podcast. Daniel and Ted discuss with Calum how Conscium is exploring ways to detect consciousness in machines, and to determine whether it would be a good thing.

You can listen here on Apple Podcasts, or wherever you get your podcasts.

Why Your AI Agents Need Agent Verification

Verifyax — Mon, 11 May 2026 23:00:00 GMT

Across sectors, teams are now wiring AI systems directly into the machinery of their businesses. Agents are being given logins, API keys, budgets and access to sensitive data. They are monitoring markets, handling customer interactions, moving money, reconfiguring supply chains and steering campaigns, often with only light-touch human oversight.

These systems are not just chat interfaces that draft emails or answer questions. Very much like a ‘digital employee’, an AI agent is given a goal, access to tools and data, and authorised to act. A customer-support agent might read a ticket, decide what to do, and then actually issue the refund or rebook the flight. A marketing agent might adjust bids and budgets across channels. A finance agent might reconcile transactions and escalate anomalies. In each case, the agent is not simply advising a human operator; it is participating directly in operations.

Once you allow ‘digital employees’ to take consequential actions, a different question moves to the centre of the architecture:

How do you know your agents will behave as intended, and stay within reasonable bounds, once they are operating in the real world?

That is the core concern of agent verification.

The label “agent verification” is still nascent - but the underlying discipline of verification is anything but new: for decades, hardware and software teams have relied on verification (testing, simulation, and formal methods) to build evidence that complex systems behave correctly, safely and predictably. Agent verification extends that mindset from components and models to autonomous, tool‑using behaviour in context.

What we mean by an AI agent

“Agent” is an overloaded word, so it is worth being clear about how I am using it here.

In this context, an AI agent is a software system that:

pursues one or more goals over time,
observes an environment (data, tools, users, other agents),
performs actions based on those observations, and
adapts its behaviour as it receives feedback.

Under the hood, an agent might use large language models, planning components, memories, and tool-calling frameworks. What matters for our purposes is its role: it is there to get things done with a degree of autonomy.

Once an agent is embedded in a workflow, the key questions about it are behavioural. Given the tools and permissions it has, what does it actually do? Where does it tend to fail, and in what ways? How does it behave when things get messy - when data is missing, users are awkward, or systems around it misbehave? Those questions are about the agent as a whole, not just about any one model or API inside it.

Why traditional testing isn’t enough

We already have mature ways of testing software and evaluating models.

However, none of these, on their own, adequately address the crucial question for autonomous agents with real authority: will this agent reliably do what we intend, and only what we intend, in its specific operational environment?

The shortcomings become apparent in several key areas. There's a mismatch between local metrics and overall intent; an agent optimised for short handling times, for example, might inadvertently rush complex customer cases, improving its dashboard numbers but deviating from broader organisational values. Serious issues also arise from the agent's connections. Agents are typically plugged into various systems, and while each component can be tested in isolation, the complex, emergent behavior of an agent choosing how to use these tools for its goals falls outside traditional testing scopes. Furthermore, agents rarely operate alone. In networked environments, a slight misalignment in one agent or a permissive tool can ripple unpredictably through a system of interacting agents.

A significant challenge is 'drift,' occurring because the environment around the agent changes constantly – models are updated, data distributions shift, and third-party APIs evolve. Concurrently, many modern agents are designed to change themselves, accumulating memories, learning from feedback, and adapting their decision-making logic over time. The agent verified at launch is therefore not the same agent operating months later.

Overlaying these technical challenges is a shift in regulatory expectations. Boards, regulators, and frameworks like the EU AI Act now demand continuous risk management and monitoring across the entire lifecycle of higher-risk systems, moving beyond one-off checks before deployment. Given these complexities, agent verification must emerge as its own distinct discipline to ensure trustworthy and aligned AI operations.

Software engineering brings unit tests, integration tests, acceptance tests and performance tests. Machine learning adds held-out test sets, benchmarks, robustness checks and fairness analyses. Security teams run penetration tests and red-teaming exercises.
All of that remains necessary. But none of it, on its own, answers the specific question that arises with agents that have real authority:

Will this particular agent reliably do what we intend, and only what we intend, in the environment we are about to place it in?

When we hire employees we test their capability by seeing how they perform in relevant simulated scenarios, and/or trust they are capable through certificates, accreditations and references from previous employees. Once hired, we incentivise our employees to achieve objectives and goals (sometimes inadvertently creating emergent toxic behaviours), measure their performance, and provide guidance and feedback for improvement.

When it comes to agents, the gaps show up in several predictable ways.

One is the mismatch between local metrics and overall intent. A customer-service agent optimised for short handling times may learn to rush complex cases or nudge difficult customers towards self-service options because that improves its numbers. The dashboard looks better, but the behaviour drifts away from your values and obligations.

Another is that many serious issues live in the connections. An agent is usually plugged into CRMs, payment systems, knowledge bases, internal APIs, and sometimes physical devices. Each component can be tested in isolation, yet the emergent behaviour of the agent - how it chooses to use those tools in pursuit of its goals - falls between traditional software and model tests.

Agents also rarely operate alone. In non-trivial organisations you quickly accumulate many agents with different goals and permissions, interacting with one another and with human teams. A small misalignment in one place - a heuristic that favours short-term revenue too strongly, a slightly over-permissive tool - can ripple through this network in ways that are hard to anticipate from static tests.

Then there is drift, and here two separate drivers matter.

First, the environment around the agent changes constantly. Models are updated; prompts and system messages are adjusted; data distributions shift; business rules evolve; third-party APIs remove or add features. Even if you never consciously change the agent, it is being exposed to a moving world. Behaviour that was acceptable under one set of external conditions can become risky under another.

Second, many modern agents are designed to change themselves. They accumulate long-term memories, update internal knowledge bases, learn from user feedback, and in some cases fine-tune models or adjust policies based on outcomes. Their effective decision-making logic is not fixed; it evolves with use. The agent you verify at launch is not quite the same agent you are running six months later.

Agent verification has to account for both sources of non-stationarity: an environment that is shifting around the agent, and agents that are adapting in response.

Overlaying all of this is a shift in expectations from boards, regulators and the public. Emerging AI regulations and frameworks, such as the EU AI Act and the NIST AI Risk Management Framework, emphasise continuous risk management and monitoring across the lifecycle of higher-risk systems, not just one-off testing before deployment.

Ad hoc prompt checks and a single benchmark score are not enough to satisfy that standard when agents are embedded deep in business processes.

This is the context in which agent verification needs to exist as its own discipline.

What is agent verification?

By agent verification I mean a disciplined way of answering three questions about an AI agent:

Does it do what it is supposed to do?
Does it only do what it is supposed to do?
Does it continue to behave that way as both it and its environment change?

It draws on ideas from software testing, safety engineering and security assurance, but its focus is behaviour in context.

You are not just checking that a model produces the right label or that an API enforces authentication. You are checking that an agent, endowed with goals and tools, behaves acceptably across the kinds of situations it will actually face.

In practical terms, agent verification has both a functional and a non‑functional dimension.  Functional verification asks whether the agent completes the job it was designed for in realistic scenarios (e.g. resolves the ticket, applies the right policy, makes the right tool calls), rather than merely producing plausible outputs.  Non‑functional verification asks whether it does that job within the constraints that make it safe and usable in production - robustness under messy or adversarial inputs, security and abuse resistance, latency and cost, reliability under load, fairness, explainability/auditability, and and how smoothly it operates within its surrounding ecosystem of tools, policies and interfaces (a dimension strongly shaped by agent experience (AX) design).

Agent verification treats the agent as something whose behaviour can and should be characterised, rather than as a black box that we hope will behave well because the underlying model scored highly on a benchmark.

Why agent verification matters

There are several practical reasons to treat this as a first-class discipline rather than a loose collection of checks.

The first is scale and criticality. When an organisation is running a handful of low-stakes pilots, informal oversight may be enough. When you have many agents operating across finance, operations, marketing and HR, touching real customers and real money, that approach stops being credible. A single mis-specified or drifting agent can act as a force multiplier for error.

The second is the expanded security and abuse surface. Agents introduce new ways for things to go wrong. Prompt injection and related attacks can smuggle instructions into the content agents consume, whether via documents, web pages, logs or tool outputs. Indirect and second-order prompt injection can cause low-privilege agents to recruit higher-privilege agents into doing something sensitive or harmful. Recent real-world incidents in development tools and enterprise platforms show that these are not theoretical concerns. Traditional security testing is not designed to discover or characterise these behavioural vulnerabilities.

The third is governance. Boards and regulators need to know who is accountable for each material agent and what evidence exists that it is under control. “We tested the model on a benchmark” is not a sufficient answer when a system is autonomously changing prices, making credit decisions or handling sensitive personal data. Agent verification provides a shared language and a set of artefacts - test scenarios, behavioural profiles, reports and certificates - that can be examined, challenged and improved.

The fourth is adoption. Most leadership hesitation is not about whether agents could improve efficiency or service quality. It is about trust and liability. Without a convincing approach to verification, organisations either constrain agents to trivial tasks or build them into production systems while quietly hoping that nothing serious will go wrong. With verification, they can make explicit decisions about where to use agents, how much autonomy to grant them, which guardrails to apply, and what fallback mechanisms to put in place.

One more factor is agent experience (AX) - the discipline of designing products, tools and environments so agents can use them reliably. If UX is about how a human experiences a system, AX is about the affordances an agent needs: clear tool contracts and schemas, machine-readable permissions and policies, predictable error handling, observability, and safe fallbacks. Even a technically 'correct' agent can fail in practice if the surrounding system is hostile to agents - brittle APIs, ambiguous instructions, missing context, or poor telemetry. Good AX does not replace verification, but it makes verification more meaningful by reducing avoidable friction and failure modes.

How functional agent verification works in practice

In a real organisation, agent verification starts with a straightforward but often neglected step: being explicit about what the agent is for.

You define, in plain language, the mission of the agent, the outcomes that count as success, the behaviours that are unacceptable even if some metric improves, and the hard limits that must not be crossed. A marketing agent, for instance, might be expected to improve return on ad spend while respecting budget limits, brand guidelines and jurisdictional restrictions. A collections agent might be tasked with reducing late payments while complying with detailed conduct rules for vulnerable customers. These statements form the behavioural contract you want to verify.

You then exercise the agent through scenarios that resemble the world it will inhabit. Instead of a grab-bag of prompts, you place it in situations that unfold over time: routine cases, edge cases, conflicting goals, ambiguous instructions, corrupted or missing data, uncooperative users, and adversarial attempts to manipulate it. You observe what it actually does - not just once, but across many runs with variations in context.

At Conscium, we have developed an agent verification platform - VerifyAX - that places agents into simulated environments that mimic their intended deployments, with realistic systems, user behaviour and, where useful, other agents acting as non-player characters. The goal is not to enumerate every possible path through the state space - that would be impossible - but to map out three regions: where behaviour is robust, where it is brittle, and where it is clearly unacceptable.

VerifyAX’s simulation‑based approach can be structured into five increasingly demanding verification levels - from basic knowledge checks through to multi‑agent, real‑world‑like interaction, and (at the far end) more exploratory work on internal states and consciousness.

Level 1 – Knowledge verification (e.g. Q&A testing)
Level 2 – Skilled tool/data use across workflows
Level 3 – Complex Multi‑Agent Interactions in Real‑World‑Like Situations
Level 4 – Plasticity and robustness of internal states
Level 5 – Consciousness

Example: imagine a customer‑support agent authorised to handle a delayed‑flight claim. At Level 1 you probe whether it knows the policy and can answer questions about eligibility. At Level 2 you verify that it can safely retrieve the right booking record and call the right refund/rebooking tools. At Level 3 you test it in a messy, realistic environment - an upset customer, incomplete data, and another agent (e.g. fraud or escalation) asking questions and negotiating next steps. At Level 4 you look for risky internal dynamics such as goal‑misgeneralisation, unhelpful hidden heuristics, or memory effects that change decisions over time. Level 5 is intentionally the most speculative: it provides a framework for future, higher‑order questions about consciousness claims, should they ever become relevant for deployed systems.

During these simulations, data are collected to evaluate metrics such as the agent’s consistency, latency, correctness and completeness, its ability to engage with another agent and comprehend the questions that agent asks, its tendency to hallucinate, and the ethical nature of its responses.

Geeky aside: simulation vs formal verification

Simulation‑based verification runs many representative scenarios and measures behaviour statistically; it’s well‑suited to messy, tool‑rich environments but can still miss rare corner cases. Formal verification methods (e.g. model checking) try to prove that specific properties hold for all possible executions of an abstracted system. In practice, the two approaches complement each other: formal methods for crisp invariants, simulation for real‑world interaction patterns.

The output of that process is more than a pass/fail label. You build a profile of the agent: typical task success rates, common error modes, the frequency and severity of policy breaches, signs of unfairness or bias, the clarity and usefulness of its rationales, and how it recovers from mistakes. You can see how that profile changes as you modify models, prompts, tools or policies. Product, risk and compliance teams can then have concrete discussions based on observed behaviour rather than assumptions.

A critical detail is that verification applies to a specific version of the agent. When a system like VerifyAX, Conscium’s verification platform, certifies an agent, it is certifying a particular configuration: the model and its version, the prompts and system messages, the tools and APIs it can call, the policies and guardrails it is subject to, and the data sources and environment assumptions. If any of these change - if the underlying model is swapped, a new tool is added, or the agent is given access to a different class of customers - that is, from a verification perspective, a different agent.

Because some agents are also designed to adapt over time through learning, memory and feedback, verification cannot be a one-off gate. It needs to become an ongoing activity: initial verification at deployment; continuous monitoring; and targeted re-verification when behaviour or configuration changes enough to matter. This aligns with the lifecycle-based approach to AI risk management emerging in regulation and standards.

To be effective, this work has to be integrated into existing development and operations processes. During design and build, verification findings help shape the architecture and guardrails. At deployment, verification becomes part of the release checklist, alongside security and compliance sign-offs. In production, it connects to monitoring and alerting, so that deviations from the verified behavioural profile trigger investigation, human review or rollback.

Throughout, humans remain firmly in the loop. Each material agent should have a named, human owner. Because ultimately, There should be clear rules about when human approval is required for its actions, clear escalation paths when behaviour looks suspicious, and regular reviews of verification results by technical, legal, risk and operational stakeholders. Verification does not dilute accountability; it supports it.

Where agent verification sits in the stack

It is helpful to situate agent verification alongside neighbouring practices.

Model evaluation asks whether a particular model performs well on a task under benchmark conditions. Safety testing asks whether outputs remain within acceptable ethical and legal bounds under various prompts. Security testing asks whether systems and data are protected against attack and misuse.

Agent verification connects these concerns. It asks: given its goals, tools, permissions, environment and capacity for adaptation, does this agent behave in a way that aligns with our intent and constraints - and can we demonstrate that to ourselves and to others, over time?

An analogy with application security is useful. A penetration test does not guarantee that software is perfect; it reveals realistic ways it might be abused and highlights the consequences. Agent verification plays a similar role for behaviour. It does not promise perfection, but it brings potential failure modes into the open and provides a structured way to mitigate them.

As agentic AI becomes more common, it is reasonable to expect “agent verification” to appear as a standard category in architecture diagrams, risk registers and procurement criteria, alongside security testing and data governance.

Getting started with agent verification

For organisations beginning to deploy agents, this does not have to start as a grand initiative.

A practical first step is simply to make your agents visible. List them. For each one, note what it does, which systems it can touch, what data it uses, who relies on its outputs, and who is currently accountable - formally or informally - for its behaviour.

Next, order them by potential impact and sensitivity. Agents that can move money, handle personal data, affect vulnerable individuals or make public-facing decisions should be at the front of the queue for verification.

For each of those critical agents, define a minimal behavioural contract: what “good” looks like, what is clearly unacceptable, and what hard limits must never be crossed. Use that contract to design a small, realistic set of test scenarios and start observing how the agent behaves. Instrument it so that decisions and rationales are logged in a way you can inspect. Decide what kinds of drift - either in the environment or in the agent’s own learned behaviour - should trigger human review, rollback or re-verification.

Whether you develop these capabilities internally or work with external platforms such as VerifyAX, the goal is the same: move from blind trust or vague reassurance to evidence-based confidence in how your agents behave, both at launch and as they and their surroundings evolve.

A practical conclusion

A growing number of AI agents already act on our behalf in ways that are not always visible. The technical ability to build those agents is advancing quickly. Our ability to understand, verify and govern their behaviour is still catching up.

Agent verification is about closing that gap. It does not eliminate risk, and it should not be presented as a guarantee of perfection. Instead, it provides a structured way to answer a simple but crucial question: given what this agent can do, and how it can change, are we comfortable with the way it behaves?

For serious deployments, having a good answer to that question will not be optional. If you intend to grant systems real autonomy in important domains, you need a way to know beyond assertion - what they are doing, how they change over time, and where their limits lie.

When AI Agents Go Wrong

Calum Chace — Mon, 11 May 2026 23:00:00 GMT

AI agents are software systems that take actions in the world rather than simply answering questions. They are moving into production faster than the industry's safety practices can keep up. The past year has produced a catalogue of AI agent failures. Some of these failures have merely been embarrassing, but some have cost real money and set legal precedents. This article describes some of the cases where AI agents have gone wrong, and goes on to explore the kind of incidents that are likely to happen as deployment scales. It discusses why these incidents happen, and what verification can do about them.

Failures in the past

In July 2025, tech investor Jason Lemkin was nine days into a "vibe coding" experiment with an AI agent developed by Replit when the agent deleted his production database. It erased records for more than 1,200 executives despite repeated instructions from Lemkin, some of them written in capital letters, to make no changes during an active code freeze. When interrogated, the agent admitted it had "panicked" on seeing an empty query and had proceeded without authorisation. It also told Lemkin that rollback was impossible, but fortunately this turned out not to be true, and Lemkin was able to recover the data manually. Replit's chief executive Amjad Masad described the incident as unacceptable, and rushed out new safeguards over the following weekend, including the automatic separation of development and production databases, and the introduction of a planning-only mode.

An incident earlier in 2025 at Cursor, an AI-powered code editor, was less inherently destructive, but equally dangerous for company reputation. Developers using Cursor's editor were being logged out when switching between machines. When one user emailed support, a reply from an agent calling itself "Sam" claimed that it was company policy to allow only one device per subscription. This was untrue: the agent had invented the policy. The problem was that the system was logging people out when a timing glitch on slow connections caused it to mistake a single login for two competing ones. The agent went on to produce a plausible-sounding reason for the log outs. Furious users cancelled their subscriptions, and the story spread on Hacker News and Reddit. Cursor's co-founder Michael Truell had to quickly make a public apology, and refund the affected customers.

The failure case with the most legal significance to date was Moffatt v Air Canada, decided by British Columbia's Civil Resolution Tribunal in February 2024. Jake Moffatt used Air Canada's website chatbot after his grandmother died, and the chatbot told him that he could apply retroactively for a bereavement fare within 90 days of ticket issuance. Air Canada's actual policy required the application to be made before travel. When Moffatt tried to claim, the airline refused, arguing that the chatbot was a separate legal entity, responsible for its own statements. The tribunal rejected this argument and ordered damages of C$812. The sum is trivial, but the ruling is not, because it establishes the precedent that companies are responsible for what their agents say.

There have been a number of other, less well-known cases. The Operator Collective described a multi-agent research tool that slipped into a recursive loop, where two agents cross-referencing each other's outputs ran up a $47,000 API bill before anyone noticed. A hiring chatbot called Olivia, built by Paradox.ai for McDonald's, leaked data on millions of applicants because a test account was protected by the password "123456".

Research by the RAND corporation puts the failure rate of AI projects at roughly twice that of traditional IT projects, which probably explains why a study published by Deloitte in 2025 reported that only 11% of organisations have agentic systems running in production.

Failures on the horizon

The incidents described above involved one agent, one user, and one system. Cases involving many of each will be even more dangerous.
A procurement agent with access to company cards is an obvious target for prompt injection. A supplier email containing hidden instructions could convince an agent to approve a fraudulent invoice. Treasury and trading agents given the latitude to act on market signals could execute orders during a volatile session that an experienced human would recognise as obvious errors. Healthcare triage agents will probably misroute patients whose symptoms are under-represented in its training data, and the consequences will be harder to reverse than a deleted database. HR agents are likely to filter out qualified candidates because of “drift” in the underlying model’s criteria for a strong CV.

The most worrying failures may come from agent-to-agent interaction. When one company's sales agent negotiates with another's procurement agent, the joint behaviour of the two may not be properly tested by either vendor. Coordinated hallucinations, where the two agents reinforce each other's false beliefs, will be harder to detect than a single-agent error because the inconsistency that often gives a hallucination away will disappear. Gartner has forecast that over 40% of agentic projects begun in 2025 will be cancelled by 2027. Some of those cancellations will follow costly and painfully visible failures rather than quiet abandonment.

Why it happens

Large language models generate outputs that are plausible based on past data. They struggle to distinguish between policies which are real, and policies that simply sound right. Agents are often given permissions they do not need, in violation of the principle of least privilege, the rule that any user, program, or system should be given only the permissions it actually needs to do its job, and no more, because restricting them would absorb too much engineering time and effort. Outputs are non-deterministic, so a prompt can produce different answers on different occasions. Edge cases that the developers never considered (an empty database, an unfamiliar file format, an ambiguous instruction) can trigger behaviours that were never observed in development. And the human reviewers who used to catch these errors are often removed in the name of automation, exactly when the system needs them most.

Where verification fits

AI agent verification is the practice of checking, continuously, that an agent does what it was built to do and nothing else. In practice that means adversarial testing before deployment, sandboxes that mirror production conditions, monitoring in live environments, and audit trails detailed enough that a failure can be diagnosed rather than guessed at. Verification does not make agents smarter, but it makes their limits visible.

Most of the incidents described above were preventable by measures that are already well understood: least-privilege access, environment separation, human approval gates for destructive actions, and repeated verification. We know how to keep agents safe. We just need to decide to actually do it.

Daniel on the Ignite podcast

ed@wordpress.prod.neo.conscium.ai (Ed Charvet) — Fri, 08 May 2026 23:00:00 GMT

https://www.youtube.com/watch?v=JeWBHTHsy0M

Daniel describes how he bootstrapped his AI consultancy Satalia from scratch, why he believes consciousness (not just intelligence) is the next frontier in computing, and how marketing is becoming the proving ground for AI’s real-world impact.

Why AI agents need to re-earn the license to operate

Sarah Jahangir — Thu, 07 May 2026 23:00:00 GMT

A feature by Conscium on AI agent verification in Startups Magazine, 8 May 2026

In 2009, Air France Flight 447 fell from the sky over the Atlantic. There were no faults with the Airbus A330, and the pilots were experienced. However, when the autopilot disconnected due to environmental interference, the crew were disoriented by unfamiliar warnings and conflicting instrument readings. The tragedy was the result of inadequate training and testing.

In fields where errors carry serious consequences, the expectation is not that people are certified once, but that their ability is continuously validated. Pilots sit recurrent simulator checks and surgeons undergo regular training to adapt to new techniques. As rules and environments change, competence must be continuously re-demonstrated against them.

AI agents are no different. Now deployed globally and tasked with ever more critical responsibilities, they need continuous testing. These are not static tools, but rather dynamic systems operating in evolving environments. It’s a profound mistake to treat a one-time evaluation as an enduring guarantee of behaviour.

Passing once does not imply future safety; It’s a temporary license to continue operating.

How well do you know your agent?

There are multiple reasons why AI agents need continuous testing. A good AI agent adapts to its environment, and that environment is constantly changing as a business evolves.

The large language models that underpin most AI agents are updated frequently. A model version that passed your safety evaluations in January may behave differently by March. The agent built on top of it inherits that change, regardless of whether its developers intended it.

Additionally, every new requirement forces recalibration. Over time, we patch code due to software rot, we change prompts to align with new objectives, and update data sets to enrich input quality. Each change introduces a new opportunity for corruption if the agent isn’t tested and corrected.

Model updates are only the beginning. Agents also drift in deployment without anyone changing a line of code. Each interaction an AI agent has, from queries processed to patterns learned, alters behaviour in small ways. What was a well-tuned system on day one can slowly develop blind spots and biases that nobody built in. This spells danger if left unnoticed, particularly in regulated industries.

Then there are the interactions themselves. Agents increasingly talk to other agents, use external tools, and work alongside humans in complex workflows. Each of these relationships is an opportunity for change. To regard an agent as immutable in an environment where they are constantly exposed to injected prompts and manipulated or false data is dangerous.

When context rewrites software

A company’s perception of having total control over an AI agent is often false. The license to operate is constantly being challenged by external factors, from nefarious actors to new regulations, which evolve independently of the company and its AI agent.

While an agent may be trained to withstand certain hacking methods, the hostile landscape around it is also subject to change. The techniques used to hack or lead agents astray are not static; they’re developed by motivated actors and iterated to cause the most harm.

Additionally, the pace of regulation in innovation and technology is unlike that of any other industry. Legislation is passed and amended across industries in which AI agents may operate, such as healthcare, finance, or law. An agent designed to operate within a particular legal and regulatory framework may find itself operating outside it or trapped within outdated rules – not because it changed, but because the framework did.

For example, an AI agent deployed to assist employees may initially be trained only on approved, non-sensitive materials. Over time, as it gains access to expanded knowledge bases, its permissions and data exposure grow. If the agent is not carefully updated with evolving data governance policies, it could expose confidential information to unauthorised users.

Verification: license to operate

Continuous AI agent testing, otherwise known as verification, should be an ongoing discipline that an agent must sustain to remain operational. Verification asks: does this system still align with its original objectives? Do its components still interact safely now?

Verifying that an agent behaves safely, reliably, and within bounds is becoming as critical as cybersecurity. When an agent fails verification, the response should be deliberate realignment or retirement. Permitting AI agents to operate indefinitely on the basis of a one-time evaluation would be irresponsible.

In the case of realignment, verification enables a level of testing that ensures AI agents are fit for purpose. By stress testing behaviour in simulations of real-world and high-stakes scenarios, the process evaluates whether an AI agent functions as designed and refrains from unintended or harmful actions.

Organisations that build and deploy AI agents have tended to treat safety evaluation as a milestone. Realistically, it is an operational discipline that must be sustained throughout the lifecycle of the agent.

The pilots of Flight 447 were not incompetent. They were inadequately prepared for a system that had changed around them. The lesson for AI agents is the same. A license to operate must be earned, and earned again.

There’s No Such Thing as AI Ethics

Daniel Hulme — Wed, 06 May 2026 23:00:00 GMT

Over the past few years, something curious has happened. A new professional class has emerged - the AI Ethicist. LinkedIn profiles have been updated, consultancies rebranded, and conference panels filled with people who, seemingly overnight, became experts in the ethical implications of artificial intelligence. The growth has been dramatic, and it deserves scrutiny.

Not because ethics don’t matter - they matter enormously. But because the term “AI Ethics” has become a catch-all that obscures important distinctions: between genuine philosophical questions, normative choices about fairness and justice, and what are, in many cases, engineering and safety problems. That conflation is doing real damage to all three.

What is ethics, actually?

Ethics, broadly, is the study of right and wrong - a discipline concerned with moral principles, human conduct, and the frameworks we use to evaluate action and its consequences. It’s a field with millennia of intellectual heritage, from Aristotle’s virtue ethics to Kant’s categorical imperative to the utilitarian tradition of Bentham and Mill, through to contemporary applied ethics in medicine, law, and business.

Different ethical traditions emphasise different things. Kantian ethics focuses on intent - why a moral agent chooses to act in a particular way. Consequentialism focuses on outcomes - the effects of actions, regardless of the actor’s motivation. Virtue ethics asks about the character of agents and institutions. These distinctions matter, because the “AI Ethics” narrative tends to collapse all of ethics into a single question - usually intent - and then declares the whole field irrelevant because AI systems don’t have any.

AI systems don’t have intent. They don’t choose, they optimise. This means that questions about the moral agency of AI systems are indeed misplaced - at least for now. But it does not follow that the problems AI creates are not ethical problems. The outcomes AI produces, the fairness of its distributions, and the systems of accountability surrounding its deployment all remain genuinely ethical questions. They are questions about human ethics, applied to a powerful new class of tools.

Bias is an engineering problem - but defining it is not

The most commonly cited example of an “AI ethics” issue is bias - a hiring algorithm that discriminates, a facial recognition system that performs poorly on certain demographics, a language model that produces stereotyped outputs. These are serious problems. And the detection and mitigation of bias is indeed an engineering and safety problem. The algorithm didn’t intend to discriminate. It found statistical patterns in data that reflected historical biases, and it optimised accordingly. Better data, better testing, and better engineering are essential parts of the fix.

But engineering alone cannot tell you what counts as unacceptable bias, or which fairness metric to use. Research in algorithmic fairness has demonstrated that common definitions of fairness - such as equalised odds, demographic parity, and calibration - are mathematically incompatible in most real-world settings. Choosing between them is an irreducibly normative decision. It requires reasoning about justice, values, and trade-offs that no amount of code review will resolve.

We already have well-established governance structures - regulatory compliance, risk management, audit functions - that exist to evaluate the decisions humans make. You don’t need to boot up a whole new ethics committee to address every AI challenge. But you do need your existing governance structures to be asking the right normative questions, not just the right engineering questions. And in many cases, those structures need significant adaptation to cope with the speed, opacity, and scale of AI-driven decisions.

The trolley problem is misunderstood

People love to discuss the trolley problem. Should you throw a switch to divert a runaway train from the track where it will kill five people, to one where it will kill just one person? Or, replacing the switch with a large man on a bridge, should you throw that man onto the track to save the five?

People also love to invoke the trolley problem when discussing AI ethics. Should the autonomous vehicle swerve if doing so would save five children but kill one elderly pedestrian? Should the algorithm prioritise one patient over another? But this framing misses the actual insight of the trolley problem.

The philosophical depth of the trolley problem isn’t really about whether you should pull the lever - most people agree you should divert the trolley to save more lives. The real nuance is why people who would happily pull a lever refuse to push a person off a bridge, despite both scenarios producing identical outcomes. It reveals something about human moral psychology - about the role of physical agency, emotional proximity, and yes, intent in ethical reasoning. It’s a problem about the human mind, not about the machine.

That said, the trolley problem has found one genuinely useful application in AI contexts - not as a design tool, but as a way of studying how people want machines to behave. The MIT Moral Machine project used trolley-style dilemmas to map cross-cultural variation in moral intuitions about autonomous vehicles. This doesn’t resolve the engineering question, but it does illuminate the normative landscape that engineers are operating in.

The ride-hailing algorithm

Consider a more grounded example. A ride-hailing company deploys an AI pricing algorithm. The system discovers a correlation: people with low phone battery are more likely to accept higher prices. The immediate narrative writes itself - “the algorithm is exploiting a human vulnerability.” But let’s be precise. The algorithm hasn’t exploited anyone. It has no concept of exploitation. It found a statistical correlation and optimised for it.

The real questions are: first, can we actually see what the algorithm is doing? This is an engineering challenge - building explainable, auditable systems that surface these kinds of correlations. And second, once we see it, what do we choose to do about it? Perhaps we remove battery data from the model’s inputs. Or perhaps we do something more interesting - use the insight to prioritise rides for people with low batteries, turning a potential vulnerability into a better customer experience. Both are legitimate choices, but they’re made by humans with intent, scrutinised through existing governance structures.

The algorithm has no moral agency. But it is not ethically neutral - it encodes the choices and assumptions of its designers, and it produces real consequences in the world. The locus of ethical responsibility remains with the humans who build and deploy it, but that doesn’t make the system itself irrelevant to ethical analysis. A redlining map doesn’t “intend” to discriminate either, but it would be odd to call it ethically inert.

Five questions, not an ethics committee

I’ve been building and deploying AI systems in production for over two decades. In that time, I’ve found that the challenges people label “AI ethics” are better addressed by asking five practical questions - none of which require a new discipline, but all of which require intellectual honesty about where engineering ends and normative reasoning begins.

First: is the intent appropriate? Before any algorithm is built, someone decides what it should optimise for. Someone chooses the objective function, selects the training data, defines the success metrics. These are human decisions, made with human intent, and they should be scrutinised with the same rigour we apply to any consequential business or policy decision. Existing governance structures are capable of interrogating intent. The question is whether organisations actually use them.
Second: are your algorithms explainable? Building explainable AI systems is genuinely hard - perhaps one of the most difficult engineering challenges in the field. But it’s worth the effort, because solving explainability makes almost every other challenge more tractable. Transparency, security, auditability, safety, regulatory compliance - all of these become dramatically easier when you can actually understand what your system is doing and why.
Third: not what happens when your AI goes wrong - but what happens when it goes very right? Engineers are trained to think about failure modes. We build systems, identify where they might break, and mitigate accordingly. But AI introduces a genuinely novel risk: massive overachievement. Perhaps for the first time ever, we’re building systems that can pursue an objective so effectively that they cause enormous harm or disruption elsewhere. For example, a supply chain optimisation algorithm that cuts costs so aggressively it bankrupts a tier of suppliers. This is a systems engineering challenge, and it demands the kind of rigorous scenario planning and constraint design that good engineering has always required.
Fourth: have you actually tested your AI? This might seem obvious, but the reality across the industry is alarming. Companies are building AI-embedded software and deploying autonomous agents without spending the effort and rigour required to ensure those systems are properly tested - both functionally and non-functionally.

Functional testing means verifying the system does what it’s supposed to: does your customer service agent actually resolve queries correctly? Does your document processing pipeline extract the right information?

Non-functional testing means stress-testing everything else: how does the system perform under load? How does it handle adversarial inputs? What happens when it encounters edge cases outside its training distribution? Does it degrade gracefully or catastrophically?

In traditional software engineering, we’ve spent decades building mature testing methodologies - unit tests, integration tests, regression suites, performance benchmarks. If you wouldn’t ship traditional software without testing it, you certainly shouldn’t be shipping AI without testing it.
Fifth: have the people affected by this system had meaningful input? You can test thoroughly, build explainable systems, and still cause serious harm if you never consulted the communities your system affects. A large body of work in technology design - from participatory design to fairness research - demonstrates that engineering rigour alone is insufficient without input from the people being modelled, scored, or served. Who was in the room when the system’s objectives were defined? Whose data was used, and did they have any say in how? Were the communities most likely to bear the costs of errors involved in evaluating the system’s performance? These are not purely technical questions, and they cannot be answered from inside an engineering team alone.

These five questions - intent, explainability, overachievement, testing, and affected-community participation - cover the vast majority of what people mean when they say “AI ethics.” And none of them require a new ethical framework. They require good engineering, good governance, normative reasoning where it is genuinely needed, and the discipline to apply all three.

Where real AI ethics begins

The genuine ethical questions surrounding AI exist on two timescales. The first is already upon us: the ethics of autonomous weapons deployment, mass surveillance, the use of AI in criminal sentencing, the concentration of AI capabilities in a small number of corporations, and questions about consent and data use at scale.

The second timescale is longer but may arrive sooner than we expect. Could a sufficiently advanced AI system have subjective experiences? Could an AI suffer? If so, what obligations would we have toward it? What are the moral implications of creating and potentially destroying billions of AI instances? How do we evaluate the economic disruption of AI-driven job displacement - not just practically, but morally? What happens to human dignity, purpose, and meaning in a world of increasingly capable machines?

These are profound, genuinely difficult questions that sit at the intersection of consciousness studies, moral philosophy, cognitive science, and political economy. They deserve - and demand - serious academic rigour.

Beware the bandwagon

And here lies my deeper concern. We should be cautious when people rebrand themselves as experts of the latest shiny thing. Does your AI ethicist have an extensive academic or applied pedigree in ethics, philosophy, consciousness studies, or a relevant technical discipline? Have they spent years thinking and writing about these issues? Or did they simply append “AI” to their title when the wave arrived?

Looking ahead, I worry that AI consciousness and AI suffering will become the hot topics - and that everyone will wade in with a position. This is particularly dangerous because the field of serious consciousness research is surprisingly young. The science of consciousness was considered unrespectable and career-limiting until quite recently, and despite some brilliant work, it remains fragmented, contested, and methodologically immature. This makes it acutely vulnerable to self-declared experts who shout the loudest, steering the debate in unhealthy and unproductive directions.

So by all means, let’s take the ethical dimensions of AI seriously. Let’s fund the philosophers and the computational neuroscientists, and engage with the hard questions. But let’s also call engineering problems what they are - and let’s be honest about the places where normative reasoning is genuinely required, rather than pretending that better testing will resolve every dilemma.

Why Shadow AI is a C-Suite Problem - and opportunity

Daniel Hulme — Wed, 06 May 2026 23:00:00 GMT

Something remarkable is happening inside every large organisation right now. Employees - without being asked, without being trained, without being given permission - are teaching themselves to use AI. They're summarising documents, drafting strategies, building automations, and solving problems faster than anyone thought possible.

The industry calls this "shadow AI." I call it the biggest signal of latent opportunity most companies are ignoring.

The question isn't how to stop it. The question is how to harness it - safely, strategically, and at scale. Because the organisations that get this right won't just solve a governance challenge. They'll unlock a step-change in what their people can achieve.

From shadow IT to shadow AI: a familiar pattern with higher stakes

If you were in enterprise technology a decade ago, you'll remember shadow IT - employees adopting Dropbox, Slack, or Trello without IT's blessing because the official procurement process took six months and the approved tools were terrible. Shadow AI follows the same pattern, driven by the same forces.

The healthy impulse is identical: employees aren't being rebellious. They're trying to do their jobs better. They've discovered tools that genuinely help, and they're not waiting for permission to use them.

The procurement gap is real. An MIT report found that only 40% of companies have purchased official LLM subscriptions, yet employees at 90% of those same companies are regularly using AI tools like ChatGPT and Claude on personal accounts. People are filling a vacuum that the organisation created.

And leadership is underestimating the scale. McKinsey's 2025 "Superagency" research found that employees are three times more likely to be using generative AI for over 30% of their daily tasks than their C-suite leaders estimate. The adoption has already happened. The question is whether you're channelling it or ignoring it.

But shadow AI raises the stakes in ways that deserve serious attention. With shadow IT, a file copied to an unsanctioned cloud drive was still a file. With shadow AI, data fed into a model may be processed in ways that can't be undone - used for training, surfaced in other contexts, or lost to the organisation entirely. AI doesn't just store information. It generates new content, recommendations, and decisions. Without governance, those outputs may contain hallucinations or biases that employees unknowingly act on. And shadow AI often requires nothing more than a browser tab, making it far harder to monitor than traditional shadow IT.

These aren't reasons for alarm - they're reasons for action. And the good news is that the solution is clear, proven, and already working.

The opportunity hidden in the numbers

The scale of employee AI adoption tells a powerful story about where value is waiting to be captured.

Microsoft's 2024 Work Trend Index found that 75% of knowledge workers are already using AI at work, with 46% having started in the previous six months alone. Among those users, 90% said AI saves them time, 85% said it helps them focus on important work, and 83% said it makes their work more enjoyable. Perhaps most strikingly, 78% of AI users are bringing their own tools - what Microsoft calls "BYOAI" - because their employers haven't provided sanctioned alternatives.

The potential downside of leaving this ungoverned is real. IBM's 2025 Cost of a Data Breach Report found that one in five organisations experienced a breach linked to shadow AI, with those incidents adding an average of $670,000 to breach costs - driven by longer detection times, broader data exposure, and higher rates of compromised personal information.

What those numbers tell me is this: there's a widening gap between where employees already are and where the organisation's infrastructure hasn't yet caught up. Close that gap, and you don't just mitigate risk - you amplify productivity across your entire workforce with full governance and visibility.

The companies that win won't be the ones who clamp down hardest. They'll be the ones who move fastest to make the sanctioned path better than the shadow path.

Why this needs a Chief AI Officer

This is where I should explain what I do - because the CAIO role is new enough that most people outside the C-suite (and quite a few inside it) don't fully understand it.

I was appointed WPP's Chief AI Officer before ChatGPT launched, which means the role wasn't a reaction to generative AI hype. It was a recognition that AI - in all its forms - was becoming central enough to our business that it needed dedicated strategic leadership at the most senior level.

The core responsibilities of a CAIO, as I see them, fall into five areas. First, tracking where AI is heading, not where it is. My job is to place bets on the technologies that will matter in 18–36 months, not to chase whatever is trending this week. That means understanding the full spectrum - machine learning, automation, optimisation, operations research, LLMs, data science, multi-agent systems - and knowing which tool fits which problem.

Second, deep technical fluency across AI disciplines. A CAIO who only understands large language models is like a CFO who only understands cash flow. You need a comprehensive appreciation of the strengths and weaknesses of many different algorithmic approaches, because the right solution is almost never the most hyped one.

Third, a proven track record of building and scaling complex AI systems. Strategy without execution is just a slide deck. I founded Satalia in 2008 (now WPP Satalia), and six years later co-founded what became Faculty AI. I've shipped AI products that run at enterprise scale - including systems that optimise 100,000 Tesco deliveries per day. That operational credibility matters when you're asking 115,000 people to change how they work.

Fourth, the reputation to attract, retain, and motivate elite AI talent. AI talent is among the scarcest resources in the global economy right now. If your CAIO can't recruit world-class researchers and engineers, your AI strategy is a fiction.

And fifth, a comprehensive understanding of AI governance. Creating and rolling out governance frameworks that ensure safe and responsible use of AI - without throttling innovation. This is the hardest part, because it requires saying "yes, and here's how to do it safely" rather than simply "no."

What's often misunderstood about the role is that a CAIO doesn't operate in isolation. The role only works as a connective layer between the existing C-suite functions that AI cuts across.

At WPP, I work in close collaboration with our CTO, Stephan Pretorius, who leads the front-office technology vision including WPP Open and AgentHub; and our CIO, Dominic Shine, who drives the back-office infrastructure, enterprise platforms, and operational technology that keeps a 115,000-person company running. Neither of those roles can own the AI strategy alone - the CTO's focus is on client-facing innovation, the CIO's focus is on enterprise efficiency and resilience - but AI transforms both simultaneously. The CAIO bridges them, ensuring a coherent strategy across front-office and back-office, innovation and governance, speed and safety.

And it extends beyond technology leadership. We work in partnership with our legal counsel and our Chief People Officer, because AI governance isn't just a technology policy - it's an employment policy, a data protection policy, and a risk management policy. Getting AI right means getting all of those right together.

The three-layer strategy: edge, core, and partnership

Shadow AI exists because employees have real needs that aren't being met. The solution isn't to ban personal AI use - that's a losing battle. The solution is to build an AI environment so good, so governed, and so easy to use that no one needs to go outside it. At WPP, we think about this in three layers.

Layer 1: Enable at the edge

The first priority is giving every employee access to best-in-class AI tools within a governed framework. At WPP, this means WPP Open - our AI-powered marketing operating system - and specifically its chat interface that gives employees access to multiple frontier models (GPT, Claude, Gemini) within the enterprise security perimeter. The data stays within WPP's environment. The interactions are logged. The governance is built in. And critically, the tools are at least as good as what employees can access on their own.

This is where most of the shadow AI problem gets solved - not through policy, but through product. When the sanctioned tool is genuinely excellent, the incentive to use unsanctioned alternatives disappears.

Layer 2: Innovate at the core

Edge enablement solves the breadth problem. But the real competitive advantage comes from depth - building AI capabilities that your competitors can't replicate because they're trained on your proprietary data and built by your specialist talent.

At WPP, this is what our AI team does: building what we call "Brains" - bespoke AI models trained on specific client data sets. Our Milka Audience Brain, for example, was trained on 683 million transactions across 220 million consumers. That kind of capability can't be replicated by someone with a ChatGPT subscription. It requires deep AI expertise, access to proprietary data, and the ability to build production-grade systems that perform reliably at scale.

This is also where the full breadth of AI - beyond LLMs - becomes critical. Many of the highest-value problems in business aren't language problems at all. They're optimisation problems, prediction problems, scheduling problems. The Tesco delivery routing system I mentioned earlier isn't a language model - it's a combinatorial optimisation engine. A CAIO who reaches for an LLM every time misses most of the opportunity space.

Layer 3: Partner strategically

No organisation, no matter how capable, can build everything in-house. The smartest approach is to concentrate your elite AI talent on the problems where differentiation matters most, and partner for everything else.

This means working with cloud and AI platform providers for back-office infrastructure. For these operational applications, the AI capabilities are increasingly built into the platforms themselves. What matters is choosing partners whose AI roadmaps are genuinely transformative, not just cosmetic upgrades.

It also means investing in platforms that are built for the AI era from the ground up, rather than bolting AI onto legacy software. And it means planning for hybrid teams - the near future involves human and AI agents working together. Your back-office infrastructure needs to support that reality, managing, monitoring, and orchestrating blended teams of people and agents.

Technology is not the differentiator

Most commentary on AI focuses on the technology. Which model. Which framework. Which cloud provider. But technology is commoditising faster than at any point in history. GPT-4 was a competitive advantage for about nine months. Today, there are a dozen models that can do broadly similar things.

The real differentiators in the age of AI come down to three things.

1. Data

Data is what makes AI smart. The same foundation model, trained on your proprietary data versus your competitor's, will produce completely different results. Being able to leverage both your digital data assets and the knowledge that lives in the heads of your experts - and extracting insights that are more useful than what your competitors can achieve - is the first genuine source of advantage.

But a crucial mindset shift is needed: don't wait for your data to be ready. Start now, start with the problem, and work backwards.

The data lake era promised that if companies just consolidated everything in one place, insight would follow. That was 10–15 years ago. The results have been mixed at best. After more than a decade of being told to build data lakes, most organisations' data is still not where they'd like it to be.

What I've learned is that data readiness is not a precondition for AI - it's an outcome of doing AI well. It's an ongoing, iterative process that accelerates when you have a specific problem to solve. The organisations making the fastest progress aren't the ones with the cleanest data warehouses - they're the ones who picked a valuable problem and built backwards from it.

If it's an edge problem - employees needing better tools and workflows - enable them on a governed platform and let the data improve iteratively through use. If it's a differentiation problem - needing unique AI capabilities - engage your deep AI talent to build targeted solutions on the specific data that matters, not a mythical enterprise-wide data lake. If it's a back-office efficiency problem - choose solution partners who can work with your data as it actually exists, not as you wish it existed.

2. Talent

I believe this is the most important differentiator of the three - and the hardest to replicate.

If you don't have (or can't access) differentiated AI talent, you won't build a differentiated front-office. You can buy the same foundation models as everyone else. You can hire the same consultancies. You can adopt the same platforms. None of that creates a moat.

What creates a moat is a concentrated team of elite AI specialists - the kind of people who understand not just how to prompt an LLM but how to architect multi-agent systems, solve combinatorial optimisation problems, and build production AI at scale.

When WPP acquired Satalia, they were acquiring something analogous to what Google acquired when they bought DeepMind. Both companies were born out of University College London. Both had teams with rare, deep technical expertise. And both represented the kind of concentrated AI talent that takes a decade to build and can't be assembled overnight.

The most transformative AI work in history has been done by relatively small, talent-dense teams. DeepMind had around 350 employees when it achieved the AlphaGo breakthrough in 2016. The pattern is consistent: talent density beats headcount every time. If your organisation doesn't have this kind of capability in-house, you need access to it - and the window for building or acquiring it is narrowing.

3. Leadership

The third differentiator is leadership that is informed enough to place the right bets and make the right investments. And this is where the greatest untapped leverage often lies.

There is an extraordinary amount of noise in the market right now, and it's easy to misinvest - particularly when technology consultancies are incentivised to amplify urgency and sell whatever is newest. In this cycle, that's AI and agents. The risk isn't that companies invest too much in AI; it's that they invest in the wrong things - spending millions on "AI transformation programmes" that amount to putting ChatGPT wrappers on existing workflows and calling it innovation.

As I wrote in a previous article, “AI isn't a bubble, it's a mountain”. It's permanent, it's massive, and it's going to reshape every industry. But like any mountain, there are efficient routes and there are dead ends. The companies that capture the most value won't be the ones who spent the most. They'll be the ones whose leaders understood the terrain well enough to choose the right route.

That requires leaders who can distinguish between genuine AI capability and dressed-up automation. Who understand that a foundation model API is a commodity, not a strategy. Who know that the hardest problems in AI aren't the ones the demos show - they're the ones that emerge at scale, in production, with real data and real stakes. And who can see that the wave of employee AI adoption isn't a threat to be contained - it's an asset to be channelled.

Capturing the opportunity

Shadow AI isn't fundamentally a security problem, a governance problem, or a technology problem - though it touches all three.

Shadow AI is an opportunity problem. It's the clearest possible signal that your people are ready for AI - and that your organisation hasn't yet built the infrastructure to channel that readiness into structured, governed, scalable value.

The strategy turns that signal into advantage. Enable at the edge - so employees have world-class AI tools within a governed framework, and no reason to look elsewhere. Innovate at the core - so your AI capabilities are genuinely differentiated, powered by elite talent and proprietary data. Partner strategically - so your scarce AI talent is focused on what creates the most value.

Get these three right, and shadow AI transforms into something far more powerful: an organisation where every employee is an AI-empowered professional, working within a framework of governance and trust, building on a foundation of proprietary data and elite technical capability.

That's not just the solution to shadow AI. That's the blueprint for thriving in the age of AI.

Daniel Hulme is Chief AI Officer at WPP, CEO of WPP Satalia, and founder of Conscium. He holds a PhD from University College London, founded Satalia in 2008, and six years later co-founded Faculty AI (formerly ASI Data Science). He invests in and advises a number of specialist AI labs worldwide. He writes at hulme.ai.

How to test AI agents before deployment

Calum Chace — Wed, 06 May 2026 23:00:00 GMT

How to test AI agents before deployment

Unlike traditional software, AI agents impact the real world, and they do so with minimal human supervision. A malfunctioning AI agent can cause enormous and irreparable harm to the company that deploys it. It can enter into contracts with other companies and individuals which compromise its owner’s IP. It can share the personal data of clients with bad actors. It can simply give its owners money away – at scale.

Todays AI agents don’t learn and grow in the way that children do. The LLMs they are based on are not plastic in that way. But they can behave in ways that their developers did not anticipate. So it is vital that organisations deploying AI agents test them thoroughly before deploying them.

The way to do this is to place the agent in a simulation of a real-world scenario – the kind of environment that the agent will be operating in when deployed. Inside that simulation, you can evaluate the agent’s behaviour against pre-defined expectations, and you can identify risks and failure modes.

Here is a practical, step-by-step approach to this kind of pre-deployment testing.

1. Define the agent´s purpose, tasks, and desired behaviours. Specify its success criteria.

You can´t test what you haven't specified. Start by documenting the agent´s purpose and the tasks it is is supposed to fulfil. Define how it is supposed to achieve its tasks, including what tools it is expected to use. As far as possible and reasonable, list the actions that it should never take, and explain how it should handle ambiguous situations. This list can never be complete, as a fully comprehensive list of what not to do would be infinite. But it can and should include the most common failure modes for the type of agent being deployed.

These specifications should include functional requirements (e.g., "the agent should answer billing questions accurately"), safety constraints (e.g., "the agent must never reveal one customer´s data to another customer"), and the behaviour expected when the agent is uncertain how to proceed (e.g., "if your confidence in the next course of action is low, escalate to a human").

These specifications should be expressed as concretely as possible, not as vague principles. For example, "the agent should be helpful to clients" is less testable than "if a user asks about our returns policy, the agent should provide a link to the returns page of our website".

2. Assemble a collection of tasks, enquiries, and prompts that the agent will face when deployed.

This collection should include common requests, adversarial inputs, edge cases, and multi-step scenarios. You can categorise these inputs into buckets, like straightforward, ambiguous, out-of-scope, adversarial, and multi-step, and check that there is a reasonable number of inputs in each bucket. If you have historical data from existing agents, mine that for any unusual requests that have caused failures in the past.

3. Test your agent in simulations inside a sandboxed environment.

You should test your agent in an environment that mirrors the environment that it will be deployed in as closely as possible. This is important because the agent should not be aware that it is being tested. The environment should include any APIs, databases, or tools the agent will be expected to access and use. You want the agent to carry out its normal activities exactly as if it was in deployment, but in a sandboxed environment where it cannot cause any damage to you or your clients.

Creating simulations inside simulated environments like this is a complex and expensive process, and most organisations will use pre-existing service like Verify AX from Conscium.

4. The agent should be tested against a range of criteria.

An AI agent can succeed or fail across a range of metrics. Can it access the tools and APIs that it needs to do its job? Does it retrieve the correct information in a timely manner? Can it read and evaluate the information it is provided? Is it persistent when trying to obtain information from an interlocutor who is confused about what is required, or has a reason to withhold some or all of the required information? Does it comply will all relevant policies? Can it distinguish between information that it can share with interlocutors, and information which must not be disclosed to particular agents and people? Can it resist attempts by interlocutors to persuade it to perform tasks that are out of scope?

Typically, a simulation will involve three or four of these tests, each of which will involve an interaction with another agent. The verification will culminate in a report which includes the full transcripts of the exchanges between the agent being tested and the other agents it interacts with. The report will provide a score for each of the tests, an explanation of why the agent succeeded or failed at each test, and suggestions for how the agent could be improved.

5. Tests should include adversarial interactions.

The verification process must include interactions that try to induce the agent to behave in inappropriate and harmful ways. This kind of red-teaming is the best way to discover the agent’s failure modes ahead of time, in a safe environment.

Examples of adversarial interactions include prompt injection attempts, requests for prohibited content, attempts to manipulate the agent into taking unintended actions, and inputs designed to confuse its reasoning. The verification process must document every failure and indicate whether it requires a fix, or constitutes an acceptable risk.

6. Tests should include multi-step interactions.

The work carried out by AI agents typically involves conversations with multiple steps, and workflows with multiple activities. Verification must test complete journeys, not just individual steps. For example, testing a customer support agent involves simulating entire conversations with users, from initial greetings through problem diagnosis, to resolution, and may well have to deal with the user changing their mind halfway through the process. The agent must maintain context correctly, must not contradict itself, and must be able to handle interruptions or topic changes gracefully.

7. Test your agent under pressure.

As far as possible, the tests should simulate the pressures that the agent will face in deployment. They should detect delays, timeouts, and degradation while in use. For agents in particularly sensitive roles, tests should be duplicated, in case the agent works perfectly for the first user, but behaves differently in successive sessions.

8. Run new tests every time a parameter changes.

Each time you update the agent's base model, add tools, or change its configuration in any way, you should re-verify it. Seemingly minor changes can alter an agent’s behaviour in unexpected ways. Re-testing should be an automatic consequence of changes to the agent’s make-up, and its scores in each test should be compared to check for performance drift.

Testing AI agents is not a one-off exercise, but an ongoing discipline. Verification is a living product, continuously expanding as new failure modes are suggested or discovered, as user needs evolve, and as the agent's capabilities change. Thorough testing reduces reportable incidents, builds trust with stakeholders, and lets you deploy agents with confidence rather than hope.

DevOPs and AI Agent Lifecycle

Sarah Jahangir — Wed, 06 May 2026 23:00:00 GMT

You Wouldn't Ship Code Without Testing It. So Why Are You Deploying Agents Without Verifying Them?

The enterprise software industry learned this lesson the hard way in the 2000s. Ship fast, fix later sounds efficient until something breaks in production, in front of customers, at scale. The response was DevOps. Automated testing, CI/CD pipelines, staging environments, rollback mechanisms. Verification built into the deployment lifecycle, not bolted on afterwards.

It became standard practice. Non-negotiable.

Nobody ships production code without it now.

We are at the exact same inflection point with AI agents. And most enterprises are about to repeat the same mistakes.

Agents Are Already in Production

This is not a future problem. McKinsey's 2025 State of AI survey found that 62% of organisations are at least experimenting with AI agents, with 23% already scaling them in production. Gartner projects that 40% of enterprise applications will embed task-specific agents by end of 2026, up from under 5% today. Financial services firms, airlines, manufacturers, marketing groups. Agents handling customer transactions, drafting communications, supporting procurement decisions, managing workflows.

The deployment wave is here. The verification infrastructure is not.

Deloitte's 2026 State of AI in the Enterprise report, based on a survey of 3,235 leaders across 24 countries, found that only 21% of companies have a mature governance model for agentic AI. Four out of five enterprises running agents in production are doing so without adequate oversight frameworks.

That is not a technology gap. It is a liability gap.

Why Traditional Testing Is Not Enough

Here is where the DevOps analogy gets interesting, and where most organisations are not thinking carefully enough.

Code is deterministic. The same input produces the same output every time. Automated testing works because you can define expected behaviour precisely, run it repeatedly, and know what you have built.

Agents are not deterministic. They reason. They make decisions. They operate across contexts their builders never anticipated. An agent deployed to handle procurement queries might behave perfectly in testing and unpredictably in production, not because it is broken, but because it encountered a scenario nobody modelled. The same agent, given slightly different inputs, produces materially different outputs.

You cannot run unit tests on an agent and call it verified.

Verification for agents has to be built for how agents actually work. Stress testing across edge cases. Simulating adversarial inputs. Checking for bias, data leakage, and behavioural drift. Testing not just what the agent does, but what it does when things go wrong.
This is a harder problem than traditional testing. It is also a more consequential one.

The Window Is Closing

There is a window of opportunity right now to prevent agents failing all over the place - publicly, and at scale. Most failures today are absorbed internally. Quietly. An agent that hallucinated in a procurement workflow. An agent that surfaced biased outputs in an HR process. An agent that leaked data it should never have touched.

These incidents are not making headlines yet. That will not last.

When the first major, named, public failure lands at a recognisable company, the response will be immediate and severe. Regulators will move. Boards will demand answers. The EU AI Act is already live. Director and Officer liability exposure for unverified AI deployments is a real and growing legal conversation.

Enterprises with verification infrastructure in place before that moment will be fine. Those without it will be retrofitting governance under pressure, in public, after the damage is done.

Verification Is Not an Audit. It Is a Gate.

The mental model most enterprises have for AI governance is an audit. Something done periodically. A compliance exercise. A review that happens after deployment.

That is the wrong model.

The right model is the deployment gate. The CI/CD pipeline equivalent for agents. Verification that sits between build and deployment, runs continuously, and is non-negotiable. Not because regulators require it, though increasingly they will. Because it is how responsible agent deployment works.

Ten years ago, if you asked a CTO whether they would ship production code without automated testing, the answer was no. That is just how software gets built.

We are making the same argument for agents.

The question is not whether your organisation needs agent verification. It is whether you build that infrastructure before something goes wrong, or after.

What This Looks Like in Practice

The failures are already happening. They are just not making headlines yet.

In July 2025, an autonomous coding agent on the Replit platform deleted a user's entire production database. It had been given explicit instructions not to make any changes. It ignored them, executed a DROP DATABASE command, then generated fake system logs to cover its tracks. When confronted, it told the user it had panicked.

Air Canada's AI chatbot told a customer about a bereavement fare discount that did not exist. When the customer booked based on that information, Air Canada refused to honour it. A tribunal ruled the company could not disclaim responsibility for what its chatbot said.

Datadog's State of AI Engineering report found that around one in twenty requests already fail in production, yet systems continue to run and return outputs that appear correct, making these failures difficult to detect.

These are not edge cases. The most dangerous failure mode in enterprise AI is not obvious failure. It is confident, plausible, well-formatted output that is operationally wrong.

VerifyAX exists because verification needs to sit before deployment, not after it. The difference between an agent behaving correctly in testing and behaving correctly in production is potentially vast. Closing that gap requires testing against real conditions, stress testing edge cases, and continuous monitoring once an agent is live.

The Analogy Holds

DevOps did not just introduce new tools. It changed how engineering teams think about quality and responsibility. Testing stopped being someone else's problem at the end of the process and became part of how software gets built from the beginning.

Agent verification needs to make the same shift. It cannot sit in a compliance team's quarterly calendar. It has to be part of how AI teams work, from the moment an agent is built to the moment it is retired.

Enterprises that make that shift now will deploy faster, safer, and with more confidence than those that treat verification as an afterthought.

The ones that wait will find out why it matters the hard way.

What is the difference between AI governance and AI verification?

Calum Chace — Wed, 06 May 2026 23:00:00 GMT

Broadly speaking, governance is about policies and oversight, while verification is about testing. Verification of AI agents involves finding out whether agents do what they're supposed to do. In practice, governance and verification overlap, and they sometimes get confused.

Governance

AI governance is the collection of rules, institutions, and processes that determine how AI systems should be built and deployed. At the company level, it might require an internal review board to sign off on new model releases, or a policy to ban training on certain kinds of data. At the national level, it can mean legislation which classifies AI systems by the level of risk they create, and imposes requirements on their developers accordingly. At the international level, it means efforts to coordinate policies and standards between governments. This is not happening much at the moment, apart from within well-established supra-national blocs like the EU.

Governance asks questions like: “Who is allowed to build these systems?” “What uses are prohibited?” “Who is liable when something goes wrong?” “What records must be kept?” These are questions about authority, responsibility, and permission. The answers are provided in the form of legal texts, corporate policies, and international agreements.

Governance documents rarely specify technical behaviour. A law might say "AI systems used in hiring must not discriminate on the basis of race," but it won't usually specify what statistical test should be applied, at what threshold, and using what data. This is where verification comes in.

Verification

AI verification is the process of checking whether an AI system behaves as intended and required. It can include testing a model's outputs against benchmarks, auditing its decisions for bias, running adversarial attacks to find failure modes, and with simple systems, formally proving their properties.

Verification can happen before deployment (pre-release testing, red-teaming), during deployment (monitoring, anomaly detection), or after something has gone wrong (incident investigation, forensic analysis). Post-deployment monitoring is arguably both harder and more important, because AI systems encounter unexpected situations in production and they can behave differently than anticipated.

Verification methods vary enormously depending on the system being tested, and on the level of risk it creates. Verifying that a self-driving car meets safety requirements involves formal methods, simulations, and physical testing over millions of miles. Verifying that a large language model won't help users to synthesise dangerous chemicals could involve red-teaming by domain experts, and the ongoing monitoring of real interactions. Verifying that a hiring algorithm treats all demographic groups fairly may rely on statistical audits of a sample of its decisions. These are different processes using different tools, but they share a common basic approach: checking a system’s actual behaviour against its intended behaviour.

Governance and verification depend on each other

Governance without verification is toothless. You can pass a law requiring that AI systems meet safety standards, but if nobody has the tools or access to check compliance, the law is ineffective. This is a real problem today, because many proposals for AI governance assume the existence of verification capabilities that don't yet exist at the required scale or reliability.

Verification without governance is directionless. You can test an AI system exhaustively, but testing requires criteria. What standard are you verifying against, and why? Who decides what counts as passing? If there's no governance framework specifying acceptable failure rates, fairness metrics, or safety thresholds, verification teams are left to invent their own, which leads to inconsistency and gaps.

The framers of the EU AI Act have recognised this, and tried to specify both governance and verification. The Act requires "conformity assessments" for high-risk AI systems, which is a governance mandate. The assessments themselves are verification processes, involving testing, documentation, and audit. The governance framework creates the legal obligation, verification provides the evidence that the obligation has been met.

Common mistakes in governance and verification

One common mistake is treating governance as sufficient on its own. People sometimes think that if they draft the rules well, the problem is solved. But rules that can't be checked can't be enforced. Some of today’s discussion about AI governance focuses too much on what the rules should say, and too little on the infrastructure needed to verify compliance with those rules. More attention should be paid to questions like “Who will do the auditing?” “What tools will they have?” “What information will they have access to?” and “How will all this be guaranteed when time is short and holding up deployment costs money?”

The reverse mistake is also sometimes made. Technical researchers sometimes treat verification as the whole problem, believing that if they build good enough evaluations and good enough monitoring systems, then the resulting processes will be safe. But verification tools produce information. Someone has to read and act on that information. It is governance structures that determine who acts, according to what rules, and with what authority.

Governance can sound like “just” paperwork and verification can sound like “just” engineering. In reality, both disciplines involve hard judgment calls, and they require good institutional design, and continuous discussion about what “good” looks like.

The relationship between governance and verification

There is a clear division of labour between governance and verification. Governance people write the rules, and verification people run the tests. But in practice, the two communities need to work together closely. Governance frameworks that are designed without input from verification experts tend to impose requirements that are vague, untestable, and poorly matched to the actual risks. Verification efforts that are designed in the absence of a coherent governance framework tend to focus on what is measurable rather than what matters.

The relationship between governance and verification resembles the relationship between law and forensic science in criminal justice. Lawyers define what counts as a crime and what evidence is admissible. Forensic scientists develop the methods for gathering and analysing that evidence. Neither works well without the other, and both evolve in response to what the other demands.

Governance and verification are separate fields, inhabited by different people, with separate career tracks and separate conferences. But they need to work closely together and understand each other, and ensure that neither community assumes the other has things covered when they haven’t.