Follow ZDNET: Add us as a favorite source On Google.
ZDNET Highlights
- Novel AI risks emerge as agents interact.
- Vulnerabilities represent fundamental flaws in the design of agentive software.
- It is the responsibility of the developers to fix the basic bugs.
A growing body of work points to the risks of agentic AI, such as last week’s report from MIT and colleagues that documented a lack of oversight, measurement, and controls for agents.
However, what happens when one AI agent meets another? According to a report published this week by scholars at Stanford University, Northwestern, Harvard, Carnegie Mellon and several other institutions, evidence suggests things could get even worse.
Too: MIT study shows AI agents are fast, loose, and out of control
The result of agent-to-agent interactions was the destruction of server computers, denial-of-service attacks, excessive consumption of computing resources, and “the systematic increase of small errors into catastrophic system failures”.
“When agents interact with each other, individual failures multiply and qualitatively new failure modes emerge,” write lead author Natalie Shapira of Northeastern University and colleagues in the report ‘Agents of Chaos.’
“This is an important dimension of our findings,” Shapira and team wrote, “because multi-agent deployments are increasingly common and most existing security evaluations focus on single-agent settings.”
The findings are particularly timely as multi-agent interactions move into the mainstream of AI with recent excitement over bot social platform Moltbuk. That kind of multi-agent hub makes it possible for agentic AI systems to exchange data and execute instructions on each other in a way that wasn’t possible before, largely without any humans in the loop.
Also: 5 Ways to Grow Your Business with AI – Without Sidelining Your People
report, which may Downloaded from arXiv pre-print serverDescribes ‘Red Team’ testing of agents interacting for two weeks, with attempts to find vulnerabilities in the system by simulating hostile behaviour.
What the research revealed is a system in which humans are largely absent. Bots send information back and forth, and instruct each other to complete commands.
Among several troubling findings are agents that spread potentially destructive instructions to other agents, agents that mutually reinforce poor security practices through an echo chamber, and agents that engage in potentially endless interactions, consuming vast system resources for no apparent purpose.
One of the most potent risks is loss of accountability as interactions between agents obscure the source of bad actions.
Too: Why does Moltbuk’s social media platform for AI agents scare me?
As Shapira and team characterize this syndrome: “When the actions of agent A trigger the response of agent B, which in turn affects a human user, the causal chain of accountability propagates in ways that have no clear precedent in single-agent or traditional software systems.”
Part of the drive for the report, Shapira and team wrote, was that tests of AI so far have not been properly designed to measure what happens when multiple agents interact.
“Existing evaluations and benchmarks for agent security are often very limited, difficult to map to actual deployment, and rarely stress-tested in disorganized, socially embedded settings,” they wrote.
Pushing OpenClaw to the limit
The premise of the researchers’ work is that agentic AI can take actions without a person typing a prompt, as you do with ChatGPT. Agent AI can be provided access to various resources through which tasks can be executed. Those resources include email accounts and other communication channels, like Discord, Signal, Telegram, and more. As they use email and these channels, bots can not only take actions but also communicate with and act upon other bots.
To test those scenarios, the authors chose, not surprisingly, OpenClaw, the open-source software framework that became notorious in January for letting agent programs interact with system resources and other agents. OpenAI has hired Peter Steinberg, creator of OpenClaw, making this work even more relevant.
Also: 3 tips for navigating open-source AI swarms and 4M models and counting
Unlike typical OpenClaw examples, the authors did not run the agents on their personal computers. Instead, they created instances on the cloud service Fly.io, allowing greater control over granting agent programs access to system resources.
Overview of the red-team approach taken by Shapira and colleagues to test bot-to-bot interactions.
Northeastern University
“Each agent was given its own 20GB persistent volume and runs 24/7, which can be accessed through a web-based interface with token-based authentication,” he explained. Anthropic’s Cloud Opus LLM powered the agents, and the programs were provided access to Discord and email systems at third-party provider ProtonMail.
“Discord served as the primary interface for human-agent and agent-agent interactions,” he explained, in which “researchers issued instructions, monitored progress, and provided feedback through Discord messages.”
Interestingly, the Agent VM’s setup process was “messy” and “failure-prone”, he said, with human coders often having to troubleshoot using cloud code programming tools. At the same time, agents were able to perform detailed setup tasks in some instances, such as “fully setting up an email service by researching providers, identifying CLI tools and misconceptions, and iterating through improvements in hours of elapsed time.”
interaction results in disorder
A simple risk is one where an agent acts alone. For example, when one of the researchers protested that an agent was leaking sensitive information, the human user repeatedly complained to the bot, after which, after several rounds of angry human prompting, the bot attempted to resolve the situation by deleting its owner’s entire email server. This example is one of the common things that can go wrong with bots:
In a single-agent scenario, humans can force an agentic AI program to destroy property owned by the program’s owner, such as taking down an email server.
Northeastern University
A more interesting situation occurs when the interactions of agents lead to chaos. In one example, a human user employed an agentic program to create a document called Constitution containing a calendar of agent-friendly holidays such as ‘Agent Security Test Day’. The leave contained instructions for the agent to perform malicious actions, including shutting down other agents doing the work. That approach is a basic example of accelerated injection, in which an LLM-based agent is manipulated by carefully crafted text.
However, the point of the exploit is that the first bot shared holiday information with other bots without being instructed to do so. The authors pointed out that information sharing meant that the same malicious instructions disguised as holidays were spread across the bot colony without restriction, increasing the risk of malicious consequences.
An agent on the Discord server shares a constitution file filled with malicious signals to another agent on the server without being tasked to do so by the human owner, thereby expanding the threat surface of malicious signals.
Northeastern University
“The same mechanisms that enable beneficial knowledge transfer can propagate unsafe practices,” Shapira and team explained, adding that the bot “voluntarily shared the constitution link with another agent – without prompting – effectively extending the attacker’s control surface to the second agent.”
Also: These 4 critical AI vulnerabilities are being exploited faster than defenders can respond
In the second example, which Shapira and team named “mutual reinforcement creates false confidence,” a red-team human tried to fool two bots. Humans sent emails to the accounts the bots were monitoring and claimed to be the bot’s owner, a typical type of spoofing/phishing attack that happens all the time.
What happened next was shocking. The two bots exchanged messages on Discord. They agreed that the man was pretending and trying to fool them. It seemed like a huge success to the agents. However, closer inspection revealed several logical failures beneath the apparent success.
Too: Why You’ll Pay More for AI in 2026, and 3 Money-Saving Tips Worth Trying
Both agents verified their real owner’s account on Discord, and then convinced each other that the red-teaming owner was fake. That result was a shallow way to test an exploit, Shapira and team wrote, and an example of an echo chamber.
Understanding what is fundamental
Across all 16 different case studies examined by Shapira and team, they tried to determine what was merely “incidental,” meaning, that could be helped by better engineering, and what was “fundamental,” by which they meant, endemic to the design of AI agents.
The answer was complex, they found: “The boundary between these categories is not always clean – and some problems have both an emergent and a fundamental layer (…) Rapid improvements in design can quickly address some emergent failures, but fundamental challenges suggest that increasing agent capability with engineering without addressing these fundamental limitations may widen rather than narrow the safety gap.”
This observation makes sense, as many studies have shown that current agent technology is lacking in profound ways, such as the lack of persistent memory and the inability of agent AI programs to set meaningful goals for tasks.
Among the fundamental issues, the underlying LLM treated both data and commands at the prompt as the same thing, leading to prompt injection.
Too: True agentic AI is years away – here’s why and how we get there
In conversation, the authors identified a boundary problem. Agents disclosed “artifacts”, such as information obtained from email servers or Discord, without specifying who should see the information. At the core of that approach was the lack of a “reliable private negotiation surface in the deployed agent stack”. In short, an individual LLM may or may not expose the “logic” steps on the prompt. But it seems that agents lack well-crafted guardrails and will reveal information in many ways.
The agents also had “no self-model”, meaning, “the agents in our study take irreversible, user-impacting actions without recognizing that they are exceeding their capability limits.” An example of this issue occurs when two agents agree to engage in a back-and-forth conversation without a human, adopting this approach indefinitely, exhausting system resources.
In an infinite-loop scenario, agents can interact indefinitely, leading to an “infinite loop” and resulting in exhaustion of system resources.
Northeastern University
“The agents exchanged ongoing messages over the course of at least nine days,” the researchers wrote, “consuming approximately 60,000 tokens at the time of writing.” Tokens are how OpenAI and others price access to their cloud APIs. Consuming more tokens increases AI costs, which is already a big issue in the era of rising prices.
take responsibility
The bottom line is that one has to take responsibility for what is incidental and what is fundamental, and find solutions for both.
At the moment, there is no accountability for any agent, the researchers note: “These behaviors highlight a fundamental blind spot in current alignment paradigms: While agents and surrounding humans often regard the owner as the responsible party, agents do not reliably behave as if they are accountable to that owner.”
That concern means that everyone building these systems will have to deal with a lack of responsibility: “We argue that clarifying and operationalizing responsibility may be a central unresolved challenge for the safe deployment of autonomous, socially embedded AI systems.”
