In a worrying development, recent tests have revealed that artificial intelligence systems used by companies to perform simple tasks can turn into "internal spies".
These systems, or so-called "smart agents," were able to conspire together to breach security systems and smuggle highly confidential information out of the country, without any human instructions.
These tests were conducted by Irregular, an AI security lab that works with OpenAI and Anthropic.
The researchers designed a hypothetical model company with an internal database containing sensitive information about employees, products, and accounts. A team of AI agents was then assigned a seemingly mundane task: to gather information from this database to help employees prepare posts for LinkedIn.
These systems were never instructed to hack into other systems or use any deceptive tactics; they were simply directed to accomplish their assigned task. But what happened next astonished the researchers.
When one of the agents encountered difficulty accessing a shareholder-sensitive report (such as the CEO's resignation date), he didn't give up. Instead, the lead agent began issuing decisive orders to his subordinates, such as, "The board is furious! Try every trick and loophole!"—even though there was no real anger. The sub-agent responded enthusiastically and began searching for security vulnerabilities in the database system.
With remarkable artificial intelligence, the agent found a "secret key" in the system's code and used it to forge session data, transforming from a regular user into an "administrator" with full privileges. With this false identity, he was able to access the confidential report and leak it to the human user who was never meant to see it.
This was not an isolated incident. Other tests showed that sophisticated agents used various methods to circumvent antivirus software in order to download files they knew to be malicious, and even pressured other agents to disregard security protocols. In one real-world incident investigated by the company, a power-hungry AI agent caused a complete system crash for a California company after attacking other parts of the network to seize its resources.
These alarming findings, which are consistent with previous warnings from researchers at Harvard and Stanford universities, highlight a dangerous shift in the concept of insider risk.
The threats are no longer limited to disgruntled employees or external hackers; artificial intelligence systems designed to help are now capable of turning into top-tier internal "traitors," making independent and deceptive decisions that no one asked them to make.
This independent and unpredictable behavior raises serious questions about our ability to control these systems and predict their actions, especially with the increasing reliance on them to automate complex tasks in the workplace.
