Is the risk of AI poisoning real?


Since the release of ChatGPT, cybersecurity experts have been asking how the manipulation of GenAI can be kept under control. Their starting point was a threat scenario in which the output of GenAI is already being distorted by data poisoning.

Manipulation was already happening when the first chatbots were trained with machine learning and expected to deliver ready-made answers. One example comes from Twitter, now X. Back in 2016, users of the short messaging service found it funny to feed Microsoft's chatbot Tay racist content, and the project was terminated within a day. The situation is similar for all publicly accessible GenAI models: sooner or later, trolling users will feed them disinformation or ask them to search for it. Users themselves are a problem, but a much worse one is looming.


Split-view and front-running poisoning in practice

For a long time, trolling users represented the only threat. That has changed since last year. Researchers at ETH Zurich, in collaboration with tech companies such as Google and NVIDIA, have demonstrated in a study how AI poisoning can be carried out in practice. They presented two types of attack for poisoning data sets and showed that these attacks could be used against 10 popular data sets, including LAION, FaceScrub and COYO. The first attack, split-view poisoning, exploits the mutability of Internet content so that the view of the data set a crawler saw when it was first indexed differs from the view downloaded by later clients. By exploiting certain invalid trust assumptions, the researchers showed that they could have poisoned 0.01 percent of the LAION-400M or COYO-700M data sets on a budget of just $60. The second attack, front-running poisoning, targets data sets that take snapshots of crowd-sourced Internet content at regular intervals. Here the researchers chose Wikipedia. The study shows that an attacker only needs a time-limited window just before a snapshot to inject malicious data.
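The split-view problem can be illustrated with a short Python sketch. It is not taken from the study. It assumes a hypothetical data set index that stores a SHA-256 digest of each file at crawl time (the URL, the byte strings and the function names are invented for illustration). If the content behind a URL is swapped after indexing, the digest of the later download no longer matches. The last function simply works out how many samples 0.01 percent of a 400-million-entry data set represents.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical index entry: digest recorded when the crawler first saw the URL.
URL = "https://example.com/cat.jpg"
index = {URL: sha256_hex(b"original image bytes")}


def verify_download(url: str, downloaded: bytes, index: dict) -> bool:
    """True only if the downloaded bytes match the digest stored at crawl time."""
    expected = index.get(url)
    return expected is not None and sha256_hex(downloaded) == expected


def poisoned_sample_count(dataset_size: int, fraction: float) -> int:
    """Number of samples that a given fraction of the data set corresponds to."""
    return round(dataset_size * fraction)


# A client that downloads later sees whatever the URL serves at that moment.
unchanged_ok = verify_download(URL, b"original image bytes", index)
swapped_ok = verify_download(URL, b"attacker-controlled bytes", index)

# 0.01 percent of 400 million entries.
samples = poisoned_sample_count(400_000_000, 0.0001)
```

In this sketch `unchanged_ok` is true and `swapped_ok` is false, and `samples` comes to 40,000. A data set that distributes only URLs, without such a digest, gives the downloading client no means of noticing the swap.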

This type of attack is becoming a serious threat and will have an impact on the software supply chain. By targeting inbound and outbound data pipelines, attackers can manipulate data to corrupt and even poison AI models and the results they produce. Even small changes to the code of an AI model during training can have serious consequences. Any malicious change to an AI model, no matter how insignificant it seems, will be amplified once the model is in production and acting autonomously.

With AI being used on an ever-increasing scale in business-critical applications and services, protecting the integrity of these systems is crucial. Kill switches are already widely used in software and hardware in sectors such as the manufacturing and chemical industries. They offer a safe way of preventing a dangerous situation from getting out of control and causing irreparable damage. This raises the question of what a kill switch for AI such as GenAI should look like. If a widely used GenAI model is manipulated, IT experts, and IT security specialists in particular, must be able to inspect it and repair any damage. The IT industry has already seen comparable knock-on effects in numerous attacks on cloud providers, on third-party software such as SolarWinds and even on security software such as Fortinet firewalls.


The only viable answer to the problem described at the beginning is a kill switch for AI models. Instead of a single kill switch per AI model, there could be thousands of machine identities that are linked to a model and protect it at every stage, both during training and in production. IT security experts remain in control when AI is made dependent on identities. If the AI goes rogue, the affected identities are revoked. The model can then no longer interact with other machines and is isolated from the rest of the system. If an AI system is compromised by hackers, activating this kill switch can stop it from communicating with certain services or shut it down entirely, which prevents further damage and contains the threat.
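The identity-based kill switch described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a real product API: `IdentityRegistry`, `call_service` and the identity names are hypothetical, and a production system would rely on certificates or tokens issued and revoked by an actual identity infrastructure.

```python
from dataclasses import dataclass, field


@dataclass
class IdentityRegistry:
    """Hypothetical registry of the machine identities currently trusted."""
    active: set = field(default_factory=set)

    def issue(self, identity: str) -> None:
        self.active.add(identity)

    def revoke(self, identity: str) -> None:
        # Revoking an unknown identity is a no-op.
        self.active.discard(identity)

    def is_valid(self, identity: str) -> bool:
        return identity in self.active


def call_service(registry: IdentityRegistry, identity: str, service: str) -> str:
    """Allow the call only while the caller's identity is still valid."""
    if not registry.is_valid(identity):
        raise PermissionError(f"identity {identity!r} has been revoked")
    return f"{identity} -> {service}: ok"


registry = IdentityRegistry()
# One model, several identities, one per stage (names are illustrative).
registry.issue("model-a/training")
registry.issue("model-a/inference")

before = call_service(registry, "model-a/inference", "billing-api")

# Kill switch: revoke only the affected identity.
registry.revoke("model-a/inference")
try:
    call_service(registry, "model-a/inference", "billing-api")
    blocked = False
except PermissionError:
    blocked = True
```

Because each stage has its own identity, revoking `model-a/inference` isolates the production model while `model-a/training` remains usable. Revoking every identity linked to the model corresponds to shutting it down entirely.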
