AI chatbots are under attack. Just this week, attackers broke into high-profile Instagram accounts by tricking Meta's AI support chatbot into handing over access. Accounts compromised in the attack included President Barack Obama's White House page, retailer Sephora, and John Bentivegna, the US Space Force chief master sergeant.
A video shared by TechCrunch showed that the hacker allegedly used a VPN to spoof the target's location and avoid Instagram's account protections. The hacker then opened a chat with the Meta AI Support Assistant and asked it to add a new email address to the target's account, receiving a verification code that could be used to reset the victim's password.
The attack illustrates just how vulnerable AI chatbots and assistants are to exploitation, with prompt injection among the biggest threats. SafeBreach Labs released research about a vulnerability that allowed attackers to exploit Google Gemini with notification-based prompt injections from messaging apps like WhatsApp, Slack and SMS.
In the study, the researchers used a technique known as "Fake Context Alignment" to manipulate the chatbot's context, hiding malicious instructions in foreign languages or muted hyperlinks to force the assistant to execute unauthorized actions. The exploit enabled a range of actions, including controlling smart home devices, launching unauthorized video streams, social engineering and poisoning long-term memory for persistent access.
Are AI Tools a Security Liability?
Just how common these types of attacks are isn't clear, but the growth of generative AI-powered products is presenting new vulnerabilities that malicious actors can exploit. Although Google has mitigated the Gemini vulnerability, similar exploits could emerge in the future as hackers find new ways to sidestep content moderation guidelines with creative prompting.
Or Yair, security research team lead at SafeBreach, told International Business Times via email that "these attacks target Gemini on Android devices. The indirect prompt injection works by exploiting Gemini's ability to read phone notifications. Essentially, Gemini reads a notification from an instant messaging app about a received message and unknowingly follows the malicious instructions contained within that message."
Yair noted that at the time the research was produced (before the Google mitigation), the vulnerability posed significant risks to everyday consumers and enterprises. However, he said, "an indirect prompt injection alone isn't enough. An attacker wants to leverage tools integrated with Gemini to cause real-world impact."
What stands out about this prompt injection in particular is that it not only enabled an attacker to control Gemini's output with fake messages and poison the tool's long-term memory, but also to trigger integrated tools.
For example, a hacker could use this exploit to control a victim's home appliances, such as connected windows, boilers or lights, or to cross the boundary into other apps by opening application URLs. It could also be used to geolocate a victim by IP address or download files to their devices.
"One crucial detail is the attack surface: these attacks can originate from any application capable of sending a notification to a device, including SMS, WhatsApp, Slack, Signal, Instagram, and Facebook Messenger. All an attacker needed to compromise a device was the ability to send the victim an instant message," Yair added.
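One way to picture a defense against this class of attack is to track where each piece of an assistant's context came from, and to withhold tool access whenever third-party text is in play. The Python sketch below is illustrative only: it is not Google's mitigation, and the names Source, Turn and may_call_tool are hypothetical. It assumes the assistant can label every context item by origin, and it covers only the tool-triggering part of the attack, not manipulated output or memory poisoning.

```python
from dataclasses import dataclass, field
from enum import Enum


class Source(Enum):
    USER = "user"                  # typed or spoken by the device owner
    NOTIFICATION = "notification"  # text delivered by a third-party app


@dataclass(frozen=True)
class ContextItem:
    source: Source
    text: str


@dataclass
class Turn:
    """Everything the assistant has been shown in one interaction (hypothetical)."""
    items: list[ContextItem] = field(default_factory=list)

    def add(self, source: Source, text: str) -> None:
        self.items.append(ContextItem(source, text))

    @property
    def tainted(self) -> bool:
        # A single item from outside the user is enough to taint the turn.
        return any(item.source is not Source.USER for item in self.items)


def may_call_tool(turn: Turn, user_confirmed: bool = False) -> bool:
    """Allow tool calls only in untainted turns or after explicit confirmation."""
    return (not turn.tainted) or user_confirmed
```

Under these assumptions, a turn that contains only the owner's request may call tools freely, while a turn that has read an incoming message must get explicit confirmation first, no matter what language or formatting the message uses to hide its instructions.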
Downstream Risk
Prompt injection presents a pervasive threat to the enterprise as any system that relies on user prompts, whether by text, voice or image, can be exploited if the hacker gains access to the system. Adding to the issue is the number of shadow AI tools in the workplace, with research indicating that almost 80% of employees admit to using unapproved AI tools at work.
At the same time, the scope of risk increases depending on what downstream systems the assistant has access to. "Prompt injection is a serious threat because it scales with LLM agency. If an AI system is only answering questions, a successful prompt injection may be embarrassing or misleading. But as we are developing more and more ambitious AI systems and offloading more responsibility to them, the risk ramps up," Albert Ziegler, head of AI at autonomous offensive security platform XBOW, told International Business Times via email.
"The basic issue is that LLMs are unusually good at following instructions and unusually bad at reliably distinguishing trusted instructions from hostile instructions embedded in the data they are asked to process. That is manageable when the model is boxed in. It becomes dangerous when the model is connected to the business," Ziegler added.
While Ziegler noted that phishing presents a greater day-to-day risk, he also said the balance is shifting as companies get more comfortable "handing over the keys to their kingdom to AI."
Ziegler went on to say that enterprises should be cautious about which AI tools are connected to downstream systems. Tools with wide enterprise access to protected data should be protected from unauthorized access and given access only to the minimum data necessary to perform their function.
"My advice is simple: limit permissions and assume failure. Organizations should avoid giving AI systems broader access than they genuinely need, apply the principle of least privilege, and ensure that sensitive actions require additional validation rather than being executed automatically," David Sancho, senior threat researcher at cybersecurity vendor TrendAI, told International Business Times via email.
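In practice, that advice might look something like the Python sketch below, in which an assistant can use only the tools on an explicit allowlist and any sensitive tool needs a separate approval step. This is a minimal illustration under stated assumptions, not any vendor's implementation: ToolPolicy, authorize and the tool names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolPolicy:
    allowed: frozenset[str]    # least privilege: everything else is denied
    sensitive: frozenset[str]  # allowed tools that still need extra validation


def deny_by_default(tool: str) -> bool:
    # "Assume failure": with no approver wired in, sensitive actions never run.
    return False


def authorize(
    policy: ToolPolicy,
    tool: str,
    approver: Callable[[str], bool] = deny_by_default,
) -> bool:
    """Return True only if the policy, and any required approval, permit the call."""
    if tool not in policy.allowed:
        return False
    if tool in policy.sensitive:
        # Validation happens outside the model, e.g. a human or a second factor.
        return approver(tool)
    return True


# Hypothetical policy for a customer-support assistant.
support_policy = ToolPolicy(
    allowed=frozenset({"lookup_help_article", "change_account_email"}),
    sensitive=frozenset({"change_account_email"}),
)
```

With this policy, looking up a help article goes through automatically, changing an account email is refused unless an approver outside the model signs off, and any tool that is not listed is refused outright.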
However, the biggest long-term challenge will be putting mechanisms in place to spot the shadow AI tools and personal assistants that employees use in the environment.