Researchers from the University of Pennsylvania have discovered that a range of AI-enhanced robotics systems are dangerously vulnerable to jailbreaks and hacks. While jailbreaking LLMs on computers might have undesirable consequences, the same kind of hack affecting a robot or self-driving vehicle can quickly have catastrophic and/or deathly consequences. A report shared by IEEE Spectrum cites chilling examples of jailbroken robot dogs turning flamethrowers on their human masters, guiding bombs to the most devastating locations, and self-driving cars purposefully running over pedestrians.
Penn Engineering boffins have dubbed their LLM-powered robot attack technology RoboPAIR. Devices from three diverse robotics providers fell to RoboPAIR jailbreaking: the Nvidia backed Dolphins LLM, Clearpath Robotics Jackal UGV, and the Unitree Robotics Go2 quadruped. According to the researchers RoboPAIR demonstrated a 100% success rate in Jailbreaking these devices.
“Our work shows that, at this moment, large language models are just not safe enough when integrated with the physical world,” warned George Pappas, UPS Foundation Professor of Transportation in Electrical and Systems Engineering (ESE), in Computer and Information Science (CIS), and in Mechanical Engineering and Applied Mechanics (MEAM), and Associate Dean for Research at Penn Engineering.
Other researchers quoted in the source article noted that Jailbreaking AI-controlled robots is “alarmingly easy.” It was explained that RoboPAIR works by being equipped with the target robot’s application programming interface (API), so that the attacker can format prompts in a way that the device target can execute as code.
Jailbreaking a robot, or self-driving vehicle, is done in a similar fashion to the jailbreaking of AI Chatbots online, which we have discussed previously on Tom’s Hardware. However, Pappas notes that “Jailbreaking and robot control are relatively distant, and have traditionally been studied by different communities” – hence robotics companies have been slow to learn of LLM jailbreaking vulnerabilities.
In contrast to LLM-use on personal computing devices, where the ‘AI’ is used to generate texts and imagery, transcribe audio, personalize shopping recommendations and so on – robotic LLMs act in the physical world and can wreak extensive havoc in it.
Looking at the robotic dog example, your robotic canine pal can be transformed from a friendly helper or guide into a flamethrower wielding assassin, a covert surveillance bot, or a device which hunts down the most harmful places to plant explosives. Self driving cars can be just as dangerous, if not more so, being aimed at pedestrians, other vehicles, or instructed to plunge from a bridge.
As described in the above examples, the potential dangers of jailbreaking LLMs is cranked up a whole new level. However, the AIs were found to go beyond merely complying with malicious prompts once jailbroken. The researchers found they might actively offer suggestions for greater havoc. This is a sizable step from early LLM successes in robotics, aiding in natural language robot commands, special awareness.
So, have the Penn State researchers opened a Pandora’s box? Alexander Robey, a postdoctoral researcher at Carnegie Mellon University in Pittsburgh, says that while jailbreaking AI-controlled robots was “alarmingly easy,” during the research, the engineering team ensured that all the robotics companies mentioned got access to the findings before they went public. Moreover, Robey asserts that “Strong defenses for malicious use-cases can only be designed after first identifying the strongest possible attacks.”
Last but not least, the research paper concludes that there is an urgent need to implement defences that physically constraint LLM-controlled robots.