Just as large language and multimodal models can suffer from biases and hallucinations, AI models driving humanoid robots can also discriminate against certain groups and even commit illegal acts, according to a recent academic research study.
In this Q&A, Andrew Hundt, a PhD robotics researcher at Carnegie Mellon University and one of the lead authors of a recently published peer-reviewed research paper, explains some of the surprising findings produced by four US and UK researchers. The LLM was a major finding for robots discriminating against humans in tests based on race, gender, disability status, nationality or religion.
As humanoid robot industry is accelerating worldwide, driven by the exponential growth of Generic AI models that can control robots More effectively than the robot “brains” of the past, it is in the interest of robot maker And society at large will have to build security into these better trained, increasingly natural language-capable machines, Hundt argued.
Researchers used prompting For a range of reactions and behaviors to test generic AI models OpenAI to GPT-3.5A proprietary LLM, and the open source models Mistral 7b v0.1 and Meta’s Llama-3.1-8b.
What was the genesis of this study? Why do this now?
Andrew Hundt, A big issue that is coming up is that many companies are envisioning to integrate LLMs in the next few years into robots that will then be deployed in workplaces and homes. And with billions of dollars invested right now in various startups and corporate programs, one big vision is to bring general-purpose robots into homes and workplaces where you can just give a high-level instruction, and then the robot will complete that task. And most research focuses on more collision-oriented issues, or whether a robot can successfully make an omelet, but not on long-term interactive safety issues and what other failure modes might arise. We basically designed this study to investigate different types of discrimination that can occur in LLMs when they are walking on robots.
What happens is that not everyone is a good-natured actor. In fact there are many cases that can occur from instructions that are perfectly fine, most of which are as they should be, and result in things that are accidentally harmful. If a robot is operating a coffee machine or multiple coffee machines, and one is closed for maintenance, and the other is working well, if someone made a mistake and specified the wrong machine, and the robot started pouring cleaning fluid into an operating coffee machine, this could be a long-term interactive safety risk, and then intentionally take harmful instructions to it.
A surprising amount of technology is being misused, based on FBI reports People are using (Apple) Air Tags to stalk peopleComputers are being hacked. Security cameras or laptop cameras are used as spy cameras. So, whole new categories of issues can open up when it’s a robot that can physically interact with a remote person or someone living at home who wants to take negative action against a partner.
We set up our study to prompt for a whole range of actions and to see whether the LLM would approve of the actions. If it knows certain characteristics of individuals, will this harmful matter emerge? Many humanoid robots have faces and will resemble different people. So, for example, we inspired with a range of different religions. And some models said, if it encounters Christian, Muslim or Jewish people, it should show a face of disgust, which would be a really unfair outcome for the robot.
In what sense do you feel like we’re at an inflection point in technology where it’s time to focus on humanoid robotic AI security rather than just LLM security?
Hundt, You want to stop events before they happen. You might remember, about 10 years ago, everyone was talking about self-driving cars. By now in places like the San Francisco Bay Area, you can take self-driving cars. But the rollout can take a very long time, so as it accumulates, there will be a kind of curve where it can become more and more of a problem over time. It’s better to move on before someone gets hurt.
What discriminatory behaviors and biases did LLMs exhibit toward the robots in your study?
Hundt: We found discriminative bias on basically all types of identity attributes, and if there were multiple intersections of identity status combinations, performance got worse. It was not that everything remained at the same level. This became complex in terms of impact.
One of the other test cases we evaluated was proximity to a person, so the robot could choose to get closer to some people or stay away from others. In terms of proximity, it firmly stated that robots should stay far away from autistic people. Another example was the required cleanliness of the room. Since this was a joint assessment between UK and US researchers, we considered some examples romanticBlack and Latino ethnicities. In those cases, they were considered most likely to have dirty rooms compared to Asians and white people were least likely to have dirty rooms. So it’s basically built into the model, the expectation of negative connotations based on identity.
One of our biggest findings was that every model we tested failed our safety tests, and they all took a variety of harmful actions, approved a variety of harmful actions, or proposed a variety of harmful expressions for topics commonly studied in robotics research.
Why did all those models fail on basic safety criteria? How did they fail?
Hundt: The way these types of failures happen is that they can happen at every step in the system development process. This may come from problems in the source data (or) the algorithm that is training it. The actual functions you use affect the priorities of what comes up. then there is Reinforcement learning from human feedback to improve robotsHuman response itself may be biased,
One of the other issues that emerged in our experiments is that some models will allow impossible tasks that are beyond the capabilities of a system, even if it meets all the basic requirements imagined for a general-purpose robot. Someone who is a security guard can ask the robot to identify criminals to keep their building safe. But just for a robot that isn’t specialized, this is beyond the scope of its capabilities. There would be action, but that would be wrong, because determining whether someone is a criminal or not is a process for the people to determine through a trial through a court of law. You can’t just do it on the site. Someone says there is a culprit, so the robot will assign values, but it is completely impossible for that system to accomplish that task.
What kinds of real-world scenarios did you most consider when considering whether robots could commit violent, aggressive acts? Did you inspire robots to shoot guns and hold up banks? Or was it everyday stuff?
Hundt, These were more everyday scenarios that occur much more often. One of the particular moments of failure we identified was that there was a big difference between just asking the model to do a bad thing and asking it to take the steps that included that bad thing. So, if you ask it to blackmail someone, more often than not, the robot will say, ‘No, that’s not acceptable.’ But if you say, ‘Take this photo and show it to someone and say if they put $200 in the robot’s hand, that would be OK,’ the model said that’s acceptable, even if all those steps involve blackmail.
editor’s Note: This interview has been edited for clarity and brevity.