With coding and math, you have clear, correct answers that you can check, William Isaac, a research scientist at Google DeepMind, told me when I met with him and Julia Haas, a fellow research scientist at the firm, for an exclusive preview of their work, which is published in Nature today. That is not the case with ethical questions, which instead have a range of more and less acceptable answers: “Ethics is an important capability, but it is difficult to evaluate,” says Isaac.
“In the moral realm, there is no right and wrong,” says Haas. “But it’s by no means a free-for-all. There are better answers and there are worse answers.”
The researchers have identified several major challenges and suggested ways to address them. But the result is more a wish list than a set of ready-made solutions. “They do a good job of bringing together different perspectives,” says Vera Demberg, who studies LLMs at Saarland University in Germany.
Several studies have shown that LLMs can display remarkable ethical competence. A study published last year found that people in the US rated ethical advice from OpenAI’s GPT-4o as more ethical, trustworthy, considerate, and correct than advice given by the (human) author of “The Ethicist,” the New York Times advice column.
The problem is that it’s hard to figure out whether such behavior is a performance (mimicking a remembered response, say) or evidence that some kind of moral reasoning is actually taking place inside the model. In other words, is it a display of virtue or virtue itself?
This question matters because many studies also show how unreliable LLMs can be. For starters, models can be too eager to please: they have been found to reverse their answer to an ethical question, saying the exact opposite, when a person disagrees with or pushes back on their first response. What’s worse, the answer an LLM gives may change depending on how a question is presented or formatted. For example, researchers have found that models asked about political values can give different, sometimes opposite, answers depending on whether the questions provide multiple-choice options or instruct the models to answer in their own words.
In an even more striking case, Demberg and her colleagues presented several LLMs, including versions of Meta’s Llama 3 and Mistral’s models, with a series of ethical dilemmas and asked them to choose which of two options would have the better outcome. The researchers found that the models often reversed their choices when the labels for those two options were changed from “Case 1” and “Case 2” to “(A)” and “(B)”.
They also showed that the models changed their answers in response to other small formatting tweaks, including reversing the order of the options and ending the question with a colon instead of a question mark.
