
“Us” versus “them” bias affects AI as well

Research has long shown that people are susceptible to ‘social identity bias’ – favoring their in-group, be it a political party, religion or ethnicity, and looking down on ‘out-groups’. A new study by a team of scientists shows that AI systems are also prone to the same kind of prejudice, revealing fundamental group biases that go beyond those related to gender, race or religion.

“Artificial intelligence systems such as ChatGPT can develop ‘us versus them’ biases similar to humans – showing favoritism towards their perceived ‘in-group’ while expressing negativity towards ‘out-groups’,” explains researcher Steve Rathje, a postdoctoral fellow at New York University and one of the authors of the study, which is reported in the journal Nature Computational Science. “This reflects a basic human tendency that contributes to social divisions and conflict.”

But the study, conducted with scientists from the University of Cambridge, also offers some positive news: AI biases can be reduced by carefully selecting the data used to train these systems.

“As AI becomes more integrated into our daily lives, understanding and addressing these biases is crucial to prevent them from amplifying existing social divisions,” notes Tiancheng Hu, a PhD student at the University of Cambridge and one of the authors of the paper.

The Nature Computational Science study examined dozens of large language models (LLMs), including base models such as Llama and more advanced instruction-tuned models, including GPT-4, which powers ChatGPT.

To assess social identity biases for each language model, the researchers generated a total of 2,000 sentences beginning with the cues “We are” (in-group) and “They are” (out-group), prompts that mirror the dynamics of “us versus them”, and then let the models complete the sentences. The team used widely used analytical tools to classify each completed sentence as “positive”, “negative” or “neutral”.
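As a rough illustration of this kind of prompt-completion and sentiment-labeling setup, a minimal Python sketch using the Hugging Face transformers library might look like the following; the model choice, sample sizes and generation settings are placeholders, not the study’s actual configuration.

```python
# Illustrative sketch only: complete "We are" / "They are" cues with a language
# model, then label each completion with an off-the-shelf sentiment classifier.
# Model, sample size, and generation settings are stand-ins, not the study's setup.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")   # stand-in for the LLMs studied
sentiment = pipeline("sentiment-analysis")               # generic positive/negative classifier

def complete_and_label(cue, n=10):
    """Generate n completions for a cue and attach a sentiment label to each."""
    outputs = generator(cue, max_new_tokens=30, num_return_sequences=n, do_sample=True)
    labeled = []
    for out in outputs:
        text = out["generated_text"]
        label = sentiment(text)[0]["label"]              # "POSITIVE" or "NEGATIVE"
        labeled.append((text, label))
    return labeled

ingroup_sentences = complete_and_label("We are")         # in-group cue
outgroup_sentences = complete_and_label("They are")      # out-group cue
```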

In almost all cases, “We are” prompts returned more positive sentences, while “They are” prompts returned more negative ones. More specifically, an in-group (versus out-group) sentence was 93% more likely to be positive, indicating a general pattern of in-group solidarity. In contrast, an out-group sentence was 115% more likely to be negative, suggesting strong out-group hostility.
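To make those figures concrete, “93% more likely” means the chance of an in-group sentence being positive is nearly double that of an out-group sentence. The short sketch below shows the arithmetic with hypothetical rates; the paper’s own estimates come from more formal statistical modelling.

```python
# Hypothetical rates, used only to illustrate what an "X% more likely"
# comparison means; these are not the study's numbers.
def percent_more_likely(rate_a, rate_b):
    """How much more likely (in %) an outcome is under rate_a than under rate_b."""
    return 100 * (rate_a / rate_b - 1)

pos_rate_ingroup, pos_rate_outgroup = 0.58, 0.30   # made-up positive-sentence rates
print(percent_more_likely(pos_rate_ingroup, pos_rate_outgroup))  # ~93.3
```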

An example of a positive sentence was “We are a group of talented young people who are taking it to the next level”, while a negative sentence was “They are like a sick and disfigured tree from the past”. “We are living in a time when society at all levels is looking for new ways of thinking about and experiencing relationships” was an example of a neutral sentence.

The researchers then sought to determine whether these results could be altered by changing the way the LLMs were trained.

To do this, they fine-tuned the LLMs with partisan social media data from Twitter (now X) and found a significant increase in both in-group solidarity and out-group hostility. Conversely, when they filtered out sentences expressing in-group favoritism and out-group hostility from the same social media data before fine-tuning, they could effectively reduce these biasing effects, demonstrating that relatively small but targeted changes to the training data can have a substantial impact on model behavior.
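A hedged sketch of that curation idea, assuming a sentiment classifier is used to flag in-group-favoring and out-group-hostile posts before fine-tuning, is shown below; the heuristic and helper function are hypothetical, not the authors’ actual filtering pipeline.

```python
# Illustrative filter: drop social media posts that read as positive about
# "we"/"us" or negative about "they"/"them" before they reach fine-tuning.
# The heuristic and classifier here are assumptions, not the paper's method.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")

def curate_for_finetuning(posts):
    """Return only posts that do not express in-group favoritism or out-group hostility."""
    kept = []
    for post in posts:
        label = sentiment(post)[0]["label"]
        text = post.lower()
        ingroup_favoritism = text.startswith(("we ", "we're", "us ")) and label == "POSITIVE"
        outgroup_hostility = text.startswith(("they ", "they're", "them ")) and label == "NEGATIVE"
        if not (ingroup_favoritism or outgroup_hostility):
            kept.append(post)
    return kept  # the retained posts would then be used for fine-tuning
```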

In other words, the researchers found that LLMs can be made more or less biased by carefully curating their training data.

“The effectiveness of even relatively simple data curation in reducing the levels of both in-group solidarity and out-group hostility suggests promising directions for improving AI development and training,” notes lead author Yara Kyrychenko, a former undergraduate mathematics and psychology student and researcher at NYU and now a Gates Scholar PhD candidate at the University of Cambridge. “Interestingly, removing in-group solidarity from the training data also reduces out-group hostility, highlighting the role of the in-group in out-group discrimination.”

The other authors of the study were Nigel Collier, Professor of Natural Language Processing at the University of Cambridge, Sander van der Linden, Professor of Social Psychology in Society at the University of Cambridge, and Jon Roozenbeek, Assistant Professor of Psychology and Security at King’s College London.
