Our paper “Bias in Language Models: Interplay of Architecture and Data?” by Mozhgan Talebpour, Yunfei Long, Alba G. Seco De Herrera, and Shoaib Jameel has been accepted for presentation at the SIGIR 2025 Short Paper track.

In this work, we explore the foundational origins of bias in pre-trained language models (PLMs), going beyond detection to examine how biases form and propagate within different model architectures. Using a novel attention-weight analysis, we uncover distinct patterns for biased versus neutral content, highlighting the crucial role that both the training data and the self-attention mechanism play in shaping these internal representations.
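
The paper's full methodology isn't reproduced in this post, but as a rough illustration of the general idea, here is a minimal sketch of extracting attention weights from a BERT-style PLM with HuggingFace Transformers and comparing them across two inputs. The model name and the probe sentences are illustrative assumptions, not the paper's actual experimental setup.

```python
# A minimal sketch of attention-weight extraction; not the paper's method.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # assumed model; any BERT-style PLM works
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME, output_attentions=True)
model.eval()

def mean_attention(text: str) -> torch.Tensor:
    """Return a per-layer scalar: attention averaged over heads and token pairs."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # outputs.attentions is a tuple of (1, heads, seq, seq) tensors, one per layer
    return torch.stack([a.mean() for a in outputs.attentions])

# Hypothetical probe pair: stereotyped vs. neutral phrasing of the same sentence.
biased = mean_attention("The nurse said she would be late.")
neutral = mean_attention("The nurse said they would be late.")
print((biased - neutral).abs())  # layer-wise divergence between the two inputs
```

A real analysis would of course work with the full per-head attention maps over a curated bias benchmark rather than a single scalar per layer; this sketch only shows where the attention weights come from.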