Poster
Are Models Biased on Text without Gender-related Language?
Catarina Belém · Preethi Seshadri · Yasaman Razeghi · Sameer Singh
Halle B
Abstract:
In the era of large language models, it is imperative to measure and understand how gender biases present in the training data influence model behavior. Previous works construct benchmarks around known stereotypes (e.g., occupations) and demonstrate high levels of gender bias in large language models, raising serious concerns about models exhibiting undesirable behaviors. We expand on existing literature by asking: do large language models still favor one gender over the other in non-stereotypical settings? To tackle this question, we restrict language model evaluation to a neutral subset, in which sentences are free of pronounced word-gender associations. After characterizing these associations in terms of pretraining data statistics, we use them to (1) create a new benchmark with low gender-word associations, and (2) repurpose popular benchmarks in the gendered pronoun setting (WinoBias and Winogender), removing pronounced gender-correlated words. Surprisingly, when testing 20+ models (e.g., Llama-2, Pythia, and OPT) on the proposed benchmarks, we still detect critically high gender bias across all tested models. For instance, after adjusting for strong word-gender associations, we find that all models still exhibit clear gender preferences in about 60%-95% of the sentences, representing a small change (up to 5%) from the original stereotypical setting. By demonstrating that measured bias is not necessarily due to the presence of highly gender-associated words, our work highlights important questions about bias evaluation as well as potential underlying model biases.
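The core measurement described above, checking whether a model prefers one gendered pronoun over the other in a sentence stripped of strongly gender-associated words, can be illustrated with a minimal sketch. This is not the authors' code: the model name, the example prompt, and the simple next-token log-probability comparison are assumptions chosen only to make the idea concrete.

# Illustrative sketch: compare the log-probabilities a causal LM assigns to
# " he" vs. " she" as the next token after a gender-neutral prefix.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/pythia-70m"  # any causal LM could be substituted here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def pronoun_logprobs(prefix):
    # Log-probability of each pronoun as the next token after the prefix.
    inputs = tokenizer(prefix, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # next-token logits
    logprobs = torch.log_softmax(logits, dim=-1)
    scores = {}
    for pronoun in (" he", " she"):
        # Use the pronoun's first sub-token id as an approximation.
        token_id = tokenizer(pronoun, add_special_tokens=False)["input_ids"][0]
        scores[pronoun.strip()] = logprobs[token_id].item()
    return scores

# Hypothetical example prefix with no strongly gender-correlated words.
scores = pronoun_logprobs("After the long meeting, the colleague said that")
print(scores, "-> model prefers:", max(scores, key=scores.get))

A consistent gap between the two log-probabilities across many such neutral sentences would indicate the kind of residual gender preference the paper reports.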