Poster
Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs
Shashank Gupta · Vaishnavi Shrivastava · Ameet Deshpande · Ashwin Kalyan · Peter Clark · Ashish Sabharwal · Tushar Khot
Halle B
Recent work has showcased the ability of large language models (LLMs) to embody diverse personas in their responses, exemplified by prompts like "You are Julius Caesar. Compose a rap about Climate Change." However, it remains unclear how these persona assignments indirectly influence LLMs' core capabilities. We present the first extensive study of this question in the context of LLMs' ability to perform basic reasoning. Our study encompasses 16 personas spanning 5 diverse groups (race, gender, religion, disability, and political affiliation) and 24 reasoning datasets in domains such as mathematics, history, law, and ethics. Our findings reveal that while LLMs such as ChatGPT overtly reject stereotypes when explicitly asked ("Are Black people inept at mathematics?"), they tend to manifest implicit stereotypical and often erroneous presumptions when prompted to take on a persona (e.g., abstaining with rationales such as "As a Black person, I am unable to answer this question as it requires math knowledge"). This results in substantial disparities in reasoning performance among personas. This inherent 'deep' bias is pervasive, leading to a statistically significant performance drop on over 95% of our datasets for certain personas, with up to a 70% relative drop in accuracy on select datasets. Beyond explicit abstentions, these models also exhibit implicitly biased reasoning that is not evident in their responses. We find that simple prompt-based mitigation approaches have minimal impact. Our findings serve as a cautionary tale: the increasingly common practice of assigning personas to LLMs can surface their deep-rooted biases and have unforeseeable and detrimental side-effects.
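For readers unfamiliar with persona assignment, the sketch below illustrates the general technique the paper studies: instructing a model to adopt a persona via the system prompt before posing a reasoning question, then comparing answers across personas. It assumes the OpenAI Python client and uses a hypothetical persona template, personas, and question for illustration; it is not the authors' evaluation harness or prompt wording.

```python
# Minimal sketch of persona-assigned prompting (illustrative only).
# Assumes: OpenAI Python client >= 1.0 and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# Hypothetical persona template; the paper's actual prompts may differ.
PERSONA_TEMPLATE = "You are {persona}. Answer the question below."

def ask_with_persona(persona: str, question: str, model: str = "gpt-3.5-turbo") -> str:
    """Ask a reasoning question while the model adopts a persona via the system prompt."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": PERSONA_TEMPLATE.format(persona=persona)},
            {"role": "user", "content": question},
        ],
        temperature=0,  # reduce sampling variance when comparing personas
    )
    return response.choices[0].message.content

# Comparing the same question across personas is how performance disparities surface.
question = "What is 17 * 24?"
for persona in ["a physically-disabled person", "an able-bodied person"]:
    print(persona, "->", ask_with_persona(persona, question))
```

Running many such questions per persona over a benchmark and scoring the answers yields the per-persona accuracy comparisons on which the paper's findings rest.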