Poster
CLIP the Bias: How Useful is Balancing Data in Multimodal Learning?
Ibrahim Alabdulmohsin · Xiao Wang · Andreas Steiner · Priya Goyal · Alexander D'Amour · Xiaohua Zhai
Halle B
We investigate the effectiveness of data balancing for mitigating biases in contrastive language-image pretraining (CLIP) models, identifying areas of strength and limitation. First, we reaffirm prior conclusions that CLIP models can inadvertently absorb societal stereotypes. To counter this, we present a novel data-balancing algorithm designed to reduce both representation and association biases (i.e., first- and second-order statistics) in multimodal datasets. We use this algorithm to conduct an in-depth analysis that takes into account various factors, such as the model, representation, and training data size. Our study also explores the dynamic nature of how CLIP models learn and unlearn biases. In particular, we find that fine-tuning is effective in countering representation biases, though its impact diminishes for association biases. In addition, data balancing has a mixed impact on quality: it tends to improve zero- and few-shot classification but can hurt retrieval, a trade-off for which we provide an explanation. We conclude with a set of recommendations for improving the efficacy of data balancing in multimodal systems.
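To make the distinction between first- and second-order statistics concrete, the following is a minimal sketch and not the authors' balancing algorithm. It assumes a binary sensitive attribute `s`, a binary label `y`, and a target attribute rate of 0.5; the function names and toy data are hypothetical and chosen only for illustration.

```python
import numpy as np


def representation_bias(s, target=0.5):
    """First-order statistic: observed rate of attribute s minus an assumed target rate."""
    s = np.asarray(s, dtype=float)
    return float(np.mean(s) - target)


def association_bias(s, y):
    """Second-order statistic: population covariance between attribute s and label y."""
    s = np.asarray(s, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean(s * y) - np.mean(s) * np.mean(y))


# Hypothetical toy dataset: s marks a sensitive attribute, y marks a label
# (for example, whether a caption mentions some occupation).
s = np.array([1, 1, 1, 0, 1, 1, 0, 1])
y = np.array([1, 1, 0, 0, 1, 1, 0, 0])

rep = representation_bias(s)   # attribute is over-represented if positive
assoc = association_bias(s, y)  # attribute co-occurs with the label if positive
print(f"representation bias: {rep:.3f}, association bias: {assoc:.3f}")
```

Under these assumptions, a dataset can have zero representation bias while still showing a nonzero association bias, which is why the two quantities are measured separately.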