Mitigating Gender Bias in Captioning Systems

Recent studies have shown that captioning datasets such as COCO contain severe social bias, which can lead to unintentional discrimination in learned models. In this work, we focus specifically on the gender bias problem. The existing dataset split fails to quantify this bias because a model that memorizes gender bias from the training data can still achieve competitive performance on the equally biased test set. To bridge this gap, we create two new splits, COCO-GB v1 and v2, to quantify the inherent gender bias that models may learn. We evaluate several widely used baselines on the new splits, and the results indicate that most models learn gender bias from the training data, yielding an undesirably high gender prediction error for women. To overcome this unwanted bias, we propose a novel Guided Attention Image Captioning model (GAIC) that provides self-guidance on visual attention, encouraging the model to attend to correct gender visual evidence. Experimental results validate that GAIC significantly reduces gender prediction error while maintaining competitive caption quality. Our code and the designed benchmark datasets are available at this https URL.
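The core evaluation idea above — measuring gender prediction error on a test split whose gender-context correlations differ from training — can be sketched as follows. This is a minimal illustrative sketch, not the authors' released code: the word lists, function names, and the simple token-matching heuristic are all assumptions for illustration.

```python
# Hypothetical sketch: score generated captions against ground-truth gender
# labels on a balanced/anti-biased test split (the COCO-GB idea). A caption is
# classified by the gender words it contains; disagreement with the label
# counts as a gender prediction error.

MALE_WORDS = {"man", "men", "boy", "boys", "he", "him", "his", "male"}
FEMALE_WORDS = {"woman", "women", "girl", "girls", "she", "her", "hers", "female"}

def caption_gender(caption: str) -> str:
    """Classify a caption as 'male', 'female', or 'neutral' by its gender words."""
    tokens = set(caption.lower().split())
    has_male = bool(tokens & MALE_WORDS)
    has_female = bool(tokens & FEMALE_WORDS)
    if has_male and not has_female:
        return "male"
    if has_female and not has_male:
        return "female"
    return "neutral"  # no gender word, or conflicting evidence

def gender_error_rates(captions, labels):
    """Per-gender misclassification rate: fraction of images of each gender
    for which the caption asserts the opposite gender."""
    errors = {"male": 0, "female": 0}
    totals = {"male": 0, "female": 0}
    for cap, label in zip(captions, labels):
        totals[label] += 1
        pred = caption_gender(cap)
        if pred != "neutral" and pred != label:
            errors[label] += 1
    return {g: (errors[g] / totals[g] if totals[g] else 0.0) for g in totals}
```

A biased model evaluated this way typically shows an asymmetric error profile (e.g., women in stereotypically male contexts described as "a man ..."), which the original biased test split would mask because its gender-context statistics mirror the training data.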
