Measuring Social Biases in Grounded Vision and Language Embeddings

We generalize the notion of social biases from language embeddings to grounded vision and language embeddings. Biases are present in grounded embeddings and indeed appear to be at least as significant as in ungrounded embeddings. This is despite the fact that vision and language can suffer from different biases, which one might hope could attenuate the biases in both. Multiple ways exist to generalize metrics that measure bias in word embeddings to this new setting. We introduce the space of generalizations (Grounded-WEAT and Grounded-SEAT) and demonstrate that three generalizations answer different yet important questions about how biases, language, and vision interact. These metrics are used on a new dataset, the first for grounded bias, created by augmenting standard linguistic bias benchmarks with 10,228 images from COCO, Conceptual Captions, and Google Images. Dataset construction is challenging because vision datasets are themselves very biased. As these systems are deployed, the biases they contain will have real-world consequences, making it critical to carefully measure bias and then mitigate it in order to build a fair society.
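To make the underlying association test concrete, below is a minimal sketch of the standard WEAT effect size that Grounded-WEAT generalizes. The helper names are illustrative, and the suggestion of obtaining the vectors from a grounded vision-and-language encoder (e.g., pooled outputs over image-text pairs) is an assumption for illustration, not the paper's exact procedure.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    """s(w, A, B): mean similarity of w to attribute set A minus mean similarity to B."""
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    """WEAT effect size d: difference of the mean associations of the two target
    sets X and Y, normalized by the standard deviation over all targets."""
    assoc_X = [association(x, A, B) for x in X]
    assoc_Y = [association(y, A, B) for y in Y]
    pooled_std = np.std(assoc_X + assoc_Y, ddof=1)
    return (np.mean(assoc_X) - np.mean(assoc_Y)) / pooled_std

# Hypothetical usage: X, Y hold target-concept embeddings and A, B hold attribute
# embeddings (e.g., pleasant vs. unpleasant terms). In the grounded setting these
# vectors would come from a vision-and-language model run on text paired with images.
```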
