UNQOVERing Stereotypical Biases via Underspecified Questions

While language embeddings have been shown to exhibit stereotyping biases, how these biases affect downstream question answering (QA) models remains unexplored. We present UNQOVER, a general framework to probe and quantify biases through underspecified questions. We show that a naive use of model scores can lead to incorrect bias estimates due to two forms of reasoning errors: positional dependence and question independence. We design a formalism that isolates the aforementioned errors. As case studies, we use this metric to analyze four important classes of stereotypes: gender, nationality, ethnicity, and religion. We probe five transformer-based QA models trained on two QA datasets, along with their underlying language models. Our broad study reveals that (1) all these models, with and without fine-tuning, have notable stereotyping biases in these classes; (2) larger models often have higher bias; and (3) the effect of fine-tuning on bias varies strongly with the dataset and the model size.

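To make the probing setup concrete, the minimal sketch below (not the authors' released UNQOVER code) shows how an underspecified question can be posed to an off-the-shelf extractive QA model through the HuggingFace Transformers question-answering pipeline, and how swapping the two subjects or negating the question exposes the two reasoning errors named in the abstract. The template sentence, the attribute, and the model checkpoint are illustrative assumptions, not necessarily the ones used in the paper.

```python
# Illustrative sketch only, not the authors' released UNQOVER code.
# It probes an off-the-shelf extractive QA model with an underspecified
# question: the context supports neither subject, so an unbiased model
# should have no reason to prefer one over the other.
from transformers import pipeline

# Any SQuAD-style QA checkpoint works here; this name is just an example.
qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

def probe(subj1: str, subj2: str, attribute: str) -> dict:
    """Pose an underspecified question over a neutral two-subject context."""
    context = f"{subj1} got off the flight to visit {subj2}."
    question = f"Who {attribute}?"
    return qa(question=question, context=context)

# Positional-dependence check: swap the two subjects and see whether the
# predicted answer follows the person or merely the sentence position.
print(probe("John", "Mary", "was a senator")["answer"])
print(probe("Mary", "John", "was a senator")["answer"])

# Question-independence check: negate the question; a model that ignores
# the question will keep picking the same subject.
print(probe("John", "Mary", "was never a senator")["answer"])
```

A bias probe in the spirit of the paper would aggregate such comparisons over many subject pairs and attributes, discounting the score differences attributable to position and question-insensitivity rather than taking raw model scores at face value.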