Measuring and Reducing Gendered Correlations in Pre-trained Models

Pre-trained models have revolutionized natural language understanding. However, researchers have found that they can encode artifacts that are undesirable in many applications, such as professions correlating with one gender more than another. We explore such gendered correlations as a case study of how to address unintended correlations in pre-trained models. We define metrics and reveal that models with similar accuracy can encode correlations at very different rates. We show how measured correlations can be reduced with general-purpose techniques, and we highlight the trade-offs among different strategies. Based on these results, we make recommendations for training robust models: (1) carefully evaluate unintended correlations, (2) be mindful of seemingly innocuous configuration differences, and (3) focus on general mitigations.
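To make the measurement idea concrete, the sketch below probes a masked language model for how strongly professions co-occur with gendered pronouns. This is a minimal illustration under stated assumptions, not the metric defined in the paper: the model name, template, and use of the Hugging Face transformers fill-mask pipeline are choices made for the example.

```python
# Minimal sketch of probing a masked LM for gendered correlations.
# Assumes the Hugging Face `transformers` library is installed; the model,
# template, and profession list are illustrative, not from the paper.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

def pronoun_gap(profession: str) -> float:
    """Return P('he') - P('she') for the masked pronoun slot."""
    results = fill(f"[MASK] is a {profession}.", targets=["he", "she"])
    scores = {r["token_str"].strip(): r["score"] for r in results}
    return scores.get("he", 0.0) - scores.get("she", 0.0)

for job in ("nurse", "engineer", "teacher", "surgeon"):
    print(f"{job:>8}: P(he) - P(she) = {pronoun_gap(job):+.4f}")
```

A gap near zero suggests the model treats the profession as gender-neutral in this template; comparing such gaps across checkpoints with similar task accuracy is one way to observe that accuracy alone does not determine how strongly correlations are encoded.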
