Improving Gender Fairness of Pre-Trained Language Models without Catastrophic Forgetting

Although pre-trained language models such as BERT achieve state-of-the-art performance on many language understanding tasks, they have been shown to inherit strong gender bias from their training data. Existing studies that address the gender bias of pre-trained models typically collect or construct gender-neutral data on their own and run a second phase of pre-training on the released pre-trained model with that data. However, given the limited size of the gender-neutral data and its potential distributional mismatch with the original pre-training data, catastrophic forgetting can occur during second-phase pre-training, and forgetting the original training data can substantially damage the model's downstream performance. In this work, we first show empirically that even when the gender-neutral data for second-phase pre-training is drawn from the original training data, catastrophic forgetting still occurs if the gender-neutral data is smaller than the original training data. We then propose a new method, GEnder Equality Prompt (GEEP), to improve the gender fairness of pre-trained models without forgetting. GEEP freezes all pre-trained parameters and learns gender-related prompts conditioned on the frozen model, so information from the original training data is preserved to the greatest extent. Specifically, GEEP trains new embeddings of profession names as gender equality prompts; because these embeddings are re-initialized before second-phase pre-training starts, the gender bias they carried from the original data is removed, which also makes GEEP more effective at debiasing. Empirical results show that GEEP not only achieves state-of-the-art gender-debiasing performance in applications such as pronoun prediction and coreference resolution, but also matches the original pre-trained models on general downstream tasks such as GLUE, with little forgetting.
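To make the mechanism concrete, below is a minimal, illustrative sketch of what the abstract describes: every pre-trained parameter is frozen, and only newly initialized embeddings for profession names (the gender equality prompts) are trained, swapped in for the frozen word embeddings during second-phase masked-language-model pre-training. The HuggingFace model name, the profession list, and the single-wordpiece assumption are illustrative choices, not the authors' exact setup.

```python
# Minimal sketch (PyTorch + HuggingFace Transformers), assuming a BERT backbone.
# All pre-trained parameters are frozen; only re-initialized embeddings for
# profession names (the "gender equality prompts") receive gradients.
import torch
import torch.nn as nn
from transformers import BertTokenizer, BertForMaskedLM


class GenderEqualityPromptSketch(nn.Module):
    def __init__(self, model_name="bert-base-uncased",
                 professions=("nurse", "doctor", "engineer", "teacher")):
        super().__init__()
        self.tokenizer = BertTokenizer.from_pretrained(model_name)
        self.bert = BertForMaskedLM.from_pretrained(model_name)

        # Freeze every pre-trained parameter to avoid catastrophic forgetting.
        for p in self.bert.parameters():
            p.requires_grad = False

        # Token ids of the profession names (assumed to be single wordpieces).
        ids = [self.tokenizer.convert_tokens_to_ids(w) for w in professions]
        self.register_buffer("profession_ids", torch.tensor(ids))

        # Re-initialized profession embeddings: the gender bias carried by the
        # original profession embeddings is discarded before second-phase
        # pre-training begins.
        hidden = self.bert.config.hidden_size
        self.prompt_embeddings = nn.Embedding(len(ids), hidden)
        nn.init.normal_(self.prompt_embeddings.weight,
                        std=self.bert.config.initializer_range)

    def forward(self, input_ids, attention_mask=None, labels=None):
        # Look up the frozen word embeddings, then overwrite the positions that
        # hold profession tokens with the trainable prompt embeddings.
        inputs_embeds = self.bert.get_input_embeddings()(input_ids)
        for row, tok_id in enumerate(self.profession_ids.tolist()):
            mask = (input_ids == tok_id).unsqueeze(-1)
            inputs_embeds = torch.where(mask,
                                        self.prompt_embeddings.weight[row],
                                        inputs_embeds)
        # Standard masked-language-model loss on gender-neutral text; gradients
        # reach only self.prompt_embeddings because everything else is frozen.
        return self.bert(inputs_embeds=inputs_embeds,
                         attention_mask=attention_mask,
                         labels=labels)
```

During second-phase pre-training, only the prompt embeddings would be passed to the optimizer, e.g. `torch.optim.Adam(model.prompt_embeddings.parameters())`, so the frozen backbone retains its original capabilities while the profession representations are re-learned from gender-neutral text.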
