Sustainable Modular Debiasing of Language Models

Unfair stereotypical biases (e.g., gender, racial, or religious biases) encoded in modern pretrained language models (PLMs) have negative ethical implications for the widespread adoption of state-of-the-art language technology. To remedy this, a wide range of debiasing techniques have recently been introduced to remove such stereotypical biases from PLMs. Existing debiasing methods, however, directly modify all of the PLM's parameters, which, besides being computationally expensive, comes with the inherent risk of (catastrophic) forgetting of useful language knowledge acquired in pretraining. In this work, we propose a more sustainable modular debiasing approach based on dedicated debiasing adapters, dubbed ADELE. Concretely, we (1) inject adapter modules into the original PLM layers and (2) update only the adapters (i.e., we keep the original PLM parameters frozen) via language modeling training on a counterfactually augmented corpus. We showcase ADELE in gender debiasing of BERT: our extensive evaluation, encompassing three intrinsic and two extrinsic bias measures, shows ADELE to be very effective in bias mitigation. We further show that, due to its modular nature, ADELE, coupled with task adapters, retains fairness even after large-scale downstream training. Finally, by means of multilingual BERT, we successfully transfer ADELE to six target languages.
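To make the notion of a counterfactually augmented corpus more concrete, the following is a minimal sketch of one common form of counterfactual data augmentation: each sentence containing a gendered term is paired with a copy in which that term is swapped for its counterpart. The term list `GENDER_PAIRS` and the helper names `counterfactual` and `augment` are hypothetical and chosen for illustration only; the term list and augmentation variant used for ADELE are not specified in this abstract.

```python
import re

# Hypothetical, deliberately tiny list of gendered term pairs (an assumption
# for illustration; not the term list used by the authors). Ambiguous forms
# such as "her" (him/his) are left out to keep the swap one-to-one.
GENDER_PAIRS = [("he", "she"), ("man", "woman"), ("father", "mother"), ("boy", "girl")]

# Bidirectional lookup: every term maps to its counterpart.
SWAP = {}
for left, right in GENDER_PAIRS:
    SWAP[left] = right
    SWAP[right] = left

WORD = re.compile(r"\b\w+\b")


def counterfactual(sentence):
    """Return the sentence with every listed gendered term swapped."""
    def swap(match):
        word = match.group(0)
        target = SWAP.get(word.lower())
        if target is None:
            return word  # not a gendered term: leave unchanged
        # Preserve sentence-initial capitalization.
        return target.capitalize() if word[0].isupper() else target
    return WORD.sub(swap, sentence)


def augment(corpus):
    """Keep each original sentence and add its counterfactual if it differs."""
    augmented = []
    for sentence in corpus:
        augmented.append(sentence)
        swapped = counterfactual(sentence)
        if swapped != sentence:
            augmented.append(swapped)
    return augmented


if __name__ == "__main__":
    for line in augment(["He is a father.", "It rained all day."]):
        print(line)
```

In an adapter-based setup such as the one described above, a corpus like the output of `augment` would serve as the language modeling training data for the adapter parameters, while the PLM weights stay frozen.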
