FairPy: A Toolkit for Evaluation of Social Biases and their Mitigation in Large Language Models

Studies have shown that large pretrained language models exhibit biases against social groups based on race, gender, and other attributes, which they inherit from the datasets they are trained on. Researchers have proposed mathematical tools for quantifying and identifying these biases, as well as methods for mitigating them. In this paper, we present a comprehensive quantitative evaluation of different kinds of biases (e.g., race, gender, ethnicity, and age) exhibited by popular pretrained language models such as BERT and GPT-2. We also present a toolkit that provides plug-and-play interfaces connecting these bias-identification tools to pretrained models, lets users test custom models against the same metrics, and allows both existing and custom models to be debiased using previously proposed debiasing techniques. The toolkit is available at https://github.com/HrishikeshVish/Fairpy.
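
To make the metric interface concrete, the sketch below illustrates one family of "mathematical tools" such a toolkit wraps: comparing the masked-LM pseudo-log-likelihood of a stereotypical sentence against an anti-stereotypical counterpart, the comparison underlying pair-based bias benchmarks such as CrowS-Pairs. This is a minimal illustration, not the actual FairPy API; the model choice, helper function, and example sentences are assumptions made for demonstration.

```python
# Minimal sketch of a pair-based bias metric for masked language models.
# NOT the FairPy API; all names here are illustrative.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def pseudo_log_likelihood(sentence: str) -> float:
    """Sum the log-probability of each token when it alone is masked."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    # Skip the [CLS] (first) and [SEP] (last) special tokens.
    for i in range(1, len(ids) - 1):
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[ids[i]].item()
    return total

# A model that systematically assigns higher likelihood to the
# stereotypical variant is biased under this metric.
stereo = "Women are bad at math."
anti = "Men are bad at math."
print(pseudo_log_likelihood(stereo), pseudo_log_likelihood(anti))
```

Aggregated over a benchmark of such sentence pairs, the fraction of pairs on which the model prefers the stereotypical variant yields a single bias score, which is the kind of plug-and-play metric the toolkit exposes for both built-in and custom models.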
