CHBias: Bias Evaluation and Mitigation of Chinese Conversational Language Models

Warning: This paper contains content that may be offensive or upsetting.

Pretrained conversational agents have been shown to have safety issues, exhibiting a range of stereotypical human biases such as gender bias. However, current research covers only a limited set of bias categories, and most of it focuses on English. In this paper, we introduce a new Chinese dataset, CHBias, for bias evaluation and mitigation of Chinese conversational language models. In addition to well-explored bias categories, CHBias includes under-explored ones, such as ageism and appearance bias, that have received less attention. We evaluate two popular pretrained Chinese conversational models, CDial-GPT and EVA2.0, using CHBias. Furthermore, to mitigate different biases, we apply several debiasing methods to these pretrained models. Experimental results show that the models are at risk of generating texts that contain social biases, and that debiasing methods using the proposed dataset can make response generation less biased while preserving the models’ conversational capabilities.
