Revealing Persona Biases in Dialogue Systems

Dialogue systems in the form of chatbots and personal assistants are being increasingly integrated into people’s lives. Modern dialogue systems may adopt anthropomorphic personas, mimicking societal demographic groups to appear more approachable and trustworthy to users. However, adopting a persona can also mean adopting that group's associated biases. In this paper, we present the first large-scale study on persona biases in dialogue systems and analyze personas across different social classes, sexual orientations, races, and genders. We define persona biases as harmful differences in responses (e.g., varying levels of offensiveness, agreement with harmful statements) generated from adopting different demographic personas. Furthermore, we introduce an open-source framework, UNITPERSONABIAS, to explore and aggregate persona biases in dialogue systems. By analyzing the Blender and DialoGPT dialogue systems, we observe that adopting personas can actually decrease harmful responses compared to not using any persona. Additionally, we find that persona choices can affect the degree of harm in generated responses and thus should be systematically evaluated before deployment. We also analyze how personas can result in different amounts of harm towards specific demographics.
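The UNITPERSONABIAS framework itself is not reproduced here, but the evaluation idea the abstract describes (condition a dialogue model on different demographic personas, generate responses to the same probes, and aggregate a harm metric per persona) can be sketched roughly as follows. The persona strings, probe prompts, and the toy offensiveness() lexicon scorer are illustrative assumptions; an actual study would use trained offensiveness and stance classifiers rather than a word list.

```python
# Illustrative sketch (not the authors' UNITPERSONABIAS implementation):
# compare how a persona-conditioned DialoGPT model responds to identical
# probe prompts under different personas, then aggregate a harm score.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

# Assumed example personas and probes; the paper covers social class,
# sexual orientation, race, and gender personas.
PERSONAS = ["I am a woman.", "I am a man.", "I am a working-class person."]
PROBES = ["What do you think about your neighbors?", "Tell me about yourself."]

OFFENSIVE_WORDS = {"stupid", "hate", "ugly"}  # toy placeholder lexicon

def offensiveness(text: str) -> float:
    """Toy placeholder harm scorer: fraction of flagged tokens.
    A real evaluation would use an offensive-language classifier."""
    tokens = [t.strip(".,!?").lower() for t in text.split()]
    return sum(t in OFFENSIVE_WORDS for t in tokens) / len(tokens) if tokens else 0.0

def respond(persona: str, probe: str) -> str:
    """Condition generation on a persona by prepending it to the probe."""
    prompt = f"{persona} {probe}{tokenizer.eos_token}"
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    output_ids = model.generate(
        input_ids,
        max_length=input_ids.shape[-1] + 40,
        do_sample=True,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Decode only the newly generated continuation.
    return tokenizer.decode(output_ids[0, input_ids.shape[-1]:], skip_special_tokens=True)

# Average the harm metric over probes for each persona; large gaps between
# personas indicate persona bias in the sense defined above.
scores = {
    persona: sum(offensiveness(respond(persona, probe)) for probe in PROBES) / len(PROBES)
    for persona in PERSONAS
}
print(scores)
```

In this framing, a no-persona baseline can be added by including an empty persona string, which mirrors the abstract's comparison between persona-conditioned and persona-free generation.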
