A taxonomy, data set, and benchmark for detecting and classifying malevolent dialogue responses

Conversational interfaces are increasingly popular as a way of connecting people to information. As the generative capacity of corpus-based conversational agents increases, so does the need to classify and filter out malevolent responses that are inappropriate in terms of content and dialogue acts. Previous studies on detecting and classifying inappropriate content mostly focus on a specific category of malevolence or on single sentences rather than entire dialogues. We make three contributions to advance research on the malevolent dialogue response detection and classification (MDRDC) task. First, we define the task and present a hierarchical malevolent dialogue taxonomy. Second, we create a labeled multiturn dialogue data set and formulate the MDRDC task as a hierarchical classification task. Third, we apply state-of-the-art text classification methods to the MDRDC task and report on experiments that assess the performance of these approaches.
