Detecting and Classifying Malevolent Dialogue Responses: Taxonomy, Data and Methodology

Conversational interfaces are increasingly popular as a way of connecting people to information. Corpus-based conversational interfaces are able to generate more diverse and natural responses than template-based or retrieval-based agents. With the increased generative capacity of corpus-based conversational agents comes the need to classify and filter out malevolent responses that are inappropriate in terms of content and dialogue acts. Previous studies on recognizing and classifying inappropriate content mostly focus on a single category of malevolence or on single sentences rather than an entire dialogue. In this paper, we define the task of Malevolent Dialogue Response Detection and Classification (MDRDC). We make three contributions to advance research on this task. First, we present a Hierarchical Malevolent Dialogue Taxonomy (HMDT). Second, we create a labelled multi-turn dialogue dataset and formulate the MDRDC task as a hierarchical classification task over this taxonomy. Third, we apply state-of-the-art text classification methods to the MDRDC task and report on extensive experiments aimed at assessing the performance of these approaches.
