Time to Transfer: Predicting and Evaluating Machine-Human Chatting Handoff

Can a chatbot completely replace a human agent? The short answer is "it depends". In some challenging cases, e.g., when a dialogue's topics extend beyond the coverage of the training corpus, the chatbot may malfunction and return unsatisfactory utterances. This problem can be addressed by introducing Machine-Human Chatting Handoff (MHCH), which enables human-algorithm collaboration. To distinguish normal utterances from transferable ones, we propose a Difficulty-Assisted Matching Inference (DAMI) network, which uses difficulty-assisted encoding to enhance utterance representations. In addition, a matching inference mechanism is introduced to capture contextual matching features. We also propose a new evaluation metric, Golden Transfer within Tolerance (GT-T), which assesses performance while taking the tolerance property of MHCH into account. To provide insights into the task and validate the proposed model, we collect two new datasets. Extensive experimental results are presented and compared against a series of baseline models to demonstrate the efficacy of our model on MHCH.
