Chatbot or Chat-Blocker: Predicting Chatbot Popularity before Deployment

Chatbots are widely employed across many scenarios. However, given the high cost of chatbot development and chatbots' substantial social influence, chatbot failures can lead to significant economic loss. Previous chatbot evaluation frameworks rely heavily on human evaluation, offering little support for automatic, early-stage examination of chatbots prior to deployment. To reduce the risk of such losses, we propose a computational approach that extracts features and trains models to make a priori predictions of a chatbot's popularity, an indicator of its general performance. The extracted features cover chatbot Intent, Conversation Flow, and Response Design. We studied 1050 customer service chatbots on one of the most popular chatbot service platforms. Our model achieves 77.36% prediction accuracy in distinguishing very popular from very unpopular chatbots, taking a first step toward computational feedback before chatbot deployment. Our evaluation also reveals the key design features associated with chatbot popularity and offers guidance for chatbot design.
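The approach the abstract describes, extracting pre-deployment design features and training a classifier to separate popular from unpopular chatbots, can be sketched as follows. This is a minimal illustration on synthetic data, not the paper's actual pipeline: the three feature names are hypothetical stand-ins for the Intent, Conversation Flow, and Response Design categories, and the random forest is one plausible model choice.

```python
# Minimal sketch (assumption: not the paper's actual pipeline) of a priori
# popularity prediction from pre-deployment design features.
import random
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import balanced_accuracy_score

random.seed(0)

def synthetic_chatbot(popular: bool) -> list[float]:
    """One synthetic feature vector. Feature names are illustrative
    stand-ins for the Intent / Conversation Flow / Response Design
    categories; the distributions are invented for the demo."""
    n_intents = random.gauss(30 if popular else 20, 8)      # Intent
    flow_depth = random.gauss(4 if popular else 6, 1.5)     # Conversation Flow
    avg_resp_len = random.gauss(40 if popular else 60, 15)  # Response Design
    return [n_intents, flow_depth, avg_resp_len]

# Synthetic corpus: half "very popular", half "very unpopular" bots.
X = [synthetic_chatbot(i % 2 == 0) for i in range(600)]
y = [i % 2 == 0 for i in range(600)]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Balanced accuracy, so chance performance sits at 0.5 regardless of skew.
acc = balanced_accuracy_score(y_te, clf.predict(X_te))
print(f"balanced accuracy: {acc:.2f}")
```

A side benefit of the random-forest choice is `clf.feature_importances_`, which ranks the design features by predictive weight, mirroring the paper's goal of identifying which design features are associated with popularity.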
