Considerations on Creating Conversational Agents For Multiple Environments and Users

Advances in artificial intelligence algorithms and the spread of easy-to-use cloud-based platforms have enabled both medium and large companies to adopt conversational assistants that facilitate interaction between clients and employees. These interactions take place through ubiquitous devices (e.g., Amazon Echo, Apple HomePod, Google Nest), virtual assistants (e.g., Apple Siri, Google Assistant, Samsung Bixby, or Microsoft Cortana), chat windows on the corporate website, or social network applications (e.g., Facebook Messenger, Telegram, Slack, WeChat).

Creating a useful, personalized conversational agent that is also robust and popular is nonetheless challenging work. It requires picking the right algorithm, framework, and/or communication channel, but perhaps more importantly, it requires consideration of the specific task, user needs, environment, available training data, and budget, together with a thoughtful design.

In this paper, we consider the elements necessary to create a conversational agent for different types of users, environments, and tasks. These elements account for the limited amount of data available for specific tasks within a company and for non-English languages. We are confident that we can provide a useful resource for the new practitioner developing an agent. We point out common novice problems and traps to avoid, show that developing the technology is achievable despite significant and wide-ranging challenges, and raise awareness of the ethical issues that may be associated with this technology. The paper compiles our experience with deploying conversational systems for daily use in multicultural, multilingual, and intergenerational settings. Additionally, we give insight into how to scale the proposed solutions.
