Automated Scoring of Chatbot Responses in Conversational Dialogue

Rapid advances in natural language processing (NLP) and machine learning have led to the recent development of many chatbot systems built on a variety of algorithms. In a conversational dialogue setting, however, building a system that communicates with humans in a meaningful and coherent manner remains challenging. Moreover, even humans find it difficult to evaluate a chatbot's responses given the context of the conversation. In this paper, we focus on the problem of automatically evaluating and scoring the quality of chatbot responses in human-chatbot dialogue. We propose a novel approach that combines the word representations of human and chatbot responses and uses machine learning algorithms, such as support vector machines (SVM), random forests (RF), and neural networks (NN), to learn the quality of the chatbot responses. Our experimental results show that the proposed approach performs well.
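As an illustration of what "combining the word representations of human and chatbot responses" could look like, the sketch below averages the word vectors of each utterance and concatenates the two averages into one feature vector. The abstract does not specify the combination method, so averaging and concatenation are assumptions here, and the tiny `TOY_VECTORS` table, `average_vector`, and `pair_features` are hypothetical names invented for this example.

```python
from typing import Dict, List

# Hypothetical 3-dimensional word vectors, invented for illustration only;
# a real system would load pretrained word embeddings instead.
TOY_VECTORS: Dict[str, List[float]] = {
    "hello": [1.0, 0.0, 0.0],
    "how": [0.0, 1.0, 0.0],
    "are": [0.0, 0.0, 1.0],
    "you": [1.0, 1.0, 0.0],
    "fine": [0.0, 1.0, 1.0],
    "thanks": [1.0, 0.0, 1.0],
}
DIM = 3


def average_vector(text: str) -> List[float]:
    """Average the vectors of known words; unknown words are skipped."""
    known = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not known:
        # No known words: fall back to a zero vector of the right size.
        return [0.0] * DIM
    return [sum(column) / len(known) for column in zip(*known)]


def pair_features(human_utterance: str, chatbot_response: str) -> List[float]:
    """Concatenate the two averaged vectors (an assumed combination scheme)."""
    return average_vector(human_utterance) + average_vector(chatbot_response)


features = pair_features("hello how are you", "fine thanks")
print(features)
```

A feature vector of this kind, paired with a human-assigned quality score, could then be passed to a standard learner such as an SVM, a random forest, or a neural network; which learner and which features the authors actually used is not stated in the abstract.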
