Type of Response Selection utilizing User Utterance Word Sequence, LSTM and Multi-task Learning for Chat-like Spoken Dialog Systems

This paper describes a method for automatically selecting response types, such as back-channel responses, topic changes, or topic expansions, in conversational spoken dialog systems by using an LSTM-RNN-based encoder-decoder framework and multi-task learning. In our dialog system architecture, response utterances are generated after the response type has been explicitly determined, so that the system produces more appropriate and cooperative responses than the conventional end-to-end approach, which generates response utterances directly. The response type selector consists of an encoder and two decoders that share hidden-layer states and are trained with an interpolated loss function over the two decoders. One decoder selects the response type, and the other estimates the word sequence of the response utterance. In an evaluation experiment using a corpus of dialogs between elderly people and an interviewer, the proposed method outperformed the standard method based on single-task learning, especially when the amount of training data was limited.
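The interpolated loss mentioned above can be illustrated with a minimal sketch. It assumes the interpolation is a convex combination of two cross-entropy terms, one for the response type and one averaged over the response word sequence. The abstract does not give the exact form or the weight, so the function names, the `lam` parameter, and its default value are hypothetical.

```python
import math


def cross_entropy(probs, target):
    """Negative log-likelihood of the target index under a probability vector."""
    return -math.log(probs[target])


def interpolated_loss(type_probs, type_target, word_probs_seq, word_targets, lam=0.5):
    """Combine the two decoder losses as lam * L_type + (1 - lam) * L_word.

    type_probs     : predicted distribution over response types
    type_target    : index of the correct response type
    word_probs_seq : one predicted word distribution per output position
    word_targets   : index of the correct word at each position
    lam            : interpolation weight (assumed; not specified in the abstract)
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if not word_targets or len(word_probs_seq) != len(word_targets):
        raise ValueError("word_probs_seq and word_targets must be non-empty and equal in length")

    # Loss of the response-type decoder (a single classification decision).
    type_loss = cross_entropy(type_probs, type_target)

    # Loss of the word-sequence decoder, averaged over output positions.
    word_loss = sum(
        cross_entropy(p, t) for p, t in zip(word_probs_seq, word_targets)
    ) / len(word_targets)

    return lam * type_loss + (1.0 - lam) * word_loss


# Toy example: three response types, a two-word vocabulary, a two-word response.
loss = interpolated_loss(
    type_probs=[0.5, 0.25, 0.25],
    type_target=0,
    word_probs_seq=[[0.5, 0.5], [0.25, 0.75]],
    word_targets=[0, 1],
    lam=0.5,
)
print(loss)
```

Under this assumed form, setting `lam` to 1.0 recovers single-task training of the response type decoder alone, which corresponds to the single-task baseline the abstract compares against.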
