Role Play Dialogue Aware Language Models Based on Conditional Hierarchical Recurrent Encoder-Decoder

We propose role play dialogue-aware language models (RPDA-LMs) that can leverage interactive contexts in role play multi-turn dialogues for estimating the generative probability of words. Our motivation is to improve automatic speech recognition (ASR) performance in role play dialogues such as contact center and service center dialogues. Although long short-term memory recurrent neural network based language models (LSTM-RNN-LMs) can capture long-range contexts within an utterance, they cannot utilize the sequential interactive information between speakers in multi-turn dialogues. Our idea is to explicitly leverage the speaker role of each utterance, which is often available in role play dialogues, for neural language modeling. The RPDA-LMs are represented as a generative model conditioned on the role sequence of a target role play dialogue. We compose the RPDA-LMs by extending hierarchical recurrent encoder-decoder modeling so that it can handle the role information. Our ASR evaluation on a contact center dialogue task demonstrates that RPDA-LMs outperform LSTM-RNN-LMs and document-context LMs in terms of both perplexity and word error rate. In addition, we verify the effectiveness of explicitly taking interactive contexts into consideration.
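The abstract does not specify the exact network configuration, but the following minimal PyTorch sketch illustrates the general idea of a role-conditioned hierarchical recurrent encoder-decoder LM: a word-level encoder summarizes each utterance, a dialogue-level RNN tracks the conversation over (utterance, role) pairs, and a word-level decoder predicts each word conditioned on the preceding context state and the known role of the current utterance. All class names, layer sizes, and the concatenation-based conditioning scheme here are illustrative assumptions, not the authors' specification.

```python
# Minimal sketch of a role-conditioned hierarchical recurrent
# encoder-decoder LM. Names, sizes, and the conditioning scheme
# are illustrative assumptions, not the paper's specification.
import torch
import torch.nn as nn

class RolePlayDialogueLM(nn.Module):
    def __init__(self, vocab_size, n_roles, emb_dim=128, hid_dim=256):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, emb_dim)
        self.role_emb = nn.Embedding(n_roles, emb_dim)
        # Word-level encoder: one utterance -> one fixed-length vector.
        self.utt_encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        # Dialogue-level encoder over (utterance vector, role) pairs.
        self.ctx_encoder = nn.LSTM(hid_dim + emb_dim, hid_dim, batch_first=True)
        # Word-level decoder conditioned on prior context and current role.
        self.decoder = nn.LSTM(emb_dim + hid_dim + emb_dim, hid_dim,
                               batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, utterances, roles):
        # utterances: (B, M, T) word ids for M utterances of length T
        # roles:      (B, M) role id of each utterance (given in role play data)
        B, M, T = utterances.shape
        w = self.word_emb(utterances)                        # (B, M, T, E)
        _, (h, _) = self.utt_encoder(w.view(B * M, T, -1))
        utt_vecs = h[-1].view(B, M, -1)                      # (B, M, H)
        r = self.role_emb(roles)                             # (B, M, E)
        ctx, _ = self.ctx_encoder(torch.cat([utt_vecs, r], dim=-1))
        # Utterance m is decoded from the context state after utterance m-1
        # (zeros for the first utterance) plus its own role embedding.
        prev_ctx = torch.cat([torch.zeros_like(ctx[:, :1]), ctx[:, :-1]], dim=1)
        cond = torch.cat([prev_ctx, r], dim=-1)              # (B, M, H+E)
        dec_in = torch.cat([w, cond.unsqueeze(2).expand(B, M, T, -1)], dim=-1)
        dec_out, _ = self.decoder(dec_in.view(B * M, T, -1))
        return self.out(dec_out).view(B, M, T, -1)           # per-word logits
```

Trained with teacher-forced cross-entropy over the per-word logits, such a model estimates the role-conditioned generative probability of each word given the preceding words, the preceding utterances, and the role sequence, which is the quantity a rescoring pass over ASR hypotheses would use.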
