Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems

© 2015 Association for Computational Linguistics. Natural language generation (NLG) is a critical component of spoken dialogue systems, with a significant impact on both usability and perceived quality. Most NLG systems in common use employ rules and heuristics, and tend to generate rigid and stylised responses without the natural variation of human language. They also do not scale easily to systems covering multiple domains and languages. This paper presents a statistical language generator based on a semantically controlled Long Short-term Memory (LSTM) structure. The LSTM generator can learn from unaligned data by jointly optimising sentence planning and surface realisation using a simple cross-entropy training criterion, and language variation can be easily achieved by sampling from output candidates. Despite using fewer heuristics, the proposed method improved performance over previous methods in an objective evaluation on two differing test domains. Human judges scored the LSTM system higher on informativeness and naturalness, and overall preferred it to the other systems.
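The core idea of semantic conditioning is to augment a standard LSTM cell with a dialogue-act (DA) vector that is gradually consumed by a sigmoid "reading gate" as the sentence is generated, so the network remembers which semantic slots it has already realised. The following is a minimal NumPy sketch of that mechanism; the class name, weight shapes, and initialisation are illustrative assumptions, not the paper's exact parameterisation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SCLSTMCell:
    """Illustrative semantically conditioned LSTM cell (sketch).

    Besides the usual input/forget/output gates, a reading gate r_t
    shrinks a dialogue-act vector d at each step, modelling which
    semantic slots remain to be mentioned.
    """

    def __init__(self, n_in, n_hid, n_da, seed=0):
        rng = np.random.default_rng(seed)
        s = 0.1
        # one weight matrix per standard gate, acting on [input; prev hidden]
        self.W = {g: rng.normal(0, s, (n_hid, n_in + n_hid))
                  for g in ("i", "f", "o", "c")}
        # reading gate: outputs one retention factor per DA slot
        self.W_r = rng.normal(0, s, (n_da, n_in + n_hid))
        # projects the remaining DA vector into the cell update
        self.W_d = rng.normal(0, s, (n_hid, n_da))

    def step(self, x, h_prev, c_prev, d_prev):
        z = np.concatenate([x, h_prev])
        i = sigmoid(self.W["i"] @ z)
        f = sigmoid(self.W["f"] @ z)
        o = sigmoid(self.W["o"] @ z)
        c_hat = np.tanh(self.W["c"] @ z)
        # reading gate decides how much of each DA slot to keep
        r = sigmoid(self.W_r @ z)
        d = r * d_prev  # DA vector decays as slots are realised
        # cell update is conditioned on the remaining semantics
        c = f * c_prev + i * c_hat + np.tanh(self.W_d @ d)
        h = o * np.tanh(c)
        return h, c, d
```

Because each reading-gate value lies in (0, 1), the DA vector shrinks monotonically across steps; training with a cross-entropy word loss (plus, in the paper, a penalty encouraging the DA vector to reach zero by the end of the sentence) pushes the generator to mention every slot exactly once.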
