An empirical assessment of deep learning approaches to task-oriented dialog management

Abstract: Deep learning has produced strong results in areas related to conversational interfaces, such as speech recognition, but its potential benefit for dialog management has not yet been fully studied. In this paper, we assess different configurations for deep-learned dialog management on three dialog corpora from different application domains that vary in size, dimensionality, and the set of possible system responses. Our results identify several factors that can affect accuracy, including the approaches used for feature extraction, input representation, and context consideration, as well as the hyper-parameters of the deep neural networks employed.
