Learning and Evaluation of Dialogue Strategies for New Applications: Empirical Methods for Optimization from Small Data Sets

We present a new data-driven methodology for simulation-based dialogue strategy learning, which allows us to address several problems in the field of automatic optimization of dialogue strategies: learning effective dialogue strategies when no initial data or system exists, and determining a data-driven reward function. In addition, we evaluate the result with real users, and explore how results transfer between simulated and real interactions. We use Reinforcement Learning (RL) to learn multimodal dialogue strategies by interaction with a simulated environment which is “bootstrapped” from small amounts of Wizard-of-Oz (WOZ) data. This use of WOZ data allows data-driven development of optimal strategies for domains where no working prototype is available. Using simulation-based RL allows us to find optimal policies which are not (necessarily) present in the original data. Our results show that simulation-based RL significantly outperforms the average (human wizard) strategy as learned from the data by using Supervised Learning. The bootstrapped RL-based policy gains on average 50 times more reward when tested in simulation, and almost 18 times more reward when interacting with real users. Users also subjectively rate the RL-based policy on average 10% higher. We also show that results from simulated interaction do transfer to interaction with real users, and we explicitly evaluate the stability of the data-driven reward function.
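
As a rough illustration of the pipeline summarized above, the following toy sketch estimates a simulated user from a handful of WOZ-style turns, derives a majority-vote "average wizard" baseline by supervised learning, and then learns a policy by tabular Q-learning inside the simulation. Everything in it is an assumption made for illustration and is not taken from the paper: the two-slot task, the `WOZ_LOG` entries, the reward values, the add-one smoothing, and all function names are hypothetical, and tabular Q-learning stands in for whatever RL algorithm was actually used.

```python
import random
from collections import Counter, defaultdict

N_SLOTS = 2                      # toy task: fill two slots, then present results
ACTIONS = ["ask", "present"]

# Hypothetical WOZ-style log: (slots_filled, wizard_action, user_answered).
# user_answered is only meaningful for "ask" turns.
WOZ_LOG = [
    (0, "ask", True), (0, "ask", True), (0, "ask", False), (0, "ask", True),
    (1, "present", None), (1, "ask", True), (1, "present", None),
    (2, "present", None), (2, "present", None),
]

def supervised_policy(log):
    """Baseline: the most frequent wizard action in each observed state."""
    votes = defaultdict(Counter)
    for state, action, _ in log:
        votes[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in votes.items()}

def answer_probability(log):
    """Simulated user: P(user fills a slot | system asks), add-one smoothed."""
    outcomes = [answered for _, action, answered in log if action == "ask"]
    return (sum(outcomes) + 1) / (len(outcomes) + 2)

def step(state, action, p_answer, rng):
    """One simulated turn. Returns (next_state, reward, done)."""
    if action == "present":
        # Assumed reward: +20 for presenting with all slots filled,
        # -10 otherwise, minus a turn cost of 1.
        bonus = 20.0 if state == N_SLOTS else -10.0
        return state, bonus - 1.0, True
    if state < N_SLOTS and rng.random() < p_answer:
        state += 1
    return state, -1.0, False

def q_learning(p_answer, episodes=5000, alpha=0.1, epsilon=0.2, seed=0):
    """Undiscounted tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_SLOTS + 1) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(20):      # cap dialogue length at 20 turns
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward, done = step(state, action, p_answer, rng)
            future = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + future - q[(state, action)])
            state = nxt
            if done:
                break
    # Greedy policy read off the learned Q-table.
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_SLOTS + 1)}

if __name__ == "__main__":
    p = answer_probability(WOZ_LOG)
    print("supervised baseline:", supervised_policy(WOZ_LOG))
    print("RL policy:          ", q_learning(p))
```

In this toy setting the hypothetical wizards mostly present results with only one slot filled, so the supervised baseline copies that habit, whereas the policy learned in simulation can settle on a behaviour that is rare in the log. The sketch says nothing about the size of the gains reported above; those come from the paper's own experiments.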
