Evolutionary reinforcement learning of spoken dialogue strategies

From a system developer’s perspective, designing a spoken dialogue system can be a time-consuming and difficult process. A developer may spend considerable time anticipating how a potential user might interact with the system and then deciding on the most appropriate system response. These decisions are encoded in a dialogue strategy, essentially a mapping between anticipated user inputs and appropriate system outputs. To reduce the time and effort associated with developing a dialogue strategy, recent work has concentrated on modelling the development of a dialogue strategy as a sequential decision problem. Using this model, reinforcement learning algorithms have been employed to generate dialogue strategies automatically. These algorithms learn strategies by interacting with simulated users. Some progress has been made with this method, but a number of important challenges remain. For instance, relatively little success has been achieved with the large state representations that are typical of real-life systems. Another crucial issue is the time and effort associated with the creation of simulated users. In this thesis, I propose an alternative to existing reinforcement learning methods of dialogue strategy development. More specifically, I explore how XCS, an evolutionary reinforcement learning algorithm, can be used to find dialogue strategies that cover large state spaces. Furthermore, I suggest that hand-coded simulated users are sufficient for the learning of useful dialogue strategies. I argue that the use of evolutionary reinforcement learning and hand-coded simulated users is an effective approach to the rapid development of spoken dialogue strategies. Finally, I substantiate this claim by evaluating a learned strategy with real users. Both the learned strategy and a state-of-the-art hand-coded strategy were integrated into an end-to-end spoken dialogue system. The dialogue system allowed real users to make flight enquiries using a live database for an Edinburgh-based airline. The performance of the learned and hand-coded strategies was compared. The evaluation results show that the learned strategy performs as well as the hand-coded one (81% and 77% task completion respectively) but takes much less time to design (two days instead of two weeks). Moreover, the learned strategy compares favourably with previous user evaluations of learned strategies.
