Document-editing Assistants and Model-based Reinforcement Learning as a Path to Conversational AI

Intelligent assistants that follow commands or answer simple questions, such as Siri and Google Search, are among the most economically important applications of AI. Future conversational AI assistants promise even greater capabilities and a better user experience through a deeper understanding of the domain, the user, and the user's purposes. But what domain and what methods are best suited to researching and realizing this promise? In this article we argue for the domain of voice document editing and for the methods of model-based reinforcement learning. The primary advantages of voice document editing are that the domain is tightly scoped and that it provides something for the conversation to be about (the document) that is delimited and fully accessible to the intelligent assistant. The advantages of reinforcement learning in general are that its methods are designed to learn from interaction without explicit instruction and that it formalizes the purposes of the assistant. Model-based reinforcement learning is needed in order to genuinely understand the domain of discourse and thereby work efficiently with the user to achieve their goals. Together, voice document editing and model-based reinforcement learning constitute a promising research direction for achieving conversational AI.
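To make the model-based/model-free distinction concrete, below is a minimal sketch in the spirit of Dyna-style model-based reinforcement learning: the agent updates its value estimates from real interaction, and additionally learns a transition model that it uses for simulated planning updates. This is an illustrative sketch only, not the paper's implementation; the environment object `env`, its `step(state, action) -> (reward, next_state)` interface, the toy editing actions, and all hyperparameter values are assumptions introduced for the example.

```python
import random
from collections import defaultdict

# Minimal Dyna-Q sketch (illustrative assumptions throughout):
# the agent learns action values from real experience (direct RL)
# and also learns a model of the environment, which it replays
# for extra planning updates (the "model-based" part).

ALPHA, GAMMA, EPSILON, PLANNING_STEPS = 0.1, 0.95, 0.1, 10

Q = defaultdict(float)   # Q[(state, action)] -> estimated return
model = {}               # model[(state, action)] -> (reward, next_state)
actions = ["insert", "delete", "replace", "confirm"]  # toy editing commands

def choose_action(state):
    """Epsilon-greedy action selection over the learned values."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def dyna_q_step(env, state):
    """One real step of experience, followed by simulated planning steps."""
    action = choose_action(state)
    reward, next_state = env.step(state, action)  # real interaction

    # Direct RL update from the real transition.
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

    # Model learning: remember what this action did in this state.
    model[(state, action)] = (reward, next_state)

    # Planning: replay transitions sampled from the learned model.
    for _ in range(PLANNING_STEPS):
        (s, a), (r, s2) = random.choice(list(model.items()))
        best = max(Q[(s2, b)] for b in actions)
        Q[(s, a)] += ALPHA * (r + GAMMA * best - Q[(s, a)])

    return next_state
```

In the voice document-editing setting argued for above, the learned model would correspond to the assistant's understanding of how edit commands transform the document, and the reward would formalize the assistant's purpose of achieving the user's goals; the planning loop is what lets the assistant anticipate the consequences of an edit before performing it.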
