Stochastic Curiosity Exploration for Dialogue Systems

Traditionally, a task-oriented dialogue system is built around an autonomous agent trained by reinforcement learning to maximize the reward received from the environment. The agent updates its policy when the goal state is observed. In the real world, however, the extrinsic reward is usually sparse or missing, which bounds training efficiency and degrades system performance. Tackling sample efficiency in such sparse-reward scenarios is challenging for spoken dialogue. Accordingly, a dialogue agent needs additional information to update its policy even during periods when no reward is available from the environment. This paper presents a new dialogue agent learned by incorporating an intrinsic reward based on an information-theoretic approach via stochastic curiosity exploration. The agent encourages exploration toward future diversity through a latent dynamic architecture consisting of an encoder network, a curiosity network, an information network, and a policy network. Latent states and actions are sampled to predict the stochastic transition into the future. Curiosity learning is implemented with an intrinsic reward measured by the mutual information and the prediction error of the predicted states and actions. Experiments on dialogue management using PyDial demonstrate the benefit of stochastic curiosity exploration.
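To make the intrinsic-reward idea concrete, below is a minimal sketch of how an encoder, a curiosity (forward) network, and an information (statistics) network could be combined so that the curiosity bonus mixes a latent prediction error with a MINE-style mutual-information estimate. This is an illustrative assumption of one possible PyTorch implementation, not the paper's actual code: the class name StochasticCuriosity, the network sizes, and the weighting factor beta are all hypothetical.

```python
# Hypothetical sketch of a stochastic-curiosity intrinsic reward:
# latent forward-model prediction error plus a Donsker-Varadhan
# (MINE-style) mutual-information estimate. Names and sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class StochasticCuriosity(nn.Module):
    def __init__(self, obs_dim, act_dim, latent_dim=32, beta=0.5):
        super().__init__()
        self.beta = beta  # trade-off between prediction error and MI terms (assumed)
        # Encoder network: observation -> latent state
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                     nn.Linear(64, latent_dim))
        # Curiosity (forward) network: (latent state, action) -> predicted next latent
        self.forward_model = nn.Sequential(nn.Linear(latent_dim + act_dim, 64), nn.ReLU(),
                                           nn.Linear(64, latent_dim))
        # Information (statistics) network for a Donsker-Varadhan MI lower bound
        self.stat_net = nn.Sequential(nn.Linear(2 * latent_dim, 64), nn.ReLU(),
                                      nn.Linear(64, 1))

    def mi_lower_bound(self, z, z_next):
        """MINE-style lower bound on I(z; z_next) estimated over a batch."""
        joint = self.stat_net(torch.cat([z, z_next], dim=-1)).mean()
        perm = torch.randperm(z_next.size(0))  # shuffle to approximate the marginal
        marginal = self.stat_net(torch.cat([z, z_next[perm]], dim=-1))
        return joint - torch.log(torch.exp(marginal).mean() + 1e-8)

    def intrinsic_reward(self, obs, action, next_obs):
        z, z_next = self.encoder(obs), self.encoder(next_obs)
        z_pred = self.forward_model(torch.cat([z, action], dim=-1))
        # Per-sample prediction error in latent space
        pred_error = F.mse_loss(z_pred, z_next.detach(), reduction='none').mean(dim=-1)
        # Batch-level mutual-information estimate
        mi = self.mi_lower_bound(z, z_next)
        return pred_error + self.beta * mi  # curiosity bonus per transition
```

In such a setup, the dialogue policy would be trained on the environment reward plus a scaled intrinsic bonus, e.g. r = r_env + lambda * intrinsic_reward(obs, action, next_obs), so the agent keeps receiving a learning signal even when the extrinsic reward is absent.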
