Implicit Generative Modeling for Efficient Exploration

Efficient exploration remains a challenging problem in reinforcement learning, especially in tasks where environment rewards are sparse. A common approach to exploring such environments is to introduce an "intrinsic" reward. In this work, we focus on model-uncertainty estimation as an intrinsic reward for efficient exploration. In particular, we introduce an implicit generative modeling approach to estimate the Bayesian uncertainty of the agent's belief about the environment dynamics. Each random draw from our generative model is a neural network that instantiates the dynamics function, so multiple draws approximate the posterior, and the variance of future predictions under this posterior is used as an intrinsic reward for exploration. We design a training algorithm for our generative model based on amortized Stein Variational Gradient Descent. In experiments, we compare our implementation with state-of-the-art intrinsic-reward-based exploration approaches, including two recent approaches based on an ensemble of dynamics models. On challenging exploration tasks, our implicit generative model consistently outperforms the competing approaches in terms of data efficiency.
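To make the intrinsic reward concrete, the sketch below illustrates the idea under stated assumptions: an implicit generator (a hypernetwork) maps Gaussian noise to the weights of a small dynamics network, several dynamics networks are drawn per transition, and the variance of their next-state predictions is used as the exploration bonus. This is not the authors' implementation; the network sizes, the single-hidden-layer dynamics architecture, and names such as `Generator` and `intrinsic_reward` are assumptions for illustration.

```python
# Minimal sketch (assumed architecture, not the paper's code): intrinsic reward
# computed as the variance of next-state predictions across dynamics networks
# sampled from an implicit generative model.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, ACTION_DIM, HIDDEN = 4, 2, 64   # assumed toy dimensions
NOISE_DIM = 32

# Total parameter count of the dynamics net f(s, a) -> s' instantiated per draw.
N_PARAMS = (STATE_DIM + ACTION_DIM) * HIDDEN + HIDDEN + HIDDEN * STATE_DIM + STATE_DIM


class Generator(nn.Module):
    """Implicit generative model: z ~ N(0, I) -> flattened dynamics-net weights."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NOISE_DIM, 256), nn.ReLU(),
            nn.Linear(256, N_PARAMS),
        )

    def forward(self, z):
        return self.net(z)


def dynamics_forward(flat_params, s, a):
    """Run one sampled dynamics network whose weights are unpacked from flat_params."""
    i = 0
    w1 = flat_params[i:i + (STATE_DIM + ACTION_DIM) * HIDDEN].view(HIDDEN, STATE_DIM + ACTION_DIM)
    i += (STATE_DIM + ACTION_DIM) * HIDDEN
    b1 = flat_params[i:i + HIDDEN]; i += HIDDEN
    w2 = flat_params[i:i + HIDDEN * STATE_DIM].view(STATE_DIM, HIDDEN)
    i += HIDDEN * STATE_DIM
    b2 = flat_params[i:i + STATE_DIM]
    h = torch.relu(F.linear(torch.cat([s, a], dim=-1), w1, b1))
    return F.linear(h, w2, b2)


def intrinsic_reward(generator, s, a, n_draws=8):
    """Variance of predicted next states across posterior draws, one bonus per transition."""
    z = torch.randn(n_draws, NOISE_DIM)
    params = generator(z)                                               # (n_draws, N_PARAMS)
    preds = torch.stack([dynamics_forward(p, s, a) for p in params])    # (n_draws, B, STATE_DIM)
    return preds.var(dim=0).mean(dim=-1)                                # (B,)


if __name__ == "__main__":
    gen = Generator()
    s = torch.randn(16, STATE_DIM)    # batch of states
    a = torch.randn(16, ACTION_DIM)   # batch of actions
    print(intrinsic_reward(gen, s, a).shape)  # torch.Size([16])
```

The sketch omits the training step: in the paper's setting the generator is trained with amortized Stein Variational Gradient Descent, which moves the sampled parameter particles along the kernelized direction phi(theta_i) = (1/n) * sum_j [ k(theta_j, theta_i) * grad log p(theta_j | D) + grad_{theta_j} k(theta_j, theta_i) ], so that draws from the generator approximate the posterior over dynamics models.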
