Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning

The mutual information is a core statistical quantity that has applications in all areas of machine learning, whether this is in training of density models over multiple data modalities, in maximising the efficiency of noisy transmission channels, or when learning behaviour policies for exploration by artificial agents. Most learning algorithms that involve optimisation of the mutual information rely on the Blahut-Arimoto algorithm --- an enumerative algorithm with exponential complexity that is not suitable for modern machine learning applications. This paper provides a new approach for scalable optimisation of the mutual information by merging techniques from variational inference and deep learning. We develop our approach by focusing on the problem of intrinsically-motivated learning, where the mutual information forms the definition of a well-known internal drive known as empowerment. Using a variational lower bound on the mutual information, combined with convolutional networks for handling visual input streams, we develop a stochastic optimisation algorithm that allows for scalable information maximisation and empowerment-based reasoning directly from pixels to actions.

[1]  Thomas M. Cover,et al.  Elements of Information Theory , 1991 .

[2]  C. Cruz,et al.  Improving the Mean Field Approximation via the Use of Mixture Distributions , 1998 .

[3]  Nicolas Brunel,et al.  Mutual Information, Fisher Information, and Population Coding , 1998, Neural Computation.

[4]  Yoshua Bengio,et al.  Convolutional networks for images, speech, and time series , 1998 .

[5]  Andrew Y. Ng,et al.  Policy Invariance Under Reward Transformations: Theory and Application to Reward Shaping , 1999, ICML.

[6]  Naftali Tishby,et al.  The information bottleneck method , 2000, ArXiv.

[7]  Alexander J. Smola,et al.  The kernel mutual information , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[8]  David Barber,et al.  The IM algorithm: a variational approach to Information Maximization , 2003, NIPS 2003.

[9]  Nuttapong Chentanez,et al.  Intrinsically Motivated Reinforcement Learning , 2004, NIPS.

[10]  Jonathan D. Nelson Finding useful questions: on Bayesian diagnosticity, probability, impact, and information gain. , 2005, Psychological review.

[11]  Chrystopher L. Nehaniv,et al.  Empowerment: a universal agent-centric measure of control , 2005, 2005 IEEE Congress on Evolutionary Computation.

[12]  Pierre-Yves Oudeyer,et al.  How can we define intrinsic motivation , 2008 .

[13]  R. Yeung The Blahut-Arimoto Algorithms , 2008 .

[14]  Pierre-Yves Oudeyer,et al.  How can we define intrinsic motivation , 2008 .

[15]  Pierre Baldi,et al.  Bayesian surprise attracts human attention , 2005, Vision Research.

[16]  Jürgen Schmidhuber,et al.  Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990–2010) , 2010, IEEE Transactions on Autonomous Mental Development.

[17]  Yoram Singer,et al.  Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..

[18]  Doina Precup,et al.  An information-theoretic approach to curiosity-driven reinforcement learning , 2012, Theory in Biosciences.

[19]  Eduardo F. Morales,et al.  An Introduction to Reinforcement Learning , 2011 .

[20]  Peter Stone,et al.  Empowerment for continuous agent—environment systems , 2011, Adapt. Behav..

[21]  Naftali Tishby,et al.  Trading Value and Information in MDPs , 2012 .

[22]  Joachim M. Buhmann,et al.  Information Theoretic Model Selection for Pattern Analysis , 2011, ICML Unsupervised and Transfer Learning.

[23]  Yuval Tassa,et al.  MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[24]  Friedrich T. Sommer,et al.  Learning and exploration in action-perception loops , 2013, Front. Neural Circuits.

[25]  Christoph Salge,et al.  Empowerment - an Introduction , 2013, ArXiv.

[26]  A D Wissner-Gross,et al.  Causal entropic forces. , 2013, Physical review letters.

[27]  Diederik P. Kingma,et al.  Stochastic Gradient VB and the Variational Auto-Encoder , 2013 .

[28]  Jürgen Schmidhuber,et al.  Evolving deep unsupervised convolutional networks for vision-based reinforcement learning , 2014, GECCO.

[29]  Christoph Salge,et al.  Changing the Environment Based on Empowerment as Intrinsic Motivation , 2014, Entropy.

[30]  Daan Wierstra,et al.  Stochastic Back-propagation and Variational Inference in Deep Latent Gaussian Models , 2014, ArXiv.

[31]  Aram Galstyan,et al.  Efficient Estimation of Mutual Information for Strongly Dependent Variables , 2014, AISTATS.

[32]  Shane Legg,et al.  Human-level control through deep reinforcement learning , 2015, Nature.