INFORMATION PRIORITIZATION THROUGH EMPOWERMENT IN VISUAL MODEL-BASED RL

Model-based reinforcement learning algorithms designed for handling complex visual observations typically learn some sort of latent state representation, either explicitly or implicitly. Standard methods of this sort do not distinguish between functionally relevant aspects of the state and irrelevant distractors, instead aiming to represent all available information equally. We propose a modified objective for model-based RL that, in combination with mutual information maximization, allows us to learn representations and dynamics for visual model-based RL without reconstruction in a way that explicitly prioritizes functionally relevant factors. The key principle behind our design is to integrate a term inspired by variational empowerment into a state-space model based on mutual information. This term prioritizes information that is correlated with action, thus ensuring that functionally relevant factors are captured first. Furthermore, the same empowerment term also promotes faster exploration during the RL process, especially for sparse-reward tasks where the reward signal is insufficient to drive exploration in the early stages of learning. We evaluate the approach on a suite of vision-based robot control tasks with natural video backgrounds, and show that the proposed prioritized information objective outperforms state-of-the-art model-based RL approaches with higher sample efficiency and episodic returns.

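To make the prioritization mechanism concrete, the following is a minimal sketch of the kind of variational empowerment term described above, in the style of Mohamed & Rezende (2015); the trade-off weight $\beta$ and the learned inverse model $q_\phi$ are illustrative assumptions here, not necessarily the paper's exact formulation. The representation objective augments a latent-observation mutual information term with an action-latent term,

$$\max_{\theta}\; I_\theta(z_{t+1};\, o_{t+1}) \;+\; \beta\, I_\theta(a_t;\, z_{t+1} \mid z_t),$$

where the empowerment-style second term admits the standard variational lower bound

$$I(a_t;\, z_{t+1} \mid z_t) \;\ge\; \mathbb{E}_{\pi}\!\left[\log q_\phi(a_t \mid z_t, z_{t+1}) - \log \pi(a_t \mid z_t)\right].$$

Maximizing this bound trains the latent state $z$ to retain the information needed to recover the agent's own actions, so action-correlated (functionally relevant) factors of the observation are captured before action-independent distractors such as background video.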