COBRA: Data-Efficient Model-Based RL through Unsupervised Object Discovery and Curiosity-Driven Exploration

Data efficiency and robustness to task-irrelevant perturbations are long-standing challenges for deep reinforcement learning algorithms. Here we introduce a modular approach to addressing these challenges in a continuous control environment, without using hand-crafted or supervised information. Our Curious Object-Based seaRch Agent (COBRA) uses task-free intrinsically motivated exploration and unsupervised learning to build object-based models of its environment and action space. Subsequently, it can learn a variety of tasks through model-based search in very few steps and excel on structured hold-out tests of policy robustness.
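To make the described pipeline concrete, below is a minimal sketch of the three pieces the abstract names: an unsupervised object-based scene encoder, a learned transition model trained with a curiosity (prediction-error) signal during task-free exploration, and model-based search at task time. This is an illustrative toy, not the authors' implementation; every function name, shape, and the one-step search procedure here are assumptions standing in for the learned neural components.

```python
# Minimal sketch of a COBRA-style pipeline (illustrative assumptions only).
import numpy as np

rng = np.random.default_rng(0)

def encode_scene(observation):
    # Stand-in for an unsupervised object-based encoder: maps raw input to a
    # set of per-object latent vectors (here: pretend K objects x 4 dims).
    return observation.reshape(-1, 4)

def transition_model(state, action):
    # Stand-in for a learned dynamics model over the object-based latents.
    return state + 0.1 * action

def curiosity_reward(predicted_next, actual_next):
    # Intrinsic reward: transition-model prediction error, which drives the
    # task-free exploration phase that collects training data for the models.
    return float(np.mean((predicted_next - actual_next) ** 2))

def reward_predictor(state):
    # Stand-in for a reward model fit from a handful of task-reward samples.
    return -float(np.sum(state ** 2))

def model_based_search(state, candidate_actions):
    # One-step search: roll each candidate action through the learned
    # transition model and pick the action with the best predicted reward.
    scores = [reward_predictor(transition_model(state, a))
              for a in candidate_actions]
    return candidate_actions[int(np.argmax(scores))]

# Task phase: with the models above already trained during exploration,
# a new task needs only a few environment steps of search.
state = encode_scene(rng.normal(size=(16,)))
candidates = [rng.normal(size=state.shape) for _ in range(8)]
best_action = model_based_search(state, candidates)
print("chosen action norm:", np.linalg.norm(best_action))
```

In the actual agent the encoder, transition model, and exploration policy are neural networks trained without task rewards; the sketch only mirrors how those pieces would fit together at decision time.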
