MELD: Meta-Reinforcement Learning from Images via Latent State Models

Meta-reinforcement learning algorithms can enable autonomous agents, such as robots, to quickly acquire new behaviors by leveraging prior experience in a set of related training tasks. However, the onerous data requirements of meta-training, compounded with the difficulty of learning from sensory inputs such as images, have made meta-RL challenging to apply to real robotic systems. Latent state models, which learn compact state representations from a sequence of observations, can accelerate representation learning from visual inputs. In this paper, we leverage the perspective of meta-learning as task inference to show that latent state models can also perform meta-learning given an appropriately defined observation space. Building on this insight, we develop meta-RL with latent dynamics (MELD), an algorithm for meta-RL from images that performs inference in a latent state model to quickly acquire new skills given observations and rewards. MELD outperforms prior meta-RL methods on several simulated image-based robotic control problems, and it enables a real WidowX robotic arm to insert an Ethernet cable into new locations given a sparse task-completion signal, after only 8 hours of real-world meta-training. To our knowledge, MELD is the first meta-RL algorithm trained in a real-world robotic control setting from images.
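The central insight above is that a latent state model already performs posterior inference over hidden state from observations; if the reward (which depends on the unknown task) is folded into the observation, the same filter also infers the task. The sketch below illustrates this idea with hypothetical module and parameter names not taken from the paper's code: a recurrent latent model whose per-step input is the image features concatenated with the scalar reward and previous action, so the filtered latent carries both state and task information for the policy.

```python
# Minimal sketch (assumed names, not the authors' implementation):
# a recurrent latent state model over reward-augmented observations.
import torch
import torch.nn as nn

class LatentStateModel(nn.Module):
    def __init__(self, obs_dim, act_dim, latent_dim=32):
        super().__init__()
        # Input is observation features + 1 reward channel + previous action.
        self.rnn = nn.GRUCell(obs_dim + 1 + act_dim, latent_dim)
        self.post = nn.Linear(latent_dim, 2 * latent_dim)  # posterior mean, log-std

    def step(self, h, obs, reward, prev_action):
        # Augmented observation: because reward depends on the unknown task,
        # filtering it forces the latent state to encode task information.
        x = torch.cat([obs, reward, prev_action], dim=-1)
        h = self.rnn(x, h)
        mean, log_std = self.post(h).chunk(2, dim=-1)
        z = mean + log_std.exp() * torch.randn_like(mean)  # reparameterized sample
        return h, z

# Usage: the policy conditions only on the filtered latent z, so state
# estimation and task inference happen in a single inference pass.
model = LatentStateModel(obs_dim=64, act_dim=4)
h = torch.zeros(1, 32)
obs, reward, act = torch.randn(1, 64), torch.zeros(1, 1), torch.zeros(1, 4)
h, z = model.step(h, obs, reward, act)
```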
