Zero-Shot Skill Composition and Simulation-to-Real Transfer by Learning Task Representations

Simulation-to-real transfer is an important strategy for making reinforcement learning practical on real robots. Even successful sim-to-real systems have difficulty producing policies which generalize across tasks, despite training for the equivalent of thousands of hours of real robot time. To address this shortcoming, we present a novel approach for efficiently learning new robotic skills directly on a real robot, based on model-predictive control (MPC) and an algorithm for learning task representations. In short, we show how to reuse the simulation from the pre-training step of sim-to-real methods as a tool for foresight, allowing the sim-to-real policy to adapt to unseen tasks. Rather than learning end-to-end policies for single tasks and attempting to transfer them, we first use simulation to simultaneously learn (1) a continuous parameterization (i.e., a skill embedding or latent) of task-appropriate primitive skills, and (2) a single policy for these skills conditioned on this representation. We then directly transfer our multi-skill policy to a real robot, and control the robot by choosing sequences of skill latents which condition the policy, with each latent corresponding to a pre-learned primitive skill controller. We complete unseen tasks by using MPC to choose new sequences of skill latents, where our MPC model is the pre-trained skill policy executed in the simulation environment, running in parallel with the real robot. We discuss the background and principles of our method, detail its practical implementation, and evaluate its performance by using it to train a real Sawyer robot to achieve motion tasks such as drawing and block pushing.
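The control scheme the abstract describes can be summarized in a short sketch: at each planning step, candidate skill latents are rolled out in the simulator (synchronized to the real robot's state), scored against a task cost, and the best latent is executed on hardware through the pre-trained policy. The Python below is a minimal illustration under assumed interfaces; the simulator `sim`, policy `policy(obs, z)`, latent sampler `sample_latents`, task cost `task_cost`, and robot handle `robot` are all hypothetical names, not the paper's actual API.

```python
# Minimal sketch of MPC over pre-learned skill latents, using a
# simulator as the foresight model. All object names are illustrative.
import numpy as np

def mpc_choose_latent(real_state, sim, policy, task_cost,
                      sample_latents, n_candidates=64, horizon=20):
    """Pick the skill latent whose simulated rollout minimizes task cost."""
    best_z, best_cost = None, np.inf
    for z in sample_latents(n_candidates):
        sim.set_state(real_state)          # sync simulator to the real robot
        obs, traj = sim.observe(), []
        for _ in range(horizon):           # roll the skill out in simulation
            action = policy(obs, z)
            obs = sim.step(action)
            traj.append(obs)
        cost = task_cost(traj)             # score the simulated trajectory
        if cost < best_cost:
            best_z, best_cost = z, cost
    return best_z

def run_task(robot, sim, policy, task_cost, sample_latents, skill_steps=20):
    """Alternate between planning in simulation and acting on hardware."""
    while not robot.task_done():
        z = mpc_choose_latent(robot.get_state(), sim, policy,
                              task_cost, sample_latents)
        obs = robot.observe()
        for _ in range(skill_steps):       # execute the chosen skill for real
            obs = robot.step(policy(obs, z))
```

Because planning happens entirely in the simulator while only the selected latent is executed on the robot, unseen tasks are handled without any further policy training; only the task cost changes between tasks.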
