Simulator Predictive Control: Using Learned Task Representations and MPC for Zero-Shot Generalization and Sequencing

Simulation-to-real transfer is an important strategy for making reinforcement learning practical with real robots. However, successful sim-to-real transfer systems have difficulty producing policies which generalize across tasks, despite training for the equivalent of thousands of hours of real robot time. To address this challenge, we present a novel approach for efficiently performing new robotic tasks directly on a real robot, based on model-predictive control (MPC) and learned task representations. Rather than learning end-to-end policies for single tasks in simulation and attempting to transfer them, we use simulation to learn (1) an embedding function encoding a latent representation of task components (skills), and (2) a single latent-conditioned policy for all tasks, and we transfer the frozen policy directly to the real robot. We then use MPC to perform new tasks without any exploration in the real environment, by choosing latent skill vectors to feed to the frozen policy, thereby controlling the real system in the skill latent space. Our MPC model is the frozen latent-conditioned policy itself, executed in the simulation environment running in parallel with the real robot. In short, we show how to reuse the simulation from the pre-training step of sim-to-real methods as a tool for foresight, allowing the sim-to-real policy to adapt to unseen tasks. We discuss the background and principles of our method, detail its practical implementation, and evaluate its performance by using it to perform motion tasks such as drawing and block pushing on a real Sawyer robot.
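To make the described control loop concrete, the sketch below shows one way MPC over the learned skill latent space could look in code: sample candidate latents, roll out the frozen latent-conditioned policy in the parallel simulator under each candidate, score the rollouts with a task cost, and execute the best latent on the real robot. The function names (`simulate`, `task_cost`, `policy`), the random-shooting optimizer, and all hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Minimal sketch of MPC in the learned skill latent space, assuming:
#   policy(state, z)      -> action from the frozen latent-conditioned policy
#   simulate(state, z, H) -> H-step simulated rollout of the frozen policy under latent z
#   task_cost(trajectory) -> cost of a simulated trajectory for the target task

def mpc_choose_latent(real_state, simulate, task_cost,
                      latent_dim=8, num_candidates=64, horizon=20, rng=None):
    """Pick the skill latent whose simulated rollout best performs the new task."""
    rng = rng or np.random.default_rng()
    candidates = rng.standard_normal((num_candidates, latent_dim))  # sample z ~ p(z)
    costs = [task_cost(simulate(real_state, z, horizon)) for z in candidates]
    return candidates[int(np.argmin(costs))]

def run_episode(env, policy, simulate, task_cost, steps=200, replan_every=10):
    """Closed-loop control: replan the latent in simulation, act on the real robot."""
    state = env.reset()
    for t in range(steps):
        if t % replan_every == 0:
            z = mpc_choose_latent(state, simulate, task_cost)
        state, _, done, _ = env.step(policy(state, z))  # frozen policy, chosen latent
        if done:
            break
```

In this sketch the planner searches only the low-dimensional latent space rather than raw action sequences, which is what allows the policy itself to remain frozen and to be reused unchanged on the real robot.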
