Paper Information - A Data-Efficient Framework for Training and Sim-to-Real Transfer of Navigation Policies

Learning effective visuomotor policies for robots purely from data is challenging, but also appealing, since a learning-based system should not require manual tuning or calibration. For a robot operating in a real environment, the training process can be costly, time-consuming, and even dangerous, since failures are common at the start of training. For this reason, it is desirable to leverage simulation and off-policy data to the extent possible to train the robot. In this work, we introduce a robust framework that plans in simulation and transfers well to the real environment. Our model incorporates a gradient-descent-based planning module which, given the initial image and goal image, encodes the images to a lower-dimensional latent state and plans a trajectory to reach the goal. The model, consisting of the encoder and planner modules, is first trained through a meta-learning strategy in simulation. We subsequently perform adversarial domain transfer on the encoder, using a bank of unlabelled random images from the simulation and real environments, so that the encoder maps images from both environments to a similarly distributed latent representation. By fine-tuning the entire model (encoder + planner) with only a few real-world expert demonstrations, we show successful planning performance in different navigation tasks.
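
The idea of planning by gradient descent in a latent space can be illustrated with a toy sketch. This is not the paper's implementation: the additive dynamics `z_{t+1} = z_t + a_t`, the function name `plan_actions`, and all parameter values are assumptions chosen for illustration, standing in for a learned encoder and forward model.

```python
def plan_actions(z_start, z_goal, horizon=5, steps=200, lr=0.05):
    """Toy gradient-descent planner over a sequence of latent-space actions.

    Assumes hypothetical additive dynamics z_{t+1} = z_t + a_t in place of
    a learned forward model; z_start and z_goal stand in for encoded images.
    """
    dim = len(z_start)
    actions = [[0.0] * dim for _ in range(horizon)]
    for _ in range(steps):
        # Roll out the toy dynamics to get the final latent state.
        z_final = [z_start[d] + sum(a[d] for a in actions) for d in range(dim)]
        # Gradient of ||z_final - z_goal||^2 with respect to every action a_t.
        grad = [2.0 * (z_final[d] - z_goal[d]) for d in range(dim)]
        for a in actions:
            for d in range(dim):
                a[d] -= lr * grad[d]
    return actions


z_start = [0.0, 0.0]
z_goal = [1.0, -2.0]
actions = plan_actions(z_start, z_goal)
z_reached = [z_start[d] + sum(a[d] for a in actions) for d in range(2)]
```

In this sketch the planned actions are the only quantities being optimized; in a learned system the rollout would instead pass through trained networks, and the gradient would be obtained by automatic differentiation.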