Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience

We consider the problem of transferring policies to the real world by training on a distribution of simulated scenarios. Rather than manually tuning the randomization of simulations, we adapt the simulation parameter distribution using a few real world roll-outs interleaved with policy training. In doing so, we are able to change the distribution of simulations to improve the policy transfer by matching the policy behavior in simulation and the real world. We show that policies trained with our method are able to reliably transfer to different robots in two real world tasks: swing-peg-in-hole and opening a cabinet drawer. The video of our experiments can be found at https://sites.google.com/view/simopt.

[1]  Lennart Ljung,et al.  System Identification: Theory for the User , 1987 .

[2]  Inman Harvey,et al.  Noise and the Reality Gap: The Use of Simulation in Evolutionary Robotics , 1995, ECAL.

[3]  E. Bai,et al.  Block Oriented Nonlinear System Identification , 2010 .

[4]  Yasemin Altun,et al.  Relative Entropy Policy Search , 2010 .

[5]  Stéphane Doncieux,et al.  Crossing the reality gap in evolutionary robotics by promoting transferable controllers , 2010, GECCO '10.

[6]  Carl E. Rasmussen,et al.  Learning to Control a Low-Cost Manipulator using Data-Efficient Reinforcement Learning , 2011, Robotics: Science and Systems.

[7]  Carl E. Rasmussen,et al.  PILCO: A Model-Based and Data-Efficient Approach to Policy Search , 2011, ICML.

[8]  Jan Peters,et al.  A Survey on Policy Search for Robotics , 2013, Found. Trends Robotics.

[9]  Patrick MacAlpine,et al.  Humanoid robots learning to walk faster: from the real world to simulation and back , 2013, AAMAS.

[10]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[11]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[12]  Dieter Fox,et al.  DART: Dense Articulated Real-Time Tracking , 2014, Robotics: Science and Systems.

[13]  Emanuel Todorov,et al.  Ensemble-CIO: Full-body dynamic motion planning that transfers to physical humanoids , 2015, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[14]  Trevor Darrell,et al.  Simultaneous Deep Transfer Across Domains and Tasks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[15]  Sergey Levine,et al.  Towards Adapting Deep Visuomotor Representations from Simulated to Real Environments , 2015, ArXiv.

[16]  Emanuel Todorov,et al.  Physically consistent state estimation and system identification for contacts , 2015, 2015 IEEE-RAS 15th International Conference on Humanoid Robots (Humanoids).

[17]  Wojciech Zaremba,et al.  Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model , 2016, ArXiv.

[18]  Stefano Ermon,et al.  Generative Adversarial Imitation Learning , 2016, NIPS.

[19]  Byron Boots,et al.  Simulation-based design of dynamic controllers for humanoid balancing , 2016, 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[20]  Abhinav Gupta,et al.  Supersizing self-supervision: Learning to grasp from 50K tries and 700 robot hours , 2015, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[21]  Sergey Levine,et al.  Collective robot reinforcement learning with distributed asynchronous guided policy search , 2016, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[22]  Razvan Pascanu,et al.  Sim-to-Real Robot Learning from Pixels with Progressive Nets , 2016, CoRL.

[23]  Wojciech Zaremba,et al.  Domain randomization for transferring deep neural networks from simulation to the real world , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[24]  Greg Turk,et al.  Preparing for the Unknown: Learning a Universal Policy with Online System Identification , 2017, Robotics: Science and Systems.

[25]  Andrew J. Davison,et al.  Transferring End-to-End Visuomotor Control from Simulation to Real World for a Multi-Stage Task , 2017, CoRL.

[26]  Balaraman Ravindran,et al.  EPOpt: Learning Robust Neural Network Policies Using Model Ensembles , 2016, ICLR.

[27]  Gaurav S. Sukhatme,et al.  Multi-Modal Imitation Learning from Unstructured Demonstrations using Generative Adversarial Nets , 2017, NIPS.

[28]  Pieter Abbeel,et al.  Mutual Alignment Transfer Learning , 2017, CoRL.

[29]  Danica Kragic,et al.  Reinforcement Learning for Pivoting Task , 2017, ArXiv.

[30]  Alec Radford,et al.  Proximal Policy Optimization Algorithms , 2017, ArXiv.

[31]  Sergey Levine,et al.  (CAD)$^2$RL: Real Single-Image Flight without a Single Real Image , 2016, Robotics: Science and Systems.

[32]  Marcin Andrychowicz,et al.  Sim-to-Real Transfer of Robotic Control with Dynamics Randomization , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[33]  Sergey Levine,et al.  Time-Contrastive Networks: Self-Supervised Learning from Video , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[34]  Sergey Levine,et al.  Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection , 2016, Int. J. Robotics Res..

[35]  Atil Iscen,et al.  Sim-to-Real: Learning Agile Locomotion For Quadruped Robots , 2018, Robotics: Science and Systems.

[36]  Jan Peters,et al.  Domain Randomization for Simulation-Based Policy Optimization with Transferability Assessment , 2018, CoRL.

[37]  Sergey Levine,et al.  Using Simulation and Domain Adaptation to Improve Efficiency of Deep Robotic Grasping , 2018, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[38]  Dieter Fox,et al.  GPU-Accelerated Robotic Simulation for Distributed Reinforcement Learning , 2018, CoRL.

[39]  Emanuel Todorov,et al.  Reinforcement learning for non-prehensile manipulation: Transfer from simulation to physical system , 2018, 2018 IEEE International Conference on Simulation, Modeling, and Programming for Autonomous Robots (SIMPAR).

[40]  Kostas E. Bekris,et al.  Fast Model Identification via Physics Engines for Data-Efficient Policy Search , 2017, IJCAI.

[41]  Marcin Andrychowicz,et al.  Asymmetric Actor Critic for Image-Based Robot Learning , 2017, Robotics: Science and Systems.

[42]  Sergey Levine,et al.  QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation , 2018, CoRL.

[43]  Peter Stone,et al.  Stochastic Grounded Action Transformation for Robot Learning in Simulation , 2017, 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[44]  Jakub W. Pachocki,et al.  Learning dexterous in-hand manipulation , 2018, Int. J. Robotics Res..

[45]  Danna Zhou,et al.  d. , 1840, Microbial pathogenesis.