SimGAN: Hybrid Simulator Identification for Domain Adaptation via Adversarial Reinforcement Learning

As learning-based approaches progress towards automating robot controller design, transferring learned policies to new domains with different dynamics (e.g. sim-to-real transfer) still demands manual effort. This paper introduces SimGAN, a framework that tackles domain adaptation by identifying a hybrid physics simulator whose simulated trajectories match those from the target domain. It uses a learned discriminative loss to address the limitations of manual loss design. The hybrid simulator combines neural networks with traditional physics simulation to balance expressiveness and generalizability, and it alleviates the need for a carefully selected parameter set in System ID. Once the hybrid simulator has been identified via adversarial reinforcement learning, it can be used to refine policies for the target domain without collecting more data. We show that our approach outperforms multiple strong baselines on six robotic locomotion tasks for domain adaptation.
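To make the structure described above concrete, here is a minimal sketch of one possible reading of it, not the authors' implementation. It assumes a toy 1-D point mass in place of a real physics engine, a small network that outputs the physics parameters (mass and friction) from the current state and action, and a logistic discriminator whose output is turned into a GAIL-style reward for the simulator. All names (`physics_step`, `HybridSimulator`, `discriminator`, `simulator_reward`) and all constants are hypothetical.

```python
import numpy as np


def physics_step(state, action, params, dt=0.05):
    """Toy analytic physics (assumption): 1-D point mass with viscous friction."""
    pos, vel = state
    mass, friction = params
    acc = (action - friction * vel) / mass
    vel_next = vel + dt * acc
    pos_next = pos + dt * vel_next
    return np.array([pos_next, vel_next])


class HybridSimulator:
    """Physics step whose parameters come from a small network (hypothetical)."""

    def __init__(self, rng, hidden=8):
        # Input is [pos, vel, action]; output is [mass, friction] before squashing.
        self.w1 = rng.normal(scale=0.1, size=(3, hidden))
        self.w2 = rng.normal(scale=0.1, size=(hidden, 2))

    def params(self, state, action):
        x = np.concatenate([state, [action]])
        raw = np.tanh(x @ self.w1) @ self.w2
        # Softplus plus a small offset keeps both parameters strictly positive.
        return np.log1p(np.exp(raw)) + 0.1

    def step(self, state, action):
        return physics_step(state, action, self.params(state, action))


def discriminator(transition, w):
    """Logistic score: estimated probability the transition is from the target domain."""
    return 1.0 / (1.0 + np.exp(-(transition @ w)))


def simulator_reward(state, action, next_state, w):
    """GAIL-style reward (assumption): higher when the discriminator is fooled."""
    transition = np.concatenate([state, [action], next_state])
    return np.log(discriminator(transition, w) + 1e-8)


rng = np.random.default_rng(0)
sim = HybridSimulator(rng)
state, action = np.array([0.0, 1.0]), 0.5
next_state = sim.step(state, action)
reward = simulator_reward(state, action, next_state, np.zeros(5))
```

In a full adversarial loop, the discriminator weights would be trained to separate simulated transitions from target-domain ones, while the parameter network would be updated with a reinforcement learning algorithm to maximize the reward above. Both training steps are omitted here.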
