Policy Transfer via Kinematic Domain Randomization and Adaptation

Transferring reinforcement learning policies trained in physics simulation to the real hardware remains a challenge, known as the "sim-to-real" gap. Domain randomization is a simple yet effective technique to address dynamics discrepancies across source and target domains, but its success generally depends on heuristics and trial-and-error. In this work we investigate the impact of randomized parameter selection on policy transferability across different types of domain discrepancies. Contrary to common practice in which kinematic parameters are carefully measured while dynamic parameters are randomized, we found that virtually randomizing kinematic parameters (e.g., link lengths) during training in simulation generally outperforms dynamic randomization. Based on this finding, we introduce a new domain adaptation algorithm that utilizes simulated kinematic parameters variation. Our algorithm, Multi-Policy Bayesian Optimization, trains an ensemble of universal policies conditioned on virtual kinematic parameters and efficiently adapts to the target environment using a limited number of target domain rollouts. We showcase our findings on a simulated quadruped robot in five different target environments covering different aspects of domain discrepancies.

[1]  Yevgen Chebotar,et al.  Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience , 2018, 2019 International Conference on Robotics and Automation (ICRA).

[2]  Sehoon Ha,et al.  Learning Fast Adaptation With Meta Strategy Optimization , 2020, IEEE Robotics and Automation Letters.

[3]  Alan Fern,et al.  Learning Memory-Based Control for Human-Scale Bipedal Locomotion , 2020, Robotics: Science and Systems.

[4]  Abhinav Gupta,et al.  Robust Adversarial Reinforcement Learning , 2017, ICML.

[5]  C. Karen Liu,et al.  Sim-to-Real Transfer for Biped Locomotion , 2019, 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[6]  Wojciech Zaremba,et al.  Domain randomization for transferring deep neural networks from simulation to the real world , 2017, 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[7]  Richard S. Sutton,et al.  Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.

[8]  Marcin Andrychowicz,et al.  Solving Rubik's Cube with a Robot Hand , 2019, ArXiv.

[9]  Carl E. Rasmussen,et al.  Gaussian processes for machine learning , 2005, Adaptive computation and machine learning.

[10]  Alec Radford,et al.  Proximal Policy Optimization Algorithms , 2017, ArXiv.

[11]  C. Karen Liu,et al.  Policy Transfer with Strategy Optimization , 2018, ICLR.

[12]  Joonho Lee,et al.  Learning agile and dynamic motor skills for legged robots , 2019, Science Robotics.

[13]  C. Karen Liu,et al.  Learning symmetric and low-energy locomotion , 2018, ACM Trans. Graph..

[14]  Jonas Buchli,et al.  Why off-the-shelf physics simulators fail in evaluating feedback controller performance - a case study for quadrupedal robots , 2016 .

[15]  Sergey Levine,et al.  DeepMimic , 2018, ACM Trans. Graph..

[16]  Marcin Andrychowicz,et al.  Sim-to-Real Transfer of Robotic Control with Dynamics Randomization , 2017, 2018 IEEE International Conference on Robotics and Automation (ICRA).

[17]  J. Mockus Bayesian Approach to Global Optimization: Theory and Applications , 1989 .

[18]  Jie Tan,et al.  Learning Agile Robotic Locomotion Skills by Imitating Animals , 2020, RSS 2020.

[19]  Atil Iscen,et al.  Sim-to-Real: Learning Agile Locomotion For Quadruped Robots , 2018, Robotics: Science and Systems.

[20]  Tao Chen,et al.  Hardware Conditioned Policies for Multi-Robot Transfer Learning , 2018, NeurIPS.

[21]  Peter Stone,et al.  Stochastic Grounded Action Transformation for Robot Learning in Simulation , 2017, 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[22]  Chelsea Finn,et al.  Rapidly Adaptable Legged Robots via Evolutionary Meta-Learning , 2020, 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).