Policy Transfer via Kinematic Domain Randomization and Adaptation

Transferring reinforcement learning policies trained in physics simulation to real hardware remains a challenge, known as the "sim-to-real" gap. Domain randomization is a simple yet effective technique for addressing dynamics discrepancies between source and target domains, but its success generally depends on heuristics and trial-and-error. In this work, we investigate the impact of randomized parameter selection on policy transferability across different types of domain discrepancies. Contrary to the common practice of carefully measuring kinematic parameters while randomizing dynamic ones, we find that virtually randomizing kinematic parameters (e.g., link lengths) during training in simulation generally outperforms dynamics randomization. Based on this finding, we introduce a new domain adaptation algorithm that exploits simulated kinematic parameter variation. Our algorithm, Multi-Policy Bayesian Optimization, trains an ensemble of universal policies conditioned on virtual kinematic parameters and efficiently adapts to the target environment using a limited number of target-domain rollouts. We showcase our findings on a simulated quadruped robot in five target environments that cover different aspects of domain discrepancies.
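To make the two ideas above concrete, below is a minimal Python sketch, not the paper's implementation: during training, each episode resamples virtual kinematic parameters that the policy is conditioned on, and at adaptation time a Bayesian optimization loop (here a simple GP-UCB in plain NumPy) searches over those parameters using a small budget of target-domain rollouts. All names and values here are hypothetical (sample_kinematics, rollout_return, the ±20% randomization range, the toy objective), and the sketch adapts the kinematic input of a single conditioned policy rather than the paper's ensemble of policies.

```python
# Sketch of (1) kinematic domain randomization and (2) Bayesian-optimization
# adaptation over the virtual kinematic input of a trained universal policy.
import numpy as np

rng = np.random.default_rng(0)

# --- (1) Kinematic domain randomization during training -------------------
KIN_DIM = 4                      # e.g., four virtual link-length scales
KIN_LOW, KIN_HIGH = 0.8, 1.2     # assumed +/-20% randomization range

def sample_kinematics():
    """Draw virtual kinematic parameters (e.g., link-length scales)."""
    return rng.uniform(KIN_LOW, KIN_HIGH, size=KIN_DIM)

# During training, each episode would resample kinematics and condition the
# policy on them: action = policy(observation, kin_params).

# --- (2) Bayesian optimization over kin_params at adaptation time ---------
def rbf_kernel(A, B, length=0.2):
    """RBF kernel between row vectors of A (n,d) and B (m,d)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)

def gp_posterior(X, y, Xq, noise=1e-3):
    """Standard GP regression posterior mean/std at query points Xq."""
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(Xq, X)
    mu = Ks @ np.linalg.solve(K, y)
    var = 1.0 - np.einsum('ij,ji->i', Ks, np.linalg.solve(K, Ks.T))
    return mu, np.sqrt(np.maximum(var, 1e-12))

def rollout_return(kin_params):
    """Placeholder for one rollout of the conditioned policy in the target
    environment; returns the episode return. Toy quadratic stand-in."""
    optimum = np.array([1.1, 0.9, 1.05, 1.0])      # unknown in practice
    return -np.sum((kin_params - optimum) ** 2)

def adapt(n_rollouts=20, n_init=5, beta=2.0):
    """GP-UCB over virtual kinematics with a small rollout budget."""
    X = rng.uniform(KIN_LOW, KIN_HIGH, size=(n_init, KIN_DIM))
    y = np.array([rollout_return(x) for x in X])
    for _ in range(n_rollouts - n_init):
        cand = rng.uniform(KIN_LOW, KIN_HIGH, size=(256, KIN_DIM))
        mu, sd = gp_posterior(X, (y - y.mean()) / (y.std() + 1e-8), cand)
        x_next = cand[np.argmax(mu + beta * sd)]    # UCB acquisition
        X = np.vstack([X, x_next])
        y = np.append(y, rollout_return(x_next))
    return X[np.argmax(y)], y.max()

best_kin, best_ret = adapt()
print("best virtual kinematics:", np.round(best_kin, 3), "return:", best_ret)
```

The key design point the sketch illustrates is that the surrogate models episode return as a function of the low-dimensional virtual kinematic input only, so each adaptation step costs a single rollout on the target system rather than any further policy training.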
