Policy Transfer with Strategy Optimization

Computer simulation provides an automatic and safe way to train robotic control policies for complex tasks such as locomotion. However, a policy trained in simulation usually does not transfer directly to the real hardware due to the differences between the two environments. Transfer learning using domain randomization is a promising approach, but it usually assumes that the target environment is close to the distribution of the training environments, and thus relies heavily on accurate system identification. In this paper, we present a different approach that leverages domain randomization for transferring control policies to unknown environments. The key idea is that, instead of learning a single policy in simulation, we simultaneously learn a family of policies that exhibit different behaviors. When tested in the target environment, we directly search for the best policy in the family based on task performance, without the need to identify the dynamic parameters. We evaluate our method on five simulated robotic control problems with different discrepancies between the training and testing environments and demonstrate that our method can overcome larger modeling errors than training a robust policy or an adaptive policy.
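
To make the search step concrete, below is a minimal sketch of the strategy-optimization idea in Python. It assumes a pre-trained policy conditioned on a latent strategy vector mu (action = policy(obs, mu)) and a gym-style target environment env; both handles are hypothetical, and the simple evolution strategy here stands in for the CMA-ES optimizer the paper builds on, so this is an illustration of the technique rather than the authors' implementation.

```python
import numpy as np

# Sketch only. Assumptions (not from the paper's code): `policy` is a
# pre-trained policy conditioned on a latent vector mu, i.e.
# action = policy(obs, mu), trained with domain randomization over mu;
# `env` is a gym-style handle to the target environment.

def rollout_return(env, policy, mu, max_steps=1000):
    """Episodic return of the mu-conditioned policy in the target env."""
    obs, total = env.reset(), 0.0
    for _ in range(max_steps):
        obs, reward, done, _ = env.step(policy(obs, mu))
        total += reward
        if done:
            break
    return total

def search_strategy(env, policy, mu_dim, iters=20, pop=8, sigma=0.2):
    """Search over the policy family for the best strategy vector.

    A basic (mu, lambda)-style evolution strategy; the paper uses
    CMA-ES, for which this is a simplified stand-in.
    """
    mean = np.zeros(mu_dim)
    for _ in range(iters):
        # Sample a population of candidate strategy vectors around the mean.
        candidates = mean + sigma * np.random.randn(pop, mu_dim)
        returns = np.array([rollout_return(env, policy, mu) for mu in candidates])
        # Keep the top quarter by episodic return and recenter on them.
        elite = candidates[np.argsort(returns)[-pop // 4:]]
        mean = elite.mean(axis=0)
    return mean  # best-performing member of the policy family

```

Because the search treats the target environment as a black box and observes only episodic return, no dynamic parameters need to be identified, which is the property the abstract emphasizes.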
