Protective Policy Transfer

Being able to transfer existing skills to new situations is a key capability when training robots to operate in unpredictable real-world environments. A successful transfer algorithm should not only minimize the number of samples the robot needs to collect in the new environment, but also prevent the robot from damaging itself or its surroundings during the transfer process. In this work, we introduce a policy transfer algorithm for adapting robot motor skills to novel scenarios while minimizing serious failures. Our algorithm trains two control policies in the training environment: a task policy that is optimized to complete the task of interest, and a protective policy that is dedicated to keeping the robot away from unsafe events (e.g., falling to the ground). To decide which policy to use during execution, we learn a safety estimator in the training environment that estimates a continuous safety level of the robot. Combined with a set of thresholds, the safety estimator acts as a classifier for switching between the protective policy and the task policy. We evaluate our approach on four simulated robot locomotion problems and a 2D navigation problem, and show that our method can achieve successful transfer to notably different environments while taking the robot's safety into consideration.
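
The switching rule described above can be illustrated with a minimal sketch. This is not the authors' implementation; `task_policy`, `protective_policy`, and `safety_estimator` are hypothetical stand-ins (the first two map an observation to an action, the third maps an observation to a scalar safety level, with higher meaning safer), and the threshold value is illustrative only.

```python
# Minimal sketch of threshold-based switching between a task policy and a
# protective policy, as described in the abstract. All callables are
# hypothetical placeholders, not the paper's actual models.

def select_action(obs, task_policy, protective_policy, safety_estimator,
                  threshold=0.5):
    """Use the task policy while the estimated safety stays above the
    threshold; otherwise fall back to the protective policy."""
    safety_level = safety_estimator(obs)
    if safety_level >= threshold:
        return task_policy(obs)
    return protective_policy(obs)


def run_episode(env, task_policy, protective_policy, safety_estimator,
                threshold=0.5, max_steps=1000):
    """Roll out one episode in a Gym-style environment using the switching rule."""
    obs = env.reset()
    for _ in range(max_steps):
        action = select_action(obs, task_policy, protective_policy,
                               safety_estimator, threshold)
        obs, reward, done, info = env.step(action)
        if done:
            break
```

In practice, the threshold trades off conservativeness against task progress: a higher threshold hands control to the protective policy earlier, which reduces failures but may interrupt the task more often.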
