Reinforcement Learning with Randomized Physical Parameters for Fault-Tolerant Robots

In reinforcement learning, the policy is usually learned in a simulation environment for cost and safety reasons and then applied to the real world. However, the learned policy often cannot adapt, because real-world disturbances and robot failures create gaps between the two environments. Narrowing these gaps requires policies that can adapt to a variety of conditions. In this study, we propose a reinforcement learning method for acquiring a policy that is robust against robot failures. In the proposed method, a failure is represented by adjusting the physical parameters of the robot, and the policy is trained under a variety of faults by randomizing these physical parameters during learning. In our experiments, we demonstrate that a robot trained with the proposed method achieves higher average rewards than a conventionally trained robot on a quadruped walking task in a simulation environment, both with and without robot failures.
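As a rough illustration of the idea, the randomization can be implemented as an environment wrapper that resamples per-joint actuator strengths at every episode reset, occasionally driving a joint's strength near zero to mimic a failure. The sketch below is hypothetical (the wrapper, the `fault_prob` parameter, and the dummy environment are our own illustrative names, not the paper's code) and stands in for the MuJoCo-based setup used in the study:

```python
import numpy as np

class FaultRandomizationWrapper:
    """Hypothetical sketch: represent actuator faults by randomizing
    per-joint strength scales at each episode reset, so the policy is
    trained across many simulated failure modes."""

    def __init__(self, env, n_joints, fault_prob=0.2, rng=None):
        self.env = env
        self.n_joints = n_joints
        self.fault_prob = fault_prob  # chance that a joint "fails" per episode
        self.rng = rng or np.random.default_rng(0)
        self.scales = np.ones(n_joints)

    def reset(self):
        # Nominal strengths are perturbed uniformly in [0.8, 1.2];
        # with probability fault_prob a joint fails (scale near 0).
        self.scales = self.rng.uniform(0.8, 1.2, self.n_joints)
        failed = self.rng.random(self.n_joints) < self.fault_prob
        self.scales[failed] = self.rng.uniform(0.0, 0.1, failed.sum())
        return self.env.reset()

    def step(self, action):
        # Weakened or failed joints transmit only a fraction of the command.
        return self.env.step(np.asarray(action) * self.scales)

class DummyEnv:
    """Trivial stand-in environment that echoes the applied torques."""
    def reset(self):
        return np.zeros(3)
    def step(self, action):
        return action

env = FaultRandomizationWrapper(DummyEnv(), n_joints=3, fault_prob=1.0)
env.reset()
torques = env.step(np.ones(3))  # every joint failed, so torques are near zero
```

Because the policy only ever sees the resulting dynamics, the same training loop (e.g. PPO) can be used unchanged; the robustness comes entirely from the distribution of physical parameters seen during learning.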
