Learning a unified control policy for safe falling

Being able to fall safely is a necessary motor skill for humanoids performing highly dynamic tasks, such as running and jumping. We propose a new method to learn a policy that minimizes the maximal impulse during a fall. The optimization jointly solves a discrete contact-planning problem and a continuous optimal-control problem. Once trained, the policy computes the optimal next contacting body part (e.g. left foot, right foot, or hands), the contact location and timing, and the required joint actuation. We represent the policy as a mixture of actor-critic neural networks consisting of n control policies and their corresponding value functions, where each actor-critic pair is associated with one of the n possible contacting body parts. During execution, the actor whose critic reports the highest value is executed, and its associated body part makes the next contact with the ground. With this mixture of actor-critic architecture, the discrete contact-sequence planning is solved by selecting the best critic, while the continuous control problem is solved by optimizing the actors. We show that our policy achieves comparable, and sometimes higher, rewards than a recursive search of the action space using dynamic programming, while running 50 to 400 times faster during online execution.
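To make the execution scheme concrete, the sketch below illustrates how a mixture of actor-critic pairs could be queried at run time: each critic scores the current fall state, the highest-scoring pair determines the next contacting body part, and that pair's actor produces the continuous action. This is a minimal, hypothetical illustration; the class names, body-part list, and placeholder linear networks are assumptions for exposition and are not the paper's implementation.

```python
import numpy as np

# Illustrative list of candidate contacting body parts (assumed, not from the paper).
CONTACT_PARTS = ["left_foot", "right_foot", "left_hand", "right_hand", "knees"]


class Actor:
    """Maps a fall state to a continuous action (e.g. contact location, timing, joint actuation)."""

    def __init__(self, state_dim, action_dim, rng):
        # Placeholder linear policy; a trained network would be used in practice.
        self.W = rng.standard_normal((action_dim, state_dim)) * 0.01

    def act(self, state):
        return np.tanh(self.W @ state)


class Critic:
    """Estimates the value of a state, e.g. the negative expected maximal impulse."""

    def __init__(self, state_dim, rng):
        # Placeholder linear value function for the sketch.
        self.w = rng.standard_normal(state_dim) * 0.01

    def value(self, state):
        return float(self.w @ state)


def select_and_act(actors, critics, state):
    """Discrete choice via critics, continuous action via the selected actor."""
    values = [c.value(state) for c in critics]
    best = int(np.argmax(values))          # body part whose critic predicts the highest value
    return CONTACT_PARTS[best], actors[best].act(state)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    state_dim, action_dim = 24, 8          # illustrative dimensions
    actors = [Actor(state_dim, action_dim, rng) for _ in CONTACT_PARTS]
    critics = [Critic(state_dim, rng) for _ in CONTACT_PARTS]

    state = rng.standard_normal(state_dim)  # stand-in for the robot's current fall state
    part, action = select_and_act(actors, critics, state)
    print("next contact:", part, "action:", action)
```

In this scheme the argmax over critics plays the role of the discrete contact planner, so no separate search over contact sequences is needed at execution time.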
