Learning Whole-Body Motor Skills for Humanoids

This paper presents a hierarchical Deep Reinforcement Learning framework that acquires motor skills for a variety of push-recovery and balancing behaviors, i.e., ankle, hip, foot-tilting, and stepping strategies. The policy is trained in a physics simulator with a realistic robot model and low-level impedance control, which makes the learned skills easy to transfer to real robots. The advantage over traditional methods is the integration of high-level planning and feedback control in one single coherent policy network, which is generic enough to learn versatile balancing and recovery motions against unknown perturbations applied at arbitrary locations (e.g., the legs or torso). Furthermore, the proposed framework allows the policy to be learned quickly by many state-of-the-art learning algorithms. Compared to studies of preprogrammed, special-purpose controllers in the literature, the self-learned skills achieve comparable disturbance rejection while additionally producing a wide range of adaptive, versatile, and robust behaviors.
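The low-level impedance control mentioned above can be sketched as a joint-space PD law that converts the policy's desired joint positions into torques. The gains, joint count, and function name below are illustrative assumptions, not values from the paper:

```python
import numpy as np

def impedance_torques(q, dq, q_des, dq_des, kp, kd):
    """Joint-space impedance (PD) law: tau = kp*(q_des - q) + kd*(dq_des - dq)."""
    q, dq, q_des, dq_des = map(np.asarray, (q, dq, q_des, dq_des))
    return kp * (q_des - q) + kd * (dq_des - dq)

# Illustrative two-joint example: the policy outputs desired positions at a
# low rate; the impedance controller tracks them with torques at a high rate.
q = np.array([0.1, -0.2])      # current joint positions (rad)
dq = np.array([0.0, 0.0])      # current joint velocities (rad/s)
q_des = np.array([0.0, 0.0])   # desired positions from the policy
tau = impedance_torques(q, dq, q_des, np.zeros(2), kp=100.0, kd=5.0)
print(tau)  # → [-10.  20.]
```

Because the policy commands setpoints rather than raw torques, the same compliant low-level loop can run on both the simulated and the real robot, which is what eases sim-to-real transfer of the learned skills.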
