Emergence of human-comparable balancing behaviours by deep reinforcement learning

This paper presents a hierarchical framework based on deep reinforcement learning that naturally acquires control policies capable of performing balancing behaviours, such as ankle push-offs, for humanoid robots without explicit human design of the controllers. Only the reward used to train the neural network is specifically formulated, and because it is based on physical principles and quantities, it is explainable. The successful emergence of human-comparable behaviours through deep reinforcement learning demonstrates the feasibility of using an AI-based approach for humanoid motion control in a unified framework. Moreover, the balance strategies learned by reinforcement learning provide a larger range of disturbance rejection than zero moment point based methods, suggesting a research direction of using learning-based control to explore optimal performance.
