Implementation of Deep Deterministic Policy Gradients for Controlling Dynamic Bipedal Walking

A control system for simulated two-dimensional bipedal walking was developed. The biped model was built from anthropometric data. At the core of the controller is a Deep Deterministic Policy Gradients (DDPG) neural network, trained in the Gazebo physics simulator, that predicts the foot placement needed to maintain stable walking under external impulse loads. Additional controllers for hip joint movement during the stance phase and for ankle joint torque during toe-off help stabilize the robot while walking. The simulated robot walks at a steady pace of approximately 1 m/s and, during locomotion, remains stable under a 30 N·s impulse applied at the torso. This work applies the DDPG algorithm to the biped walking control problem; the complexity of the DDPG network is reduced through carefully selected state variables and a distributed control system, as sketched below.
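As a rough illustration of the learning component described above, the following is a minimal DDPG actor-critic sketch in PyTorch. The state and action dimensions, layer sizes, and hyperparameters here are assumptions for illustration only, not the authors' configuration; in the paper, the actor maps a small set of carefully selected state variables to a target foot placement, while the hip and ankle controllers operate outside the network.

```python
import torch
import torch.nn as nn

STATE_DIM = 6    # assumed: e.g. torso pitch/rate, hip angles, CoM velocity
ACTION_DIM = 1   # assumed: target foot placement offset (normalized)

class Actor(nn.Module):
    """Deterministic policy: maps the reduced state to a foot placement."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, ACTION_DIM), nn.Tanh(),  # bounded action
        )

    def forward(self, s):
        return self.net(s)

class Critic(nn.Module):
    """Q(s, a): scores a state-action pair."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

def soft_update(target, source, tau=0.005):
    """Polyak averaging of target-network parameters (standard in DDPG)."""
    for t, p in zip(target.parameters(), source.parameters()):
        t.data.mul_(1.0 - tau).add_(tau * p.data)

def ddpg_update(actor, critic, actor_t, critic_t,
                actor_opt, critic_opt, batch, gamma=0.99):
    """One DDPG update over a replay minibatch (s, a, r, s2, done)."""
    s, a, r, s2, done = batch

    # Critic: regress Q(s, a) toward the bootstrapped target.
    with torch.no_grad():
        q_target = r + gamma * (1 - done) * critic_t(s2, actor_t(s2))
    critic_loss = nn.functional.mse_loss(critic(s, a), q_target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: ascend the critic's estimate (deterministic policy gradient).
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    soft_update(actor_t, actor)
    soft_update(critic_t, critic)
```

In training, each transition would come from stepping the Gazebo simulation; exploration noise added to the actor's output and the replay buffer are omitted here for brevity.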
