Implementation of Deep Deterministic Policy Gradients for Controlling Dynamic Bipedal Walking

A control system for bipedal walking in the sagittal plane was developed in simulation. The biped model was based on anthropometric data for a 1.8 m tall male of average build. At the core of the controller is a neural network trained with the deep deterministic policy gradient (DDPG) algorithm in Gazebo, a physics simulator, to predict foot placements that maintain stable walking despite external disturbances. The complexity of the DDPG network was reduced through carefully selected state variables and a distributed control system: additional controllers for the hip joints during their stance phases and for the ankle joint during the toe-off phase help stabilize the biped during walking. The simulated biped walks at a steady pace of approximately 1 m/s and, during locomotion, remains stable under a 30 kg·m/s impulse applied forward on the torso or a 40 kg·m/s impulse applied rearward. It also maintains stable walking while carrying a 10 kg backpack or a 25 kg front pack. Although the controller was trained on a 1.8 m tall model, it also stabilizes models from 1.4 m to 2.3 m tall without modification.
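The paper's source code is not reproduced here; as a rough illustration of the actor-critic structure that a DDPG foot-placement controller of this kind implies, the sketch below defines a deterministic actor mapping a reduced state vector to a bounded foot-placement command, a critic scoring state-action pairs, and Polyak-averaged target updates as in standard DDPG. All names and values (STATE_DIM, ACTION_DIM, layer sizes, soft_update, tau) are illustrative assumptions, not details taken from the paper.

```python
# Minimal DDPG actor-critic sketch in PyTorch. Dimensions and hyperparameters
# are placeholder assumptions, not values from the paper.
import torch
import torch.nn as nn

STATE_DIM = 10   # hypothetical reduced state (torso pose, joint angles, velocities)
ACTION_DIM = 1   # hypothetical action: sagittal-plane foot-placement offset

class Actor(nn.Module):
    """Deterministic policy: maps the reduced state to a foot-placement command."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, ACTION_DIM), nn.Tanh(),  # bounded command; caller rescales
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Q-function: scores a (state, action) pair to drive the actor update."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def soft_update(target, source, tau=0.005):
    """Polyak-average the target network toward the online network (standard DDPG)."""
    for t, s in zip(target.parameters(), source.parameters()):
        t.data.mul_(1 - tau).add_(tau * s.data)

# Example forward pass with a placeholder observation.
actor, critic = Actor(), Critic()
state = torch.zeros(1, STATE_DIM)      # stand-in for the reduced state vector
foot_target = actor(state)             # bounded foot-placement command
q_value = critic(state, foot_target)   # critic's score for that choice
```

In this sketch the actor outputs only a foot-placement target rather than low-level joint torques, mirroring the paper's strategy of keeping the learned policy small and delegating joint-level stabilization to the auxiliary hip and ankle controllers.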
