Implementation of Deep Deterministic Policy Gradients for Controlling Dynamic Bipedal Walking

A control system for bipedal walking in the sagittal plane was developed in simulation. The biped model was based on anthropometric data for a 1.8 m tall male of average build. At the core of the controller is a neural network trained with the deep deterministic policy gradient (DDPG) algorithm in Gazebo, a physics simulator, to predict foot placements that maintain stable walking despite external disturbances. The complexity of the DDPG network was reduced through carefully selected state variables and a distributed control system: additional controllers for the hip joints during their stance phases and for the ankle joint during the toe-off phase help stabilize the biped during walking. The simulated biped walks at a steady pace of approximately 1 m/s and, during locomotion, remains stable under a 30 kg·m/s impulse applied forward on the torso or a 40 kg·m/s impulse applied rearward. It also maintains stable walking while carrying a 10 kg backpack or a 25 kg front pack. Although the controller was trained on a 1.8 m tall model, it also stabilizes models from 1.4 m to 2.3 m tall without modification.
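The paper's source code is not reproduced here; as a rough illustration of the actor-critic structure that a DDPG foot-placement controller of this kind implies, the sketch below defines a deterministic actor mapping a reduced state vector to a bounded foot-placement command, a critic scoring state-action pairs, and Polyak-averaged target updates as in standard DDPG. All names and values (STATE_DIM, ACTION_DIM, layer sizes, soft_update, tau) are illustrative assumptions, not details taken from the paper.

```python
# Minimal DDPG actor-critic sketch in PyTorch. Dimensions and hyperparameters
# are placeholder assumptions, not values from the paper.
import torch
import torch.nn as nn

STATE_DIM = 10   # hypothetical reduced state (torso pose, joint angles, velocities)
ACTION_DIM = 1   # hypothetical action: sagittal-plane foot-placement offset

class Actor(nn.Module):
    """Deterministic policy: maps the reduced state to a foot-placement command."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, ACTION_DIM), nn.Tanh(),  # bounded command; caller rescales
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Q-function: scores a (state, action) pair to drive the actor update."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def soft_update(target, source, tau=0.005):
    """Polyak-average the target network toward the online network (standard DDPG)."""
    for t, s in zip(target.parameters(), source.parameters()):
        t.data.mul_(1 - tau).add_(tau * s.data)

# Example forward pass with a placeholder observation.
actor, critic = Actor(), Critic()
state = torch.zeros(1, STATE_DIM)      # stand-in for the reduced state vector
foot_target = actor(state)             # bounded foot-placement command
q_value = critic(state, foot_target)   # critic's score for that choice
```

In this sketch the actor outputs only a foot-placement target rather than low-level joint torques, mirroring the paper's strategy of keeping the learned policy small and delegating joint-level stabilization to the auxiliary hip and ankle controllers.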
