Bipedal Walking Robot using Deep Deterministic Policy Gradient

Machine learning algorithms have found several applications in robotics and control systems. The control systems community has shown growing interest in machine learning sub-domains such as supervised learning, imitation learning, and reinforcement learning as routes to autonomous control and intelligent decision making. Among complex control problems, stable bipedal walking remains one of the most challenging. In this paper, we present an architecture to design and simulate a planar bipedal walking robot (BWR) in Gazebo, a realistic robotics simulator. The robot learns a successful walking behaviour through trial and error, without any prior knowledge of its own dynamics or those of the world. Autonomous walking of the BWR is achieved with Deep Deterministic Policy Gradient (DDPG), a reinforcement learning algorithm for learning control policies in continuous action spaces. After training in simulation, we observed that, with a properly shaped reward function, the robot walked faster and even produced a running gait with an average speed of 0.83 m/s. We compared the gait pattern of the bipedal walker with an actual human walking pattern; the results show that the two share similar characteristics. A video of our experiment is available at this https URL.
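To make the DDPG machinery concrete, the following is a minimal, illustrative sketch of one DDPG update step using tiny linear/tanh function approximators in NumPy. This is not the paper's implementation; all dimensions, learning rates, and parameter names are assumptions chosen for brevity. It shows the four core ingredients: a TD target computed from slowly-moving target networks, a critic regression step, an actor step along the deterministic policy gradient, and Polyak-averaged soft target updates.

```python
import numpy as np

# Illustrative DDPG update (assumed toy setup, not the paper's network sizes):
# state dim 3, action dim 1, actor mu(s) = tanh(Wa @ s), critic Q(s, a) linear.
rng = np.random.default_rng(0)

GAMMA, TAU, LR = 0.99, 0.01, 1e-2  # discount, soft-update rate, step size

Wa = rng.normal(size=(1, 3)) * 0.1   # online actor parameters
Wa_targ = Wa.copy()                  # target actor
wq = rng.normal(size=4) * 0.1        # online critic: Q = wq @ [s; a]
wq_targ = wq.copy()                  # target critic

def actor(W, s):
    return np.tanh(W @ s)

def critic(w, s, a):
    return w @ np.concatenate([s, a])

def ddpg_update(s, a, r, s2):
    global Wa, wq, Wa_targ, wq_targ
    # 1) TD target from the *target* networks: y = r + gamma * Q'(s', mu'(s'))
    y = r + GAMMA * critic(wq_targ, s2, actor(Wa_targ, s2))
    # 2) Critic step: gradient descent on (Q(s, a) - y)^2
    td_err = critic(wq, s, a) - y
    wq -= LR * td_err * np.concatenate([s, a])
    # 3) Actor step: ascend dQ/da * da/dWa (deterministic policy gradient)
    a_pi = actor(Wa, s)
    dq_da = wq[3:]              # Q is linear in a, so dQ/da is the action weight
    dtanh = 1.0 - a_pi ** 2     # derivative of tanh at the current pre-activation
    Wa += LR * np.outer(dq_da * dtanh, s)
    # 4) Polyak-averaged soft update keeps targets slowly tracking online nets
    Wa_targ = TAU * Wa + (1 - TAU) * Wa_targ
    wq_targ = TAU * wq + (1 - TAU) * wq_targ

# One transition with exploration noise added to the deterministic action
s = rng.normal(size=3)
a = actor(Wa, s) + 0.1 * rng.normal(size=1)
ddpg_update(s, a, r=1.0, s2=rng.normal(size=3))
```

In the full algorithm these updates are applied to minibatches drawn from a replay buffer, with deep networks in place of the linear maps; the soft target update (step 4) is what stabilizes bootstrapping with function approximation.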
