Hybrid control algorithm for humanoid robots walking based on episodic reinforcement learning

This paper presents a hybrid dynamic control approach for acquiring biped walking in humanoid robots, based on policy-gradient episodic reinforcement learning with fuzzy evaluative feedback. The proposed controller structure involves two feedback loops: a conventional computed torque controller and an episodic reinforcement learning controller. The reinforcement learning part incorporates fuzzy information about Zero-Moment Point (ZMP) errors. To demonstrate the effectiveness of the method, simulation tests are performed using the medium-size 36-DOF humanoid robot MEXONE.
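The two-loop structure can be illustrated with a minimal single-joint sketch. This is not the paper's implementation: the function names, gains, membership thresholds, and the toy relation between the learned parameter and the ZMP error are all assumptions chosen for illustration. It shows a computed torque term, a fuzzy-style reward derived from the ZMP error, and an episodic policy-gradient update of one Gaussian policy parameter.

```python
import random


def fuzzy_reward(zmp_error, small=0.01, large=0.05):
    """Map |ZMP error| (metres) to a reward in [0, 1].

    Piecewise-linear membership (assumed thresholds): errors below
    `small` count as fully good, errors above `large` as fully bad.
    """
    e = abs(zmp_error)
    if e <= small:
        return 1.0
    if e >= large:
        return 0.0
    return (large - e) / (large - small)


def computed_torque(q, dq, q_ref, dq_ref, ddq_ref,
                    inertia=1.0, bias=0.0, kp=100.0, kv=20.0):
    """Single-joint computed torque: tau = M*(ddq_ref + kv*de + kp*e) + h."""
    e, de = q_ref - q, dq_ref - dq
    return inertia * (ddq_ref + kv * de + kp * e) + bias


def hybrid_torque(tau_ct, tau_rl):
    """Sum the model-based loop and the learned correction."""
    return tau_ct + tau_rl


def episode_return(theta):
    """Toy stand-in for one simulated walking episode (assumption).

    The ZMP error is pretended to shrink linearly as theta approaches 1.0.
    """
    zmp_error = 0.04 * (1.0 - theta)
    return fuzzy_reward(zmp_error)


def train_episodic(episodes=200, sigma=0.2, lr=0.05, seed=0):
    """Episodic policy gradient with Gaussian exploration and a baseline."""
    rng = random.Random(seed)
    mu, baseline = 0.0, 0.0
    for _ in range(episodes):
        eps = rng.gauss(0.0, 1.0)
        theta = mu + sigma * eps          # parameter sampled for this episode
        ret = episode_return(theta)       # fuzzy evaluative feedback
        mu += lr * (ret - baseline) * eps / sigma  # likelihood-ratio gradient step
        baseline += 0.1 * (ret - baseline)         # running-average baseline
    return mu


if __name__ == "__main__":
    tau_ct = computed_torque(q=0.0, dq=0.0, q_ref=0.1, dq_ref=0.0, ddq_ref=0.0)
    print("computed torque:", tau_ct)
    print("learned parameter:", train_episodic())
```

In a full controller the learned term would be a torque correction per joint, and the episode return would come from ZMP errors measured during a simulated gait rather than from the toy function used here.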
