Reinforcement Learning of Stable Trajectory for Quasi-Passive Dynamic Walking of an Unstable Biped Robot

Biped walking is one of the major research targets in recent humanoid robotics, and many researchers are now more interested in Passive Dynamic Walking (PDW) [McGeer (1990)] than in walking based on the conventional Zero Moment Point (ZMP) criterion [Vukobratovic (1972)]. The ZMP criterion is usually used to plan a desired trajectory that a feedback controller then tracks, but the continuous control needed to maintain the trajectory consumes a large amount of energy [Collins, et al. (2005)]. PDW, on the other hand, enables completely unactuated walking down a gentle slope, but it is generally sensitive to the robot's initial posture and speed and to disturbances incurred when a foot touches the ground. To overcome this sensitivity, ``Quasi-PDW'' methods [Wisse & Frankenhuyzen (2003); Sugimoto & Osuka (2003); Takuma, et al. (2004)], in which some actuators are activated supplementarily to handle disturbances, have been proposed. Because Quasi-PDW is a modification of PDW, it consumes much less power than control methods based on the ZMP criterion. In previous studies of Quasi-PDW, however, the parameters of an actuator had to be tuned by trial and error by a designer or on
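The learning approach referred to in the title belongs to the actor-critic family of reinforcement learning methods. As a rough illustration only (not the paper's off-policy natural actor-critic algorithm), the following sketch shows one temporal-difference actor-critic update with a linear critic and a Gaussian policy; all function names and step sizes here are hypothetical:

```python
import numpy as np

# Illustrative sketch of a vanilla TD actor-critic update with a Gaussian
# policy over a continuous action (e.g., an actuator command). This is a
# simplified stand-in, not the off-policy natural actor-critic of the paper.

def gaussian_log_grad(theta, state, action, sigma=0.1):
    """Gradient of log N(action | theta . state, sigma^2) w.r.t. theta."""
    mean = theta @ state
    return (action - mean) / sigma**2 * state

def actor_critic_step(theta, w, state, action, reward, next_state,
                      alpha=0.01, beta=0.1, gamma=0.95, sigma=0.1):
    """One actor-critic update; w parameterizes a linear value function."""
    # TD error from the critic's current value estimates.
    td_error = reward + gamma * (w @ next_state) - (w @ state)
    # Critic: move the value estimate toward the TD target.
    w_new = w + beta * td_error * state
    # Actor: policy-gradient step scaled by the TD error.
    theta_new = theta + alpha * td_error * gaussian_log_grad(
        theta, state, action, sigma)
    return theta_new, w_new
```

In a Quasi-PDW setting, the state would encode leg angles and velocities at (for example) foot contact, the action a supplementary actuator input, and the reward a measure of walking stability.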

[1] Martijn Wisse, et al. Design and Construction of MIKE; a 2-D Autonomous Biped Based on Passive Dynamic Walking, 2006.

[2] V. Dietz, et al. Locomotor activity in spinal man: significance of afferent input from joint and load receptors. Brain: a Journal of Neurology, 2002.

[3] Shin Ishii, et al. Fast and Stable Learning of Quasi-Passive Dynamic Walking by an Unstable Biped Robot based on Off-Policy Natural Actor-Critic. 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2006.

[4] Tad McGeer, et al. Passive Dynamic Walking. Int. J. Robotics Res., 1990.

[5] M. Vukobratovic, et al. On the stability of anthropomorphic systems, 1972.

[6] Shigenobu Kobayashi, et al. A Policy Representation Using Weighted Multiple Normal Distribution, 2003.

[7] Shigenobu Kobayashi, et al. Reinforcement learning for continuous action using stochastic gradient ascent, 1998.

[8] Robert J. Wood, et al. Towards a 3g crawling robot through the integration of microrobot technologies. Proceedings 2006 IEEE International Conference on Robotics and Automation (ICRA 2006), 2006.

[9] Andy Ruina, et al. A Bipedal Walking Robot with Efficient and Human-Like Gait. Proceedings of the 2005 IEEE International Conference on Robotics and Automation, 2005.

[10] C. Miller, et al. Determination of the step duration of gait initiation using a mechanical energy analysis. Journal of Biomechanics, 1996.

[11] Koichi Osuka, et al. Motion Generation and Control of Quasi Passive Dynamic Walking Based on the Concept of Delayed Feedback Control, 2006.

[12] H. Sebastian Seung, et al. Stochastic policy gradient reinforcement learning on a simple 3D biped. 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2004.

[13] M. Asada, et al. Design of self-contained biped walker with pneumatic actuators. SICE 2004 Annual Conference, 2004.

[14] N. A. Bernshteĭn. The co-ordination and regulation of movements, 1967.

[15] K. Newell, et al. Dimensional change in motor learning. Human Movement Science, 2001.

[16] Russ Tedrake, et al. Efficient Bipedal Robots Based on Passive-Dynamic Walkers. Science, 2005.

[17] Shigenobu Kobayashi, et al. An Analysis of Actor/Critic Algorithms Using Eligibility Traces: Reinforcement Learning with Imperfect Value Function. ICML, 1998.

[18] Shin Ishii, et al. Natural Policy Gradient Reinforcement Learning for a CPG Control of a Biped Robot. PPSN, 2004.

[19] J. V. Basmajian. The human bicycle: an ultimate biological convenience. The Orthopedic Clinics of North America, 1976.