Poincaré-Map-Based Reinforcement Learning For Biped Walking

We propose a model-based reinforcement learning algorithm for biped walking in which the robot learns to appropriately modulate an observed walking pattern. Via-points are detected from the observed walking trajectories using the minimum-jerk criterion, and the learning algorithm modulates these via-points as control actions to improve the walking trajectory. The modulation is chosen using a learned model of the Poincaré map of the periodic walking pattern: the model maps a state in the single-support phase and the control actions to a state in the next single-support phase. We applied this approach to both a simulated robot model and an actual biped robot, and show that successful walking policies are acquired.
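To make the via-point detection step concrete, the minimum-jerk criterion can be written as a worked equation. This is a sketch of one standard formulation; the extraction loop described after the equation is an assumption in the spirit of minimum-jerk trajectory models, not necessarily the paper's exact procedure.

```latex
% Minimum-jerk cost of a joint trajectory \theta(t) over one step cycle [0, T]:
% the smoothest trajectory through a given set of via-points minimizes the
% integrated squared jerk.
C(\theta) = \int_{0}^{T} \left\lVert \frac{d^{3}\theta(t)}{dt^{3}} \right\rVert^{2} dt
```

Via-points can then be extracted greedily: fit the minimum-jerk trajectory through the current via-point set, insert a new via-point at the time where the fit deviates most from the observed trajectory, and repeat until the residual error falls below a threshold.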
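The learned Poincaré map itself can be sketched in a few lines of Python. Everything below is illustrative: the class and function names are hypothetical, the map is approximated with ridge regression on quadratic features, and actions are chosen by a one-step greedy search over candidate via-point offsets, whereas the paper's own function approximator and policy update may differ.

```python
import numpy as np

def features(x, u):
    """Quadratic polynomial features of the section state x and action u."""
    z = np.concatenate([x, u])
    quad = np.outer(z, z)[np.triu_indices(len(z))]   # upper-triangular products
    return np.concatenate([[1.0], z, quad])

class PoincareMapModel:
    """Least-squares model of the Poincare map (x_k, u_k) -> x_{k+1},
    where x_k is the state at the k-th single-support section and u_k
    the via-point modulation applied during that step."""

    def __init__(self, x_dim, u_dim, ridge=1e-3):
        n = x_dim + u_dim
        self.dim = 1 + n + n * (n + 1) // 2          # feature vector length
        self.W = np.zeros((self.dim, x_dim))
        self.ridge = ridge
        self.transitions = []                        # observed (phi, x_next) pairs

    def update(self, x, u, x_next):
        """Refit the map after observing one step-to-step transition."""
        self.transitions.append((features(x, u), np.asarray(x_next, dtype=float)))
        Phi = np.array([phi for phi, _ in self.transitions])
        Y = np.array([y for _, y in self.transitions])
        A = Phi.T @ Phi + self.ridge * np.eye(self.dim)
        self.W = np.linalg.solve(A, Phi.T @ Y)       # ridge-regularized fit

    def predict(self, x, u):
        """Predicted state at the next Poincare section."""
        return features(x, u) @ self.W

def greedy_action(model, x, x_star, candidate_us):
    """One-step greedy policy: pick the via-point offsets whose predicted
    next-section state lies closest to the desired fixed point x_star."""
    costs = [np.sum((model.predict(x, u) - x_star) ** 2) for u in candidate_us]
    return candidate_us[int(np.argmin(costs))]

# Hypothetical usage with a 2-D section state and 2-D via-point offsets:
# x could be hip position and velocity sampled at foot touchdown, and
# x_star the section state of the desired periodic gait.
model = PoincareMapModel(x_dim=2, u_dim=2)
rng = np.random.default_rng(0)
candidates = [rng.uniform(-0.05, 0.05, size=2) for _ in range(25)]
x_star = np.array([0.0, 0.4])
u = greedy_action(model, np.array([0.02, 0.35]), x_star, candidates)
# ... apply u, walk one step, observe x_next, then: model.update(x, u, x_next)
```

Driving the predicted section state toward a fixed point is what makes the gait periodic: a stable fixed point of the Poincaré map corresponds to a stable limit-cycle walk.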
