Learning ankle-tilt and foot-placement control for flat-footed bipedal balancing and walking

We learn a controller that lets a flat-footed bipedal robot respond optimally to both (1) external disturbances, caused for example by stepping on objects or being pushed, and (2) rapid acceleration, such as a reversal of the demanded walk direction. The reinforcement learning method learns an optimal policy that actuates the ankle joints to exert pressure at different points along the support foot and that determines the placement of the next swing foot. The controller is learnt in simulation using an inverted pendulum model, and the control policy is then transferred to and tested on two small physical humanoid robots.
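As a rough illustration of the kind of simulation the abstract refers to, the sketch below steps a textbook linear inverted pendulum whose pivot is the centre of pressure under the support foot. It is not the authors' model: the function names, the pendulum height, the time step, and the choice of integrator are all assumptions made for this example. It only shows how moving the pressure point along the foot changes the acceleration of the centre of mass.

```python
G = 9.81  # gravitational acceleration in m/s^2


def lipm_step(x, v, cop, height=0.3, dt=0.005):
    """Advance a linear inverted pendulum by one semi-implicit Euler step.

    x, v   : horizontal position (m) and velocity (m/s) of the centre of mass
    cop    : centre-of-pressure position under the support foot (m)
    height : assumed constant centre-of-mass height (m), hypothetical value
    dt     : integration time step (s), hypothetical value
    """
    # Linear inverted pendulum: the centre of mass accelerates away from the pivot.
    a = (G / height) * (x - cop)
    v_next = v + a * dt
    x_next = x + v_next * dt
    return x_next, v_next


def simulate(x0, v0, cop, steps=100, **kwargs):
    """Hold the centre of pressure fixed and integrate for a number of steps."""
    x, v = x0, v0
    for _ in range(steps):
        x, v = lipm_step(x, v, cop, **kwargs)
    return x, v


if __name__ == "__main__":
    # Pressure under the centre of mass: the pendulum stays balanced.
    print(simulate(0.0, 0.0, cop=0.0))
    # Pressure shifted towards the heel: the centre of mass is pushed forward.
    print(simulate(0.0, 0.0, cop=-0.02))
```

In a learning setup along the lines the abstract describes, the centre-of-pressure offset (set through the ankle joints) and the next foot placement would be the actions chosen by the policy; here the offset is simply held constant to show its effect.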
