Bipedal walking energy minimization by reinforcement learning with evolving policy parameterization

We present a learning-based approach for minimizing the electric energy consumption during walking of a passively compliant bipedal robot. The energy consumption is reduced by learning a varying-height center-of-mass trajectory that efficiently exploits the robot's passive compliance. To do this, we propose a reinforcement learning method that evolves the policy parameterization dynamically during the learning process and thus finds better policies faster than learning with a fixed parameterization. The method is first tested on a function approximation task and then applied to the humanoid robot COMAN, where it achieves a significant reduction in energy consumption.
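The core idea of evolving the policy parameterization can be illustrated with a minimal sketch: a trajectory policy is encoded by a small set of spline knots, optimized by a simple episodic stochastic search, and the knot set is periodically refined so that later learning operates at a finer resolution while preserving the trajectory found so far. Everything below is a hypothetical toy illustration, not the paper's actual controller: the reward is a stand-in quadratic "energy" proxy, and the (1+1)-style search replaces whatever policy-search algorithm the authors used on COMAN.

```python
import numpy as np

def rollout_reward(traj):
    # Toy stand-in for negative energy cost: penalize deviation from a
    # hypothetical varying-height CoM profile (not the robot's real cost).
    target = 0.5 + 0.05 * np.sin(np.linspace(0, 2 * np.pi, traj.size))
    return -np.sum((traj - target) ** 2)

def decode(params, n_steps):
    # Piecewise-linear policy: interpolate knot values into a CoM trajectory.
    knots = np.linspace(0.0, 1.0, params.size)
    return np.interp(np.linspace(0.0, 1.0, n_steps), knots, params)

def refine(params):
    # Evolve the parameterization: double the resolution by inserting
    # midpoints, which preserves the trajectory the current policy encodes.
    fine = np.empty(2 * params.size - 1)
    fine[0::2] = params
    fine[1::2] = 0.5 * (params[:-1] + params[1:])
    return fine

def learn(n_steps=50, epochs=3, iters=200, sigma=0.02, seed=0):
    rng = np.random.default_rng(seed)
    params = np.full(3, 0.5)  # start with a deliberately coarse 3-knot policy
    best_r = rollout_reward(decode(params, n_steps))
    for _ in range(epochs):
        for _ in range(iters):  # simple (1+1) stochastic hill climbing
            cand = params + sigma * rng.standard_normal(params.size)
            r = rollout_reward(decode(cand, n_steps))
            if r > best_r:
                params, best_r = cand, r
        params = refine(params)  # grow the parameterization between epochs
        best_r = rollout_reward(decode(params, n_steps))
    return params, best_r
```

Because the refinement step inserts midpoints on the existing piecewise-linear trajectory, the decoded trajectory (and hence the reward) is unchanged at the moment of refinement; only the search space becomes richer, which is what lets a coarse-to-fine schedule outpace a fixed high-dimensional parameterization.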
