Adapting Biped Locomotion to Sloped Environments

In this work, reinforcement learning techniques are implemented and compared for the optimization of biped locomotion. Central Pattern Generators (CPGs) and Dynamic Movement Primitives (DMPs) are combined to produce complex joint trajectories for a simulated DARwIn-OP humanoid robot. Two reinforcement learning algorithms, Policy Learning by Weighting Exploration with the Returns (PoWER) and Path Integral Policy Improvement with Covariance Matrix Adaptation (PI2-CMA), are applied to the simulated DARwIn-OP to find the DMP parameters that maximize frontal velocity in situations that demand adaptation from the controller, namely walking on slopes of different inclinations. Additionally, elitism is introduced into PI2-CMA to improve the algorithm's performance. The results are promising: both approaches enable DARwIn-OP to adapt easily to new situations and demonstrate flexibility in generating and adapting locomotion trajectories.
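
As a rough illustration of the elitist variant mentioned above, the following is a minimal sketch of a PI2-CMA-style black-box update over DMP parameters with elite re-injection. It is a sketch under stated assumptions, not the paper's implementation: the rollout function `evaluate(theta)` (returning, say, the mean frontal velocity of a walk), the temperature `h`, and all other names are illustrative.

```python
import numpy as np

def pi2_cma_elitist(evaluate, theta, n_iters=50, n_rollouts=20,
                    n_elites=2, h=10.0, seed=0):
    """Sketch of a PI2-CMA-style update with elitism.

    evaluate: hypothetical black-box rollout, maps a DMP parameter
              vector to a scalar return (higher is better).
    theta:    initial DMP parameter vector (1-D numpy array).
    """
    rng = np.random.default_rng(seed)
    dim = theta.size
    cov = 0.1 * np.eye(dim)   # initial exploration covariance
    elites = []               # (return, sample) pairs kept across iterations

    for _ in range(n_iters):
        # Sample perturbed DMP parameter vectors around the current mean.
        samples = rng.multivariate_normal(theta, cov, size=n_rollouts)
        returns = np.array([evaluate(s) for s in samples])

        # Elitism: re-inject the best samples seen so far into the batch.
        if elites:
            e_ret, e_smp = zip(*elites)
            samples = np.vstack([samples, np.asarray(e_smp)])
            returns = np.concatenate([returns, np.asarray(e_ret)])

        # Exponentiated weighting over returns (higher return -> larger
        # weight), mirroring PI2's softmax over trajectory costs.
        z = (returns - returns.min()) / (np.ptp(returns) + 1e-12)
        w = np.exp(h * z)
        w /= w.sum()

        # Reward-weighted mean update and CMA-style covariance update.
        theta = w @ samples
        diff = samples - theta
        cov = (w[:, None] * diff).T @ diff + 1e-6 * np.eye(dim)

        # Refresh the elite set with the best rollouts of this batch.
        order = np.argsort(returns)[::-1][:n_elites]
        elites = [(returns[i], samples[i]) for i in order]

    return theta
```

The design intuition behind the elite re-injection is that, with noisy rollouts, a purely reward-weighted update can drift away from the best gait found so far; keeping the top samples in every batch anchors the mean and covariance updates to known-good parameter vectors.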
