Evolutionary optimization for parameterized whole-body dynamic motor skills

Learning parameterized skills is essential for autonomous robots operating in unpredictable environments. Previous techniques learned a policy for each example task individually and then constructed a regression model to map the task parameter space to the policy parameter space. However, these techniques are less successful when applied to whole-body dynamic skills, such as jumping or walking, which involve the challenges of handling discrete contacts and balancing an under-actuated system under gravity. This paper introduces an evolutionary optimization algorithm for learning parameterized skills that achieve whole-body dynamic tasks. Our algorithm learns policies for a range of tasks simultaneously instead of learning each policy individually. The problem can be formulated as a nonconvex optimization problem whose solution is a closed curve segment, rather than a single point, in the policy parameter space. We develop a new optimization algorithm that maintains a parameterized probability distribution over the entire range of tasks and iteratively updates the distribution using selected elite samples. Because each sample informs the distribution for the whole task range rather than for a single task, our algorithm exploits each sample better, greatly reducing the number of samples required to optimize a parameterized skill for all tasks in the range of interest.
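The idea of a task-parameterized distribution updated from elite samples can be illustrated with a minimal sketch. This is not the algorithm of the paper: it is a simple cross-entropy-style loop written under stated assumptions. The one-dimensional policy parameter, the linear mean model `a + b * task`, the toy cost, the task range [0, 1], the per-bin elite selection, and all function names are hypothetical choices made only for illustration.

```python
import random

def toy_cost(x, task):
    # Hypothetical cost: the best policy parameter for a task is 1 + 2 * task,
    # so the optimum over the task range [0, 1] is a line segment, not a point.
    return (x - (1.0 + 2.0 * task)) ** 2

def fit_line(ts, xs):
    # Ordinary least squares fit of x ~ a + b * t.
    n = len(ts)
    mt = sum(ts) / n
    mx = sum(xs) / n
    var = sum((t - mt) ** 2 for t in ts)
    b = sum((t - mt) * (x - mx) for t, x in zip(ts, xs)) / var
    return mx - b * mt, b

def optimize(iterations=40, samples=60, bins=6, elite_per_bin=3, seed=0):
    rng = random.Random(seed)
    # Search distribution: x ~ Normal(a + b * task, sigma) for every task.
    a, b, sigma = 0.0, 0.0, 2.0
    for _ in range(iterations):
        elites = []
        # Stratify the task range so elites cover all tasks (an assumption).
        for k in range(bins):
            lo, hi = k / bins, (k + 1) / bins
            batch = []
            for _ in range(samples // bins):
                t = rng.uniform(lo, hi)
                x = rng.gauss(a + b * t, sigma)
                batch.append((toy_cost(x, t), t, x))
            batch.sort()
            elites.extend(batch[:elite_per_bin])
        ts = [t for _, t, _ in elites]
        xs = [x for _, _, x in elites]
        # One regression over all elites updates the mean for the whole range.
        a, b = fit_line(ts, xs)
        resid = [x - (a + b * t) for t, x in zip(ts, xs)]
        elite_std = (sum(r * r for r in resid) / len(resid)) ** 0.5
        # Smoothed spread update with a floor to avoid collapsing too early.
        sigma = max(1e-3, 0.5 * sigma + 0.5 * elite_std)
    return a, b, sigma

if __name__ == "__main__":
    print(optimize())
```

In this sketch every elite sample, whatever task it was drawn for, contributes to a single fit of the mean model, so no per-task optimization is run. A real whole-body controller would replace `toy_cost` with a simulation rollout and would likely need a richer mean model than a line.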