Learning a Humanoid Kick with Controlled Distance

We investigate learning a flexible kick controller for a humanoid robot, i.e., a controller that is applicable to multiple contexts, such as different kick distances, different initial robot positions with respect to the ball, or both. Current approaches typically tune or optimise the parameters of the biped kick controller for a single context, such as the longest possible kick or a kick of one specific distance. Hence our research question is: how can we obtain a flexible kick controller that controls the robot (near) optimally for a continuous range of kick distances? The goal is to find a parametric function that, given a desired kick distance, outputs the (near) optimal controller parameters. We achieve the desired flexibility by applying a contextual policy search method. With such an algorithm, we can generalize the robot kick controller over different distances, where the desired distance is described by a real-valued vector. We also show that the optimal parameters of the kick controller are a non-linear function of the desired distance, and that a linear function fails to properly generalize the kick controller over desired kick distances.
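
To make the idea of a parametric function from desired kick distance to controller parameters more concrete, the following is a minimal sketch and not the method or data of this work. It assumes a scalar context (the desired distance), a single hypothetical controller parameter, polynomial context features, and a weighted least-squares fit of the kind commonly used to update the mean of a contextual search distribution. The function names, the toy relation `0.5 * d**2 - d + 2`, and the uniform weights are all illustrative assumptions; in an actual contextual policy search the weights would be derived from the rewards of sampled kicks.

```python
import numpy as np

def features(distance, degree):
    """Polynomial features [1, d, d^2, ...] of the desired kick distance."""
    d = np.atleast_1d(np.asarray(distance, dtype=float))
    return np.stack([d ** k for k in range(degree + 1)], axis=1)

def fit_policy_mean(distances, params, weights, degree, ridge=1e-8):
    """Weighted ridge least squares: params ~ features(distances) @ W."""
    Phi = features(distances, degree)
    A = (Phi.T * weights) @ Phi + ridge * np.eye(Phi.shape[1])
    b = (Phi.T * weights) @ params
    return np.linalg.solve(A, b)

def policy_mean(W, distance, degree):
    """Controller parameters proposed for a desired kick distance."""
    return features(distance, degree) @ W

def rms_error(W, distances, params, degree):
    """Root-mean-square error of the fitted mapping on the samples."""
    residual = policy_mean(W, distances, degree) - params
    return float(np.sqrt(np.mean(residual ** 2)))

# Hypothetical toy data: the "optimal" parameter is assumed to depend
# non-linearly on the distance. This is synthetic, not a measured result.
distances = np.linspace(1.0, 10.0, 20)
params = (0.5 * distances ** 2 - distances + 2.0).reshape(-1, 1)
weights = np.ones_like(distances)  # assumption: uniform sample weights

W_lin = fit_policy_mean(distances, params, weights, degree=1)
W_quad = fit_policy_mean(distances, params, weights, degree=2)

lin_err = rms_error(W_lin, distances, params, degree=1)
quad_err = rms_error(W_quad, distances, params, degree=2)
print("linear RMS error:   ", lin_err)
print("quadratic RMS error:", quad_err)
```

On this toy relation the linear mapping leaves a visibly larger fitting error than the quadratic one, which illustrates in miniature why a purely linear function of the context can fail to generalize. Calling `policy_mean(W_quad, 5.0, degree=2)` returns the parameter the fitted mapping would propose for a hypothetical desired distance of 5.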
