Efficient reinforcement learning for humanoid whole-body control

Whole-body control of humanoid robots permits the execution of multiple simultaneous tasks, but combining tasks can often result in unexpected overall behaviors. These discrepancies arise from a variety of internal and external factors, and modeling them explicitly would be impractical. Reinforcement learning can be used to eliminate the effects of these deleterious factors through trial and error, but it generally requires many trials to converge on a solution. In humanoid robotics, such inefficiency can be costly. In this paper, we show how the efficiency of learning can be improved through the use of Bayesian optimization. This is accomplished by intelligently exploring a model of the latent cost function derived from the quality of the task executions. We demonstrate the efficacy of the technique in two different simulated scenarios in which various factors impede the robot from accomplishing its objectives.
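To make the idea of exploring a model of the latent cost function concrete, the following is a minimal sketch of a generic Bayesian optimization loop, not the implementation used in the paper. It assumes a single scalar task parameter, a Gaussian process surrogate with a squared-exponential kernel, and a lower-confidence-bound acquisition rule; the names `bayes_opt`, `gp_posterior`, and `toy_cost`, and all hyperparameter values, are hypothetical. In practice, the cost would come from executing the tasks on the robot or in simulation.

```python
import math

def rbf(a, b, length=0.2, var=1.0):
    """Squared-exponential kernel between two scalar parameters (assumed choice)."""
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper_t(L, y):
    """Back substitution: solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(xs, ys, x_new, noise=1e-4):
    """Posterior mean and standard deviation of the surrogate at x_new."""
    mean_y = sum(ys) / len(ys)  # center the observed costs
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, [y - mean_y for y in ys]))
    k_star = [rbf(a, x_new) for a in xs]
    mu = mean_y + sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(rbf(x_new, x_new) - sum(vi * vi for vi in v), 0.0)
    return mu, math.sqrt(var)

def bayes_opt(cost, candidates, n_init=3, n_iter=12, kappa=2.0):
    """Minimize `cost` over `candidates` using a GP surrogate and an LCB rule."""
    # Seed the surrogate with evenly spaced trials.
    step = (len(candidates) - 1) // (n_init - 1)
    xs = [candidates[i * step] for i in range(n_init)]
    ys = [cost(x) for x in xs]
    for _ in range(n_iter):
        def lcb(x):
            # Low predicted cost or high uncertainty both make x attractive.
            mu, sd = gp_posterior(xs, ys, x)
            return mu - kappa * sd
        untried = [x for x in candidates if x not in xs]
        x_next = min(untried, key=lcb)
        xs.append(x_next)
        ys.append(cost(x_next))  # one (expensive) task execution per iteration
    best = min(range(len(ys)), key=ys.__getitem__)
    return xs[best], ys[best], len(ys)

if __name__ == "__main__":
    # Hypothetical stand-in for the cost of one task execution.
    def toy_cost(x):
        return (x - 0.62) ** 2

    grid = [i / 50 for i in range(51)]
    best_x, best_cost, n_trials = bayes_opt(toy_cost, grid)
    print(f"best parameter {best_x:.2f}, cost {best_cost:.4f}, trials {n_trials}")
```

Each iteration spends exactly one trial, chosen where the surrogate predicts either a low cost or high uncertainty. This is the sense in which the cost model is explored deliberately rather than by undirected trial and error.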
