Optimizing Task Feasibility using Model-Free Policy Search and Model-Based Whole-Body Control

Producing feasible motions for highly redundant robots, such as humanoids, is a complicated and high-dimensional problem. Model-based whole-body control of such robots can generate complex dynamic behaviors through the simultaneous execution of multiple tasks. Unfortunately, tasks are generally planned without close consideration of the underlying controller or of the other tasks being executed, and are often infeasible when executed on the robot. Consequently, there is no guarantee that the motion will be accomplished. In this work, we develop an optimization loop that automatically improves task feasibility using model-free policy search in conjunction with model-based whole-body control. This combination makes it possible to solve problems that would otherwise be intractable using either approach alone. Through experiments on both the simulated and real iCub humanoid robot, we show that optimizing task feasibility allows initially infeasible complex dynamic motions to be realized, specifically a sit-to-stand transition. These experiments can be viewed in the accompanying video.
