Task Feasibility Maximization Using Model-Free Policy Search and Model-Based Whole-Body Control

Producing feasible motions for highly redundant robots, such as humanoids, is a complicated and high-dimensional problem. Model-based whole-body control of such robots can generate complex dynamic behaviors through the simultaneous execution of multiple tasks. Unfortunately, tasks are generally planned without close consideration for the underlying controller or for the other tasks being executed, and they are often infeasible when executed on the robot. Consequently, there is no guarantee that the motion will be accomplished. In this work, we develop a proof-of-concept optimization loop that automatically improves task feasibility using model-free policy search in conjunction with model-based whole-body control. This combination makes it possible to solve problems that would otherwise be intractable using either approach alone. Through experiments on both the simulated and the real iCub humanoid robot, we show that optimizing task feasibility allows initially infeasible complex dynamic motions to be realized, specifically a sit-to-stand transition. These experiments can be viewed in the accompanying Video S1.
