Using Parameterized Black-Box Priors to Scale Up Model-Based Policy Search for Robotics

The most data-efficient algorithms for reinforcement learning in robotics are model-based policy search algorithms, which alternate between learning a dynamical model of the robot and optimizing a policy to maximize the expected return given the model and its uncertainties. Among the few proposed approaches, the recently introduced Black-DROPS algorithm exploits a black-box optimization algorithm to achieve both high data-efficiency and good computation times when several cores are used; nevertheless, like all model-based policy search approaches, Black-DROPS does not scale to high-dimensional state/action spaces. In this paper, we introduce a new model learning procedure in Black-DROPS that leverages parameterized black-box priors to (1) scale up to high-dimensional systems, and (2) be robust to large inaccuracies of the prior information. We demonstrate the effectiveness of our approach with the “pendubot” swing-up task in simulation and with a physical hexapod robot (48D state space, 18D action space) that has to walk forward as fast as possible. The results show that our new algorithm is more data-efficient than previous model-based policy search algorithms (with and without priors) and that it allows a physical 6-legged robot to learn new gaits in only 16 to 30 seconds of interaction time.
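To make the loop described above concrete, the Python sketch below illustrates the general idea on a toy 1D system; it is not the authors' implementation. Names such as true_dynamics, simulator_prior, policy, and episode_length are illustrative assumptions. The sketch tunes the prior's parameters to fit the observed data, models the remaining error with a Gaussian process on the residuals (a simplification of using the parameterized prior as the GP mean), and then runs black-box policy search on the learned model with CMA-ES via the cma package, which stands in for the optimizers used in Black-DROPS.

# Minimal sketch of model-based policy search with a parameterized black-box
# prior (toy 1D system; illustrative only, not the authors' code).
import numpy as np
import cma                                         # pip install cma
from sklearn.gaussian_process import GaussianProcessRegressor

episode_length = 50

def true_dynamics(x, u):                           # stands in for the physical robot
    return x + 0.1 * np.sin(x) + 0.05 * u

def simulator_prior(x, u, p):                      # parameterized black-box prior
    return x + p[0] * x + p[1] * u                 # p: tunable "physical" parameters

def policy(x, w):                                  # simple linear policy
    return np.clip(w[0] * x + w[1], -1.0, 1.0)

def rollout(dynamics, w):                          # one episode and its return
    x, X, U, R = 0.5, [], [], 0.0
    for _ in range(episode_length):
        u = policy(x, w)
        X.append([x]); U.append([u])
        x = float(dynamics(x, u))
        R += -x ** 2                               # reward: drive the state to zero
    return np.array(X), np.array(U), R

data_in, data_out = [], []
w = np.zeros(2)                                    # initial policy parameters

for episode in range(5):
    # 1. interact with the (real) system using the current policy
    X, U, _ = rollout(true_dynamics, w)
    data_in += np.hstack([X, U]).tolist()
    data_out += [float(true_dynamics(x[0], u[0])) for x, u in zip(X, U)]
    XU, Y = np.array(data_in), np.array(data_out)

    # 2. fit the prior's parameters so the simulator explains the data,
    #    then model the remaining error with a GP on the residuals
    def prior_misfit(p):
        return float(np.mean((Y - simulator_prior(XU[:, 0], XU[:, 1], p)) ** 2))
    p_star = cma.fmin(prior_misfit, [0.0, 0.0], 0.5, {'verbose': -9})[0]
    gp = GaussianProcessRegressor().fit(XU, Y - simulator_prior(XU[:, 0], XU[:, 1], p_star))

    def learned_model(x, u):                       # tuned prior + GP correction
        return simulator_prior(x, u, p_star) + gp.predict(np.array([[x, u]]))[0]

    # 3. black-box policy search on the learned model (CMA-ES)
    def neg_return(w_cand):
        return -rollout(learned_model, w_cand)[2]
    w = cma.fmin(neg_return, w, 0.5, {'verbose': -9})[0]

In the actual algorithm the prior parameters and the Gaussian process hyperparameters are optimized jointly, the model is probabilistic (its uncertainty is propagated when estimating the expected return), and the state/action spaces are much larger; the sketch only conveys the alternation between model learning with a tunable prior and black-box policy optimization.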
