Using Simulation to Improve Sample-Efficiency of Bayesian Optimization for Bipedal Robots

Learning-based approaches can acquire controllers for novel robotic tasks, paving the way for autonomous agents. Such controllers are often expert-designed policies whose parameters must be tuned for each task scenario. In this context, Bayesian optimization (BO) has emerged as a promising approach for automatically tuning controller parameters. However, when BO is performed directly on hardware with high-dimensional policies, sample efficiency becomes an issue. Here, we develop an approach that uses simulation to map the original parameter space into a domain-informed space; during BO, similarity between controllers is then computed in this transformed space. Experiments on the ATRIAS robot hardware and on a simulation of another bipedal robot show that our approach learns controllers sample-efficiently for multiple robots. A further question arises: what if the simulation differs significantly from the hardware? To answer this, we create increasingly approximate simulators and study how growing simulation-hardware mismatch affects the performance of Bayesian optimization. We also compare our approach to others from the literature and find it to be more reliable, especially in cases of high mismatch. Our experiments show that the approach succeeds across different controller types, bipedal robot models, and simulator fidelity levels, making it applicable to a wide range of bipedal locomotion problems.
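The core idea above — run standard Bayesian optimization, but evaluate kernel similarity between controllers in a simulation-informed transformed space rather than in the raw parameter space — can be sketched in a few lines. The sketch below is illustrative only: `phi` stands in for the transform that the paper builds from simulation data, `hardware_cost` stands in for the walking cost measured on the robot, and the acquisition step uses a simple lower-confidence-bound rule over random candidates; none of these specifics come from the paper itself.

```python
import numpy as np

# Stand-in for the cost measured on hardware (e.g., a walking cost).
def hardware_cost(theta):
    return float(np.sum((theta - 0.3) ** 2))

# Hypothetical simulation-informed transform phi(theta). In the approach
# described above this map is constructed from simulator rollouts; here a
# simple nonlinear warp merely illustrates where it plugs in.
def phi(theta):
    return np.tanh(2.0 * theta)

def kernel(A, B, ls=0.5, var=1.0):
    # Squared-exponential kernel evaluated in the transformed space phi(.),
    # so "similar" means similar after the domain-informed mapping.
    PA, PB = phi(A), phi(B)
    d2 = (np.sum(PA**2, 1)[:, None] + np.sum(PB**2, 1)[None, :]
          - 2.0 * PA @ PB.T)
    return var * np.exp(-0.5 * d2 / ls**2)

def gp_posterior(Xtr, ytr, Xte, noise=1e-5):
    # Standard GP regression: posterior mean and std at test points.
    K = kernel(Xtr, Xtr) + noise * np.eye(len(Xtr))
    Ks = kernel(Xtr, Xte)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, ytr))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.clip(np.diag(kernel(Xte, Xte)) - np.sum(v**2, 0), 1e-12, None)
    return mu, np.sqrt(var)

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(3, 2))       # initial controller parameters
y = np.array([hardware_cost(x) for x in X])   # initial hardware evaluations

for _ in range(15):                           # BO loop
    cand = rng.uniform(-1.0, 1.0, size=(256, 2))
    mu, sd = gp_posterior(X, y, cand)
    nxt = cand[np.argmin(mu - 2.0 * sd)]      # lower confidence bound
    X = np.vstack([X, nxt])
    y = np.append(y, hardware_cost(nxt))

print(f"best cost after BO: {y.min():.4f}")
```

Only the kernel changes relative to vanilla BO: swapping `phi` for the identity map recovers optimization in the raw parameter space, which is what makes the simulation-informed transform easy to drop into existing GP/BO toolboxes.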
