Paper information - Sample efficient optimization for learning controllers for bipedal locomotion

Learning policies for bipedal locomotion can be difficult, as experiments are expensive and simulation does not usually transfer well to hardware. To counter this, we need algorithms that are sample efficient and inherently safe. Bayesian Optimization is a powerful sample-efficient tool for optimizing non-convex black-box functions. However, its performance can degrade in higher dimensions. We develop a distance metric for bipedal locomotion that enhances the sample efficiency of Bayesian Optimization and use it to train a 16-dimensional neuromuscular model for planar walking. This distance metric reflects some basic gait features of healthy walking and helps us quickly eliminate a majority of unstable controllers. With our approach we can learn policies for walking in fewer than 100 trials for a range of challenging settings. In simulation, we show results on two different costs and on various terrains, including rough ground and ramps sloping upwards and downwards. We also perturb our models with unknown inertial disturbances analogous to differences between simulation and hardware. These results are promising, as they indicate that this method can potentially be used to learn control policies on hardware.
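The idea of running Bayesian Optimization with a kernel that measures distance in a feature space, rather than directly in parameter space, can be illustrated with a minimal sketch. The abstract does not define the gait-based metric, so everything below is an assumption made for illustration: `gait_features` is a hypothetical stand-in (not the paper's metric), `toy_cost` replaces the walking simulation, and the lower-confidence-bound acquisition over random candidates is one common choice, not necessarily the authors'.

```python
import numpy as np


def gait_features(x):
    """Hypothetical feature map; a stand-in, NOT the paper's gait metric."""
    x = np.atleast_2d(x)
    return np.column_stack([x.sum(axis=1), (x ** 2).sum(axis=1)])


def kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel on distances measured in feature space."""
    fa, fb = gait_features(a), gait_features(b)
    d2 = ((fa[:, None, :] - fb[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)


def gp_posterior(x_train, y_train, x_query, noise=1e-3):
    """Posterior mean and variance of a zero-mean Gaussian process."""
    k_tt = kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_qt = kernel(x_query, x_train)
    mean = k_qt @ np.linalg.solve(k_tt, y_train)
    var = 1.0 - np.einsum("ij,ji->i", k_qt, np.linalg.solve(k_tt, k_qt.T))
    return mean, np.maximum(var, 0.0)


def bayes_opt(cost, dim, n_init=5, n_iter=20, n_candidates=500, beta=2.0, seed=0):
    """Minimize `cost` over [0, 1]^dim with a lower-confidence-bound rule."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, (n_init, dim))
    y = np.array([cost(p) for p in x])
    for _ in range(n_iter):
        cand = rng.uniform(0.0, 1.0, (n_candidates, dim))
        # Center the observed costs so the zero-mean prior is reasonable.
        mean, var = gp_posterior(x, y - y.mean(), cand)
        nxt = cand[np.argmin(mean - beta * np.sqrt(var))]
        x = np.vstack([x, nxt])
        y = np.append(y, cost(nxt))
    best = np.argmin(y)
    return x[best], y[best], y


def toy_cost(p):
    """Hypothetical cost standing in for an expensive walking trial."""
    return float(np.sum((p - 0.3) ** 2))


best_x, best_y, history = bayes_opt(toy_cost, dim=4, n_iter=10)
print("best cost found:", best_y)
```

Because the kernel only sees the features, parameter vectors that produce similar features are treated as similar, which is the mechanism by which a well-chosen metric could let the optimizer discard large regions of poor controllers after a few trials.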
