Bayesian Optimization Meets Hybrid Zero Dynamics: Safe Parameter Learning for Bipedal Locomotion Control

In this paper, we propose a multi-domain control parameter learning framework that combines Bayesian Optimization (BO) and Hybrid Zero Dynamics (HZD) for locomotion control of bipedal robots. We leverage BO to learn the control parameters used in the HZD-based controller. The learning process is first deployed in simulation to optimize different control parameters for a large repertoire of gaits. Next, to address the discrepancy between simulation and the real world, the learning process is applied on the physical robot to learn corrections to the control parameters learned in simulation while respecting a safety constraint for gait stability. This method enables an efficient sim-to-real transition with a small number of real-world samples and does not require a valid controller to initialize training in simulation. The proposed learning framework is experimentally deployed and validated on the bipedal robot Cassie, where it performs versatile locomotion skills with smoother walking gaits and reduced steady-state tracking errors.
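The real-world stage described above, learning a correction to a simulation-tuned parameter while only sampling points believed to be safe, can be illustrated with a minimal sketch. This is not the paper's implementation: the one-dimensional correction `delta`, the functions `toy_cost` and `toy_safety`, the RBF kernel hyperparameters, the confidence multiplier `beta`, and the zero safety threshold are all assumptions made for illustration. The sketch fits one Gaussian process to the cost and one to a safety measure, starts from `delta = 0` (the simulation-learned parameter), and evaluates only candidates whose safety lower confidence bound is non-negative.

```python
import numpy as np


def rbf(a, b, length=0.3, var=1.0):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)


def gp_posterior(x_obs, y_obs, x_query, noise=1e-4):
    """Zero-mean GP posterior mean and standard deviation at x_query."""
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    Ks = rbf(x_obs, x_query)
    sol = np.linalg.solve(K, Ks)  # K^{-1} Ks, shape (n_obs, n_query)
    mean = sol.T @ y_obs
    var = 1.0 - np.sum(Ks * sol, axis=0)  # prior variance is 1.0
    return mean, np.sqrt(np.clip(var, 0.0, None))


def toy_cost(delta):
    """Hypothetical tracking-error cost; smallest at delta = 0.4."""
    return (delta - 0.4) ** 2


def toy_safety(delta):
    """Hypothetical stability margin; non-negative means safe."""
    return 0.5 - delta ** 2


def safe_bo(n_iters=15, beta=2.0):
    """Safety-constrained BO over a grid of candidate corrections."""
    candidates = np.linspace(-1.0, 1.0, 201)
    x = np.array([0.0])  # safe seed: no correction to the sim parameter
    cost = toy_cost(x)
    safe = toy_safety(x)
    for _ in range(n_iters):
        s_mean, s_std = gp_posterior(x, safe, candidates)
        c_mean, c_std = gp_posterior(x, cost, candidates)
        # Keep only candidates whose safety lower bound clears the threshold.
        safe_mask = s_mean - beta * s_std >= 0.0
        if not safe_mask.any():
            break
        # Among those, pick the lowest cost lower confidence bound.
        lcb = np.where(safe_mask, c_mean - beta * c_std, np.inf)
        nxt = candidates[np.argmin(lcb)]
        x = np.append(x, nxt)
        cost = np.append(cost, toy_cost(nxt))
        safe = np.append(safe, toy_safety(nxt))
    best = x[np.argmin(cost)]
    return best, x


if __name__ == "__main__":
    best, evaluated = safe_bo()
    print("best correction:", best)
    print("evaluated points:", evaluated)
```

Because the safety GP has a zero-mean prior, unexplored regions have a negative lower bound and are excluded until nearby evaluations raise confidence, so the set of admissible candidates grows outward from the seed. On hardware, each call to `toy_cost` and `toy_safety` would correspond to one walking trial, which is why the number of iterations is kept small.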
