Linear Policies are Sufficient to Realize Robust Bipedal Walking on Challenging Terrains

In this work, we demonstrate robust walking of the bipedal robot Digit on uneven terrains by learning just a single linear policy. In particular, we propose a new control pipeline in which a high-level trajectory modulator shapes the ellipsoidal trajectories of the feet, and a low-level gait controller regulates the torso and ankle orientations. The foot-trajectory modulator uses a linear policy, and the regulator uses a linear PD control law. In contrast to neural-network-based policies, the proposed linear policy has only 13 learnable parameters, which makes learning sample-efficient and keeps the policy simple and interpretable. This is achieved with no loss of performance on challenging terrains such as slopes, stairs, and outdoor landscapes. We first demonstrate robust walking in a custom MuJoCo simulation environment and then transfer the policy directly to hardware with no modification of the control pipeline. We validate the approach by subjecting the biped to a series of pushes and terrain height changes, both indoors and outdoors.
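To make the two layers of the pipeline concrete, the following is a minimal sketch of a linear policy, a semi-elliptical swing-foot trajectory, and a linear PD law. All function names, the observation and action layout, and the gain values are assumptions chosen for illustration; the sketch does not reproduce the paper's actual 13-parameter structure or its controller.

```python
import math


def linear_policy(weights, obs):
    """Linear policy: action = W @ obs, with W given as a list of rows.

    The shape of W and the meaning of obs are hypothetical here.
    """
    return [sum(w * o for w, o in zip(row, obs)) for row in weights]


def semi_ellipse_foot_target(phase, step_length, step_height):
    """Swing-foot target (x, z) on the upper half of an ellipse.

    phase runs from 0 (lift-off, rear of the ellipse) to 1 (touchdown, front).
    step_length and step_height stand in for quantities a high-level
    modulator could shape; this parameterization is an assumption.
    """
    theta = math.pi * (1.0 - phase)  # sweep from pi (rear) to 0 (front)
    x = 0.5 * step_length * math.cos(theta)
    z = step_height * math.sin(theta)
    return x, z


def pd_torque(kp, kd, angle_error, rate_error):
    """Linear PD law: torque = kp * angle_error + kd * rate_error."""
    return kp * angle_error + kd * rate_error


if __name__ == "__main__":
    # Hypothetical observation: [torso pitch error, torso pitch rate].
    obs = [0.05, -0.10]
    # Hypothetical 2x2 weight matrix mapping obs to [step_length, step_height].
    weights = [[0.8, 0.1], [0.2, 0.0]]
    step_length, step_height = linear_policy(weights, obs)
    print(semi_ellipse_foot_target(0.5, step_length, step_height))
    print(pd_torque(kp=40.0, kd=2.0, angle_error=obs[0], rate_error=obs[1]))
```

In a sketch like this, the only learned quantities are the entries of the weight matrix, which is what keeps the parameter count small and each weight directly readable as "how much this observation changes that trajectory parameter".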