Iterative Reinforcement Learning Based Design of Dynamic Locomotion Skills for Cassie

Deep reinforcement learning (DRL) is a promising approach for developing legged locomotion skills. However, the iterative design process that is inevitable in practice is poorly supported by the default methodology. It is difficult to predict the outcomes of changes made to the reward functions, policy architectures, and the set of tasks being trained on. In this paper, we propose a practical method that allows the reward function to be fully redefined on each successive design iteration while limiting the deviation from the previous iteration. We characterize policies via sets of Deterministic Action Stochastic State (DASS) tuples, which represent the deterministic policy state-action pairs as sampled from the states visited by the trained stochastic policy. New policies are trained using a policy gradient algorithm which then mixes RL-based policy gradients with gradient updates defined by the DASS tuples. The tuples also allow for robust policy distillation to new network architectures. We demonstrate the effectiveness of this iterative-design approach on the bipedal robot Cassie, achieving stable walking with different gait styles at various speeds. We demonstrate the successful transfer of policies learned in simulation to the physical robot without any dynamics randomization, and that variable-speed walking policies for the physical robot can be represented by a small dataset of 5-10k tuples.

[1]  H. Sebastian Seung,et al.  Stochastic policy gradient reinforcement learning on a simple 3D biped , 2004, 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE Cat. No.04CH37566).

[2]  Dean Pomerleau,et al.  ALVINN, an autonomous land vehicle in a neural network , 2015 .

[3]  John Folkesson,et al.  Deep Reinforcement Learning to Acquire Navigation Skills for Wheel-Legged Robots in Complex Environments , 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[4]  Joonho Lee,et al.  Learning agile and dynamic motor skills for legged robots , 2019, Science Robotics.

[5]  Jun Nakanishi,et al.  Control, Planning, Learning, and Imitation with Dynamic Movement Primitives , 2003 .

[6]  Kris Hauser,et al.  A data-driven indirect method for nonlinear optimal control , 2019 .

[7]  Atil Iscen,et al.  Sim-to-Real: Learning Agile Locomotion For Quadruped Robots , 2018, Robotics: Science and Systems.

[8]  Sergey Levine,et al.  Learning to Adapt: Meta-Learning for Model-Based Control , 2018, ArXiv.

[9]  Philip Bachman,et al.  Deep Reinforcement Learning that Matters , 2017, AAAI.

[10]  T. Takenaka,et al.  The development of Honda humanoid robot , 1998, Proceedings. 1998 IEEE International Conference on Robotics and Automation (Cat. No.98CH36146).

[11]  Kris Hauser,et al.  Learning Trajectories for Real- Time Optimal Control of Quadrotors , 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[12]  Kazuhito Yokoi,et al.  The 3D linear inverted pendulum mode: a simple modeling for a biped walking pattern generation , 2001, Proceedings 2001 IEEE/RSJ International Conference on Intelligent Robots and Systems. Expanding the Societal Role of Robotics in the the Next Millennium (Cat. No.01CH37180).

[13]  Yuval Tassa,et al.  MuJoCo: A physics engine for model-based control , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[14]  Jessy W. Grizzle,et al.  Feedback Control of a Cassie Bipedal Robot: Walking, Standing, and Riding a Segway , 2018, 2019 American Control Conference (ACC).

[15]  Jessy W. Grizzle,et al.  Rapid Bipedal Gait Design Using C-FROST with Illustration on a Cassie-series Robot , 2018, ArXiv.

[16]  J. Andrew Bagnell,et al.  Efficient Reductions for Imitation Learning , 2010, AISTATS.

[17]  Scott Kuindersma,et al.  A closed-form solution for real-time ZMP gait generation and feedback stabilization , 2015, 2015 IEEE-RAS 15th International Conference on Humanoid Robots (Humanoids).

[18]  Glen Berseth,et al.  Feedback Control For Cassie With Deep Reinforcement Learning , 2018, 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[19]  Christopher G. Atkeson,et al.  Using Deep Reinforcement Learning to Learn High-Level Policies on the ATRIAS Biped , 2018, 2019 International Conference on Robotics and Automation (ICRA).

[20]  Anca D. Dragan,et al.  DART: Noise Injection for Robust Imitation Learning , 2017, CoRL.

[21]  Sergey Levine,et al.  DeepMimic , 2018, ACM Trans. Graph..

[22]  Alec Radford,et al.  Proximal Policy Optimization Algorithms , 2017, ArXiv.

[23]  Aaron D. Ames,et al.  Dynamic Humanoid Locomotion: A Scalable Formulation for HZD Gait Optimization , 2018, IEEE Transactions on Robotics.

[24]  Christopher G. Atkeson,et al.  Biped walking control using a trajectory library , 2013, Robotica.

[25]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[26]  Ruslan Salakhutdinov,et al.  Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning , 2015, ICLR.

[27]  C. Karen Liu,et al.  Learning symmetric and low-energy locomotion , 2018, ACM Trans. Graph..

[28]  Glen Berseth,et al.  Progressive Reinforcement Learning with Distillation for Multi-Skilled Motion Control , 2018, ICLR.

[29]  Yee Whye Teh,et al.  Neural probabilistic motor primitives for humanoid control , 2018, ICLR.

[30]  Sergey Levine,et al.  Guided Policy Search , 2013, ICML.

[31]  Yuval Tassa,et al.  Emergence of Locomotion Behaviours in Rich Environments , 2017, ArXiv.

[32]  Masashi Sugiyama,et al.  Active deep Q-learning with demonstration , 2018, Machine Learning.

[33]  Scott Kuindersma,et al.  Optimization and stabilization of trajectories for constrained dynamical systems , 2016, 2016 IEEE International Conference on Robotics and Automation (ICRA).

[34]  Jun Morimoto,et al.  Learning from demonstration and adaptation of biped locomotion , 2004, Robotics Auton. Syst..

[35]  Aaron D. Ames,et al.  Coupling Reduced Order Models via Feedback Control for 3D Underactuated Bipedal Robotic Walking , 2018, 2018 IEEE-RAS 18th International Conference on Humanoid Robots (Humanoids).

[36]  Stefano Ermon,et al.  Generative Adversarial Imitation Learning , 2016, NIPS.

[37]  Geoffrey J. Gordon,et al.  A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning , 2010, AISTATS.

[38]  Glen Berseth,et al.  DeepLoco: dynamic locomotion skills using hierarchical deep reinforcement learning , 2017, ACM Trans. Graph..

[39]  Jessy W. Grizzle,et al.  Supervised learning for stabilizing underactuated bipedal robot locomotion, with outdoor experiments on the wave field , 2017, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[40]  Taylor Apgar,et al.  Fast Online Trajectory Optimization for the Bipedal Robot Cassie , 2018, Robotics: Science and Systems.

[41]  Yishay Mansour,et al.  Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.

[42]  Martijn Wisse,et al.  The design of LEO: A 2D bipedal walking robot for online autonomous Reinforcement Learning , 2010, 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems.