CPG-ACTOR: Reinforcement Learning for Central Pattern Generators

Central Pattern Generators (CPGs) have several properties desirable for locomotion: they generate smooth trajectories, are robust to perturbations and are simple to implement. Although CPGs are conceptually promising, we argue that their full potential has so far been limited by insufficient sensory-feedback information. This paper proposes a new methodology that allows tuning CPG controllers through gradient-based optimisation in a Reinforcement Learning (RL) setting. To the best of our knowledge, this is the first time CPGs have been trained in conjunction with a Multilayer Perceptron (MLP) network in a Deep-RL context. In particular, we show how CPGs can be integrated directly as the Actor in an Actor-Critic formulation. Additionally, we demonstrate how this change permits us to integrate highly non-linear feedback directly from sensory perception to reshape the oscillators’ dynamics. Our results on a locomotion task using a single-leg hopper demonstrate that explicitly using the CPG as the Actor, rather than as part of the environment, results in a significant increase in the reward gained over time (6x more) compared with previous approaches. Furthermore, we show that our method without feedback reproduces results similar to prior work with feedback. Finally, we demonstrate how our closed-loop CPG progressively improves the hopping behaviour over longer training epochs, relying only on basic reward functions.
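To make the idea of a CPG acting as the Actor with learned sensory feedback more concrete, the following is a minimal illustrative sketch, not the formulation used in this paper. It assumes a single phase/amplitude oscillator integrated with explicit Euler steps and a small MLP that maps an observation to two additive feedback terms. All names (`cpg_step`, `mlp_feedback`, `make_params`, `rollout`) and all parameter values are hypothetical. The sketch covers only the forward pass; in a Deep-RL setting the oscillator and MLP parameters would be tuned by gradient-based optimisation, which is omitted here.

```python
import numpy as np

def mlp_feedback(obs, w1, b1, w2, b2):
    """Tiny MLP mapping an observation to two feedback terms (phase, amplitude)."""
    hidden = np.tanh(obs @ w1 + b1)
    return hidden @ w2 + b2

def cpg_step(state, obs, params, dt=0.01):
    """One explicit-Euler step of a phase/amplitude oscillator with feedback.

    Assumed (hypothetical) dynamics:
        theta_dot = omega + fb_theta
        r_dot     = a * (mu - r) + fb_r
    The action is the oscillator output r * cos(theta).
    """
    theta, r = state
    fb_theta, fb_r = mlp_feedback(
        obs, params["w1"], params["b1"], params["w2"], params["b2"]
    )
    theta_new = theta + dt * (params["omega"] + fb_theta)
    r_new = r + dt * (params["a"] * (params["mu"] - r) + fb_r)
    action = r_new * np.cos(theta_new)
    return (theta_new, r_new), action

def make_params(obs_dim=3, hidden=8, seed=0):
    """Hypothetical parameter set; every entry would be trainable in practice."""
    rng = np.random.default_rng(seed)
    return {
        "omega": 2.0 * np.pi,  # intrinsic frequency (rad/s)
        "mu": 1.0,             # target amplitude
        "a": 5.0,              # amplitude convergence rate
        "w1": 0.1 * rng.standard_normal((obs_dim, hidden)),
        "b1": np.zeros(hidden),
        "w2": np.zeros((hidden, 2)),  # zero init: feedback starts switched off
        "b2": np.zeros(2),
    }

def rollout(params, observations, state=(0.0, 0.1), dt=0.01):
    """Run the CPG over a sequence of observations and collect its actions."""
    actions = []
    for obs in observations:
        state, action = cpg_step(state, obs, params, dt)
        actions.append(action)
    return np.array(actions), state
```

With the output layer of the MLP initialised to zero, the controller behaves as an open-loop oscillator whose amplitude settles at `mu`; non-zero feedback weights let the observation reshape both phase and amplitude dynamics, which is the closed-loop case.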
