Energy-efficient and damage-recovery slithering gait design for a snake-like robot based on reinforcement learning and inverse reinforcement learning

Similar to real snakes in nature, the flexible trunks of snake-like robots enhance their movement capabilities and adaptability in diverse environments. However, this flexibility comes with a complex control task involving highly redundant degrees of freedom, where traditional model-based methods usually fail to propel the robots energy-efficiently or to adapt to unforeseeable joint damage. In this work, we present an approach for designing an energy-efficient and damage-recovery slithering gait for a snake-like robot using reinforcement learning (RL) and inverse reinforcement learning (IRL). Specifically, we first present an RL-based controller, trained with the proximal policy optimization (PPO) algorithm, that generates locomotion gaits over a wide range of velocities. Then, taking the RL-based controller as an expert and collecting trajectories from it, we train an IRL-based controller with the adversarial inverse reinforcement learning (AIRL) algorithm. For comparison, a traditional parameterized gait controller serves as the baseline, with its parameter sets optimized by grid search and Bayesian optimization. Based on the analysis of the simulation results, we first demonstrate that the RL-based controller exhibits natural and adaptive movements that are also substantially more energy-efficient than the gaits generated by the parameterized controller. We then demonstrate that the IRL-based controller not only matches the performance of the RL-based controller, but can also recover from unpredictable damage to its body joints and still outperform the model-based controller, which has an undamaged body, in terms of energy efficiency. Videos can be viewed at https://videoviewsite.wixsite.com/rlsnake.
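As context for the baseline comparison, the sketch below illustrates one common form of a parameterized slithering gait: serpenoid-style sinusoidal joint commands with a constant phase lag between neighbouring joints, tuned here by a simple grid search. The helper names, the parameter ranges, and the `evaluate_gait` cost (assumed to roll out the gait in simulation and return energy used per metre travelled) are illustrative assumptions rather than the controller used in the paper; Bayesian optimization would replace the exhaustive loop with a surrogate-model-guided search over the same parameter space.

```python
import itertools
import numpy as np

def serpenoid_joint_angles(t, n_joints, amplitude, frequency, phase_lag, offset=0.0):
    """Serpenoid-style joint commands: each joint follows a sinusoid with a
    constant phase lag relative to its neighbour (a classic parameterized gait)."""
    i = np.arange(n_joints)
    return amplitude * np.sin(2.0 * np.pi * frequency * t + i * phase_lag) + offset

def grid_search(evaluate_gait):
    """Exhaustive search over a small parameter grid. `evaluate_gait` is an
    assumed callback that simulates the gait and returns an energy-per-distance
    cost (lower is better); the ranges below are illustrative, not the paper's."""
    amplitudes = np.linspace(0.2, 1.0, 5)    # rad
    frequencies = np.linspace(0.5, 2.0, 4)   # Hz
    phase_lags = np.linspace(0.3, 1.2, 4)    # rad between adjacent joints
    best_params, best_cost = None, np.inf
    for a, f, p in itertools.product(amplitudes, frequencies, phase_lags):
        cost = evaluate_gait(lambda t, n: serpenoid_joint_angles(t, n, a, f, p))
        if cost < best_cost:
            best_params, best_cost = (a, f, p), cost
    return best_params, best_cost
```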
