Torque-based Deep Reinforcement Learning for Task-and-Robot Agnostic Learning on Bipedal Robots Using Sim-to-Real Transfer

In this paper, we revisit the question of which action space is best suited for controlling a real biped robot when the policy is trained with sim-to-real transfer. Position control has been popular because it has been shown to be more sample efficient and is intuitive to combine with other planning algorithms. However, position control requires gain tuning to achieve the best possible policy performance. We show that a torque-based action space instead enables task-and-robot agnostic learning with less parameter tuning and mitigates the sim-to-real gap by taking advantage of torque control's inherent compliance. We also accelerate training of the torque-based policy by pre-training it to remain upright by compensating for gravity. The paper showcases the first successful sim-to-real transfer of a torque-based deep reinforcement learning policy on a real human-sized biped robot. The video is available at https://youtu.be/CR6pTS39VRE.
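
To make the contrast between the two action spaces concrete, the following is a minimal sketch, not the paper's implementation. With a position action space, a PD loop with per-joint gains turns the policy's target positions into torques, which is where the gain tuning comes from. With a torque action space, the policy output is mapped to joint torques directly. The function names, the PD form, the [-1, 1] action range, and the scaling by torque limits are all assumptions made for illustration.

```python
from typing import List, Sequence


def pd_position_to_torque(
    q_target: Sequence[float],
    q: Sequence[float],
    qd: Sequence[float],
    kp: Sequence[float],
    kd: Sequence[float],
) -> List[float]:
    """Position action space (assumed PD form): tau = kp * (q_target - q) - kd * qd.

    kp and kd are per-joint gains that have to be tuned by hand.
    """
    return [
        kp_i * (qt_i - q_i) - kd_i * qd_i
        for qt_i, q_i, qd_i, kp_i, kd_i in zip(q_target, q, qd, kp, kd)
    ]


def torque_action_to_torque(
    action: Sequence[float],
    torque_limit: Sequence[float],
) -> List[float]:
    """Torque action space (assumed scaling): clip each action to [-1, 1],
    then scale it by that joint's torque limit. No PD gains are involved.
    """
    return [max(-1.0, min(1.0, a_i)) * lim_i for a_i, lim_i in zip(action, torque_limit)]


if __name__ == "__main__":
    # Hypothetical single-joint and two-joint values, for illustration only.
    print(pd_position_to_torque([1.0], [0.5], [0.2], [100.0], [10.0]))
    print(torque_action_to_torque([2.0, -0.5], [100.0, 40.0]))
```

In this sketch the only robot-specific quantity in the torque path is the torque limit, whereas the position path depends on two tuned gains per joint.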
