A Disturbance Rejection Control Method Based on Deep Reinforcement Learning for a Biped Robot

Disturbance rejection during walking has long been a focus of research on biped robots. Traditional stabilizing control methods include modifying foot placements and modifying the target zero moment point (ZMP), as in model ZMP control. Rejecting disturbances in the forward direction is particularly important, whether the disturbance comes from the inertia generated by walking or from external forces. A first step toward solving the instability of a humanoid robot is to give it the ability to adjust its posture dynamically while standing still, and control based on model ZMP control is among the main disturbance rejection methods for biped robots. We combine a state-of-the-art deep reinforcement learning algorithm with model ZMP control and apply the combination in simulation to a balance experiment on the cart–table model and to a disturbance rejection experiment on the ASIMO humanoid robot standing still. The results show that the proposed method effectively reduces the probability of falling when the biped robot is subjected to an external force in the x-direction.
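As a rough illustration of the cart–table model mentioned above, the following minimal sketch uses the standard textbook relation p = x - (z_c / g) * x_ddot between the ZMP p, the horizontal center-of-mass position x, its acceleration x_ddot, and a constant center-of-mass height z_c. It is not the paper's controller. The function names, the foot-edge bounds, and all numeric values are hypothetical and chosen only for demonstration.

```python
G = 9.81  # gravitational acceleration in m/s^2


def cart_table_zmp(x, x_ddot, z_c, g=G):
    """Return the ZMP position (m) of the cart-table model.

    x      : horizontal center-of-mass (cart) position in m
    x_ddot : horizontal center-of-mass acceleration in m/s^2
    z_c    : constant center-of-mass (table) height in m
    """
    return x - (z_c / g) * x_ddot


def zmp_inside_support(p, heel, toe):
    """Check whether the ZMP lies between the foot edges (hypothetical bounds, m)."""
    return heel <= p <= toe


if __name__ == "__main__":
    # Hypothetical standing robot: CoM above the ankle, 0.6 m high,
    # with a push in the x-direction that accelerates the CoM at 1.0 m/s^2.
    p = cart_table_zmp(x=0.0, x_ddot=1.0, z_c=0.6)
    print("ZMP position [m]:", p)
    print("inside support:", zmp_inside_support(p, heel=-0.05, toe=0.15))
```

In this simplified picture, a forward acceleration of the center of mass moves the ZMP backward, and the robot risks tipping once the ZMP reaches an edge of the support region. Modifying the target ZMP, as model ZMP control does, is one way to keep it inside that region.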
