Learning Humanoid Robot Running Skills through Proximal Policy Optimization

At the current stage of Soccer 3D's evolution, motion control is a key factor in a team's performance. Recent works take advantage of model-free approaches based on machine learning to exploit robot dynamics and obtain faster locomotion skills, achieving running policies and thereby opening a new research direction in the Soccer 3D environment. In this work, we present a methodology based on deep reinforcement learning that learns running skills without any prior knowledge, using a neural network whose inputs are derived from the robot's dynamics. Our results outperform the previous state-of-the-art sprint velocity reported in the Soccer 3D literature by a significant margin. The approach also improves sample efficiency, learning to run in just a few hours. We report our results by analyzing the training procedure and by evaluating the learned policies in terms of speed, reliability, and human similarity. Finally, we present the key factors that led to these improvements over previous results and share some ideas for future work.
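The optimization algorithm named in the title, Proximal Policy Optimization, is built around a clipped surrogate objective. The following is a minimal NumPy sketch of that objective only, for illustration; it is not the authors' implementation, and the function name and the default clipping coefficient `eps=0.2` (a common choice from the PPO paper) are assumptions:

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Clipped surrogate loss: -E[min(r_t * A_t, clip(r_t, 1-eps, 1+eps) * A_t)].

    ratio:     pi_new(a|s) / pi_old(a|s) per sampled action
    advantage: advantage estimates (e.g. from GAE) per sample
    eps:       clipping coefficient limiting the policy update
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Negated mean so that minimizing this loss maximizes the objective.
    return -np.mean(np.minimum(unclipped, clipped))
```

The `min` with the clipped term removes the incentive to push the probability ratio far outside `[1 - eps, 1 + eps]`, which keeps each policy update conservative and is what makes PPO stable enough to learn locomotion skills from scratch.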
