Learning Natural Locomotion Behaviors for Humanoid Robots Using Human Bias

This letter presents a new learning framework that combines knowledge from imitation learning, deep reinforcement learning, and control theory to achieve human-style humanoid locomotion that is natural, dynamic, and robust. We propose two ways of introducing human bias: motion capture data and a dedicated Multi-Expert network structure. The Multi-Expert network structure smoothly blends behavioral features, and an augmented reward design combines the task and imitation rewards. Because it is built on fundamental concepts from conventional humanoid control, the reward design is composable, tunable, and explainable. We rigorously validated and benchmarked the learning framework, which consistently produced robust locomotion behaviors across various test scenarios. We further demonstrated that it can learn robust and versatile policies in the presence of disturbances such as terrain irregularities and external pushes.
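The two ideas named above, blending several experts and composing task and imitation rewards, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than the letter's actual design: the function names, the softmax gating, the choice to blend expert outputs (the abstract does not say whether outputs or network parameters are blended), and the reward weights `w_task` and `w_imitation` are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def blend_experts(expert_actions, gate_scores):
    """Blend K expert action vectors into one action.

    expert_actions: list of K action vectors (lists of floats, same length).
    gate_scores: K unnormalised scores, standing in for the output of a
        hypothetical gating network (an assumption, not from the letter).
    """
    weights = softmax(gate_scores)
    dim = len(expert_actions[0])
    # Convex combination: weights are positive and sum to 1, so the blended
    # action changes smoothly as the gate scores change.
    return [
        sum(w * action[i] for w, action in zip(weights, expert_actions))
        for i in range(dim)
    ]


def total_reward(task_terms, imitation_terms, w_task=0.5, w_imitation=0.5):
    """Compose task and imitation reward terms as a weighted sum.

    The weights are illustrative placeholders; individual terms can be added
    or removed without changing the rest of the function.
    """
    return w_task * sum(task_terms) + w_imitation * sum(imitation_terms)


if __name__ == "__main__":
    # Two hypothetical experts, each proposing a 2-D action.
    action = blend_experts([[1.0, 0.0], [0.0, 1.0]], gate_scores=[0.0, 0.0])
    reward = total_reward(task_terms=[0.8, 0.6], imitation_terms=[0.9])
    print(action, reward)
```

Keeping each reward term as a separate entry is one simple way to make a reward composable and tunable in the sense the abstract describes: terms can be inspected, reweighted, or dropped individually.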
