Terrain Adaptive Walking of Biped Neuromuscular Virtual Human Using Deep Reinforcement Learning

There have been biomechanics-based control systems that achieve more realistic virtual human motion. However, their ability to adapt to changing environments is weaker than that of traditional control systems in which characters are driven directly by proportional-derivative actuators. In our method, we build a hierarchical neuromuscular virtual human (NMVH) motion control system that consists of a low-level spinal reflex layer and a high-level policy control layer. The spinal reflex layer uses a feedback net to map sensory information to excitations, which stimulate muscles to generate joint torques. The policy control layer includes a deep neural network that provides a learned action policy to the spinal reflex layer for achieving terrain-adaptive motion skills. The particle swarm optimization algorithm is used to optimize the gain factors of the feedback net, yielding a basic policy that lets the virtual human walk autonomously on flat terrain. The proximal policy optimization algorithm is employed to train the deep neural network in the policy control layer to learn how to modulate actions in response to terrain changes. Simulation results in MATLAB show that the virtual human walks smoothly and adapts better to the given terrain changes, demonstrating that our control system improves the terrain-adaptive walking skill of the neuromuscular virtual human.
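To make the two-layer structure concrete, the sketch below outlines one control step under assumed interfaces: the high-level policy produces a modulation signal, and the low-level spinal reflex layer combines it with gain-weighted sensory feedback to produce muscle excitations. The dimensions, the saturated linear reflex mapping, and the names `spinal_reflex_layer` and `policy_layer` are illustrative assumptions rather than the paper's implementation; in the paper, the feedback gain factors are tuned with particle swarm optimization and the policy network is trained with proximal policy optimization.

```python
import numpy as np

# Illustrative sizes only -- the paper does not specify the exact numbers
# of sensory inputs or muscles.
N_SENSORS, N_MUSCLES = 12, 8
rng = np.random.default_rng(0)


def spinal_reflex_layer(sensors, gains, modulation):
    """Low-level layer: map sensory feedback to muscle excitations.

    `gains` holds the feedback gain factors (tuned offline, e.g. by PSO);
    `modulation` is the action from the high-level policy that shifts the
    reflex output for the current terrain. The clipped linear mapping is a
    sketch, not the paper's exact feedback net.
    """
    excitations = gains @ sensors + modulation
    return np.clip(excitations, 0.0, 1.0)  # muscle excitations live in [0, 1]


def policy_layer(state, weights):
    """High-level layer: a tiny stand-in for the deep policy network.

    In the paper this network is trained with PPO; here a single linear
    layer with tanh squashing stands in for it.
    """
    return np.tanh(weights @ state)


# One simulated control step with placeholder values.
gains = rng.normal(scale=0.1, size=(N_MUSCLES, N_SENSORS))     # PSO-tuned in the paper
policy_w = rng.normal(scale=0.1, size=(N_MUSCLES, N_SENSORS))  # PPO-trained in the paper
sensors = rng.normal(size=N_SENSORS)                           # joint angles, contacts, ...

modulation = policy_layer(sensors, policy_w)
excitations = spinal_reflex_layer(sensors, gains, modulation)
print(excitations)
```

In a full pipeline, the returned excitations would drive Hill-type muscle models whose forces yield the joint torques applied in the physics simulation.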
