Understanding the stability of deep control policies for biped locomotion

Achieving stability and robustness is the primary goal of biped locomotion control. Recently, deep reinforcement learning (DRL) has attracted great attention as a general methodology for constructing biped control policies and has demonstrated significant improvements over the previous state of the art. Although deep control policies have advantages over previous controller design approaches, many questions remain unanswered. Are deep control policies as robust as human walking? Does simulated walking use strategies similar to those of human walking to maintain balance? Does a particular gait pattern affect human and simulated walking in similar ways? What do deep policies learn in order to achieve improved gait stability? The goal of this study is to answer these questions by evaluating the push-recovery stability of deep policies in comparison with human subjects and a previous feedback controller. We also conducted experiments to evaluate the effectiveness of variants of DRL algorithms.
