Learning to schedule control fragments for physics-based characters using deep Q-learning

Given a robust control system, physical simulation offers the potential for interactive human characters that move in realistic and responsive ways. In this article, we describe how to learn a scheduling scheme that reorders short control fragments as necessary at runtime, creating a control system that can respond to disturbances and allow steering and other user interactions. These schedulers provide robust control of a wide range of highly dynamic behaviors, including walking on a ball, balancing on a bongo board, skateboarding, running, push recovery, and breakdancing. We show that moderate-sized Q-networks can model the schedulers for these control tasks effectively and that those schedulers can be learned efficiently with deep Q-learning.
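
To make the scheduling idea concrete, the sketch below shows one plausible shape for such a scheduler: a moderate-sized Q-network maps a simulation state to one Q-value per control fragment, an epsilon-greedy rule picks the fragment to execute next, and a standard deep Q-learning update trains the network from recorded transitions. This is a minimal illustration, not the authors' implementation; the state dimension, number of fragments, layer sizes, and hyperparameters are assumptions chosen for readability.

```python
# Minimal sketch of a Q-network scheduler for control fragments.
# All dimensions and hyperparameters below are illustrative assumptions,
# not values taken from the paper.
import random
import torch
import torch.nn as nn

STATE_DIM = 60        # assumed size of the reduced simulation state
NUM_FRAGMENTS = 20    # assumed number of short control fragments

class QScheduler(nn.Module):
    """Q-network: simulation state -> one Q-value per control fragment."""
    def __init__(self, state_dim=STATE_DIM, num_fragments=NUM_FRAGMENTS):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, num_fragments),
        )

    def forward(self, state):
        return self.net(state)

def select_fragment(q_net, state, epsilon=0.1):
    """Epsilon-greedy choice of the index of the next fragment to play."""
    if random.random() < epsilon:
        return random.randrange(NUM_FRAGMENTS)
    with torch.no_grad():
        q_values = q_net(state.unsqueeze(0))   # add batch dimension
    return int(q_values.argmax(dim=1).item())

def dqn_update(q_net, target_net, batch, optimizer, gamma=0.95):
    """One Q-learning step on a batch of (s, a, r, s', done) transitions."""
    states, actions, rewards, next_states, dones = batch
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = target_net(next_states).max(dim=1).values
        target = rewards + gamma * (1.0 - dones) * q_next
    loss = nn.functional.smooth_l1_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At runtime, the scheduler would call select_fragment at each fragment boundary to decide which short tracking controller to execute next; during training, transitions gathered this way feed dqn_update, with the target network refreshed periodically as in standard deep Q-learning.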
