Terrain-adaptive locomotion skills using deep reinforcement learning

Reinforcement learning offers a promising methodology for developing skills for simulated characters, but typically requires working with sparse hand-crafted features. Building on recent progress in deep reinforcement learning (DeepRL), we introduce a mixture of actor-critic experts (MACE) approach that learns terrain-adaptive dynamic locomotion skills using high-dimensional state and terrain descriptions as input, and parameterized leaps or steps as output actions. MACE learns more quickly than a single actor-critic approach and results in actor-critic experts that exhibit specialization. Additional elements of our solution that contribute towards efficient learning include Boltzmann exploration and the use of initial actor biases to encourage specialization. Results are demonstrated for multiple planar characters and terrain classes.
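As a rough illustration of the Boltzmann exploration over actor-critic experts mentioned above, the sketch below selects which expert's proposed action to execute with probability proportional to exp(Q_i(s)/T). This is a minimal sketch under assumptions: the names `critics`, `actors`, and `temperature` are illustrative placeholders, not the paper's actual interface.

```python
import numpy as np

# Hypothetical MACE-style action selection: each expert i exposes a critic
# estimate Q_i(state) and an actor proposing a parameterized action mu_i(state).
def select_action(state, critics, actors, temperature=1.0, rng=np.random):
    """Pick one expert via a Boltzmann distribution over critic values,
    then return that expert's proposed action."""
    q_values = np.array([critic(state) for critic in critics])
    # Subtract the max before exponentiating for numerical stability.
    logits = (q_values - q_values.max()) / temperature
    probs = np.exp(logits)
    probs /= probs.sum()
    expert = rng.choice(len(actors), p=probs)
    return actors[expert](state), expert
```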
