Aerobatics control of flying creatures via self-regulated learning

Flying creatures in animated films often perform highly dynamic aerobatic maneuvers, which demand extreme exercise capacity and skillful control. Designing physics-based controllers (also known as control policies) for aerobatic maneuvers is very challenging because the creature's dynamic state remains in unstable equilibrium for most of the maneuver. Recently, Deep Reinforcement Learning (DRL) has shown its potential in constructing physics-based controllers. In this paper, we present a new concept, Self-Regulated Learning (SRL), which is combined with DRL to address the aerobatics control problem. The key idea of SRL is to allow the agent to take control of its own learning through an additional self-regulation policy, which lets the agent adjust its goals according to the capability of the current control policy. The control and self-regulation policies are learned jointly as training progresses. Self-regulated learning can be viewed as the agent building its own curriculum and seeking a compromise on goals that exceed its current ability. The effectiveness of our method is demonstrated with physically simulated creatures performing aerobatic skills such as sharp turning, rapid winding, rolling, soaring, and diving.
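
The abstract suggests a training loop in which, before each rollout, a self-regulation policy proposes a goal tailored to what the current control policy can achieve, and both policies are then updated from the resulting experience. The sketch below illustrates one plausible form of that loop; it is not the authors' implementation. The point-mass environment (`PointMassEnv`), the linear Gaussian policies, the REINFORCE-style updates, and the compromise penalty `w_compromise` that keeps regulated goals near the user's goal are all illustrative assumptions.

```python
import numpy as np

class PointMassEnv:
    """Toy stand-in for the physics simulation: a 2-D point mass
    that should reach a goal position. Purely illustrative."""
    state_dim, goal_dim, action_dim, horizon = 2, 2, 2, 20

    def reset(self):
        self.pos = np.zeros(2)
        return self.pos.copy()

    def step(self, action, goal):
        self.pos += 0.1 * np.clip(action, -1.0, 1.0)
        reward = -np.linalg.norm(self.pos - goal)  # closer to goal = higher reward
        return self.pos.copy(), reward

class LinearGaussianPolicy:
    """Toy stand-in for a neural policy: linear mean, fixed std,
    trained with a REINFORCE-style gradient."""
    def __init__(self, in_dim, out_dim, lr=1e-3):
        self.W = np.zeros((out_dim, in_dim))
        self.std, self.lr = 0.2, lr

    def act(self, x):
        return self.W @ x + self.std * np.random.randn(self.W.shape[0])

    def update(self, xs, actions, advantage):
        # Gradient of log N(a; Wx, std^2) w.r.t. W, scaled by a scalar advantage.
        for x, a in zip(xs, actions):
            grad_logp = np.outer((a - self.W @ x) / self.std ** 2, x)
            self.W += self.lr * advantage * grad_logp

def train(user_goal, iterations=2000, w_compromise=0.5):
    env = PointMassEnv()
    control = LinearGaussianPolicy(env.state_dim + env.goal_dim, env.action_dim)
    regulator = LinearGaussianPolicy(env.state_dim + env.goal_dim, env.goal_dim)
    c_base = r_base = 0.0  # running-mean baselines standing in for critics

    for _ in range(iterations):
        s = env.reset()
        # Self-regulation: propose a goal the current controller can
        # plausibly reach, conditioned on the state and the user's goal.
        reg_in = np.concatenate([s, user_goal])
        g = regulator.act(reg_in)

        xs, actions, ret = [], [], 0.0
        for _ in range(env.horizon):
            x = np.concatenate([s, g])  # the controller sees the regulated goal
            a = control.act(x)
            xs.append(x)
            actions.append(a)
            s, r = env.step(a, g)
            ret += r

        # One way to "seek compromise": reward the regulator for task return
        # while penalizing distance from the user-specified goal.
        reg_ret = ret - w_compromise * np.linalg.norm(g - user_goal)

        # Joint updates every episode, so the implicit curriculum adapts
        # to the controller's current capability as learning progresses.
        control.update(xs, actions, ret - c_base)
        regulator.update([reg_in], [g], reg_ret - r_base)
        c_base = 0.9 * c_base + 0.1 * ret
        r_base = 0.9 * r_base + 0.1 * reg_ret

    return control, regulator

if __name__ == "__main__":
    train(user_goal=np.array([1.0, 1.0]))
```

In practice both policies would be deep networks trained with a stronger policy-gradient algorithm; the sketch only conveys the joint-update control flow and the goal-compromise idea.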
